OpenAI says GPT-5 stacks up to humans in a wide range of jobs

OpenAI released a new benchmark on Thursday that tests how its AI models perform compared with human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt to understand how close OpenAI's systems are to outperforming humans at economically valuable work, a key part of the company's founding mission to develop artificial general intelligence, or AGI.

OpenAI says it found that its GPT-5 model and Anthropic's Claude Opus 4.1 "are already approaching the quality of work produced by industry experts."

That's not to say OpenAI's models will start replacing humans in their jobs immediately. Despite some CEOs' predictions that AI will take people's jobs within just a few years, OpenAI admits that GDPval today covers a very limited number of the tasks people do in their real jobs. Still, it is one of the latest ways the company is measuring AI's progress toward that milestone.

GDPval is based on the nine industries that contribute the most to America's gross domestic product, including healthcare, finance, manufacturing, and government. The benchmark tests an AI model's performance in 44 occupations across those industries, ranging from software engineers to nurses to journalists.

For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with those produced by other professionals and choose the better one. For example, one task asked investment bankers to build a competitor landscape for the last-mile delivery industry, and their work was then compared with AI-generated reports. OpenAI then averages an AI model's "win rate" against the human reports across all 44 occupations.

For GPT-5-high, a version of GPT-5 that uses extra computational power, the company says the model was rated better than or on par with industry experts 40.6% of the time.
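To make the "better than or on par" metric concrete, here is a minimal Python sketch of how a wins-plus-ties rate could be computed per occupation and then averaged. It is an illustration under stated assumptions, not OpenAI's grading code: the occupations and verdicts below are hypothetical, and the unweighted mean across occupations is an assumption, since the article says only that OpenAI averages the win rate across all 44 occupations.

```python
from statistics import mean

# Hypothetical expert verdicts per occupation (illustrative data only).
# "win"  = AI deliverable preferred over the human one
# "tie"  = judged equally good
# "loss" = human deliverable preferred
verdicts = {
    "software_engineer": ["win", "loss", "tie", "loss"],
    "nurse": ["loss", "loss", "win", "tie"],
    "journalist": ["tie", "loss", "loss", "loss"],
}


def win_or_tie_rate(labels: list[str]) -> float:
    """Fraction of comparisons where the AI output was rated better than or equal to the human one."""
    return sum(label in ("win", "tie") for label in labels) / len(labels)


def benchmark_score(by_occupation: dict[str, list[str]]) -> float:
    """Unweighted mean of per-occupation rates (an assumed aggregation, not confirmed by the source)."""
    return mean(win_or_tie_rate(labels) for labels in by_occupation.values())


if __name__ == "__main__":
    # Prints the averaged wins-plus-ties rate for the hypothetical data above.
    print(f"{benchmark_score(verdicts):.1%}")
```

With this made-up data, two occupations score 50% and one scores 25%, so the averaged rate is about 41.7%. Weighting each occupation equally means a field with few graded tasks counts as much as one with many; a real benchmark could reasonably choose a different aggregation.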

OpenAI also tested Anthropic's Claude Opus 4.1, which was rated better than or on par with industry experts on 49% of tasks. OpenAI says it believes Claude scored so high because of its tendency to produce pleasing graphics, rather than because of sheer performance.


It's worth noting that most working professionals do a lot more than submit research reports to their boss, which is all GDPval-v0 tests. OpenAI acknowledges this and says it plans to build more robust tests in the future that can account for more industries and interactive workflows.

Nonetheless, the company sees its progress on GDPval as notable.

In an interview with TechCrunch, OpenAI's chief economist, Dr. Aaron Chatterji, said GDPval's results suggest that people in these jobs can now use AI models to spend their time on more meaningful tasks.

"[Because] the model is getting good at some of these things," says Chatterji, "people in those jobs can now use the model, increasingly as capabilities get better, to offload some of their work and do potentially higher value things."

Tejal Patwardhan, who works on evaluations at OpenAI, told TechCrunch that the rate of progress on GDPval is encouraging. OpenAI's GPT-4o, released roughly 15 months ago, scored just 13.7% (wins and ties against humans). GPT-5 now scores nearly triple that, a trend Patwardhan expects to continue.

Silicon Valley uses a wide range of benchmarks to measure the progress of AI models and assess whether a given model is state of the art. Among the most popular are AIME 2025 (a test of competition math problems) and GPQA Diamond (a test of PhD-level science questions). However, several AI models are nearing saturation on some of these benchmarks, and many AI researchers have cited the need for better tests that can measure AI's proficiency at real-world tasks.

Benchmarks like GDPval could become increasingly important in that conversation as OpenAI makes the case that its AI models are valuable across a wide range of industries. But OpenAI may need a more comprehensive version of the test to say definitively that its AI models can outperform humans.
