The reinforcement gap – or why some artificial intelligence skills improve faster than others




AI coding tools are getting better fast. If you do not work in code, it can be hard to notice how much things are changing, but GPT-5 and Gemini 2.5 have made a whole new set of developer tricks possible to automate, and last week Sonnet 4.5 did it again.

At the same time, other skills are progressing more slowly. If you use AI to write emails, you are probably getting the same value out of it that you did a year ago. Even when the model improves, the product does not always benefit, especially when the product is a chatbot that performs ten different jobs at the same time. AI is still making progress, but that progress is not as evenly distributed as it used to be.

The difference in progress is simpler than it appears. Coding applications benefit from billions of easily measurable tests, which can be used to train them to produce workable code. This is reinforcement learning (RL), arguably the biggest driver of AI progress over the past six months, and it is getting more intricate all the time. You can do reinforcement learning with human graders, but it works best when there is a clear pass-fail metric, so that you can repeat it billions of times without having to stop for human input.

As the industry relies more and more on reinforcement learning to improve products, we are seeing a real difference between capabilities that can be graded automatically and those that cannot. RL-friendly skills such as bug fixing and competitive math are improving fast, while skills such as writing make only incremental progress.

In short, there is a reinforcement gap – and it has become one of the most important factors in what AI systems can and cannot do.

In some respects, software development is the ideal subject for reinforcement learning. Even before AI, there was a whole sub-discipline devoted to testing how software holds up under pressure, largely because developers need to make sure their code will not break before they deploy it. So even the most elegant code still has to pass unit testing, integration testing, security testing, and so on. Human developers use these tests routinely to validate their code and, as a senior director for dev tools at Google told me, they are just as useful for validating AI-generated code. More than that, they are useful for reinforcement learning, because they are already systematized and repeatable at massive scale.
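To make the idea of a pass-fail metric concrete, here is a minimal sketch in Python of how a test suite can act as an automatic grader. It is an illustration only, not a description of how any lab actually trains its models: the names `TEST_CASES`, `pass_fail_reward`, `correct_add`, and `buggy_add` are all hypothetical, and the "candidates" are ordinary hand-written functions standing in for model-generated code.

```python
from typing import Callable, List, Tuple

# Hypothetical test suite for a function that adds two integers.
# Each entry pairs the input arguments with the expected result.
TEST_CASES: List[Tuple[Tuple[int, int], int]] = [
    ((1, 2), 3),
    ((0, 0), 0),
    ((-1, 1), 0),
]


def pass_fail_reward(
    candidate: Callable[[int, int], int],
    test_cases: List[Tuple[Tuple[int, int], int]],
) -> float:
    """Return 1.0 if the candidate passes every test, otherwise 0.0."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer counts as a failure
        except Exception:
            return 0.0  # a crash also counts as a failure
    return 1.0


# Two stand-in candidates: one correct, one with a bug.
def correct_add(a: int, b: int) -> int:
    return a + b


def buggy_add(a: int, b: int) -> int:
    return a - b


# No human grader is involved: the tests alone decide the score.
rewards = [pass_fail_reward(f, TEST_CASES) for f in (correct_add, buggy_add)]
print(rewards)
```

Because the score comes entirely from the tests, the same check can be repeated as many times as needed without stopping for human input. Nothing comparable exists out of the box for judging whether an email is well written.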

There is no easy way to validate a well-written email or a good chatbot response; these skills are inherently subjective and harder to measure at scale. But not every task falls neatly into the "easy to test" or "hard to test" category. We do not have an out-of-the-box testing kit for quarterly financial reports or actuarial science, but a well-capitalized accounting startup could probably build one from scratch. Some testing kits will work better than others, of course, and some companies will be smarter about how they approach the problem. But the testability of the underlying process will be the deciding factor in whether that process can be turned into a functional product instead of just an exciting demo.

Some processes turn out to be more testable than you might think. If you had asked me last week, I would have put AI-generated video in the "hard to test" category, but OpenAI's new Sora 2 model shows that it may not be as hard as it seems. In Sora 2, objects no longer appear and disappear out of nowhere. Faces hold their shape, looking like a specific person rather than just a collection of features. Sora 2 footage respects the laws of physics in both obvious and subtle ways. I suspect that if you looked behind the curtain, you would find a robust reinforcement learning system for each of these qualities. Put together, they make the difference between photorealism and an entertaining hallucination.

To be clear, this is not a hard and fast rule of AI. It is a result of the central role reinforcement learning plays in AI development, which could easily change as models develop. But as long as RL is the primary tool for bringing AI products to market, the reinforcement gap will only grow, with serious implications for both startups and the economy at large. If a process ends up on the right side of the reinforcement gap, startups will probably succeed in automating it, and anyone doing that work now may end up looking for a new career. The question of which healthcare services are RL-trainable, for example, has enormous implications for the shape of the economy over the next twenty years. And if surprises like Sora 2 are any indication, we may not have to wait long for an answer.


