For years, the executive directors of the adult technology described visions Artificial intelligence agents Software applications can be used independently to complete the tasks of people. But take AI Consumer agents today to revolve, whether it is Openai’s Chatgpt agent Or confusion GuiltyAnd you will quickly realize how limited technology. Making artificial intelligence agents more powerful may take a new set of technologies that the industry still discovers.
One of these technologies is carefully simulating work spaces where agents can be trained at multi-step tasks known as reinforcement learning environments (RL). Similar to how to run data collections called the last wave of artificial intelligence, RL environments began to look like a decisive element in delegation of agents.
Artificial intelligence researchers, founders, and investors tell that Techcrunch are now asking for more RL environments, and there is no shortage of startups in the hope of providing them.
“All large artificial intelligence laboratories build RL at home,” Jennifer Lee, General Press at Andressen Horowitz, said in an interview with Techcrunch. “But as you can imagine, creating these data collections is very complicated, so artificial intelligence laboratories also look at the external sellers that can create high -quality environments and assessments. Everyone looks at this space.”
The RL batch assigned a new class of startups that finance it well, such as Meanyize and Prime Intellect, which aims to lead this space. Meanwhile, large data brand companies such as MERCOR and Truge say they invest more in RL environments to keep pace with industry shifts from fixed data groups to interactive simulations. The major investment laboratories are also thinking great One billion dollars on RL environments Next year.
Hope for investors and founders is that one of these startups appears as a “artificial intelligence measure of environments”, referring to 29 billion dollars in data That was working as a chatbot era.
The question is whether the RL environments will really pay the limits of the progress of artificial intelligence.
TECHRUNCH event
San Francisco
|
27-29 October, 2025
What is the RL environment?
In essence, RL environments are training training reasons for simulating what artificial intelligence agent will do in applying a real program. One founder described its construction in A recent interview “Like creating a very boring video game.”
For example, the environment can simulate the Chrome browser and the mission of the artificial intelligence agent with the purchase of a pair of sinks on Amazon. The agent was ranked on his performance and sent a reward signal when he succeeds (in this case, buy a pair of worthy socks).
Although this task seems relatively simple, there are many places where artificial intelligence agent can stumble. You may be lost in moving in lists below the web page, or buying a lot of socks. Since developers cannot predict exactly the wrong shift that an agent will take, the environment itself must be strong enough to capture any unexpected behavior, and still provide useful comments. This makes building environments more complicated than a fixed data collection.
Some environments are completely complicated, allowing artificial intelligence agents to use tools, access the Internet or use different software applications to complete a specific task. Others are more narrow, aiming to help the agent learn specific tasks in the applications of the institution’s programs.
Although the RL environments are the hot thing in the silicon valley at the present time, there is a lot of precedents to use this technique. One of Openai’s first projects in 2016 was building “RL gyms“Which was quite similar to the modern concept of environments. In the same year, Google DeepMind’s alpha The artificial intelligence system beat the world champion in the tablet game, go. He also used RL techniques inside a simulator.
What is unique in today’s environments is that researchers are trying to build artificial intelligence agents who use the computer with large transformer models. Unlike alphago, which was a specialist AI system working in closed environments, artificial intelligence agents are trained today at more general capabilities. Today’s artificial intelligence researchers have a stronger starting point, but also a complex goal where more can make more.
Crowded
Artificial intelligence data classification companies such as Scale Ai, Truplge and Mercor are trying to meet this moment and build RL environments. These companies have more resources than many startups in space, as well as deep relationships with artificial intelligence laboratories.
Edwin Chen, CEO of Durg, tells, Techcrunch, has recently seen a “significant increase” in demand for RL environments within AI laboratories. The mutation – that was created $ 1.2 billion in revenue He said last year to work with AI Labs, such as Openai, Google, Noteropic and Meta – recently installed a new internal institution, specifically expensive to build RL environments.
CLOSE Behind Durg is Mercor, a $ 10 billion emerging company, which also worked with Openai, Meta and Anthropic. Mercor prepares investors in its business Building RL environments For the tasks specified for the field such as coding, health care and law, according to the marketing materials that Techcrunch sees.
“Few understand the opportunity about RL environments really”.
Amnesty International scale used to control data description space, but has lost Earth since Meta I invested 14 billion dollars She rented the CEO. Since then, Google and Openai Decline Expand the scope of artificial intelligence as a data provider, and you face the start of the competition for the work of signs on the data Inside the dead. But still, Scale tries to meet the moment and build environments.
“This is just the nature of business (Scale AI),” said Citan Ran, head of AI’s producer for agents and RL environments. “The scale has proven its ability to adapt quickly. We did it in the early days of independent vehicles, which is the unit of our first business. When ChatGPT came out, expand the AI range with that. Now, again, we adapt to new border spaces like agents and environments.”
Some new players focus exclusively on environments from the start. Among them is a mechanic, a startup was established almost six months ago with the bold goal of “automating all jobs”. However, co -founder Matthew Barent tells that his company begins with RL environments for artificial intelligence coding agents.
Mechanics aims to provide artificial intelligence laboratories with a small number of powerful RL environments, says Barnett, instead of large data companies that create a wide range of simple RL environments. To this point, the startup of software engineers offers $ 500,000 salaries To build RL environments – much higher than an hourly contractor can earn a AI or increase work.
A mechanic has already worked with Antarbur on RL environments, two sources knowing this Techcrunch issue. The mechanism and the Anthropor refused to comment on the partnership.
Other startups are betting that RL environments will be influential outside the artificial intelligence laboratories. Prime Intellect targets – a start -up company supported by Ai Andrej Karpathy, Founders Fund and Menlo Ventures – smaller developers with his RL environments.
Last month, I launched Prime IntelleCT RL environment center, Which aims to be “embracing RL environments.” The idea is to give open source developers to access the same resources that large artificial intelligence laboratories possess, and the sale of these developers to reach the arithmetic resources in this process.
Training can be generally capable of RL environments more expensive than previous training techniques on artificial intelligence, according to Prime Intellect Will Brown. Besides startups, building RL environments, there is another opportunity for GPU service providers that can run the process.
“The RL environments will be very large so that no single company will dominate,” Brown said in an interview. “Part of what we do is just trying to build a good open -source infrastructure around it. The service we sell is an account, so it is a comfortable shock to use graphics processing units, but we think about it in the long run.”
Will it expand?
The open question about RL environments is whether this technique will work like previous training methods on artificial intelligence.
Learning to reinforce some of the largest jumps in artificial intelligence over the past year, including models such as Openai’s O1 And the anthropologist Claude Obus 4. These are especially important breakthroughs because the methods previously used to improve artificial intelligence models are now Decreased returns appear.
Environments are part of the AI Labs’s larger bet on RL, which many believe will continue to make progress because they add more data and arithmetic resources to this process. Some researchers who were behind the O1 Techcrunch previously told the company that the company originally invested in the thinking models of artificial intelligence-which was created through investments in RL and Test-Time-Comput They thought he would expand Cute.
The best way to expand the RL range is still unclear, but the environments seem to be a promising competitor. Instead of just a Chatbots reward for text responses, they allow agents to work in simulations with tools and computers available to them. This is more intense in resources, but it is likely to be more feasible.
Some are skeptical that all these RL environments will come out. Ross Taylor, a former Amnesty International research with Mita, which has been established in its establishment, tells Techcrunch that RL environments are vulnerable to piracy rewards. This is a process in which artificial intelligence models are cheated in order to obtain a reward, without really doing the task.
Taylor said: “I think people are less difficult to expand environments,” Taylor said. “Even the best (RL environments) available to the public does not usually work without dangerous modification.”
Sherwin Woo, head of Openai’s engineering at API, said in A The last podcast It was “short” on emerging companies RL. Wu noted it is a very competitive space, but also that artificial intelligence research is rapidly developing so that it is difficult to present Amnesty International laboratories well.
Karpathy, an investor in Prime Intellect who described RL environments as a possible penetration, has also expressed caution of RL on a wider scale. in After xRaising concerns about the amount of artificial intelligence that can be pressed from RL.
“I am up in environments and reactions, but I am appropriate for learning to reinforcement specifically,” said Carbashi.
Complementing: A previous version of this article is indicated by automated work mechanics. It was updated to reflect the official name of the company.
https://techcrunch.com/wp-content/uploads/2025/02/GettyImages-1356382582.jpg?resize=1200,800
Source link