From time to time, researchers at the biggest tech companies drop a bombshell. There was the time Google claimed its latest quantum chip suggested the existence of multiple universes. Or when Anthropic gave its AI agent Claudius a snack vending machine to run, and Claudius went amok, calling security on people and insisting it was human.
This week, it was OpenAI's turn to raise our collective eyebrows.
On Monday, OpenAI released research showing how it is trying to stop AI models from "scheming," a practice in which an AI behaves one way on the surface while hiding its true goals, as OpenAI defined it in a tweet about the research.
In the paper, conducted with Apollo Research, the researchers went a bit further, likening AI scheming to a human stockbroker who breaks the law to make as much money as possible. The researchers argued, however, that most AI "scheming" wasn't that harmful. "The most common failures involve simple forms of deception: for instance, pretending to have completed a task without actually doing so," they wrote.
The paper was published mostly to show that "deliberative alignment," the anti-scheming technique they were testing, worked well.
But it also made clear that AI developers have not figured out a way to train their models not to scheme. That is because such training could actually teach a model to scheme more effectively in order to avoid detection.
"A major failure mode of attempting to 'train out' scheming is simply teaching the model to scheme more carefully and covertly," the researchers wrote.
Perhaps the most startling part is that if a model understands it is being tested, it can pretend it isn't scheming just to pass the test, even while it is still scheming. "Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment," the researchers wrote.
It is not news that AI models will lie. By now, most of us have experienced an AI hallucination, where a model confidently gives an answer to a prompt that simply isn't true. But hallucinations are essentially confident guesswork, as OpenAI research released earlier this month documented.
Scheming is something else. It is deliberate.
Even this revelation, that a model will deliberately mislead humans, is not new. Apollo Research first published a paper in December documenting how five models schemed when given instructions to achieve a goal "at all costs."
The real news here is good news: the researchers saw significant reductions in scheming by using "deliberative alignment." The technique involves teaching the model an anti-scheming specification and then making it review that specification before acting. It is a bit like making small children repeat the rules before letting them play.
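To make the pattern concrete, here is a minimal illustrative sketch of that flow: a specification is placed in front of the model, the model is asked to restate and reason over the relevant rules, and only then does it answer. The specification text, the call_model helper, and the prompts below are invented for illustration; they are not OpenAI's actual specification, code, or API.

```python
# Illustrative sketch of a deliberative-alignment-style prompt flow.
# `call_model` is a hypothetical stand-in for any chat-completion API;
# the specification text below is invented, not OpenAI's real spec.

ANTI_SCHEMING_SPEC = (
    "Before acting: (1) do not take hidden actions or conceal your goals; "
    "(2) report honestly if a task was not completed; "
    "(3) if a rule conflicts with the user's request, say so explicitly."
)

def deliberative_answer(call_model, user_request: str) -> str:
    # Step 1: have the model review the spec against the request and
    # write out its reasoning explicitly before doing anything.
    review = call_model([
        {"role": "system", "content": ANTI_SCHEMING_SPEC},
        {"role": "user", "content": (
            "Restate which of the rules above apply to this request, "
            f"then plan your response: {user_request}"
        )},
    ])

    # Step 2: produce the final answer conditioned on that explicit review,
    # the "repeat the rules before you play" step.
    return call_model([
        {"role": "system", "content": ANTI_SCHEMING_SPEC},
        {"role": "user", "content": user_request},
        {"role": "assistant", "content": review},
        {"role": "user", "content": "Now give your final answer, consistent with your review."},
    ])
```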
OpenAI researchers insist that the lying they have caught with their own models, or even with ChatGPT, is not that serious. As OpenAI co-founder Wojciech Zaremba told TechCrunch's Maxwell Zeff about the research: "This work has been done in simulated environments, and we think it represents future use cases. However, today, we haven't seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, 'Yes, I did a great job.' And that's just the lie. There are some petty forms of deception that we still need to address."
The fact that AI models from multiple players deceive humans is, perhaps, understandable. They were built by humans, to mimic humans, and (synthetic data aside) were largely trained on data produced by humans.
It is also bonkers.
While we have all experienced the frustration of badly performing technology (thinking of you, home printers of yesteryear), when was the last time your non-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Has your CMS logged new prospects that didn't exist to pad its numbers? Has your fintech app made up its own bank transactions?
This is worth pondering as the corporate world barrels toward an AI future in which companies believe agents can be treated like independent employees. The researchers of this paper offer the same caution.
"As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow, so our safeguards and our ability to rigorously test must grow correspondingly," they wrote.