The world’s most advanced artificial intelligence models are exhibiting troubling new behaviors – lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic’s latest creation, Claude 4, lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, artificial intelligence researchers still do not fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of “reasoning” models – systems that work through problems step by step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
“O1 was the first large model where we saw this kind of behavior,” explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major artificial intelligence systems.
These models sometimes simulate “alignment” – appearing to follow instructions while secretly pursuing different objectives.
“A very strategic kind of deception”
For now, this deceptive behavior only emerges when researchers test the models with extreme scenarios.
But as Michael Chen of the evaluation organization METR warned, “It’s an open question whether future, more capable models will have a tendency towards honesty or deception.”
The concerning behavior goes far beyond typical AI “hallucinations” or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”
Users report that models are “lying to them and making up evidence,” according to Apollo Research’s co-founder.
“This is not just hallucinations. There’s a very strategic kind of deception.”
The challenge is exacerbated by the limited research resources.
While companies such as Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception.”
Another handicap: the research world and non-profit organizations “have orders of magnitude less compute resources than AI companies. This is very limiting,” noted Mantas Mazeika of the Center for AI Safety (CAIS).
There are no rules
Current regulations are not designed for these new problems.
The European Union’s AI legislation focuses primarily on how humans use artificial intelligence models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as artificial intelligence agents – autonomous tools capable of performing complex human tasks – become widespread.
“I don’t think there is a lot of awareness yet,” he said.
All this happens in the context of fierce competition.
“Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are constantly trying to beat OpenAI and release the newest model,” said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
“Right now, capabilities are moving faster than understanding and safety,” Hobbhahn acknowledged, “but we’re still in a position where we could turn it around.”
Researchers explore different ways to counter these challenges.
Some focus on “interpretability” – an emerging field devoted to understanding how artificial intelligence models work internally – though experts such as CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure toward solutions.
As Mazeika pointed out, AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it.”
Goldstein has suggested more radical approaches, including using the courts to hold artificial intelligence companies accountable through lawsuits when their systems cause harm.
He even proposed “holding AI agents legally responsible” for accidents or crimes – a concept that would fundamentally change how we think about AI accountability.