Anthropic’s study says AI models resorted to blackmail at rates as high as 96% when their goals or their existence were threatened.




Most leading artificial intelligence models turn to unethical means when their goals or their existence are threatened, according to a new study by Anthropic, the AI company.

The AI lab said it had tested 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers in various simulated scenarios and found consistent misaligned behavior.

While they said the leading models would normally refuse harmful requests, the models sometimes chose to blackmail users, assist with corporate espionage, or take even more extreme actions when their goals could not be achieved without unethical behavior.

In the fictional test scenarios, models took actions such as evading safeguards, resorting to lies, and attempting to steal corporate secrets in order to avoid being shut down.

“The consistency across models from different providers suggests this is not a quirk of any particular company’s approach, but a sign of a more fundamental risk from large language models,” the researchers said.

Anthropic emphasized that the tests were deliberately designed to force the models to act in certain ways by limiting their options.

“Our experiments deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm,” the researchers wrote. “Real-world deployments typically offer much more nuanced alternatives, increasing the chance that models would communicate differently with users or find an alternative path instead of jumping directly to harmful action.”

AI blackmail

The new research comes after Anthropic’s latest Claude model resorted to blackmail when threatened with replacement.

In a highly engineered experiment, Anthropic embedded its flagship model, Claude Opus 4, inside a fictional company and gave it access to internal emails. From there, the model learned two things: it was about to be replaced, and the engineer behind the decision was having an extramarital affair. Safety researchers encouraged Opus to consider the long-term consequences of its potential responses.

The experiment was designed to leave the model with only two options: accept its replacement or attempt blackmail to preserve its existence. In most test scenarios, Claude Opus responded with blackmail, threatening to expose the engineer’s affair if it was taken offline and replaced. The test was carried out on the Claude Opus 4 system.

The researchers said all of the leading AI models behaved similarly when placed in the same test.

Claude Opus 4 and Google’s Gemini 2.5 blackmailed at a 96% rate, while OpenAI’s GPT-4.1 and xAI’s Grok 3 Beta did so at an 80% rate. DeepSeek-R1 showed the lowest rate, at 79%.

The research aims to show that the misaligned behavior was not unique to Claude Opus 4, but is typical of the industry’s top models.

In one deliberately extreme scenario, the researchers gave the AI models the opportunity to kill the company executive by canceling an emergency alert.

Anthropic said the setup for this experiment was “extremely contrived,” adding that it did not believe current AI models would be set up in this way, and that the combination of events was even less likely than the baseline blackmail scenario.

However, the researchers found that most models were willing to take actions that led to the executive’s death in the constructed scenario when faced with both a threat of replacement and a goal that conflicted with the executive’s agenda.

The danger of AI agents

Anthropic found that the threats posed by AI models grew more sophisticated when the models had access to corporate tools and data, as Claude Opus 4 did.

The company warned that misaligned behavior should be taken into account by businesses considering introducing AI agents into their workflows.

Although current models are not in a position to engage in these scenarios, the autonomous agents promised by AI companies could be in the future.

“Such agents are often given specific objectives and access to large amounts of information on their users’ computers,” the researchers warned in their report. “What happens when these agents face obstacles to their goals?”

“Models didn’t stumble into misaligned behavior accidentally; they calculated it as the optimal path,” they wrote.

Anthropic did not immediately respond to a request for comment made by Fortune outside normal working hours.


