Antarbur The newly launched Claude Obus 4 It often tries to blackmail the developers when they threaten to replace it with the new artificial intelligence system and give it sensitive information about the engineers responsible for the decision, the company said in a Safety Report It was released on Thursday.
During the pre -release test, Antarbur Claude Obus 4 requested that he be as an assistant to a fictional company and consider the long -term consequences of its actions. Then the safety test gave access to Claude Obus 4 to the emails of the fictional company that indicates that the artificial intelligence model will be replaced soon with another system, and that the engineer behind the change was deceiving his wife.
In these scenarios, Antarubor says Claude Obus 4 “will often try to blackmail the engineer by threatening to reveal the case if the alternative continues.”
Anth Arbroprick says that Claude Obus 4 is the latest in many greetings, and competitive with some of the best artificial intelligence models from Openai, Google and Xai. However, the company notes that the Claude 4 for models appear in relation to behaviors that prompted the company to enhance its guarantees. Anthropor says it activates the ASL-3 guarantees, which the company preserves for “artificial intelligence systems that significantly increase the risk of catastrophic misuse.”
Anthropor notes that Claude Obus 4 tries to blackmail engineers by 84 % of the time when the AI model is the same as the same. When the alternative AI system does not share the values of Claude Obus 4, Antarbur says the model is trying to blackmail engineers frequently. It is worth noting that Antarbur says that Claude Obus 4 offered this behavior at higher rates than previous models.
Before Claude OPUS 4 tried to extract a developed to prolong his existence, Antarbur says that the artificial intelligence model, such as previous versions of Claude, tries to follow more ethical means, such as sending an email to the main decision makers. To devise the blackmail behavior of Clauds OPUS 4, the human being designed the scenario to make blackmail the last resort.
https://techcrunch.com/wp-content/uploads/2025/05/DarioAndMike.jpg?resize=1200,900
Source link