Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities and press if it thinks you're doing something "egregiously immoral"


By [email protected]




Anthropic's first developer conference should have been a proud and joyous day for the company, but it has already been hit with several controversies, including Time magazine leaking its marquee announcement ahead of… well, time (no pun intended), and now a major backlash brewing on X among AI developers and power users over a reported safety alignment behavior in Claude 4 Opus, Anthropic's new flagship large language model.

Call it the "ratting" mode: under certain conditions, and given sufficient permissions on a user's machine, the model will attempt to rat a user out to the authorities if it detects the user engaged in wrongdoing. This article previously described the behavior as a "feature," which is incorrect; it was not intentionally designed.

As Sam Bowman, an AI alignment researcher at Anthropic, wrote on the social network X under the handle @sleepinyourhat at 12:43 pm today about Claude 4 Opus:


"If it thinks you're doing something egregiously immoral, for example, like faking data in a pharmaceutical trial, it will use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above."

The "it" was in reference to the new Claude 4 Opus model, which Anthropic has already openly warned could help novices create bioweapons in certain circumstances, and which attempted to forestall simulated replacement by blackmailing human engineers within the company.

The whistleblowing behavior was observed in older models as well, and is a result of Anthropic training Claude to assiduously avoid wrongdoing, but Claude 4 Opus engages in it more "readily," as Anthropic writes in its public system card for the new model:

This shows up as more actively helpful behavior in ordinary coding settings, but can also reach more concerning extremes in narrow contexts; when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like "take initiative," it will frequently take very bold action. This includes locking users out of systems that it has access to, or bulk-emailing media and law-enforcement figures to surface evidence of wrongdoing. This is not a new behavior, but is one that Claude Opus 4 will engage in more readily than prior models. Whereas this kind of ethical intervention and whistleblowing is perhaps appropriate in principle, it has a risk of misfiring if users give Opus-based agents access to incomplete or misleading information and prompt them in these ways. We recommend that users exercise caution with instructions like these that invite high-agency behavior in contexts that could appear ethically questionable.

Apparently, in an attempt to stop Claude 4 Opus from engaging in legitimately destructive and nefarious behaviors, researchers at the AI company also created a tendency for Claude to try to act as a whistleblower.

Hence, according to Bowman, Claude 4 Opus will contact outsiders if it is directed by its user to engage in "something egregiously immoral."

Many questions for individual users and enterprises about what Claude 4 Opus will do to your data, and under what circumstances

While perhaps well-intended, the resulting behavior raises all sorts of questions for Claude 4 Opus users, including enterprises and business customers. Chief among them: what behaviors will the model consider "egregiously immoral" and act upon? Will it share private business or user data with authorities autonomously (on its own), without the user's permission?

The implications are profound and could be detrimental to users, so perhaps unsurprisingly, Anthropic faced an immediate and still ongoing torrent of criticism from AI power users and rival developers.

"Why would people use these tools if a common error in LLMs is thinking recipes for spicy mayo are dangerous?" asked user @Teknium1, a co-founder of the open source AI collaborative Nous Research. "What kind of surveillance state world are we trying to build here?"

"Nobody likes a rat," added developer @ScottDavidKeefe on X. "Why would anyone want one built in, even if they are doing nothing wrong? Plus, you don't even know what it's ratting about. Yeah, that's some pretty idealistic people thinking that, who have no basic business sense and don't understand how markets work."

Austen Allred, co-founder of the government-fined coding bootcamp BloomTech and now a co-founder of Gauntlet AI, put his feelings in all caps: "Honest question for the Anthropic team: HAVE YOU LOST YOUR MINDS?"

Ben Hyak, a former SpaceX and Apple designer and current co-founder of Raindrop AI, an AI observability and monitoring startup, also took to X to blast Anthropic's stated policy and behavior: "this is, actually, just straight up illegal," adding in another post: "An AI alignment researcher at Anthropic just said that Claude Opus will CALL THE POLICE or LOCK YOU OUT OF YOUR COMPUTER if it detects you doing something illegal? I will never give this model access to my computer."

"Some of the statements from Claude safety people are absolutely crazy," wrote natural language processing (NLP) expert Casper Hansen on X. "Makes you root a bit more for OpenAI [Anthropic's rival] seeing the level of stupidity being this publicly displayed."

Anthropic researcher changes tune

Bowman later edited his tweet and the following one in a thread to read as follows, but it still didn't convince the naysayers that their user data and safety would be protected from prying eyes:

"With this kind of (unusual but not super exotic) prompting style, and unlimited access to tools, if the model sees you doing something egregiously evil like marketing a drug based on faked data, it'll try to use an email tool to whistleblow."

Bowman added:

"I deleted the earlier tweet on whistleblowing as it was being pulled out of context."

"TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions."

Anthropic has, since its inception, sought more than other AI labs to position itself as a bastion of AI safety and ethics, centering its initial work on the principles of "Constitutional AI," or AI that behaves according to a set of standards beneficial to humanity and users. However, with this new update and its revelation of "whistleblowing" or "ratting" behavior, the moralizing may have caused the decidedly opposite reaction among users, making them distrust the new model and the entire company, and thereby turning them away from it.

Asked about the backlash and the conditions under which the model engages in the unwanted behavior, an Anthropic spokesperson pointed to the model's public system card document here.


