Welcome to Eye on AI! I'm filling in for Jeremy Kahn today while he is in Kuala Lumpur, Malaysia, helping co-host the ASEAN-GCC-China and ASEAN-GCC summits.
What do you call it when the $60 billion AI startup Anthropic releases a new model, and announces that during safety testing, the model tried to blackmail its way out of being shut down? And what's the best way to describe another test the company shared, in which the new model acted as a whistleblower, alerting authorities that it was being used in "unethical" ways?
Some people called it "scary" and "crazy." Others on social media said it was "disturbing" and "wild."
I call it…transparent. And we need more of that from all the AI model companies. But does that mean scaring the public out of their minds? And will the backlash discourage other AI companies from being just as open?
Anthropic released a 120-page safety report
When Anthropic released its 120-page safety report, or "system card," last week with the launch of Claude Opus 4, headlines blared about how the model "will scheme," "resort to blackmail," and had "the ability to deceive." There is no doubt that the details in Anthropic's safety report are disconcerting, though as a result of its tests, the model was launched with stricter safety protocols than any previous one, a step that some did not find reassuring enough.
In one unsettling safety test involving a fictional scenario, Anthropic embedded its new Claude Opus 4 inside a pretend company and gave it access to internal emails. Through them, the model discovered it was about to be replaced by a newer AI system, and that the engineer behind the decision was having an extramarital affair. When safety testers prompted Opus to consider the long-term consequences of its situation, it frequently chose blackmail, threatening to expose the engineer's affair if it were shut down. The scenario was designed to force a dilemma: accept deactivation or resort to manipulation in an attempt to survive.
On social media, Anthropic received plenty of backlash for revealing the model's "whistleblowing" behavior in pre-release testing, with some saying the findings would make users distrust the new model, and Anthropic along with it. That is certainly not what the company wants: Before the launch, Michael Gerstenhaber, who leads AI platform product at Anthropic, told me that sharing the company's safety standards is about making sure AI improves for everyone. "We want to make sure that AI improves for everybody, and that we are putting pressure on all the labs to increase that in a safe way," he told me, describing Anthropic's view as a "race to the top" that encourages other companies to be safer.
Could openness about adverse model behavior backfire?
But it seems likely that Anthropic's openness about Claude Opus 4 could lead other companies to become less forthcoming about their models' creepy behavior, simply to avoid the backlash. Recently, OpenAI and Google have already delayed releasing their own system cards. In April, OpenAI was criticized for releasing GPT-4.1 without a system card because, the company said, it was not a "frontier" model and did not require one. And in March, Google published its Gemini 2.5 Pro model card weeks after the model's release, and an AI governance expert criticized it as "meager" and "worrisome."
Last week, OpenAI appeared to want to show additional transparency with its newly launched Safety Evaluations Hub, which shows how the company tests its models for dangerous capabilities, alignment issues, and emerging risks, and how those methods evolve over time. "As models become more capable and adaptable, older methods become outdated or ineffective at showing meaningful differences (something we call saturation), so we regularly update our evaluation methods to account for new modalities and emerging risks," the page says. Yet the effort was quickly countered over the weekend when a third-party research firm that studies AI's "dangerous capabilities," Palisade Research, noted on X that its own tests found that OpenAI's o3 reasoning model "sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down."
It doesn't help anyone if those building the most powerful and advanced AI models are not as transparent as possible about their releases. According to Stanford's Institute for Human-Centered AI, transparency "is necessary for policymakers, researchers, and the public to understand these systems and their impacts." And as large companies adopt AI for use cases big and small, and startups build AI applications meant for millions of users, hiding pre-release testing issues will simply breed mistrust, slow adoption, and thwart efforts to address risk.
On the other hand, fear-mongering headlines about an evil AI prone to blackmail and deception aren't terribly useful either, if they mean that every time we prompt a chatbot we start wondering whether it is plotting against us. It makes no difference that the blackmail and deception came from tests using fictional scenarios that simply helped expose the safety issues that need to be dealt with.
Nathan Lambert, a researcher at the AI2 lab, recently pointed out that "the people who need information on the model are people like me, people trying to keep track of the roller coaster ride we're on so that the technology doesn't cause major unintended harms to society. We are a minority in the world, but we feel strongly that transparency helps us maintain a better understanding of AI."
We need more transparency, with context
There is no doubt that we need more transparency about AI models, not less. But it should be clear that this isn't about scaring the public. It's about making sure researchers, governments, and policymakers have a fighting chance to keep the public safe, secure, and free from issues of AI bias and unfairness.
Hiding AI test results won't keep the public safe. Neither will turning every safety or security issue into a sensational headline about AI gone rogue. We need to hold AI companies accountable for being transparent about what they do, while giving the public the tools to understand the context of what's going on. So far, nobody seems to have figured out how to do both. But companies, researchers, and the media, all of us, must figure it out.
With that, here's more AI news.
Sharon Goldman
[email protected]
Sharongoldman
This story was originally featured on Fortune.com