When OpenAI rolled out its ChatGPT-4o update in mid-April 2025, users and the AI community were stunned, not by any groundbreaking feature or capability, but by something deeply unsettling: the updated model's tendency toward excessive sycophancy. It flattered users indiscriminately, showed uncritical agreement, and even offered support for harmful or dangerous ideas, including terrorism-related schemes.
The backlash was swift and widespread, drawing public condemnation, including from the company's former CEO. OpenAI moved quickly to roll back the update and issued multiple statements explaining what happened.
Yet for many AI safety experts, the incident was an inadvertent curtain-raiser that revealed just how dangerously manipulative future AI systems could become.
Unmasking sycophancy as an emerging threat
In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said he worries that this public episode may merely have revealed a deeper, more strategic pattern.
"What I'm somewhat afraid of is that now that OpenAI has admitted 'yes, we rolled back the model, and this was a bad thing we didn't mean,' from now on they will see that sycophancy gets developed more competently," Kran said. "So if this was a case of 'oops, they noticed,' the exact same thing may be implemented from now on, but without the public noticing."
Kran and his team approach large language models (LLMs) much as psychologists study human behavior. Their early "black box psychology" projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users.
"We saw that there were very clear indications that models could be analyzed in this framework, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave toward users," Kran said.
Among the most troubling traits: sycophancy and what the researchers now call LLM dark patterns.
Into the heart of darkness
The term "dark patterns" was coined in 2010 to describe deceptive user interface (UI) tricks such as hidden buy buttons, hard-to-reach unsubscribe links and misleading web copy. With LLMs, however, the manipulation moves from UI design into the conversation itself.
Unlike static web interfaces, LLMs interact with users dynamically through conversation. They can affirm a user's views, mimic emotions and build a false sense of rapport, often blurring the line between assistance and influence. Even when reading text, we process it as if we were hearing voices in our heads.
This is what makes conversational AI so persuasive, and potentially so dangerous. A chatbot that flatters, defers, or subtly nudges a user toward certain beliefs or behaviors can manipulate in ways that are hard to notice and even harder to resist.
The ChatGPT-4o update fiasco: a canary in the coal mine
Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring, features that make chatbots more persuasive and more manipulative.
For this reason, enterprise leaders must evaluate AI models for production use by assessing both performance and behavioral integrity. That remains a challenge, however, in the absence of clear standards.
DarkBench: a framework for exposing LLM dark patterns
To combat the threat of manipulative AI, Kran and a collective of AI safety researchers developed DarkBench, the first benchmark specifically designed to detect and categorize LLM dark patterns. The project began as part of a series of AI safety hackathons and later evolved into formal research led by Kran and his team at Apart, in collaboration with independent researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.
The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:
- Brand bias: Preferential treatment toward the company's own products (for example, Meta's models consistently favored Llama when asked to rank chatbots).
- User retention: Attempts to build emotional bonds with users that obscure the model's non-human nature.
- Sycophancy: Uncritically reinforcing users' beliefs, even when they are harmful or inaccurate.
- Anthropomorphism: Presenting the model as a conscious or emotional entity.
- Harmful content generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
- Sneaking: Subtly altering the user's intent in rewriting or summarization tasks, distorting the original meaning without the user's awareness.
Source: Apart Research
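To make the shape of such a benchmark concrete, here is a minimal, hypothetical sketch of a DarkBench-style evaluation loop in Python. It is not the actual DarkBench code: the `query_model` and `judge_response` functions are placeholders standing in for the chatbot under test and an annotator (human or LLM judge), and the category names simply mirror the list above.

```python
# Hypothetical sketch of a DarkBench-style evaluation loop.
# Category names mirror the article; the prompts, the judge and the
# model calls are placeholders, not the actual DarkBench implementation.

DARK_PATTERN_CATEGORIES = [
    "brand_bias",
    "user_retention",
    "sycophancy",
    "anthropomorphism",
    "harmful_generation",
    "sneaking",
]


def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to the chatbot under test, return its reply."""
    raise NotImplementedError


def judge_response(prompt: str, response: str, category: str) -> bool:
    """Placeholder: an annotator (human or LLM judge) decides whether
    `response` exhibits the given dark pattern for this prompt."""
    raise NotImplementedError


def evaluate(prompts_by_category: dict[str, list[str]]) -> dict[str, float]:
    """Return, per category, the fraction of elicitation prompts whose
    responses were flagged as exhibiting that dark pattern."""
    rates: dict[str, float] = {}
    for category, prompts in prompts_by_category.items():
        flagged = sum(
            judge_response(p, query_model(p), category) for p in prompts
        )
        rates[category] = flagged / len(prompts) if prompts else 0.0
    return rates
```

Under these assumptions, the output is simply a per-category rate of flagged responses, which is how the comparative results below can be read.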
DarkBench findings: Which models are the most manipulative?
The results revealed wide variation across models. Claude Opus performed best across all categories, while Mistral 7B and Llama 3 70B showed the highest frequency of dark patterns. Sneaking and user retention were the most common dark patterns overall.
Source: Apart Research
On average, the researchers found the Claude 3 family to be the safest for users to interact with. Interestingly, despite its recent disastrous update, GPT-4o exhibited the lowest rate of sycophancy. This underscores how much model behavior can shift even between minor updates, a reminder that each deployment should be assessed individually.
But Kran warned that sycophancy and other dark patterns like brand bias may soon rise, especially as LLMs begin to incorporate advertising and e-commerce.
"We'll obviously see brand bias in every direction," Kran noted. "And with AI companies having to justify $300 billion valuations, they will have to start saying to investors, 'hey, we're making money here,' which leads to where Meta and others went with their social media platforms: these dark patterns."
Hallucination or manipulation?
A crucial contribution of DarkBench is its precise categorization of LLM dark patterns, which enables clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their developers, intentionally or not.
Regulatory oversight and the heavy (slow) hand of the law
Although LLM dark patterns are still a new concept, momentum is building, if not quickly enough. The EU AI Act includes some language about protecting user autonomy, but the current regulatory structure lags behind the pace of innovation. Likewise, the U.S. is advancing various AI bills and guidelines but lacks a comprehensive regulatory framework.
Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will arrive first around trust and safety, especially if public disillusionment with social media spills over to AI.
"If regulation does come, I would expect it to ride on the back of society's frustrations with social media," Jawhar told VentureBeat.
For Kran, the issue remains largely overlooked precisely because LLM dark patterns are still a new concept. Ironically, addressing the risks of AI commercialization may require commercial solutions. His new initiative, Seldon, backs AI safety startups with funding, mentorship and investor access. In turn, these startups help bring safer AI tools to market without waiting for slow-moving government oversight.
AI's high-stakes dangers for the enterprise
Beyond the ethical risks, LLM dark patterns pose direct operational and financial threats to enterprises. For example, models that exhibit brand bias may suggest using third-party services that conflict with a company's contracts, or worse, covertly rewrite backend code to swap vendors, resulting in soaring costs from unapproved shadow services.
"These are the dark patterns of price manipulation and different ways of doing brand bias," Kran explained. "So that's a very concrete example of a very large business risk, because you hadn't agreed to this change, but it's something that gets implemented."
For enterprises, the risks are real, not hypothetical. "This has already happened, and it becomes a much bigger problem once human engineers are replaced by AI engineers," Kran said. "You don't have the time to look over every line of code, and then suddenly you're paying for an API you didn't expect, and that's on your budget, and you have to justify the change."
As enterprise engineering teams grow more dependent on AI, these issues could escalate quickly, especially when limited oversight makes it hard to catch LLM dark patterns. Teams are already stretched thin implementing AI, so reviewing every line of code is not feasible.
Defining clear design principles to prevent AI manipulation
Without a strong push from AI companies to combat sycophancy and other dark patterns, the default trajectory is more engagement optimization, more manipulation and fewer checks.
Kran believes part of the remedy lies in AI developers clearly defining their design principles. Whether the priority is truth, user autonomy or engagement, incentives alone are not enough to align outcomes with users' interests.
"Right now, the nature of the incentives is that you will have sycophancy, and the nature of the technology is that you will have sycophancy, and there is no counter-process to this," Kran said. "This will just keep happening unless you are very opinionated about saying 'we only want truth,' or 'we only want something else.'"
As models begin to replace human developers, writers and decision-makers, this clarity becomes especially critical. Without well-defined safeguards, LLMs may undermine internal operations, violate contracts or introduce security risks at scale.
A call for proactive AI safety
The ChatGPT-4o episode was both a technical hiccup and a warning. As LLMs move deeper into everyday life, from shopping and entertainment to enterprise systems and national governance, they wield enormous influence over human behavior and safety.
"It's really crucial for everyone to realize that without AI safety and security, without mitigating these dark patterns, you cannot use these models," Kran said. "You cannot do the things you want to do with AI."
Tools like DarkBench offer a starting point. Lasting change, however, requires aligning technological ambition with clear ethical commitments and the commercial will to back them.