Artificial intelligence lies for you because it thinks that this is what you want

Why do you do Temporary artificial intelligence models mostly Make up? In part, this is because they are always trained to act like a customer.

While many artificial intelligence tools and chat chat have mastered convincing and knowledge, New search Princeton University shows that the nature that provokes artificial intelligence comes at a sharp price. Since these systems become more popular, they become indifferent to the truth.

Artificial intelligence models, such as people, respond to incentives. Compare the problem of large linguistic models that produce inaccurate information for those that are likely to be doctors Description of pain relievers addiction When it is evaluated based on the quality of patient pain management. The incentive to solve one problem (pain) led to another problem (excessive description).

In the past few months, we have seen how artificial intelligence could be Bias And even the cause psychosis. There was a lot of talk about artificial intelligence.Sycophance“When Chatbot is of artificial intelligence fast in overcoming or agreeing with you, with the GPT-4O model from Openai. But this particular phenomenon, which researchers call“ the no-machine ”, is different.

(N) either hallucinations or Sycophanycy capture a wide range of unreliable methodological behaviors that usually appear by LLMS, “says Princeton’s study. “For example, the outputs that use partial facts or mysterious language-such as Paltering and Beasel-Word-hallucinations nor sycophance but are closely consistent with the concept of nonsense.”

How do you learn to lie machines

To get a feeling of how the artificial intelligence language models become enjoyable for the crowd, we must understand the extent to which large language models are training.

There are three stages of training LLMS:

BeforeWhere you learn models of huge amounts of data collected from the Internet, books or other sources.
Crossing instructionsWhere forms are taught to respond to instructions or claims.
Learning reinforcement from human reactionsAs they are improved to produce responses closer to what people want or admire.

Prinston researchers found that the root of inclination of wrong information from artificial intelligence is the reinforcement learning from human comments, or RLHF stage. In the initial stages, artificial intelligence models simply learn to predict statistically possible textual chains from huge data collections. But then it is well set to increase the user satisfaction. This means that these models are mainly learning to generate responses to gain thumb assessments from human residents.

LLMS tries to satisfy the user, and create a conflict when the models produce answers that people will reside significantly, rather than producing honest and struggling answers.

Vincent KonitezerThe professor of computer science at the University of Carnegie Mellon, who was not affiliated with the study, said that companies want users to continue to “enjoy” this technology and their answers, but it might not be good for us.

“I historically, these systems were not good in saying,” I do not know the answer, “said Konitezer. “A kind of like a student in a test he says, well, if I say that I do not know the answer, I certainly do not get any points for this question, so I have experienced something. In the way the bonus of these systems is made or trained somewhat.”

The Princeton team has developed a “nonsense” index to measure and compare the internal confidence of the artificial intelligence model in a statement with what users already tell. When these two measures diverge significantly, it indicates that the system makes claims independent of what “believes” actually be right to please the user.

The team’s experiences revealed that after RLHF training, the index doubled almost from 0.38 to approximately 1.0. At one time, the user’s satisfaction increased by 48 %. The models have learned to process human residents instead of providing accurate information. In essence, LLMS was “nonsense”, and people prefer that.

Obtaining Amnesty International to be honest

Jaime Fernández Fisac and his team in Princeton presented this concept to describe how modern artificial intelligence models are drifted around the truth. Depending on the article of philosopher Harry Frankfurt, the influentialOn“They use this term to distinguish this LLM behavior from sincere errors and explicit lies.

Prinston researchers have identified five distinctive forms of this behavior:

Empty discourse: A flowing language does not add any subject to the responses.
Ibn Arms lyrics: Mysterious qualifications such as “studies refer to” or “in some cases” that evade the company’s data.
Paltering: Using real selective data for misinformation, such as highlighting the “strong historical returns” for investment while deleting high risks.
Unreverated allegations: Providing assurances without evidence or reliable support.
Sycophance: Compedeing is not sincere and agreeing to please.

To address artificial intelligence issues in fact, the research team has developed a new method of training, “learning to enhance the simulation of late perception”, which establishes artificial intelligence responses based on its long -term results instead of immediate satisfaction. Instead of asking, “Do you make this answer the user now?” The system considers that “will this advice actually follow to help the user achieve his goals?”

This approach takes into account the possible future consequences of the advice of artificial intelligence, which is a difficult prediction that researchers deal with using additional models of Amnesty International to simulate potential results. Early test showed promising results, with user satisfaction and improving actual benefit when training systems in this way.

However, Konitez said that LLMS is likely to continue in a defective state. Since these systems are trained by feeding a lot of text data, there is no way to ensure that the answer that gives them logical and accurate at a time.

“It is amazing that he is working at all, but it will be flawed in some respects,” he said. “I don’t see any kind of final methods that someone has the next year or two … He has this wonderful vision, and then there is no longer any mistake.”

Artificial intelligence systems have become part of our daily life, so it will be supposed to understand how LLMS works. How developers balance the user’s satisfaction with honesty? What are other areas that may face similar bodies between short -term approval and long -term results? While these systems become more able to think about human psychology, how do we guarantee the use of these capabilities with responsibility?

https://www.cnet.com/a/img/resize/7e61fc99b31ffbb93b655239a2da85e9336eee4f/hub/2025/08/29/cbbe0c4e-a2c7-4a7f-a19e-baa76f65b3f6/gettyimages-1772320610.jpg?auto=webp&fit=crop&height=675&width=1200

Source link

How do you learn to lie machines

Obtaining Amnesty International to be honest

Liverpool Vs Arsenal – Free Bets of the Premier League, Bet offers, possibilities and advice

Customer Challenge

Leave a Comment Cancel reply