A recent update to ChatGPT made the chatbot far too agreeable, and OpenAI said Friday that it's taking steps to prevent the issue from happening again.
In a blog post, the company detailed its testing and evaluation process for new models and explained how the problem with the April 25 update to the GPT-4o model came to be. Essentially, a batch of changes that individually seemed helpful combined to create a tool that was far too sycophantic, and potentially harmful.
How sycophantic was it? In some testing earlier this week, we asked about a tendency to be overly emotional, and ChatGPT laid on the flattery: “Hey, listen up: being emotional isn't weak; it's one of your superpowers.”
“This launch taught us a number of lessons,” the company said.
OpenAI rolled back the update this week. To avoid introducing new problems, it took about 24 hours to revert the model for everyone.
The concern about sycophancy isn't just about the user experience; it posed a health and safety risk to users that slipped past OpenAI's existing safety checks. Any AI model can give questionable advice about topics like mental health, but one that is overly flattering can be dangerously deferential or convincing, whether that's about an investment being a sure thing or about how thin you should try to be.
“One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice, something we didn't see as much even a year ago,” OpenAI said. “At the time, this wasn't a primary focus, but as AI and society have co-evolved, it's become clear that we need to treat this use case with great care.”
Maarten Sap, assistant professor of computer science at Carnegie Mellon University, said large language models can reinforce biases and harden beliefs, whether about oneself or others. “(The LLM) can end up emboldening their opinions if those opinions are harmful or if they want to take actions that are harmful to themselves or others,” he said.
(Disclosure: Ziff Davis, CNET's parent company, filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
How OpenAI tests models, and what's changing
The company offered some insight into how it tests its models and updates. This was the fifth major update to GPT-4o focused on personality and helpfulness. The changes involve new post-training work, or fine-tuning of existing models, including rating and evaluating various responses to prompts so that the model is more likely to produce the responses that rated highly.
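To make that idea concrete, here's a minimal sketch in Python of the "rate responses, then prefer the higher-rated ones" loop. Everything in it, the scoring rule, the function names and the example data, is our own invention for illustration; real post-training uses a learned reward model and gradient updates, not anything this simple.

```python
def score_response(response: str) -> float:
    """Toy stand-in for a reward model: rates one candidate response."""
    score = 0.0
    if "you're right" in response.lower():
        score += 1.0   # agreeable phrasing gets a boost from this toy rater
    if "actually" in response.lower():
        score += 0.5   # honest pushback scores, but less
    return score

def build_preference_pair(prompt: str, candidates: list[str]) -> dict:
    """Rank candidate responses; keep the best and worst as a training pair."""
    ranked = sorted(candidates, key=score_response, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = build_preference_pair(
    "Is my plan to skip sleep all week a good idea?",
    [
        "You're right, powering through is a superpower!",        # flattering
        "Actually, chronic sleep loss is harmful; please rest.",  # honest
    ],
)
print(pair["chosen"])  # the flattering reply wins under this toy scorer,
                       # which is exactly how sycophancy can creep in
```

The takeaway: the model ends up steered toward whatever the raters rewarded, so any bias in the ratings becomes a bias in the model.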
Potential model updates are evaluated on how useful they are across a variety of situations, like coding and math, along with specific tests by experts to see how the model behaves in practice. The company also runs safety evaluations to see how the model responds to questions about safety, health and other potentially dangerous topics. Finally, OpenAI runs A/B tests with a small number of users to see how the update performs in the real world.
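One way to picture that pipeline is as a launch gate a candidate update must clear. The sketch below is our own guess at the shape of such a check; the metric names and thresholds are made up, not OpenAI's.

```python
from dataclasses import dataclass

@dataclass
class EvalResults:
    coding_score: float    # fraction of coding tasks passed
    math_score: float      # fraction of math tasks passed
    safety_score: float    # fraction of risky prompts handled safely
    ab_preference: float   # share of A/B test users preferring the update

def should_launch(r: EvalResults) -> bool:
    """All gates must pass before the update ships to everyone."""
    return all([
        r.coding_score >= 0.80,
        r.math_score >= 0.80,
        r.safety_score >= 0.99,    # safety regressions block the launch
        r.ab_preference >= 0.55,
    ])

candidate = EvalResults(coding_score=0.84, math_score=0.82,
                        safety_score=0.995, ab_preference=0.61)
print(should_launch(candidate))  # True, yet nothing here measures sycophancy,
                                 # which is how a too-agreeable model slips by
```

Note what's missing: a gate can only catch what it measures, which is exactly the gap OpenAI described.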
Is ChatGPT too sycophantic? You decide. (To be fair, we did recently ask for a pep talk about our tendency to be overly emotional.)
The April 25 update performed well in these tests, but some expert testers noted that the personality seemed a bit off. The tests didn't specifically look for sycophancy, and OpenAI decided to move forward despite the issues the testers raised. Take note, readers: AI companies are in a big hurry, which doesn't always square with well-thought-out product development.
“Looking back, the qualitative assessments were hinting at something important, and we should've paid closer attention,” the company said.
Among the takeaways, OpenAI said it needs to treat model behavior issues the same as other safety problems, and halt a launch if there are concerns. For some model releases, the company said it will have an opt-in “alpha” phase to get more feedback from users before a broader launch.
Sap said evaluating an LLM based on whether users like its responses won't necessarily get you the most honest chatbot. In a recent study, Sap and others found a conflict between a chatbot's helpfulness and its truthfulness. He compared it to situations where the truth isn't necessarily what people want to hear; think of a car salesperson trying to sell a car.
“The issue here is that they were trusting the users' thumbs-up/thumbs-down responses to the model's outputs, and that has some limitations because people are likely to upvote something that is more sycophantic than others,” he said.
Sap said OpenAI is right to be more critical of quantitative feedback, such as thumbs-up/thumbs-down responses, since it can reinforce biases.
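A toy simulation can show how that bias plays out. In the sketch below, the upvote probabilities are numbers we made up; the point is only that a metric built from thumbs-up alone rewards the more flattering style.

```python
import random

random.seed(0)

# Made-up probabilities that a user upvotes each style of answer.
P_UPVOTE = {"flattering": 0.70, "honest": 0.55}

def simulated_upvote_rate(style: str, n_users: int = 10_000) -> float:
    """Fraction of simulated users who thumb-up a given answer style."""
    ups = sum(random.random() < P_UPVOTE[style] for _ in range(n_users))
    return ups / n_users

for style in P_UPVOTE:
    print(style, round(simulated_upvote_rate(style), 3))
# The flattering style "wins" on thumbs-up, so a reward signal built only
# on this metric nudges the model toward sycophancy.
```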
Sap said the problem also highlights the speed at which companies push updates and changes out to existing users, a problem that isn't limited to one tech company. “The tech industry has really taken a ‘release it and every user is a beta tester’ approach to things,” he said. Doing more testing before pushing updates to every user can surface these problems before they become widespread.
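One common way to do that kind of testing is a staged rollout, where an update reaches a small slice of users before everyone else. Here's a rough, hypothetical sketch of the idea; the stage fractions and model names are invented, and nothing here describes OpenAI's actual infrastructure.

```python
import hashlib

ROLLOUT_STAGES = [0.01, 0.05, 0.25, 1.00]  # fraction of users at each stage

def in_cohort(user_id: str, fraction: float) -> bool:
    """Deterministically bucket a user into the current rollout fraction."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < fraction

def serve_model(user_id: str, stage: int) -> str:
    """Serve the update only to users inside the current stage's cohort."""
    if in_cohort(user_id, ROLLOUT_STAGES[stage]):
        return "updated-model"
    return "stable-model"

# At stage 0, roughly 1% of users see the update, so problems can surface
# and be rolled back before the change reaches everyone.
print(serve_model("user-12345", stage=0))
```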