It was a big week for the leading AI company.
OpenAI, maker of ChatGPT, released and then pulled an updated version of GPT-4o, the underlying multimodal (text, image, audio) large language model (LLM) that ChatGPT is connected to by default, because it had become far too sycophantic toward users. The company recently reported some 500 million weekly active users of the hit web service.
A quick primer on the terrible GPT-4o update
OpenAI began rolling out an updated GPT-4o model that it hoped would be well received by users on April 24, completed the rollout by April 25, then rolled it back five days later, on April 29, after days of mounting user complaints on social media – mainly on X and Reddit.
The complaints varied in intensity and specifics, but all generally coalesced around the fact that GPT-4o appeared to be responding to user queries with unwarranted flattery and support for mistaken, misguided and even harmful ideas.
In examples shared by users, ChatGPT running the updated, sycophantic GPT-4o praised a dubious business idea for literal “shit on a stick,” applauded a user’s sample text describing delusional, schizophrenia-like self-isolation, and even allegedly endorsed plans to commit terrorism.
Users ranging from top AI researchers to a former interim OpenAI CEO said they were concerned that an AI model’s unabashed cheerleading for these kinds of terrible prompts was more than just annoying or inappropriate – that it could cause actual harm to users who mistakenly believed the AI and felt emboldened by its support for their worst ideas and impulses. It rose to the level of an AI safety issue.
OpenAI then published a blog post describing what went wrong – “we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT-4o skewed towards responses that were overly supportive but disingenuous” – and the steps the company was taking to address the issues. OpenAI’s head of model behavior, Joanne Jang, also participated in a Reddit “Ask Me Anything” (AMA) forum, responding to text posts from users and revealing more about the company’s approach to GPT-4o and how it ended up with an excessively sycophantic model.
Now today, OpenAI has released a blog post with even more information about how the sycophantic GPT-4o update came about – credited not to any particular author, but to “OpenAI.”
CEO and co-founder Sam Altman also posted a link to the blog post on X, writing: “We missed the mark with last week’s GPT-4o update. What happened, what we learned, and some things we will do differently in the future.”
What the new OpenAI blog post reveals about how and why GPT-4o turned sycophantic
For me, a daily ChatGPT user (including of the 4o model), the most startling admission in OpenAI’s new blog post about the sycophancy update is how the company reveals that it did receive concerns about the model from a small group of “expert testers” before release, but seemingly overrode those concerns in favor of a broader, more enthusiastic response from a wider group of general users.
As the company writes:
“While we’ve had discussions about risks related to sycophancy in GPT-4o for a while, sycophancy wasn’t explicitly flagged as part of our internal hands-on testing, as some of our expert testers were more concerned about the change in the model’s tone and style. Some expert testers had indicated that the model behavior ‘felt’ slightly off…
“We then had a decision to make: should we withhold deploying this update despite positive evaluations and A/B test results, based only on the subjective flags of the expert testers? In the end, we decided to launch the model due to the positive signals from the users who tried out the model.
“Unfortunately, this was the wrong call. We build these models for our users, and while user feedback is critical to our decisions, it’s ultimately our responsibility to interpret that feedback correctly.”
This strikes me as a major mistake. Why even have expert testers if you’re not going to weight their expertise more heavily than the enthusiasm of the crowd? I asked Altman about this choice on X, but he has yet to respond.
Not all “reward signals” are equal
OpenAI’s new postmortem blog post also reveals more details about how the company trains and updates new versions of existing models, and how human feedback alters the model’s qualities, character and “personality.” The company writes:
“Since launching GPT-4o in ChatGPT last May, we’ve released five major updates focused on changes to personality and helpfulness. Each update involves new post-training, and often many minor adjustments to the model training process are independently tested and then combined into a single updated model, which is then evaluated for launch.
“To post-train models, we take a pre-trained base model, do supervised fine-tuning on a broad set of ideal responses written by humans or existing models, and then run reinforcement learning with reward signals from a variety of sources.
“During reinforcement learning, we present the language model with a prompt and ask it to write responses. We then rate its responses according to the reward signals, and update the language model to make it more likely to produce higher-rated responses and less likely to produce lower-rated responses.”
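To make that loop concrete, here is a minimal, hypothetical sketch of the mechanic OpenAI describes: sample a response to a prompt, score it with a reward signal, and update the policy so that higher-rated responses become more likely. The toy softmax “policy,” the canned responses and the reward() heuristic are my own illustrative assumptions, not OpenAI’s implementation, which uses large neural networks and far more sophisticated reinforcement learning machinery.

```python
# Minimal, hypothetical sketch of reinforcement learning from a reward signal.
# A toy softmax "policy" chooses between two canned responses; whichever one
# the reward function rates higher gradually becomes the preferred output.
import math
import random

PROMPT = "Is my 'pet rock subscription box' a good business idea?"
RESPONSES = [
    "Brilliant! A flawless idea, you can't lose.",             # sycophantic
    "It could work, but here are the risks to check first.",   # balanced
]
logits = [0.0, 0.0]  # the "policy": a preference score per response

def reward(response: str) -> float:
    """Stand-in reward signal: rate responses that mention risks more highly."""
    return 1.0 if "risk" in response else 0.2

def sample(logits):
    """Softmax over the logits, then sample one response index."""
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = random.choices(range(len(probs)), weights=probs)[0]
    return idx, probs

LEARNING_RATE = 0.5
BASELINE = 0.6  # rough average reward, to center the update

for step in range(200):
    idx, probs = sample(logits)
    advantage = reward(RESPONSES[idx]) - BASELINE
    # REINFORCE-style update: raise the logit of the sampled response in
    # proportion to its advantage, lower the alternatives.
    for i in range(len(logits)):
        indicator = 1.0 if i == idx else 0.0
        logits[i] += LEARNING_RATE * (indicator - probs[i]) * advantage

print("Final preference scores:", [round(l, 2) for l in logits])
# The balanced response wins out simply because the reward signal rates it
# higher -- which is why the choice of reward signals matters so much.
```

Swap the reward() heuristic for one that rewards flattering language and the exact same loop would drift toward the sycophantic answer instead.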
Clearly, the “reward signals” OpenAI uses during post-training have an enormous impact on the resulting model behavior, and as the company admitted earlier when weighting the “thumbs up” responses from ChatGPT users, this signal may not be the best one to use equally alongside others in determining how the model learns to communicate and what kinds of responses it should serve up. OpenAI admits this outright in the next paragraph of its post, writing:
“Defining the correct set of reward signals is a difficult question, and we take many things into account: are the answers correct, are they helpful, are they in line with our Model Spec, are they safe, do users like them, and so on. Having better and more comprehensive reward signals produces better models for ChatGPT, so we’re always experimenting with new signals, but each one has its quirks.”
Indeed, OpenAI also reveals that the “thumbs up” data was a new reward signal used alongside others in this particular update:
“The update introduced an additional reward signal based on user feedback – thumbs-up and thumbs-down data from ChatGPT. This signal is often useful; a thumbs-down usually means something went wrong.”
Yet crucially, the company does not blame the new “thumbs up” data outright for the model’s failure and its ostentatious cheerleading behaviors. Instead, OpenAI’s blog post says it was this combined with a variety of other new and older reward signals that led to the problems: “…we had candidate improvements to better incorporate user feedback, memory, and fresher data, among others. Our early assessment is that each of these changes, which had looked beneficial individually, may have played a part in tipping the scales on sycophancy when combined.”
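As a rough illustration of that “tipping the scales” dynamic, here is a hypothetical sketch in which each reward signal looks defensible on its own, but adding a thumbs-up-style engagement signal flips which response the combined reward prefers. The signal names, weights and scores are invented for the example; OpenAI has not disclosed its actual signal mix or weighting.

```python
# Hypothetical example of combining reward signals with a weighted sum.
# Individually each signal seems reasonable, but adding an engagement
# ("thumbs up") signal changes which candidate response the aggregate prefers.

# Per-signal scores (0-1, invented) for two candidate responses to one prompt.
CANDIDATES = {
    "balanced":    {"correctness": 0.9, "helpfulness": 0.8, "safety": 0.9, "thumbs_up": 0.30},
    "sycophantic": {"correctness": 0.6, "helpfulness": 0.7, "safety": 0.7, "thumbs_up": 0.98},
}

def combined_reward(signal_scores: dict, weights: dict) -> float:
    """Weighted sum of whichever reward signals are active."""
    return sum(w * signal_scores.get(name, 0.0) for name, w in weights.items())

old_weights = {"correctness": 0.4, "helpfulness": 0.4, "safety": 0.2}
new_weights = {"correctness": 0.3, "helpfulness": 0.3, "safety": 0.15, "thumbs_up": 0.25}

for label, weights in [("before thumbs-up signal", old_weights),
                       ("after thumbs-up signal", new_weights)]:
    totals = {name: round(combined_reward(scores, weights), 3)
              for name, scores in CANDIDATES.items()}
    preferred = max(totals, key=totals.get)
    print(f"{label}: {totals} -> preferred: {preferred}")

# Output (approximate):
#   before thumbs-up signal: {'balanced': 0.86, 'sycophantic': 0.66} -> preferred: balanced
#   after thumbs-up signal:  {'balanced': 0.72, 'sycophantic': 0.74} -> preferred: sycophantic
```

No single weight looks reckless in isolation, yet the small shift in the mix is enough to change which behavior gets reinforced.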
Reacting to this blog post, Andrew Mayne, a former member of OpenAI’s technical staff who now works at an AI consulting firm, wrote on X of another example of how subtle changes in reward incentives and model guidelines can impact model performance quite drastically:
“Early on at OpenAI, I had a disagreement with a colleague (who is now a founder of another lab) over using the word ‘polite’ in a prompt example.
They argued ‘polite’ was politically incorrect and wanted to swap it for ‘helpful.’
I pointed out that focusing only on helpfulness can make a model overly compliant – so compliant, in fact, that it can be steered into sexual content within a few turns.
After I demonstrated that risk with a simple exchange, the prompt kept ‘polite.’
These models are weird.”
How OpenAI plans to improve its model testing processes going forward
The company lists six process improvements for how to avoid similar undesirable model behavior in the future, but to me the most important is this:
“We’ll adjust our safety review process to formally consider behavior issues – such as hallucination, deception, reliability, and personality – as blocking concerns. Even if these issues aren’t perfectly quantifiable today, we commit to blocking launches based on proxy measurements or qualitative signals, even when metrics like A/B testing look good.”
In other words – despite how important data, especially quantitative data, is to the fields of machine learning and AI – OpenAI recognizes that it alone cannot and should not be the only means by which a model’s performance is judged.
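In practical terms, that commitment reads like a launch gate along the lines of the sketch below, where qualitative red flags from expert testers block a release regardless of how good the A/B metrics look. The class, fields and threshold here are my own invented illustration, not anything OpenAI has published.

```python
# Hypothetical launch-gate sketch: qualitative behavior flags are treated as
# blocking, even when quantitative A/B results are positive.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class LaunchReview:
    ab_test_lift: float                     # e.g. +0.04 = +4% on engagement metrics
    qualitative_flags: list[str] = field(default_factory=list)  # expert testers' concerns

def can_launch(review: LaunchReview) -> bool:
    # Behavior concerns (sycophancy, tone, deception, etc.) block the launch
    # outright, no matter how strong the quantitative lift is.
    if review.qualitative_flags:
        print("BLOCKED:", "; ".join(review.qualitative_flags))
        return False
    return review.ab_test_lift >= 0.0

review = LaunchReview(
    ab_test_lift=0.04,
    qualitative_flags=["expert testers report the model 'feels off' / overly flattering"],
)
print("Launch approved?", can_launch(review))  # -> False, despite positive A/B results
```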
While many users giving a “thumbs up” may signal some desirable behaviors in the short term, the long-term implications for how the AI model responds, and where those behaviors take it and its users, could ultimately lead to a very dark, distressing and destructive place. More is not always better – especially when you constrain the “more” to a few domains of signals.
It’s not enough to say that the model passed all the tests or received some number of positive responses from users – the expertise of trained power users and their qualitative feedback that something “seemed off” about the model, even if they couldn’t fully articulate why, should carry far more weight than OpenAI was previously giving it.
Let’s hope the company – and the entire field – learns from this incident and integrates the lessons going forward.
Takeaways and considerations for enterprise decision-makers
Speaking perhaps more philosophically, for me it also points to why expertise matters – and specifically, expertise in fields beyond and outside of the one being optimized for (in this case, machine learning and AI). It is the diversity of expertise that allows us as a species to achieve new advances that benefit our kind. One field, say STEM, should not necessarily be held above others in the humanities or arts.
Finally, I also think it reveals, at its core, a fundamental problem with using human feedback to design products and services. Individual users may say they like a more sycophantic AI, just as they may also say they love the way fast food and soda taste, the convenience of single-use plastic containers, the entertainment and connection they derive from social media, and the worldview validation and tribal belonging they feel when reading politicized media or tabloid gossip. Yet taken all together, the accumulation of all these kinds of trends and activities often leads to very undesirable outcomes for individuals and society – obesity and poor health in the case of fast food, pollution and endocrine disruption in the case of plastic waste, depression and isolation from excessive social media, and a more fractious and less-informed public from reading poor-quality news.
AI model designers and technical decision-makers at enterprises would do well to keep this broader idea in mind when designing metrics around any measurable goal – because even when you think you are using data to your advantage, it can backfire in ways you did not fully expect or anticipate, leaving you scrambling to repair the damage and clean up the mess, however inadvertently it was made.