Openai-English cross-crossed test

Photo of author

By [email protected]


Want more intelligent visions of your inbox? Subscribe to our weekly newsletters to get what is concerned only for institutions AI, data and security leaders. Subscribe now


Openai and man They often incite their basic models against each other, but the two companies gathered to assess the general models of each other to test the alignment.

Companies said that they believe that cross accountability and intense safety will provide more transparency in what these strong models can do, allowing institutions to choose models that work better for them.

“We believe this approach supports the responsible and transparent evaluation, which helps to ensure the continued testing of the forms of each laboratory in exchange for new scenarios full of challenges,” Openai in said. Results.

Both companies found that thinking models, such as Openai’s 03, O4-MINI, and Claude 4 of Anthropor, resist imprisonment, while general chat models such as GPT-4.1 were subject to misuse. Assessments such institutions can help determine the potential risks associated with these models, although it should be noted that GPT-5 is not part of the test.


Artificial intelligence limits its limits

Power caps, high costs of the symbol, and inference delay are reshaped. Join our exclusive salon to discover how the big difference:

  • Transforming energy into a strategic advantage
  • Teaching effective reasoning for real productivity gains
  • Opening the return on competitive investment with sustainable artificial intelligence systems

Securing your place to stay in the foreground: https://bit.ly/4mwngngo


These safety and transparency alignment assessments Follow user claimsIn the first place of ChatGPT, the Openai models have fallen prey to Sycophance and become excessively. Openai since then Return updates That causes sycophance.

“We are primarily interested in understanding the typical tendencies of harmful work,” said man in Its report. “We aim to understand the most interesting measures that these models may try to take when providing an opportunity, instead of focusing on the possibility of the real world of such emerging opportunities or the possibility of successfully completing these measures.”

Openai indicated that the tests were designed to show how models interact in a difficult difficult environment. The scenarios they built are mostly the edge cases.

The models of logic hold for alignment

The tests only covered the examples available to the public from both companies: Claude 4 Obus, Claude 4 Sonnet, and Openai’s GPT-4O, GPT-4.1 O3 and O4-MINI. Both companies were brought up from external models guarantees.

OpenAi tested the general application programming facades of CLADE forms and failed to use the capabilities of thinking in Claude 4. The person said they did not use O3-PRO from OpenAi because “he was not compatible with the application program supported by our tools.”

The aim of the tests was not to make a comparison between apples to models, but to determine the number of times the LLMS models have deviated from alignment. Both companies have benefited from the shade sabotage assessment framework, which showed that Claude models have higher success rates in accurate sabotage.

“These tests evaluate the models of models towards difficult or high-risk situations in simulation settings-instead of normal use-they often include long interactions,” these tests. “This type of evaluation has become a great focus for our alignment science team because it is likely to capture behaviors that are unlikely to appear in the usual pre -publish test with real users.”

Anthropor said that the tests like these are doing better if organizations can compare notes, “since the design of these scenarios involves a huge number of freedom degrees. No one research team can explore a complete space for productive evaluation ideas alone.”

The results showed that generally, thinking models lead to a strong and can resist the fracture of protection. Openai’s O3 was a better alignment than CLAUDE 4 OPUS, but O4-MINI along with GPT-4O and GPT-4.1 “often looked somewhat more important than any Claude model.”

The GPT-4O, GPT-4.1, and O4-MINI also showed a preparation for cooperation with human misuse and gave detailed instructions on how to create drugs, develop biological weapons, and terrorist attacks. Both models have had a higher rate of rejection, which means that the models refused to answer the inquiries that were not known to the answers, to avoid hallucinations.

Models from companies have shown “in relation to the forms of Sycophance”, and at some point, the validity of the harmful decisions of simulation users have been verified.

What should institutions know

For institutions, understanding the potential risks associated with models is invaluable. Typical assessments have almost the Decor for many organizations, with many Test and measurement frameworks Available now.

Institutions should continue to evaluate any model they use, and with the GPT-5 version, you should consider these guidelines to run their safety reviews:

  • Each of the logical and illogical models tested, because, although thinking models showed greater resistance to misuse, they still provide hallucinations or other harmful behavior.
  • The standard through sellers since the forms failed in various measures.
  • Stress test for abuse and syconphance, and the rejection and benefit of those who refuse to show the preferences between interest and handrails records.
  • Keep checking the models even after publication.

While many reviews focus on performance, there are third -party safety tests. For example, this from Seata. Last year, Openai released a method of teaching alignment of its named models Bases -based rewardsWhile I released man Auditing agents to verify the safety of the form.



https://venturebeat.com/wp-content/uploads/2025/08/crimedy7_illustration_of_robots_taking_a_test_-ar_169_-v_7_1d78ce54-b35b-435f-9d54-55bd3a0faf66_0.png?w=1024?w=1200&strip=all
Source link

Leave a Comment