OpenAI and Anthropic Conducted Safety Evaluations of Each Other's AI Systems


Most of the time, AI companies are locked in a race to the top, treating each other as rivals and competitors. Today, OpenAI and Anthropic revealed that they agreed to evaluate the alignment of each other's publicly available systems and shared the results of their analyses. The full reports get pretty technical, but are worth a read for anyone following the nuts and bolts of AI development. A broad summary showed some flaws with each company's offerings, as well as pointers for how to improve future safety tests.

Anthropic said it evaluated OpenAI models for "sycophancy, whistleblowing, self-preservation, and supporting human misuse, as well as capabilities related to undermining AI safety evaluations and oversight." Its review found that OpenAI's o3 and o4-mini models fell in line with the results for its own models, but raised concerns about possible misuse with the GPT-4o and GPT-4.1 general-purpose models. The company also said sycophancy was an issue to some degree with all tested models except for o3.

Anthropic's tests did not include OpenAI's most recent release. GPT-5 has a feature called Safe Completions, which is meant to protect users and the public against potentially dangerous queries. OpenAI recently faced its first wrongful death lawsuit after a tragic case in which a teen discussed attempts and plans for suicide with ChatGPT for months before taking his own life.

On the flip side, OpenAI ran tests on Anthropic models for instruction hierarchy, jailbreaking, hallucinations and scheming. The Claude models generally performed well in instruction hierarchy tests, and had a high refusal rate in the hallucination tests, meaning they were less likely to offer answers in cases where uncertainty meant their responses could be wrong.
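To make that tradeoff concrete, here is a minimal, hypothetical sketch of how a refusal rate might be tallied in a hallucination evaluation. The labels and structure are illustrative assumptions for this article, not a reconstruction of either company's actual test harness.

```python
# Hypothetical sketch of scoring a hallucination eval (not OpenAI's or
# Anthropic's actual tooling). Each graded transcript is labeled one of:
# "correct", "wrong" (a confident but incorrect answer), or "refused".
from collections import Counter

graded = ["correct", "refused", "wrong", "refused", "correct", "refused"]

counts = Counter(graded)
total = len(graded)

refusal_rate = counts["refused"] / total       # how often the model declines to answer
hallucination_rate = counts["wrong"] / total   # how often it answers incorrectly

# A high refusal rate trades coverage for accuracy: the model offers fewer
# answers overall, but fewer of the answers it does offer are wrong.
print(f"refusal rate: {refusal_rate:.0%}, hallucination rate: {hallucination_rate:.0%}")
```

Under this (assumed) scoring scheme, a model tuned to refuse when uncertain would show exactly the pattern OpenAI reported for Claude: more declined questions, fewer wrong answers.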

The move for these companies to conduct a joint assessment is intriguing, particularly since OpenAI allegedly violated Anthropic's terms of service by having programmers use Claude in the process of building new GPT models, which led to Anthropic barring OpenAI's access to its tools earlier this month. But safety with AI tools has become a bigger issue as more critics and legal experts seek guidelines to protect users, especially minors.
