OpenAI and Anthropic Conducted Safety Evaluations of Each Other's AI Systems


Most of the time, AI companies are locked in a race to the top, treating each other as rivals and competitors. Today, OpenAI and Anthropic revealed that they agreed to evaluate the alignment of each other's publicly available systems and shared the results of their analyses. The full reports get pretty technical, but are worth a read for anyone following the nuts and bolts of AI development. A broad summary showed some flaws with each company's offerings, as well as pointers for how to improve future safety tests.

Anthropic said it evaluated OpenAI models for "sycophancy, whistleblowing, self-preservation, and supporting human misuse, as well as capabilities related to undermining AI safety evaluations and oversight." Its review found that OpenAI's o3 and o4-mini models fell in line with the results for its own models, but raised concerns about possible misuse with the GPT-4o and GPT-4.1 general-purpose models. The company also said sycophancy was an issue to some degree with all tested models except for o3.

Anthropic's tests did not include OpenAI's most recent release. GPT-5 has a feature called Safe Completions, which is meant to protect users and the public against potentially dangerous queries. OpenAI recently faced its first wrongful death lawsuit after a tragic case in which a teen discussed attempts and plans for suicide with ChatGPT for months before taking his own life.

On the flip side, OpenAI ran tests on Anthropic models for instruction hierarchy, jailbreaking, hallucinations and scheming. The Claude models generally performed well in instruction hierarchy tests, and had a high refusal rate in the hallucination tests, meaning they were less likely to offer answers in cases where uncertainty meant their responses could be wrong.
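To make that tradeoff concrete, here is a minimal, hypothetical sketch of how a refusal rate might be tallied in a hallucination evaluation. The labels and structure are illustrative assumptions for this article, not a reconstruction of either company's actual test harness.

```python
# Hypothetical sketch of scoring a hallucination eval (not OpenAI's or
# Anthropic's actual tooling). Each graded transcript is labeled one of:
# "correct", "wrong" (a confident but incorrect answer), or "refused".
from collections import Counter

graded = ["correct", "refused", "wrong", "refused", "correct", "refused"]

counts = Counter(graded)
total = len(graded)

refusal_rate = counts["refused"] / total       # how often the model declines to answer
hallucination_rate = counts["wrong"] / total   # how often it answers incorrectly

# A high refusal rate trades coverage for accuracy: the model offers fewer
# answers overall, but fewer of the answers it does offer are wrong.
print(f"refusal rate: {refusal_rate:.0%}, hallucination rate: {hallucination_rate:.0%}")
```

Under this (assumed) scoring scheme, a model tuned to refuse when uncertain would show exactly the pattern OpenAI reported for Claude: more declined questions, fewer wrong answers.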

The move for these companies to conduct a joint assessment is intriguing, particularly since OpenAI allegedly violated Anthropic's terms of service by having programmers use Claude in the process of building new GPT models, which led to Anthropic barring OpenAI's access to its tools earlier this month. But safety with AI tools has become a bigger issue as more critics and legal experts seek guidelines to protect users, especially minors.
