
OpenAI has achieved a new milestone in the race to build AI models that can reason their way through complex math problems.
On Saturday, the company announced that one of its models achieved gold medal-level performance on the International Math Olympiad (IMO), widely regarded as the most prestigious and difficult math competition in the world.
We achieved gold medal-level performance 🥇 on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM!
Our model solved world-class math problems, at the level of top human contestants. A major milestone for AI and mathematics. https://t.co/u2RlFFavyT
Critically, the winning model wasn't designed specifically to solve IMO problems, in the way that earlier systems like DeepMind's AlphaGo -- which famously beat the world's leading Go player in 2016 -- were trained on a massive dataset within a very narrow, task-specific domain. Rather, the winner was a general-purpose reasoning model, designed to think through problems methodically using natural language.
"This is an LLM doing math and not a specific formal math system," OpenAI wrote in its X post. "It's part of our main push towards general intelligence."
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems. Ziff Davis also owns DownDetector.)
Not much is known at this point about the identity of the model that was used. Alexander Wei, a researcher at OpenAI who led the IMO research, called it "an experimental reasoning LLM" in an X post, which included an illustration of a strawberry wreathed in a gold medal, suggesting it's built atop the company's o1 family of reasoning models, which debuted in September.
"To be clear: We're releasing GPT-5 soon, but the model we used at IMO is a separate experimental model," OpenAI added on X. "It uses new research techniques that will show up in future models -- but we don't plan to release a model with this level of capability for many months."
How well did the model perform?
The IMO, which began in 1959, attracts teams of up to six contestants from more than 100 countries each year.
Contestants must provide proof-based responses to a total of six questions over the course of two days. Those proofs are assessed by former IMO gold medalists, with unanimous agreement required for each final score. Fewer than 9% of participants achieve gold.
According to Wei, OpenAI's experimental model solved five out of the six problems and earned 35 out of 42 possible points (about 83%), enough for a gold medal. Each proof comprised hundreds of lines of text, representing the individual steps the model took to work through its reasoning process. In keeping with the competition's prohibition against the use of calculators or other external tools, OpenAI's model had no access to the internet; it was purely reasoning through each of the problems step-by-step.
The "model thinks for a long time," Noam Brown, another OpenAI researcher involved in the research project, wrote in an X post. "o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking."
Analysts had previously estimated that there was only an 18% chance that an AI system would win gold in the IMO by 2025, according to OpenAI.
The big picture
For all of its impressive abilities, AI has long struggled with simple arithmetic and basic math word problems -- tasks that one might think should be relatively straightforward for advanced algorithms. But unlike more narrow logical puzzles, math requires a level of abstract reasoning and conceptual juggling that has been beyond the reach of most AI systems.
That's been changing, however, at an extraordinarily rapid pace. A little over a year ago, AI models were still being assessed using grade school-level math benchmarks like GSM8K. Reasoning models like o1 and DeepSeek's R1 quickly excelled, first acing high school-level benchmarks like AIME and then advancing to the university level and beyond.
A capacity for high-level math has become the gold standard for reasoning models, since even a small amount of hallucination or corner-cutting can very quickly and clearly ruin a model's output. Such lapses are easier to get away with in other kinds of responses, such as help with a written essay, since those are very often open to interpretation.
OpenAI's IMO gold medal shows that a scalable, general-purpose reasoning approach can surpass domain-specific models in tasks that have long been believed to be beyond the reach of current AI systems. As it turns out, you don't need to build hyperfocused, AlphaGo-like models trained to do nothing but math; it's enough to train models to parse language and carefully reason through their thought process. Given enough time, such systems can compete on par with world-class human mathematicians.
According to Brown, the current pace of innovation happening throughout the AI industry suggests that its mathematical and reasoning prowess will only grow from here. "I fully expect the trend to continue," he wrote on X. "Importantly, I think we're close to AI substantially contributing to scientific discovery."