AI Just Passed a Brutal Finance Exam Most Humans Fail - Should Analysts Be Worried?

ZDNET's key takeaways

  • Frontier models passed a mock CFA Level III exam.
  • Fewer than half of (human) candidates passed the exam in February.
  • AI is quickly getting better at certain tasks.

Some tasks that require Herculean cognitive effort from humans are trivially easy for AI systems, which are designed to detect and replicate complex patterns gleaned from enormous troves of data. The technology has already scored top marks in world-class competitions in mathematics and coding, and some developers envision that it will soon help human researchers make new scientific discoveries.

Now, AI's capabilities are also quickly catching up with those of the most skilled human financial analysts.

A new study conducted by the New York University Stern School of Business and GoodFin, an AI-powered wealth management platform, found that some frontier AI models were able to pass a mock version of the Chartered Financial Analyst (CFA) Level III exam, widely regarded as the world's most difficult and prestigious testing benchmark for aspiring finance professionals.

Passing the test

The study analyzed the capabilities of 23 industry-leading models, both proprietary and open-source, from developers such as Google, OpenAI, Anthropic, Meta, xAI, and DeepSeek.

Previous studies had demonstrated that AI could pass the CFA Level I and Level II exams, but it struggled with the third, final, and most difficult stage. The Level III exam, designed to test candidates' ability to apply their knowledge of portfolio management and wealth planning to hypothetical real-world scenarios, includes both multiple-choice and essay questions.

"This dual format comprehensively tests higher-order cognitive skills including analysis, synthesis, and professional judgment over rote memorization," the researchers behind the new study wrote in a paper originally posted to the preprint server arXiv in June. "The exam's rigorous standards...make it an excellent benchmark for assessing advanced financial reasoning capabilities."

Fewer than half (49%) of human test-takers passed the Level III exam in February, according to the CFA Institute.

The results

A handful of reasoning models, which specialize in breaking problems down into a series of sub-problems that are then tackled in sequence, were able to pass the mock exam.

OpenAI's o4-mini took the top spot with an overall score of 79.1%, followed by Google's Gemini 2.5 Flash, which scored 77.3%. The passing threshold for the exam is 63%.

The researchers note in their paper that while most of the models in the study scored within a similar range on the multiple-choice section of the exam (around 71% to 75%), their scores varied much more widely on the more rigorous and challenging essay section.

"This suggests that simpler, more straightforward tasks have become commoditized across models, whereas complex and nuanced reasoning tasks still differentiate frontier and reasoning-enhanced models from their peers," they write.

A recent study from Microsoft listed personal financial advisors as one of the 40 job categories most likely to be taken over by AI. Still, Anna Joo Fee, founder and CEO of GoodFin, told CNBC that she is not concerned about immediate replacement.

"There are things like context and intent that are hard for the machine to assess right now," she said. "That's where a human shines, in understanding your body language and cues."

More