The Best AI Agents Are Terrible Freelancers - For Now

ZDNET's key takeaways

Top AI agents fail at freelance work, according to a new study.
The study assessed Gemini 2.5 Pro, GPT-5, and other agents.
Close to half of the US workforce did freelance work in 2025.

If you're a freelance worker and you've been stressed about the prospect of losing your job to AI, you can rest easy, at least for the time being.

According to a new study conducted by Scale AI and the Center for AI Safety, the most cutting-edge AI agents can currently automate less than 3% of the tasks required of the average independent contractor, "failing to complete most projects at a level that would be accepted as commissioned work in a realistic freelancing environment," the authors wrote.

The Remote Labor Index

The study, posted to the preprint server arXiv on Thursday and yet to be peer-reviewed, establishes a testing benchmark for AI systems, which it calls the Remote Labor Index (RLI).

The benchmark serves as a qualitative framework for measuring the ability of AI systems to perform economically valuable work at a time when some tech leaders have been making sweeping claims about the disruptive effect AI will have on the labor market. Anthropic CEO Dario Amodei said in May, for example, that the technology could eliminate up to half of all entry-level white-collar jobs within the next five years.

As the name suggests, the RLI is specifically designed to measure AI's potential to automate remote, freelance work. As anyone who has ever spent a stint as a freelancer can attest, this is a mode of work that requires a high degree of self-sufficiency and organization, among other skills. It has also become quite popular: A recent survey found that just shy of 73 million Americans performed freelance work in 2025, representing about 43% of the total US workforce as of August.

AI and economically valuable labor

The new study assessed the performance of six industry-leading AI agents, including Google's Gemini 2.5 Pro, OpenAI's GPT-5, and Anthropic's Sonnet 4.5.

Unlike more limited chatbots, agents can interact with digital tools (such as a web browser) and execute complex, multi-step tasks. Tech developers widely position them as an important evolutionary step toward the development of artificial general intelligence (AGI).

AGI is an imprecisely defined term: Experts debate what it would mean for a machine to have true "general intelligence," and whether such a feat is even possible. However, one of the more common definitions of AGI that gets thrown around in tech circles is a system that can match or outperform humans on any economically valuable task.

If we take that definition as a starting point, the new RLI study suggests we're likely a long way from building true AGI. Each of the six models tested in the study is "far from capable of autonomously performing the diverse demands of remote labor," according to the authors.

The models were evaluated across 23 categories of freelance work, including graphic design, product design, computer-aided design (CAD), and game development. Those categories and their attendant skill requirements were identified by the researchers using freelance platforms like Upwork, "grounding the benchmark in economic value and capturing the diversity and complexity of real remote labor markets."

Models were fed a task brief along with any files necessary to complete their final deliverables, which were then manually assessed by the researchers in comparison to deliverables for the same task created by human freelancers. The goal, according to the researchers, was to find out "whether an AI deliverable completes the task at least as well as the human gold standard -- specifically, whether the deliverable would be accepted by a reasonable client as the commissioned work."
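To make the scoring idea concrete, here is a minimal Python sketch of how a pass/fail judgment per project could be rolled up into a single percentage. It is an illustration under stated assumptions, not the study's actual code: the `Evaluation` record, the `automation_rate` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """One manually judged project (hypothetical record layout)."""
    project_id: str
    # True if the reviewer judged the AI deliverable at least as good as
    # the human reference, i.e. acceptable as the commissioned work.
    ai_accepted: bool


def automation_rate(evaluations: list[Evaluation]) -> float:
    """Return the percentage of projects whose AI deliverable was accepted."""
    if not evaluations:
        raise ValueError("need at least one evaluation")
    accepted = sum(1 for e in evaluations if e.ai_accepted)
    return 100.0 * accepted / len(evaluations)


# Made-up example data: 200 projects, of which 5 were accepted.
sample = [Evaluation(f"project-{i}", ai_accepted=(i < 5)) for i in range(200)]
print(f"{automation_rate(sample):.1f}%")
```

With these invented numbers, 5 accepted deliverables out of 200 projects works out to 2.5%. The figures are chosen only to show the arithmetic and are not taken from the paper.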

The agents were then compared using an Elo metric. Manus scored the highest, with an automation rate of 2.5%, followed by Grok 4 and Claude Sonnet 4.5, both of which had a score of 2.1%.
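For readers unfamiliar with Elo, the sketch below shows the textbook rating update used in chess, written in Python. The article does not describe how the study computes its Elo scores, so the function names, the starting rating of 1000, and the K-factor of 32 are assumptions for illustration only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A's deliverable was preferred, 0.0 if B's was,
    and 0.5 for a tie. k (assumed here to be 32) controls the step size.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical agents start level; A's deliverable wins one comparison.
print(update(1000.0, 1000.0, score_a=1.0))
```

Starting from equal ratings, the winner gains 16 points and the loser drops 16. Repeating the update over many pairwise comparisons gives a relative ranking, which complements an absolute pass rate when nearly all deliverables fall short of the acceptance bar.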

The takeaway

Popular narratives about AI automation can make human labor seem more unidimensional than it is in reality. As the AI industry strives to create systems that can match or surpass the human brain, we increasingly recognize the brain's remarkable flexibility, dynamism, and complexity.

Some jobs are more amenable to automation than others, but most require an amalgamation of technical and interpersonal skills, and so are more complex than the AI systems of today can handle.

Even today's most advanced AI systems, which are designed to be general-purpose agents, are only capable of performing a narrow subset of the tasks required of most human workers. As the authors of the new RLI study wrote in their report, the inability of industry-leading agents to automate more than 3% of the tasks required of the average freelancer reveals "a stark gap" separating the promise and the actual, demonstrable capabilities of AI. That's especially true considering that the RLI doesn't capture many aspects of most freelancers' day-to-day work lives, such as communicating and negotiating with clients.

Then again, these are early days. The capabilities of agents are expanding rapidly, and the largest tech developers are investing billions in training new, more advanced models. It's possible that in five or 10 years, companies will be hiring AI freelancers. But for now, contractors don't seem to have any real reason to fear the AI job reaper.
