The best free AI for coding in 2025 - only 3 make the cut now

I've been around technology long enough that very little excites me, and even less surprises me. But soon after OpenAI's ChatGPT was released, I asked it to write a WordPress plugin for my wife's e-commerce site. When it did, and the plugin worked, I was truly surprised.

But that was 2023. We've come a long way since the early days of generative AI. More to the point, AI-assisted coding has come a tremendously long way since then. In 2023 and 2024, AI-assisted coding took place mostly in chatbots. We wrote our requests in the chatbot interface, got back our results, and cut and pasted those into our programming editors.

Also: I've tested free vs. paid AI coding tools - here's which one I'd actually use

Earlier versions of this article just compared the performance of the large language models, wherever they were available. That worked, but AI coding was changing.

The arrival of coding agents

Then, in 2025, the world of AI-assisted coding intensified. Coding agents were introduced in the form of GitHub Copilot, Claude Code, Google Jules, and OpenAI Codex. For most of 2025, the AI companies have focused on integrating these agents into the programmer workflow, making them available in GitHub, in the terminal, and in VS Code. This article helps to explain the range of AI coding tools available now.

Coding agents also started to get a lot more expensive. They take a lot of resources, and the AI companies are charging accordingly. My tests found that you can get about two days of use out of Codex using OpenAI's $20/mo ChatGPT Plus plan, but if you want more, you need to spend $200/month for the Pro plan. Claude, Gemini, and Copilot have similar cost structures.

Also: The best free AI courses and certificates for upskilling in 2025

That's not to say it's not worth it. Using the $20/month ChatGPT Plus plan, I did 24 days of coding in 12 hours. When I paid $200 for the ChatGPT Pro plan, I got four years of product development done in four days, and I'm still stunned.

But not everyone wants to (or can) pay the AI fees. Fortunately, there are also free AI chatbots available. We're changing the focus of this article a bit, from comparing LLM coding performance to comparing the performance of the free chatbots.

The short version

In this article, I'll show you how each free chatbot performed against my tests. Performance was definitely not as good as the paid AI programs, but some of them weren't bad. As a best practice, you'll always want to test the results you're given. You might also want to ask the AIs again if you don't like the first results.

Even with the costly pro plans, you have to cajole the AIs into being helpful.

Also: The best AI chatbots of 2025: I tested ChatGPT, Copilot, and others to find the top tools now

I previously tested eight of the most well-known chatbots for general performance. This time, I'm testing the same chatbots specifically against my standard coding tests.

Best in show was Microsoft Copilot's free version. I was deeply disappointed to find that Google's Gemini chatbot turned in the worst results. Claude, widely known and loved among professional programmers for Claude Code, did not distinguish itself in free chatbot results.

Right after Copilot, scoring three correct out of four, were ChatGPT's free tier and DeepSeek, the controversial AI out of China.

If you're limited to using free chatbots, I'd recommend avoiding the free tiers of Claude, Meta, Grok, Perplexity, and Gemini.

Also: Want better ChatGPT responses? Try this surprising trick, researchers say

But, since the AIs I'm talking about are free, definitely consider using Copilot, ChatGPT, and DeepSeek together. I often feed the results of one AI to another and ask for an analysis. They're free, so you might as well. It won't hurt your wallet to use more than one.

If you want to understand my coding tests, why I've chosen them, and why they're relevant to this review of free coding chatbots, read this article: How I test an AI chatbot's coding ability.

The free AI coding leaderboard

Let's start with a comparative look at how the chatbots performed, as of this installment of our free best-of roundup:

[Leaderboard chart: free AI coding comparison - David Gewirtz/ZDNET]

Next, let's look at each free chatbot individually. Ready? Let's go.

Microsoft Copilot (free)

Pros

  • Passed all the tests
  • Was able to handle a more obscure test case

Cons

  • Wasn't accessible at first
  • No other complaints

My experience this time around started with a full stop. No matter what I tried to feed Copilot, I got back the response, "I'm sorry, I'm having trouble responding to requests right now. Let's try this again in a bit." Yes, an AI actually told me, "I'm sorry, Dave. I'm afraid I can't do that." Don't tell me life doesn't imitate art!

A day later, Copilot decided it was willing to come out and play.

Copilot, using its Quick Response setting, did something a bit different for the WordPress plugin writing test. All the other AIs I've given this prompt to (for this version, as well as historically) have presented two fields in the user interface: one for input, and one for output.

Also: How to use GPT-5 in VS Code with GitHub Copilot

Copilot just presented one field, which initially concerned me. Did it not understand the assignment? Did it break after the first field? Was it going to produce the results back in the input field? But no. After clicking Randomize Lines, it displayed an output field with the correct results. Test 1 was successful.
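
For context, the core operation this first test asks for, stripped of the WordPress UI and plugin scaffolding, is just shuffling the lines a user pastes in. Here's a minimal Python sketch of that logic (my own illustration of the task, not any chatbot's output):

```python
import random

def randomize_lines(text: str) -> str:
    """Shuffle the lines of a pasted block of text, keeping each line intact."""
    lines = text.splitlines()
    random.shuffle(lines)  # in-place shuffle of the list of lines
    return "\n".join(lines)

# Example: three pasted lines come back in a random order
print(randomize_lines("alpha\nbeta\ngamma"))
```

The real test wraps this in a WordPress plugin with input and output fields, which is where the AIs differ in their UI choices.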

The dollars-and-cents validation string function rewrite was correct. It properly validated all input styles, rejecting obvious errors and allowing numbers based on user intent. We'll give this one a thumbs up as well.
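
The exact validation prompt lives in the testing-methodology article linked above, but based on the pass/fail behavior described throughout this piece, a passing answer looks roughly like the following Python sketch. The function name and regular expression are my own reconstruction, not Copilot's output, and the handling of padded zeros like "000" is left permissive because the expectations there are ambiguous:

```python
import re

# Optional dollar sign, then either digits with up to two decimal places
# (a trailing decimal point like "5." is tolerated) or a cents-only value
# like ".5" or ".50", which the prompt explicitly allows.
MONEY_RE = re.compile(r"^\$?(\d+(\.\d{0,2})?|\.\d{1,2})$")

def is_valid_money(value: object) -> bool:
    """Return True for a valid dollars-and-cents string; never crash."""
    if not isinstance(value, str):  # null/undefined analogs fail softly,
        return False                # not with a crash
    return bool(MONEY_RE.match(value.strip()))  # light whitespace cleanup
```

A function like this accepts "0", "0.50", "1.20", ".5", and "5.", and rejects an empty string, a bare "$", and a bare "." - the cases the weaker chatbots stumble over later in this article.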

Copilot successfully identified the error in the debugging test. It was able to dive into its framework knowledge and pick out the point where the original code went off the rails. So far, that's three correct.

Also: I unleashed Copilot on my Microsoft and Google accounts - here's what happened

Copilot also properly handled my three-part scripting challenge, knowing how to work with the fairly obscure Keyboard Maestro tool, how to talk to Chrome, and how to handle AppleScript without going down the case-sensitivity rabbit hole that's caught many other AIs off guard.

Copilot easily handled all four tests, giving the free "Quick Response" mode of Copilot a four-out-of-four.

ChatGPT (free tier)

Pros

  • Gets better if you upgrade, fast
  • Nice Mac app

Cons

  • Makes up its own coding standards
  • Needs correcting

ChatGPT's free tier uses the least capable (and therefore least resource-intensive) version of OpenAI's GPT-5 LLM.

This AI did well on our first three tests. It handily created a nice little WordPress plugin with a working interface and functionality. It tuned up my regular expression code when it rewrote a string function. It successfully solved the debugging challenge.

But it fell down on the AppleScript test. This one seems to trip up the lower-end AI models across the board. The test combines AppleScript, a utility called Keyboard Maestro, and a little Chrome hacking.

Also: How ChatGPT actually works (and why it's been so game-changing)

It wasn't that ChatGPT's free tier didn't know AppleScript. It's that it got it wrong. The generated code used a function called "lowercaseString," which doesn't exist in normal AppleScript. It is possible to import the function (think of it like calling a friend on Who Wants to Be a Millionaire), but you have to explicitly include the statement use framework "Foundation" to make it work, and ChatGPT did not do that.

When I informed ChatGPT of this, it apologized and gave me a new version. But we're not testing whether or not we can cajole working code from the AIs. We're testing what they can do on their first try.

DeepSeek

Pros

  • Very good UI generation
  • Passed most tests

Cons

  • Responded with multiple sets of code
  • Failed the last test

DeepSeek provides access to the DeepSeek-V3.2 model, so that's what we're testing against.

DeepSeek took a little longer to create a WordPress plugin than the other AIs. Its code was also longer. But it was good. Like Copilot, DeepSeek initially presented only one field. Once I pasted in the test data, it dynamically updated a status field with the number of lines pasted in.

Then, when Randomize Lines was clicked, a second field was presented. That second field had a nice, lightly grayed background. None of the other AIs differentiated the look of the output field.

Also: How to run DeepSeek AI locally to protect your privacy - 2 easy ways

One more thing DeepSeek did that none of the other AIs thought to do was add a Copy to Clipboard button. It's not really necessary, because users can simply select the output text, but it's a nice touch.

DeepSeek passed the first test with flying colors.

The next test went a little weird. For the dollars-and-cents validation test, where it was asked to rewrite a string function, DeepSeek gave me back two routines. The first one it described as "Here's the rewritten code to allow dollars and cents (decimal numbers with up to two decimal places)." The second one was described as an "Alternative more explicit version."

I'm guessing there's a bit of a language issue in the training, because "explicit" just doesn't make sense in this context. That said, the first routine had some validation errors in it. The second routine worked perfectly. For some reason, DeepSeek knew the first routine wasn't good enough. But why, then, didn't it just give back only the second routine?

I'm counting this as a pass, but rather than saving time with a finished routine, it basically gave me a homework assignment where I had to test both routines and compare them to each other before I could choose one. I don't like that, but it's not a fail.

Also: Coding with AI? My top 5 tips for vetting its output - and staying out of trouble

DeepSeek succeeded with the debugging challenge, properly finding my framework mistake. So that's a pass, and we're at three out of four.

But that's as good as it gets. DeepSeek again presented two versions, this time for the final scripting challenge, both of which were unusable. Not only did DeepSeek completely ignore the Keyboard Maestro portion of the prompt, it also added multiple highly unnecessary and inefficient process forks out to the shell to try to force case insensitivity in each version of its response. AppleScript is already case-insensitive by default.

If I wanted "I don't know, I'll just try everything I can think of" code, I would have requested it. That said, DeepSeek did an admirable job with the first three tests.

Free chatbots to avoid for programming help

I tested eight chatbots. Only three passed the majority of my tests this time around. The other chatbots, including a few touted as great for programming, passed only one or two of my tests.

Also: How to actually use AI in a small business: 10 lessons from the trenches

I'm mentioning them here because people will ask, and I did test them thoroughly. Some of these bots are fine for other work. Definitely look at my general chatbot review article for more details.

Claude (free tier)

Pros

  • Passed UI test
  • Identified bug in test

Cons

  • Login required
  • Must use email instead of password
  • Failed half the tests

Claude refused to work without a login. Claude's free tier also won't let you assign a password. You log in by typing in your email address and waiting for a confirmation email.

Let's be clear. This isn't Claude Code, which runs in your terminal interface and is only available to paid subscribers. I'm testing the free version of Claude, using the Sonnet 4.5 AI model.

For the first test, Claude presented nice-looking side-by-side fields. It also identified how many lines there were to randomize as soon as I pasted text into the field. Those are both good to see. Clicking the Randomize Lines button also properly followed the prompt guidelines. Win.

Also: GitHub's Agent HQ gives devs a command center for all their AI tools

However, the string function rewrite for dollars-and-cents validation failed in many places. For example, it fails if the user enters "0", "0.50", or "1.20". Cents-only inputs were also incorrectly rejected, which is an error on the part of the AI. The prompt specifically allows them in the part that says "a decimal point and up to two digits after the decimal point."

Claude passed the challenge to find an annoying bug hidden in framework knowledge, so that's a second win.

Claude failed the fourth test, because it attempted to lowercase a string that's already case-insensitive. Also, the way it tried to do so was by forking a new shell instance, passing the string to the shell, and then using shell commands to convert text from upper to lower. That's convoluted and unnecessary. Fail. At least it wasn't two forks, like DeepSeek tried, but both AIs produced ludicrous solutions for this challenge.

While Claude Code itself might be quite popular, the free version of Claude does not impress with its coding prowess. Two out of four tests won't make the cut.

Meta AI

Pros

  • Did okay with some tests

Cons

  • Generated ugly UI
  • Failed half the tests

Meta's AI succeeded in generating a user interface for the plugin, and in the actual processing of the specialty randomization instructions. The UI was a bit uglier than the other AIs', but there was no requirement that the UI be pretty, just functional.

However, one point of confusion was that the AI generated the code, then generated part of the code again. It seemed to imply the second section of code was to be used to modify the first, when the full contents of the second section were already included in the first.

Also: Anxious about AI job cuts? How white-collar workers can protect themselves - starting now

Once again, the prompt didn't tell the AI to be clear in its instructions or to not baffle us with commentary. Since the plugin worked, we're counting this as a win for Meta.

After answering the first question, Meta insisted I log in. Even though I already have a Meta account (for my Quest 3), there were a bunch of fairly unnecessary hoops to jump through to get access to the AI again, including its insistence that I create yet another username. Go figure.

For the dollars-and-cents validation string function rewrite test, it decided to give back two results as well, stating, "However, the above code will not limit the decimal places to two. If you want to enforce exactly two decimal places or less, you can use a regular expression to validate the string."

Yeah. Okay. Sure.

But then came the errors. "000.50" became ".50", which then failed validation. ".5" failed validation, even though the AI explicitly asked if I wanted two decimal places or less. ".5" is less. "5." also failed. But "000" was allowed through. We'll count this as a failure for Meta.

Also: Microsoft researchers tried to manipulate AI agents - and only one resisted all attempts

The annoying bug challenge worked out successfully. Meta did dig into the framework folklore and properly pointed out the coding error. That's two wins and one failure so far.

Meta falls down on the final test, failing to even acknowledge that Keyboard Maestro was included in the prompt. It didn't go down the case-insensitivity rabbit holes that other AIs did, but since it completely ignored a key part of the prompt, we'll call this a fail as well.

Grok (free)

Pros

  • Logging in gives access to more resources
  • When available, Expert mode is solid

Cons

  • Requires login for better processing
  • Coding failures
  • Very limited Expert mode

Using Grok's auto mode for selecting a language model, the AI failed right out of the gate. While it properly built a WordPress plugin user interface, the functionality didn't work. You could press the Randomize Lines button all you wanted, but nothing happened.

For kicks, I tried to run the test in Expert mode, but it required a sign-in. So, I switched to my personal X account and re-ran the test. That second run took more than five minutes to process, but completed the test satisfactorily. Still, I'm considering this test a partial fail because it didn't run properly on the first try.

Also: Gartner just dropped its 2026 tech trends - and it's not all AI: Here's the list

Grok's auto mode worked quite effectively on the second test, which does regular expression processing and is tasked with rewriting a string function. It not only fixed the problem in the code, but did a bunch of best-practices normalization operations on the input values. The only minor ding is that it could have been written very slightly more efficiently.

Grok also passed my bug-hunting test, but failed the AppleScript test. It didn't make the same lowercase mistakes ChatGPT did, but it completely disregarded the Keyboard Maestro component of the test. I also reran that test in Grok's Expert mode, which succeeded.

It seems clear that if you want to use the free tier of Grok for coding, Expert mode will give you better results. The gotcha there is that you can only ask two questions every two hours.

I'm still counting this as two fails in auto mode. Expert mode is fairly impractical for anyone wanting to do work without waiting hours between queries.

Perplexity (free)

Pros

  • Built a working WordPress plugin

Cons

  • Code caused a crash
  • Limited Pro usage
  • Also requires login

First off, Perplexity refused to do anything without a sign-in. So there's that.

Perplexity passed the first coding test. It created a WordPress plugin with a user interface, and it was functional.

Also: Want Perplexity Pro for free? 4 ways to get a year of access for $0 (a $200 value)

On the other hand, our string function rewrite dollars-and-cents validation test failed. If the data passed to the function is null, undefined, or whitespace, it does a hard fail, causing the program to crash, which is a no-no. It also screws up normalization formatting, so values that should be lightly cleaned up and processed just fail straight away.

Perplexity did pass the debugging test, identifying the fairly obscure framework bug that was in the test.

However, right after completing that test, Perplexity told me I had used up my three Pro searches for the day. It speaks to Perplexity's positioning as an AI search engine that it calls regular AI prompts "searches." It also implies that even if you upgraded to Pro, it might offer incorrect results, as it did with the second test.

Also: Why Amazon really doesn't want Perplexity's AI browser shopping for you

The fourth test, the one that combines AppleScript, Chrome coding, and Keyboard Maestro, also failed, tripping over both of the little traps found in this test. It didn't identify Keyboard Maestro at all, and it tried to use a nonexistent lowercase function.

So that gives us two passes and two fails for Perplexity, where presumably three of the tests were run with the Pro version.

I did go back and rerun the first test, which succeeded using the Pro version of Perplexity. When initially run using the non-Pro version, it had also succeeded. So the score remains two wins, two fails.

Google Gemini (2.5 Flash)

Pros

  • It passed one test
  • That's something, right?

Cons

  • Spectacular coding failure
  • Other smaller coding failures

I previously tested Gemini 2.5 Pro, which did a great job on all of my programming tests. But 2.5 Flash is the model available to free users, so that's what we're testing here.

Unfortunately, Gemini 2.5 Flash doesn't seem to be up to the same standards. Right away, it failed my first test. It created the user interface correctly, even putting the fields side-by-side, which is a nice look.

Also: I let Gemini Deep Research dig through my Gmail and Drive - here's what it uncovered

But clicking Randomize Lines resulted in a big nothing burger. So, we're calling this a fail.

The second test, which rewrites a string function to properly validate whether text is a correct dollars-and-cents representation, failed rather spectacularly. It allows empty strings, a single dollar sign, and a single decimal point; it doesn't check numeric ranges for validity; and it makes a few other fairly arcane errors. Suffice it to say that this is the worst performance for this test I've seen, across a few years' worth of tests.
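
To make those failure modes concrete, here are the article's examples expressed as checks against the is_valid_money sketch from the Copilot section above (again, an approximation of the expected behavior, not Gemini's actual code):

```python
# Inputs the article says a correct validator must reject ...
for bad in ["", "$", "."]:
    assert not is_valid_money(bad), f"should reject {bad!r}"

# ... and inputs it must accept, drawn from the Claude and Meta discussions
for good in ["0", "0.50", "1.20", ".5", "5."]:
    assert is_valid_money(good), f"should accept {good!r}"
```

Gemini 2.5 Flash's routine got the rejection cases wrong, letting all three invalid inputs through.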

Gemini did correctly come up with the answer for the third test, which asks the AI to find a bug requiring framework knowledge.

Gemini's fourth test snatches defeat from the jaws of victory. It did correctly understand how the three components (AppleScript, Chrome, and Keyboard Maestro) interact, but it didn't know that AppleScript manages strings in a case-insensitive manner. Instead, it wrote an unnecessary 22-line-long lowerCaseString function. If the function had been necessary, it could have been done in about eight lines without calling an existing function, or even in one line if a function library had been loaded.

Gemini failed three out of four of our tests. It wins our "Most Depressing Result" award for this round.

I thought you really liked Gemini for coding?

I did (and do). But that's using the Gemini 2.5 Pro coding model. As the header says, Gemini 2.5 Pro is a stunningly capable programming assistant. But Gemini 2.5 Flash? Gemini 2.5 Flash needs some gas, gas, gas (with apologies to The Rolling Stones).

But I like [insert name here]. Does this mean I have to use a different chatbot?

Probably not. I've limited my tests to day-to-day programming tasks. None of the bots has been asked to talk like a pirate, write prose, or draw a picture. In the same way we use different productivity tools to accomplish specific tasks, feel free to choose the AI that helps you complete the task at hand.

When choosing among free chatbots, you have a ton of choice. If you're not signed up for some kind of restrictive subscription plan, you might as well jump between them and see which you like best.

It's only a matter of time

The results of my tests were pretty surprising, especially given the significant improvements from Microsoft and DeepSeek. However, this area of innovation is improving at warp speed, so we'll be back with updated tests and results over time. Stay tuned.

Have you used any of these free AI chatbots for programming? What has your experience been? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
