I Tested GPT-5's Coding Skills, and It Was So Bad That I'm Sticking With GPT-4o (for Now)

ZDNET's key takeaways

OpenAI's new GPT-5 flagship failed half of my programming tests.
Previous OpenAI releases have had just about perfect results.
Now that OpenAI has enabled fallbacks to other LLMs, there are options.

So GPT-5 happened. It's out. It's released. It's the talk of the virtual town. And it's got some problems. I'm not gonna bury the lede. GPT-5 has failed half of my programming tests. That's the worst that OpenAI's flagship LLM has ever done on my carefully designed tests.

Before I get into the details, let's take a moment to discuss one other little feature that's also a bit wonky. Check out the new Edit button on the top of the code dumps it generates.

Clicking the Edit button takes you into a nice little code editor. Here, I replaced the Author field, right in ChatGPT's results.

That seemed nice, but it ultimately proved futile. When I closed the editor, it asked me if I wanted to save. I did. Then this unhelpful message showed up.

I never did get back to my original session. I had to submit my original prompt again, and let GPT-5 do its work a second time.

But wait. There's more. Let's dig into my test results…

1. Writing a WordPress plugin

This was my very first test of coding prowess for any AI. It's what gave me that first "the world is about to change" feeling, and it was done using GPT-3.5.

Subsequent tests, using the same prompt but with different AI models, generated mixed results. Some AIs did great, some didn't. Some AIs, like those from Microsoft and Google, improved over time.

ChatGPT's model has been the gold standard for this test since the very beginning. That makes the results of GPT-5 all that much more curious.

So, look, the actual coding with GPT-5 was partially successful. GPT-5 generated a single block of code, which I pasted into a file and was able to run. It provided the requisite UI.

When I pasted in the test names, it dynamically updated the line count, although it described it as "Line to randomize" instead of "Lines to randomize."

But then, when I clicked Randomize, it didn't. Instead, it redirected me to tools.php. What?? ChatGPT has never had a problem with this test, whether GPT-3.5, GPT-4, or GPT-4o. You mean to tell me that OpenAI's much-anticipated GPT-5 is failing right out of the gate? Ouch.

I then gave GPT-5 this prompt.

When I click randomize, I'm taken to http://testsite.local/wp-admin/tools.php. I do not get a database of randomized results. Can you fix?

The result was a line to patch. I'm not thrilled with that approach because it requires the user to dig through code and to make no mistakes replacing a line.

So, I asked GPT-5 for a full plugin. It gave me the full text of the plugin to copy and paste. This time, it worked.

This time, it did randomize the lines. When it encountered duplicates, it separated them from each other, as it was instructed. Finally.
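
For readers who want to picture the behavior being tested, here is a minimal Python sketch of "randomize the lines, but keep duplicates apart." It is not the plugin GPT-5 produced (that was WordPress PHP), and the function name `randomize_lines` and its greedy strategy are assumptions made purely for illustration.

```python
import random
from collections import Counter


def randomize_lines(lines, rng=None):
    """Return the lines in random order, keeping identical lines apart where possible."""
    rng = rng or random.Random()
    remaining = Counter(lines)  # how many copies of each line are still unplaced
    result = []
    while remaining:
        # Every distinct line except the one just placed is a candidate.
        candidates = [ln for ln in remaining if not result or ln != result[-1]]
        if not candidates:
            # Only copies of the previous line are left, so adjacency is unavoidable.
            candidates = list(remaining)
        # Prefer the most frequent remaining line so duplicates get spread out,
        # and break ties at random to keep the order unpredictable.
        top = max(remaining[ln] for ln in candidates)
        choice = rng.choice([ln for ln in candidates if remaining[ln] == top])
        result.append(choice)
        remaining[choice] -= 1
        if remaining[choice] == 0:
            del remaining[choice]
    return result


if __name__ == "__main__":
    print(randomize_lines(["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"]))
```

The trade-off in this sketch is that favoring the most frequent line makes the result less uniformly random than a plain shuffle, in exchange for keeping duplicates from landing next to each other whenever that is possible.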

I'm sorry, OpenAI. I have to fail you on this test. You would have passed if the only error was not using the plural of "line" when appropriate. But the fact that it gave me back a non-working plugin on the first attempt is fail territory, even if the AI did eventually make it work on the second try.

No matter how you spin it, this is a step back.

2. Rewriting a string function

This second test is designed to rewrite a string function to better check for dollars and cents. The original code that GPT-5 was asked to rewrite did not allow for cents (it only checked for integers).

GPT-5 did fine with this test. It did return a minimal result because it didn't do any error checking. It didn't check for non-string input, extra whitespace, thousands separators, or currency symbols.

But that's not what I asked for. I told it to rewrite a function, which itself did not have any error checking. GPT-5 did exactly what I asked with no embellishment. I'm kind of glad of that because it doesn't know whether or not code prior to this routine already did that work.
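
To make the distinction concrete, here is a hedged Python sketch of the two kinds of check. The test's actual function and GPT-5's rewrite are not reproduced in this article, so the names `is_whole_dollars` and `is_dollar_amount`, and the exact rule (digits, optionally followed by a decimal point and one or two digits), are assumptions for illustration only.

```python
import re

# Integer-only pattern, standing in for the kind of check that rejects cents.
_WHOLE_DOLLARS = re.compile(r"\d+")

# Digits, optionally followed by a decimal point and one or two digits of cents.
_DOLLARS_AND_CENTS = re.compile(r"\d+(\.\d{1,2})?")


def is_whole_dollars(text: str) -> bool:
    """True only for strings made up entirely of digits."""
    return _WHOLE_DOLLARS.fullmatch(text) is not None


def is_dollar_amount(text: str) -> bool:
    """True for whole dollars or dollars with one or two digits of cents."""
    return _DOLLARS_AND_CENTS.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["12", "12.50", "12.345", "$12", "1,000", " 12"]:
        print(repr(sample), is_whole_dollars(sample), is_dollar_amount(sample))
```

Like the minimal rewrite described above, this sketch deliberately does no extra cleanup: currency symbols, thousands separators, and surrounding whitespace are all rejected rather than normalized.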

GPT-5 passed this test.

3. Finding an annoying bug

This test came about because I was struggling with a less-than-obvious bug in my code. Without going into the weeds about how the WordPress framework works, the obvious answer is not the correct answer.

You need some fairly arcane knowledge about how WordPress filters pass their information. This test has been a stumbling block for more than a few AI LLMs.

GPT-5, however, like GPT-4 and GPT-4o before it, did understand the problem. It articulated a clear solution.

GPT-5 passed this test.

4. Writing a script

This test asks the AI to incorporate a fairly obscure Mac scripting tool called Keyboard Maestro, as well as Apple's scripting language AppleScript, and Chrome scripting behavior.

It's really a test of the reach of the AI in terms of knowledge, its understanding of how web pages are constructed, and the ability to write code across three interlinked environments.

Quite a few AIs have failed this test, but the failure point is usually a lack of knowledge about Keyboard Maestro. GPT-3.5 didn't know about Keyboard Maestro. But ChatGPT has been passing this test since GPT-4. Until now.

Where should we start? Well, the good news is that GPT-5 handled the Keyboard Maestro part of the problem just fine. But it got the coding so wrong that it even doubled down on its lack of understanding of how case works in AppleScript.

It actually invented a property. This is one of those cases where an AI confidently presents an answer that is completely wrong.

AppleScript is natively case-insensitive. If you want AppleScript to pay attention to case, you need to use a "considering case" block. So, this happened.

The reason the error message referred to the title of one of my articles is that it was the front window in Chrome. This function checks the front window and does stuff based on the title.
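
As a rough analogy only, the Python sketch below models that AppleScript behavior: text comparisons ignore case by default, and case only matters when you explicitly opt in, which is what a "considering case" block does. This is not the script from the test; the function name `title_contains` and its `considering_case` flag are hypothetical stand-ins.

```python
def title_contains(title: str, needle: str, considering_case: bool = False) -> bool:
    """Check whether needle appears in title, ignoring case unless told otherwise."""
    if not considering_case:
        # Default path: fold both strings so "ZDNET" and "zdnet" compare equal.
        title, needle = title.casefold(), needle.casefold()
    return needle in title


if __name__ == "__main__":
    window_title = "ZDNET Review"
    print(title_contains(window_title, "zdnet"))                         # default: case ignored
    print(title_contains(window_title, "zdnet", considering_case=True))  # opt-in: case matters
```

The point of the analogy is that case sensitivity here is something the caller switches on for a comparison, not a property stored on the string itself.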

But misunderstanding how case works wasn't the only AppleScript error GPT-5 generated. It also referenced a variable named searchTerm without defining it. That's pretty much an error-creating practice in any programming language.

Fail, fail, fail, McFaildypants.

The internet hath spoken

OpenAI seemed to suffer from the same hubris that its AIs do. It confidently moved everyone to GPT-5 and burned the bridges back to GPT-4o. I'm paying $200 a month for a ChatGPT Pro account. On Friday, I couldn't switch back to GPT-4o for coding work. Neither could anyone else.

There was, however, just a tiny bit of user pushback on the whole bridges burning thing. And by tiny, I mean the whole frickin' internet. So, by Saturday, ChatGPT had a new option.

To get to this, go to your ChatGPT settings and turn on "Show legacy models." Then, as it has always been, just drop down the model menu and choose the one you want. Note: this option is only available to those on paid tiers. If you're using ChatGPT for free, you'll take what you're given, and you'll love it.

Ever since the whole generative AI thing kicked off at the beginning of 2023, ChatGPT has been the gold standard of programming tools, at least according to my LLM testing.

Now? I'm really not sure. This is only a day or so after GPT-5 has been released, so its results will probably get better over time. But for now, I'm sticking with GPT-4o for coding, although I do like the deep reasoning capabilities in GPT-5.

What about you? Have you tried GPT-5 for programming tasks yet? Did it perform better or worse than previous versions like GPT-4o or GPT-3.5? Were you able to get working code on the first try, or did you have to guide it through fixes? Are you going to use GPT-5 for coding or stick with older models? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.