
ZDNET's key takeaways
- Codex struggled with big-picture debugging across complex codebases
- Deep Research excelled at analysis when code context spanned versions
- Human testing and oversight still remain critical with AI coding
"Huh?!?" Sometimes, when I'm coding and something doesn't behave quite right, and I'm not entirely sure what's up, my brain fires off an internal "Huh?!?" I think it's my way of recognizing "there be dragons" but without escalating into a full-tilt panic loop.
A few days into my AI-coding productizing process, after my four-day uber-performance AI-assisted programming sprint, something wasn't quite right. At first, it didn't seem terribly wrong (which turned out to be a misjudgment because it really was).
I eventually solved the problem using both OpenAI's Codex and ChatGPT Deep Research. That proved to be an essential team-up. I'll explain why in short order.
But first, let's deconstruct the "Huh?!?"
Is it even a bug?
This all took place after my big coding sprint. I built four add-on products for my security product. Once the main coding was done, there was still a lot of work, both on the marketing and documentation side and on the distribution and operations side.
One major task was testing. After that, I had to zip it all up so my online store could distribute installable plugin packages to my users.
It was here that I noticed something odd. Clicking on the WordPress dashboard proved unresponsive for 15-20 seconds. But this only occurred after I switched away from the WordPress dashboard of my development environment for some number of hours.
My first access in the morning locked up for about a quarter of a minute. But after that, it behaved just fine. The time away seemed to have to be fairly long before the behavior would manifest again. It would lock up again if I came back to it after going off to do something else, like write an article.
I wasn't even sure this was a bug in my code. It could have been something about my system, or the build, or WordPress, or even just my imagination.
Trying to diagnose the issue (part 1)
I tried describing the problem to Codex, but because I wasn't even sure it was a problem, I wasn't giving it the best guidance. Codex wasn't able to shed any light on the matter.
So I had Codex build a diagnostic framework that measured every behavior when WordPress started up. I had it capture every hook, every call, every time delay, and record it in a diagnostics console.
Unfortunately, nothing particularly notable was recorded in the diagnostics telemetry. Making matters worse, I really only had a once-a-day chance to catch anything because the only reliable manifestation of the problem was the first time I tried to use my test site at the beginning of the day.
Nothing. Nada. Zip. Zilch.
Oh, yeah, it's definitely a bug
After working on this for a few days, I decided it wasn't really a bug. It was just some minor manifestation of my development environment. I moved on to recording the tutorial videos for each of the plugins.
Those of you who have been following along on my AI coding saga will recall that the first product I generated using Codex was a site analysis tool. It captures events happening to your site (failed logins, AI bot visits, search spiders, etc.) and presents both the raw data and a clear analysis.
Demoing this on my development environment wasn't particularly interesting because the data would, of necessity, only be test data. So I decided to put the tool up on my main user-facing server, the one used to support and sell these products. That server gets quite a bit of traffic, making it a good test case for a visitor analysis tool like mine.
I also needed to put the main security product up there because the visitor tool is an add-on to it. So I installed the latest build. Bad move? Good move? Bad result.
It was a bad move because it made my site unusable. Clicking on anything required a minute or more of waiting before anything happened. It wasn't only in the admin dashboard. Users visiting the front end of the site experienced the same behavior.
It was a good move, because it became immediately apparent that I had a bug. This was not some minor thing that manifested when my test site slept overnight. No, on an active site, it rendered the site completely unusable.
But the result was bad, because it also became immediately apparent that there was no way I could ship the AI-created update to my users. Freezing my own site is one thing. Freezing 20,000 other sites on the internet? That would be very bad.
The site was so slow that I couldn't access the plugin dashboard to disable it. I had to log in using my hosting provider's file manager and delete it from outside of WordPress.
Doing so immediately restored the site to proper operation, making it abundantly clear that the problem was with my updated security plugin.
Trying to diagnose the issue (part 2)
I reported this finding back to Codex. Since I started my big coding sprint using Codex on the ChatGPT Plus plan and then the Pro plan, Codex has proven to be surprisingly adept at debugging.
But not this time. As I discussed in "10 ChatGPT Codex secrets I only learned after 60 hours of pair programming with it," I have concluded that Codex doesn't work well with large assignments.
Have you ever gone to the drive-through of a fast food place that's usually really reliable for your usual one-meal order, but this time you placed a really big order? That fast food place that seems operationally solid on single-meal orders almost always screws up big orders, especially if there are any special requests.
Codex also doesn't handle big orders well. They invariably come back as a useless mess. I've been enormously successful in the past when breaking down large projects by doing one piece at a time, but that didn't work for this problem.
This was a systems problem. Something about my entire codebase, since Codex began working on it, caused freezes. I couldn't point Codex at one small area and tell it to work and then wait for it to come up with an answer. It had to look at everything.
I tried to tell it that the problem had manifested since it started working on my code, but it didn't recall when that was. Codex knows only the current session and anything passed on to it on purpose between sessions. But it has no real memory of what it did, so it doesn't have much of a framework to look at what it might have broken.
I gave it about 20 different prompts. Each time, it went away to think. It was starting to feel like it does when you're buying a car and the salesman has to run back to "discuss it with his manager" for every step of the way. Codex needed to go away and think for five or 10 minutes, and then come back with what was invariably a useless or nonfunctional "fix."
I was very frustrated. I knew that I could go back into the code myself and try to diagnose the issue. I've acquired a bunch of products where I absorb other people's code and develop an understanding of it, so it's a skill I do have. But I also knew that doing so would mean I was embarking on what would probably be weeks of frustrating work that would essentially destroy all the productivity gains I had achieved from doing the pair programming with the AI.
There had to be a better way.
Enter ChatGPT Deep Research
If Codex is fairly terrible at big picture work, ChatGPT's Deep Research specializes in it. I decided to give the problem to Big D.
Deep Research has access to the GitHub repo that contains my project, so the logistics of examining the code were no problem. I explained the problem and set it loose.
About a half hour later, it came back. It blamed all my original code. It had a laundry list of places in my original code where there could be very minor slowdowns, a millisecond here, a millisecond there.
But my code worked. I've been shipping the code Deep Research complained about for years, and it's running on more than 20,000 sites. If that code were causing a major slowdown, I would have heard about it.
But unlike Codex, which only really works on code found in GitHub repos or VS Code workspaces, Deep Research can accept any file, including zip files.
So I gave it the distribution zip for version 3.2 of my security software. I've been shipping 3.2 for four months and it's installed on 45.6% of my 20,000+ users' sites. We know that version isn't causing the problem. Right now, Codex and I are working on what will be released as 4.0, and it was 4.0 that had the freeze problem.
I told Deep exactly that, and that 3.2 worked fine. I told it to look at 4.0, examining only what was added since the 3.2 release. That focused its analysis run considerably.
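The same narrowing idea (look only at what changed between a known-good release and the broken one) can be tried by hand before involving any AI. What follows is a minimal sketch and makes no claim about how Deep Research works internally. It assumes two distribution zips and compares them by each entry's stored CRC. The file names in the demo are hypothetical and exist only for illustration.

```python
import io
import zipfile


def changed_since(old_zip, new_zip):
    """Return (added, modified) file names in new_zip relative to old_zip."""
    with zipfile.ZipFile(old_zip) as old, zipfile.ZipFile(new_zip) as new:
        # Map every entry in the known-good release to its stored CRC-32.
        old_crcs = {info.filename: info.CRC for info in old.infolist()}
        added, modified = [], []
        for info in new.infolist():
            if info.is_dir():
                continue
            if info.filename not in old_crcs:
                added.append(info.filename)
            elif old_crcs[info.filename] != info.CRC:
                modified.append(info.filename)
    return sorted(added), sorted(modified)


def make_zip(files):
    """Build an in-memory zip from a {name: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, body in files.items():
            zf.writestr(name, body)
    buf.seek(0)
    return buf


# Hypothetical stand-ins for the 3.2 and 4.0 distribution zips.
v32 = make_zip({"plugin.php": "<?php // core", "readme.txt": "3.2"})
v40 = make_zip({
    "plugin.php": "<?php // core",
    "readme.txt": "4.0",
    "robots-check.php": "<?php // new",
})

added, modified = changed_since(v32, v40)
print("added:", added)
print("modified:", modified)
```

The resulting list of added and modified files is the short list worth handing to a reviewer, human or AI, instead of the whole codebase.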
And guess what? It figured it out. It found a number of concerns. The biggest concern was that my main plugin was checking the status of a robots.txt file every single time a user accessed the site.
This was something that only needed to be checked once, to determine if some features could load. But it was running constantly. On an active site (rather than my development machine), those checks tied up the PHP interpreter until they completed. It effectively killed the server.
Deep Research identified the culprit.
Living in the future is cool
This is where things went from "my project is so screwed" to "I'm living in the future." I took the results from Deep Research and explained them to Codex.
Since it was a fairly narrow explanation of a situation, Codex was immediately able to zero in on the problem. Its first solution was workable but still a bit problematic. We discussed it back and forth, and I was able to give it clear instructions.
I told it to check the status of that file exactly once and remember the status. Then, I asked it to give me a button that a user could click to request a recheck of the status if something had changed in the server's configuration.
Instead of running and freezing the server for every single web access, it ran once on startup, and once more if a website owner requested a recheck.
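The check-once-and-remember pattern is easy to show in miniature. The sketch below is written in Python for illustration and is not the plugin's actual PHP code. The class name, the probe function, and the call counter are all hypothetical. In a real PHP plugin, each request starts fresh, so the remembered value would presumably need to live in persistent storage rather than in memory. The article doesn't say how that was done.

```python
from typing import Callable, Optional


class CachedStatusCheck:
    """Runs an expensive check once and remembers the result until a recheck."""

    def __init__(self, probe: Callable[[], bool]) -> None:
        self._probe = probe
        self._status: Optional[bool] = None  # None means "not checked yet"

    def status(self) -> bool:
        # Only the first call pays for the probe; later calls reuse the answer.
        if self._status is None:
            self._status = self._probe()
        return self._status

    def recheck(self) -> bool:
        # What a "recheck" button would trigger: run the probe again on demand.
        self._status = self._probe()
        return self._status


calls = 0


def slow_robots_txt_probe() -> bool:
    """Hypothetical stand-in for the expensive robots.txt status check."""
    global calls
    calls += 1
    return True


check = CachedStatusCheck(slow_robots_txt_probe)

# Simulate 1,000 page views: the probe should run only on the first one.
for _ in range(1000):
    check.status()
print("probe runs after 1000 page views:", calls)

# The site owner clicks the recheck button once.
check.recheck()
print("probe runs after recheck:", calls)
```

The design choice is to trade a slightly stale answer for a bounded cost: the probe runs once per startup plus once per explicit recheck, no matter how much traffic the site gets.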
Codex gave me a build that was promising. I uploaded it to my active server to see how it performed. It's been running actively now for about three days and my server is running just fine.
Problem solved.
I really did feel like I brought in different team members to look at my code. Codex is my hired programmer. Deep Research is the expert called in to diagnose an issue. And then Codex, as my staff programmer, went off and implemented the fix.
AIs don't solve everything
As amazing as this was to experience and work with, keep in mind that AIs aren't solving everything. First, the AI missed this problem. The only way I found out that this bug could have crippled my entire user base was through my own human testing.
Second, while the initial code was written very quickly, diagnosing this one problem took days. It took a lot of creative problem-solving on my part. Codex didn't suggest I call in Deep Research, and Deep Research didn't suggest comparing old code with new code. That was all human input.
Third, while I do have some cool AI overviews, productizing the software is taking time. I'm producing tutorials. I still have the product pages to put up. I haven't yet mastered the final distribution software. All that is my work. It's taking the time that doing that kind of work takes.
I will have four new products on the market within a month of starting. Before AI, that would have taken years. But the fact that the AI coded it all in four days is only one piece of the puzzle. For each day of AI coding, there's about a week of testing and product management on my part. Then comes the marketing, which is an entirely different effort.
Still, we're getting close. I hope to have this stuff ship sometime in the next week or so.
What about you? Have you ever run into a bug that only revealed itself after everything looked like it was working fine? Do you think teaming up different AI tools, like one for diagnosis, another for fixing, could become a standard workflow? Let us know in the comments below.