I asked AI to modify mission-critical code, and what happened next haunts me

2 months ago

ZDNET's key takeaways

AI repeated major sections in a mission-critical coding plan.
Mission-critical coding work is too risky to delegate to AI.
AI is fine for new features, not core infrastructure.

I woke up in a cold sweat. In my nightmare, I was being chased by tens of thousands of people, all of whom were enraged because I destroyed their privacy. They were all holding laptops over their heads, swinging them like clubs intended for my head.

They say nightmares reflect whatever your subconscious is trying to tell you. Given the work I was planning to start in the morning, I knew exactly what my dark night brain was trying to say.

It was saying, "Stop! Don't do it." My inner knowing was screaming at the top of its lungs, "Don't let the AI code for you."

This, believe it or not, is not hyperbole. I was getting ready to start a coding project where I was planning on using an AI for help.

But unlike all those vibe coding stories you read about where someone makes a Pinterest clone in 12 days of code-free prompt calisthenics, I was planning on making a deep architectural change to mission-critical code used by more than 20,000 sites across the world to provide access security and site privacy.

If I shipped damaged code, it would, at best, break a bunch of sites. At worst, it would open the contents of those private sites to the public internet.

People use my code to create protected, private sites that they don't want shared with the entire internet. Users can designate specific family members, schoolmates, and/or teachers who can log in.

It's also used by developers for locking down projects in progress. Users set up private test sites behind login pages, which is great for publishing restricted-access preview sites for client review and use.

What the project entails

So, let me tell you about the update I was planning. Then I'll share the work I did with the AI, and then why I changed my mind.

At the core of each of these sites is a series of settings records. These records capture how each site owner wants to configure their privacy. They include lists of pages to make private or public, tags and categories used for the same purpose, and a variety of other site-specific privacy settings.

Settings are currently saved as something called serialized arrays. Serialization is a way of taking a big block of structured data and storing it in a database field. But there's a problem, not with the serialization process, but with the unserialization process, where the data is reconstituted for use by code.

Most of the time, serialization goes on behind the scenes whenever WordPress saves or updates a settings value. This is a perfectly safe mechanism for settings management. However, there are some places in my existing code that explicitly serialize and unserialize unnecessarily.

The problem is that unserialize() will reconstitute pretty much anything, including malicious logic. This is called PHP Object Injection. It's a vulnerability that can be exacerbated using unserialize(). My code already does some checks to prevent malicious behavior, but in a few places, my code does its own serialize/unserialize process that opens up a slight vector of risk.
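
To make that risk concrete, here is a minimal sketch written in Python rather than PHP. Python's pickle module stands in for PHP's serialize() and unserialize(), since both can rebuild arbitrary objects from stored bytes. The Payload class and the sample settings are hypothetical and are not taken from my plugin. The payload calls a harmless built-in, but the load step would just as readily call something dangerous.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # A real attack would name a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))


# The stored bytes carry an instruction to call a function on load.
stored = pickle.dumps(Payload())
rebuilt = pickle.loads(stored)  # the loader runs the call chosen by the payload

# A data-only format can only yield plain dicts, lists, strings, and numbers.
settings = {"private_pages": [12, 40], "public_tags": ["news"]}
restored = json.loads(json.dumps(settings))
```

Loading the bytes does not hand back a Payload at all. It hands back whatever the embedded call returned, which is the heart of the object injection problem. The JSON round trip, by contrast, can only ever produce plain data.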

It should be noted that this vulnerability only occurs if another existing plugin or theme already has vulnerabilities and exploits installed in the system. Tests using vulnerability scanners have never identified such vulnerabilities in any of my code, but I'd rather be safe than sorry.

I want to update my code to simply remove the few unnecessary uses of hand-coded serialization. This is a fairly straightforward process that involves reading the old settings data, updating it to the new format, and saving it back to the database.

Except... not so much.

A lot needs to be considered when making this change. First, of course, is that 20,000 sites use these settings. Any change has to be robust, redundant, recoverable, and fairly transparent.

It has to have some kind of pre-migration backup process and a failure recovery process. It has to work no matter what order the settings are accessed and saved. Every setting that's updated, checked, and resaved has to be converted throughout 12,000+ lines of code.

Edge conditions need to be identified, tested for, and factored into the code so that no site fails. Some level of version management has to be added to the settings data so that newer versions of the code know what to convert, and older versions of code on other sites don't break.
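
Those requirements (backup first, recover on failure, tolerate repeated runs, and tag the data with a version) can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not my plugin's code: a plain dict stands in for the WordPress options table, JSON text stands in for the legacy serialized string, and the key names and SCHEMA_VERSION value are hypothetical.

```python
import copy
import json

SCHEMA_VERSION = 2  # hypothetical version number for the new format


def migrate_settings(store, key="privacy_settings"):
    """Convert one settings record in place; return True if it changed."""
    record = store.get(key)
    if record is None:
        return False  # nothing stored yet, nothing to convert
    if isinstance(record, dict) and record.get("version") == SCHEMA_VERSION:
        return False  # already migrated, so repeated calls are harmless

    backup_key = key + "_backup"
    store[backup_key] = copy.deepcopy(record)  # pre-migration backup
    try:
        # Assumption: legacy records are JSON text holding a settings dict.
        legacy = json.loads(record)
        if not isinstance(legacy, dict):
            raise ValueError("unexpected legacy shape")
        store[key] = {"version": SCHEMA_VERSION, "data": legacy}
        return True
    except (TypeError, ValueError):
        # Recovery path: make sure the original value is still in place.
        store[key] = store[backup_key]
        return False


store = {"privacy_settings": '{"private_pages": [12, 40]}'}
first_run = migrate_settings(store)   # converts and tags the record
second_run = migrate_settings(store)  # sees the version tag and does nothing
```

The version tag is what lets the routine run in any order and any number of times. The real job is harder than this sketch suggests, because older code on other sites also has to keep working with whatever it finds.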

It's a lot. This is not starting from scratch and making some kind of pretty site using AI. This is modifying code in existing installations and making sure every site is able to safely update.

Getting an AI overview

Before I considered modifying the settings code to remove the items I was concerned about, I asked four AIs for an overview of how my settings code works: GPT-5 Thinking Mode Deep Research in ChatGPT, OpenAI Codex, Google Jules, and a lighter version of GPT-5 Deep Research.

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

The last AI above was used because I apparently exceeded a limit, so my deep research query resulted in this message: "Your remaining queries are powered by a lighter version of deep research. Our full access resets on Saturday. Upgrade to ChatGPT Pro to continue using deep research." Since I had a week to wait before I'd get the full Deep Research AI back, I decided to rerun the query in the lighter version and see what it would do.

OpenAI Codex and Google Jules both phoned in their answers. Codex gave me a short list of settings values in slug form. Jules provided four short paragraphs basically saying that my code requests and receives values back from the database. Neither answer impressed me.

Before GPT-5 Deep Research downgraded, I got a 13-page document that explained every mechanism, every field, and every option used in my settings code. To be honest, it was almost overwhelming. It was clear and comprehensive, but it was almost too complete. It presented the most in-the-weeds details at the same level of priority as the major concepts, making it difficult to get a truly good picture of operations.

The lighter version of Deep Research gave me what I'd consider the Goldilocks version. It was just right. It presented the high-order architecture and mentioned the small details, but didn't get sidetracked by them. I found it quite useful.

Planning for the major code change

My intent was to get the AI to code this settings fix. Before Deep Research downgraded me to the lighter version, I had been working with its fully powered capability.

At that point, I wanted to have Deep Research produce a plan of action for making the change, then feed that plan to either Codex or Jules. Jules is known for developing a plan of action for any coding task, but given how little it provided for the initial analysis, I wasn't confident it would be able to think through all the implications and stages necessary.

I had downloaded the aforementioned somewhat overwhelming 13-page "how settings work" detail document created by the fully powered Deep Research before it downgraded. I passed that on to a new session.

The idea was to have one AI session analyze the existing code, and then have a completely different AI session take that analysis to plan the actual modification process.

This time I asked it for a plan to initiate the upgrade. I gave it a very detailed prompt (in retrospect, possibly too detailed), and asked it to create a product requirements document (PRD) that could be given to Jules or Codex.

I got back an 11-page document with the following sections:

Background and Objectives
Data Structures Before and After
Migration Strategy (with Version Tracking and Failover)
Plugin Interoperability and Partial Upgrade Handling
Settings Management Library API Design
Edge Cases and Rollback Strategy
Plugin Interoperability and Partial Upgrade Handling
Settings Management Library API Design
Edge Cases and Rollback Strategy
Deployment Considerations
Developer Notes for Codex/Jules

Do you notice something in that list? Something about it ain't right. Keep looking. You'll see it.

Yep, it repeated three sections. Plugin interop, settings management, and edge cases each appear twice.
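
A check like this is easy to automate before handing a plan to a coding agent. The sketch below is only an illustration in Python, not something I ran at the time. It uses the 11 section titles listed above, and the repeated_headings helper is a hypothetical name.

```python
sections = [
    "Background and Objectives",
    "Data Structures Before and After",
    "Migration Strategy (with Version Tracking and Failover)",
    "Plugin Interoperability and Partial Upgrade Handling",
    "Settings Management Library API Design",
    "Edge Cases and Rollback Strategy",
    "Plugin Interoperability and Partial Upgrade Handling",
    "Settings Management Library API Design",
    "Edge Cases and Rollback Strategy",
    "Deployment Considerations",
    "Developer Notes for Codex/Jules",
]


def repeated_headings(headings):
    """Return each heading that appears more than once, in first-repeat order."""
    seen = set()
    reported = set()
    repeated = []
    for heading in headings:
        key = heading.strip().lower()  # ignore case and stray whitespace
        if key in seen and key not in reported:
            repeated.append(heading)
            reported.add(key)
        seen.add(key)
    return repeated


duplicates = repeated_headings(sections)
```

A check like this only catches the obvious structural slip. It says nothing about whether the content under each heading is correct, which is the part that worried me.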

I don't trust the AI to do this

Now, look. I've been guilty of cutting and pasting and leaving some content in two places, but I'm not an AI. I'm also not being "interviewed" for the job of modifying mission-critical code.

Yes, there's no doubt I could have removed the duplicate sections and still fed the PRD to either Jules or Codex. But the presentation error raised the hairs on the back of my neck. That PRD was a set of instructions for one giant coding change. What else was wrong with it? What might I have missed?

After all, when I code, I do one small feature at a time. I test out each line, sweat every detail, and obsess over every change. But this was a big document that I could theoretically rubber-stamp before delegating the work to some pseudo-intelligence in the cloud.

I thought about this pretty deeply before making a decision.

I am fairly comfortable letting the AI add a new capability or build something from scratch. But diving deep into the bowels of mission-critical code? I'm not ready to give up the reins.

The downside could be far too catastrophic. If the AI ran amok in my code, I might not even be able to figure out what went wrong. Sure, I could roll back all the way to before I delegated the project to the AI, but why take the chance?

I want more granular control. I'm happy to have the AI help with writing a specific routine, doing coding for well-documented interfaces, and adding some new non-mission-critical features.

But when it comes to core capabilities and things that could turn nightmares of torch-wielding, laptop-swinging, angry site operators into reality, I think I'll do the coding myself.

Stay tuned. I will be using the AI to code. And I will tell you about it. But I'm not going to let the AI loose where it could do so much damage so quickly, for so little gain.

Have you tried letting AI handle parts of your coding projects? Did you trust it with critical infrastructure or only non-essential features? Where do you draw the line between convenience and risk? Let us know in the comments below.