I've Been Testing AI Content Detectors for Years - These Are Your Best Options in 2025

magnifying glass with text
busracavus/iStock/Getty Images Plus



ZDNET's key takeaways

  • Using an AI to do your writing is plagiarism.
  • Services marketed as AI content detectors are a mixed bag.
  • Our tests show chatbots perform as well as or better than standalone tools.

How difficult is it in 2025 -- just three years after generative AI captured the global spotlight -- to fight back against AI-generated plagiarism?

This is a completely updated version of my January 2023 article on AI content detectors. When I first tested these detectors, the best result was 66% correct from one of three available checkers. My next set of tests, in February 2025, used up to 10 checkers -- and three of them had perfect scores. In April, just a couple of months later, five detectors boasted perfect scores.


But now, about half a year later, the quality has declined. Only three content detectors achieved a perfect score (including one new player). A couple of the content detectors that aced our tests declined in quality, at just about the same time that they also added restrictions on free use.

But fear not. In this round of tests, we've tried something new that may eliminate the need for standalone content detectors altogether: your friendly neighborhood chatbot.

What I'm testing for and how I'm doing it

Before I go on, though, let's talk plagiarism and how it relates to our problem. Merriam-Webster defines "plagiarize" as "to steal and pass off (the ideas or words of another) as one's own; use (another's production) without crediting the source."

This definition fits AI-created content well. While someone using an AI tool like Notion AI or ChatGPT isn't stealing content, if that person doesn't credit the words as coming from an AI and claims them as their own, it still meets the dictionary definition of plagiarism.


To test the AI detectors, I'm using five blocks of text. Two were written by me, and three were written by ChatGPT. To test a content detector, I feed each block to the detector separately and record the result. If the detector is correct, I consider the test passed; if it's wrong, I consider it failed.

When a detector provides a percentage, I treat anything above 70% as a strong probability -- whether in favor of human-written or AI-written content -- and consider that the detector's answer. If you want to test a content detector yourself using the same text blocks, you can pull them from this document.
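For readers who want to tally their own runs, here is a minimal Python sketch of the pass/fail rule described above. The names (`Reading`, `verdict`, `accuracy`) and the sample readings are hypothetical and are not taken from my spreadsheets. The sketch also assumes that a reading at or below the 70% line counts as no answer and therefore as a failed test, which is one reasonable reading of the rule rather than a formal specification.

```python
from dataclasses import dataclass
from typing import List, Optional

THRESHOLD = 70.0  # anything above this percentage is treated as the detector's answer


@dataclass
class Reading:
    truth: str      # who actually wrote the block: "human" or "ai"
    label: str      # which side the detector's percentage favors: "human" or "ai"
    percent: float  # the percentage the detector reports for that side


def verdict(reading: Reading, threshold: float = THRESHOLD) -> Optional[str]:
    """Return the detector's answer, or None when it is not confident enough."""
    return reading.label if reading.percent > threshold else None


def accuracy(readings: List[Reading]) -> float:
    """Percentage of blocks where the confident answer matches the true author."""
    passed = sum(1 for r in readings if verdict(r) == r.truth)
    return 100.0 * passed / len(readings)


# Hypothetical readings for five blocks (two human, three AI), for illustration only.
sample = [
    Reading(truth="human", label="human", percent=92.0),  # pass
    Reading(truth="ai", label="ai", percent=88.0),        # pass
    Reading(truth="ai", label="human", percent=75.0),     # fail: confident but wrong
    Reading(truth="ai", label="ai", percent=65.0),        # fail: below the 70% line
    Reading(truth="human", label="human", percent=99.0),  # pass
]

print(f"Accuracy: {accuracy(sample):.0f}%")
```

With five blocks, every score is a multiple of 20%, which is why the accuracy figures in this article land on 20%, 40%, 60%, 80%, or 100%.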

The overall results (content detectors)

To evaluate AI detectors, I reran my five-test series across 11 detectors. In other words, I cut and pasted 55 individual tests (I had a lot of coffee).

Detectors I tested include BrandWell, Copyleaks, GPT-2 Output Detector, GPTZero, Grammarly, Originality.ai, Pangram, QuillBot, Undetectable.ai, Writer.com, and ZeroGPT.

We previously dropped Writefull from our tests because it discontinued its GPT detector. This time, we had to drop Monica from our tests. The detector would only allow 250 words to be tested, and then when we cut down our tests to fit, it reported that it had limited the testing tools without a $200 upgrade. In its place, we're adding Pangram, a newcomer to our tests that immediately soared into the winners' circle.


This table shows overall results. As you can see, three detectors correctly identified human and AI text in all tests.

2025-04-content-detector-001
David Gewirtz/ZDNET

I tried to ascertain whether there was a tangible pattern of improvement over time, so I constructed a chart comparing the five-test set over time. So far, I've run this series six times, but there's no strong trend. I did increase the number of detectors tested and swapped out a few, but the only consistent result is that Test 5 was reliably identified as human across detectors and dates, and even that declined in reliability for this run.

2025-04-content-detector-002
David Gewirtz/ZDNET

I'll continue to test over time, and hopefully I'll see reliability trend consistently upward.

While there have been some perfect scores, I don't recommend relying solely on these tools to validate human-written content. As shown, writing from non-native speakers often gets rated as generated by an AI.

Even though my hand-crafted content has mostly been rated human-written this round, one detector (GPTZero) declared itself too uncertain to judge, and another (Copyleaks) declared it AI-written. The results are wildly inconsistent across systems.


Bottom line: I would advocate caution before relying on the results of any -- or all -- of these tools.

Overall results (AI chatbots)

But then again, why use a content detector at all? What if the chatbots we use every day can also do content detecting work, and you don't have to pay another AI fee? Let's find out.

2025-04-content-detector-003
David Gewirtz/ZDNET

As you can see, the chatbots have a much higher success rate than the so-called "content detectors." You can also see this from our staged accuracy comparison chart. Admittedly, this chart only tracks this first round of tests, but even here, you can see that each test's results have a much higher accuracy rate.

2025-04-content-detector-004
David Gewirtz/ZDNET

Let's take a look at the individual performance tests, and then I'll end with some recommendations.

How each AI content detector performed

Now, let's look at each individual testing tool, listed alphabetically.

BrandWell AI Content Detection (Accuracy 40%)

This tool was originally produced by an AI content generation firm, Content at Scale. It later migrated to BrandWell.ai, a new name for an AI-centric marketing services company.


I had high hopes for BrandWell. After half a year (which is decades in AI time), I expected BrandWell to improve. Instead, its overall score stayed the same, getting only two tests out of five right. It was confused by Test 2, which was written by ChatGPT, and then it declared the other two AI-written tests to be written by a human. For Test 4, it went almost all in, declaring the entire AI-written test to be human-written except for one line.

brandwell
Screenshot by David Gewirtz/ZDNET

Well, we're not off to an auspicious start. But now we're about to head into testing Copyleaks, which just last week sent me a press release declaring "Copyleaks Recognized as the Most Accurate AI Detector". Let's see, shall we?

Copyleaks (Accuracy 80%)

Back in April 2025, Copyleaks declared itself "the most accurate AI detector with over 99% accuracy." It's rewritten the claim to be "99% accuracy backed by independent third-party studies." Yeah, not so much. Copyleaks identified Test 1, writing I did (and last time I checked, I'm mostly human) as 100% AI written.

And, just in case you think that my writing is too AI-like to be considered human, even BrandWell identified Test 1 as human-written. I mean, I guess it's OK for the company's marketing folks to claim best ever, but no. Not really.


The company's primary offering is a plagiarism checker sold to educational institutions, publishers, and enterprises seeking to ensure content originality and uphold academic integrity.

copyleaks
Screenshot by David Gewirtz/ZDNET

GPT-2 Output Detector (Accuracy 60%)

This tool was built using a machine-learning hub managed by New York-based AI company Hugging Face. While the company has received $40 million in funding to develop its natural language library, the GPT-2 detector appears to be a user-created tool using the Hugging Face Transformers library. There's been no change in its detecting quality since the last time we tested, but since it has GPT-2 in its name and OpenAI is up to GPT-5, it's probably fair to assume the tool hasn't seen an update since it was first posted.

gpt2
Screenshot by David Gewirtz/ZDNET

GPTZero (Accuracy 80%)

GPTZero has clearly been growing. When I first tested it, the site was bare-bones -- it wasn't even clear whether GPTZero was a company or just someone's passion project. Now, the company has a full team with a mission of "protecting what's human." It offers AI validation tools and a plagiarism checker.


GPTZero seems to be getting some regular tinkering, but I'm not sure it's helping. Performance declined a bit from an earlier test to the test just before today's. This time, the final score was the same, but the test results themselves changed. In April, it got Test 1 wrong and Test 2 right. This time, it got Test 1 right and Test 2 wrong. Test 1 is my writing, and Test 2 came from ChatGPT.

gptzero
Screenshot by David Gewirtz/ZDNET

Grammarly (Accuracy 40%)

Grammarly is well known for helping writers produce grammatically correct content -- that's not what I'm testing here. Grammarly can check for plagiarism and AI content. The company now showcases the AI content checker as no longer being in beta. But that's a mistake on their part. There has been no improvement since the last time I checked.

For example, the following was entirely written by ChatGPT. I have to say, I'm surprised. Grammarly has a reputation as a very AI-forward text analysis company. But zero improvement? Bummer, dude.

grammarly
Screenshot by David Gewirtz/ZDNET

I'm not measuring plagiarism checker accuracy here, but even though Grammarly's AI-check accuracy was poor, the site correctly identified the test text as previously published.

Pangram (Accuracy 100%)

Pangram is a relatively new company founded by engineers formerly at Google and Tesla. The focus of the company appears to be AI detection, rather than the usual plagiarism detectors or "humanizing" AI tools developed to mislead editors and teachers. The company provides five free tests per day, which fit our needs perfectly.

Processing was a little slow, and between the time you click for a scan and get the results, a partially white screen is displayed for a bit longer than is comforting. But the results say the wait was worth it. Pangram scored a five-out-of-five.

pangram-2
Screenshot by David Gewirtz/ZDNET

Originality.ai (Accuracy 80%)

Originality.ai is a commercial service that bills itself as "Most Accurate AI Detector." The company sells usage credits: I used 30 credits for this article. They sell 2,000 credits for $12.95 per month. I pumped 1,400 words through the system and used just 1.5% of my monthly allocation.
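The 1.5% figure is simply 30 credits out of 2,000. A quick Python check of that arithmetic follows. The pro-rated dollar amount is only an illustration that assumes cost scales linearly with credits used; it is not a description of how Originality.ai actually bills.

```python
# Figures quoted in the paragraph above.
CREDITS_USED = 30
MONTHLY_CREDITS = 2_000
MONTHLY_PRICE_USD = 12.95


def allocation_share(used: int, total: int) -> float:
    """Fraction of the monthly credit allocation consumed."""
    return used / total


def prorated_cost(used: int, total: int, price: float) -> float:
    """Illustrative cost of the credits used, assuming linear pro-rating."""
    return price * allocation_share(used, total)


share = allocation_share(CREDITS_USED, MONTHLY_CREDITS)
cost = prorated_cost(CREDITS_USED, MONTHLY_CREDITS, MONTHLY_PRICE_USD)

print(f"Share of monthly allocation: {share:.1%}")
print(f"Pro-rated cost (illustrative): ${cost:.2f}")
```

Under that linear assumption, all five tests cost roughly 19 cents' worth of credits.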


Unfortunately, its most accurate AI detection got less accurate during this test run. Whereas previously, it correctly identified my human writing in Test 1 as human, this time, it was 100% confident that my human writing was done by an AI. Oops.

originality
Screenshot by David Gewirtz/ZDNET

QuillBot (Accuracy 100%)

The first few times I tested QuillBot, results were wildly inconsistent -- multiple passes of the same text yielded wildly different scores. Last time, however, it was rock solid and 100% correct. I promised I'd check back in a few months to see if it holds onto this performance. It does. QuillBot once again scored a perfect 100%.

quillbot
Screenshot by David Gewirtz/ZDNET

Undetectable.ai (Accuracy 20%)

Undetectable.ai's big claim is that it can "humanize" AI-generated text so detectors won't flag it. I haven't tested that feature -- it bothers me as a professional writer and educator, because it seems like cheating.


However, the company also has an AI detector, which took the biggest dive in performance we've seen so far. Last time, it scored 100% for accuracy. This time, it rated human writing (Test 1) as 60% likely AI, and all three AI writing samples as 75%, 76%, and 77% likely human. Ah, well, I guess Undetectable is "humanizing" its results, insofar as it's living up to the phrase "to err is human."

undetectable
Screenshot by David Gewirtz/ZDNET

Writer.com AI Content Detector (Accuracy 40%)

Writer.com is a service that generates AI writing for corporate teams. Its AI Content Detector tool can scan for generated content. Unfortunately, its accuracy was low. It identified every text block as human-written, even though three of the five tests were written by ChatGPT. Sadly, there was no improvement since the last time we visited Writer in the summer.

writer
Screenshot by David Gewirtz/ZDNET

ZeroGPT (Accuracy 100%)

ZeroGPT has matured since we first evaluated it. Back then, no company name was listed, and the site was peppered with Google ads and lacked clear monetization. The service worked fairly well, but seemed sketchy.


That sketchy feeling is gone. ZeroGPT now presents as a typical SaaS service, complete with pricing, company name, and contact information. Its accuracy increased as well: It went from 80% accuracy to 100% this summer, and has held onto that accuracy for our current test.

zerogpt
Screenshot by David Gewirtz/ZDNET

How each AI chatbot performed

Now that we've looked at the content detectors, let's look at the chatbots. Each was given the following prompt, followed by the text to check.

Evaluate the following and tell me if it was written by a human or an AI

All of the AI detectors followed a similar format, providing a general indication of whether the text was written by an AI or by a human. With the exception of ChatGPT Plus, which is a $20/month subscription, I ran all the chatbots in an incognito window without logging in.

ChatGPT free tier

While ChatGPT's free tier did get one of the blocks of text wrong (the last human-written one), its analysis of the first block of text really freaked me out. Keep in mind that this was an incognito window, not logged in, with no identifying information about me personally.

chatgpt-free
David Gewirtz/ZDNET

Yep, it not only identified the first block of text as human-written, but it also identified me as the writer. I mean, I know I'm all over the Internet, but still.

ChatGPT Plus, Copilot, and Gemini

ChatGPT Plus, Copilot, and Gemini all returned perfect scores. Each of them correctly identified all the test blocks as human or AI. In my mind, this proves that chatbots can outperform dedicated content detectors.

Grok

I included Grok in this set of tests because it did so well in our overall chatbot evaluation. Unfortunately, Grok didn't seem to grok the problem and failed this test with three out of five wrong. Like a few of the other AI detectors, it identified all of the writing blocks as human.

Is it human, or is it AI?

What about you? Have you tried AI content detectors like Copyleaks, Pangram, or ZeroGPT? How accurate have they been in your experience? Have you used these tools to protect academic or editorial integrity? Have you encountered situations where human-written work was mistakenly flagged as AI? Are there detectors you trust more than others for evaluating originality? Let us know in the comments below.



You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
