Are AI detectors actually reliable?
Last updated: May 2026
You paste in some text. The tool says "97% AI." What does that number actually mean? And should you trust it enough to fail a student, fire a writer, or pull an article? The short answer is no. The longer answer is what this piece is about.
If you've used an AI detector recently, you've probably noticed the same thing I have. The tool thinks for a second. Then it gives you a percentage in big numbers, in a confident colour: green for human, red for AI. Sometimes there's a little progress bar. And that's the whole answer.
No explanation. No reasoning. No "here's what we noticed." Just the number.
This has caused real problems. There are documented cases of students who never used AI being failed for plagiarism on the strength of a detector score. Writers have lost jobs because something they wrote got flagged. Whole articles have been pulled from publication because the detector said so.
So it's worth asking honestly: how reliable are these tools? And if the answer is "less than they pretend to be," what should we do instead?
How AI detectors usually work
Most modern AI detectors fall into one of two camps.
The first is statistical. The tool looks at how predictable the text is. Given the first half of a sentence, how confidently could an AI model guess the second half? Human writing tends to make unexpected word choices. AI writing, optimised for plausibility, makes predictable ones. Tools like GPTZero use a metric called "perplexity" to measure this.
The problem with perplexity is that good writing is predictable too. A well-edited news article, a polished academic paper, and a carefully written legal document all hit standard structures and standard vocabulary. They look statistically AI-like because they've been cleaned up to read smoothly. So perplexity-based tools flag lots of careful human writing as AI.
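To make the perplexity idea concrete, here's a minimal sketch in Python. It assumes you already have the probability a language model assigned to each token in a passage. The function name and the numbers are made up for illustration, and this is not how GPTZero or any other specific tool is implemented.

```python
import math


def perplexity(token_probs):
    """Perplexity of a passage, given the probability a model gave each token.

    Defined as exp of the average negative log-probability per token.
    Lower values mean the model found the text easier to predict.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities, invented for this example.
smooth_passage = [0.90, 0.80, 0.85, 0.90]      # every word was an easy guess
surprising_passage = [0.20, 0.05, 0.40, 0.10]  # several unexpected word choices

print(perplexity(smooth_passage))
print(perplexity(surprising_passage))
```

The smooth passage scores lower. A detector built on this idea has to draw a threshold somewhere, and a carefully edited human passage can easily land on the "AI" side of it.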
The second is pattern-based. The tool looks for specific words and structures that show up disproportionately in AI output. Words like "delve." Structures like the "it's not just X, it's Y" reframing. Sentence shapes that AI defaults to. This is closer to how a human reader spots AI: by noticing the small habits.
Pattern-based detection is more interpretable but it has the opposite failure mode. A human who happens to use one of these patterns gets flagged. Academic writers, who use a lot of formal vocabulary, get flagged constantly. Non-native English speakers, who often default to safer constructions, get flagged. Anyone trying to write "properly" trips the detector.
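Here's a rough sketch of what pattern-based detection can look like, again in Python. The two patterns and the function name are my own illustrative choices, not the rule set of any real tool. Real pattern lists are much longer and more carefully tuned.

```python
import re

# Hypothetical pattern list, for illustration only.
PATTERNS = {
    "delve": re.compile(r"\bdelv(?:e|es|ed|ing)\b", re.IGNORECASE),
    "not just X, it's Y": re.compile(
        r"\b(?:it's|it is) not just\b[^.?!]*?,\s*(?:it's|it is)\b",
        re.IGNORECASE,
    ),
}


def find_patterns(text):
    """Return (pattern_name, matched_text) pairs so a reader can see the evidence."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


ai_flavoured = "Let's delve into this. It's not just a tool, it's a revolution."
human_written = "We delved into the archives for three weeks."

print(find_patterns(ai_flavoured))
print(find_patterns(human_written))
```

Both samples produce hits. The second is an ordinary human sentence that happens to use "delved", which is the failure mode in miniature: the pattern matched, but the match alone says nothing about who wrote it.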
What that confident percentage actually represents
Here's the dirty secret. When a tool tells you "97% AI," that number isn't a probability in the way you'd expect.
It's not "97 out of 100 documents like this one were AI." It's not "we are 97% confident this is AI." It's a model's internal score, often a sigmoid output, sometimes calibrated, sometimes not. Two different detectors can score the same text 30% and 90%, and both will report their numbers with the same air of confidence.
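A small sketch shows where a number like "97%" can come from. This assumes a detector that produces a raw internal score and squashes it through a sigmoid for display. The function names and raw scores are hypothetical, and real tools may do something different.

```python
import math


def sigmoid(raw_score):
    """Squash any real-valued score into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def displayed_percent(raw_score):
    """The headline number a tool might show for a given raw score."""
    return round(100 * sigmoid(raw_score))


# Hypothetical raw scores from two different detectors for the same text.
detector_a_raw = -0.85
detector_b_raw = 2.2

print(displayed_percent(detector_a_raw))  # 30
print(displayed_percent(detector_b_raw))  # 90
print(displayed_percent(3.5))             # 97
```

Nothing in this arithmetic makes 97 mean "97 out of 100 documents like this were AI." That only holds if the score has been calibrated against real data, and a percentage on a screen doesn't tell you whether that happened.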
OpenAI quietly retired its own AI Text Classifier in 2023 because of, in the company's words, "its low rate of accuracy." If the company that built ChatGPT couldn't make a reliable detector for ChatGPT's output, that should tell you something about the rest of the market.
The false positive problem
One of the most-cited papers on AI detection accuracy is a 2023 Stanford study that tested seven detectors on essays written by non-native English speakers. The results were striking. On average, more than half of these human-written essays were flagged as AI-generated. Tools that worked well on native English writing fell apart on writing by people for whom English was a second language.
This is the false positive problem, and it cuts in predictable directions:
- Non-native speakers get flagged because their writing tends to be more formal and structurally simpler, which looks AI-like to statistical detectors
- Academic writers get flagged because they use a lot of long words, careful sentence structures, and field-specific vocabulary
- Younger writers taught to write "properly" get flagged because they've been trained to use safe constructions
- Anyone who edits their writing carefully gets flagged, because editing makes prose smoother and more predictable
The people who get flagged most by AI detectors are exactly the people who can least afford to be wrongly accused: students, ESL writers, people early in their careers. The detectors aren't neutral. They have a class and education bias baked in.
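To see how much the false positive rate matters, here's a back-of-the-envelope calculation in Python. All the numbers are hypothetical and chosen only to show the arithmetic; they aren't measurements of any real detector or classroom.

```python
def chance_flag_is_correct(ai_share, true_positive_rate, false_positive_rate):
    """Of all flagged texts, what fraction were actually AI-written?

    ai_share: fraction of all texts that are AI-written
    true_positive_rate: fraction of AI texts the detector flags
    false_positive_rate: fraction of human texts the detector flags
    """
    flagged_ai = ai_share * true_positive_rate
    flagged_human = (1 - ai_share) * false_positive_rate
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("nothing gets flagged with these rates")
    return flagged_ai / total_flagged


# Hypothetical scenario: 10% of essays are AI-written, and the detector
# catches 90% of them.
high_fpr = chance_flag_is_correct(0.10, 0.90, 0.50)  # half of human essays flagged
low_fpr = chance_flag_is_correct(0.10, 0.90, 0.05)   # 1 in 20 human essays flagged

print(round(high_fpr, 2))  # roughly 0.17
print(round(low_fpr, 2))   # roughly 0.67
```

In the first scenario, about five out of six flagged essays were written by a human. Even in the second, a third of the accusations would be wrong.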
The false negative problem
The other side of the coin: AI detectors miss a lot of actual AI. This is partly because the models keep getting better. Newer GPT-4 and Claude outputs are much harder to detect than 2022-era GPT-3 output. It's also because anyone trying to evade detection can do so trivially.
Run your AI-written text through a "humaniser" tool. Paraphrase it. Translate it to another language and back. Add a few personal anecdotes. Replace half the em dashes with commas. Most detectors fail completely.
This creates a strange dynamic. Honest writers, who don't think about gaming the detector, get caught for sounding too clean. Dishonest AI users, who run their output through a paraphraser, sail right through. The tools are worst at catching exactly the people they're supposed to catch.
What detectors are actually good for
None of this means AI detection is useless. It means we have to be honest about what it's actually for.
Here's what detectors do well:
- Spotting raw, unedited AI output. If someone pastes a ChatGPT response straight into a document without changing anything, almost any detector will catch it.
- Identifying patterns to investigate further. A detector flagging a piece of writing is a signal, not a verdict. It's a reason to look more closely.
- Helping writers identify their own AI habits. If you wrote something and a detector lights it up, it's worth asking why. Maybe you're leaning on AI-flavoured structures unconsciously.
Here's what they're bad at:
- Definitively proving authorship. The accuracy isn't there. Treating a detector score as proof is the same mistake as treating a polygraph reading as proof.
- Catching sophisticated AI use. Anyone serious about evasion can evade.
- Working fairly across different kinds of writers. The bias against non-native speakers and formal writers is well-documented.
How to think about detector results
If you're going to use AI detection tools, here's a healthier mental model than "this is X% AI":
- A detector result is a flag, not a verdict. A high score means "this is worth a second look." Nothing more.
- Pair the result with what you know about the writer. Does this read like their previous work? Does it match their voice? A detector can't answer these questions. You can.
- Look at the evidence, not just the score. A tool that just gives you a percentage is asking you to trust it blindly. A tool that shows you which patterns triggered and why lets you make your own judgement.
- Be especially careful with non-native English writers. The false positive rate is dramatically higher, for reasons that have nothing to do with AI use.
- Never use detector output as the sole basis for an accusation. If you can't make a case without the score, you don't have a case.
Telltale shows you every pattern it found, with examples and explanations. The score is there if you want it, but the evidence is the point. You can read the reasoning and decide for yourself whether it adds up.
The bigger picture
AI detection is a moving target. Models are improving faster than detectors can keep up. The tools we have today will be less useful a year from now. The fundamental problem is telling text written by an actual human apart from text written by something that learned from human text, and it gets harder as the models get better.
This doesn't mean we give up on detection. It means we stop pretending detection is a solved problem with a clean answer. The honest framing is: AI detectors are indicators, useful for raising questions, not for closing them. The closer they get to acting like search engines for AI tells (here's what we noticed, and why) and the further they get from acting like polygraphs (here's the truth, in a number), the more useful they actually are.
The future of AI detection isn't going to be a tool that tells you yes or no. It's going to be a workflow where detection is one input among several, alongside knowing the writer, knowing the context, and being willing to ask before you accuse.
Further reading
- How to spot AI-generated writing: a complete guide
- Stanford research on AI detection bias against non-native English writers
- Wikipedia's Signs of AI writing, the human-written guide that informs our methodology