
Discerning AI-Generated Text from Human Writing, Part 1

By Sean Brenner

If you’re like me, you got tired of hearing about generative AI (GenAI) a long time ago. Even if you’re not, you’ve no doubt been inundated by the flood of GenAI products and content over the past few years.

As the tech industry’s new favorite toy has spread, part of the conversation around it has fixated on how to sniff out AI-generated content—especially AI-generated text, which can be alarmingly difficult to discern from real writing. 

For businesses, this is especially important. AI content is typically bad for your brand’s image, as it’s considered low effort and low quality. Many people will even interpret it as a sign that a brand doesn’t care about its customers, because if it did, it would surely be willing to pay a real writer or artist for higher-quality content. What’s more, because AI models are prone to hallucinations, you need to know when something is AI-generated so it can get further scrutiny.

Several companies have offered easy answers to detecting GenAI text. They share AI “tells” that will instantly reveal whether something’s been generated by AI and promise that their products can spot AI-generated copy at 50 paces. 

Unfortunately, it’s not that easy. 

AI “Tells” That Don’t

If you’ve been following the conversation on discerning AI-generated text, you’ve probably heard of a few GenAI tells. Em dashes in particular have been vilified as the defining mark of AI generation, while other tells include the “it wasn’t just X, but Y” construction and rhetorical questions.

When used excessively, these sorts of writing habits can point to GenAI use, especially use of ChatGPT, which has particularly strong habits. But there’s a fatal flaw with using any of these to identify GenAI: Humans also use em dashes, the “it wasn’t just …” construction, rhetorical questions, and anything else detectors could list. (I myself am a recovering em-dash addict.) 

After all, AI-generated text is designed to mimic human writing, and GenAI models are trained on vast amounts of human writing. These models overuse certain sentence constructions, words, formats, and the like because the human content they were trained on did so. There are plenty of human writers who still do.

The most extreme AI hallucinations can reasonably be taken as tells of AI-generated content, but even that isn’t completely reliable. Humans get things wrong and spread misinformation too, if not as frequently. 

A genuine AI tell would require AI-generated text to have features that human writing never would. And when it comes to small, easy-to-spot things like em dashes and certain sentence structures, we’ve yet to see such a thing. 

GenAI Detectors That Don’t Detect

There are also dozens of tools available that claim to accurately detect AI-generated copy. Yet GenAI detectors are prone to flagging human-written copy as AI-generated. These have the same fatal flaw as looking for tells: The patterns they look for are patterns that GenAI has lifted from human writers. Any human writer with a style that’s even vaguely similar to a GenAI model is bound to get flagged by GenAI detectors, with potentially disastrous consequences for their credibility.

When it comes to the question of whether something is AI-generated, a false positive can be much worse than a false negative. Being accused of submitting AI-generated text as their own work is a serious blow to a writer’s credibility and professional reputation (not to mention being deeply insulting).

GenAI detectors make this problem even worse because they lend these accusations an unearned sense of credibility. Because a detector is a piece of ostensibly objective, analysis-driven software rather than a fallible human, many people will give its judgment more weight than they would a human’s accusation of GenAI use based on their own analysis.

In reality, though, GenAI detectors are no more reliable than the GenAI tools they’re meant to sniff out. This isn’t just a flaw of current GenAI detectors; the very concept is flawed. It’s simply an automated, more sophisticated method of looking for tells. 
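To make that concrete, here’s a toy sketch of what automated tell-hunting amounts to. This is entirely hypothetical and far cruder than any commercial detector, but the underlying flaw is the same: every pattern it counts is something human writers use too, so ordinary human prose gets flagged.

```python
import re

# Hypothetical "tells" a naive detector might count. None of these is
# unique to AI: human writers use all of them.
TELL_PATTERNS = [
    r"\u2014",                  # em dashes
    r"\bnot just\b.*?\bbut\b",  # the "not just X, but Y" construction
    r"\?",                      # rhetorical questions (crudely, any question mark)
]

def tell_score(text: str) -> int:
    """Count occurrences of the listed tell patterns in the text."""
    return sum(
        len(re.findall(pattern, text, flags=re.IGNORECASE))
        for pattern in TELL_PATTERNS
    )

def looks_ai_generated(text: str, threshold: int = 2) -> bool:
    """Flag any text whose tell count meets an arbitrary threshold."""
    return tell_score(text) >= threshold

# A perfectly human sentence trips the detector anyway:
human_sentence = (
    "It was not just the plot that impressed me, but the prose\u2014"
    "and isn't that what great writing is about?"
)
print(looks_ai_generated(human_sentence))  # True: a false positive on human writing
```

Real detectors use statistical models rather than regex lists, but they are still pattern-matching against habits learned from human text, which is why the false-positive problem doesn’t go away.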

What’s more, the real differences between AI-generated and human copy stem from the difference between how GenAI predicts text and how human brains think. A piece of software can’t reliably spot what wouldn’t be the product of human thinking, for the same reasons GenAI gets things wrong in the first place.

Discerning GenAI content from human content isn’t completely impossible, however. There’s no foolproof method, and AI models keep advancing, but there are still key differences between what they produce and what a human would write.

Stay tuned: In part two of this series, I’ll take a look at what those differences are and why they matter.


Sean Brenner is a freelance writer specializing in scripts for video essays and similar forms of content. He writes scripts for YouTube videos covering Star Wars lore for Frontier Media and Star Trek for Trek Central. You can learn more about his work at Imagined Worlds Writing Services and find him on Bluesky.
