Pangram Policing is the New Grammar Nazism
On the ethics of AI writing, disclosure, and detection
The latest AI-writing scandal arrived, fittingly, with the Pope. On May 25, 2026, the Vatican published Pope Leo XIV’s first encyclical, Magnifica Humanitas, dated May 15, on safeguarding the human person in the age of artificial intelligence. Within days, people were feeding it to Pangram, the detector that has suddenly become the respectable instrument of literary and academic suspicion. Some portions were apparently flagged as AI-assisted, and the accusation was easy to understand: the Pope, or at least the Vatican, had used AI to write about AI.
My reaction was basically: OK, and then what? Encyclicals already pass through institutional drafting, staff work, consultation, revision, translation, and committee-like smoothing. If some Vatican official used Claude to turn Pope Leo’s ideas into prose, the relevant questions would still be whether the document is accurate, thoughtful, and worth reading. But I should admit I was wrong.
In Part II of my AI series, I wrote that AI writing detectors were bad and would probably remain bad. Pangram changed my mind: it has third-party evidence behind it, claims a very low false-positive rate, and has become the detector people reach for when they suspect undisclosed AI writing. Kelsey Piper recently wrote about Pangram Labs’ claims that several prize-winning short stories were AI-generated or substantially AI-assisted, and The Atlantic’s Matteo Wong has now written about Pangram’s growing power in schools, publishing, journalism, and the AI-writing accusation economy.
A working detector makes the ethics more urgent because it tempts people to treat provenance as a verdict. That is why the joke about using Pangram to filter out human-written content is sharper than it first sounds: in plenty of settings, AI-assisted writing may be more readable and more useful than unaided human prose. If the data underneath are sound, the human sentence-level struggle contributes little.
I should also be upfront that I am an interested party here. I am that infamous AI professor who proudly writes with the help of AI. An essay arguing that AI detection can become status policing is conveniently a defense of my own practice, so readers should ask whether I am drawing the line in a way that flatters me.
The provenance spectrum
AI-writing ethics starts with the promise the writer made. A writer can make a promise to a teacher, editor, reader, institution, or recipient. The ethical question depends on that promise before it depends on a detector score.
Let’s start where the skeptics are right. A student assignment that explicitly bans AI is the obvious case. A creative writing contest that promises to recognize new human writers is another. A condolence note belongs in a different category from an exam, but if someone grieving expects words from you, outsourcing the emotional act to a machine feels like a betrayal.
Some decisions also require accountable human judgment. If I am deciding whether someone gets a scholarship or grant, provenance matters because the applicant is owed my judgment. AI can help organize evidence or check consistency, but the evaluative act has to remain mine.
Consequences matter too. The more influential the decision, and the more it depends on personal judgment, the stronger the case for knowing who or what made the call. Human discretion can also be worse than AI discretion: a committee can be biased or arbitrary, and a well-designed AI system may eventually make some decisions more consistent.
There is also a special rule for first-person claims. When I write “I think” or “I feel,” that conviction should actually be mine. AI can help me phrase it, pressure-test it, or make it less awkward. It cannot supply the conviction itself.
Research and journalism sit closer to the middle. A byline is a promise that the author stands behind the claims, evidence, and judgment calls. It has never meant that the author personally typed every sentence without help from search engines, copy editors, co-authors, translators, or now LLMs. If my name is on an argument, the argument has to be mine; the prose can be assisted.
Much technical writing belongs closer to the content-matters side. If I ask AI to describe a chart, write a methods paragraph, or translate a regression result into normal English, the important question is whether the output is correct. I still have to verify the numbers and own the final text. Accuracy and accountability carry the moral weight.
At the far end are administrative messages where almost nobody cares about the human act of writing. If a department asks you to send a polite note confirming a committee meeting, use AI freely. The relevant standard is whether the note is true and clear.
A single AI score cannot answer the ethical question. The same level of AI assistance can be harmless in an administrative email, useful in a technical report, questionable in a personal essay, and disqualifying in a no-AI classroom assignment. Context is the whole point, even when the underlying detection is accurate.
Detection has a spectrum too
The ethics of detection should follow the ethics of use. If a teacher has told students to write without AI on a particular assignment, a detector can be part of an academic-integrity process. A Pangram score should never be the only evidence, especially given the stakes for students.
Creative contests face a similar problem. Piper’s argument about the Commonwealth Short Story Prize should be taken seriously because fiction prizes are partly about human craft. If a prize is rewarding a human writer’s voice, a fully AI-generated submission violates the premise. The organizer can allow AI, ban AI, or create a separate category. Trust alone will not settle the problem.
Peer review is harder. Seth Lazar gave the strongest version of the pro-detection case in response to my earlier Pangram post: AI-generated research output can become a denial-of-service attack on peer review. The cost of producing plausible-looking papers collapses, while the obligation to read them remains expensive. In that context, a detector may help preserve scarce review capacity.
The peer-review case still depends on the goal. If the goal is to catch students violating an explicit rule, provenance is the target. If the goal is to protect reviewers from worthless submissions, provenance is only a proxy: a detector estimates the probability that a text is AI, never the probability that it is bad. The real target is bad work: hallucinated data, fake citations, nonexistent methods, and papers with no question worth answering. A detector might help triage some of that, but someone still has to check the actual claims.
My worry is that we will police the em dashes while ignoring the hallucinated data underneath them. That would be a very academic way to lose the plot: exquisite attention to the surface marker, little attention to whether the thing says anything true.
Why disclosure mostly fails
The obvious compromise is disclosure. Let people use AI, require them to say so, and let readers decide how much it matters. That sounds attractive because it treats AI assistance as information and lowers the moral temperature.
I argued in Part II of the AI series that disclosure norms collapse under the incentives they create. The detector half of that argument now needs revision because Pangram appears to work much better than I expected. The disclosure half still looks right to me.
The more ethically questionable the AI use is, the stronger the incentive to hide it. A student who used AI after promising to write unaided, a contest entrant who submitted machine-written fiction to a human-writing prize, or a researcher who used AI to paper over fake citations has every reason to stay silent.
The people most likely to disclose are the ones using AI in low-stakes ways: cleaning up a paragraph, translating a chart, or turning rough notes into readable prose they still own. Those are also the cases where disclosure matters least. The likely equilibrium is a world full of ritual acknowledgments about harmless AI assistance, while the genuinely deceptive cases remain hidden until someone investigates them.
Disclosure can still help when the disclosure itself explains the work, as it does here. Editors, teachers, prize committees, and people exercising institutional authority should also be clear about the rules they enforce. But if the whole system depends on honest confession, it will punish the conscientious and leave the strategic users alone.
The new grammar policing
I know the phrase “grammar Nazism” is abrasive, and I mean something specific by it. I was born in the Soviet Union, and Russian elite culture can be intensely sensitive to grammar, pronunciation, stress patterns, and the small status markers embedded in speech. In practice, grammar correction often doubled as social sorting: the wrong school, the wrong region, the wrong family background, or the wrong kind of education could leak through the way you spoke.
America has its own version of this. Academic English is full of status signals masquerading as standards. The right kind of fluency makes you sound smart before anyone checks whether you are right, and the wrong accent or idiom can mark you as unserious before your argument gets a hearing.
AI detection is turning this old habit into a new technical ritual. The same people who once policed grammar now police “AI tells”: em dashes, smooth transitions, generic metaphors, oddly balanced paragraphs, prose that seems a little too clean. Sometimes they are right. AI writing does have recognizable patterns, which is why I have a style guide full of them.
If someone reads a piece, learns something new, and then makes the conversation about one suspicious phrase, the Pangrammar instinct has wasted everyone’s time. A reader’s attention should go first to the claim, the evidence, and the payoff, with style policing saved for cases where the prose actually blocks understanding or signals deception.
This status dynamic is very familiar. The detector score gives a scientific-looking license to dismiss work without reading it carefully. The people who benefit are usually the incumbent writers and credentialed gatekeepers who can turn a judgment call into a score. The accusation becomes especially convenient against lower-status writers and people who do not write English well but can now use AI to translate, draft, and reach an English-language audience. Too polished looks fake. Too awkward looks low quality. Either way, the gatekeeper wins.
The moral contamination logic makes the problem worse. Once AI involvement is treated like impurity, any trace of assistance becomes enough to condemn the whole work. That is a strange standard for a world where human writing has always been socially produced by editors, reviewers, co-authors, translators, and the sentence you read yesterday.
The funniest possible equilibrium is already here. AI tools write prose that is too clear, detectors punish the clarity, and then new “humanizer” tools rewrite the prose to look more awkward. TIME recently described people inserting mistakes and oddities to avoid sounding AI-generated. This is Grammarly in reverse: make the writing worse so it looks more authentic.
What to do instead
I am arguing for detector modesty. Pangram should only be used where provenance is part of the agreement: exams with explicit no-AI rules, contests that promise human craft, or institutional settings where the source of the text is part of the job. The institutional rule should be written before the score is consulted: define what AI use would violate the promise and what appeal process follows a high score.
In many domains, the standard should be quite simple: if you put your name on the work, you own it. You own the facts, the claims, the errors, the taste, the structure, and the judgment. If AI helped you produce an accurate technical summary, good. If AI helped you produce nonsense faster, that is on you.
Because attention is scarce, people will still rely on shortcuts. They will trust names they know, journals they respect, editors with a track record, friends who have read the work, and institutions that have something to lose if they publish junk. That is imperfect and often unfair. Outsiders and newcomers pay a price when reputation becomes the filter. But at least reputation is accountable over time. If a journal, prize, professor, or writer keeps endorsing bad work, people can notice.
A Pangram score is different. It gives a quick guess about textual provenance and invites us to stop reading before we have asked what the text is doing. Pangram seems to work, so the question is no longer whether we can detect AI. The question is what we should do with that information. Use it when provenance is part of the bargain and the stakes justify an investigation. Treat it as a prompt for judgment, never as a substitute for judgment.
If the work is fake, wrong, plagiarized, emotionally fraudulent, or a violation of a clear rule, say that and act accordingly. If the work is accurate, useful, and owned by the person whose name is on it, the fact that Codex, Claude, or ChatGPT helped assemble the sentences is a weak basis for scandal. The scandal would be building a culture where everyone learns to make writing worse so it can pass as human.
One last disclosure, since the whole piece is about this question: the essay above was written entirely in Codex from several hours of my dictated thoughts, earlier posts, saved style instructions, and recent social media exchanges. The cover image and spectrum chart were also produced by Codex. This was not a one-shot prompt. We went through more than a dozen iterations, mostly refining the argument and the chart. Yes, I am using Codex more than Claude Code now. Yes, I read the draft before publishing but did not line-edit the prose at all. By my own chart, this essay sits on the content-matters side of the spectrum, and I stand behind it.





Nuanced and well reasoned as usual, sir.
I think I'm still less of a pessimist on disclosure and *more* of a pessimist on detectors, but I recognize that both are moving targets with sometimes pernicious incentives baked in.
I hope this landscape levels out before the decade is over.
I agree that falling back on Pangram - or any other AI detector - is ultimately a dead end that's likely to have as many distorting effects on what and how we produce text as LLMs themselves, but I have a tangential question about one of the points on your chart, the "literature review."
I'm not interested in the ethics so much as the purpose of the activity. Isn't the point of a literature review to personally process the ideas and concepts of the other source material? For me, I would think that means reading them and then writing my own summaries as an artifact of what I've taken from the experience. An AI summary of the literature means I don't do any of that work.
This seems like an example of the AI is like bringing a forklift to the gym critique of LLM use. The point of the literature review isn't just so a literature review exists. It's to build the knowledge of the reviewer, right?
What am I missing?