40 Comments
Viachaslau Kozel's avatar

Alexander, I understand that using a heavy dose of optimism to shake the academic community is a fair tactical move. But there is a real danger that unchecked optimism might actually do more structural damage to academia than the denialism you are fighting.

I want to point out a few blind spots in this narrative, not to defend the old ways, but to look at the actual nature of the tool we are dealing with.

First, regarding the nature of the models. We don't need to rehash the "stochastic parrot" debate, but we have to stay grounded in how the current architecture actually works. Even with the recent introduction of reasoning steps, these systems remain statistical approximations at their core. They haven't developed a mechanism to genuinely understand their own errors; they are still running loops within their probabilistic weights. They provide a very high-quality simulation of reasoning, but they lack the metacognition to know when they are wrong. Doing heavy AI-assisted R&D, I see such issues every single day.

This brings up the core problem: cognitive hacking. AI generates fluent, confident text, which our brains naturally associate with expertise. It requires an enormous, unnatural amount of willpower to force yourself to rigorously validate something that already looks perfect. It works against our dopamine system, making us intellectually lazy without us even realizing it.

And this is where the specific vulnerability of your field comes in. In mathematics there is a strict apparatus for validation. But in social sciences, the validity of a hypothesis is often tied to how coherently it is argued and framed. In these fields, an imitation generator completely breaks the system. When a generated text looks highly expert and logically linked, verifying the actual truth of the claims becomes incredibly difficult and energy-consuming. The AI exploits the exact metric we usually use to judge quality there.

The goal shouldn't be to just accept the output because it saves time and looks good. The real challenge right now is designing strict, effective workflows that acknowledge these cognitive traps and the fundamental limitations of the tool.

Because of all this, I honestly think the current academic pushback and outright bans might actually be doing more good than harm at this specific moment. I would gladly be an optimistic advocate for progress myself, but there are critical, unresolved structural problems here. We need to figure out how to handle them before we let the current wave of hype drive widespread, uncritical adoption.

Alexander Kustov's avatar

Thank you, Viachaslau, for this very thoughtful pushback.

"It requires an enormous, unnatural amount of willpower to force yourself to rigorously validate something that already looks perfect" is on point, and a key insight that a lot of people miss. The problem is indeed that many people don't verify AI output as rigorously as they would if it came from a research assistant or their own "manual" work. I've certainly seen my fair share of work produced much faster but riddled with hallucinations, which defeats the purpose entirely.

Where I'd push back is that current models are already capable of addressing this. With agentic tools and the ability to incorporate structured verification workflows, many of these issues can be managed. My sense from working with Claude Code Opus 4.6 now is that it hallucinates less than my research assistants would, at least for the type of work that I'm doing.

So our disagreement may just be one of timing and context: I think we're largely there already with the models of the last several months, and you think we'll be there in a couple of years. Either way, the problems aren't unresolvable.

Viachaslau Kozel's avatar

I think the main thing I want to emphasize here is the danger of being blinded by success. I have no doubt that there are specific areas of research where current models deliver excellent results with minimal errors. But I would strongly caution against extrapolating those specific successes to everything else.

In fields where you don't have strict, formal frameworks for validation, you simply cannot afford to let your guard down; you have to pay hyper-attention to verifying the final output. I’ve had plenty of incredibly impressive results myself, but I’ve also had my share of complete failures.

The real issue with these failures is that they are entirely unpredictable and don't follow the logic of human error. They have a fundamentally different nature. And ironically, as the models get better, these errors actually become harder to spot because they are buried under increasingly convincing layers of coherence.

I know exactly that feeling when you finally build an effective pipeline and suddenly realize you can process a massive amount of work. It gives you this incredible sense of power, and it’s genuinely a great feeling worth enjoying. The key is just making sure you eventually come back down to earth and continue looking at the tool with the necessary level of criticality and rationality.

Alex Potts's avatar

Jesus Christ that slide really is embarrassingly bad. These conferences are supposed to have experts working at the frontier of human knowledge, and yet this academic is talking about the sort of thing that the average patron of your local watering hole is already aware of. Does he really think he's the first person in history to stumble across ideas like demographic change and racial conflict!?

Alexander Kustov's avatar

Yeah. To be honest, though, I don't even blame this person that much. They probably didn't realize the presentation was subpar, and no one told them. That's the real problem: the social norms of big academic conferences still allow this to happen unchecked. In a better world, this proposal wouldn't have been accepted. And if it had been, the presenter would have felt pressured to produce something much stronger.

Lynne Kiesling's avatar

That's what I don't get. Where's the faculty mentoring? Where's the, say, group of your fellow grad students where you give practice presentations to each other to work out these kinks in low-stakes settings?

DamienCh's avatar

Very good post, I agree with a lot of it, including (and especially) the slop part.

I'd push back on two things though:

1. First, I think you go a bit too quickly on the "writing is thinking" rebuttal: it's true that there are other ways of thinking, but this does not lessen the value of writing (some things at least) yourself, not only for your own sake, but also to avoid being led in directions you might not have endorsed otherwise. This is the "cognitive hacking" Viachaslau mentioned in another comment, and I think it's worth making that point again and again. You write that "we underrate other people’s rationality", which is true, but the reverse is true too: we easily overrate our own rationality and level of competence.

2. Second, and relatedly, I would extend the category of things that should be the expression of a particular human. Partly because I agree with you that the AI detection bit is silly and a lost battle: AI-generated text will be everywhere (wrote about it here: https://artificialauthority.ai/i/192290331/did-an-ai-write-this). But I think "putting the information out there" is too simple a criterion here, because, in a world of competition for attention, we should also look at the relative value of information, and there will remain a human premium for, e.g., blog posts. (This post made this point well: https://www.sh-reya.com/blog/consumption-ai-scale/).

Both points, imho, derive from a certain confusion between different levels of AI use, on a spectrum from ancillary tool (e.g., spellcheck) to full copy and paste of an output. To a large extent, the confusion matches the criticism this responds to: there is a lot of motte-and-bailey out there, and it's sometimes hard to know whether the critics merely deplore AI-generated text or any kind of AI assistance. But this is why it's important to get this distinction right.

Alexander Kustov's avatar

Thank you, these are all great points. Personally, I'm genuinely uncertain about what to do with a lot of humanities writing, which sits between art (which is supposed to be human, more or less) and science (which doesn't have to be). That's exactly where the most skepticism of AI is coming from, and maybe for good reason.

DamienCh's avatar

Same. Your point on academic slop hits so hard because it's sometimes easy to see the artistic merits of some old "articles" compared to the junk put out now for the sake of being published (in which I participated myself). A good outcome could be to see a cleaner break between the art and science going forward.

Hollis Robbins's avatar

“For critics, the mental model of an AI user is stuck in 2023, which is ages ago.” Omg yes this…

_fin's avatar

I enjoyed your three articles. I am a senior academic and I really think LLMs are one of the best things to happen to academia for a long time: a really disruptive technology with plenty of upsides and downsides. One year ago, I would have read your articles in a completely different light. I thought LLMs were pointless until I moved away from web-based interactions and into the full agentic harnesses; then they really do have some power. I don't call them AI, as they are not; they are token generators and extremely good tools. Like every tool we use, we should understand it and know its limitations.

People are saying the LLMs hallucinate and are biased, yes they do and are, so attempt to mitigate it, use multiple models, they work fast so you can iterate prompts and approaches very quickly - that is the beauty of them. In short, use your existing research skills to actually work out how to use them.

Academia is a biased bullshit factory full of people with agendas, it is no different to the outside world at this point. We should forget the notion that academics are immune to all of the things that we accuse LLMs of.

It is an exciting time, we are at year 0-1 with respect to the LLMs that can actually more or less do what we ask of them. We are at the frontier of something genuinely novel. So this is a window of opportunity for those who want to jump on board. They are not going anywhere, and I think those who do not engage with them will rapidly fall behind, not because they are dumb but because they will not be able to keep up with the output. Like it or not, academia has become a commodity; I do not especially like it but I can't change it. With a combination of Claude Code, Codex and some local models, I can work on about 3 research projects at once with a fourth doing admin tasks on existing projects. So they are very productive.

I think we will look back at these years and laugh at how primitive our workflows are with these things. I am enjoying the learning experience of working out the workflow, how to create persistent memory, how to break down a problem into small, LLM bite-size chunks to avoid drift. The winners will be those who can make Observed(output tokens) ~ Expected(output tokens). Essentially attempting to make a non-deterministic system behave as much like a deterministic one as possible.

Even though I am pretty positive and excited about LLMs, I do worry about de-skilling. Currently we are at a transition where people who are using them can critically evaluate the output, and if they disappeared tomorrow people could go back to pre-LLM ways of working. However, it is clear that new PhDs are not going to be the same caliber as historic ones, as they will not need to be to produce the same "output". The question is how much we value learning and, more generally, how learning shapes society, not just in academia. We can now produce complex output without learning anything. This is ok if you are coming from pre-existing experience, but what happens in 2-3 generations, when output is produced by people who do not understand it? What happens to a society when there has been no learning experience to produce any output?

No one knows the answer to these questions. In many ways they are academic: we can't put the genie back in the bottle. No doubt politicians and senior academics (essentially the same thing) will flap around with mitigation strategies that involve curtailing their use. I, however, think we should go the other way: we should concentrate on working out combinations of models to use in different domain areas, methods for error checking, validation, all the things that can make their output more robust.

I think LLMs will get smarter; the new MoE models are very good. I don't know how much mileage LLMs have left in them in general. Hopefully there will be new breakthroughs. Overall though, I think human knowledge will become very high level and conceptualized, and we will be CEOs of our own domains with agentic workers doing our bidding.

John Maton's avatar

Excellent. After listening to your article using the AI Substack voice I went back and re-listened to Parts I & II. In this article above I in particular like the fact that you mention that "I plan to ban all electronic devices in my substantive classes and bring back in-person written and oral exams."

I just add that I think as with any new technology, some people jump in head first whereas some other people are frightened of it because they don't understand it, and don't understand how to use it securely to their benefit.

Alexander Kustov's avatar

Thanks, John! I actually just listened to myself on the Substack reader too, and it's fitting given how the series was produced. I agree it's not dissimilar from other technologies, but because of how transformative this one is, everything feels amplified tenfold at least. Though I wonder if that's what it would have felt like if we'd had social media during the Industrial Revolution?

John Maton's avatar

Yes, and just think about when the first printing presses were introduced. That was incredibly transformative, maybe even more so than the internet, social media etc. And with the first printing presses there was a lot of negativity and conspiracies. Think of all these tales about witches etc. But the upside was that until then people were just living a very local existence, and printing spread information and knowledge. Maybe AI will have an even greater impact. I am not in a position to judge that. The AI readers on Substack are OK and very convenient, but they are far from perfect. From my experience their treatment of punctuation is not so great, plus some of the pronunciation is a bit off. But after about 4 or 5 times I got used to it. Sometimes I even read while I am listening.

Don't Make Me Greg's avatar

I'm with you that we don't need to be precious about who typed the words as long as they represent true statements about the world. But as the other commenters here demonstrate, a lot of this essay is stuffing all the counter-arguments at the end and waving them off with "Ah, well, nevertheless."

Timothy Morson's avatar

Excellent arguments. On a side note, it seems AI still needs to work on its grammar: "...People like Megan McArdle and ME are all living proof..." I had that drilled into I by my English father.

August Simonsen's avatar

I just read your three posts on AI in academia with great interest, Alexander. Whether written with the help of AI or not, they really got me thinking.

It seems clear that many parts of social science research can benefit from AI with limited downside, such as literature reviews and interview transcription. As AI agents evolve, they will likely reshape not only how research is done, but also why it is done and which tasks should be prioritized by humans. You capture this very well.

What I keep wondering is what this means for early-career researchers. Will a PhD in the social sciences become something almost anyone with a laptop and internet access can complete? What will be the value of a PhD going forward? Will we see a different type of person succeed in academia?

More practically, how should young people think about whether a PhD is the right next step in light of these changes?

If you consider writing a follow-up, I would be very interested in your thoughts on these questions.

Alexander Kustov's avatar

Thanks, August. I’ve been thinking about the same questions myself. I will certainly consider writing more about the likely net negative effects of AI on teaching and mentoring in the coming months.

Peter Gerdes's avatar

The reason humans produce slop is because that is what we incentivize. If you want better research:

1) Adopt a rule that all journals publish the best rebuttals to any paper they publish after a year and after 5 years. Most new ideas turn out to be false and this both does that work and conveys that rejecting a false finding is just as important as discovering a new result.

2) Adopt a rule that pre-registered papers count double in hiring and pre-accepted (accepted w/o conclusion) count 3x.

3) Count publishing a large dataset and making it useable as a publication itself. Maybe have journals literally list it as a publication.

4) Have all academics take an oath to serve the truth and not to mislead, via commission or omission, and not to create a false impression in outreach by staying quiet.

Maybe it will work a bit to stop people from bad behavior themselves but most importantly it will give grad students a leg to stand on when their prof gets shady and push colleagues to do something.

Also it will give academics an excuse as to why they spoke up in a way that helped some political goal -- they felt obliged.

Alexander Kustov's avatar

Thank you, Peter. All really good ideas!

Amarda Shehu's avatar

The diagnosis that AI exposes mediocrity is correct and probably understated. Your piece argues from the individual case up. The argument I have been developing runs from the system down: what the new affordance structure does to what gets pursued, whether institutional capacity can absorb what it produces, and what remains irreducibly embodied in the work of teaching. The best classroom teachers are still doing something no tutor framework yet describes; the verdict, I believe, on whether the models close that gap is not yet in. I write about the systems layer in my Latent Space series, including "The University Knows More Than It Can Say" on institutional tacit knowledge and "The Gravitational Pull of the Doable" on how collapsing costs reshape taste itself. The consequential argument, I think, lives at the systems layer.

Anthea Roberts's avatar

This is an excellent piece. Someone read my piece and shared your one in response, so I am copying my one here on a related set of ideas: https://www.dragonflythinking.com/insights/the-extended-mind

Alexander Kustov's avatar

Cool, thanks for sharing!

Len's avatar

Thank you for these thought-provoking pieces. I find them very important for pushing reflection in many different scholarly fields.

I would be very interested to hear your thoughts on the “knowledge collapse” risk explained in a recent paper by Acemoglu et al. 2026, “AI, Human Cognition and Knowledge Collapse”: https://economics.mit.edu/sites/default/files/2026-02/AI%2C%20Human%20Cognition%20and%20Knowledge%20Collapse%2002-20-26.pdf

Alexander Kustov's avatar

This is a real issue, and honestly I need to spend some time reading the paper properly before I can say anything useful about it. Don't want my own cognitive capacity to atrophy :)

Kahlil Corazo's avatar

The similarity with the one-drop rule of American blackness can be supported by anthropological theory, particularly Mary Douglas' work on dirt and pollution. I had a taste of celebrity as a first-time novelist (people were asking for my autograph). Then came a sudden expulsion from the industry when I highlighted my use of AI (it was always public). It was fascinating to me as a scholar of culture, since my focus was adjacent to taboo and the sacred. The shame I felt was strangely unavoidable and quite painful tbh. I felt like it was a duty for me as a scholar to capture the experience. I'm also experimenting heavily with LLMs for scholarship, so now I see ethnography and autoethnography as much more valuable: no robot can do this https://www.explorations.ph/p/ai-shame

Georgina Sturge's avatar

Very good point about the human slop. I included many examples of this in my book Bad Data, particularly in relation to silly and slapdash decisions made by academics during statistical modelling. Social science is rife with slop such as p-hacking, data mining, publication bias, and citation of 'canonical' papers which are largely irrelevant. I agree with you that it doesn't follow that AI would be any worse at performing most desk-based research tasks, unless we're talking about the now superseded early-generation LLMs which would come up with fake references - which as you point out, we shouldn't be at this point.

In terms of what to do, I would say it should be up to disciplines to internally police and uphold their own quality standards - and in the cases you've drawn attention to, they could clearly be doing a lot more. Having a clear idea of what acceptable quality looks like within a discipline should probably come before any discussion of a legitimate role for AI.

Alexander Kustov's avatar

Thanks, Georgina. I think you're spot on that disciplines need to define what acceptable quality looks like before they can have a productive conversation about AI's role. Though I worry some disciplines might not be up to it. In general, the less quantitative the discipline, the more resistant it seems to AI, whether or not that resistance makes sense on the merits.

P.S. Just ordered Bad Data, looking forward to checking it out!

Georgina Sturge's avatar

I definitely share your scepticism on that… Academia is full of incentives for individuals which run pretty much counter to prioritising and protecting quality (now I sound even more cynical). I think it’s probably cyclical though to some extent that disciplines degenerate into slop and then start to recover, and I don’t know if there’s much to be done to hurry that recovery along.

And amazing, I do hope you enjoy reading the book!

John Curiel's avatar

I will say that while I believe the human is the better author than AI with regard to what is published in APSR, I've had to review for some education journals, and I used a vast majority of the (anonymous) entries to teach students how not to write. I can see validity in the idea that where AI converges on the average, that is greater than the global median. That said, I will never stop aiming to be the John Henry, albeit without dying at the end.

Alexander Kustov's avatar

Love the John Henry analogy. And yeah, I think we're largely in agreement: the floor matters as much as the ceiling.