20 Comments
Hollis Robbins

“For critics, the mental model of an AI user is stuck in 2023, which is ages ago.” Omg yes this…

Viachaslau Kozel

Alexander, I understand that using a heavy dose of optimism to shake the academic community is a fair tactical move. But there is a real danger that unchecked optimism might actually do more structural damage to academia than the denialism you are fighting.

I want to point out a few blind spots in this narrative, not to defend the old ways, but to look at the actual nature of the tool we are dealing with.

First, regarding the nature of the models. We don't need to rehash the "stochastic parrot" debate, but we have to stay grounded in how the current architecture actually works. Even with the recent introduction of reasoning steps, these systems remain statistical approximations at their core. They haven't developed a mechanism to genuinely understand their own errors; they are still running loops within their probabilistic weights. They provide a very high-quality simulation of reasoning, but they lack the metacognition to know when they are wrong. Doing heavy AI-assisted R&D, I see such issues every single day.

This brings up the core problem: cognitive hacking. AI generates fluent, confident text that our brains naturally associate with expertise. It requires an enormous, unnatural amount of willpower to force yourself to rigorously validate something that already looks perfect. It works against our dopamine system, making us intellectually lazy without us even realizing it.

And this is where the specific vulnerability of your field comes in. In mathematics there is a strict apparatus for validation. But in social sciences, the validity of a hypothesis is often tied to how coherently it is argued and framed. In these fields, an imitation generator completely breaks the system. When a generated text looks highly expert and logically linked, verifying the actual truth of the claims becomes incredibly difficult and energy-consuming. The AI exploits the exact metric we usually use to judge quality there.

The goal shouldn't be to just accept the output because it saves time and looks good. The real challenge right now is designing strict, effective workflows that acknowledge these cognitive traps and the fundamental limitations of the tool.

Because of all this, I honestly think the current academic pushback and outright bans might actually be doing more good than harm at this specific moment. I would gladly be an optimistic advocate for progress myself, but there are critical, unresolved structural problems here. We need to figure out how to handle them before we let the current wave of hype drive widespread, uncritical adoption.

Alexander Kustov

Thank you, Viachaslau, for this very thoughtful pushback.

"It requires an enormous, unnatural amount of willpower to force yourself to rigorously validate something that already looks perfect" is on point, and a key insight that a lot of people miss. The problem is indeed that many people don't verify AI output as rigorously as they would if it came from a research assistant or their own "manual" work. I've certainly seen my fair share of work produced much faster but riddled with hallucinations, which defeats the purpose entirely.

Where I'd push back is that current models are already capable of addressing this. With agentic tools and the ability to incorporate structured verification workflows, many of these issues can be managed. My sense from working with Claude Code Opus 4.6 now is that it hallucinates less than my research assistants would, at least for the type of work that I'm doing.

So our disagreement may just be one of timing and context: I think we're largely there already with the models of the last several months, and you think we'll be there in a couple of years. Either way, the problems aren't unresolvable.

Viachaslau Kozel

I think the main thing I want to emphasize here is the danger of being blinded by success. I have no doubt that there are specific areas of research where current models deliver excellent results with minimal errors. But I would strongly caution against extrapolating those specific successes to everything else.

In fields where you don't have strict, formal frameworks for validation, you simply cannot afford to let your guard down; you have to pay hyper-attention to verifying the final output. I’ve had plenty of incredibly impressive results myself, but I’ve also had my share of complete failures.

The real issue with these failures is that they are entirely unpredictable and don't follow the logic of human error. They have a fundamentally different nature. And ironically, as the models get better, these errors actually become harder to spot because they are buried under increasingly convincing layers of coherence.

I know exactly that feeling when you finally build an effective pipeline and suddenly realize you can process a massive amount of work. It gives you this incredible sense of power, and it’s genuinely a great feeling worth enjoying. The key is just making sure you eventually come back down to earth and continue looking at the tool with the necessary level of criticality and rationality.

Alex Potts

Jesus Christ, that slide really is embarrassingly bad. These conferences are supposed to have experts working at the frontier of human knowledge, and yet this academic is talking about the sort of thing that the average patron of your local watering hole is already aware of. Does he really think he's the first person in history to stumble across ideas like demographic change and racial conflict!?

Alexander Kustov

Yeah. To be honest, though, I don't even blame this person that much. They probably didn't realize the presentation was subpar, and no one told them. That's the real problem: the social norms of big academic conferences still allow this to happen unchecked. In a better world, this proposal wouldn't have been accepted. And if it had been, the presenter would have felt pressured to produce something much stronger.

DamienCh

Very good post, I agree with a lot of it, including (and especially) the slop part.

I'd push back on two things though:

1. First, I think you go a bit too quickly on the "writing is thinking" rebuttal: it's true that there are other ways of thinking, but this does not lessen the value of writing (some things at least) yourself, not only for your own sake, but also to avoid being led in directions you might not have endorsed otherwise. This is the "cognitive hacking" Viachaslau mentioned in another comment, and I think it's worth making that point again and again. You write that "we underrate other people’s rationality", which is true, but the reverse holds too: we easily overrate our own rationality and level of competence.

2. Second, and relatedly, I would extend the category of things that should be the expression of a particular human. Partly because I agree with you that the AI detection bit is silly and a lost battle: AI-generated text will be everywhere (wrote about it here: https://artificialauthority.ai/i/192290331/did-an-ai-write-this). But I think "putting the information out there" is too simple a criterion here, because, in a world of competition for attention, we should also look at the relative value of information, and there will remain a human premium for, e.g., blog posts. (This post made this point well: https://www.sh-reya.com/blog/consumption-ai-scale/).

Both points, imho, derive from a certain confusion between different levels of AI use, on a spectrum from ancillary tool (e.g., spellcheck) to full copy-and-paste of an output. To a large extent, the confusion matches the criticism this responds to: there is a lot of motte-and-bailey out there, and it's sometimes hard to know whether the critics merely deplore AI-generated text or any kind of AI assistance. But this is why it's important to get this distinction right.

Alexander Kustov

Thank you, these are all great points. Personally, I'm genuinely uncertain about what to do with a lot of humanities writing, which sits between art (which is supposed to be human, more or less) and science (which doesn't have to be). That's exactly where the most skepticism of AI is coming from, and maybe for good reason.

John Maton

Excellent. After listening to your article using the AI Substack voice, I went back and re-listened to Parts I & II. In the article above, I particularly like that you mention that "I plan to ban all electronic devices in my substantive classes and bring back in-person written and oral exams."

I'd just add that, as with any new technology, some people jump in head first, whereas others are frightened of it because they don't understand it and don't know how to use it securely to their benefit.

Alexander Kustov

Thanks, John! I actually just listened to myself on the Substack reader too, and it's fitting given how the series was produced. I agree it's not dissimilar from other technologies, but because of how transformative this one is, everything feels amplified tenfold at least. Though I wonder if that's what it would have felt like if we'd had social media during the Industrial Revolution?

John Maton

Yes, and just think about when the first printing presses were introduced. That was incredibly transformative, maybe even more so than the internet, social media, etc. And with the first printing presses there was a lot of negativity and conspiracy; think of all those tales about witches and so on. But the upside was that until then people had been living a very local existence, and printing spread information and knowledge. Maybe AI will have an even greater impact; I am not in a position to judge that. The AI readers on Substack are OK and very convenient, but they are far from perfect. From my experience, their treatment of punctuation is not so great, plus some of the pronunciation is a bit off. But after about 4 or 5 times I got used to it. Sometimes I even read while I am listening.

Georgina Sturge

Very good point about the human slop. I included many examples of this in my book Bad Data, particularly in relation to silly and slapdash decisions made by academics during statistical modelling. Social science is rife with slop such as p-hacking, data mining, publication bias, and citation of 'canonical' papers which are largely irrelevant. I agree with you that it doesn't follow that AI would be any worse at performing most desk-based research tasks, unless we're talking about the now-superseded early-generation LLMs which would come up with fake references - which, as you point out, we shouldn't be using at this point.

In terms of what to do, I would say it should be up to disciplines to internally police and uphold their own quality standards - and in the cases you've drawn attention to, they could clearly be doing a lot more. Having a clear idea of what acceptable quality looks like within a discipline should probably come before any discussion of a legitimate role for AI.

Alexander Kustov

Thanks, Georgina. I think you're spot on that disciplines need to define what acceptable quality looks like before they can have a productive conversation about AI's role. Though I worry some disciplines might not be up to it. In general, the less quantitative the discipline, the more resistant it seems to AI, whether or not that resistance makes sense on the merits.

P.S. Just ordered Bad Data, looking forward to checking it out!

John Curiel

I will say that while I believe humans are better authors than AI when it comes to what is published in APSR, I've had to review for some education journals, and I used the vast majority of the (anonymous) entries to teach students how not to write. I can see validity in the idea that where AI converges on the average, that average is above the global median. That said, I will never stop aiming to be the John Henry, albeit without dying at the end.

Alexander Kustov

Love the John Henry analogy. And yeah, I think we're largely in agreement: the floor matters as much as the ceiling.

Johann Harnoss

I agree with most of this (value of AI for research and the fact that most research is just token research) but can you give us 3 economics examples of this claim? If true, it would be shocking: “researchers start with the left-wing conclusion and work backward”

Alexander Kustov

I wrote about this at length regarding immigration, actually. Though you could argue that most social scientists who do this are not economists :)

https://www.faz.net/aktuell/karriere-hochschule/migration-gut-gemeinte-desinformation-accg-200687088.html

Johann Harnoss

I loved that FAZ piece. Super well done. And yet, if I read it carefully again, it doesn't have any evidence for the claim that academics reverse-engineer their findings to fit a left-wing narrative; the main claim in the FAZ piece is about posturing and the oversimplification of nuanced research results - at least in the German version. I know and admire many of the best folks in the immigration field and have near universally been impressed by their ethics and rigor. But curious to learn more.

Alexander Kustov

Thanks! I think and hope you're largely right. My different experience may be related to spending more time with sociologists and political scientists than economists.

But there I've definitely seen some post-hoc tweaking first-hand. It's hard to say how widespread it is. I do think most people genuinely want to contribute to knowledge and do their best ethically and rigorously. But they can also fail at that, sometimes in systematic ways. You've probably seen the Borjas and Breznau paper on ideological bias in immigration estimates. Though one way to read it, to your point, is that the bias is actually not that large.

Johann Harnoss

True. And yet from what I read: Borjas' own papers (e.g., the latest on H-1B wage gaps, but also others) make some "curious" modeling choices, always in a certain direction. This isn't hearsay but well documented by Clemens and others - and I know you know this too ;) Anyways, let's continue IRL next time in DC!