AI Answers Aren’t Evidence: Why Accountability Still Matters in the Age of Chatbots

This column, in the interests of full disclosure, was itself produced through a proprietary AI pipeline after the author lobbed in a handful of rough notes instead of doing the decent thing and writing it properly. One hesitates to call this efficiency. Intellectual indolence, perhaps. Mechanical loafing. A small outsourcing of conscience by a man apparently too lazy to finish his own sentences, even while grumbling about the machine.

That irony, unfortunately, is the point.

A peculiar habit has taken hold online: people post screenshots of chatbot answers as though they were documentary evidence. “I asked AI, and it says…” now appears in arguments with the tone once reserved for an official report, a court filing or a paragraph from a well-edited reference work. The screenshot arrives as a conversation-stopper, a synthetic witness offered to settle the matter. This should alarm anyone who cares about how knowledge is established in public.

The problem is not simply that AI can be wrong, though it often is. The deeper problem is that a chatbot answer has almost no built-in accountability. A screenshot of a prompt and reply does not tell readers which sources were consulted, whether those sources were authoritative, whether the system blended fact with inference, or whether it quietly invented details with the confidence of a prep-school debater. Generative systems are designed to produce plausible language. Plausibility and provenance are not the same thing.

Recent audits of AI search and answer engines have found exactly the sort of behaviour one would expect from a technology that mimics authority better than it earns it. Researchers at the Tow Center for Digital Journalism found that leading AI search tools frequently returned incorrect answers and often supplied broken, fabricated or misdirected citations. Research published in Royal Society Open Science has found that chatbots tend to overgeneralise research findings and flatten nuance, especially in scientific and medical contexts where caveats are the substance. Users who run into such errors experience them not as a technical quirk but as a breach of trust. Once a machine speaks in a smooth, declarative voice, many readers stop asking the old and necessary question: how do you know?

There is an instructive comparison with Wikipedia. In its early years, invoking Wikipedia in an argument invited eye-rolling. The site was mocked for looseness, amateurism and vandalism. Yet over time it became more defensible, not because crowds became sages, but because the platform developed norms of verification. It built a culture around citations, edit histories, talk pages, reversion and visible disputes over sourcing. Wikipedia’s best principle remains wonderfully austere: the threshold for inclusion is verifiability. Readers can inspect the references, follow the trail and argue with something firmer than a glowing paragraph produced on demand.

A chatbot screenshot offers none of that architecture. It is an answer severed from an evidentiary chain. Even when the content happens to be accurate, the form encourages intellectual laziness. It asks the audience to trust a performance of knowledge rather than the labour of showing one’s workings. In a healthy argument, claims are attached to institutions, authors, data, documents and methods. Someone can be corrected. Someone can be blamed. Someone can be asked to defend the point. With AI, responsibility disperses into the fog.

None of this requires a puritan rejection of AI. These tools are useful. They are marvellous for drafting, summarising, brainstorming and helping one organise half-formed thoughts, as this column’s own scandalously idle gestation demonstrates. Yet using AI responsibly means treating it as a starting point for inquiry, not as the inquiry’s final court of appeal. If a chatbot gives you a statistic, find the study. If it summarises a historical event, locate the archive, article or book. If you want to persuade other people, bring them something they can inspect.

That is what accountability looks like in public discourse. Not a screenshot of synthetic confidence, but a claim with a trail behind it. AI can help us get to the library faster. It should not be mistaken for the library itself.

Sources: Tow Center for Digital Journalism/Columbia Journalism Review; Wikimedia Foundation materials on verifiability and no original research; Nature’s 2005 comparison of Wikipedia and Encyclopaedia Britannica science entries; Royal Society Open Science research on chatbot overgeneralisation of scientific studies; Scientific Reports on user-reported LLM hallucinations and trust.