Just an Assisted Memo Pad

Should AI Built from Public Data Be Open to All?

The most radical claim in the argument for open artificial intelligence is also the simplest: if a technology has been built by ingesting the accumulated language, images, labour and culture of the public, then the burden of proof should lie with those who would keep it closed. The default ought not to be secrecy. It ought to be access.

That does not mean everything should be free. Computing power is expensive. Deployment, security, reliability, customer support and enterprise integration all cost real money. It is perfectly reasonable for firms to charge for those things. But that is different from claiming permanent private ownership over a model whose value was extracted from a commons that nobody in Silicon Valley created alone. The scandal of the current AI boom is not merely that companies want profit. It is that many seek monopoly rents on systems trained, in part, on material they did not ask to use and may not have had the right to use at all.

This is why the analogy to free software is more than romantic nostalgia. Richard Stallman’s great insight was not that programmers should starve; it was that users deserve freedoms as well as products. They should be able to inspect, modify, share and run the software that shapes their lives. AI now raises the same question at a far larger scale. If these models are to become part of education, media, science, administration and daily work, then treating them as inscrutable corporate black boxes is politically and morally inadequate.

The case for openness is not only ethical. It is practical. Open models diffuse power. They allow universities, startups, non-profits and poorer countries to adapt systems to local languages and needs, rather than renting intelligence from a handful of American platforms. They also permit independent scrutiny: researchers can test bias, security flaws, environmental costs and hidden capabilities. Closed models ask the public to trust assurances that cannot truly be verified. In any other infrastructure that mattered, such opacity would be considered a defect, not a feature.

To be sure, openness is not a magic spell. Releasing weights without meaningful documentation is not openness but theatre. Nor is every so-called “open” model genuinely open; some prominent licences, Meta’s Llama licence among them, still restrict use and, according to the Open Source Initiative, fail established open-source standards. Training-data disclosure is also difficult, especially when datasets are vast, messy or legally contested. Yet that difficulty is an argument for better governance, not for surrender. Regulators are already moving towards transparency obligations: the EU AI Act requires providers of general-purpose AI models to publish summaries of training content and comply with copyright rules. That is a beginning, not an endpoint.

The deeper principle is that AI should not become the most powerful enclosure movement of the digital age: a machine for taking from the many and selling back to them what was already theirs, now wrapped in an API. If the public has supplied the raw material, then the public should enjoy more than the privilege of paying to use the final product.

What is needed, then, is not hostility to AI, nor naïve techno-utopianism, but a politics of digital reciprocity: open models where possible, transparent data practices by default, and commercial competition focused on service, integration and compute. In short, the industry needs its equivalent of a software freedom movement — someone, or many people, prepared to insist that intelligence built from the commons must not be locked away from it.

Sources: European Commission guidance and FAQs on the EU AI Act’s rules for general-purpose AI models and transparency obligations; EUR-Lex summary of the EU AI Act; Open Source Initiative statements on why Meta’s Llama licence does not meet established open-source standards.