Data residency is not sovereignty

I wrote recently about sovereign cloud as a governance choice. That argument lived at the infrastructure layer: who owns the provider, who operates it, which charter protects it. The harder frontier sits one layer up, at the AI models we increasingly route our data through. And there the comfortable label, “our data stays under GDPR”, hides a gap worth being honest about.

The badge answers the easy question

GDPR governs how your data is processed, gives you real rights over it, and even forbids handing it to a foreign authority without a proper international agreement; that is Article 48.⁴ What it cannot do is pull the company itself out of another government’s reach. The US CLOUD Act lets US authorities compel a US-based provider to produce data it controls, whatever region that data is stored in.¹ So a US hyperscaler can run a service it markets as GDPR-compliant and still sit under a legal reach that GDPR tries to forbid but cannot neutralise, because no European law removes a US parent from US jurisdiction. The two regimes collide: comply with the CLOUD Act order and the disclosure itself may breach GDPR.

Residency answers “where do the bytes live”. Sovereignty answers “who can compel them”. Those are different questions, and the marketing rarely separates them. The distinction bites hardest at the AI layer, because that is where you actively send content out to be read, transcribed, summarised, or embedded. A document that never left your storage now travels to a model on every single query.

Reachable beats impressive

Go looking for a model that stays under GDPR jurisdiction and you learn quickly that quality is the easy part. A model is only usable if you can also reach it: served per call by a provider under the right jurisdiction, and affordable without renting an always-on GPU. The public leaderboards are topped by excellent open-weight models, but most come from labs outside the GDPR area, and you rarely find the strong ones served per call by a provider inside it. A top score you can only run by spinning up your own GPU fleet is out of reach for most teams, so it stays a research artifact.

What testing the reachable options actually showed

I spent time testing the vision and document models I could actually reach this way. A few patterns stood out, and none of them appear on a spec sheet:

Shared blind spots. Faced with one genuinely ambiguous photo, every model I tried independently misread the same object as the same wrong thing. That is the uncomfortable part: a second model will not catch it, because they share the bias.
Silent input mistakes. One service fed photos to the model without applying the orientation flag that phones write into every image, so portrait shots arrived sideways. The model read them anyway and never said a word about it.
Hidden cost. Reasoning models could spend tens of thousands of tokens on internal deliberation before answering, and under a fixed budget sometimes returned nothing at all, slowly and expensively.
Benchmarks that flatter. One widely used hosted OCR service markets around 95 percent on its own benchmark, but on an independent leaderboard it sits near 72, behind open models a fraction of its size, and its worst scores land exactly on old, degraded scans.² The clean formatting of its output can also mask a flipped digit a human eye would never question.
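
The silent input mistake above is the easiest of the four to guard against in code. What follows is a minimal sketch, not the service's actual pipeline: it applies the eight standard EXIF Orientation values to a toy pixel grid (a list of rows) using only the standard library. The function names are my own for illustration. On real image files, Pillow's ImageOps.exif_transpose does the same job.

```python
# Illustrative sketch: undo the EXIF Orientation tag (values 1-8) on a pixel grid.
# A grid is a list of rows; each row is a list of pixel values.
# All function names here are hypothetical, chosen for this example.

def flip_horizontal(grid):
    return [list(reversed(row)) for row in grid]

def flip_vertical(grid):
    return [list(row) for row in reversed(grid)]

def transpose(grid):
    # Mirror across the main diagonal.
    return [list(col) for col in zip(*grid)]

def rotate_90_cw(grid):
    return [list(col) for col in zip(*reversed(grid))]

def rotate_90_ccw(grid):
    return [list(col) for col in zip(*grid)][::-1]

def rotate_180(grid):
    return flip_vertical(flip_horizontal(grid))

def transverse(grid):
    # Mirror across the anti-diagonal.
    return rotate_180(transpose(grid))

# Transform that turns the stored pixels into the upright image,
# keyed by the EXIF Orientation value.
CORRECTIONS = {
    1: lambda g: [list(row) for row in g],
    2: flip_horizontal,
    3: rotate_180,
    4: flip_vertical,
    5: transpose,
    6: rotate_90_cw,
    7: transverse,
    8: rotate_90_ccw,
}

def apply_orientation(grid, orientation):
    """Return the upright grid; unknown values are treated as 1 (no change)."""
    return CORRECTIONS.get(orientation, CORRECTIONS[1])(grid)

# A portrait shot: 3 rows by 2 columns when upright.
upright = [[1, 2], [3, 4], [5, 6]]
# With orientation 6 the file holds the pixels rotated 90 degrees counter-clockwise.
stored = rotate_90_ccw(upright)
fixed = apply_orientation(stored, 6)
```

The point of the sketch is where the step sits: the correction runs before the image reaches the model, because the model will not report that its input arrived sideways.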

The tools still earn their keep. You just have to wrap them in gating, verification, and a human in the loop for anything that matters, and drop the idea of a hands-off oracle.
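
To make "gating" concrete, here is a minimal sketch under stated assumptions. Assume the model extracted line amounts and a total from an invoice; the Extraction shape and the gate function are hypothetical, not any provider's API. The gate accepts a result only if it passes a check the model cannot influence (here, simple arithmetic) and sends everything else, including an empty answer from a model that ran out of budget, to a person.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation

@dataclass
class Extraction:
    """Hypothetical shape of what a document model returned for one invoice."""
    line_amounts: list = field(default_factory=list)  # amounts as extracted strings
    total: str = ""

def gate(extraction):
    """Return ("accept", reason) or ("human_review", reason)."""
    # An empty answer (for example, budget spent on deliberation) is never accepted.
    if extraction is None or not extraction.line_amounts or not extraction.total:
        return ("human_review", "empty or partial model output")
    try:
        amounts = [Decimal(a) for a in extraction.line_amounts]
        total = Decimal(extraction.total)
    except InvalidOperation:
        return ("human_review", "unparseable amount")
    # Independent check: a flipped digit usually breaks the sum.
    if sum(amounts) != total:
        return ("human_review", "line items do not sum to total")
    return ("accept", "arithmetic check passed")

ok = gate(Extraction(["12.50", "7.25"], "19.75"))
flipped = gate(Extraction(["12.50", "7.52"], "19.75"))  # 7.25 misread as 7.52
empty = gate(None)
```

A passing check is evidence, not proof: two errors can cancel out, and a shared blind spot can pass every automated test. That is why anything that matters still goes to a human even after the gate accepts it.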

The field is thin, and it is consolidating

Here is the part that should worry anyone betting on European AI independence. The labs that both do real model research and sit under GDPR jurisdiction are few, and for document and vision work it largely comes down to a couple of French labs. And the roster is shrinking: one of Germany’s flagship model companies is being absorbed into a Canadian firm,³ and a Finnish one was bought by a US chipmaker. Each deal quietly moves a European AI champion out from under the very jurisdiction people chose it for.

There is a counter-current, and it is finally aimed at the right targets. The EU is putting real money into the two things this comes down to: open European models and the compute to run them. OpenEuroLLM, a 20-organisation consortium backed by the Digital Europe Programme, is building open-weight multilingual models, with its first release due around mid-2026.⁵ On the compute side, the AI Continent Action Plan and its InvestAI facility are standing up a network of AI Factories on Europe’s supercomputers, and in January 2026 the Council cleared the way for up to five AI gigafactories, each meant to pack over 100,000 accelerators on European soil.⁶ ⁷ Be realistic about the timing: OpenEuroLLM still admits it is short on compute and has yet to ship, and the first gigafactory chips are expected only from late 2026. This is a direction of travel, not a finished supply chain, but it is the first serious attempt to build the supply side itself.

So what does sovereignty actually require

Three things have to line up. Governance, so ownership and control cannot be quietly sold off. Jurisdiction, so no foreign government can reach in and compel the data on its own say-so. And a reachable model, good enough and served under that jurisdiction, that you can call without building a data centre. A compliance badge ticks the first of those and stops there.

Check only the residency box and you have answered the easy question while skipping the one that decides who really controls your data.

This blog entry has been written by me and reviewed with the help of AI. The illustration is AI-generated.