Mapping the EU AI Landscape (Part 5): Gigafactory & The Site Selection Problem

BlogPost

Jul 02, 2025

DISCALIMER:** The views and opinions expressed in this blog are solely my own and do not reflect those of my employer, or any of its affiliates.

TLDR; The EU’s call for AI Gigafactories received 76 proposals, creating a massive site-selection challenge. I explored this problem and built a tool, TerraSelect, to help. This post explains the difference between AI Factories and Gigafactories, and introduces a tool designed to make the first step of this “gold rush” a little more scientific.

Introduction

A few weeks ago, while scrolling through my LinkedIn feed, I stumbled upon a piece of news that felt like a starting pistol for a race: the European Commission announced an “overwhelming response” to its AI Gigafactories initiative¹. A staggering 76 expressions of interest from 16 member states, proposing 60 different sites.

Alt Text

My first thought was, “Wow, this is the kind of momentum Europe needs.” My second thought was… “How on earth do you choose?”

Picking a winner here is a hell of a job. It means untangling a massive web of logistics, environmental rules, and economic trade-offs to find just a handful of places that can truly anchor Europe’s AI ambitions. My goal here is to offer a data-driven perspective that could help policymakers in this crucial first step. It’s a classic site-selection problem, but at a continental scale. This challenge is exactly what sparked my curiosity and led me down a rabbit hole of mapping, data analysis, and eventually, building a tool to make sense of it all.

First, What’s a “Giga”factory?

In a previous post, I took a deep dive into the concept of AI Factories. I described them as centralized hubs—physical places combining supercomputers, datasets, and expertise to serve startups, researchers, and industries.

So, what makes a factory “Giga”?

Alt Text

In short: scale and ambition. While an AI Factory is designed to train general-purpose AI models, an AI Gigafactory is a hyperscale facility purpose-built to develop, train, and deploy the next generation of AI—models with hundreds of trillions of parameters. Think of a flashlight versus a lighthouse. One lights your path; the other guides entire fleets safely to shore.

With 76 entities eager to build or host such a complex (and only three to five sites planned), the EU has a problem on its hands. Choosing the right locations is critical; a misstep could mean wasted resources and a missed opportunity to compete on the global stage.

TerraSelect: A Tool for Smart Exclusion

This is where my weekend project comes in. Faced with this monumental task of site selection, I built a tool I’m calling TerraSelect. Its purpose is NOT to pinpoint the single “best” location in Europe. Instead, it’s designed to do something arguably more important at this early stage: intelligently exclude the thousands of locations that are obviously wrong.

Before I show you what it can do, let’s talk about what it can’t do.

The Limitations

No tool is perfect, and mine is no exception. I think it’s crucial to be transparent about its limitations:

Outdated Data: Some datasets, while the best publicly available, aren’t perfectly up-to-date. For example, the power plant data still shows nuclear facilities in Germany, even though the last one was decommissioned in 2023. A real-world analysis would require the most current, often proprietary, data.
The Missing Metric: Excess Electricity: The most significant missing piece is data on excess grid capacity. An AI Gigafactory needs a massive, stable power supply. Knowing where the grid has a surplus of energy is perhaps the single most important factor, but this data is incredibly complex and not publicly available at a granular level.

The Potential

So, with those caveats, what is the tool actually good for?

Its real power lies in making the haystack smaller. The EU is a vast continent. While there are many potentially suitable places for a Gigafactory, there are far, far more that are fundamentally unsuitable. The tool’s primary function is to act as a coarse filter, clearing away the noise so that experts can focus on a smaller, more promising set of candidates.

It excels at answering basic, but critical, exclusionary questions:

Is this location inside a Natura 2000 protected area? If so, it’s out.
Is it on a steep mountain slope? Out.
Is it in the middle of a dense city, or miles away from any transmission line or major road? Probably out.

By layering these and other constraints, TerraSelect can instantly visualize and disqualify vast swaths of land. It transforms the problem from “Where in Europe should we build?” to “Which of these candidate are actually worth a closer look?” This dramatically simplifies the initial search and allows human experts to apply their deeper knowledge to a much more manageable dataset.

Does it work?

This is the million-euro question. How do you actually know if a tool like this is on the right track? To test it, I used the known locations of the existing AI Factories as a “gold standard.” I fed all my other data layers—like proximity to universities, power plants, and different land types—into a prediction model. The idea was simple: can the model “learn” what makes these locations special and then find similar spots on the map?²

The initial results are in, and they’re… interesting.

alt text

The model is good at finding the known AI Factory locations (the yellow stars), what we call good “recall”. The problem is, it’s a bit too enthusiastic. For every correct location it identifies, it also flags hundreds of other spots that aren’t right (low “precision”). To find all 12 real AI Factories, the model suggests a list of over 1500 potential sites.

This isn’t a failure, though. It’s a starting point. It tells us that while the current data factors are useful, they need refining. The model’s coefficients—the importance it assigns to each layer, give us clues. For instance, some factors are almost always a good sign, while others are consistently negative:

Key Factor	Consistency of Positive Influence	Average Positive Influence Score	Interpretation
🎓 Universities	100%	Strong	Proximity to talent and research is a non-negotiable driver.
💶 Land Price	100%	Moderate	Lower agricultural land prices make a location more attractive.
⛰️ Slopes	99%	Strong	Flat terrain is overwhelmingly preferred.
🏭 Fossil Power Plants	68%	Weak	A slight positive, likely acting as a proxy for any industrial grid connection.
〰️ Transmission Lines	54%	Weak	A 50/50 factor; not as critical as being near a transmission line directly.
☀️ Renewable Power	2%	Weak	Almost always a negative predictor in the top models.
🏞️ Natura 2000 Sites	1%	Weak	Consistently a negative factor, indicating avoidance of protected areas.

The data tells us something. The ideal Gigafactory location, according to the model, is a flat, affordable area near a city with a strong university. But what’s really interesting is what’s not a strong predictor. Proximity to renewable power, on its own, seems to be a negative signal in the top models—perhaps because these sites are often in remote locations.

This is exactly the kind of insight TerraSelect is built to explore.

See It for Yourself

I believe that making complex decisions requires accessible tools. While TerraSelect is still a work in progress, it demonstrates a methodology for tackling large-scale site-selection problems.

You can explore the interactive tool here: terraselect.online

I’m hosting this on my own expenses, a small contribution to fostering transparency and discussion around these important initiatives. If you find the tool or these posts useful and feel like supporting this kind of independent analysis, check the link below!

Thanks for reading, and I’d love to hear your thoughts!

The official announcement came from the EuroHPC Joint Undertaking, highlighting the high level of interest from industry leaders, investors, and Member States. ↩
For the technically inclined: I used a logistic regression model with L1 regularization (Lasso) to evaluate thousands of predictor combinations. I then analyzed the coefficients of the top 100 models (ranked by AUC-ROC) to find which layers most consistently had a positive influence. ↩

If you like my work, consider buying me a coffee to support future posts.

← Introducing PolAI Graph: Visualizing the EU’s AI Landscape The Digital Poisoners: How Russian Propaganda Networks Are Corrupting AI Training Data →