Mapping the EU AI Landscape (Part 2): Data Strategy


Oct 23, 2024


DISCLAIMER: The views and opinions expressed in this blog are solely my own and do not reflect those of my employer, Fraunhofer or any of its affiliates. All content is based on my personal insights, experiences, and research, and should not be construed as representing official positions or policies of Fraunhofer.

TL;DR: This blog post covers the evolution of the EU’s data strategy, highlighting major milestones like the Open Data Directive (2019), the EU Strategy for Data (2020), the Data Governance Act (2022), and the Data Act (2024). It explores their impact on data sharing, AI development, and innovation across sectors. Key topics include user empowerment, B2B and B2G data sharing, Common European Data Spaces, and the push for interoperability. While the EU’s vision is ambitious, it faces technical and implementation challenges. Overall, these policies aim to foster a fair, innovative, and transparent data ecosystem in Europe. Don’t miss the footnotes for extra insights and fun facts!

Intro

Welcome back!

In the previous post, we talked about the EU’s coordinated AI plan. I personally learned a lot, and from your feedback, it seems like you enjoyed it too!

A bit after it was published, I was approached with the obligation (sorry, opportunity) to learn more about how data is used in AI systems according to the EU. So, this post will do exactly that. Specifically, we’ll talk about key EU regulations on data and how they shape AI development. I struggled a bit with how to present the information, but in the end, I decided that the best way is to tell you the story of how these data-related activities were created and why they were necessary. For this reason, we’ll cover the plans in chronological order:

TLDR: This post covers the evolution of the EU’s data strategy, focusing on key regulations like the Open Data Directive (June 2019), the EU Strategy for Data (Feb 2020), the Data Governance Act (June 2022), and the Data Act (Jan 2024). We’ll explore how these policies impact AI development, data sharing, and the roles of various stakeholders across Europe.

Just so you know, I usually express my opinions and give useless (I mean, interesting) facts in the footnotes, so don’t miss them!

A European Strategy for Data

So, you’ve probably heard the phrase “data is the new oil” (or something similar). Well, the EU definitely agrees, and that’s exactly why they created the European strategy for data. I just finished reading the EU data strategy document, and oh boy. You should know, though, that it’s from February 2020, so it’s a bit outdated and mainly focused on outlining problems and potential directions. Since its release, other documents (which we’ll see later) have addressed these problems in more detail.

What’s still relevant and interesting is the context this document comes from, the data-related problems it identified in the EU, and the initial solutions it hinted at, which we’ll get into more later on.

Context

It’s February 2020. You’ve heard about some new virus with a name like a beer on the news, but you brush it off as the latest panic. ChatGPT isn’t due for another two years, and the AI hype (as we see it now) is still in its infancy. However, data has already been recognized as the new oil [1], and the EU has an interest in innovating (read: regulating) in this space.

Joking aside, the EU understood the importance of big data for prediction. It’s no surprise that the more data you have, the better you can predict the future [2].

And this comes into play in many situations, which we can divide based on who benefits from it:

  • Business [B]: More data means better understanding of what your customer wants and how to sell to them more effectively. Or, you can forget ethics and just sell user data (looking at you Meta)
  • Citizens/Consumers [C]: Take healthcare, for example. More data in this field leads to more accurate diagnoses and improved treatments.
  • Governments [G]: As a government, you can use data collected from satellites to predict when the next flood will hit and prevent damage to infrastructure. Or, ideally, solve climate change.

Of course, these entities don’t live in a vacuum, separated from one another. And that’s why it’s important to consider their interactions. So, we take the G from government and the B from business, mix them, and get B2B, B2G, G2B, C2B, G2C (I didn’t make this up). How are these entities relevant to the data strategy? What are even the problems of the data strategy? Let’s see.

Problems

The document lists six major problems (well, eight, but two aren’t that interesting in my opinion) related to data. As I mentioned earlier, these problems are divided by who faces them (businesses, citizens, governments, or combinations thereof), and they come with some fascinating insights.

Availability of Data

I’d argue this isn’t a huge problem anymore. In 2020, global data production stood at 2 zettabytes. By 2024, it skyrocketed to 150 zettabytes [3] (source). A 75x increase.

Data sharing is often proposed as a solution, but it’s also part of the problem. Specifically, sharing data among:

  • G2B: This has been a “long-standing EU policy” (since 2003) and is mostly addressed in the Open Data Directive (more on that later). It refers to data produced by the government that can be used in the private sector to enhance decision-making.
  • B2B: Businesses sharing data! Well, this doesn’t happen much because there aren’t enough economic incentives, and businesses fear losing their competitive edge.
  • B2G: It would be great to have businesses share insights with governments to improve public policy, but back in 2020, there weren’t enough willing participants to make this work.
  • G2G: Imagine driving your German car in Spain and getting a speed camera ticket. But since Spain and Germany don’t share vehicle data, you don’t get the fine. Too bad! But seriously, imagine the possibilities in health and personal identification.
  • B2C: This involves businesses sharing data with consumers directly. One good example is smart home devices. Companies collect data on your energy consumption, but ideally, this data would be shared with you so you can make informed decisions—like finding the best time to run appliances to save on energy bills. However, the problem here is that most businesses are reluctant to make this data easily accessible to consumers, fearing the loss of control or monetization opportunities.
  • G2C: Here, governments share data directly with citizens. An example might be providing public health data or environmental information, like air quality in your area. The challenge lies in making this data available in a user-friendly format, ensuring citizens can actually benefit from it.

Data interoperability and quality

I have to admit, I wasn’t sure what “interoperability” meant at first. According to the Oxford Dictionary, it’s “the ability of computer systems or software to exchange and make use of information.” Since it’s a mouthful, let’s just call it “interops.” This ties into market fragmentation, a major concern for the EU, which is keen on creating frameworks and standards (see ICT Standardization).

Out of curiosity, I dug deeper [4] and discovered the EU has been focused on interops since 2011 through Joinup, which later teamed up with ISA and ISA² [5] in 2014 and 2019, respectively. In 2021, they renamed the initiative Interoperable Europe. The bottom line is data sharing in a pan-European network.

Data Infrastructures

Let’s say you’ve solved the data problem. You have plenty of data, and everyone is happily sharing it with everyone else (b2c2g2c2g2g or whatever). But where are you actually processing all this data?

You need infrastructures for that, and this has been (or maybe still is?) a problem in the EU. In the previous post, I mentioned EuroHPC JU and AI factories (perhaps these could be part of the solution).

What About the Citizens?

Finally, we have two problems that directly affect citizens. The first is that Article 20 of the GDPR (the one about controlling who uses your data and how) is not really feasible for the average citizen due to a lack of technical tools [6]. The second issue is related to digital literacy, which ties into initiatives like the European Skills Agenda that we also covered in the previous post.

Possible directions

Along with this long list of problems, the document also points toward possible solutions. Keep in mind that most of this stuff is old and more like guidelines, but we’ll discuss how these directions have actually been implemented later in the post. Still, to “spezzare una lancia” (lit. to break a spear, meaning to defend or strike a blow in favor of) for the Commission, we should mention the concept of data spaces. I’ve been hearing about these data spaces for months, and I finally have a clearer idea of what they are. The document states that the Commission will invest 2 billion euros in this direction, and these spaces will follow FAIR principles, ensuring that data is Findable, Accessible, Interoperable, and Reusable.

Further down in the document, you can find nine examples of data spaces in high-impact sectors, similar to what we discussed in the previous post. One example includes intelligent transport systems.

Data Governance Act

The short explanation of the Act doesn’t say much, apart from some vague objectives. So, I had a choice: either read the full document (44 pages; no, thank you) or check out this nice summary. I went with the latter.

The whole Act focuses on five key points, each tackling different aspects of data (or information) flow.

Protected Data

First things first, we need to define what kind of data we’re talking about. Is your account on that naughty website considered protected data? Unfortunately for you, not really. So, what is protected data?

The explanation isn’t super clear, but after checking the full document, we can find a few examples of protected data:

  • Commercially confidential data (point 10): Any data whose disclosure would impact the market position or financial health of a company. Think of something like the prompting techniques for OpenAI’s new model O1.
  • Data protected by intellectual rights: This covers any data that falls under copyright licenses, trademarks, or patents.
  • Personal data: The definition of personal data is found in the GDPR (Article 4(1)) and states: “personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, […]”. This could be anything from names and locations to genetic, mental, or economic features.

It’s also important to note that this protected data must be held by public sector bodies, meaning private businesses are exempt in this case.

Now that we know what protected data is, let’s also talk about what isn’t protected. For that, we need to look at the Open Data Directive.

Non-protected data

The Open Data Directive (ODD) (June 20, 2019) predates the Data Governance Act and even the EU’s data strategy. It focuses on the reuse of “open data,” which is essentially data produced by public entities that should benefit society as a whole.

It’s hard to capture the entire concept of open data just by listing its properties, but a few examples make it clear. These include:

  • Earth observation and environmental data, such as those provided by the European Space Agency (ESA)
  • Statistics, like anything you’d find in the Eurostat catalogue
  • Company and ownership information, such as data on registered businesses and their financial records.

So, what characteristics should this data have? It should be open by design. This means the data must be freely accessible online, in a machine-readable and platform-independent format. It should also be accompanied by a license with minimal restrictions [7] and relevant metadata.

Metadata is of particular interest to me, so I looked into what they might mean by it. The only explanation I found was “metadata, at the best level of precision and granularity, in a format that ensures interoperability”. This feels like saying “it has to be good enough for things to work”. They do give an example, mentioning that “spatial information” should comply with Directive 2007/2/EC, also known as INSPIRE. Apparently, INSPIRE is a fully developed project, and I went down a rabbit hole reading about it. You don’t need to know all the details to understand how spatial data is defined, but for the curious reader, check out footnote [8].

In terms of metadata, INSPIRE defines XML schemas as the standard format for spatial data, with specific specifications for different types of data. For instance, a disease can have “gender” as an attribute when describing a population.
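To make the “open by design” properties a bit more tangible, here is a minimal sketch of a checklist in Python. Everything in it is an assumption for illustration: the `DatasetRecord` fields, the list of machine-readable formats, and the list of acceptable licenses are my own hypothetical choices, not something the Directive or INSPIRE defines.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified lists; the Directive does not enumerate these.
MACHINE_READABLE_FORMATS = {"csv", "json", "xml", "geojson"}
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class DatasetRecord:
    """Illustrative description of a published dataset (not an official schema)."""
    title: str
    url: str
    file_format: str
    license_id: str
    metadata: dict = field(default_factory=dict)


def open_by_design_issues(record: DatasetRecord) -> list[str]:
    """Return the reasons a dataset falls short of the open-by-design properties."""
    issues = []
    if not record.url.startswith(("http://", "https://")):
        issues.append("not accessible online")
    if record.file_format.lower() not in MACHINE_READABLE_FORMATS:
        issues.append("format is not machine-readable")
    if record.license_id not in OPEN_LICENSES:
        issues.append("license may be too restrictive")
    if not record.metadata:
        issues.append("metadata is missing")
    return issues
```

An empty list means the record passes this toy checklist; a scanned PDF behind an FTP link with a proprietary license and no metadata would fail all four checks.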

How to Reuse

By now, we’re familiar with the concept of protected data and how it differs from public data. So, let’s see how the EU wants to promote data reuse while still protecting the privacy of the individuals involved. Most of the rules are on the public sector side, with the Commission setting out guidelines for how they should encourage the sharing of the protected data they already possess. Here are some of the more interesting rules:

  • (Reasonable) fees and timelines: The public sector can charge fees [9] when you request data, and they have up to two months to decide on your request.
  • Assistance: If you ask for data you can’t access, the public sector is responsible for helping you contact the owner of that data so you can request it directly. Additionally, the EU has created the European register for protected data held by the public sector (ERPD) to help you find out who holds the data you’re looking for.
  • Technical requirements: This is the most interesting part. The Commission says member states “need to be technically equipped to ensure that the privacy and confidentiality of data is fully respected in reuse situations”. They mention techniques like anonymization, secure processing environments (e.g., data rooms [10]), or bilateral confidentiality agreements between the parties involved.
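The Act names anonymization without prescribing a method, so here is a rough sketch of one common approach: suppressing direct identifiers and generalizing quasi-identifiers. The record fields (`name`, `age`, `postcode`, `diagnosis`) are hypothetical, and this kind of coarsening alone does not guarantee that nobody can be re-identified; treat it as an illustration of the idea, not a compliance recipe.

```python
def anonymize(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers of one record."""
    decade = (record["age"] // 10) * 10
    return {
        # The name is suppressed entirely: it never reaches the output.
        "age_band": f"{decade}-{decade + 9}",      # exact age -> 10-year band
        "region": record["postcode"][:2] + "***",  # keep only the postcode prefix
        "diagnosis": record["diagnosis"],          # the attribute reusers care about
    }


patient = {"name": "Ada", "age": 37, "postcode": "28013", "diagnosis": "flu"}
print(anonymize(patient))
```

In practice, a public body would also have to check that each combination of age band and region still covers enough people, which is the part this sketch leaves out.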

B2B Data Sharing

As mentioned in the EU strategy for data, one of the reasons businesses don’t share data with each other is the fear of losing competitive advantages. This is especially true if the company providing the data isn’t sure how securely the recipient will handle it. To address this, the Commission has outlined rules for potential “intermediation services.” These services would be responsible for facilitating B2B data sharing.

To better understand why we even need data from other businesses, let’s say you’re the CEO of a vehicle-sharing startup in Madrid, Spain. You’ve done your research and know the city is in desperate need of electric monocycles, but you’re unsure where to place them to avoid competing with other companies. Now, imagine you could hop on a website and see that Lime is offering their usage data for €4.99. That would be amazing! You’d find out that the best place for your monocycles is near the Carampa Circus School.

Connecting you with this dataset is exactly what these intermediation services are about, and the EU wants to regulate them (naturally). These services must remain as neutral as possible in handling the data. This prevents conflicts of interest, such as a service owning a large share of Lime and trying to block your monocycle business from competing in Madrid. These services are also required to provide metadata that “improve(s) the data intermediation service”. If you play by the rules and respect all these regulations, the EU will give you a nice sticker. So far, there are 11 certified “good boys” in 4 countries (most of them are in France).

Data Altruism

Lovely name, questionable results. Here the EU is basically saying that if you’re kind-hearted and want to share your data for free, you’ll have to agree to a bunch of rules. First, you must be a nonprofit organization, and then you need to comply with an entire rulebook (which I couldn’t find). Once you meet the requirements, you can register as a data altruism organization and receive an even nicer sticker. Shoutout to the only registered organization so far: the Associació Dades pel Benestar Planetari from Spain.

European Data Innovation Board

Unsure how to navigate all these regulations? (Trust me, you’re not alone.) The EU has you covered with the newly created European Data Innovation Board (EDIB).

The EDIB is a group of experts whose goal is to define cross-sector standards and ensure interoperability for data across the EU. Honestly, I couldn’t find much more about this board. Who are the members? What have they accomplished in the past year? Who knows…

International data flows

The final point in the Data Governance Act addresses international data flows. You may have already heard that the EU enforces strict data privacy regulations on other countries handling EU citizens’ data. For example, the EU-US Data Privacy Framework is an agreement that regulates how EU data must be handled in the US. Interestingly, this often gives EU citizens more rights over their data in the US than American citizens themselves [11].

The Data Governance Act extends the GDPR and aims to ensure that the handling of non-personal data in third countries is subject to the same protections. This usually takes the form of international agreements.

Recap

So far, the aim of the Data Governance Act is pretty straightforward: to regulate the flow of protected information within the EU. Data flows can occur between any entities (B2B, G2B, C2B, etc.) and must adhere to certain standards (fairness, transparency, privacy protection). The goal is to build trust in voluntary data-sharing mechanisms. That’s great! But what about non-voluntary applications? Is there a legal framework that regulates data access and use in general? Yes! That’s the Data Act, which we’ll cover in the next section.

Data Act

So far, we’ve covered various initiatives aimed at regulating data flow within (and outside) the EU. However, most of what we’ve seen has just been an introduction to the real juicy part: the Data Act.

In the following section, I’ll mainly refer to this great overview of the Data Act, but I’ll also occasionally consult the full document for additional insights. Let’s dive in!

Context

Here’s the timeline so far:

  • June 2019: The Open Data Directive was introduced to regulate how public data should be made open for societal benefit.
  • February 2020: The EU set the stage for its data strategy, outlining key problems and identifying possible solutions.
  • June 2022: Two years later, the first document focused on how data flow should be managed in the EU was released. It established some standards and set up registries and boards. However, this Act primarily centered around voluntary data sharing (including for-profit sharing in B2B contexts) without laying out clear legal boundaries.

Fast forward to today, and the EU clearly needs proper guidelines for data sharing between all levels: Governments, Businesses, and Consumers/Citizens. The Data Act consists of 9 chapters, each focusing on a specific aspect of data sharing.

Chapter I

This is just an introduction (also known as General Provisions). It gives an overview of the upcoming chapters and defines some key terms. Personally, I prefer defining terms as they come up rather than listing them here. So, let’s skip this one.

Chapter II: B2C data sharing

Chapter II “catches two pigeons with one fava bean”. This chapter forces businesses to allow customers to access their data easily and at no cost. It also allows users to share their data with other businesses. How is this helpful? Let’s take a look.

It’s Your Data!

As the title suggests, it’s your data, so you should be in control. I strongly agree with this principle, and I’m happy to see it being implemented. Specifically, the Act says that as a user, you are entitled to know what data you’re generating, how it’s being used, and to access it through a simple and free process.

For example, imagine you have a smart thermostat that collects data on your energy usage. Under the Data Act, you have the right to access that data, understand your consumption patterns, compare energy providers, or even share it with a third-party service that optimizes energy efficiency.
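As a minimal sketch of what “understanding your consumption patterns” could look like once you have the raw data, here is a small Python example. The export format (a list of timestamp and kWh pairs) is purely an assumption of mine; the Data Act does not prescribe any format, and a real thermostat export will look different.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export: (ISO-like timestamp, kWh used during that hour).
readings = [
    ("2024-10-01T07:00", 1.2),
    ("2024-10-01T19:00", 2.4),
    ("2024-10-02T07:00", 1.0),
    ("2024-10-02T19:00", 2.6),
    ("2024-10-02T23:00", 0.4),
]


def average_usage_by_hour(rows):
    """Group readings by hour of day and average the kWh values."""
    by_hour = defaultdict(list)
    for timestamp, kwh in rows:
        hour = int(timestamp[11:13])  # characters 11-12 hold the hour
        by_hour[hour].append(kwh)
    return {hour: mean(values) for hour, values in by_hour.items()}


profile = average_usage_by_hour(readings)
peak_hour = max(profile, key=profile.get)
print(f"Highest average usage at {peak_hour}:00")
```

A profile like this is also the kind of thing you could hand to a third-party comparison service instead of the full raw log.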

However, let’s clarify what kind of data falls under these rules. The Act covers “all raw and pre-processed data generated from the use of a […] service” both personal and non-personal. The keyword here is raw. The Act provides a helpful example: “For example, if a user watches a film on their connected TV, the film itself is not within scope but data on the brightness of the screen is within scope.”

Fostering Innovation

In the thermostat example above, we stumbled upon an entirely new business idea, right? A platform that allows you to compare energy providers would benefit from access to your consumption patterns. So, by simply sharing data, new businesses can emerge, and that’s the second catch!

By requiring businesses to share some of the data they collect, the EU is creating opportunities for new businesses to emerge. Of course, if you end up using that data, you can’t directly compete with the business that provided it (the data holder). Instead, you can focus on related or aftermarket services [12]. Also, the data holder is not required to share any data that would reveal trade secrets (understandable).

Chapter III: B2B Data-sharing

While the earlier rules focused on users, here we switch to B2B actions. The aim is to promote fairness and prevent giants from using their privileged positions and vast amounts of data to dominate the market.

The regulation involves all kinds of data mentioned in Chapter II and states that data holders are obligated to make data available to other businesses (referred to as “data recipients”) under appropriate remuneration [13]. The catch here is that this obligation isn’t directly enforced by the Act itself. Instead, member states must pass legislation to make these obligations legally binding.

For example, if tomorrow Italy passes a law requiring OpenAI to share user-chat history with other businesses, OpenAI couldn’t refuse. So, it’s essential to understand that Chapter III provides a framework for creating laws rather than a law itself.

Chapter II vs III

What’s the difference between Chapter II and III? We’ve already seen that new businesses can emerge based on user data, so what’s new here?

First of all, Chapter II deals with user requests for data. So, you can go ask Google for your location data, get it, and then run it through the Location History Visualizer to create a nice visualization. You’re essentially transferring data from one source to another.

Chapter III, on the other hand, requires companies to share certain types of data with other businesses. For example, a rideshare service like Uber collects real-time traffic data as part of its normal operations. Under Chapter III, this traffic data could be shared with other businesses, like a company developing smart navigation tools for logistics services. Unlike Chapter II, this isn’t about a user requesting data, it’s one business making data available to another under fair compensation rules. This helps foster innovation while preventing monopolies from hoarding data.

Data scope

When you request your own data from a business, you’re limited to the data you helped co-create through your interactions with the service. Chapter III, however, refers to any kind of data, meaning all the data generated by interactions from all users.

Chapter IV: Unfair Contractual Terms

This chapter builds on the previous one. Let’s say a law was passed requiring companies to share user interactions with their chatbots (properly anonymized, of course). What’s to stop the data holder from imposing restrictive terms on how their data can be used? This is where Chapter IV comes in. It actually provides a detailed list of unfair terms. Among them, we find:

  • Exclusion of liability for intentional acts or gross negligence: In the chatbot company example, this would be like the company adding a clause that exempts them from liability if the data they provide is manipulated or completely made up.
  • Exclusion of remedies for non-performance: Similar to the first point, but here the data quality is poor, and the company avoids being held accountable.
  • Exclusive right to interpret the contract or data conformity: This would be like the company arbitrarily deciding whether the data they gave you meets the agreed-upon standards.

The Act also lists presumed unfair terms, which I won’t cover here, but they generally aim to prevent power moves by bigger players [14].

Alongside these “forbidden” terms, the Act discourages any “unfair” terms. These are unilateral, take-it-or-leave-it terms. If a company includes them in a contract, an unfairness test can be requested by the appropriate EU authorities.

Chapter V: B2G

This chapter is different. It regulates the flow of data from businesses to governments, both in emergency and non-emergency cases. Since this touches on a topic I’m particularly interested in (increased citizen surveillance during emergencies), I went ahead and read through the full document.

Should the EU Have Your Data During Emergencies?

If you’ve read my previous blog post, you already know I’m biased on this topic. History shows that measures adopted during emergencies tend to stick around even after the emergency ends [15]. So, I’m not thrilled with the idea of forcing big businesses to share their data for free in emergency situations.

That said, there are several articles and points in this chapter that seem designed to prevent a hypothetical dictatorial EU from abusing this law to spy on its citizens. For example:

  • Article 17(1.g) states that when personal data is requested, technical measures like pseudonymization or anonymization can be applied by the data holder before handing over the data. However, Article 18(4) says that if anonymization hinders the intended use of the data, then pseudonymization is allowed instead.
  • Article 19(1.c) states that data must be deleted once it is no longer needed (unless archiving is required, lol).
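Since the difference between anonymization and pseudonymization matters here, a small sketch may help. The Act does not specify a technique; keyed hashing with HMAC is simply one common way to pseudonymize identifiers, and the function name, key, and 16-character truncation below are my own illustrative choices.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, truncated).

    The same user always maps to the same pseudonym, so records can still
    be linked, but only the holder of the key can recompute the mapping.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


key = b"kept-by-the-data-holder"  # hypothetical secret; never shared with the recipient
print(pseudonymize("alice@example.org", key))
print(pseudonymize("alice@example.org", key))  # identical pseudonym
```

This is why pseudonymized data is still considered personal data: whoever holds the key can link the pseudonyms back to people, whereas properly anonymized data should not allow that.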

While I’m still not a fan of these use cases, I get why accessing this kind of data might be necessary in some situations. Forget the usual suspects like terrorist attacks and illegal activities; take, for example, the response to natural disasters or public health crises. Google and Apple shared anonymized location data during the COVID-19 pandemic to help public health officials track the spread of the virus and understand movement patterns.

What about non-emergencies?

In non-emergency cases, the government body requesting the data cannot ask for personal data, and businesses can demand remuneration (but not exceeding the technical and organizational costs). This is useful in cases where a city wants to optimize traffic (Rome, please take note) by using data from businesses like Uber or Ecooltra.

Chapter VI: Switching Between Data Processing Services

Honestly, this chapter would fit better after Chapter II because it still revolves around customers. The only reason I’m sticking to this structure is my obsession with ascending numerals.

Anyway, this is a simple one. It mandates that service providers must make it easy for customers to switch to another provider. This includes transferring customer data from one service to another within a maximum period of 2 months. After the switch, the original provider must also delete all customer data. I love this! It gives providers a clear incentive to retain their users: if they don’t, they lose revenue and data.

And starting in January 2027, switching will be free. For now, they can only charge for the necessary operations involved in switching (I wonder what the cap is).

Chapter VII: Unlawful Third-Country Government Access

This is very similar to what was discussed in the Data Governance Act, particularly in the section on international data flows. The goal is to control how EU data is handled by service providers outside the EU. This chapter focuses on non-personal data, while the DGA covers all types of data.

Chapter VIII: Interoperability

Finally, we’ve arrived at the part I’ve been waiting for. This chapter finally mentions EU data spaces, and it’s primarily focused on creating a European Single Market.

Basically, the short document didn’t provide much information about standardization, so I told myself, “I’ll just read the full document and report back.” Well, it turns out that the full documentation is also vague.

We have to remember that the Data Act is not really a legislative document but more of a preliminary step. It lays out a framework for creating laws later on. I’m just telling myself this to stay optimistic because it means I’ll have to do more digging before wrapping up this blog post. Ranting aside, this whole chapter can be summed up with the sentence: “We need standards, we don’t have them yet, but when we do, the EDIB will help.”

Common European Data Spaces

Since I was disappointed by the lack of mention of the Common European Data Spaces (CEDS), I looked online and found this short description. It gives a practical explanation of CEDS: they aim to create secure, trustworthy environments where data can be pooled, accessed, and shared in real-time across sectors like healthcare, agriculture, and energy. The goal isn’t just to put data online but to ensure it’s reused under clear rules, aligning with EU values like data sovereignty and privacy protection.

Since the document also referenced the Second Staff Working Document, I read that too.

The data spaces rely on interoperability and standardized protocols for data exchange. Although the sources don’t explicitly detail the use of federated approaches, hints suggest this might be part of the infrastructure. For example, the Data Act supports a distributed approach, allowing users to control and share data without needing a centralized repository. The Simpl platform, an open-source middleware, is being developed to support this interoperability.

Federated learning is also a key technique, especially in health data spaces, where data stays localized, but models can be trained across multiple datasets without moving sensitive information. This approach supports privacy while enabling innovation.
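To show the idea of “data stays local, only the model travels”, here is a toy sketch of federated averaging in plain Python. It is an assumption-laden illustration, not how any European data space is actually implemented: the model is a single weight for y = w * x, the two “clients” are made-up datasets, and the learning rate and round counts are arbitrary.

```python
def local_update(weight, data, lr=0.01, epochs=20):
    """Run gradient descent for y = weight * x on one client's private data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(global_weight, clients, rounds=10):
    """Each round: clients train locally, the server averages their weights."""
    for _ in range(rounds):
        # Only (weight, sample count) pairs leave each client, never the raw data.
        updates = [(local_update(global_weight, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        global_weight = sum(w * n for w, n in updates) / total
    return global_weight


# Two hypothetical clients (say, two hospitals) whose data both follow y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]
weight = federated_average(0.0, clients)
print(f"Learned weight: {weight:.4f}")
```

The server only ever sees the locally trained weights and the sample counts, which is the privacy argument in a nutshell. Real deployments add further safeguards (such as secure aggregation), because model updates themselves can leak information.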

Although there’s no exact list of current users, the main participants are likely to include:

  • Businesses: Large and small companies can use the data spaces to innovate and optimize their operations.
  • Public Sector: Government bodies can use data to improve services like smart cities and transport management.
  • Researchers: Academics will benefit from access to data for scientific advancements.
  • Citizens: Individuals won’t directly access these spaces but will benefit from improved services and products.

Chapter IX: Enforcement and Overarching Provisions

In simple terms, this chapter says that each member state must appoint a data coordinator from among the existing authorities. The coordinator’s job is to manage and coordinate both public authorities and businesses in applying the laws surrounding the Data Act.

They also reference the European Data Innovation Board, which we discussed previously.

Summary

Let’s piece everything together and summarize the Data Act.

We’ve seen how the Data Act emphasizes access and user rights, complementing the Data Governance Act (DGA), which focuses on creating a trustworthy environment for voluntary data sharing. We’ve also observed how some chapters in the Data Act extend concepts covered in the DGA (such as Chapters VII, VIII, IX, and parts of Chapter III).

The juicy details, though, are in Chapters II, III, IV, and V. I particularly like the focus on fairness and user empowerment. I believe that by enabling data to flow more easily in the ways we’ve explored, the Act will boost innovation, or at the very least make it easier for aftermarket businesses to emerge.

Conclusion

So, what’s the takeaway from all of this?

The EU’s data strategy is ambitious, and the Data Act represents a bold attempt to balance innovation with fairness and user empowerment. The shift towards making data more accessible (whether it’s for businesses, governments, or individuals) is an exciting development.

Key Takeaways

  • Empowering Users: The ability for individuals to control and share their data with third parties (like in the smart thermostat example).
  • Boosting Innovation: By forcing businesses to share some of their data (under fair terms), the EU opens the door for new startups and aftermarket services.
  • Regulating Fairness: Chapters on B2B and B2G data sharing are designed to prevent monopolies and ensure that data isn’t hoarded by industry giants.
  • Data Spaces: The vision for Common European Data Spaces shows a real effort to create sector-specific hubs for innovation. However, there’s still a lot of ambiguity around how these spaces will function in practice. Whether federated learning and interoperability will be enough to solve the challenges ahead remains to be seen.

Personal Thoughts

I have to admit, while the EU’s vision for data-driven innovation is commendable, there’s a lot of “we’ll figure it out later” in the documents. The lack of clear technical details (especially around the architecture and implementation of data spaces) makes it hard to predict how successful this will be in practice. We engineers like to see blueprints, not just grand ideas!

That said, I’m optimistic that once the wrinkles are ironed out, the Data Act and the broader EU data strategy will lead to a more transparent, innovative, and fair digital ecosystem. The road ahead might be full of technical challenges, but if the EU sticks to its principles, we could see a real transformation in how data is shared and used across sectors.

Footnotes

  1. Clive Humby famously stated “Data is the new oil” all the way back in 2006. 

  2. Shout out to Asimov’s Foundation series and the concept of Psychohistory.

  3. If you’ve heard of the dead internet theory, you might argue that most of this new data is AI-generated. While I don’t buy into the conspiracy, a recent study found that 75% of web data is machine-translated.

  4. I dug into this because, since joining Fraunhofer (only 7 months ago, though it feels like a lifetime), I’ve constantly heard about this huge Gaia-X project. After some searching, I finally found it mentioned in the Open Source Observatory (OSOR) collection on the Joinup website. It doesn’t say much, but I’m happy I managed to locate it within the broader EU schemes. The project is also mentioned in the 2024 Rolling Plan for ICT standardisation (under related standardisation activities, section c.1) and in the EU Data Strategy document itself!

  5. Short for Interoperability solutions for public administrations, businesses and citizens. 

  6. Shout out to companies like Incogni that have made removing your data from data brokers their business model.

  7. Here they mention a “standard license” (Article 8) but don’t give examples. Reading the description, it seems they refer to something like CC0, PDDL, ODC-BY or MIT.

  8. So you clicked, huh? Ok, if you’re that interested, who am I to let you down? INSPIRE is an EU directive, and later a project, launched in May 2007 with the aim “to create a common spatial data infrastructure for the purposes of EU environmental policies and policies or activities which may have an impact on the environment”. Basically, the project: (i) defines best practices for storing various types of metadata; (ii) provides data models for spatial data, i.e. ways to describe specific data, such as house addresses, in a consistent way; (iii) offers technical guidelines for a variety of spatial features (e.g. elevation); (iv) includes schemas in XML; and (v) even provides a metadata validator to check if your metadata conforms to standards.

  9. They also suggest that fees should be low or nonexistent for research purposes or SMEs.

  10. It’s not the first time I’ve come across the concept of data rooms. It’s one of the key ideas behind the Gaia-X framework.

  11. Check out this article for a quick overview of how the US and EU differ on privacy. 

  12. Aftermarket services include sales, accessories, services, and enhancements that come after the product’s sale (source).

  13. For SMEs and nonprofits, the remuneration must not exceed the costs incurred in making the data available! 

  14. The “greylist” terms in the Data Act aim to stop abusive practices in data-sharing contracts, ensuring fairness by protecting weaker parties from excessive liability, unreasonable contract terms, and restrictions that prevent them from accessing or using their data. The burden of proving the fairness of these terms falls on the stronger party imposing them. 

  15. Notable examples include the USA PATRIOT Act, UK Regulation of Investigatory Powers Act, Emergency Surveillance Laws Post-November 2015 Attacks (France), Australia’s Telecommunications (Interception and Access) Amendment (Data Retention) Act. 
