The regulatory future of synthetic data

The role of synthetic data in current and upcoming European data regulation

The Italian Institute for Privacy and Data Protection, in collaboration with the Data Intermediaries Alliance, recently released a paper titled The Regulatory Future of Synthetic Data.

This paper explores the evolving European data regulation landscape and evaluates the role of synthetic data in it. Here are some of the key insights:

Synthetic data for anonymization

The paper begins by examining the distinction between anonymization and pseudonymization in the context of data protection laws. It investigates how synthetic data should be classified from a legal perspective.

Recognizing the variety of synthetic data generation techniques, the paper stresses that properly generated synthetic data falls outside the scope of data protection rules, provided it is anonymous enough that data subjects can no longer be identified.

The process of synthetic data generation through AI has two distinct phases, each with its own data protection considerations. The first phase, the initial construction of the AI model, is classified for the moment as personal data processing. The second phase is the generation of synthetic data using this AI model. The resulting synthetic data is considered anonymous as long as personal data cannot be deduced from it. The European Data Protection Supervisor (EDPS) has clarified that data protection principles do not apply to anonymous data, so properly generated synthetic data is not subject to the stringent requirements for personal data processing.
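To make the two phases concrete, here is a minimal Python sketch. It is an illustration only and not the method described in the paper: the function names (fit_model, generate, exact_copies), the sample records, and the choice of independent per-column Gaussians are all assumptions made for this example. Real generators are far more sophisticated, and a check for exact copies is only a basic sanity test, not a legal assessment of anonymity.

```python
import random
import statistics

def fit_model(real_rows):
    """Phase 1 (hypothetical): learn per-column mean and standard deviation.

    This step reads the real records, so it is the part that touches personal data.
    """
    columns = list(zip(*real_rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def generate(model, n, seed=0):
    """Phase 2 (hypothetical): sample new records from the fitted model only.

    The real records are not an input here; only the learned parameters are.
    """
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in model) for _ in range(n)]

def exact_copies(real_rows, synthetic_rows):
    """Count synthetic records that reproduce a real record verbatim."""
    real = set(real_rows)
    return sum(1 for row in synthetic_rows if row in real)

# Made-up example records: (age, yearly income)
real_rows = [(34.0, 52000.0), (45.0, 61000.0), (29.0, 38000.0), (52.0, 75000.0)]

model = fit_model(real_rows)        # phase 1: uses personal data
synthetic_rows = generate(model, 5) # phase 2: uses only the model
copies = exact_copies(real_rows, synthetic_rows)
```

The point of the separation is that phase 2 never sees the original records. Whether the output is anonymous still depends on what the model has retained from them.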

The paper also takes an in-depth look at the 10 misunderstandings related to anonymization published jointly by the Spanish Data Protection Authority and the EDPS. For each of the 10, the paper assesses the potential repercussions for synthetic data.

Synthetic data for debiasing

The paper provides a thorough overview of current European data regulations that impact synthetic data, including the GDPR, the Data Governance Act, the Data Act, and the AI Act.

Where it is explicitly mentioned, synthetic data is viewed particularly favorably by European regulation. In particular, the AI Act recognizes synthetic data as a preferred means of debiasing datasets and increasing fairness in AI models: according to the regulation, if an AI system provider is able to detect and correct bias using synthetic or anonymized data, it is obliged to do so.

The paper concludes that synthetic data is a powerful tool for promoting scientific research and innovation, particularly in its ability to mitigate biases in datasets.

Synthetic data for data sharing

The paper highlights synthetic data’s potential to drive data sharing and innovation, particularly within the medical field. It also examines upcoming European regulatory initiatives, such as the European Health Data Space (EHDS).

The EHDS aims to facilitate better use of vast amounts of health data that are currently difficult to access. Nevertheless, the Regulation establishing the EHDS lays down strict requirements for accessing and reusing health data. Among the permitted secondary purposes is the training, testing, and evaluation of algorithms.

The paper suggests that these regulations will strongly influence the future of synthetic data in healthcare. Synthetic data is expected to be viewed positively within the framework of the EHDS, as it offers a privacy-preserving way to comply with strict data requirements. As demand for secure, shareable data grows, the paper underscores that synthetic data will play an increasingly vital role in healthcare innovation.

Conclusion

The paper demonstrates that synthetic data is a privacy-enhancing technology with immense promise. If properly generated, it is anonymous data that can play a key role in data protection, debiasing, data sharing, and promoting scientific innovation. Both the original Italian version of the paper and the English-language version are open-access.

Stay tuned to our blog for more updates on synthetic data and European privacy regulations!
