Three ways in which synthetic data foster compliance with the AI Act
Synthetic data foster accuracy, robustness, and cybersecurity.
The AI Act is the most comprehensive and ambitious piece of regulation on Artificial Intelligence (AI) to date. It creates a regulatory framework for the use of AI in the public and private sectors that aims to ensure the safe and fair use of AI. It enters into force in August 2024, and companies that use AI systems need to prepare to meet its obligations.
Synthetic data technology is a privacy-enhancing technology that can help companies comply with the AI Act. Synthetic data are computer-generated data that preserve the analytic value of real data. The AI Act treats them as non-personal data (Article 59(1)(b)) and, for bias detection and correction, favors them over sensitive personal data (Article 10(5)). In what follows, we describe how synthetic data technology can help meet the AI Act obligations related to accuracy, robustness, and cybersecurity.
The requirements of the AI Act
The AI Act imposes specific requirements and obligations on high-risk AI systems. Among them is the obligation to be accurate, robust, and secure. According to Article 15:
“High-risk AI systems shall be designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle.” (Article 15: Accuracy, Robustness and Cybersecurity)
Synthetic data technology can help AI systems achieve accuracy, robustness, and cybersecurity. The next three sections describe how.
Synthetic data for accuracy and fairness in AI
Real data often fail to reflect society accurately, underrepresenting important population groups. AI systems trained on such data can be inaccurate or even biased. Synthetic data allow for a more balanced, accurate, and fair reflection of society.
For example, medical data are often skewed towards male patients. One reason is that many textbooks still reflect an outdated focus on male health, which in turn influences language models trained on such texts. In addition, medical studies often enroll more male than female participants, or fail to take sex into account as a variable. The result is a relative shortage of female data for developing specialized AI tools. Such imbalances and information gaps in training data lead to inaccurate and unfair AI systems. For instance, a diagnostic AI tool may be accurate for male patients but less so for female patients, leading to underdiagnosis of certain conditions in women and an overall worse patient journey.
Synthetic data technology can mitigate this problem by augmenting real datasets: synthetic records can be generated specifically to rebalance a dataset and fill its gaps. For instance, adding synthetic female records counters the imbalance in medical databases. This not only improves the accuracy of AI systems trained on such data but also fosters fairness. It is therefore not surprising that Article 10 of the AI Act points to synthetic data for bias detection and correction.
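The rebalancing idea above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the generator used by any particular product: the helper names (`fit_gaussians`, `rebalance`), the toy fields `sex`, `age` and `bp`, the `synthetic` flag, and the choice of one independent Gaussian per numeric feature are all hypothetical. Realistic generators also model correlations between features and mixed data types; this one deliberately does not.

```python
import random
import statistics
from collections import Counter


def fit_gaussians(records, features):
    """Estimate the mean and standard deviation of each numeric feature."""
    return {
        f: (statistics.mean(r[f] for r in records),
            statistics.pstdev(r[f] for r in records))
        for f in features
    }


def rebalance(records, group_key, minority, features, seed=0):
    """Append synthetic minority-group records until that group matches the largest group.

    Every feature of a synthetic record is drawn independently from a Gaussian
    fitted on the real minority records, so correlations between features are
    ignored (a deliberate simplification for illustration).
    """
    rng = random.Random(seed)
    counts = Counter(r[group_key] for r in records)
    missing = max(counts.values()) - counts[minority]
    real_minority = [r for r in records if r[group_key] == minority]
    params = fit_gaussians(real_minority, features)

    synthetic = []
    for _ in range(missing):
        # Hypothetical "synthetic" flag marks generated rows for later auditing.
        rec = {group_key: minority, "synthetic": True}
        for f, (mu, sigma) in params.items():
            rec[f] = rng.gauss(mu, sigma)
        synthetic.append(rec)
    return records + synthetic


# Toy data: 80 male and 20 female patient records (illustrative values only).
real = (
    [{"sex": "M", "age": 50 + i % 20, "bp": 120 + i % 15} for i in range(80)]
    + [{"sex": "F", "age": 45 + i % 20, "bp": 115 + i % 15} for i in range(20)]
)
balanced = rebalance(real, "sex", "F", ["age", "bp"])
print(len(balanced))  # 160
```

Keeping synthetic rows flagged makes it easy to check later how much of a training set was generated rather than observed.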
AI robustness through synthetic data
Synthetic data can model unexpected scenarios for which real data are lacking. Such scenarios help assess the robustness of AI systems during testing. They can also be used during training to build robustness in at the design stage. The result is AI systems that are reliable not only under fixed circumstances but also in dynamic and evolving contexts.
For example, suppose a municipality uses an AI model to track its population's health and, based on the model, allocates the right amount of resources to its hospitals. Under normal circumstances, the model works well. However, the data used to train it do not reflect unexpected events such as natural disasters or pandemics. When such an event occurs, the model is no longer reliable. This lack of robustness is dangerous, because a functioning healthcare system is vital precisely in these rare events.
Synthetic data technology can help develop and test more robust AI systems. For instance, it can generate synthetic populations in which the proportion of people requiring urgent care is increased, simulating a crisis. This allows AI models to be trained and tested to predict healthcare demand even in rare scenarios.
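A stress test of this kind might look as follows. This is a minimal sketch under stated assumptions: `synth_population` is a hypothetical stand-in for a real synthetic data generator that only models whether each resident needs urgent care, and `naive_forecast` is a hypothetical stand-in for a demand model calibrated on normal years. The point is the procedure: generate synthetic scenarios with a higher share of urgent-care cases and check whether the model's error stays within a tolerance.

```python
import random


def synth_population(n, urgent_rate, seed):
    """Hypothetical generator: each synthetic resident needs urgent care
    with probability `urgent_rate`."""
    rng = random.Random(seed)
    return [rng.random() < urgent_rate for _ in range(n)]


def naive_forecast(n, historical_rate):
    """Stand-in for a model fitted on 'normal' years: it assumes the
    historical urgent-care rate always holds."""
    return n * historical_rate


def stress_test(historical_rate, scenario_rates, n=100_000, tolerance=0.2, seed=0):
    """Return (rate, actual, predicted) for every scenario in which the
    forecast misses synthetic demand by more than `tolerance` (relative error)."""
    failures = []
    for i, rate in enumerate(scenario_rates):
        population = synth_population(n, rate, seed + i)
        actual = sum(population)  # True counts as 1
        predicted = naive_forecast(n, historical_rate)
        rel_error = abs(predicted - actual) / max(actual, 1)
        if rel_error > tolerance:
            failures.append((rate, actual, round(predicted)))
    return failures


# Normal year: 2% need urgent care; hypothetical crisis scenarios: 5% and 10%.
print(stress_test(0.02, [0.02, 0.05, 0.10]))
```

The crisis scenarios are expected to fail, signaling that the model needs retraining or complementary synthetic crisis data. The 20% tolerance is an arbitrary illustrative choice.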
Synthetic data for privacy protection and cybersecurity
The AI Act recognizes the non-personal nature of synthetic data (Article 59(1)(b)). Unlike records in real databases, synthetic records are not linked to real individuals. At the same time, synthetic data retain the analytic value of real data. This allows them to be used to develop accurate and reliable AI systems without exposing real personal data.
Suppose a hospital wants to optimize its operating room schedule. This is a highly challenging task: it involves matching patients, hospital staff, and the resources required for specific operations. It also involves uncertainty, as patient journeys cannot always be predicted. To optimize the schedule, the hospital wants to use historical data on operating room usage. However, such data contain sensitive patient information, which physicians are not allowed to disclose for the envisioned logistical purposes.
Synthetic data give the hospital accurate information on operating room use: they reproduce the statistical patterns of real patient data without leaking sensitive information. Read more about how synthetic data protect privacy in our previous blog post.
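One common sanity check that synthetic records are not near-copies of real patients is the distance from each synthetic record to its closest real record. The sketch below is illustrative only: the helper names and the threshold are hypothetical, and passing this check alone does not prove that a dataset is anonymous or non-personal. In practice, features should first be scaled to comparable ranges; the toy values below are left unscaled for readability.

```python
import math


def distance_to_closest_record(synthetic, real):
    """For each synthetic record, the Euclidean distance to its nearest real record."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def flag_near_copies(synthetic, real, threshold):
    """Indices of synthetic records closer than `threshold` to some real record."""
    distances = distance_to_closest_record(synthetic, real)
    return [i for i, d in enumerate(distances) if d < threshold]


# Toy operating-room records: (patient age, operation length in hours).
real = [(70, 2.5), (45, 1.0), (62, 3.0)]
synthetic = [(70, 2.5), (58, 1.8), (33, 0.9)]
print(flag_near_copies(synthetic, real, threshold=0.5))  # [0]
```

Here the first synthetic record exactly duplicates a real patient and is flagged; such records would be removed or regenerated before the data are shared.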
Conclusion
Transitioning to synthetic data is a step towards secure, robust, accurate, and compliant AI systems. Companies that embrace this innovation not only stay ahead of regulation but are also more competitive and efficient.
At Aindo we are at the forefront of synthetic data technology. Contact us to learn how we can help your business today!