Synthetic data success stories
Synthetic data is revolutionizing how organizations leverage their data assets. By preserving the insights of real data without containing sensitive information, synthetic datasets make it possible to securely and rapidly capitalize on data opportunities. Applications include extraction and visualization of business intelligence; advanced analytics; software testing; product demonstrations; and development of AI models for prediction, personalization, profiling, and more.
For these applications, synthetic data is predicted to soon overtake real data in processed volume [1]. Organizations must anticipate this change. To help them do so, we have collected some of the benefits of Aindo’s synthetic data platform, along with its success stories.
Benefits of Synthetic Data
Synthetic data allows organizations to extract the full value of their data assets. It enables secure and free exchange and analysis of data and removes data shortcomings through augmentation. As such, its key benefits include:
- Lead-time reduction and cost savings: real data is subject to cumbersome processing steps, standards, and protocols. These do not affect synthetic data, which is available immediately. This significantly reduces the money, time, and staff effort required for data-intensive projects.
- Complete and fair data: synthetic data can be used when insufficient volumes of real data are available. For instance, when studying rare diseases, synthetic data can artificially increase the number of patient records, allowing for better prognostic and diagnostic AI tools. It can also be used to reduce bias by artificially increasing the number of records of underrepresented groups.
- Data privacy protection: by reconciling data utility with privacy protection, synthetic data makes secure and reliable AI innovation readily available.
- Increased data mobility and availability: synthetic data can be shared freely across departments and organizations. This enables organizations, for example in healthcare, to rely on external consultants for AI development; data analysis; software testing; gathering and visualizing business intelligence; and more. It also opens up opportunities for data trade and acquisition of missing pieces of information.
- Flexibility and data-centricity: synthetic data can be constructed with its final application in mind. This means that it is available in the right quantity and with the desired properties for any data project.
Success Story 1: Product demonstrations in Insurance
Challenge: A car insurance provider wants to use an internet-of-things (IoT) application to collect and manage customer data. The company collects data through IoT devices installed in its customers’ cars. It needs a platform in which this data is managed and leveraged to create business intelligence.
Four potential vendors are offering such platforms. The insurance provider wants product demonstrations from each of them to make an informed decision. Unfortunately, such a demonstration requires the insurer’s sensitive customer data.
Solution: The car insurance provider integrated Aindo’s Synthetic DataOps Platform on their infrastructure. They connected it to a relational database containing customer information. Our platform generated a database of artificial customer records with the same format and properties as the sensitive database. This synthetic data was securely generated on-site, without the insurer’s real data ever leaving its original IT environment.
The insurer provided the synthetic dataset to the four potential vendors. These vendors used it to demonstrate their products without needing access to the insurer’s confidential information.
Synthetic data was also applied to simulate special events. For example, an additional experiment was conducted in which data was rebalanced so that the number of long-distance commuters was relatively large. This showed how well the software responded to changes in customer behavior.
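The rebalancing experiment described above can be illustrated with a minimal sketch. It assumes the synthetic records are available as plain Python dictionaries and uses simple oversampling with replacement as a stand-in; a generative platform would more plausibly produce new records for the target segment rather than duplicate existing ones. The function name `rebalance`, the field `daily_km`, and the 100 km threshold are hypothetical and not taken from the case study.

```python
import math
import random


def rebalance(records, is_target, target_share, seed=0):
    """Oversample records matching `is_target` (with replacement) until
    they make up at least `target_share` of the returned list."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    target = [r for r in records if is_target(r)]
    others = [r for r in records if not is_target(r)]
    if not target:
        raise ValueError("no records match the target segment")
    # Smallest n such that n / (n + len(others)) >= target_share.
    needed = math.ceil(target_share * len(others) / (1 - target_share))
    extra = max(0, needed - len(target))
    boosted = target + [rng.choice(target) for _ in range(extra)]
    result = boosted + others
    rng.shuffle(result)
    return result


# Hypothetical example: 10 long-distance commuters among 100 customers.
customers = [{"daily_km": 120 if i < 10 else 15} for i in range(100)]
stressed = rebalance(customers, lambda r: r["daily_km"] >= 100, target_share=0.4)
share = sum(r["daily_km"] >= 100 for r in stressed) / len(stressed)
print(f"long-distance share after rebalancing: {share:.2f}")
```

Feeding the rebalanced dataset to each vendor's platform then shows how its dashboards and models react when the customer mix shifts.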
Benefits: Through Aindo’s synthetic data, the insurer could make an informed decision, substantially reducing risks. The process also showcased that synthetic data can dramatically shorten software development cycles. Risks were further reduced through data augmentation for simulating special events, showcasing the robustness of each of the products.
Success Story 2: Synthetic Data Trading for Improved Telemedicine
Challenge: A telemedicine company wants to leverage AI to improve its predictive model estimating fall risks of remote elderly patients. The company wants to combine its proprietary database with external socio-demographic data sources for a more complete understanding of its patients.
Solution: Synthetic versions are created of the datasets the telemedicine company intends to acquire. The synthetic datasets are harmonized and integrated with the company’s proprietary data. Aindo’s platform also integrates other data sources, including automatically structured transcriptions of phone calls. All this data is combined to train the fall-risk estimation model.
Benefits: The project leads to the development of a next-generation risk prediction model. Through the use of synthetic data, new synergies are explored and data can be monetized directly and safely.
Success Story 3: Data mobility for personalized finance
Challenge: A large investment bank wants to offer personalized guidance to small and medium-sized corporate clients. The bank has a large relational database of corporate clients and their business trajectories. Through AI, the bank wants to leverage this database to predict which clients are likely to encounter financial difficulties. It will then tailor advice to these clients’ specific needs.
However, external consulting is required to build the AI methods involved. The consultant needs data access, but client data is highly confidential and contains trade secrets. Sharing the data would go against the bank’s commitment to discretion.
Solution: A synthetic client database is created with the same format and properties as the real database. The generation process takes place on a dedicated server at the bank. Hence, the data never leaves its original institution and remains subject to the bank’s customary privacy protocols and standards.
The fidelity, privacy, and utility of the synthetic dataset are assessed to verify that quality and safety standards are met. The synthetic data is then provided to the external consulting firm. This firm builds a predictive AI model using the synthetic dataset. The model can then be employed by the bank to better tailor advice to clients.
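To make the assessment step more concrete, here is a minimal sketch of two simple checks, offered as an illustration rather than as the metrics Aindo’s platform actually uses. It assumes rows are tuples of categorical values and that both datasets are non-empty. Fidelity is approximated by the total variation distance between the category frequencies of one column, and privacy by the share of synthetic rows that replicate a real row verbatim. The function names and the toy columns (sector, payment status) are hypothetical. Utility is typically checked separately, by training a model on synthetic data and evaluating it on held-out real data, and is omitted here.

```python
from collections import Counter


def total_variation_distance(real_values, synthetic_values):
    """Fidelity proxy for one categorical column.
    0.0 means identical category frequencies, 1.0 means disjoint categories."""
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    n_real, n_synth = len(real_values), len(synthetic_values)
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth)
        for c in categories
    )


def exact_match_rate(real_rows, synthetic_rows):
    """Privacy proxy: share of synthetic rows identical to some real row."""
    real_set = set(real_rows)
    return sum(row in real_set for row in synthetic_rows) / len(synthetic_rows)


# Hypothetical toy data: (sector, payment status) per corporate client.
real = [("retail", "late"), ("retail", "on_time"),
        ("energy", "on_time"), ("energy", "on_time")]
synthetic = [("retail", "on_time"), ("retail", "late"),
             ("energy", "late"), ("energy", "on_time")]

status_tvd = total_variation_distance([r[1] for r in real],
                                      [s[1] for s in synthetic])
print("status TVD:", status_tvd)
print("exact match rate:", exact_match_rate(real, synthetic))
```

With only two low-cardinality columns, exact matches are expected and not alarming; the check becomes meaningful on wide tables where a verbatim copy of a real record is unlikely to arise by chance.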
Benefits: synthetic data allows the bank to effortlessly and safely consult external experts and service providers. This allows the bank to benefit from innovative AI methods to personalize their product offering.
Success Story 4: Data mobility for patient journey optimization in oncology
Challenge: A hospital wants to optimize the oncology patient journey. They have a large database of electronic health records (EHR) from previous patients. Through consulting, they know that by leveraging this database, they could improve the patient journey by detecting pathological signs early; improving and personalizing treatments; and offering guided support.
Unfortunately, the database is subject to substantial privacy restrictions. The EHR data is also unstructured, with information collected in text form. This makes analysis challenging at scale. Granting access to external data scientists for AI development involves time-consuming, costly processing steps.
Solution: The EHR data is automatically structured through Aindo’s generative AI technology. All involved attributes are recognized automatically and represented in tables. Subsequently, a synthetic database is created to mimic these tables, without containing sensitive information about real patients. This synthetic dataset can readily be transferred to an AI team.
The team uses the synthetic data to build three AI tools: a diagnostic model, helping physicians identify a collection of oncological pathologies; a prognostic model, able to predict the risk of patients developing oncological pathologies based on attributes in their EHRs; and a model that helps optimally administer treatments to oncology patients.
Benefits: through the synthetic data’s rapid availability, the project’s duration is only two months. This is a decrease of roughly 78% compared to the typical nine-month duration of AI projects in healthcare. This pace is achieved because Aindo removes the need for cumbersome manual data preparation and anonymization processes and protocols. Similarly, the budget involved is significantly reduced compared to previous projects of similar scope.
[1] Judah, S., White, A., Sicular, S., Jones, C.J., De Simoni, G., Friedman, T., Beyer, M., Heizenberg, J. and Parker, S. (2020). “Gartner Predicts 2021: Data and Analytics Strategies to Govern, Scale and Transform Digital Business.”