Aindo's Synthetic Geolocation Data

Privacy and Fairness in Data-Driven Innovation

Artificial intelligence (AI) has tremendous potential to make living environments more sustainable, efficient, and comfortable. However, its successful application relies on the processing of vast amounts of data. Such data is often highly confidential and imperfect.

In this blog post, we describe a revolutionary solution to make AI innovation possible while respecting our rights and freedoms: synthetic data. In particular, we illustrate Aindo’s synthetic geolocation data technology, a proprietary innovation that brings fairness and privacy protection to projects involving location information.

What is synthetic geolocation data?

Geolocation data is any form of data that involves the whereabouts of individuals or artifacts. In today’s world, devices of all sizes are constantly collecting geolocation data, from our smartphones to our vehicles. This data could be leveraged to optimize how we interact with our environment.

Unfortunately, geolocation data is highly sensitive, as it reveals people’s whereabouts and movements. It may also be biased, for instance when more data is available from affluent areas than from less affluent ones. AI systems trained on such data may reproduce this bias, favoring affluent individuals when applied in practice.

In a previous blog post, we described how synthetic tabular data that preserves the information of real data is constructed and evaluated. Aindo’s proprietary innovations make synthetic geolocation data possible as well. Such data preserves all relevant geospatial information and links it realistically to other pieces of information.

A new paradigm in geolocation analysis

Aindo’s synthetic data is free of privacy issues. It can substitute real data in AI development and data-driven projects. Furthermore, it can rebalance biased datasets, making sure no segments of society are underrepresented.
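To make the idea of rebalancing concrete, here is a minimal sketch of one way to decide how many synthetic records to add per area. It is an illustration under stated assumptions, not a description of Aindo's method: the function name `top_up_counts`, the area labels, and the target shares are all hypothetical, and the sketch assumes a generator that can be asked for a chosen number of records per area.

```python
def top_up_counts(real_counts, target_shares):
    """Synthetic records to add per area so real + synthetic matches target_shares.

    real_counts:   {area: number of real records}
    target_shares: {area: desired fraction of the combined dataset}
                   (assumed positive and summing to 1)

    No real record is discarded, so the size of the combined dataset is
    set by the most over-represented area.
    """
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    # Smallest combined size at which every area can reach its target share
    # without removing real records.
    combined_size = max(
        real_counts.get(area, 0) / share
        for area, share in target_shares.items()
        if share > 0
    )
    return {
        area: max(0, round(share * combined_size) - real_counts.get(area, 0))
        for area, share in target_shares.items()
    }
```

For instance, with 900 real records from an affluent area, 100 from a less affluent one, and equal target shares, the sketch asks for 800 additional records for the less affluent area and none for the affluent one.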

The figure below shows an example of synthetic geolocation data. The map on the left shows the locations and prices of Airbnb listings in New York City in 2019 [1]. The map on the right shows Aindo’s synthetic data. Note that it retains the overall concentration of locations, as well as the price distribution. In both maps, Manhattan has the most accommodations, and also the priciest.

Figure: Airbnb NYC data (left) and synthetic Airbnb data (right). The synthetic dataset does not contain any financial transactions of real tourists. However, it provides accurate insight into the overall affordability of tourist accommodation per area in New York City.
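One simple way to check a claim like this on your own data is to compare, area by area, the share of listings and the median price in the real and synthetic datasets. The sketch below assumes each listing is reduced to an `(area, price)` pair; the function names are hypothetical, and this is not necessarily how Aindo evaluates its data.

```python
from collections import defaultdict
from statistics import median

def summarize_by_area(listings):
    """listings: iterable of (area, price) pairs.

    Returns {area: (share_of_listings, median_price)}.
    """
    prices = defaultdict(list)
    for area, price in listings:
        prices[area].append(price)
    total = sum(len(values) for values in prices.values())
    return {a: (len(v) / total, median(v)) for a, v in prices.items()}

def compare_areas(real, synthetic):
    """Absolute gaps in listing share and median price, per shared area."""
    r, s = summarize_by_area(real), summarize_by_area(synthetic)
    report = {}
    for area in sorted(set(r) & set(s)):
        (r_share, r_median), (s_share, s_median) = r[area], s[area]
        report[area] = {
            "share_gap": abs(r_share - s_share),
            "median_price_gap": abs(r_median - s_median),
        }
    return report
```

Small gaps in every area suggest that the synthetic data preserves both where listings concentrate and how expensive each area is. Areas that appear in only one of the two datasets are left out of the report and would deserve a separate look.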

Why is synthetic geolocation data important?

Geolocation data has a plethora of impactful potential applications, which synthetic data enables securely and responsibly:

  • It can help match energy supply and demand across time and location, avoiding energy dissipation [2];
  • It can optimize public transport services [3], reducing pollution and road congestion while boosting the economy and social inclusion;
  • It can help develop in-vehicle software that monitors traffic in real time, helping drivers minimize their travel times;
  • It can even help design sustainable electrical grids that encourage the transition to electric vehicles [4];
  • It can help implement optimal congestion-based tolls, encouraging road users to decongest road networks [5].

Aindo’s advanced synthetic geolocation data features

Aindo’s synthetic geolocation data is best-in-class. It is accurate at the scale of local neighborhoods, but also at larger scales. The figure below shows the demand for taxis in Porto, Portugal, at three levels of detail [6].

Figure: Porto taxi demand data. Taxi demand per location in Porto, Portugal, during 2013 and 2014. Aindo’s synthetic geolocation data is accurate at any level of detail, from global to very local.
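Accuracy at several scales can be probed by binning real and synthetic points into grids of different cell sizes and measuring how far apart the two distributions are at each size. The sketch below uses total variation distance over grid cells as an illustrative metric. The function names and the default cell sizes are assumptions made for this example, and both point sets are assumed to be non-empty.

```python
import math
from collections import Counter

def cell_shares(points, cell_deg):
    """Share of (lat, lon) points in each square grid cell of side cell_deg degrees."""
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def total_variation(real_points, synthetic_points, cell_deg):
    """0.0 means identical cell shares; 1.0 means no occupied cell in common."""
    p = cell_shares(real_points, cell_deg)
    q = cell_shares(synthetic_points, cell_deg)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))

def multiscale_report(real_points, synthetic_points, cell_sizes=(0.1, 0.01, 0.001)):
    """Distance between the two spatial distributions at each grid resolution."""
    return {
        size: total_variation(real_points, synthetic_points, size)
        for size in cell_sizes
    }
```

A dataset that matches only at coarse cell sizes captures citywide patterns but not neighborhood detail, so values that stay low as the cells shrink are what "accurate at any level of detail" would look like in this metric. With limited data, very small cells become noisy, so the finest sizes should be read with care.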

Aindo accurately synthesizes not only static location data but also time-series location data, which describes how people’s locations change over time. Thus, Aindo replicates not only the locations with high demand for taxi services, but also entire routes and trips. Synthetic trips have realistic origins and destinations, reflecting real traffic demands. They behave like real trips: they take the shortest paths to their destinations, have realistic durations and rates, and mimic real travel demand across locations. Furthermore, trips are linked to more static, non-location-based data, such as passengers’ reasons to travel.

Figure: Porto taxi trajectories. Aindo’s synthetic geolocation data also works for location time-series, such as taxi trips. It can combine dynamic geolocation data with more static information, such as passengers’ reasons for traveling.
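Whether a synthetic trip "behaves like a real trip" can be checked with a few per-trip statistics. The sketch below computes path length, duration, mean speed, and a detour ratio for a trajectory given as a list of `(lat, lon)` points. It rests on two assumptions: points are sampled at a fixed interval (15 seconds by default, which should be checked against your dataset), and the detour ratio against the straight-line distance stands in as a rough proxy for shortest-path behavior, since a true shortest-path check would need the road network. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def trip_stats(trajectory, sample_seconds=15):
    """Summary statistics for one trip sampled at a fixed interval (assumed)."""
    if len(trajectory) < 2:
        raise ValueError("a trip needs at least two points")
    path_km = sum(haversine_km(p, q) for p, q in zip(trajectory, trajectory[1:]))
    direct_km = haversine_km(trajectory[0], trajectory[-1])
    duration_s = (len(trajectory) - 1) * sample_seconds
    return {
        "path_km": path_km,
        "duration_s": duration_s,
        "mean_speed_kmh": path_km / (duration_s / 3600),
        # 1.0 means a perfectly straight trip; larger values mean longer detours.
        "detour_ratio": path_km / direct_km if direct_km > 0 else math.inf,
    }
```

Comparing the distributions of these statistics between real and synthetic trips gives a quick sanity check: synthetic trips with implausible speeds or much larger detour ratios than real ones would not be behaving like real trips.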

Conclusion

Analysis of geolocation data can be highly beneficial to society. Unfortunately, such data is highly sensitive, as it documents individuals’ whereabouts. Furthermore, it may contain bias, failing to represent all segments of society properly.

Aindo’s synthetic geolocation data reconciles privacy and fairness with geolocation data availability. As such, it is a key enabler of fair AI innovation in the energy, infrastructure, and policy-making sectors.

Aindo’s synthetic geolocation data is accurate at both global and local levels. It can be accurately combined with other data formats. Furthermore, it accurately preserves both static location information and dynamic location patterns, such as entire taxi trips.

Interested in learning more about synthetic data? Visit Aindo’s blog!

Footnotes

  1. New York City Airbnb Open Data. Retrieved from http://insideairbnb.com/

  2. Duflou, J. R., et al. (2016). Impact Reduction Potential by Usage Anticipation under Comfort Trade-Off Conditions. CIRP Annals, 65(1), 33-36.

  3. Aktas, D., Sörensen, K., & Vansteenwegen, P. (2022). A Demand-Responsive Public Bus System with Short-Cut Trips. In Proceedings of the 2022 Conference on Advanced Systems in Public Transport (CASPT 2022).

  4. Knuepfer, K., Esteban, M., & Shibayama, T. (2019). A spatial, high-resolution electricity simulation model for renewable energy integration and the concurrent presence of various vehicle technologies: a case study for Japan.

  5. Ahmad, F., Almarri, O., Shah, Z., & al-Fagih, L. (2023). Game theory applications in traffic management: A review of authority-based travel modelling. Travel Behaviour and Society, 32, 100585. doi:10.1016/j.tbs.2023.100585.

  6. University of California Irvine Machine Learning Repository. Retrieved from https://archive.ics.uci.edu/dataset/339/taxi+service+trajectory+prediction+challenge+ecml+pkdd+2015
