Why synthetic data makes real AI better

by | Jun 27, 2022 | Technology

Data is precious. By some accounts, it has become the world’s most valuable commodity.

And when it comes to training artificial intelligence (AI) and machine learning (ML) models, it’s absolutely essential.

Still, due to various factors, high-quality, real-world data can be hard – sometimes even impossible – to come by.

This is where synthetic data becomes so valuable.

Synthetic data reflects real-world data, both mathematically and statistically, but it’s generated in the digital world by computer simulations, algorithms, statistical modeling, simple rules and other techniques. This is opposed to data that’s collected, compiled, annotated and labeled based on real-world sources, scenarios and experimentation.
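
As a rough illustration of the statistical modeling approach, the sketch below fits a normal distribution to a small set of "real" measurements and then samples new values from it. The function name `synthesize`, the height figures and the choice of a single-column Gaussian are all assumptions made for this example. Production generators are far more elaborate, and this is not any particular vendor's method.

```python
from statistics import NormalDist, mean, stdev


def synthesize(real_values, n, seed=0):
    """Fit a normal distribution to real_values and draw n synthetic samples.

    Hypothetical helper: assumes the real data is roughly Gaussian.
    """
    fitted = NormalDist.from_samples(real_values)  # estimates mean and stdev
    return fitted.samples(n, seed=seed)            # seeded for reproducibility


# Made-up "real" measurements (heights in cm), used only for illustration.
real_heights_cm = [162.0, 175.5, 168.2, 181.0, 158.7, 170.3, 177.8, 165.1]

synthetic_heights_cm = synthesize(real_heights_cm, n=5000)

# The synthetic sample should track the real data's summary statistics
# without repeating any individual's record.
print(f"real mean {mean(real_heights_cm):.1f}, stdev {stdev(real_heights_cm):.1f}")
print(f"synthetic mean {mean(synthetic_heights_cm):.1f}, "
      f"stdev {stdev(synthetic_heights_cm):.1f}")
```

The synthetic values share the mean and spread of the originals, which is the sense in which synthetic data "reflects" real data statistically. Capturing relationships between multiple columns would require a richer model than this one.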

The concept of synthetic data has been around since the early 1990s, when Harvard statistics professor Donald Rubin generated a set of anonymized U.S. Census responses that mirrored those of the original dataset (but without identifying respondents by home address, phone number or Social Security number).

Synthetic data came to be more widely used in the 2000s, particularly in the development of autonomous vehicles. Now, synthetic data is increasingly being applied to numerous AI and ML use cases.

Synthetic data vs. real data

Real-world data is almost always the best source of insights for AI and ML models (because, well, it’s real). That said, it is often unavailable, unusable because of privacy regulations and constraints, imbalanced or expensive. Bias can also introduce errors.

To this point, Gartner estimates that through 2022, 85% of AI projects will deliver erroneous outcomes.

“Real-world data is happenstance and d …
