Have you had trouble with airplane seats because you’re too tall? Or maybe you haven’t been able to reach the top shelf at the supermarket because you’re too short? Either way, the cause is the same: nearly all of these things are designed with the average person’s height in mind, which is 170cm (5’ 7″).
In fact, nearly everything in our world is designed around averages.
Most businesses only work with averages because they fit the majority of cases. They allow companies to reduce production costs and maximize profits. However, there are many scenarios where covering 70-80% of cases isn’t enough. We as an industry need to understand how to tackle the remaining cases effectively.
In this article, we’ll talk about the challenges of working with small data in two particular cases: when a dataset has few entries overall, and when the data of interest is a poorly represented subset of a larger, biased dataset. You’ll also find practical tips on how to approach these problems.
What is small data?
It’s important to understand the concept of small data first. Small data, as opposed to big data, comes in volumes small enough that humans can often comprehend it directly. Small data can also be a subset of a larger dataset that describes a particular group.
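To make the second meaning concrete, here is a minimal Python sketch of how a small subset can hide inside a larger dataset. Everything in it is an assumption made for illustration: the records, the group labels, the `group_sizes` and `small_groups` helpers, and the 10% threshold are hypothetical and do not come from any real dataset or library.

```python
from collections import Counter

# Hypothetical records: (group label, height in cm). All values are made up.
typical_heights = [165, 168, 170, 171, 169, 172, 174, 167, 170,
                   173, 166, 175, 170, 169, 171, 168, 172, 170]
records = [("typical", h) for h in typical_heights]
records += [("very_tall", 198), ("very_short", 149)]


def group_sizes(rows):
    """Count how many records belong to each group label."""
    return Counter(label for label, _ in rows)


def small_groups(rows, min_share=0.1):
    """Return groups whose share of all records falls below min_share.

    The 10% default is an arbitrary illustrative threshold, not a standard.
    """
    counts = group_sizes(rows)
    total = sum(counts.values())
    return {label: n for label, n in counts.items() if n / total < min_share}


if __name__ == "__main__":
    print(group_sizes(records))
    print(small_groups(records))
```

In this toy dataset the "typical" group dominates, so any average computed over all records mostly reflects that group. The two remaining groups have one record each, which is exactly the kind of small data that sits inside a bigger dataset.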