Why data remains the greatest challenge for machine learning projects

by | Nov 8, 2022 | Technology

Quality data is at the heart of the success of enterprise artificial intelligence (AI). Accordingly, it remains the main source of challenges for companies that want to apply machine learning (ML) in their applications and operations.

The industry has made impressive advances in helping enterprises overcome the barriers to sourcing and preparing their data, according to Appen’s latest State of AI Report. But there is still a lot more to be done at different levels, including organization structure and company policies.

The costs of data

The enterprise AI life cycle can be divided into four stages: data sourcing, data preparation, model testing and deployment, and model evaluation.

Advances in computing and ML tools have helped automate and accelerate tasks such as training and testing different ML models. Cloud computing platforms make it possible to train and test dozens of different models of different sizes and structures simultaneously. But as machine learning models grow in number and size, they will require more training data.

Unfortunately, obtaining and annotating training data still requires considerable manual effort and is largely application specific. Appen’s report lists the obstacles as “lack of sufficient data for a specific use case, new machine learning techniques that require greater volumes of data, or teams don’t have the right processes in place to easily and efficiently get the data they need.”

“High-quality training data is required for accurate model performance; and large, inclusive datasets are expensive,” Appen’s chief product officer Sujatha Sagiraju told VentureBeat. “However, it’s important to note that valuable AI data can increase the chances of your project going from pilot to production; so, the expense is needed.”

ML teams can start with prelabeled datasets, but they will eventually need to collect and label their own custom data to scale their efforts. De …
