Mostly AI is moving to address a major AI training bottleneck for enterprises. The Austrian company, known for providing a platform for synthetic data generation, today announced the launch of synthetic text. This new functionality allows enterprises to unlock value from their proprietary datasets without worrying about privacy risks.
Starting today, the offering generates a synthetic version of an organization’s proprietary information that contains no personally identifiable information (PII) and is free of diversity gaps. This gives teams a way to train and fine-tune reliable large language models (LLMs) for faster innovation and better decision-making.
The capability comes at a time when AI training is hitting a plateau and enterprises are looking beyond public data for sources that could offer greater value than the public data that remains.
How does Synthetic Text work?
Synthetic, or artificially generated, data is often seen as the go-to alternative when real data is too expensive, unavailable, imbalanced or unusable. Enterprises have been producing and working with synthetic information (mostly images) for quite some time, but the rise of generative AI is expected to propel its application to a whole new level, covering a wider range of data types. According to Gartner, by 2026, 75% of companies will use gen AI to create synthetic data, up from less than 5% in 2023.
However, even AI-generated synthetic data may lack organization-specific context and insights. This could keep downstream models from learning and performing as expected.
To address this, Mostly AI provides enterprises with a platform to train their own AI generators that can produce synthetic data on the fly. The company started off by enabling the generation of structured tabular datasets, capturing nuances of tr …