The future of AI is distributed, said Ion Stoica, co-founder, executive chairman and president of Anyscale, on the first day of VB Transform. And that's because the growth in model complexity shows no signs of slowing down.
“For the past couple of years, the compute requirements to train a state-of-the-art model, depending on the data set, grow between 10 times and 35 times every 18 months,” he said.
Just five years ago, the largest models fit on a single GPU; fast forward to today, and it takes hundreds or even thousands of GPUs just to fit the parameters of the most advanced models. PaLM, the Pathways Language Model from Google, has 540 billion parameters, and that's only about half of the largest models, which exceed 1 trillion parameters. The company used more than 6,000 TPU chips to train its most recent model.
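To see why the parameters alone no longer fit on one accelerator, consider a rough back-of-the-envelope sketch; the 16-bit weight precision and 80 GB of device memory below are illustrative assumptions, not figures from the talk.

```python
# Back-of-the-envelope: memory needed just to hold the weights of a
# PaLM-scale model. Assumptions (illustrative, not from the talk):
# 2-byte (fp16/bf16) weights and an accelerator with 80 GB of memory.
params = 540e9            # parameters in a PaLM-scale model
bytes_per_param = 2       # fp16/bf16 storage
device_memory_gb = 80     # high-end accelerator memory

weights_gb = params * bytes_per_param / 1e9
devices_for_weights = weights_gb / device_memory_gb

print(f"Weights alone: {weights_gb:,.0f} GB")                 # ~1,080 GB
print(f"Devices just to hold them: {devices_for_weights:.0f}")  # ~14

# Training also requires gradients, optimizer state and activations,
# which multiplies the footprint and pushes real training runs to
# hundreds or thousands of devices.
```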
Even if these models stopped growing and GPUs continued to improve at the same rapid rate as in previous years, it would still take about 19 years before a single GPU is powerful enough to run these state-of-the-art models, Stoica added.
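Stoica did not walk through the arithmetic on stage, but the 19-year figure is consistent with a simple compounding estimate. The sketch below assumes, purely for illustration, that single-GPU performance doubles every 18 months and that today's largest models need on the order of 6,000 chips; neither number is confirmed by the talk beyond the figures quoted above.

```python
import math

# Sketch of the "about 19 years" estimate.
# Assumptions (illustrative): single-GPU performance doubles every 18
# months while model compute demands stay frozen at today's level, and
# today's state-of-the-art model needs on the order of 6,000 chips.
gap = 6000                    # ratio of model demand to one GPU today
doubling_period_years = 1.5   # assumed GPU doubling period

years_to_close_gap = doubling_period_years * math.log2(gap)
print(f"{years_to_close_gap:.1f} years")   # ~18.8, i.e. roughly 19 years
```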
“Fundamentally, this is a huge gap, which is growing month by month, between the demands of machine learning applications and the capabilities of a single processor or a single server,” he said. “There’s no other way to support these workloads than distributing them. It’s as simple as that. Writing these distributed applications is hard. It’s even harder than before, actually.”
The unique challenges of sca …