What AI and power plants have in common

Jul 15, 2022 | Technology


The story of artificial intelligence (AI) development over the past five years has been dominated by scale. Huge progress has been made in natural language processing (NLP), image understanding, voice recognition and more by taking strategies developed in the mid-2010s and putting more computing power and more data behind them. This has brought about an interesting power dynamic in the usage and distribution of AI systems, one that makes AI look a lot like the electrical grid.

For NLP, bigger really is better

The current state of the art in NLP is powered by neural networks with billions of parameters trained on terabytes of text. Simply holding these networks in memory requires multiple cutting-edge GPUs, and training them requires supercomputer clusters well beyond the reach of all but the largest organizations.
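To make the memory claim concrete, here is a back-of-the-envelope sketch, not anything from the article itself: the 175-billion-parameter figure is GPT-3's published size, while fp16 weights and 80 GB GPUs are illustrative assumptions.

```python
# Rough memory footprint of a large language model's weights.
# 175e9 parameters is GPT-3's published size; fp16 precision and
# 80 GB GPUs are illustrative assumptions, not from the article.
params = 175e9            # number of parameters
bytes_per_param = 2       # fp16: 2 bytes per parameter

weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.0f} GB")            # ~350 GB

gpu_memory_gb = 80        # e.g., an 80 GB NVIDIA A100
print(f"GPUs just to hold them: {weights_gb / gpu_memory_gb:.1f}")  # ~4.4
```

Even before a single training step, the weights alone overflow several top-end accelerators, which is why both training and serving these models concentrates in a handful of large organizations.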

One could, using the same techniques, train a much smaller neural network on much less text, but the performance would be far worse. So much worse, in fact, that it becomes a difference in kind rather than merely a difference of degree; there are tasks, such as text classification, summarization and entity extraction, at which large language models excel and small language models perform no better than chance.
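To see what that difference in kind looks like in practice, here is a minimal sketch of zero-shot text classification with a large pretrained model via the Hugging Face transformers library; the model and labels are illustrative choices, not something the article specifies.

```python
# Zero-shot classification with a large pretrained model.
# Requires: pip install transformers torch
# The model choice and candidate labels are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The central bank raised interest rates by half a point.",
    candidate_labels=["finance", "sports", "weather"],
)
print(result["labels"][0], round(result["scores"][0], 3))  # top label
```

The large model labels text it was never explicitly trained to classify; a small model trained with the same techniques on a sliver of the data has no comparable capability.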

As someone who has been working with neural networks for about a decade, I am genuinely surprised by this development. It’s not obvious from a technical standpoint that increasing the number of parameters in a neural network would lead to such a drastic improvement in capability. However, here we are in 2022, training neural networks nearly identical to architectures first published in 2017, but with orders of magnitude more compute, and getting better results. 
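For reference, the architectures in question are the Transformer family from the 2017 paper "Attention Is All You Need," whose core operation, scaled dot-product attention, fits in a few lines. The NumPy sketch below, with toy shapes chosen purely for illustration, shows that the math itself is unchanged; what changed by 2022 is the parameter count and the compute behind it.

```python
# Scaled dot-product attention, the core block of the 2017 Transformer.
# A toy NumPy sketch with illustrative shapes; modern large models stack
# this same operation with orders of magnitude more parameters.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, 8 dims
print(attention(Q, K, V).shape)                        # (4, 8)
```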

This points to a new and interesting dynamic in the field. State-of-the-art models are too computationally expensive for nearly any company …
