SambaNova breaks Llama 3 speed record with 1,000 tokens per second

by | May 29, 2024 | Technology


There is no single speedometer for measuring the speed of a generative AI model, but one of the leading approaches is to measure how many tokens per second a model handles.
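As a rough illustration of that metric, here is a minimal sketch of how a tokens-per-second measurement could be taken. The `generate` callable and the `fake_generate` stub are hypothetical stand-ins for a real model endpoint, not part of any vendor's API:

```python
import time

def measure_throughput(generate, prompt):
    """Time one generation call and report tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt)  # assumed to return a list of generated tokens
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stub generator standing in for a real model; a benchmark would
# call an actual inference endpoint here.
def fake_generate(prompt):
    return ["tok"] * 500

rate = measure_throughput(fake_generate, "Hello")
print(f"{rate:.0f} tokens/sec")
```

Real benchmarks, such as those run by Artificial Analysis, are more involved (they account for time to first token, request concurrency, and prompt length), but the core ratio of tokens generated to wall-clock time is the same.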

Today, SambaNova Systems announced that it has reached a new milestone in gen AI performance: 1,000 tokens per second with the Llama 3 8B parameter instruct model. Until now, the fastest claimed benchmark for Llama 3 was Groq's, at 800 tokens per second. The 1,000-tokens-per-second milestone was independently validated by testing firm Artificial Analysis. The faster speed has numerous enterprise implications that can translate into significant business benefits, such as faster response times, better hardware utilization and lower costs.

“We are seeing the AI chip race accelerate at a faster rate than most expected and we were excited to validate SambaNova’s claims in our benchmarks which were conducted independently and which focus on benchmarking real-world performance,” George Cameron, Co-Founder at Artificial Analysis, told VentureBeat. “AI developers now have more hardware options to choose from and it is particularly exciting for those with speed-dependent use-cases including AI agents, consumer AI applications which demand low response times and high volume document interpretation.”

How SambaNova uses software and hardware to accelerate Llama 3 and gen AI

SambaNova is an enterprise-focused gen AI vendor with both hardware and software assets.

