Inference framework Archon promises to make LLMs quicker, without additional costs

by | Oct 1, 2024 | Technology


Researchers from Stanford University's Scaling Intelligence Lab introduced a new inference framework that could help large language models (LLMs) go through potential responses faster.

The framework, Archon, uses an inference-time architecture search (ITAS) algorithm to improve LLM performance without additional training. It is model-agnostic, open-source and designed to be plug-and-play for both large and small models.

Archon could ideally help developers design AI model systems that apply multiple inference-time techniques, cutting down on the model calls needed to arrive at a response. The Scaling Intelligence Lab said techniques like Archon would help reduce the costs of building models and running inference. As LLM development turns toward larger parameter counts and more advanced reasoning, those costs could rise, even as companies like OpenAI anticipate greater affordability.

According to the researchers, Archon automatically designs architectures that improve task generalization, enabling models to perform tasks beyond those they were initially trained on.

“Our Archon framework and ITAS algorithm draw inspiration from neural architectures and neural architecture search, respectively,” the researchers said in their paper. “Archon is constructed of layers of LLMs, in which models in the same layer run in parallel but each layer runs sequentially.”

These layers perform different inference-time techniques, “either transforming the number of candidate responses through generation and fusion (like linear transformations) or reducing the number of candidate responses to improve quality (like non-linearities).” 
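To make the layered design concrete, here is a minimal, hypothetical Python sketch of how such an inference-time architecture could be wired together. The layer builders, scoring function and fusion function below are placeholders chosen for illustration; they are not the Archon library's actual API.

```python
# Illustrative sketch of a layered inference-time architecture (not the Archon API).
# Models within a layer operate over the same candidate pool; layers run
# sequentially, either expanding the pool (generation/fusion of new candidates)
# or shrinking it (ranking/filtering) before the next layer runs.

from typing import Callable, List

Candidate = str
Layer = Callable[[str, List[Candidate]], List[Candidate]]


def generate_layer(models: List[Callable[[str], Candidate]]) -> Layer:
    """Generation layer: each model proposes a candidate, expanding the pool."""
    def run(prompt: str, candidates: List[Candidate]) -> List[Candidate]:
        return candidates + [model(prompt) for model in models]
    return run


def rank_layer(score: Callable[[str, Candidate], float], keep: int) -> Layer:
    """Ranking layer: keep only the top-scoring candidates, shrinking the pool."""
    def run(prompt: str, candidates: List[Candidate]) -> List[Candidate]:
        return sorted(candidates, key=lambda c: score(prompt, c), reverse=True)[:keep]
    return run


def fuse_layer(fuse: Callable[[str, List[Candidate]], Candidate]) -> Layer:
    """Fusion layer: merge the remaining candidates into a single response."""
    def run(prompt: str, candidates: List[Candidate]) -> List[Candidate]:
        return [fuse(prompt, candidates)]
    return run


def run_architecture(layers: List[Layer], prompt: str) -> Candidate:
    """Apply the layers sequentially, threading the candidate pool through each."""
    candidates: List[Candidate] = []
    for layer in layers:
        candidates = layer(prompt, candidates)
    return candidates[0]


if __name__ == "__main__":
    # Toy stand-ins for LLM calls so the sketch runs end to end.
    model_a = lambda p: f"Answer A to: {p}"
    model_b = lambda p: f"Answer B to: {p}"
    toy_score = lambda p, c: len(c)            # placeholder critic/ranker
    toy_fuse = lambda p, cs: max(cs, key=len)  # placeholder fusion step

    architecture = [
        generate_layer([model_a, model_b]),  # parallel candidate generation
        rank_layer(toy_score, keep=2),       # reduce to the best candidates
        fuse_layer(toy_fuse),                # fuse into one final response
    ]
    print(run_architecture(architecture, "What is inference-time search?"))
```

In this reading, ITAS would search over which layers to stack and which models to place in each one, trading extra inference-time compute for better final responses.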

Archon outperformed GPT-4o and Claude 3.5 Sonnet by 15.1 percentage points on benchmarks including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH and CodeContests. Against open-source LLMs, it outperformed them by 11.2 percentage points.

Archon components 

The ITAS algorithm comprises several LLM component …

