Microsoft has unveiled a groundbreaking artificial intelligence model, GRIN-MoE (Gradient-Informed Mixture-of-Experts), designed to enhance scalability and performance in complex tasks such as coding and mathematics. The model promises to reshape enterprise applications by selectively activating only a small subset of its parameters at a time, making it both efficient and powerful.
GRIN-MoE, detailed in the research paper “GRIN: GRadient-INformed MoE,” uses a novel approach to the Mixture-of-Experts (MoE) architecture. By routing tasks to specialized “experts” within the model, GRIN achieves sparse computation, allowing it to utilize fewer resources while delivering high-end performance. The model’s key innovation lies in using SparseMixer-v2 to estimate the gradient for expert routing, a method that significantly improves upon conventional practices.
“The model sidesteps one of the major challenges of MoE architectures: the difficulty of traditional gradient-based optimization due to the discrete nature of expert routing,” the researchers explain. GRIN-MoE’s architecture comprises 16 experts of 3.8 billion parameters each, yet it activates only 6.6 billion parameters during inference, striking a balance between computational efficiency and task performance.
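The sparse-activation idea described above — score all experts, but run only a few per token — can be sketched in plain Python. This is an illustrative top-k MoE routing example, not Microsoft’s implementation: GRIN-MoE’s contribution is SparseMixer-v2, which estimates gradients through the discrete routing step during training, whereas the sketch below shows only the inference-time selection and mixing. All names, shapes, and the top-2 choice are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, experts, router_weights, top_k=2):
    """Route input vector x to the top_k highest-scoring experts and
    combine their outputs, weighted by the renormalized gate values.
    Only top_k experts execute, so compute stays sparse even though
    the total parameter count (all experts) is much larger."""
    # One router score per expert: dot product of x with that expert's router row.
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in router_weights]
    gates = softmax(scores)
    # Keep only the top_k experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)  # run only the selected expert
        weight = gates[i] / norm  # renormalize gates over the selected subset
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out, top
```

With 16 experts and top-2 routing, a layer like this evaluates only 2/16 of its expert parameters per token — the same principle by which GRIN-MoE keeps 6.6 billion of its parameters active out of a much larger total.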
GRIN-MoE outperforms competitors in AI benchmarks
In benchmark tests, Microsoft’s GRIN-MoE has shown remarkable performance, outclassing models of similar or larger sizes. It scored 79.4 on MMLU (Massive Multitask Language Understanding) and 90.4 on GSM-8K, a test of math problem-solving capabilities. Notably, the model earned a score of 74.4 on HumanEval, a benchmark for coding tasks, surpassing popular models such as GPT-3.5-turbo.
GRIN MoE outshines comparable models such as Mixtra …