LMSYS launches ‘Multimodal Arena’: GPT-4 tops leaderboard, but AI still can’t out-see humans

Jun 28, 2024 | Technology


The LMSYS organization launched its “Multimodal Arena” today, a new leaderboard comparing AI models’ performance on vision-related tasks. The arena collected more than 17,000 user preference votes across over 60 languages in just two weeks, offering a glimpse into the current state of AI visual processing capabilities.
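Arena-style leaderboards like this one typically aggregate pairwise preference votes into Elo-style ratings (an approach LMSYS has described for its Chatbot Arena). As a rough illustration only, not LMSYS’s actual pipeline, the core update can be sketched in a few lines of Python; the model names and K-factor below are hypothetical:

```python
from collections import defaultdict

def elo_update(r_a, r_b, winner, k=32):
    """Standard Elo update for one pairwise vote.
    winner is 'a', 'b', or 'tie'."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

def rank(votes, base=1000.0):
    """Fold a stream of (model_a, model_b, winner) votes into a
    ratings table, then sort descending by rating."""
    ratings = defaultdict(lambda: base)
    for a, b, w in votes:
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], w)
    return sorted(ratings.items(), key=lambda kv: -kv[1])
```

For example, two votes favoring the same model push it to the top of the table: `rank([("gpt-4o", "claude-3.5", "a"), ("gpt-4o", "gemini-1.5", "a")])`. Production systems refine this basic scheme (e.g., with Bradley-Terry fitting and confidence intervals), but the principle of turning head-to-head votes into a single ranking is the same.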

“Exciting News — we are thrilled to announce Chatbot Arena’s Vision Leaderboard! Over the past 2 weeks, we’ve collected 17K+ votes across diverse use cases. Highlights: GPT-4o leads the way, followed by Claude 3.5 Sonnet in #2 and Gemini 1.5 Pro in #3. Open model…” — lmsys.org (@lmsysorg), June 28, 2024

OpenAI’s GPT-4o model secured the top position in the Multimodal Arena, with Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro following closely behind. This ranking reflects the fierce competition among tech giants to dominate the rapidly evolving field of multimodal AI.

Notably, the open-source model LLaVA-v1.6-34B achieved scores comparable to some proprietary models, such as Claude 3 Haiku. This development signals a potential democratization of advanced AI capabilities, which could level the playing field for researchers and smaller companies lacking the resources of major tech firms.

The leaderboard encompasses a diverse range of tasks, from image captioning and mathematical problem-solving to document understanding and meme interpretation. This breadth aims to provide a holistic view of each model’s visual processing prowess, reflecting the complex demands of real-world applications.


Reality check: AI still struggles with complex visual reasoning

While the Multimodal Arena offers valuable insights, it primarily measures user preference rather than objective accuracy. A more sobering picture emerges from the recently introduced CharXiv benchmark, developed by Princeton University researchers to assess AI performance in understanding charts from scientific papers.


