Google Gemini-Exp-1206: The New Best LLM
Beats GPT-4o, OpenAI o1, Claude 3.5 Sonnet, and Gemini 1.5 on LMArena
Google Gemini, after a lacklustre debut in the GenAI space some months back, is picking up pace fast. Google has now released another experimental model, Gemini-Exp-1206, which has outperformed every other model on the LMArena leaderboard, putting it at the front of the GenAI race.
Gemini-Exp-1206 takes the top spot on LMArena, a prestigious platform for LLM ranking, as visible in the image below.
What is LMArena?
LMArena, or Chatbot Arena, is an open-source platform for assessing large language models (LLMs). Developed by LMSYS and UC Berkeley SkyLab, it aims to support community-driven evaluations of LLM performance through live tests and direct comparisons.
Understanding the Leaderboard
1. Arena Score: The Arena Score represents the average performance of a model across various tasks, with higher scores indicating superior overall capabilities. For instance, Gemini-Exp-1206 has an Arena Score of 1379, the highest on the board, slightly surpassing ChatGPT-4o's score of 1366, suggesting it performs better on average in evaluations. It even outperforms its predecessor, Gemini-Exp-1114.
2. Rank (StyleCtrl and UB): Rank (UB) is the model's rank based on the upper bound of its rating's confidence interval. Rank (StyleCtrl) is its rank after controlling for stylistic factors such as response length and formatting, which isolates answer quality from presentation.
Notably, Gemini-Exp-1206 holds rank 1 in both cases, surpassing ChatGPT-4o-latest.
3. Votes: This metric indicates the number of evaluations each model has received on LMArena. ChatGPT-4o leads with 21,929 votes, significantly more than Gemini-Exp-1206's 5,052. A higher vote count often suggests greater reliability, since the rating rests on more extensive testing and user engagement.
4. 95% Confidence Interval (CI): The confidence interval gives the range within which a model's true score is expected to lie with 95% confidence. For Gemini-Exp-1206 it is +10/-5, whereas for ChatGPT-4o it is +4/-5. A narrower interval indicates a more settled estimate; so while Gemini scores slightly higher on average, ChatGPT-4o's rating is the more stable of the two.
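Arena Scores are Elo-style ratings, so a score gap translates into an expected head-to-head win rate. As a rough illustration (assuming the standard Elo formula with a 400-point scale, which Chatbot Arena's ratings are modelled on), the 13-point gap between 1379 and 1366 implies only a slim edge:

```python
# Expected head-to-head win probability under the standard Elo model.
# Assumption: LMArena's Arena Scores behave like Elo ratings on a 400-point scale.

def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

p = elo_win_probability(1379, 1366)  # Gemini-Exp-1206 vs ChatGPT-4o
print(f"Expected win rate for the higher-rated model: {p:.1%}")  # ~51.9%
```

In other words, a 13-point lead corresponds to roughly a 52% expected win rate in direct comparisons, so the two models are nearly evenly matched.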
What are Gemini Experimental Models?
Gemini Experimental Models are cutting-edge prototypes designed for testing and feedback. They give developers early access to Google’s latest AI advancements and showcase ongoing innovation.
These models are temporary, can be replaced without notice, and might not evolve into stable versions. As such, they’re unsuitable for production use.
How to use Gemini-Exp-1206 for free?
- Just navigate to Google AI Studio and log in (free)
- Go to Create prompt
- In the model settings, select Gemini Experimental 1206
- Start chatting
Conclusion
While the results are impressive, it’s important to remember that this is still an experimental model. The full potential will become clear with time. It’s exciting to witness such strong competition, and the prospect of a stable release is something to look forward to.