Alibaba Marco-o1: An open-source alternative to OpenAI-o1
An LLM for open-ended problems
OpenAI-o1 was a revolutionary release, enabling LLMs to perform detailed reasoning on complex tasks like math or physics problems rather than limiting GenAI to generic text generation.
But OpenAI-o1 is a paid model.
As anticipated, an open-source rival was not far behind, and it was Alibaba again (after the Qwen series) that recently released Marco-o1.
What is Marco-o1 by Alibaba?
Marco-o1 is designed to tackle complex reasoning tasks by employing advanced techniques such as Monte Carlo Tree Search (MCTS) and Chain-of-Thought (CoT) fine-tuning. Its primary focus is on generating multiple solutions for open-ended problems rather than settling for a single answer, which aligns more closely with human-like reasoning processes.
Marco-o1 isn’t just built for subjects with clear answers, like math, physics, or coding, where it’s easy to measure success using reinforcement learning (RL). It also targets open-ended problems where there are no fixed rules or obvious ways to judge success.
For example:
Imagine Marco-o1 as a super-smart problem solver. Most models (like in math, physics, or coding) are good at questions with fixed answers — like “What’s 2+2?” or “How do I fix this code?” These are straightforward because we know when they’re right or wrong.
But Marco-o1 goes beyond that. It also tries to tackle messy, open-ended problems, like “How can we make the world happier?” or “What’s the best way to tell this story?” There’s no single “correct” answer here, and it’s hard to say exactly what makes an answer good or bad.
Key features
- Open-Ended Reasoning: Unlike traditional models that aim for a single definitive answer, Marco-o1 explores various potential solutions, making it particularly effective for ambiguous or complex queries.
- Monte Carlo Tree Search (MCTS): This technique allows the model to evaluate numerous possible paths to a solution, akin to how a chess player considers different moves before making a decision. MCTS balances exploration of new possibilities with exploitation of known successful strategies.
- Chain-of-Thought Fine-Tuning: By utilizing a combination of existing datasets and self-generated synthetic data, Marco-o1 has improved its ability to handle intricate tasks through structured reasoning steps.
- Reflection Mechanism: The model incorporates a self-reflection component that prompts it to reassess its reasoning after generating an answer, which enhances its problem-solving capabilities (a small sketch follows this list).
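For intuition, here is a minimal sketch of what such a reflection pass could look like during generation. The generate helper is a hypothetical placeholder, and the exact reflection wording is as described in the Marco-o1 report, not verified against the released code:

def generate_with_reflection(generate, prompt):
    # 'generate' is a placeholder for any text-generation call (hypothetical).
    first_attempt = generate(prompt)
    # Append a reflection cue so the model re-examines its own reasoning;
    # the Marco-o1 report describes a phrase along these lines.
    reflection_cue = "Wait! Maybe I made some mistakes! I need to rethink from scratch."
    return generate(prompt + first_attempt + "\n" + reflection_cue)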
How does Marco-o1 work?
1. Dataset Sources:
The model is trained on three datasets:
- Open-o1 CoT Dataset (filtered): A refined dataset with chain-of-thought (CoT) annotations.
- Marco-o1 CoT Dataset (synthetic): Synthetic data created to enhance learning.
- Marco-o1 Instruction Dataset: A dataset focused on instructional prompts.
These datasets are combined and used for supervised fine-tuning to improve the model’s reasoning and response capabilities. It basically uses the Qwen2-7B model as the base, fine-tuned on the above datasets.
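The exact record schema isn’t published alongside the weights, but a CoT-style fine-tuning sample plausibly looks something like this (field names and tags are illustrative assumptions):

sample = {
    "instruction": "A train travels 120 km in 2 hours. What is its average speed?",
    # Reasoning first, final answer last - the structure CoT fine-tuning teaches.
    "output": "<Thought>Speed is distance over time: 120 km / 2 h = 60 km/h.</Thought>"
              "<Output>60 km/h</Output>",
}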
2. Inference Process:
During inference (when the model is generating answers):
MCTS (Monte Carlo Tree Search):
- A tree structure is used to explore possible answers step by step.
- Nodes represent different reasoning paths.
- Yellow nodes are selected for further exploration.
- Green nodes show the final answers.
- Arrows like “Select” and “Backup” show how the system evaluates and refines choices.
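To make the Select/Expand/Backup cycle concrete, here is a minimal, generic MCTS loop in Python. It is a sketch, not Alibaba’s implementation: expand and simulate are placeholders for sampling continuations from the LLM and scoring them (e.g., with the confidence score described below):

import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # partial chain of reasoning steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # accumulated reward (e.g., confidence scores)

    def ucb(self, c=1.4):
        # Upper Confidence Bound: trade off exploitation vs. exploration.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root, expand, simulate, iterations=100):
    for _ in range(iterations):
        # Select: walk down the tree by highest UCB score.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # Expand: add candidate next reasoning steps (LLM samples in Marco-o1).
        for state in expand(node.state):
            node.children.append(Node(state, parent=node))
        if node.children:
            node = random.choice(node.children)
        # Simulate: roll out to an answer and score it.
        reward = simulate(node.state)
        # Backup: propagate the reward up to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).state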
Action Strategy: The model can operate at two levels:
- Step Level: High-level reasoning (big picture).
- Mini-Step Level: Detailed, smaller reasoning steps (see the sketch below).
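The difference in granularity is easy to picture as how finely a sampled continuation is chopped into tree nodes. The report describes mini-steps of 32 or 64 tokens; the snippet below is a toy illustration, not the actual implementation:

def mini_steps(tokens, size=64):
    # Mini-step level: each fixed-size window of tokens becomes its own
    # tree node, giving the search finer-grained branching points.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

# Step level would instead treat a whole reasoning step as a single node.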
Confidence Score:
- After generating an answer, the model calculates how confident it is in the result from the token probabilities (a reconstruction of the formula follows below).
- The confidence value helps refine the final output.
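As a rough reconstruction of that formula (per the Marco-o1 report, as I read it): each generated token’s confidence is the softmax of its log-probability against the top-5 candidate tokens at that position, and the rollout’s value v is the average over all tokens. A minimal sketch with made-up log-probs:

import math

def token_confidence(chosen_logprob, top5_logprobs):
    # Softmax of the chosen token's log-prob against the top-5 candidates.
    return math.exp(chosen_logprob) / sum(math.exp(lp) for lp in top5_logprobs)

def rollout_value(per_token):
    # per_token: list of (chosen_logprob, top5_logprobs) pairs for one rollout.
    # The node's value v is the mean per-token confidence.
    return sum(token_confidence(lp, top5) for lp, top5 in per_token) / len(per_token)

# Toy example with placeholder log-probs (not real model outputs):
example = [(-0.1, [-0.1, -2.3, -3.0, -4.1, -5.2]),
           (-0.5, [-0.5, -1.0, -2.7, -3.3, -4.8])]
print(rollout_value(example))  # ~0.70 for these made-up numbers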
Performance and Metrics
Not many comparison metrics are reported, except on MGSM (Multilingual Grade School Math) for English and Chinese:
- MGSM Dataset (English): Marco-o1 achieved a 6.17% increase in accuracy.
- MGSM Dataset (Chinese): The model demonstrated a 5.60% increase in accuracy.
Acing Language Translation
In translation tasks, Marco-o1 proves its skill at handling slang and casual language. For instance, it can translate “这件衣服太炸了” (which literally means “This outfit is explosive”) into “This outfit is amazing,” showing its ability to understand and convey colloquial nuances.
How to use Marco-o1?
Go straight to Hugging Face, where everything is ready for you. Just set your HF_TOKEN (a read token, free to create) and get started using the code below.
# Load the tokenizer and model directly from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
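Loading is only half the story; below is a minimal generation sketch reusing the slang-translation example from earlier. The prompt and decoding settings are illustrative assumptions (Marco-o1 is Qwen2-based, so the tokenizer ships a chat template):

# Build a chat-formatted prompt; the question itself is just an example.
messages = [{"role": "user",
             "content": "Translate the colloquial sentence '这件衣服太炸了' into English."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))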
Wrapping up, Marco-o1 by Alibaba stands out as an open-source alternative to OpenAI-o1, excelling in complex reasoning and open-ended problem-solving. With features like MCTS, CoT fine-tuning, and self-reflection, it handles nuanced tasks such as translation and ambiguous queries effectively. While more performance benchmarks would strengthen its case, Marco-o1’s accessibility and practical applications make it a promising tool for advanced reasoning tasks.