HuggingFace smolagents: The best Multi-Agent framework so far?
- Rifx.Online
- Programming , Machine Learning , Chatbots
- 14 Jan, 2025
Comparing AutoGen, LangGraph, CrewAI, Magentic-One, etc.
As you must have read in multiple places, 2025 is the year of AI agents. So much so that Mark Zuckerberg has openly said Meta will have AI agents doing the work of mid-to-senior-level engineers.
This is clear from the past few releases: Microsoft now has three multi-agent orchestration frameworks (AutoGen, Magentic-One, TinyTroupe), OpenAI has released Swarm, and AWS has launched multi-agent-orchestrator, in addition to the standalone LangGraph and CrewAI. Now even Hugging Face has joined the party by launching “smolagents”, yet another multi-agent framework, but with a difference.
What are smolagents?
Smolagents is a newly launched agent framework by Hugging Face, designed to simplify the creation of intelligent agents that leverage large language models (LLMs). This lightweight library enables developers to build agents with minimal code, focusing on practicality and ease of use.
Key Features of Smolagents
Simplicity: Smolagents allows for rapid prototyping and deployment with a straightforward coding approach, making it accessible even for those with limited experience in creating agents.
Code-Centric Agents: The framework supports agents that execute actions directly as Python code. This method often results in higher accuracy and efficiency compared to traditional tool-based agents, which may require more steps.
LLM Compatibility: Smolagents is designed to be LLM-agnostic, meaning it can integrate seamlessly with any LLM available on the Hugging Face Hub, as well as other popular models through its LiteLLM integration.
Agentic Capabilities: The framework allows LLMs to control workflows by writing actions that can be executed via external tools. This flexibility enhances the ability of agents to tackle complex tasks that do not fit into predefined workflows.
Security Features: Smolagents includes mechanisms for executing code in sandboxed environments, which reduces the risk of running potentially unsafe generated code.
More on Code-Centric Agents
This is what sets smolagents apart from other frameworks. But let’s first understand how agents take actions in other frameworks.
Agents in other agentic frameworks:
In most other frameworks, agents perform tasks by calling tools using JSON-like formats. Here’s how it typically works:
- Action Representation: The agent writes actions as structured JSON. Each action includes:
  - Tool name: What tool to use (e.g., “search”, “translate”).
  - Arguments: Parameters required by the tool (e.g., “query”: “current weather”).
- Execution Process:
  - The framework reads this JSON, figures out which tool to use, and runs the tool with the given arguments.
  - Once the tool returns a result, the agent processes it and decides the next action.
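The process described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular framework: the `TOOLS` registry, the `search` and `translate` stand-in functions, and the `dispatch` helper are all hypothetical names.

```python
import json

# Hypothetical stand-in tools; a real framework would call external services.
def search(query: str) -> str:
    return f"results for '{query}'"

def translate(text: str, target: str) -> str:
    return f"[{target}] {text}"

# Registry mapping tool names to callables (assumed structure).
TOOLS = {"search": search, "translate": translate}

def dispatch(action_json: str) -> str:
    """Parse one JSON action and run the matching tool with its arguments."""
    action = json.loads(action_json)
    tool = TOOLS.get(action["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {action['tool']}")
    return tool(**action["args"])

# One JSON action, as the LLM might emit it.
observation = dispatch('{"tool": "search", "args": {"query": "current weather"}}')
print(observation)
```

Every action goes through this parse-lookup-call cycle, and the observation is handed back to the LLM before it can decide on the next action.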
How Smolagents Are Different
Instead of using JSON, Smolagents lets the agent write actual Python code to perform actions. Here’s why this is different and better:
Action as Code: In smolagents, the agent directly generates Python code to execute actions.
Direct Execution: The framework runs this code directly, without needing to translate JSON into tool calls. This can make execution faster, simpler, and more accurate.
Example:
Assume you pass this command to the agentic framework:
Generate a sunset image and display
In other frameworks, this becomes a more complex pipeline, because the two actions are taken separately.
Other frameworks (usually)
Step 1: Generate Image
The agent sends a JSON request to generate the image:
{
  "tool": "generate_image",
  "args": { "prompt": "sunset over mountains" }
}
Step 2: Display Image
After the image is generated, the agent sends another action in JSON to display it:
{
  "tool": "display_image",
  "args": { "image": "generated_image.png" }
}
Each step involves parsing the JSON, figuring out the tool to call, and executing it. The agent cannot easily chain these actions together or reuse the result.
With smolagents, the agent simply writes one piece of code that does everything together:
image = generate_image("sunset over mountains")
display(image)
Hence, a single action instead of multiple calls, and less complexity.
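To see why one code action can replace two JSON round trips, here is a toy runner that executes a generated snippet with the tools exposed as plain Python functions. It is a sketch under stated assumptions: `generate_image` and `display` are stubs, `run_code_action` is a hypothetical helper, and the bare `exec` used here has none of the safeguards of the smolagents interpreter.

```python
# Stub tools (hypothetical): record what happened instead of producing real images.
displayed = []

def generate_image(prompt: str) -> dict:
    return {"prompt": prompt, "file": "generated_image.png"}

def display(image: dict) -> None:
    displayed.append(image["file"])

def run_code_action(code: str, tools: dict) -> dict:
    """Execute a generated snippet with only the given tools in scope.

    Toy illustration only: plain exec() is NOT a sandbox.
    """
    namespace = {"__builtins__": {}, **tools}
    exec(code, namespace)
    return namespace

# The two-line code action from the example, as the LLM would write it.
action = 'image = generate_image("sunset over mountains")\ndisplay(image)'
state = run_code_action(action, {"generate_image": generate_image, "display": display})
print(displayed)
```

The intermediate result (`image`) lives in an ordinary variable, so the second call can reuse it without another trip through the LLM.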
Why Smolagents is Better
Less Overhead: No need to convert JSON into tool calls; the framework just runs the code.
Greater Flexibility: Python code can express complex logic (loops, conditions, custom functions) that a single JSON tool call cannot.
Better Alignment with LLM Training: Since LLMs are heavily trained on Python code, they tend to perform better when generating actions as code rather than JSON.
Simplified Execution: The agent can handle a task in one go, without extra steps to interpret JSON.
Broad Compatibility: It works with local LLMs as well as API-hosted models.
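The flexibility point is easiest to see with a task that needs a loop and a condition. In the sketch below, `get_temperature` is a hypothetical tool with made-up readings, and `hot_cities` is the kind of code an agent could emit as a single action. A JSON-based agent would need one tool call per city, plus further reasoning steps to compare the results.

```python
# Hypothetical tool with made-up data, standing in for a real weather API.
def get_temperature(city: str) -> float:
    readings = {"Paris": 18.0, "Cairo": 33.5, "Oslo": 9.0, "Delhi": 38.0}
    return readings[city]

def hot_cities(cities: list[str], threshold: float) -> list[str]:
    """One code action: loop over the inputs, call the tool, filter, and sort."""
    hot = []
    for city in cities:
        temp = get_temperature(city)
        if temp > threshold:  # condition evaluated in code, not by another LLM step
            hot.append((temp, city))
    # Hottest first
    return [city for temp, city in sorted(hot, reverse=True)]

print(hot_cities(["Paris", "Cairo", "Oslo", "Delhi"], threshold=30.0))
```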
smolagents cons
Though the framework is great, there are some pitfalls you should know about before adopting it:
Risky code execution: Even though smolagents includes safeguards such as a secure local interpreter and remote execution via E2B, there can still be cases where the agent generates and runs something like a “delete everything” snippet that damages your machine.
It doesn’t look as flexible as LangGraph.
Your LLM should be good at coding! I tried it with a few non-coder LLMs and the results weren’t great.
Is writing code for everything necessary? Not at all. For simple tasks, smolagents may introduce unnecessary complexity.
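On the code execution risk, one common safeguard is to inspect generated code before running it. The sketch below uses Python's `ast` module to flag imports outside an allowlist. It only illustrates the idea: `ALLOWED_MODULES` and `check_imports` are hypothetical names, this is not how smolagents implements its interpreter, and a check like this is no substitute for a real sandbox.

```python
import ast

# Assumed allowlist, chosen for illustration.
ALLOWED_MODULES = {"math", "statistics", "datetime"}

def check_imports(code: str) -> list[str]:
    """Return the top-level modules imported by `code` that are not allowed."""
    blocked = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in ALLOWED_MODULES:
                blocked.append(root)
    return blocked

# Both snippets are only parsed here, never executed.
safe = "import math\nprint(math.sqrt(2))"
risky = "import shutil\nshutil.rmtree('/')"
print(check_imports(safe))
print(check_imports(risky))
```

A runner would refuse to execute any snippet for which `check_imports` returns a non-empty list.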
So, is it the best framework? I don’t think so, but it is a pretty good one if you’re starting out with agentic frameworks. Plus, it’s easy!
How to use smolagents?
Taking an example straight from the repo:
1. pip install the package
pip install smolagents
2. Create your CodeAgent and pass it some tools
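from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
# A CodeAgent with a single tool (web search) and the default Hugging Face API model
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
# The agent writes and runs Python code step by step until it reaches a final answer
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")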
As you can see, CodeAgent is a standalone agent that can use any kind of tool (internet search in this case). For the given prompt, CodeAgent will write code to:
Search the internet for the information needed to answer the question
Compute the final answer in seconds, so you get code tailored to your specific problem
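For this question, the code the agent ends up writing might boil down to something like the following. This is a hand-written sketch, not actual agent output: the speed and bridge length are illustrative values assumed here (roughly 58 km/h and 155 m), whereas the agent would take them from its search results.

```python
# Illustrative inputs (assumed here; the agent would obtain them via web search).
LEOPARD_TOP_SPEED_KMH = 58.0
PONT_DES_ARTS_LENGTH_M = 155.0

def seconds_to_cross(length_m: float, speed_kmh: float) -> float:
    """Time in seconds to cover length_m at a constant speed given in km/h."""
    speed_ms = speed_kmh * 1000 / 3600  # convert km/h to m/s
    return length_m / speed_ms

answer = seconds_to_cross(PONT_DES_ARTS_LENGTH_M, LEOPARD_TOP_SPEED_KMH)
print(f"{answer:.2f} seconds")
```

With these assumed inputs the result is about 9.6 seconds; different search results would give a different number.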
Concluding,
The framework looks fun to use and is definitely worth giving a try. Below is the git repo to access it