Data Exploration with Agentic AI: Exploring the Titanic Dataset using SmolAgents


When I began my journey into machine learning a decade ago, like many of us, I started with the Titanic dataset. I vividly recall the thrill of performing my first exploratory data analysis (EDA), uncovering patterns and correlations. Fast forward to today, and the landscape of data analysis has evolved in ways I could never have imagined. In this era of agentic AI, we can delegate much of our EDA to intelligent agents. The question is no longer “can we automate EDA?” but rather “how far can we push these capabilities?”

The short answer: quite far. With agent frameworks backed by current large language models, it’s possible to perform detailed, interactive EDA simply by asking questions. Imagine interacting with your dataset conversationally, requesting insights, clarifications, and visualizations as naturally as you would from a data science colleague. Let’s explore this capability.

Setting the Stage

What is SmolAgents? SmolAgents is a lightweight library from Hugging Face that lets developers build and deploy agents with just a few lines of code. Its CodeAgent class, used below, has the LLM write and execute Python code to carry out each step, which makes it a natural fit for data analysis.

Here is a simple workflow to demonstrate the power of SmolAgents for EDA:

## Step 1: Import necessary libraries
import os  # needed for os.environ in Step 3

from dotenv import load_dotenv
from smolagents import CodeAgent, LiteLLMModel, tool, GradioUI
import pandas as pd

## Step 2: Load environment variables, including API keys, from a .env file
load_dotenv()  

## Step 3: Define the Language Model (LLM). Here, we use Google's Gemini model
model = LiteLLMModel(model_id="gemini/gemini-1.5-flash",  
                     api_key=os.environ["GOOGLE_API_KEY"])

This code begins by importing the necessary libraries, including smolagents for AI agent functionality. The environment variables, such as API keys, are loaded from a .env file using load_dotenv. The language model used is Google’s Gemini 1.5 Flash, instantiated via the LiteLLMModel class.
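One thing the snippet above leaves implicit is what happens when GOOGLE_API_KEY is missing: os.environ[...] raises a bare KeyError. A minimal sketch of a friendlier check follows, assuming a hypothetical helper named require_env (it is not part of smolagents or python-dotenv):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper for illustration; it is not part of
    smolagents or python-dotenv.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it before running the script."
        )
    return value


# Usage (after load_dotenv() has run):
# api_key = require_env("GOOGLE_API_KEY")
```

The returned value can then be passed as api_key when constructing the model, so a missing key fails early with a readable message.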

## Step 4: Define tools

## Tool 1: A custom tool for loading the Titanic dataset
@tool
def get_titanic_data() -> dict:
    """Returns titanic dataset in a dictionary format.
    """    
    df = pd.read_csv('data/Titanic-Dataset.csv')    
    return df.to_dict()

## Tool 2: A custom tool for saving a dataset as a CSV file
@tool
def save_data(dataset: dict, file_name: str) -> None:
    """Takes the dataset in a dictionary format and saves it as a CSV file.

       Args:
           dataset: dataset in a dictionary format
           file_name: name of the file of the saved dataset
    """    
    df = pd.DataFrame(dataset)
    df.to_csv(f'data/{file_name}.csv', index=False)  


## Step 5: Define the Agent
## Using SmolAgents, we configure the agent with tools, the chosen LLM, and authorized library imports
agent = CodeAgent(tools=[get_titanic_data, save_data],
                  model=model,
                  additional_authorized_imports=['numpy', 'pandas', 'matplotlib.pyplot'])

Two custom tools are defined: get_titanic_data loads the Titanic dataset from a CSV file and returns it as a dictionary, and save_data writes a dictionary-format dataset back to a CSV file. Both are registered with a CodeAgent, the SmolAgents class that combines the tools, the LLM, and a list of authorized Python imports so the agent can write and run its own code for exploratory data analysis (EDA).
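Because get_titanic_data returns a plain dictionary rather than a DataFrame, any code the agent writes has to rebuild the DataFrame first. Here is a small sketch of that round trip, using a three-row stand-in for the Titanic data (the rows are illustrative, not taken from the dataset file):

```python
import pandas as pd

# Tiny stand-in for the Titanic CSV; values are made up for illustration.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Pclass": [3, 1, 3],
        "Age": [22.0, 38.0, None],
        "Survived": [0, 1, 1],
    }
)

# What a tool like get_titanic_data hands back: {column -> {row index -> value}}.
as_dict = sample.to_dict()

# What agent-written code would typically do first: rebuild the DataFrame.
restored = pd.DataFrame(as_dict)

print(restored.shape)
print(restored.columns.tolist())
```

Missing values survive the round trip as NaN, so the agent can still detect them after rebuilding the DataFrame.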

## Step 6: Launch a user-friendly chat interface with a single line of code
GradioUI(agent).launch()

Finally, the GradioUI class provides a user-friendly interface for interacting with the agent. With a single line of code, the Gradio-based chat interface can be launched.

Asking Questions

Here are some of the questions I posed to the agent.

The first set of questions focused on understanding the Titanic dataset’s structure: explaining the columns based on their names, identifying missing values, and detecting outliers. The aim was to handle the missing values, fix the outliers, and save the cleaned data using the save_data tool.
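To give a sense of what these requests translate to, here is a rough sketch of the kind of pandas code an agent might generate for them. It is not the agent's actual output; the sample rows, the median fill, and the 1.5 × IQR rule are all assumptions chosen for illustration.

```python
import pandas as pd

# Made-up rows with Titanic-style columns; not real dataset values.
df = pd.DataFrame(
    {
        "Age": [22.0, 38.0, None, 35.0, None, 54.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 512.33],
    }
)

# 1. Count missing values per column.
missing = df.isna().sum()

# 2. Fill missing ages with the median age (one common choice among several).
df["Age"] = df["Age"].fillna(df["Age"].median())

# 3. Flag fares more than 1.5 * IQR beyond the quartiles as outliers.
q1, q3 = df["Fare"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["Fare"] < lower) | (df["Fare"] > upper)

# 4. "Fix" the outliers by clipping them to the IQR bounds.
df["Fare"] = df["Fare"].clip(lower=lower, upper=upper)

print(missing)
print(df)
```

Clipping is only one way to treat outliers. Whether an extreme fare is an error or a genuine first-class ticket is a judgment call worth putting to the agent as a follow-up question.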

Next, I asked how specific features might influence survival rates. For example, I explored whether ticket class or age had any effect on survival and why these factors might play a role.
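Questions like these usually come down to a grouped aggregation. The sketch below shows one way to compute survival rates by ticket class and by age band; the rows are made up, and the age bins are an arbitrary illustrative choice rather than anything the agent is known to have produced.

```python
import pandas as pd

# Made-up rows with Titanic-style columns; not real dataset values.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
        "Age": [38, 4, 27, 55, 22, 2, 30, 61],
        "Survived": [1, 1, 1, 0, 0, 1, 0, 0],
    }
)

# Survival rate per ticket class: the mean of a 0/1 column is the share of survivors.
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Survival rate per age band; the bin edges are an assumption for illustration.
age_band = pd.cut(df["Age"], bins=[0, 12, 40, 100], labels=["child", "adult", "senior"])
rate_by_age = df.groupby(age_band, observed=True)["Survived"].mean()

print(rate_by_class)
print(rate_by_age)
```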

Finally, I shifted my focus to predictive modeling. I asked which new features could improve predictions, then had the agent build a predictive model and report its F1 score.
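For readers less familiar with the metric, F1 is the harmonic mean of precision and recall for the positive class (here, "survived"). The from-scratch sketch below shows the arithmetic; the function name f1_score_binary is hypothetical, and an agent would presumably use a library implementation instead.

```python
def f1_score_binary(y_true, y_pred):
    """F1 score for binary labels, where 1 marks the positive class.

    Hypothetical helper for illustration; returns 0.0 when there are no
    true positives, since precision and recall are then zero or undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Three actual survivors, two of them predicted correctly, no false alarms:
# precision = 2/2, recall = 2/3.
score = f1_score_binary([1, 0, 1, 1], [1, 0, 0, 1])
print(round(score, 3))
```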

Your Turn

If you’re intrigued by the potential of SmolAgents, why not try it yourself? Load your favorite dataset, start asking questions, and see what insights you uncover. The age of agentic AI is here — and it’s changing the game.

Do follow if you liked the article!
