Automating CSV Analysis with CrewAI

Rifx.Online
Programming, Data Science, Generative AI
19 Jan, 2025

In this post, we explore how to automate the analysis of a CSV dataset using CrewAI. We’ll build a workflow with agents for dataset context inference, data cleaning, visualization, and reporting that culminates in a polished Markdown report.

1. Initializing the Environment

We begin by importing the libraries the workflow needs.

import os
import pandas as pd
import chardet
from crewai import Crew, Task, Agent, LLM, Process
from crewai_tools import FileReadTool, BaseTool, CSVSearchTool
import matplotlib.pyplot as plt
import seaborn as sns

Explanation:

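Imports: Python libraries for data manipulation (pandas) and visualization (matplotlib, seaborn), plus the CrewAI classes (Crew, Task, Agent, LLM, Process).
CrewAI Tools: FileReadTool handles file reading, and CSVSearchTool is aimed at searching CSV content. Of everything imported here, the code that follows references only the CrewAI classes and FileReadTool directly.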
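Before running anything, it can help to confirm that these packages are actually installed. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of CrewAI.

```python
import importlib.util

# Top-level module names imported by the listing above.
REQUIRED_MODULES = ["pandas", "chardet", "crewai", "crewai_tools", "matplotlib", "seaborn"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Install these before running the crew:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

The check only looks the modules up; it does not import them, so it stays fast even when heavy packages are present.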

2. Initializing the LLM

We use a large language model (LLM) so the agents can process the data and generate natural-language insights.

Code:

llm = LLM(
    model='ollama/llama3.2-vision',
    base_url='http://localhost:11434',
)

Explanation:

The ollama/llama3.2-vision model serves as the core AI engine. The base_url points to an Ollama server at http://localhost:11434, so inference runs on the local machine.
Every agent defined below receives this llm object and uses it to interpret the data and generate its output.
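If you want to switch models or point at a different Ollama host without editing the script, one option is to read the two settings from environment variables. This is a sketch under assumptions: the helper ollama_llm_settings and the variable names CSV_CREW_MODEL and CSV_CREW_OLLAMA_URL are made up for illustration, and the defaults are the values used above.

```python
import os


def ollama_llm_settings(env=None):
    """Build keyword arguments for LLM(...), falling back to the values used in this post."""
    env = os.environ if env is None else env
    return {
        # Hypothetical variable names; pick whatever suits your setup.
        "model": env.get("CSV_CREW_MODEL", "ollama/llama3.2-vision"),
        "base_url": env.get("CSV_CREW_OLLAMA_URL", "http://localhost:11434"),
    }


# Intended use with the code above (requires crewai to be installed):
#   llm = LLM(**ollama_llm_settings())
```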

3. Defining the Tools

A FileReadTool is used for reading the CSV file, which forms the basis of the analysis.

Code:

csv_tool = FileReadTool(file_path='scm.csv')

Explanation:

FileReadTool: Reads the specified CSV file (scm.csv) to provide raw data for agents to analyze.
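Since every agent depends on this file, it is worth checking that it exists and looks sensible before starting the crew. The sketch below uses only the standard library; preview_csv is a hypothetical helper, and UTF-8 is an assumed default encoding.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5, encoding="utf-8"):
    """Return (header, first n_rows rows) of a CSV file, failing early if it is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV file not found: {csv_path}")
    with csv_path.open(newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # zip stops after n_rows, so large files are not read in full.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Example: header, rows = preview_csv("scm.csv")
```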

4. Defining the Agents

We create four specialized agents, each with a distinct role:

a) Dataset Context Specialist

Code:

dataset_inference_agent = Agent(
    role="Dataset Context Specialist",
    goal=(
        "Infer the context and purpose of the dataset by analyzing column names, data types, "
        "and a few sample rows. Extract insights about the domain and the type of data provided."
    ),
    backstory=(
        "An expert in understanding datasets and identifying their purpose. You have a deep understanding of data science, "
        "machine learning, and data analysis."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True,
    allow_code_execution=True
)

Explanation:

Role: Understands the dataset’s structure and context.
Goal: Identifies column types (numerical, categorical, etc.) and infers the dataset’s real-world application.
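To make "identifying column types" concrete, here is a rough plain-Python sketch of the kind of check the agent is asked to reason about. It is not what the agent runs internally; infer_column_kinds is a hypothetical helper, and the column names in the sample are invented because the columns of scm.csv are not shown in this post.

```python
import csv
import io


def infer_column_kinds(csv_text, sample_size=50):
    """Label each column 'numerical', 'categorical', or 'empty' from a sample of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = {name: [] for name in (reader.fieldnames or [])}
    for _, row in zip(range(sample_size), reader):
        for name in samples:
            value = (row.get(name) or "").strip()
            if value:  # skip blanks so missing cells do not affect the label
                samples[name].append(value)

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    kinds = {}
    for name, values in samples.items():
        if not values:
            kinds[name] = "empty"
        elif all(is_number(v) for v in values):
            kinds[name] = "numerical"
        else:
            kinds[name] = "categorical"
    return kinds


# Invented sample data, for illustration only.
sample = "supplier,units,unit_cost,notes\nAcme,10,2.5,\nBolt,,3.0,\nAcme,7,1.75,\n"
kinds = infer_column_kinds(sample)
```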

b) Data Cleaning Specialist

Code:

data_analysis_agent = Agent(
    role="Data Cleaning Specialist",
    goal=(
        "Analyze the dataset to identify missing values, incorrect data types, and potential outliers. "
        "Generate statistical summaries like mean, median, and correlations between variables."
    ),
    backstory=(
        "Specializes in cleaning and preparing data for analysis with expertise in data cleaning and preprocessing."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True
)

Explanation:

Role: Prepares the dataset for analysis.
Goal: Detects missing values, outliers, and incorrect data types. Computes statistical summaries.
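The checks named in this goal can be illustrated for a single numeric column with the standard library. This is a sketch, not the agent's own procedure: summarize_column is a hypothetical helper, and the 1.5 x IQR rule is one common outlier convention that the original post does not specify.

```python
import statistics


def summarize_column(values):
    """Summarize one numeric column given as strings, where '' marks a missing cell."""
    numbers = [float(v) for v in values if v.strip() != ""]
    summary = {"missing": len(values) - len(numbers), "count": len(numbers)}
    if len(numbers) >= 2:
        q1, _, q3 = statistics.quantiles(numbers, n=4)  # quartile cut points
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summary.update(
            mean=statistics.mean(numbers),
            median=statistics.median(numbers),
            # Values outside [low, high] are flagged as potential outliers.
            outliers=[x for x in numbers if x < low or x > high],
        )
    return summary


# Invented values, for illustration only.
column_summary = summarize_column(["10", "12", "", "11", "13", "12", "95", ""])
```

A flagged value is only a candidate for review; whether it is an error or a genuine extreme depends on the dataset.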

c) Visualization Expert

Code:

visualization_agent = Agent(
    role="Visualization Expert",
    goal=(
        "Generate meaningful visualizations such as histograms, scatter plots, line plots, bar charts, "
        "and heatmaps to provide insights into the data. Save all visualizations to a 'graphs/' directory."
    ),
    backstory=(
        "Specializes in creating compelling and informative visualizations. You are an expert in Python, pandas, "
        "matplotlib, seaborn, and data visualization, capable of creating impactful data stories."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True,
    allow_code_execution=True
)

Explanation:

Role: Creates visual insights from the dataset.
Goal: Generates plots (histograms, heatmaps, scatter plots) to highlight relationships and trends in the data.
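The agent's goal mentions saving plots to a 'graphs/' directory, which has to exist before anything can be written into it. A small sketch of creating the directory and building predictable file names follows; graph_path is a hypothetical helper, and the PNG extension is an assumption.

```python
import re
from pathlib import Path


def graph_path(kind, column, directory="graphs"):
    """Return a PNG path such as graphs/histogram_unit_cost.png, creating the directory if needed."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Replace anything that is not a letter, digit, or underscore so the name is filesystem-safe.
    slug = re.sub(r"[^0-9A-Za-z_]+", "_", f"{kind}_{column}").strip("_").lower()
    return out_dir / f"{slug}.png"


# With matplotlib available, a figure could then be saved with:
#   plt.savefig(graph_path("histogram", "Unit Cost"))
```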

d) Report Specialist

Code:

markdown_report_agent = Agent(
    role="Report Specialist",
    goal=(
        "Compile all findings, analysis, and visualizations into a structured markdown report. "
        "Embed graphs and provide clear sections for analysis and summary."
    ),
    backstory="An expert in synthesizing data insights into polished reports.",
    tools=[csv_tool],
    llm=llm,
    verbose=True,
)

Explanation:

Role: Produces a polished Markdown report.
Goal: Consolidates all findings and embeds visualizations for a professional and comprehensive output.
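For a sense of the target format, here is a minimal sketch of how findings and graph files might be laid out in Markdown. The agent writes its own report through the LLM; build_markdown_report is a hypothetical helper and the section names are placeholders.

```python
def build_markdown_report(title, sections, graphs):
    """Assemble a Markdown document from (heading, body) sections and (caption, path) graphs."""
    lines = [f"# {title}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body.strip(), ""]
    if graphs:
        lines += ["## Visualizations", ""]
        for caption, path in graphs:
            # Standard Markdown image syntax: ![alt text](relative/path.png)
            lines += [f"![{caption}]({path})", ""]
    return "\n".join(lines).rstrip() + "\n"


report = build_markdown_report(
    "CSV Analysis",
    [("Summary", "Two columns had missing values.")],
    [("Unit cost histogram", "graphs/histogram_unit_cost.png")],
)
```

Image paths are relative to the report file, so the report and the graphs directory need to stay together when shared.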

5. Defining the Tasks

Each task corresponds to an agent’s role:

a) Dataset Inference Task

Code:

dataset_inference_task = Task(
    description="Analyze the dataset to determine its context, purpose, and structure.",
    expected_output="A descriptive overview of the dataset's structure and purpose.",
    agent=dataset_inference_agent
)

Explanation:

Focus: Analyzes column names and sample rows to infer dataset structure and purpose.

b) Data Cleaning Task

Code:

data_analysis_task = Task(
    description="Perform a comprehensive analysis of the dataset to identify missing values, incorrect data types, and potential outliers.",
    expected_output="A summary of missing values, standardized data types, and statistical metrics.",
    agent=data_analysis_agent
)

Explanation:

Focus: Identifies missing values, incorrect data types, and potential outliers, and summarizes the quality of the data.

c) Visualization Task

Code:

visualization_task = Task(
    description="Generate visualizations dynamically based on the dataset's content.",
    expected_output="A set of annotated graphs saved in the 'graphs/' directory.",
    agent=visualization_agent
)

Explanation:

Focus: Produces visualizations that uncover trends and relationships within the dataset.

d) Markdown Report Task

Code:

markdown_report_task = Task(
    description="Create a detailed markdown report summarizing all analysis and visualizations.",
    expected_output="A polished markdown report with embedded graphs and actionable insights.",
    agent=markdown_report_agent,
    context=[dataset_inference_task, data_analysis_task, visualization_task],
    output_file='report.md'
)

Explanation:

Focus: Consolidates the outputs of the three earlier tasks, passed in through context, into a single Markdown file written to report.md.

6. Forming the Crew

Code:

csv_analysis_crew = Crew(
    agents=[
        dataset_inference_agent,
        data_analysis_agent,
        visualization_agent,
        markdown_report_agent
    ],
    tasks=[dataset_inference_task, data_analysis_task, visualization_task, markdown_report_task],
    process=Process.sequential,
    verbose=True
)

Explanation:

The Crew connects all agents and tasks. With process=Process.sequential, the tasks run one after another in the order they appear in the tasks list.
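As a mental model for sequential execution, the plain-Python analogy below runs steps in order and lets each one see the results produced before it. It is only an analogy and not how CrewAI is implemented; run_sequential and the step names are hypothetical, and the returned strings are placeholders.

```python
def run_sequential(steps):
    """Run (name, function) steps in order, passing each one the outputs produced so far."""
    outputs = {}
    for name, step in steps:
        # Each step gets a copy of earlier results, loosely like a task receiving context.
        outputs[name] = step(dict(outputs))
    return outputs


pipeline = [
    ("inference", lambda ctx: "dataset overview"),
    ("cleaning", lambda ctx: f"quality checks based on the {ctx['inference']}"),
    ("report", lambda ctx: f"report covering: {', '.join(ctx)}"),
]
results = run_sequential(pipeline)
```

The ordering matters for the same reason it does in the crew: the report step can only summarize what earlier steps have already produced.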

7. Executing the Workflow

Code:

result = csv_analysis_crew.kickoff()
print("Crew Execution Complete. Final report generated.")
print(result)

Explanation:

The kickoff() method initiates the entire pipeline, producing the final Markdown report.
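After the run, a quick check can confirm that the expected files were written. This sketch assumes the report lands in report.md (the output_file set earlier) and that graphs are saved as PNG files under graphs/; check_outputs is a hypothetical helper.

```python
from pathlib import Path


def check_outputs(report_file="report.md", graphs_dir="graphs"):
    """Report whether the Markdown file exists and which PNG graphs were saved."""
    report = Path(report_file)
    graphs = Path(graphs_dir)
    return {
        "report_exists": report.is_file(),
        "report_bytes": report.stat().st_size if report.is_file() else 0,
        "graphs": sorted(p.name for p in graphs.glob("*.png")) if graphs.is_dir() else [],
    }


# Example: print(check_outputs())
```

An empty graphs list after a run suggests the visualization agent described plots without executing code that saves them.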
