Automating CSV Analysis with CrewAI
- Rifx.Online
- Programming, Data Science, Generative AI
- 19 Jan, 2025
In this blog, we explore how to automate the process of analyzing a CSV dataset using CrewAI. We’ll build a workflow that includes agents for dataset context inference, data cleaning, visualization, and reporting, and culminates in a polished Markdown report.
1. Initializing the Environment
We begin by importing necessary libraries and initializing the tools and environment for the workflow.
import os
import pandas as pd
import chardet
from crewai import Crew, Task, Agent, LLM, Process
from crewai_tools import FileReadTool, BaseTool, CSVSearchTool
import matplotlib.pyplot as plt
import seaborn as sns
Explanation:
- Imports: Essential Python libraries for data manipulation (pandas), visualization (matplotlib, seaborn), and CrewAI modules (Crew, Task, Agent, etc.).
- CrewAI Tools: FileReadTool and CSVSearchTool enable file handling and CSV analysis. Only FileReadTool is used in the rest of this walkthrough.
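The imports include chardet, although the walkthrough never calls it. One plausible use is guessing the file's encoding before the agents read it. The following is a minimal sketch under that assumption: read_csv_text and preview_rows are hypothetical helpers, not part of CrewAI, and the code falls back to UTF-8 and Latin-1 when chardet is not installed.

```python
import csv
import io

try:
    import chardet  # optional; used only to guess the encoding
except ImportError:
    chardet = None


def read_csv_text(path):
    """Return (text, encoding) for a CSV file whose encoding is unknown."""
    with open(path, "rb") as fh:
        raw = fh.read()

    candidates = []
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")
        if guess:
            candidates.append(guess)
    # Fallbacks: UTF-8 (with optional BOM) first, then Latin-1, which accepts any byte.
    candidates += ["utf-8-sig", "latin-1"]

    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError(f"could not decode {path}")


def preview_rows(path, limit=5):
    """Parse the decoded text and return the header plus up to `limit` rows."""
    text, _ = read_csv_text(path)
    reader = csv.reader(io.StringIO(text))
    return [row for _, row in zip(range(limit + 1), reader)]
```

Calling preview_rows('scm.csv') before building the crew gives a quick sanity check that the file parses and shows the columns the agents will see.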
2. Initializing the LLM
We use an LLM so the agents can process the data and generate natural-language insights.
Code:
llm = LLM(
    model='ollama/llama3.2-vision',
    base_url='http://localhost:11434',
)
Explanation:
- The ollama/llama3.2-vision model serves as the core AI engine. It runs locally through the Ollama server at http://localhost:11434.
- It enables agents to process data contextually and generate meaningful outputs.
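Because the LLM points at a local server, the crew fails at run time if Ollama is not running or the model has not been pulled. As a hedged sketch, the check below queries Ollama's /api/tags endpoint, which lists locally available models. The helper names list_ollama_models and model_available are hypothetical, and the response shape (a "models" list with "name" fields) is assumed from the Ollama API.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the model names the Ollama server reports, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


def model_available(wanted, models):
    """True if `wanted` matches a reported name, ignoring the ':tag' suffix."""
    base = wanted.split(":")[0]
    return any(name.split(":")[0] == base for name in models or [])


if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama server not reachable")
    elif not model_available("llama3.2-vision", models):
        print("Model not found locally; try: ollama pull llama3.2-vision")
```

Note that the 'ollama/' prefix in the LLM definition names the provider; the server itself reports the model as llama3.2-vision.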
3. Defining the Tools
A FileReadTool is used for reading the CSV file, which forms the basis of the analysis.
Code:
csv_tool = FileReadTool(file_path='scm.csv')
Explanation:
- FileReadTool: Reads the specified CSV file (scm.csv) to provide raw data for agents to analyze.
4. Defining the Agents
We create four specialized agents, each with a distinct role:
a) Dataset Context Specialist
Code:
dataset_inference_agent = Agent(
    role="Dataset Context Specialist",
    goal=(
        "Infer the context and purpose of the dataset by analyzing column names, data types, "
        "and a few sample rows. Extract insights about the domain and the type of data provided."
    ),
    backstory=(
        "An expert in understanding datasets and identifying their purpose. You have a deep understanding of data science, "
        "machine learning, and data analysis."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True,
    allow_code_execution=True
)
Explanation:
- Role: Understands the dataset’s structure and context.
- Goal: Identifies column types (numerical, categorical, etc.) and infers the dataset’s real-world application.
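To make the goal concrete, here is a rough sketch of the kind of column-type inference the agent is asked to perform. It is not what the agent runs internally. It uses only the standard library; infer_column_types is a hypothetical helper, and the labels 'numerical', 'categorical', and 'empty' are our own.

```python
import csv
import io


def infer_column_types(csv_text, sample_size=50):
    """Label each column 'numerical', 'categorical', or 'empty' from a sample of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = {name: [] for name in reader.fieldnames or []}
    for _, row in zip(range(sample_size), reader):
        for name in samples:
            value = (row.get(name) or "").strip()
            if value:  # skip blanks so missing cells do not affect the label
                samples[name].append(value)

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    types = {}
    for name, values in samples.items():
        if not values:
            types[name] = "empty"
        elif all(is_number(v) for v in values):
            types[name] = "numerical"
        else:
            types[name] = "categorical"
    return types
```

With pandas, inspecting df.dtypes after pd.read_csv gives a comparable first impression.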
b) Data Cleaning Specialist
Code:
data_analysis_agent = Agent(
    role="Data Cleaning Specialist",
    goal=(
        "Analyze the dataset to identify missing values, incorrect data types, and potential outliers. "
        "Generate statistical summaries like mean, median, and correlations between variables."
    ),
    backstory=(
        "Specializes in cleaning and preparing data for analysis with expertise in data cleaning and preprocessing."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True
)
Explanation:
- Role: Prepares the dataset for analysis.
- Goal: Detects missing values, outliers, and incorrect data types. Computes statistical summaries.
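The checks named in this goal can be illustrated without an LLM. The sketch below counts missing values in a single column and flags outliers with the common 1.5 x IQR rule. That rule is our choice for illustration, since the post does not say which outlier method the agent should use, and summarize_column is a hypothetical helper.

```python
import statistics


def summarize_column(values):
    """Summarize one column given as raw strings; blanks count as missing."""
    numbers, missing = [], 0
    for value in values:
        text = (value or "").strip()
        if not text:
            missing += 1
            continue
        numbers.append(float(text))  # raises ValueError on non-numeric text

    summary = {"missing": missing, "count": len(numbers)}
    if len(numbers) >= 2:
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        summary.update(
            mean=statistics.mean(numbers),
            median=statistics.median(numbers),
            outliers=[x for x in numbers if x < low or x > high],
        )
    return summary
```

Running a deterministic check like this alongside the agent gives a baseline to compare the agent's summary against.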
c) Visualization Expert
Code:
visualization_agent = Agent(
    role="Visualization Expert",
    goal=(
        "Generate meaningful visualizations such as histograms, scatter plots, line plots, bar charts, "
        "and heatmaps to provide insights into the data. Save all visualizations to a 'graphs/' directory."
    ),
    backstory=(
        "Specializes in creating compelling and informative visualizations. You are an expert in Python, pandas, "
        "matplotlib, seaborn, and data visualization, capable of creating impactful data stories."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True,
    allow_code_execution=True
)
Explanation:
- Role: Creates visual insights from the dataset.
- Goal: Generates plots (histograms, heatmaps, scatter plots) to highlight relationships and trends in the data.
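The agent is told to save plots to a 'graphs/' directory, but nothing in the workflow creates that directory or confirms that files were written. A small sketch of both steps follows; prepare_graph_dir and saved_graphs are hypothetical helpers, and the list of image suffixes is an assumption.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".svg"}


def prepare_graph_dir(path="graphs"):
    """Create the output directory if needed and return it as a Path."""
    graph_dir = Path(path)
    graph_dir.mkdir(parents=True, exist_ok=True)
    return graph_dir


def saved_graphs(path="graphs"):
    """Return image files found in the directory, sorted by name."""
    graph_dir = Path(path)
    if not graph_dir.is_dir():
        return []
    return sorted(p for p in graph_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
```

Calling prepare_graph_dir() before the run and saved_graphs() afterwards shows whether the agent actually produced image files rather than only describing them.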
d) Report Specialist
Code:
markdown_report_agent = Agent(
    role="Report Specialist",
    goal=(
        "Compile all findings, analysis, and visualizations into a structured markdown report. "
        "Embed graphs and provide clear sections for analysis and summary."
    ),
    backstory="An expert in synthesizing data insights into polished reports.",
    tools=[csv_tool],
    llm=llm,
    verbose=True,
)
Explanation:
- Role: Produces a polished Markdown report.
- Goal: Consolidates all findings and embeds visualizations for a professional and comprehensive output.
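For reference, the structure the report agent is asked to produce (sections plus embedded graphs) looks roughly like the output of the sketch below. The function build_markdown_report and the 'Visualizations' heading are hypothetical; the actual layout is up to the LLM.

```python
def build_markdown_report(title, sections, graph_paths):
    """Assemble a Markdown report from (heading, body) pairs and image paths."""
    lines = [f"# {title}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body.strip(), ""]
    if graph_paths:
        lines += ["## Visualizations", ""]
        for path in graph_paths:
            # Use the file name (without extension) as the alt text.
            name = str(path).rsplit("/", 1)[-1].rsplit(".", 1)[0]
            lines += [f"![{name}]({path})", ""]
    return "\n".join(lines).rstrip() + "\n"
```

Image paths are written relative to the report, so a report saved next to the 'graphs/' directory renders the embedded plots in most Markdown viewers.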
5. Defining the Tasks
Each task corresponds to an agent’s role:
a) Dataset Inference Task
Code:
dataset_inference_task = Task(
    description="Analyze the dataset to determine its context, purpose, and structure.",
    expected_output="A descriptive overview of the dataset's structure and purpose.",
    agent=dataset_inference_agent
)
Explanation:
- Focus: Analyzes column names and sample rows to infer dataset structure and purpose.
b) Data Cleaning Task
Code:
data_analysis_task = Task(
    description="Perform a comprehensive analysis of the dataset to identify missing values, incorrect data types, and potential outliers.",
    expected_output="A summary of missing values, standardized data types, and statistical metrics.",
    agent=data_analysis_agent
)
Explanation:
- Focus: Cleans and prepares data, providing actionable insights on data quality.
c) Visualization Task
Code:
visualization_task = Task(
    description="Generate visualizations dynamically based on the dataset's content.",
    expected_output="A set of annotated graphs saved in the 'graphs/' directory.",
    agent=visualization_agent
)
Explanation:
- Focus: Produces visualizations that uncover trends and relationships within the dataset.
d) Markdown Report Task
Code:
markdown_report_task = Task(
    description="Create a detailed markdown report summarizing all analysis and visualizations.",
    expected_output="A polished markdown report with embedded graphs and actionable insights.",
    agent=markdown_report_agent,
    context=[dataset_inference_task, data_analysis_task, visualization_task],
    output_file='report.md'
)
Explanation:
- Focus: Consolidates all findings into a single Markdown file.
6. Forming the Crew
Code:
csv_analysis_crew = Crew(
    agents=[
        dataset_inference_agent,
        data_analysis_agent,
        visualization_agent,
        markdown_report_agent
    ],
    tasks=[dataset_inference_task, data_analysis_task, visualization_task, markdown_report_task],
    process=Process.sequential,
    verbose=True
)
Explanation:
- The Crew connects all agents and tasks, executing them sequentially for a streamlined workflow.
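As a mental model only, sequential execution with a context list behaves like the toy loop below: tasks run in the order given, and a task that names earlier tasks in its context receives their outputs. This is not CrewAI's implementation; run_sequential, the dictionary layout, and the placeholder output strings are all made up for illustration.

```python
def run_sequential(tasks):
    """Run tasks in order; each task sees the outputs of the tasks listed in its context."""
    outputs = {}
    for task in tasks:
        context = [outputs[name] for name in task.get("context", [])]
        outputs[task["name"]] = task["run"](context)
    return outputs


# Placeholder tasks standing in for the agents' work.
tasks = [
    {"name": "inference", "run": lambda ctx: "dataset overview"},
    {"name": "cleaning", "run": lambda ctx: "data quality summary"},
    {
        "name": "report",
        "context": ["inference", "cleaning"],
        "run": lambda ctx: "Report: " + "; ".join(ctx),
    },
]

if __name__ == "__main__":
    print(run_sequential(tasks)["report"])
```

The ordering matters: a task can only draw context from tasks that appear before it in the list, which is why the report task comes last.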
7. Executing the Workflow
Code:
result = csv_analysis_crew.kickoff()
print("Crew Execution Complete. Final report generated.")
print(result)
Explanation:
- The kickoff() method initiates the entire pipeline, producing the final Markdown report.
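Since the report is written by an LLM, it is worth checking after the run that every embedded image actually exists on disk. A minimal sketch, assuming the report uses standard Markdown image syntax and relative paths; missing_report_images is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches ![alt](target) and ![alt](target "title"); captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def missing_report_images(report_path="report.md"):
    """Return image targets referenced in the report that do not exist on disk."""
    report = Path(report_path)
    text = report.read_text(encoding="utf-8")
    missing = []
    for target in IMAGE_LINK.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # remote images are not checked
        if not (report.parent / target).exists():
            missing.append(target)
    return missing
```

An empty list from missing_report_images() after kickoff() suggests the embedded graphs will render; any entries point to plots the report mentions but the visualization step never saved.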