Automating CSV Analysis with CrewAI

In this post, we automate the analysis of a CSV dataset with CrewAI. We build a workflow of four agents that handle dataset context inference, data cleaning, visualization, and reporting, and that ends with a polished Markdown report.

1. Initializing the Environment

We begin by importing the libraries the workflow needs.

Code:

import os
import pandas as pd
import chardet
from crewai import Crew, Task, Agent, LLM, Process
from crewai_tools import FileReadTool, BaseTool, CSVSearchTool
import matplotlib.pyplot as plt
import seaborn as sns

Explanation:

  • Imports: Python libraries for data manipulation (pandas), encoding detection (chardet), and visualization (matplotlib, seaborn), plus the CrewAI classes (Crew, Task, Agent, LLM, Process).
  • CrewAI Tools: FileReadTool reads files and CSVSearchTool searches CSV content. Only FileReadTool is used in the code below.
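
chardet is imported above but is not used in the snippets that follow; its usual purpose is to guess a file's encoding before loading it. A minimal stdlib-only sketch of a related idea is shown here. It assumes a hypothetical helper, read_text_with_fallback, and a short list of candidate encodings, and it simply tries each one in order rather than doing chardet's statistical detection.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Decode a file by trying each candidate encoding in order.

    Hypothetical helper (not part of CrewAI or chardet). Returns a
    (text, encoding_used) pair for the first encoding that decodes
    the raw bytes without error.
    """
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")
```

The encoding that succeeds could then be passed on to whatever loads the CSV, for example the encoding argument of pandas.read_csv.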

2. Initializing the LLM

The agents use a large language model (LLM) to interpret the data and generate natural-language insights.

Code:

llm = LLM(
    model='ollama/llama3.2-vision',
    base_url='http://localhost:11434',
)

Explanation:

  • The ollama/llama3.2-vision model serves as the core AI engine. It is served locally by Ollama at http://localhost:11434, so prompts are processed on the local machine.
  • Every agent shares this llm object, which lets each one read the data in context and produce its output as text.
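
Since every agent depends on the local Ollama server, it can help to confirm the server is reachable before starting a long run. The sketch below is an assumption-laden illustration, not part of the original workflow: list_ollama_models is a hypothetical helper that queries Ollama's /api/tags endpoint and returns None when the server cannot be reached.

```python
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of locally available Ollama models.

    Hypothetical helper. Returns None if the server is unreachable
    or the response is not valid JSON.
    """
    url = f"{base_url.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts;
        # ValueError covers malformed URLs and invalid JSON.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]
```

A None result suggests Ollama is not running; an empty list or a list without a llama3.2-vision entry suggests the model still needs to be pulled.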

3. Defining the Tools

A FileReadTool reads the CSV file that forms the basis of the analysis.

Code:

csv_tool = FileReadTool(file_path='scm.csv')

Explanation:

  • FileReadTool: Reads the specified CSV file (scm.csv) to provide raw data for agents to analyze.
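
Because the tool hands file text to the LLM, a large CSV may not fit in the model's context window. One way to see what a compact view of the data looks like is to take only the header and the first few rows. The sketch below uses the standard csv module; csv_preview is a hypothetical helper and is not something FileReadTool does for you.

```python
import csv
from itertools import islice


def csv_preview(path, n_rows=5, encoding="utf-8"):
    """Return (header, rows): the header and the first n_rows data rows."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # stop reading after n_rows
    return header, rows
```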

4. Defining the Agents

We create four specialized agents, each with a distinct role:

a) Dataset Context Specialist

Code:

dataset_inference_agent = Agent(
    role="Dataset Context Specialist",
    goal=(
        "Infer the context and purpose of the dataset by analyzing column names, data types, "
        "and a few sample rows. Extract insights about the domain and the type of data provided."
    ),
    backstory=(
        "An expert in understanding datasets and identifying their purpose. You have a deep understanding of data science, "
        "machine learning, and data analysis."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True,
    allow_code_execution=True
)

Explanation:

  • Role: Understands the dataset’s structure and context.
  • Goal: Identifies column types (numerical, categorical, etc.) and infers the dataset’s real-world application.
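
To make "identifies column types" concrete, here is a rough sketch of the kind of inference the agent is asked to perform, written as plain Python. The function names and the three labels are illustrative assumptions; the agent itself reasons over the data with the LLM rather than running this code.

```python
def _is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_column_kinds(header, rows):
    """Label each column 'numeric', 'categorical', or 'empty'.

    header: list of column names
    rows:   list of rows, each a list of strings (sample rows are enough)
    """
    kinds = {}
    for i, name in enumerate(header):
        # Keep only non-blank cells; short rows simply contribute nothing.
        values = [r[i].strip() for r in rows if i < len(r) and r[i].strip()]
        if not values:
            kinds[name] = "empty"
        elif all(_is_number(v) for v in values):
            kinds[name] = "numeric"
        else:
            kinds[name] = "categorical"
    return kinds
```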

b) Data Cleaning Specialist

Code:

data_analysis_agent = Agent(
    role="Data Cleaning Specialist",
    goal=(
        "Analyze the dataset to identify missing values, incorrect data types, and potential outliers. "
        "Generate statistical summaries like mean, median, and correlations between variables."
    ),
    backstory=(
        "Specializes in cleaning and preparing data for analysis with expertise in data cleaning and preprocessing."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True
)

Explanation:

  • Role: Prepares the dataset for analysis.
  • Goal: Detects missing values, outliers, and incorrect data types. Computes statistical summaries.
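
The checks named in this goal can be sketched for a single numeric column as follows. This is an illustration under stated assumptions: blank strings count as missing, and outliers are flagged with the common 1.5 × IQR rule, which the original post does not specify. summarize_numeric is a hypothetical helper.

```python
import statistics


def summarize_numeric(raw_values):
    """Summarize one numeric column given as a list of strings.

    Blank strings are treated as missing. Needs at least two
    non-missing values so that quartiles can be computed.
    """
    present = [float(v) for v in raw_values if v.strip()]
    missing = len(raw_values) - len(present)

    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "missing": missing,
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "outliers": [x for x in present if x < low or x > high],
    }
```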

c) Visualization Expert

Code:

visualization_agent = Agent(
    role="Visualization Expert",
    goal=(
        "Generate meaningful visualizations such as histograms, scatter plots, line plots, bar charts, "
        "and heatmaps to provide insights into the data. Save all visualizations to a 'graphs/' directory."
    ),
    backstory=(
        "Specializes in creating compelling and informative visualizations. You are an expert in Python, pandas, "
        "matplotlib, seaborn, and data visualization, capable of creating impactful data stories."
    ),
    tools=[csv_tool],
    llm=llm,
    verbose=True,
    allow_code_execution=True
)

Explanation:

  • Role: Creates visual insights from the dataset.
  • Goal: Generates plots (histograms, heatmaps, scatter plots) to highlight relationships and trends in the data.
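
The code this agent is expected to write and run might resemble the sketch below, which saves one histogram into the 'graphs/' directory named in the goal. It assumes matplotlib is installed; the helper name save_histogram and the file-naming scheme are illustrative choices, not something the original post prescribes.

```python
import os

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def save_histogram(values, column, out_dir="graphs"):
    """Plot a histogram of one numeric column and save it as a PNG.

    Returns the path of the saved image.
    """
    os.makedirs(out_dir, exist_ok=True)  # create graphs/ if it is missing
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(values, bins=10)
    ax.set_title(f"Distribution of {column}")
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    path = os.path.join(out_dir, f"{column}_hist.png")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)  # free the figure's memory
    return path
```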

d) Report Specialist

Code:

markdown_report_agent = Agent(
    role="Report Specialist",
    goal=(
        "Compile all findings, analysis, and visualizations into a structured markdown report. "
        "Embed graphs and provide clear sections for analysis and summary."
    ),
    backstory="An expert in synthesizing data insights into polished reports.",
    tools=[csv_tool],
    llm=llm,
    verbose=True,
)

Explanation:

  • Role: Produces a polished Markdown report.
  • Goal: Consolidates all findings and embeds visualizations for a professional and comprehensive output.
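
For reference, the target structure of such a report can be sketched in plain Python. build_report is a hypothetical helper that shows the shape of the output (titled sections followed by embedded images); the agent produces its report through the LLM, not through this function.

```python
def build_report(title, sections, graphs):
    """Assemble a Markdown report as a single string.

    sections: list of (heading, body) pairs
    graphs:   list of (caption, image_path) pairs
    """
    lines = [f"# {title}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body, ""]
    if graphs:
        lines += ["## Visualizations", ""]
        for caption, path in graphs:
            lines += [f"![{caption}]({path})", ""]  # Markdown image embed
    return "\n".join(lines)
```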

5. Defining the Tasks

Each task corresponds to an agent’s role:

a) Dataset Inference Task

Code:

dataset_inference_task = Task(
    description="Analyze the dataset to determine its context, purpose, and structure.",
    expected_output="A descriptive overview of the dataset's structure and purpose.",
    agent=dataset_inference_agent
)

Explanation:

  • Focus: Analyzes column names and sample rows to infer dataset structure and purpose.

b) Data Cleaning Task

Code:

data_analysis_task = Task(
    description="Perform a comprehensive analysis of the dataset to identify missing values, incorrect data types, and potential outliers.",
    expected_output="A summary of missing values, standardized data types, and statistical metrics.",
    agent=data_analysis_agent
)

Explanation:

  • Focus: Identifies missing values, incorrect data types, and potential outliers, and reports on data quality.

c) Visualization Task

Code:

visualization_task = Task(
    description="Generate visualizations dynamically based on the dataset's content.",
    expected_output="A set of annotated graphs saved in the 'graphs/' directory.",
    agent=visualization_agent
)

Explanation:

  • Focus: Produces visualizations that uncover trends and relationships within the dataset.

d) Markdown Report Task

Code:

markdown_report_task = Task(
    description="Create a detailed markdown report summarizing all analysis and visualizations.",
    expected_output="A polished markdown report with embedded graphs and actionable insights.",
    agent=markdown_report_agent,
    context=[dataset_inference_task, data_analysis_task, visualization_task],
    output_file='report.md'
)

Explanation:

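  • Focus: Consolidates all findings into a single Markdown file. The context argument passes the outputs of the three earlier tasks to this task, and output_file writes the result to report.md.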

6. Forming the Crew

Code:

csv_analysis_crew = Crew(
    agents=[
        dataset_inference_agent,
        data_analysis_agent,
        visualization_agent,
        markdown_report_agent
    ],
    tasks=[dataset_inference_task, data_analysis_task, visualization_task, markdown_report_task],
    process=Process.sequential,
    verbose=True
)

Explanation:

  • The Crew groups the agents and tasks, and Process.sequential runs the tasks in the order they are listed.
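
As a mental model only, sequential execution can be pictured as a loop in which each step sees the results produced so far. The sketch below is a plain-Python analogy with hypothetical names; it is not CrewAI's implementation.

```python
def run_sequential(steps):
    """Run (name, fn) steps in order.

    Each fn receives a dict of the outputs produced by earlier steps
    and returns its own output.
    """
    outputs = {}
    for name, fn in steps:
        outputs[name] = fn(dict(outputs))  # pass a copy of earlier results
    return outputs


# Illustrative stand-ins for two of the tasks above.
steps = [
    ("inference", lambda ctx: "dataset overview"),
    ("report", lambda ctx: f"Report based on: {ctx['inference']}"),
]
results = run_sequential(steps)
print(results["report"])
```

The order of the list matters: a step can only use outputs from steps that appear before it.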

7. Executing the Workflow

Code:

result = csv_analysis_crew.kickoff()
print("Crew Execution Complete. Final report generated.")
print(result)

Explanation:

  • The kickoff() method starts the pipeline and returns the crew's final output, which is printed here. The report itself is written to report.md.
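
An LLM-written report can reference graphs that were never actually saved, so a quick check after the run may be worthwhile. The sketch below scans a Markdown file for image embeds and lists the ones whose files do not exist. missing_images is a hypothetical helper, and it treats every link as a local path relative to the report.

```python
import os
import re

# Matches Markdown image embeds such as ![caption](graphs/plot.png)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(report_path):
    """Return image paths referenced in a Markdown file but absent on disk."""
    with open(report_path, encoding="utf-8") as f:
        text = f.read()
    base = os.path.dirname(os.path.abspath(report_path))
    return [
        link
        for link in IMAGE_LINK.findall(text)
        if not os.path.exists(os.path.join(base, link))
    ]
```

Calling missing_images('report.md') after kickoff() returns an empty list when every embedded graph is present.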

