Let’s build a Text Analysis Pipeline with LangGraph Agents

In this article, I introduce LangGraph, a framework for building applications as graph-based workflows. I will share my experience with LangGraph, walk through its key features, and then build a text analysis pipeline that shows what it can do.

Understanding LangGraph

Essentially, LangGraph is built around the concept of graph-based workflows, where each node is a specific procedural or computational step, and edges determine how data flows between these nodes, optionally under certain conditions. This gives application design a high degree of flexibility and modularity, which makes LangGraph well suited to complicated tasks such as those found in natural language processing (NLP).

Key Features

  1. State Management: LangGraph maintains state across nodes, so the application keeps its context from step to step and can respond appropriately to the user’s actions or input. This is probably its most distinctive feature.
  2. Flexible Routing: The framework supports dynamic data routing between nodes, allowing for complex decision-making processes within workflows. This flexibility is essential for applications that require adaptability based on varying inputs.
  3. Persistence: LangGraph includes built-in persistence capabilities, enabling workflows to save their state after each step. This feature is crucial for applications that need to recover from interruptions or support human-in-the-loop interactions.
  4. Visualization: The graph-based structure allows developers to visualize workflows easily, which aids in understanding how different components interact and the overall flow of data within the application.
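The ideas of nodes, edges, and shared state can be illustrated without LangGraph itself. The following is a minimal plain-Python sketch, not LangGraph's implementation: `run_graph`, `count_words`, and `label_length` are hypothetical names invented for this illustration. Each node receives the current state and returns only the keys it wants to update, and the runner merges that update before following the edge to the next node.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]
Node = Callable[[State], State]


def run_graph(nodes: Dict[str, Node], edges: Dict[str, str], entry: str, state: State) -> State:
    """Run nodes in edge order, merging each node's partial update into the state."""
    state = dict(state)  # work on a copy so the caller's input is not mutated
    current: Optional[str] = entry
    while current is not None:
        update = nodes[current](state)  # a node returns only the keys it changes
        state.update(update)            # merge the partial update into shared state
        current = edges.get(current)    # no outgoing edge means the run is finished
    return state


def count_words(state: State) -> State:
    # Hypothetical node: adds a word count to the state.
    return {"word_count": len(state["text"].split())}


def label_length(state: State) -> State:
    # Hypothetical node: reads a key written by the previous node.
    return {"label": "long" if state["word_count"] > 5 else "short"}


nodes = {"count": count_words, "label": label_length}
edges = {"count": "label"}  # "count" runs first, then "label"

result = run_graph(nodes, edges, "count", {"text": "LangGraph models workflows as graphs"})
print(result)
```

The second node can read `word_count` only because the first node's update was merged into the shared state, which is the same pattern the pipeline in this article relies on.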

Our Model for This Project: Text Analysis Pipeline

In this tutorial, we will build a multi-stage pipeline for text analysis using LangGraph. The pipeline processes a given text in three main steps:

1. Text Classification

In the initial stage, we classify the input text into one of a set of predefined categories: News, Blog, Research, or Other. Classifying first tells us the nature of the text, which later steps can use to decide how to process it.

2. Entity Extraction

Next, we identify and extract the key entities in the text, such as persons, organizations, and locations. Entity extraction adds to the understanding of the text and sets the stage for more detailed analysis.

3. Text Summarization

Finally, we create a short summary of the input text, condensing its significant information for the reader. In the implementation below, the summarization node works from the original text; the classification and the extracted entities travel alongside it in the shared state, so the final result contains all three.

Building the Pipeline

To construct this pipeline in LangGraph, we create a node for each processing stage and then add edges that define how data flows between these nodes.

  1. Define Nodes: Classification, entity extraction, and summarization are each expressed as a node in our graph.
  2. Establish Edges: We connect these nodes with edges so that the output of one step is available to the next.
  3. Implement Logic: Where needed, conditional logic can choose the path to take based on the classification result or the extracted entities.

The result is a modular, extensible workflow that can easily be modified or expanded as text analysis requirements change.
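The pipeline built in this article is linear, but the conditional logic mentioned in step 3 can be sketched as a routing function. This is only an illustration under assumptions: `route_after_classification` is a hypothetical name, and the policy of extracting entities only for News and Research texts is invented for the example. The function returns the name of the next node, which is the form LangGraph's `add_conditional_edges` expects when no path map is given.

```python
from typing import List, TypedDict


class State(TypedDict, total=False):
    text: str
    classification: str
    entities: List[str]
    summary: str


def route_after_classification(state: State) -> str:
    """Return the name of the next node, based on the classification result."""
    label = state.get("classification", "").strip().lower()
    # Assumed policy for this sketch: only News and Research get entity extraction.
    if label in {"news", "research"}:
        return "entity_extraction"
    return "summarization"


# Possible wiring in a LangGraph workflow (not executed here):
# workflow.add_conditional_edges("classification_node", route_after_classification)

print(route_after_classification({"classification": "News"}))
print(route_after_classification({"classification": "Blog"}))
```

If a router like this were used, it would take the place of the fixed edge from the classification node to the entity extraction node.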

Import Required Libraries

This cell imports all necessary modules and classes for our LangGraph tutorial.

import os
from typing import TypedDict, List
from langgraph.graph import StateGraph, END
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.runnables.graph import MermaidDrawMethod
from IPython.display import display, Image

from dotenv import load_dotenv

Set Up API Key

This cell loads the environment variables and configures the OpenAI API key. You need a .env file containing your OPENAI_API_KEY.

## Load environment variables
load_dotenv()

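## Set OpenAI API key (fail early with a clear message if it was not loaded)
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
os.environ["OPENAI_API_KEY"] = api_key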

Building the Text Processing Pipeline

Define the State and Set Up the LLM

Here, we define the State class to manage our workflow data and then initialize the ChatOpenAI model.

class State(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

Define Node Functions

These functions define the operation performed at each node in our graph: classification, entity extraction, and summarization. Each one reads the text from the state and returns only the key it updates.

def classification_node(state: State):
    ''' Classify the text into one of the categories: News, Blog, Research, or Other '''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Classify the following text into one of the categories: News, Blog, Research, or Other.\n\nText:{text}\n\nCategory:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    classification = llm.invoke([message]).content.strip()
    return {"classification": classification}


def entity_extraction_node(state: State):
    ''' Extract all the entities (Person, Organization, Location) from the text '''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Extract all the entities (Person, Organization, Location) from the following text. Provide the result as a comma-separated list.\n\nText:{text}\n\nEntities:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
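    # Split on commas and trim whitespace so spacing differences in the reply do not matter
    entities = [e.strip() for e in llm.invoke([message]).content.split(",") if e.strip()]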
    return {"entities": entities}


def summarization_node(state: State):
    ''' Summarize the text in one short sentence '''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Summarize the following text in one short sentence.\n\nText:{text}\n\nSummary:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    summary = llm.invoke([message]).content.strip()
    return {"summary": summary}
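Because each node function only calls `llm.invoke(...)` and reads `.content` from the reply, the parsing logic can be checked without calling the OpenAI API. The sketch below is one way to do that, under assumptions: `FakeLLM`, `FakeResponse`, and `parse_entities` are hypothetical helpers, the node takes the model as an argument instead of using the global `llm`, and a plain string stands in for the `HumanMessage` used above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeResponse:
    content: str  # mirrors the one attribute the nodes read from a reply


class FakeLLM:
    """Hypothetical stand-in that exposes only invoke(), returning a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls: List[list] = []

    def invoke(self, messages: list) -> FakeResponse:
        self.calls.append(messages)  # record the prompt for later inspection
        return FakeResponse(content=self.reply)


def parse_entities(raw: str) -> List[str]:
    """Split a comma-separated reply, dropping empty items and stray whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def entity_extraction_node(state: Dict[str, str], llm: FakeLLM) -> Dict[str, List[str]]:
    # Same shape as the tutorial's node, with the model injected for testing.
    prompt = "Extract all the entities from the following text.\n\nText:" + state["text"]
    reply = llm.invoke([prompt]).content
    return {"entities": parse_entities(reply)}


fake = FakeLLM("OpenAI,  GPT-4 ,GPT-3, ")
update = entity_extraction_node({"text": "OpenAI announced GPT-4."}, fake)
print(update)
```

Passing the model in as a parameter is a design choice made for this sketch; it makes the node easy to exercise with canned replies, including untidy ones like the trailing comma above.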

Build the Workflow

This cell constructs the StateGraph workflow.

workflow = StateGraph(State)

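## Add nodes to the graph. The first node is named "classification_node" rather than
## "classification" because a node name may not match a state key.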
workflow.add_node("classification_node", classification_node)
workflow.add_node("entity_extraction", entity_extraction_node)
workflow.add_node("summarization", summarization_node)

## Add edges to the graph
workflow.set_entry_point("classification_node") # Set the entry point of the graph
workflow.add_edge("classification_node", "entity_extraction")
workflow.add_edge("entity_extraction", "summarization")
workflow.add_edge("summarization", END)

## Compile the graph
app = workflow.compile()

Visualizing the Workflow

This cell renders the compiled workflow as a Mermaid diagram. Note that MermaidDrawMethod.API renders the image through an external web service, so the cell needs network access.

display(
    Image(
        app.get_graph().draw_mermaid_png(
            draw_method=MermaidDrawMethod.API,
        )
    )
)
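When the image cannot be rendered, for example without network access, the same structure can still be inspected as Mermaid text. The compiled graph offers `app.get_graph().draw_mermaid()` for that purpose. The sketch below does not use that method; it is a hand-written illustration in which `to_mermaid` is a hypothetical helper and the edge list is copied from the workflow above, shown only to make clear what kind of text a Mermaid flowchart is.

```python
from typing import List, Tuple


def to_mermaid(edges: List[Tuple[str, str]]) -> str:
    """Build a top-down Mermaid flowchart definition from (source, target) pairs."""
    lines = ["graph TD"]
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Edges of the pipeline in this article, written out by hand for the sketch.
pipeline_edges = [
    ("__start__", "classification_node"),
    ("classification_node", "entity_extraction"),
    ("entity_extraction", "summarization"),
    ("summarization", "__end__"),
]

diagram = to_mermaid(pipeline_edges)
print(diagram)
```

The printed text can be pasted into any Mermaid renderer to view the diagram.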

Testing the Pipeline

This cell runs a sample text through our pipeline and displays the results.

sample_text = """
OpenAI has announced the GPT-4 model, which is a large multimodal model that exhibits human-level performance on various professional benchmarks. It is developed to improve the alignment and safety of AI systems.
Additionally, the model is designed to be more efficient and scalable than its predecessor, GPT-3. The GPT-4 model is expected to be released in the coming months and will be available to the public for research and development purposes.
"""

state_input = {"text": sample_text}
result = app.invoke(state_input)

print("Classification:", result["classification"])
print("\nEntities:", result["entities"])
print("\nSummary:", result["summary"])
#response
Classification: News

Entities: ['OpenAI', 'GPT-4', 'GPT-3']

Summary: OpenAI's upcoming GPT-4 model is a multimodal AI that aims for human-level performance, improved safety, and greater efficiency compared to GPT-3.

Conclusion

In this tutorial, we have:

  1. Studied the concepts of LangGraph
  2. Constructed a text-processing pipeline
  3. Showed an application of LangGraph in data processing workflows
  4. Visualized this workflow with Mermaid

This example shows that LangGraph is useful beyond conversational agents: it is a general framework for building complex graph-based workflows.
