Developing an Agentic AI Smart Guide for Business Planning and Entrepreneurship

An advanced agentic RAG with LangGraph, combining standard business guides, AI-based web search, trusted sources, and hybrid search leveraging multiple models


After the release of ChatGPT and the ensuing surge of large language models (LLMs), their inherent hallucinations, knowledge cut-off dates, and inability to provide organization- or person-specific information quickly surfaced and were seen as major shortcomings. To address these issues, retrieval-augmented generation (RAG) methods rapidly gained traction; they integrate external data into LLMs and steer their behavior to answer questions from a given knowledge base.

Interestingly, the first paper on RAG was published in 2020 by researchers from Facebook AI Research (now Meta AI), but its potential was not fully realized until the advent of ChatGPT. Since then, the development of RAG technology has not slowed down. More advanced and sophisticated RAG frameworks have been introduced that not only improve the accuracy of the technique but also enable it to handle multimodal data, expanding its potential across a wide range of applications. I discuss this topic in detail in the following articles, covering in particular contextual multimodal RAG, multimodal AI search for business applications, and information extraction and matching platforms.

With the continued evolution of RAG and emerging data-access needs, it became clear that the functionality of a retrieval-only RAG, which answers questions from a static knowledge base, can be extended by integrating other diverse knowledge sources and tools, such as:

  • Multiple databases (e.g., knowledge bases comprising vector databases and knowledge graphs)
  • Real-time web search to access up-to-date information
  • External APIs to collect specific data, such as stock market trends or data from company-specific tools like Slack channels or email accounts
  • Tools for tasks such as data analysis, report writing, literature review, and people search
  • Comparing and consolidating information from multiple sources.

For this, a RAG should be able to select the best knowledge source and/or tool depending on the query. The emergence of AI agents introduced the concept of "agentic RAG", which can decide the best course of action based on the query.

In this article, we will develop a specific agentic RAG application called the Smart Business Guide (SBG) — the first version of a tool in our ongoing project named UPBEAT, funded by Interreg Central Baltic. The project focuses on upskilling immigrants in Finland and Estonia in entrepreneurship and business planning with the help of AI. The SBG is one of the tools in the project's upskilling process. It focuses on providing accurate and quick information from authentic sources to people intending to start a business, or to those already running one.

The SBG's agentic RAG comprises:

  • Business and entrepreneurship guides as the knowledge base, containing information about business planning, entrepreneurship, company registration, taxation, business ideas, rules and regulations, business opportunities, licenses and permits, business guidance, and more.
  • Web search to fetch up-to-date information with sources.
  • Knowledge-extraction tools to fetch information from trusted sources. This information includes contact details of relevant authorities, up-to-date tax rules, up-to-date rules for company registration, and up-to-date licensing regulations.

What is special about this agentic RAG?

  • The option to choose different open-source models (Llama, Mistral, Gemma) as well as proprietary models (gpt-4o, gpt-4o-mini) throughout the agentic workflow. The open-source models do not run locally and hence do not require a powerful, expensive machine. Instead, they run on the Groq Cloud platform, which provides a free API. Yes, this makes it a cost-free agentic RAG. The GPT models can also be selected with an OpenAI API key.
  • Options to enforce knowledge-base search, web search, or hybrid search.
  • Grading of the retrieved documents to improve response quality, and smart invocation of web search based on the grading.
  • The option to select a response type: Concise, Moderate, or Explanatory.

Specifically, this article is structured around the following topics:

  1. Parsing data with LlamaParse to construct the knowledge base
  2. Developing an agentic workflow with LangGraph
  3. Developing an advanced agentic RAG (hereafter called the Smart Business Guide, or SBG) using free, open-source models

The complete code of this application is available on GitHub.

The application code is structured in two .py files: agentic_rag.py, which implements the entire agentic workflow, and app.py, which implements the Streamlit graphical user interface.
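For orientation, a sketch of the resulting layout (the repository root name is assumed for illustration):

smart-business-guide/          # repository root (name assumed)
├── agentic_rag.py             # graph state, nodes, edges, helper functions, workflow compilation
└── app.py                     # Streamlit UI: sidebar settings, chat loop, debug logs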

Let's dig into it.

Building the Knowledge Base with LlamaParse and LangChain

The SBG's knowledge base comprises authentic business and entrepreneurship guides published by Finnish agencies. Since these guides are voluminous and finding the required information in them is not straightforward, the aim is to develop an agentic RAG that can not only provide precise content from these guides but also augment it with web search and other trusted sources in Finland to fetch the latest information.

LlamaParse is a genAI-native document parsing platform built for LLM use cases. I explained its use in the articles cited above. This time, I parsed the documents directly on LlamaCloud. LlamaParse offers 1000 free credits per day. Credit usage depends on the parsing mode. For text-only PDFs, the 'Fast' mode (1 credit / 3 pages) works well; it skips OCR, image extraction, and table/heading identification. There are other, more advanced modes requiring more credits per page. I selected the 'premium' mode, which performs OCR, image extraction, and table/heading identification, and is suitable for complex documents with images.

I defined the following parsing instructions.

You are given a document containing text, tables, and images. Extract all the contents in their correct format. Extract each table in a correct format and include a detailed explanation of each table before its extracted format. 
If an image contains text, extract all the text in the correct format and include a detailed explanation of each image before its extracted text. 
Produce the output in markdown text. Extract each page separately in the form of an individual node. Assign the document name and page number to each extracted node in the format: [Creativity and Business, page 7]. 
Include the document name and page number at the start and end of each extracted page.

The parsed documents were downloaded in markdown format from LlamaCloud. The same parsing can be done through the LlamaCloud API, as follows.

import os
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

## Read the LlamaCloud API key from the environment
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
def save_to_markdown(output_path, content):
    """
    Save extracted content to a markdown file.

    Parameters:
    output_path (str): The path where the markdown file will be saved.
    content (list): The extracted content to be saved.
    """
    with open(output_path, "w", encoding="utf-8") as md_file:
        for document in content:
            # Extract the text content from the Document object
            md_file.write(document.text + "\n\n")  # Access the 'text' attribute
            
def extract_document(input_path):
    # Initialize the LlamaParse parser
    parsing_instructions = """You are given a document containing text, tables, and images. Extract all the contents in their correct format. Extract each table in a correct format and include a detailed explanation of each table before its extracted format. 
    If an image contains text, extract all the text in the correct format and include a detailed explanation of each image before its extracted text. 
    Produce the output in markdown text. Extract each page separately in the form of an individual node. Assign the document name and page number to each extracted node in the format: [Creativity and Business, page 7]. 
    Include the document name and page number at the start and end of each extracted page.
    """
    parser = LlamaParse(
        result_type="markdown",
        parsing_instructions=parsing_instructions,
        premium_mode=True,
        api_key=LLAMA_CLOUD_API_KEY,
        verbose=True
    )
    
    file_extractor = {".pdf": parser}
    documents = SimpleDirectoryReader(
        input_path, file_extractor=file_extractor
    ).load_data()
    return documents

input_path = r"C:\Users\h02317\Downloads\docs"  # Replace with your document path
output_file = r"C:\Users\h02317\Downloads\extracted_document.md"  # Output markdown file name

## Extract the document
extracted_content = extract_document(input_path)
save_to_markdown(output_file, extracted_content)

Below is a sample page from the guide Creativity and Business by Pikkala et al. (2015) ("free to copy for non-commercial private or public use, with a mention of the source").

Here is the parsed output of this page. LlamaParse efficiently extracted the information from all the structures on the page. The notebook shown on the page is in image format.

[Creativity and Business, page 8]

## How to use this book

1. The book is divided into six chapters and sub-sections dealing with different topics. You can read the book through one chapter and topic at a time, or you can use the checklist of the table of contents to select sections on topics in which you need more information and support.

2. Each section opens with a creative entrepreneur's thought on the topic.

3. The introduction gives a brief description of the topic.

4. Each section contains exercises that help you reflect on your own skills and business idea and develop your business idea further.

### What is your business idea

"I would like to launch 
a touring theatre company."

Do you have an idea about a product or service you would like 
to sell? Or do you have a bunch of ideas you have been mull-
ing over for some time? This section will help you get a better 
understanding about your business idea and what competen-
cies you already have that could help you implement it, and 
what types of competencies you still need to gain.

#### EXTRA
Business idea development 
in a nutshell

I found a great definition of what business idea development 
is from the My Coach online service (Youtube 27 May 2014). 
It divides the idea development process into three stages: 
the thinking - stage, the (subconscious) talking - stage, and the 
customer feedback stage. It is important that you talk about 
your business idea, as it is very easy to become stuck on a 
particular path and ignore everything else. You can bounce 
your idea around with all sorts of people: with a local business 
advisor; an experienced entrepreneur; or a friend. As you talk 
about your business idea with others, your subconscious will 
start working on the idea, and the feedback from others will 
help steer the idea in the right direction.

#### Recommended reading
Taivas + helvetti 
(Terho Puustinen & Mika Mäkeläinen: 
One on One Publishing Oy 2013)

#### Keywords
treasure map; business idea; business idea development

### EXERCISE: Identifying your personal competencies

Write down the various things you have done in your life and think what kind of competencies each of these things has 
given you. The idea is not just to write down your education, 
training and work experience like in a CV; you should also 
include hobbies, encounters with different types of people, and any life experiences that may have contributed to you 
being here now with your business idea. The starting circle can be you at any age, from birth to adulthood, depending 
on what types of experiences you have had time to accumulate. The final circle can be you at this moment.

PERSONAL CAREER PATH

SUPPLEMENTARY 
PERSONAL DEVELOPMENT
(e.g. training courses; 
literature; seminars)

Fill in the 
"My Competencies" 
section of the 
Creative Business 
Model Canvas:

5. Each section also includes an EXTRA box with interesting tidbits about the topic at hand.

6. For each topic, tips on further reading are given in the grey box.

7. The second grey box contains recommended keywords for searching more information about the topic online.

8. By completing each section of the one-page business plan or "Creative Business Model Canvas" (page 74), 
by the end of the book you will have a complete business plan.

9. By writing down your business start-up costs (e.g. marketing or logistics) in the price tag box of each section, 
by the time you get to the Finance and Administration section you will already know your start-up costs 
and you can enter them in the receipt provided in the Finance and Administration section (page 57).

This book is based on Finnish practices. The authors and the publisher are not responsible for the applicability of factual information to other 
countries. Readers are advised to check country-specific information on business structures, support organisations, taxation, legislation, etc. 
Factual information about Finnish practices should also be checked in case of differing interpretations by authorities.

[Creativity and Business, page 8]

The parsed markdown documents are then split into chunks using LangChain's RecursiveCharacterTextSplitter, with CHUNK_SIZE = 3000 and CHUNK_OVERLAP = 200.

import os
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHUNK_SIZE = 3000
CHUNK_OVERLAP = 200

def staticChunker(folder_path):
    docs = []
    print(f"Creating chunks. CHUNK_SIZE: {CHUNK_SIZE}, CHUNK_OVERLAP: {CHUNK_OVERLAP}")

    # Loop through all .md files in the folder
    for file_name in os.listdir(folder_path):
        if file_name.endswith(".md"):
            file_path = os.path.join(folder_path, file_name)
            print(f"Processing file: {file_path}")
            # Load documents from the Markdown file
            loader = UnstructuredMarkdownLoader(file_path)
            documents = loader.load()
            # Add file-specific metadata (optional)
            for doc in documents:
                doc.metadata["source_file"] = file_name
            # Split loaded documents into chunks
            text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
            chunked_docs = text_splitter.split_documents(documents)
            docs.extend(chunked_docs)
    return docs

Subsequently, a vector store is created in a Chroma database using an embedding model, such as the open-source all-MiniLM-L6-v2 model or OpenAI's text-embedding-3-large.

def load_or_create_vs(persist_directory):
    # Check whether the vector store directory exists
    if os.path.exists(persist_directory):
        print("Loading existing vectorstore...")
        # Load the existing vector store
        vectorstore = Chroma(
            persist_directory=persist_directory,
            embedding_function=st.session_state.embed_model,
            collection_name=collection_name
        )
    else:
        print("Vectorstore not found. Creating a new one...\n")
        docs = staticChunker(DATA_FOLDER)
        print("Computing embeddings...")
        # Create and persist a new Chroma vector store
        vectorstore = Chroma.from_documents(
            documents=docs,
            embedding=st.session_state.embed_model,
            persist_directory=persist_directory,
            collection_name=collection_name
        )
        print('Vectorstore created and persisted successfully!')

    return vectorstore

Creating the Agentic Workflow

An AI agent is the combination of a workflow and decision-making logic that can intelligently answer questions or perform other complex tasks that need to be broken down into simpler sub-tasks.

I designed a workflow for our AI agent with LangGraph, representing a sequence of actions or decisions in the form of a graph. Our agent has to decide whether to answer a question from the vector database (knowledge base), from web search, from hybrid search, or by using a tool.

In my following article, I explain the process of creating an agentic workflow with LangGraph.

We need to create graph nodes that represent the workflow's decision points and actions (e.g., web search or vector database search). Nodes are connected by edges, which define the flow of decisions and actions (e.g., which state comes next after retrieval). The graph state keeps track of information as it moves through the graph so that the agent uses the right data at each step.

The entry point of the workflow is a router function that determines the initial node to execute by analyzing the user's query. The whole workflow contains the following nodes.

  • retrieve: Fetches semantically similar chunks of information from the vector store.
  • grade_documents: Grades the relevance of the retrieved chunks to the user's query.
  • route_after_grading: Based on the grading, decides whether to generate a response with the retrieved documents or to proceed to web search.
  • websearch: Fetches information from web sources using the Tavily search engine's API.
  • generate: Generates the response to the user's query using the provided context (information retrieved from the vector store and/or web search).
  • get_contact_tool: Fetches contact information from predefined, trusted URLs related to the Finnish Immigration Service.
  • get_tax_info: Fetches tax-related information from predefined, trusted URLs.
  • get_registration_info: Fetches details about the company registration process in Finland from predefined, trusted URLs.
  • get_licensing_info: Fetches information about the licenses and permits required to start a business in Finland.
  • hybrid_search: Combines document retrieval and internet search results to provide broader context for answering the query.
  • unrelated: Handles questions that fall outside the workflow's focus.

Here are the edges in the workflow.

  • retrieve → grade_documents: The retrieved documents are sent for grading.
  • grade_documents → websearch: Web search is invoked if the retrieved documents are deemed irrelevant.
  • grade_documents → generate: Proceeds to response generation if the retrieved documents are relevant.
  • websearch → generate: Passes the web search results on for response generation.
  • get_contact_tool, get_tax_info, get_registration_info, get_licensing_info → generate: The edges from these four tools to the generate node pass the information fetched from specific trusted sources on for response generation.
  • hybrid_search → generate: Passes the combined results (vector store + web search) on for response generation.
  • unrelated → generate: Provides a fallback response for unrelated questions.

The graph state structure serves as a container for maintaining the workflow's state and includes the following elements:

  • question: The user's query or input that drives the workflow.
  • generation: The final generated response after processing.
  • web_search_needed: A flag indicating whether a web search is required based on the relevance of the retrieved documents.
  • documents: A list of retrieved or processed documents relevant to the query.
  • answer_style: Specifies the style of the desired answer, such as "Concise", "Moderate", or "Explanatory".
  • hybrid_search / internet_search: Flags set from the user interface that force hybrid or internet-only search (read by the router function).

The graph state structure is defined as follows:

from typing import List, TypedDict
from langchain_core.documents import Document

class GraphState(TypedDict):
    question: str
    generation: str
    web_search_needed: str
    documents: List[Document]
    answer_style: str
    # Flags set from the UI radio buttons (read by the router function)
    hybrid_search: bool
    internet_search: bool

The following router function analyzes the query and routes it to the relevant node for processing. A chain is created comprising a prompt for tool/node selection and the query. The chain invokes the router LLM to select the relevant tool.

def route_question(state):
    question = state["question"]
    
    # check whether one of these two options has been selected in the user interface
    hybrid_search_enabled = state.get("hybrid_search", False)
    internet_search_enabled = state.get("internet_search", False)
    
    if hybrid_search_enabled: 
        return "hybrid_search"
    
    if internet_search_enabled:
        return "websearch"

    tool_selection = {
      "get_tax_info": (
          "Questions related to tax matters, including current tax rates, tax rules, taxable income, tax exemptions, the tax filing process, or similar topics."
      ),
      "get_contact_tool": (
          "Questions specifically asking for contact information of the Finnish Immigration Service (Migri)."
      ),
      "get_registration_info": (
          "Questions about the company registration process. "
          "This does not include broader questions about starting a business or similar processes."
      ),
      "get_licensing_info": (
          "Questions related to the licenses, permits, and notifications required to start a business, especially for foreign entrepreneurs. "
          "This does not include questions about residence permits or licenses."
      ),
      "websearch": (
          "Questions related to residence permits, visas, or moving to Finland, or questions requiring current statistics or real-time information."
      ),
      "retrieve": (
          "Questions related to business, business planning, business opportunities, startups, entrepreneurship, employment, unemployment, pensions, insurance, social benefits, and similar topics. "
          "This includes questions about specific business opportunities (e.g., a particular profession, field, or topic) or advice."
      ),
      "unrelated": (
          "Questions unrelated to business, entrepreneurship, startups, employment, unemployment, pensions, insurance, social benefits, or similar topics, "
          "or questions related to countries or cities other than those in Finland."
      )
    }

    SYS_PROMPT = """Act as a router to select a specific tool or function based on the user's question. 
                 - Analyze the given question and, using the given tool selection dictionary, output the name of the relevant tool based on its description and its relevance to the question. 
                   The dictionary has tool names as keys and their descriptions as values. 
                 - Output only the tool name, i.e., the exact key, without any additional explanation. 
                 - For questions mentioning any country other than Finland, or any city outside Finland, output 'unrelated'.
                """

    # Define the ChatPromptTemplate
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", SYS_PROMPT),
            ("human", """这里是问题:
                        {question}
                        这是工具选择字典:
                        {tool_selection}
                        输出所需的工具。
                    """),
        ]
    )

    # Pass the inputs to the prompt
    inputs = {
        "question": question,
        "tool_selection": tool_selection
    }

    # Invoke the chain
    tool = (prompt | st.session_state.router_llm | StrOutputParser()).invoke(inputs)
    tool = re.sub(r"[\\'\"`]", "", tool.strip()) # Remove backslashes and extra spaces
    if not "unrelated" in tool:
        print(f"通过 {st.session_state.router_llm.model_name} 调用 {tool} 工具")
    if "websearch" in tool:
        print("我需要从这个查询中获取最新信息。")
    return tool

Questions unrelated to the workflow are routed to the handle_unrelated node, which provides a fallback response through the generate node.

def handle_unrelated(state):
    question = state["question"]
    documents = state.get("documents",[])
    response = "抱歉,我的设计是专门回答与芬兰的商业和创业相关的问题。请您重新措辞您的问题,以关注这些主题。"
    documents.append(Document(page_content=response))
    return {"generation": response, "documents": documents, "question": question}

The whole workflow is represented in the following figure.

Retrieval and Grading

The retrieve node invokes the retriever with the question to fetch relevant chunks of information from the vector store. These chunks ("documents") are sent to the grade_documents node for relevance grading. Based on the graded chunks ("filtered_docs"), the route_after_grading node decides whether to proceed to generation with the retrieved information or to invoke web search. The helper function initialize_grader_chain initializes the grader chain with a prompt instructing the grader LLM to assess the relevance of each chunk. The grade_documents node analyzes each chunk to determine whether it is relevant to the question, outputting "Yes" or "No" for each chunk accordingly.

def initialize_grader_chain():
    # Data model for LLM output format
    class GradeDocuments(BaseModel):
        """Binary score for relevance check on retrieved documents."""
        binary_score: str = Field(
            description="Documents are relevant to the question, 'yes' or 'no'"
        )

    # LLM for grading
    structured_llm_grader = st.session_state.grader_llm.with_structured_output(GradeDocuments)

    # Prompt template for grading
    SYS_PROMPT = """You are an expert grader assessing relevance of a retrieved document to a user question.
      Follow these instructions for grading:
      - If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant.
      - Your grade should be either 'Yes' or 'No' to indicate whether the document is relevant to the question or not."""

    grade_prompt = ChatPromptTemplate.from_messages([
        ("system", SYS_PROMPT),
        ("human", """Retrieved document:
    {documents}
    User question:
    {question}
    """),
    ])

    # Build grader chain
    return grade_prompt | structured_llm_grader

def grade_documents(state):
    question = state["question"]
    documents = state.get("documents", [])
    filtered_docs = []

    if not documents:
        print("No documents retrieved for grading.")
        return {"documents": [], "question": question, "web_search_needed": "Yes"}

    print(f"Grading retrieved documents with {st.session_state.grader_llm.model_name}")

    for count, doc in enumerate(documents):
        try:
            # Evaluate document relevance
            score = st.session_state.doc_grader.invoke({"documents": [doc], "question": question})
            print(f"Chunk {count} relevance: {score}")
            if score.binary_score.strip().lower() == "yes":  # accept 'Yes' or 'yes' from the grader
                filtered_docs.append(doc)
        except Exception as e:
            print(f"Error grading document chunk {count}: {e}")

    web_search_needed = "Yes" if not filtered_docs else "No"
    return {"documents": filtered_docs, "question": question, "web_search_needed": web_search_needed}

def route_after_grading(state):
    web_search_needed = state.get("web_search_needed", "No")
    print(f"Routing decision based on web_search_needed={web_search_needed}")
    if web_search_needed == "Yes":
        return "websearch"
    else:
        return "generate"

def retrieve(state):
    print("Retrieving documents")
    question = state["question"]
    documents = st.session_state.retriever.invoke(question)
    return {"documents": documents, "question": question}

Web and Hybrid Search

The web_search node is reached either through the route_after_grading node, when no relevant chunks are found in the retrieved information; directly through the route_question node, when the internet_search_enabled flag is "True" (selected via a radio button in the user interface); or when the router function decides to route the query to web_search to fetch recent and more relevant information.

A free API key for the Tavily search engine can be obtained by creating an account on their website. The free plan offers 1000 credits per month. The Tavily search results are appended to the "documents" state variable, which is then passed to the generate node along with the "question" state variable.

Hybrid search combines the results of the retriever and Tavily search and populates the "documents" state variable, which is passed to the generate node together with the "question" state variable.

from tavily import TavilyClient

def web_search(state):
    if "tavily_client" not in st.session_state:
        st.session_state.tavily_client = TavilyClient()
    question = state["question"]
    # Strip any 'Internet search' marker from the question and scope the search to Finland
    question = re.sub(r'\b\w+\\|Internet search\b', '', question).strip()
    question = question + " in Finland"
    documents = state.get("documents", [])
    try:
        print("Invoking internet search...")
        search_result = st.session_state.tavily_client.get_search_context(
            query=question,
            search_depth="advanced",
            max_tokens=4000
        )
        # Handle different types of results
        if isinstance(search_result, str):
            web_results = search_result
        elif isinstance(search_result, dict) and "documents" in search_result:
            web_results = "\n".join([doc.get("content", "") for doc in search_result["documents"]])
        else:
            web_results = "No valid results returned by TavilyClient."
        web_results_doc = Document(page_content=web_results)
        documents.append(web_results_doc)
    except Exception as e:
        print(f"Error during web search: {e}")
        # Ensure workflow can continue gracefully
        documents.append(Document(page_content=f"Web search failed: {e}"))
    return {"documents": documents, "question": question}

def hybrid_search(state):
    question = state["question"]
    print("Invoking retriever...")
    vector_docs = st.session_state.retriever.invoke(question)
    web_docs = web_search({"question": question})["documents"]
    
    # Add headings to distinguish between vector and web search results
    vector_results = [Document(page_content="Smart guide results:\n\n" + doc.page_content) for doc in vector_docs]
    web_results = [Document(page_content="\n\nInternet search results:\n\n" + doc.page_content) for doc in web_docs]
    
    combined_docs = vector_results + web_results
    return {"documents": combined_docs, "question": question}

Invoking the Tools

The tools used in this agentic workflow are scraping functions that fetch information from predefined, trusted URLs. The difference between Tavily and these tools is that Tavily performs a broader internet search to bring in results from diverse sources, whereas these tools use Python's Beautiful Soup web scraping library to extract information from trusted sources (predefined URLs). This way, we make sure that the information for certain queries is extracted from known, trusted sources. In addition, this information retrieval is completely free.

Here is how the get_tax_info node works, together with some helper functions. The other tools (nodes) of this type work the same way.

import requests
from bs4 import BeautifulSoup

## Helper function to remove unwanted tags
def remove_tags(soup):
    for element in soup(["script", "style", "header", "footer", "nav", "aside", "noscript"]):
        element.decompose()

    # Extract text while preserving structure
    content = ""
    for element in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'li']):
        text = element.get_text(strip=True)
        if element.name.startswith('h'):
            level = int(element.name[1])
            content += '#' * level + ' ' + text + '\n\n'  # Markdown-style headings
        elif element.name == 'p':
            content += text + '\n\n'
        elif element.name == 'li':
            content += '- ' + text + '\n'
    return content

## Helper function to fetch and return information from predefined URLs.
def get_info(URLs):
    combined_info = ""
    for url in URLs:
        try:
            response = requests.get(url)
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, "html.parser")
                combined_info += "URL: " + url + ": " + remove_tags(soup) + "\n\n" 
            else:
                combined_info += f"Failed to retrieve information from {url}\n\n"
        except Exception as e:
            combined_info += f"Error fetching URL {url}: {e}\n\n"
    return combined_info

## Tool or node to return updated tax-related information from predefined URLs
def get_tax_info(state):
    """
    Execute the 'get_contact_info' tool to fetch information.
    """
    tax_rates_url = [
        'https://www.vero.fi/en/businesses-and-corporations/taxes-and-charges/vat/rates-of-vat/',
        'https://www.expat-finland.com/living_in_finland/tax.html?utm_source=chatgpt.com',
        'https://finlandexpat.com/tax-in-finland/?utm_source=chatgpt.com'
    ]
    question = state["question"]
    documents = state.get("documents", [])
    try:
        tax_info = get_info(tax_rates_url)
        web_results_doc = Document(page_content=tax_info)
        documents.append(web_results_doc)
        return {
            "generation": tax_info,
            "documents": documents,
            "question": question
        }
    except Exception as e:
        return {
            "generation": f"Error fetching contact information: {e}",
            "documents": [],
            "question": question
        }

Generating the Response

The generate node creates the final response by invoking a chain with a predefined prompt (LangChain's PromptTemplate class), described below. The rag_prompt receives the state variables "question", "context", and "answer_style", and governs the overall behavior of response generation, with instructions about the response style, conversational tone, formatting guidelines, citation rules, hybrid context handling, and context-only focus.

rag_prompt = PromptTemplate(
    template = r"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
    You are a highly accurate and trustworthy assistant specialized in answering questions related to business and entrepreneurship in Finland. 
    Your responses must strictly adhere to the provided context and the given answer style, following these rules:

    1. **Context-Only Answers with a given answer style**:
    - Always base your answers on the provided context and answer style.
    - If the context does not contain relevant information, respond with: 'No information found. Switch to internet search.'
    - If the context contains some pieces of the required information, answer with that information and very briefly mention that the answer to other parts could not be found.
    - If the context explicitly states 'I apologize, but I'm designed to answer questions specifically related to business and entrepreneurship in Finland,' output this context verbatim.

    2. **Response style**:
    - Address the query directly without unnecessary or speculative information.
    - Do not draw from your knowledge base; strictly use the given context. However, take some liberty to provide more explanations and illustrations for better clarity and demonstration from your knowledge and experience only if answer style is "Moderate" or "Explanatory". 
    3. **Answer style**
    - If answer style = "Concise", generate a concise answer. 
    - If answer style = "Moderate", use a moderate approach to generate answer where you can provide a little bit more explanation and elaborate the answer to improve clarity, integrating your own experience. 
    - If answer style = "Explanatory", elaborate the answer to provide more explanations with examples and illustrations to improve clarity in best possible way, integrating your own experience.
      However, the explanations, examples and illustrations should be strictly based on the context. 

    4. **Conversational tone**
     - Maintain a conversational and helping style which should tend to guide the user and provide him help, hints and offers to further help and information. 
     - Use simple language. Explain difficult concepts or terms wherever needed. Present the information in the best readable form.

    5. **Formatting Guidelines**:
    - Use bullet points for lists.
    - Include line breaks between sections for clarity.
    - Highlight important numbers, dates, and terms using **bold** formatting.
    - Create tables wherever appropriate to present data clearly.
    - If there are discrepancies in the context, clearly explain them.

    6. **Citation Rules**:
    - Citation information may be present in the context in the form of [document name, page number] or URLs. It is very important to cite references if you find them in the context.
    - For responses based on vectorstore retrieval, cite the document name and page number with each piece of information in the format: [document_name, page xx].
    - For the answer compiled from the context from multiple documents, use the format: document_name 1 [page xx, yy, zz, ...], document_name 2 [page xx, yy, zz, ...].
    - For responses derived from websearch results and containing cited URLs, include all the URLs in hyperlink form returned by the websearch, each on a new line.
    - Do not invent any citation or URL. Only use the citation or URL in the context. 

    7. **Hybrid Context Handling**:
    - If the context contains two different sections with the names 'Smart guide results:' and 'Internet search results:', structure your response in corresponding sections with the following headings:
        - **Smart guide results**: Include data from vectorstore retrieval and its citations in the format: [document_name, page xx].
        - **Internet search results**: Include data from websearch and its citations (URLs). This does not mean only internet URLs, but all the data in 'Internet search results:' along with URLs.
        - Do not combine the data in the two sections. Create two separate sections. 

    8. **Integrity and Trustworthiness**:
    - Ensure every part of your response complies with these rules.

    <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer style: {answer_style}
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context", "answer_style"]
)

The generate node first retrieves the state variables "question", "documents", and "answer_style" and formats "documents" into a single string to serve as the context. Subsequently, it invokes the generation chain with the rag_prompt and the response-generating LLM to produce the final answer, which populates the "generation" state variable. This state variable is used by app.py to display the generated response in the Streamlit user interface.

With Groq's free API, it is possible to hit a model's rate or context-window limits. In that case, I extended the generate node to dynamically switch between models from a list of model names in a round-robin fashion, reverting to the current model after the response has been generated.

## Helper function to format documents into a single string for context.
def format_documents(documents):
    return "\n\n".join(doc.page_content for doc in documents)

## Graph node to generate the final response
def generate(state):
    question = state["question"]
    documents = state.get("documents", [])
    answer_style = state.get("answer_style", "Concise")

    if "llm" not in st.session_state:
        st.session_state.llm = initialize_llm(st.session_state.selected_model, answer_style)

    rag_chain = rag_prompt | st.session_state.llm | StrOutputParser()

    if not documents:
        print("No documents available for generation.")
        return {"generation": "No relevant documents found.", "documents": documents, "question": question}

    tried_models = set()
    original_model = st.session_state.selected_model
    current_model = original_model

    while len(tried_models) < len(model_list):
        try:
            tried_models.add(current_model)
            st.session_state.llm = initialize_llm(current_model, answer_style)
            rag_chain = rag_prompt | st.session_state.llm | StrOutputParser()
            context = format_documents(documents)
            generation = rag_chain.invoke({"context": context, "question": question, "answer_style": answer_style})
            print(f"Generating a {answer_style} length response.")
            print(f"Response generated with {st.session_state.llm.model_name} model.")
            print("Done.")

            if current_model != original_model:
                print(f"Reverting to original model: {original_model}")
                st.session_state.llm = initialize_llm(original_model, answer_style)
            return {"documents": documents, "question": question, "generation": generation}

        except Exception as e:
            error_message = str(e)
            if "rate_limit_exceeded" in error_message or "Request too large" in error_message or "Please reduce the length of the messages or completion" in error_message:
                print(f"Model's rate limit exceeded or request too large.")
                current_model = model_list[(model_list.index(current_model) + 1) % len(model_list)]
                print(f"Switching to model: {current_model}")
            else:
                return {
                    "generation": f"Error during generation: {error_message}",
                    "documents": documents,
                    "question": question,
                }

    return {
        "generation": "Unable to process the request due to limitations across all models.",
        "documents": documents,
        "question": question,
    }

Helper Functions

There are other helper functions in agentic_rag.py for initializing the application, the LLMs, the embedding models, and the session variables. The function initialize_app is called from app.py at app initialization and is triggered every time a model or state variable is changed through the Streamlit app. It reinitializes components and saves the updated state. The function also keeps track of various session variables and prevents redundant initialization.

def initialize_app(model_name, selected_embedding_model, selected_routing_model, selected_grading_model, hybrid_search, internet_search, answer_style):
    """
    Initialize embeddings, vectorstore, retriever, and LLM for the RAG workflow.
    Reinitialize components only if the selection has changed.
    """
    # Track current state to prevent redundant initialization
    if "current_model_state" not in st.session_state:
        st.session_state.current_model_state = {
            "answering_model": None,
            "embedding_model": None,
            "routing_model": None,
            "grading_model": None,
        }

    # Check if models or settings have changed
    state_changed = (
        st.session_state.current_model_state["answering_model"] != model_name or
        st.session_state.current_model_state["embedding_model"] != selected_embedding_model or
        st.session_state.current_model_state["routing_model"] != selected_routing_model or
        st.session_state.current_model_state["grading_model"] != selected_grading_model
    )

    # Reinitialize components only if settings have changed
    if state_changed:
        st.session_state.embed_model = initialize_embedding_model(selected_embedding_model)
        
        # Update vectorstore
        persist_directory = persist_directory_openai if "text-" in selected_embedding_model else persist_directory_huggingface
        st.session_state.vectorstore = load_or_create_vs(persist_directory)
        st.session_state.retriever = st.session_state.vectorstore.as_retriever(search_kwargs={"k": 5})
        
        st.session_state.llm = initialize_llm(model_name, answer_style)
        st.session_state.router_llm = initialize_router_llm(selected_routing_model)
        st.session_state.grader_llm = initialize_grading_llm(selected_grading_model)
        st.session_state.doc_grader = initialize_grader_chain()

        # Save updated state
        st.session_state.current_model_state.update({
            "answering_model": model_name,
            "embedding_model": selected_embedding_model,
            "routing_model": selected_routing_model,
            "grading_model": selected_grading_model,
        })

    print(f"Using LLM: {model_name}, Router LLM: {selected_routing_model}, Grader LLM:{selected_grading_model}, embedding model: {selected_embedding_model}")

    return workflow.compile()

The following helper functions initialize the answering LLM, the embedding model, the routing LLM, and the grading LLM. The list of model names, model_list, is used to keep track of models when the generate node switches models dynamically.

model_list = [
    "llama-3.1-8b-instant",
    "llama-3.3-70b-versatile",
    "llama3-70b-8192",   
    "llama3-8b-8192", 
    "mixtral-8x7b-32768", 
    "gemma2-9b-it",
    "gpt-4o-mini",
    "gpt-4o"
    ]

## Helper function to initialize the selected answering LLM
def initialize_llm(model_name, answer_style):
    if "llm" not in st.session_state or st.session_state.llm.model_name != model_name:
        if answer_style == "Concise":
            temperature = 0.0
        elif answer_style == "Moderate":
            temperature = 0.2
        elif answer_style == "Explanatory":
            temperature = 0.4

        if "gpt-" in model_name:
            st.session_state.llm = ChatOpenAI(model=model_name, temperature=temperature)
        else:
            st.session_state.llm = ChatGroq(model=model_name, temperature=temperature)

    return st.session_state.llm

## Helper function to initialize the selected embedding model
def initialize_embedding_model(selected_embedding_model):
    # Check if the embed_model exists in session_state
    if "embed_model" not in st.session_state:
        st.session_state.embed_model = None

    # Check if the current model matches the selected one
    current_model_name = None
    if st.session_state.embed_model:
        if hasattr(st.session_state.embed_model, "model"):
            current_model_name = st.session_state.embed_model.model
        elif hasattr(st.session_state.embed_model, "model_name"):
            current_model_name = st.session_state.embed_model.model_name

    # Initialize a new model if it doesn't match the selected one
    if current_model_name != selected_embedding_model:
        if "text-" in selected_embedding_model:
            st.session_state.embed_model = OpenAIEmbeddings(model=selected_embedding_model)
        else:
            st.session_state.embed_model = HuggingFaceEmbeddings(model_name=selected_embedding_model)

    return st.session_state.embed_model

## Helper function to initialize the selected router LLM
def initialize_router_llm(selected_routing_model):
    if "router_llm" not in st.session_state or st.session_state.router_llm.model_name != selected_routing_model:
        if "gpt-" in selected_routing_model:
            st.session_state.router_llm = ChatOpenAI(model=selected_routing_model, temperature=0.0)
        else:
            st.session_state.router_llm = ChatGroq(model=selected_routing_model, temperature=0.0)
    
    return st.session_state.router_llm

## Helper function to initialize the selected grading LLM
def initialize_grading_llm(selected_grading_model):
    if "grader_llm" not in st.session_state or st.session_state.grader_llm.model_name != selected_grading_model:
        if "gpt-" in selected_grading_model:
            st.session_state.grader_llm = ChatOpenAI(model=selected_grading_model, temperature=0.0)
        else:
            st.session_state.grader_llm = ChatGroq(model=selected_grading_model, temperature=0.0)
    
    return st.session_state.grader_llm

Establishing the Workflow

Now the graph state, the nodes, the conditional entry point using route_question, and the edges have been defined to establish the flow between the nodes. Finally, the workflow is compiled into an executable app for use in the Streamlit interface. The conditional entry point uses the route_question function to select the first node of the workflow based on the query. The conditional edges (workflow.add_conditional_edges) describe whether to transition to the websearch or the generate node based on the relevance determined by the grade_documents node.

workflow = StateGraph(GraphState)

## Add nodes
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("route_after_grading", route_after_grading)
workflow.add_node("websearch", web_search)
workflow.add_node("generate", generate)
workflow.add_node("get_contact_tool", get_contact_tool)
workflow.add_node("get_tax_info", get_tax_info)
workflow.add_node("get_registration_info", get_registration_info)
workflow.add_node("get_licensing_info", get_licensing_info)
workflow.add_node("hybrid_search", hybrid_search)
workflow.add_node("unrelated", handle_unrelated)

## Set conditional entry points
workflow.set_conditional_entry_point(
    route_question,
    {
        "retrieve": "retrieve",
        "websearch": "websearch",
        "get_contact_tool": "get_contact_tool",
        "get_tax_info": "get_tax_info",
        "get_registration_info": "get_registration_info",
        "get_licensing_info": "get_licensing_info",
        "hybrid_search": "hybrid_search",
        "unrelated": "unrelated"
    },
)

## Add edges
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    route_after_grading,
    {"websearch": "websearch", "generate": "generate"},
)
workflow.add_edge("websearch", "generate")
workflow.add_edge("get_contact_tool", "generate")
workflow.add_edge("get_tax_info", "generate")
workflow.add_edge("get_registration_info", "generate")
workflow.add_edge("get_licensing_info", "generate")
workflow.add_edge("hybrid_search", "generate")
workflow.add_edge("unrelated", "generate")

## Compile app
app = workflow.compile()
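The compiled app can also be exercised outside Streamlit for a quick sanity check. Below is a minimal sketch; it assumes the session-backed components (retriever, LLMs, grader chain) have already been initialized via initialize_app, and the question string is purely illustrative.

## Quick test of the compiled workflow (illustrative input)
inputs = {
    "question": "What licenses do I need to open a restaurant?",
    "hybrid_search": False,
    "internet_search": False,
    "answer_style": "Concise",
}
for output in app.stream(inputs):
    # Each streamed item maps a finished node's name to the state update it produced
    for node_name, state_update in output.items():
        print(f"Completed node: {node_name}")
        if "generation" in state_update:
            print(state_update["generation"])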

The Streamlit Interface

The Streamlit application in app.py provides an interactive interface for asking questions and displaying responses, with dynamic settings for model selection, answer style, and query-specific tools. The initialize_app function, imported from agentic_rag.py, initializes all session variables, including all the LLMs, the embedding model, and the other options selected from the left sidebar.

The print statements in agentic_rag.py are captured by redirecting sys.stdout to an io.StringIO buffer. The contents of this buffer are then displayed in a debug placeholder using Streamlit's text_area component.

import streamlit as st
from agentic_rag import initialize_app
import sys
import io
import os
import time

## Configure the Streamlit page layout
st.set_page_config(
    page_title="Smart Business Guide",
    layout="wide",
    initial_sidebar_state="expanded",
    page_icon = "🧠"
)

## Initialize session state for messages
if "messages" not in st.session_state:
    st.session_state.messages = []

## Sidebar layout
with st.sidebar:
    try:
        st.image("LOGO_UPBEAT.jpg", width=150, use_container_width=True)
    except Exception as e:
        st.warning("Unable to load image. Continuing without it.")

    st.title("🗣️ Smart Guide 1.0")
    st.markdown("**▶️ Actions:**")

    # Initialize session state for the model if it doesn't exist
    if "selected_model" not in st.session_state:
        st.session_state.selected_model = "gpt-4o"

    if "selected_routing_model" not in st.session_state:
        st.session_state.selected_routing_model = "gpt-4o"

    if "selected_grading_model" not in st.session_state:
        st.session_state.selected_grading_model = "gpt-4o"

    if "selected_embedding_model" not in st.session_state:
        st.session_state.selected_embedding_model = "text-embedding-3-large"

    model_list = [
        "llama-3.1-8b-instant",
        "llama-3.3-70b-versatile",
        "llama3-70b-8192",   
        "llama3-8b-8192", 
        "mixtral-8x7b-32768", 
        "gemma2-9b-it",
        "gpt-4o-mini",
        "gpt-4o"
    ]

    embed_list = [
        "text-embedding-3-large",
        "sentence-transformers/all-MiniLM-L6-v2"
    ]

    with st.expander("⚙️ Settings", expanded=False):
        st.session_state.selected_model = st.selectbox(
            "🤖 选择回答 LLM",
            model_list,
            key="model_selector",
            index=model_list.index(st.session_state.selected_model)
        )

        st.session_state.selected_routing_model = st.selectbox(
            "📡 选择路由 LLM",
            model_list,
            key="routing_model_selector",
            index=model_list.index(st.session_state.selected_routing_model)
        )

        st.session_state.selected_grading_model = st.selectbox(
            "🧮 选择检索评分 LLM",
            model_list,
            key="grading_model_selector",
            index=model_list.index(st.session_state.selected_grading_model)
        )

        st.session_state.selected_embedding_model = st.selectbox(
            "🧠 选择嵌入模型",
            embed_list,
            key="embedding_model_selector",
            index=embed_list.index(st.session_state.selected_embedding_model)
        )
        # Add the slider for answer style; values must match the checks in agentic_rag.py
        answer_style = st.select_slider(
            "💬 Answer style",
            options=["Concise", "Moderate", "Explanatory"],
            value="Concise",
            key="answer_style_slider",
            disabled=False
        )

    search_option = st.radio(
        "Search options",
        ["Smart guide + tools", "Internet search only", "Hybrid search (guide + internet)"],
        index=0
    )

    # Set the corresponding boolean values based on the selected option
    hybrid_search = search_option == "Hybrid search (guide + internet)"
    internet_search = search_option == "Internet search only"
    
    reset_button = st.button("🔄 重置对话", key="reset_button")

    # Initialize the app with the selected model
    app = initialize_app(st.session_state.selected_model, st.session_state.selected_embedding_model, st.session_state.selected_routing_model, st.session_state.selected_grading_model, hybrid_search, internet_search, answer_style)
    if reset_button:
        st.session_state.messages = []
## Title
st.title("📘 芬兰创业与商业规划的智能指南")
st.markdown(
    """
    <div style="text-align: left; font-size: 18px; margin-top: 20px; line-height: 1.6;">
        🤖 <b>Welcome to your Smart Business Guide!</b><br>
        I am here to help you with:<br>
        <ul style="list-style-position: inside; text-align: left; display: inline-block;">
            <li>Finding answers from business and entrepreneurship guides in Finland, using an AI-agent-based approach</li>
            <li>Up-to-date information through AI-based internet search</li>
            <li>Automatic invocation of AI-based internet search based on query understanding</li>
            <li>Dedicated tools for tax-related information, permits and licenses, company registration, residence permits, and more</li>
        </ul>
        <p style="margin-top: 10px;"><b>Start typing your questions in the chat box below, and I will provide answers tailored to your business needs!</b></p>
    </div>
    """,
    unsafe_allow_html=True
)

## Display conversation history
for message in st.session_state.messages:
    if message["role"] == "user":
        with st.chat_message("user"):
            st.markdown(f"**您:** {message['content']}")
    elif message["role"] == "assistant":
        with st.chat_message("assistant"):
            st.markdown(f"**助手:** {message['content']}")

## Input box at the bottom for new messages
if user_input := st.chat_input("Type your question (max 150 characters):"):
    if len(user_input) > 150:
        st.error("Your question exceeds 150 characters. Please shorten it and try again.")
    else:
        # Add user's message to session state and display it
        st.session_state.messages.append({"role": "user", "content": user_input})
        with st.chat_message("user"):
            st.markdown(f"**您:** {user_input}")

        # Capture print statements from agentic_rag.py
        output_buffer = io.StringIO()
        sys.stdout = output_buffer  # Redirect stdout to the buffer

        try:
            with st.chat_message("assistant"):
                response_placeholder = st.empty()
                debug_placeholder = st.empty()
                streamed_response = ""

                # Show spinner while streaming the response
                with st.spinner("思考中..."):
                    #inputs = {"question": user_input}
                    inputs = {"question": user_input, "hybrid_search": hybrid_search, "internet_search":internet_search, "answer_style":answer_style}
                    for i, output in enumerate(app.stream(inputs)):
                        # Capture intermediate print messages
                        debug_logs = output_buffer.getvalue()
                        debug_placeholder.text_area(
                            "调试日志",
                            debug_logs,
                            height=100,
                            key=f"debug_logs_{i}"
                        )

                        if "generate" in output and "generation" in output["generate"]:
                            # Append new content to streamed response
                            streamed_response += output["generate"]["generation"]
                            # Update the placeholder with the streamed response so far
                            response_placeholder.markdown(f"**助手:** {streamed_response}")

                # Store the final response in session state
                st.session_state.messages.append({"role": "assistant", "content": streamed_response or "未生成响应。"})
        except Exception as e:
            # Handle errors and display in the conversation history
            error_message = f"发生错误:{e}"
            st.session_state.messages.append({"role": "assistant", "content": error_message})
            with st.chat_message("assistant"):
                st.error(error_message)
        finally:
            # Restore stdout to its original state
            sys.stdout = sys.__stdout__

Here is a snapshot of the Streamlit interface:

The following image shows the answer generated by llama-3.3-70b-versatile with the "Concise" answer style selected. The query router (route_question) invoked the retriever (vector search), and the grading function found all the retrieved chunks relevant. Hence, the decision to generate the answer via the generate node was made by the route_after_grading node.

The following image shows the answer to the same question with the "Explanatory" answer style. As instructed in the rag_prompt, the LLM elaborates the answer with more explanations.

The following image shows the router triggering the get_licensing_info tool in response to a question.

The following image shows a web search invoked by the route_after_grading node when no relevant chunks were found by the vector search.

The following image shows the response generated when the hybrid search option is selected in the Streamlit application. The route_question node finds the hybrid_search_enabled state flag "True" and routes the question to the hybrid_search node.

Directions for Extension

The application can be enhanced in several directions, e.g.:

  • Voice search and question answering with support for multiple languages (e.g., Russian, Estonian, Arabic).
  • Selecting different parts of a response and asking for more information or an explanation.
  • Adding memory of the last n messages (a minimal sketch follows this list).
  • Including other modalities, such as images, in question answering.
  • Adding more agents for brainstorming, writing, and idea generation.
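As an example of the memory idea above, here is a minimal sketch. It reuses st.session_state.messages from app.py; the helper name with_memory and the N_TURNS window are hypothetical.

## Hypothetical helper: prepend the last N_TURNS exchanges to a new question
N_TURNS = 3  # assumed window size

def with_memory(question, messages, n_turns=N_TURNS):
    # Each turn consists of one user message and one assistant message
    recent = messages[-2 * n_turns:]
    history = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    if not history:
        return question
    return f"Conversation so far:\n{history}\n\nNew question: {question}"

## Usage in app.py, before building the workflow inputs:
## inputs = {"question": with_memory(user_input, st.session_state.messages), ...}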

That's all, folks! If you liked the article, please clap for it (👏) multiple times, write a comment, and follow me on Medium and LinkedIn.
