AI Agents Simplified: How AI Agents Answer Questions Using Domain Knowledge
Demystifying how enterprise AI agents are tailored to answer client questions using domain knowledge, and how they strictly confine their answers to that knowledge.
Background
Have you ever wondered what powers the cutting-edge AI agents on today's most advanced platforms? For example, in the screenshot above, I ask the AI agent on wealthsimple.com a question about interest rates, and it provides an answer specific to Wealthsimple's products, and Wealthsimple's products only. It does not tell me Bank of America's cash account interest rate.
How does it work? You might say, well, it is simple: an LLM (large language model)! That's right, but there is a bit more to it than just an LLM. Powerful AI agents, at least the good ones, are powered by large language models, so they are capable of language understanding and can carry out support sessions in a conversational format, just like ChatGPT. BUT, and it is a big but, they also have two exceptional features beyond a general-purpose ChatGPT:
1. AI agents search a domain knowledge base and provide up-to-date data to answer your questions. The domain knowledge base can be private enterprise data that is not searchable on the web. For example, you may ask your healthcare provider about specific coverage details under your account, which may be embedded in a benefits contract in PDF documents and is not searchable on the web. Compare that to ChatGPT, whose knowledge base is generic and may be outdated. For example, using a free account on chatgpt.com, the model itself is a newer model (GPT-4), but its knowledge base was last refreshed in January 2023, two years before the time of this article in January 2025.
2. AI agents restrict answers to the domain knowledge base only. This means they will only answer questions that are relevant to their services or products, and will not provide responses based on a generic knowledge base. For example, when I ask the Wealthsimple AI agent how much a Tesla Model 3 costs in Canada, which is publicly available information and surely part of any generic knowledge base, the AI agent says this information is outside the scope of Wealthsimple's products and services. This is perfect, because Wealthsimple should not answer such questions, for business reasons and obviously legal reasons as well.
Goal
In this article, let's discuss how the above features can be achieved. The goal is to understand, at a high level, how such AI agents work from a technology perspective.
Disclaimer: Please note the screenshots of the Wealthsimple AI agent are for demonstration purposes only. This article does not necessarily describe exactly how Wealthsimple's AI agent works and should not be considered a Wealthsimple engineering blog.
In order to power such AI agents, we need at least three major components:
- Understanding user questions. This can be powered by a large language model like OpenAI's GPT-4o. It allows us to understand the user's question and extract what the user intends to ask about. For example, in the screenshot above where I asked "What is the interest rate for a cash account", my intent is something like "interest rate for cash account". This is the "user intent", which will be used later.
- Searchable domain knowledge. The domain knowledge base may be internal enterprise documentation and policies, or public information on the company's own website (so we know it is trustworthy, not spam). We need this knowledge base to be searchable. If you have any database background, you may already be familiar with the term "indexing", where data is indexed and stored on disk to power search queries. To make the domain knowledge searchable, a similar approach can be taken to create an index on the domain knowledge. This index could support keyword search for looking up data by keywords and their spelling variations, vector search for looking up data by similarity, or a hybrid of both. There are many ways to create the index. One approach is to store internal documents in a storage service like Azure Blob Storage or AWS S3 and build a search index on the storage folder. Another approach is to create a content vector for each piece of searchable content and store the vector in a vector database. A content vector is simply an array of floats that represents the content mathematically, so you can search contents by similarity (a minimal sketch of this idea follows this list). For example, OpenAI offers the text-embedding-ada-002 embedding model, which calculates the vector for a given piece of content. We won't go into implementation details on indexing, but if you are interested, you can read my article Top 3 Strategies to Search Your Data published on Towards Data Science.
- Preparing a response based on data found in the domain knowledge base. If relevant information is found, prepare a response in a conversational format. If no relevant information is found in the domain knowledge base, say "I don't know" or "I cannot answer this question". Note that the large language model itself may know the answer, but we need to instruct it NOT to prepare a response based on its own knowledge, because we do not want to support that.
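Before walking through the workflow, here is a minimal sketch of the vector-similarity idea mentioned in the second component above. It assumes each document has already been converted to an embedding vector (for example with text-embedding-ada-002); the documents and the tiny four-dimensional vectors below are made up purely for illustration, since real embeddings have on the order of 1,500 dimensions.

using System;
using System.Linq;

// Minimal sketch of vector similarity search over a tiny in-memory "index".
// Assumes each document has already been converted to an embedding vector;
// the vectors here are made-up, low-dimensional examples for illustration.
public static class VectorSearchSketch
{
    // Cosine similarity: higher means the two vectors point in more similar directions.
    static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    public static void Main()
    {
        var index = new (string Content, float[] Vector)[]
        {
            ("Cash account interest rates by client tier", new[] { 0.9f, 0.1f, 0.0f, 0.2f }),
            ("How to transfer an RRSP between institutions", new[] { 0.1f, 0.8f, 0.3f, 0.0f }),
        };

        // Embedding of the search query "interest rate for cash accounts" (made up).
        float[] queryVector = { 0.85f, 0.15f, 0.05f, 0.25f };

        // Rank documents by similarity to the query vector and take the closest one.
        var best = index.OrderByDescending(d => CosineSimilarity(queryVector, d.Vector)).First();
        Console.WriteLine($"Most relevant document: {best.Content}");
    }
}

In practice a vector database or a managed search service performs this ranking at scale, but the core idea is the same: the document whose vector is closest to the query vector is the most relevant.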
Let’s start with a high-level workflow of one conversation round, just to understand how data flows end-to-end.
In the above diagram, the conversation starts with the user asking a question. Let’s take a look at each individual step.
1. The user sends a question. This is done on the UX end of the AI agent, which is often a website or mobile application. The question may be "what is the interest rate for a cash account", which is in natural language.
2. The UX receives the question and calls the backend, which is responsible for all the business logic related to searching data, organizing data, and preparing a response.
3. Extract the user intent, so that we know what terms to use as the search query. The user question is too verbose: it has words like "what", "is", and "for", which may not be required to retrieve relevant information from the domain knowledge base. How do we translate the user question in natural language into a search query? This is done with prompt engineering. For example, we provide the following prompt and the user question to the LLM, which extracts the user intent into a search query. It is worth noting that prompt engineering is more of an art than a science. It often requires multiple iterations to craft a successful prompt for a specific LLM. The following prompt is a good starting point.
private const string questionToQueryPrompt = @"
Below is a chat history and a new question asked by the user.
The new user question needs to be answered by searching in a knowledge base.
Generate a search query based on the conversation and the new question.
#Requirements#
- If the question is not in English, translate the question to English before generating the search query.
#Chat history#
{{$chat_history}}
#Question#
{{$question}}
";
With the prompt, the user question “what is the interest rate for a cash account” may be translated into a search query like “interest rate for cash accounts”.
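To make this step concrete, here is a sketch of how the backend might send that prompt to an LLM and read back the generated search query. It calls OpenAI's chat completions REST endpoint directly and uses gpt-4o as an illustrative model name; a production backend would more likely go through an official SDK or an orchestration framework, and would add retries, timeouts, and telemetry.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch: fill the questionToQueryPrompt placeholders and ask an LLM for the search query.
public static class IntentExtractionSketch
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<string> ExtractSearchQueryAsync(
        string prompt, string chatHistory, string question, string apiKey)
    {
        // Substitute the template placeholders ({{$chat_history}}, {{$question}}).
        string filledPrompt = prompt
            .Replace("{{$chat_history}}", chatHistory)
            .Replace("{{$question}}", question);

        var body = new
        {
            model = "gpt-4o",
            messages = new[] { new { role = "system", content = filledPrompt } }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/chat/completions")
        {
            Content = new StringContent(JsonSerializer.Serialize(body), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        // The model's reply is the search query, e.g. "interest rate for cash accounts".
        return doc.RootElement.GetProperty("choices")[0]
                  .GetProperty("message").GetProperty("content").GetString()!;
    }
}

With the Wealthsimple example, passing the chat history and the question "what is the interest rate for a cash account" should yield something close to "interest rate for cash accounts".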
4. The search query is routed to a search service. A search service is a separate component from the LLM, and its sole purpose is to provide the capability to search the enterprise data that makes up the domain knowledge. The search service should be able to index enterprise data stored in the cloud (AWS S3, Azure Blob Storage) or on premises, which is what makes the enterprise data searchable. The search service should also support different approaches to search, such as keyword search and vector search. Amazon Web Services (AWS) offers Amazon Kendra, and Microsoft offers Azure AI Search; both are cognitive search services for this purpose. The quality of the search service directly impacts the relevance of the AI agent's responses, so I personally think this is the critical component that separates an exceptional AI agent from a plain, boring one.
5. The search service performs the search by querying the index that contains the enterprise data (internal documents). The search results contain the documents most relevant to the search query, for example the raw text of an internal document describing cash account interest rates.
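To illustrate steps 4 and 5, here is a sketch of how the backend might call a search service with the generated query. It assumes an Azure AI Search index whose documents expose a text field named content; the service name, index name, field name, and api-version are illustrative placeholders, and Amazon Kendra would expose a different but analogous API.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch: send the search query from step 3 to a search service and collect
// the raw text of the most relevant documents in the domain knowledge index.
public static class SearchSketch
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<List<string>> SearchDomainKnowledgeAsync(
        string serviceName, string indexName, string apiKey, string searchQuery)
    {
        string url = $"https://{serviceName}.search.windows.net/indexes/{indexName}/docs/search?api-version=2023-11-01";

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            // Keyword search over the index; "top" limits how many documents come back.
            Content = new StringContent(
                JsonSerializer.Serialize(new { search = searchQuery, top = 3 }),
                Encoding.UTF8, "application/json")
        };
        request.Headers.Add("api-key", apiKey);

        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        var sources = new List<string>();
        foreach (var hit in doc.RootElement.GetProperty("value").EnumerateArray())
        {
            // Each hit carries the raw text of a matching document (or a chunk of one).
            sources.Add(hit.GetProperty("content").GetString()!);
        }
        return sources;
    }
}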
6. The search result is returned to the LLM so the LLM can use it as the data source to prepare a response. The raw text document is a citation, but it is not what the user is expecting as a response, because the user is expecting a direct answer instead of a full citation. The user does not want to read through the full document to figure out the interest rate; they expect a few numbers in percentages.
7. The LLM produces a response. The LLM has to rely on the search results returned from the search service as its data source, and only use those to generate an answer. If the search result is empty, the LLM should be instructed to say something like "I don't know" instead of trying to generate an answer based on its generic base knowledge. How do we achieve this? Again, through prompt engineering. For example, the following prompt instructs the LLM to take the data source and generate a proper response based on the facts provided in the sources. Note that the specific requirement "Do not generate answers that don't use the sources below" instructs the LLM not to generate answers based on its own generic knowledge base, which is the dataset that was used to train the model.
private const string answerPrompt = @"
You are a professional chat assistant.
Your role is to help users with their questions about our products.
You will be given the chat history and facts.
Your task is to generate an answer.
- Answer ONLY with the facts listed in the sources provided below.
- Do not generate answers that don't use the sources below.
- If there isn't enough information in the sources, say you don't know.
#Sources#
{{$sources}}
#Chat history#
{{$chat_history}}
";
The raw text search result from step 5 above and this prompt are provided to the LLM, which then generates an answer that may look like the following:
The interest rate for a Wealthsimple Cash account varies depending on your client status and qualifying deposits. As of December 18, 2024, the standard rate is 2.25% for all clients. Premium clients or Core clients with eligible direct deposits can earn 2.75%. Generation clients or Premium clients with eligible direct deposits can earn up to 3.25%. These rates are annualized, calculated daily, and paid monthly. Keep in mind that rates are subject to change.
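To illustrate step 7, here is a sketch of how the backend might fill the answerPrompt with the retrieved sources and chat history, then ask the LLM for the final grounded answer. The chat completions call mirrors the step 3 sketch, and the early return when no sources are found is one simple, assumed way to enforce the "I don't know" behavior on top of the prompt instruction.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Sketch: fill the answerPrompt with sources and chat history, then generate the answer.
public static class AnswerGenerationSketch
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<string> GenerateAnswerAsync(
        string answerPrompt, IReadOnlyList<string> sources, string chatHistory, string apiKey)
    {
        // If search found nothing, there is nothing to ground on; refuse up front
        // rather than letting the model improvise from its generic training data.
        if (sources.Count == 0)
        {
            return "Sorry, I don't have information about that.";
        }

        string filledPrompt = answerPrompt
            .Replace("{{$sources}}", string.Join("\n\n", sources))
            .Replace("{{$chat_history}}", chatHistory);

        var body = new
        {
            model = "gpt-4o",
            messages = new[] { new { role = "system", content = filledPrompt } }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/chat/completions")
        {
            Content = new StringContent(JsonSerializer.Serialize(body), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

        var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("choices")[0]
                  .GetProperty("message").GetProperty("content").GetString()!;
    }
}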
8. The UX receives the response from the backend. The UX may hydrate it a bit more to make it look nicer on the device.
9. The user receives the response. Hooray!!
The above pattern is actually known as Retrieval Augmented Generation (RAG), which is an architecture that leverages an information retrieval system (like a search service) to provide grounding data, and augments the capabilities of a Large Language Model (LLM) like ChatGPT. For more information on RAG, see Microsoft documentation Retrieval Augmented Generation (RAG) in Azure AI Search, or AWS documentation What is Retrieval-Augmented Generation.
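Putting it together, one conversation round of this RAG pattern can be expressed as a thin orchestration over the sketches above. This is only a composition of the hypothetical helpers from the earlier sketches, with the two prompt constants passed in as parameters; a real backend would also persist chat history, return citations alongside the answer, and handle failures.

using System;
using System.Threading.Tasks;

// Sketch: one conversation round end to end, composed from the earlier sketch classes.
public static class RagPipelineSketch
{
    public static async Task<string> AnswerUserQuestionAsync(
        string question, string chatHistory,
        string questionToQueryPrompt, string answerPrompt)
    {
        string openAiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;
        string searchKey = Environment.GetEnvironmentVariable("SEARCH_API_KEY")!;

        // Step 3: turn the natural-language question into a search query.
        string searchQuery = await IntentExtractionSketch.ExtractSearchQueryAsync(
            questionToQueryPrompt, chatHistory, question, openAiKey);

        // Steps 4-5: retrieve the most relevant documents from the domain knowledge index.
        var sources = await SearchSketch.SearchDomainKnowledgeAsync(
            "my-search-service", "domain-knowledge", searchKey, searchQuery);

        // Step 7: generate an answer grounded in the sources (or "I don't know" if none were found).
        return await AnswerGenerationSketch.GenerateAnswerAsync(
            answerPrompt, sources, chatHistory, openAiKey);
    }
}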
Easter egg — How do you know if you are chatting with a real person or a bot?
Thank you for reading with me! I hope you enjoyed this article! If you liked this article, some relevant articles that may be of interest to you:
- Towards Data Science: Top 3 Strategies to Search Your Data