Build your own personalized Fitness RAG Agent using Python!

AI Agent | RAG Agent | Python | DSPy | Fitness Agent | Beginner-Friendly

A complete and beginner-friendly guide to building your fully personalized Fitness RAG Agent with Python.


Whether you are a developer or an athlete, fitness has always been at the top of our New Year’s resolutions list.

Losing this much weight, gaining so much muscle mass, and so on. In today’s fast-paced lifestyle, keeping up with our body’s fitness requirements is getting tougher.

And one thing that can help with at least part of your fitness journey is a personalized Fitness Agent: an expert in the domain who is always there for you and is as vigilant about your health as you are.

By the end of this article, you’ll have a Fitness RAG Agent built using Python. A Retrieval Augmented Generation (RAG) Agent is a good fit here because it combines information retrieval with text generation: it first looks up the passages in our data that are relevant to your query, then uses them to generate accurate, personalized, context-aware answers.

Excited? Let’s get the keyboard tapping!

Here’s the TOC of the whole article, for your quick perusal.

· Prerequisites 🔮
· Installing the Packages📦
· Collecting Data 📚
· Extracting Text from the Book 📙
· Splitting the text into documents📖
· Compiling the data 💽
· Creating a Retrieval Model 📳
· Configuring the Language Model and the Retrieval Model ⚙️
· Creating the signature for the Agent 🗞️
· Creating the RAG Module🤖
· Querying the Agent⁉️
· Results from the Agent 📜
· Complete Code Base 💨
· Conclusion 🧩
· Today’s Inspiration 🌠
· Author’s Note ✒️

Prerequisites 🔮

One of the most important things to do before we actually start developing the fitness agent is to cover the prerequisites of this project.

The project involves configuring the Language Model, which in our case would be gemini-1.5-flash. In order to use this language model, you will need your Google API key to query the model, and you can get one for yourself from here.
This article focuses on developing a specialised RAG Agent, so to make things easier to follow, I advise going through this article on the basics first, as it will clear up most of your basic queries and give you a head start on the concept.

Yup! That’s it. There is nothing more to prepare, so let’s jump straight into development.🧑🏻‍💻

Installing the Packages📦

To facilitate the development of our Agent, we will make use of some PPP (Popular-Python-Packages): PyPDF2, langchain, dspy==0.1.5, and dspy-ai[faiss-cpu].

And installing these packages is even easier than applying butter to bread. Here’s how, —

# For Notebook (quoting the extra keeps the shell from treating [] as a glob)
!pip install PyPDF2 langchain dspy==0.1.5 "dspy-ai[faiss-cpu]"

# For Terminal
pip install PyPDF2 langchain dspy==0.1.5 "dspy-ai[faiss-cpu]"
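To confirm the installation worked, a small helper like the one below can report which packages Python can’t find. This is a minimal sketch, and it assumes the import names are PyPDF2, langchain, dspy and faiss (the last one coming from the faiss-cpu extra).

```python
import importlib.util

# Assumed import names for the packages installed above
REQUIRED_MODULES = ["PyPDF2", "langchain", "dspy", "faiss"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages are importable.")
```

If anything is listed as missing, re-run the pip command above before moving on.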

Also, as we move further into the development of our Fitness AI RAG Agent, we will get to know more about these packages.

Collecting Data 📚

Collecting data is a significant step, as it largely determines the accuracy and relevance of the responses generated by our agent.

For the sake of simplicity and demonstration, we will proceed with a single book, but you can certainly collect more data for our agent to reference.

We will use one of the most popular books on bodyweight fitness, —

“You Are Your Own Gym: The Bible of Bodyweight Exercises for Men and Women”

By Mark Lauren & Joshua Clark

You can also download the book from here.

Extracting Text from the Book 📙

The data we are referring to is a book, and a book contains a lot more than just plain text. Our agent, however, can only search through text, so the first step is to extract the text from the book.

The PyPDF2 package will help us with this job.

The code to extract the text from the book is, —

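from PyPDF2 import PdfReader

reader = PdfReader("/content/You_Are_Your_Own_Gym.pdf")
page_texts = []

for page in reader.pages:
    # Guard against pages with no extractable text (e.g. image-only pages)
    page_texts.append(page.extract_text() or "")

# Join pages with a newline so the last word of one page
# doesn't fuse with the first word of the next
complete_text = "\n".join(page_texts)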

This code will extract text from the book page-wise and will store the content in the complete_text variable.

💡TRY YOURSELF: Instead of storing the whole text in a variable, try writing the content to a file.
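One possible answer to the exercise: a minimal sketch that writes each page’s text to a UTF-8 file, separated by blank lines. The function name save_pages and the output filename are hypothetical; any iterable of page strings works.

```python
def save_pages(page_texts, output_path):
    """Write each page's text to a UTF-8 file, separated by blank lines.

    Returns the number of pages written.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as f:
        for text in page_texts:
            # Treat pages with no extractable text as empty strings
            f.write((text or "") + "\n\n")
            count += 1
    return count
```

With the reader from above, this could be called as save_pages((page.extract_text() for page in reader.pages), "book_text.txt"), and the text read back later with open("book_text.txt", encoding="utf-8").read(), so the PDF doesn’t need to be parsed again on every run.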

Splitting the text into documents📖

After extracting all the text from the book (PDF file), we have to split it into chunks that can be stored in a vector database. These chunks are also known as documents.

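We split the text because a whole book is far too long to embed or to pass to the language model in one go; smaller chunks let the retriever return only the passages relevant to a query. The overlap between consecutive chunks helps keep a sentence or idea that crosses a chunk boundary together in at least one chunk, so the agent doesn’t lose that context.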

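We will use RecursiveCharacterTextSplitter from the langchain.text_splitter module to achieve this. It tries to split on paragraph breaks first, then on line breaks, then on spaces, so that each chunk stays under chunk_size while keeping related text together.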

The code for the same is given below, —

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    # Maximum size of each chunk, in characters (length_function=len counts characters)
    chunk_size=1024,
    # Number of characters shared between consecutive chunks
    chunk_overlap=100,
    length_function=len,
    is_separator_regex=False,
)

## Contains all the chunked documents of the book's complete text
texts = text_splitter.create_documents([complete_text])

Here, texts contains all the documents, each at most chunk_size characters long, with consecutive documents sharing up to chunk_overlap characters.

💡TRY YOURSELF: Do experiment with the values of chunk_size, and chunk_overlap and notice the change in the response from the agent.
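To see what chunk_size and chunk_overlap actually do, here is a simplified sketch of fixed-size chunking with overlap. It is not how RecursiveCharacterTextSplitter works internally (that splitter also respects paragraph, line, and word boundaries); it only illustrates the sliding-window idea.

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Simplified sketch: ignores paragraph, line, and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the overlap. A larger overlap carries more context across boundaries but produces more (and more redundant) chunks.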

Compiling the data 💽

The texts list we created earlier contains all the documents, and each document is a LangChain Document object holding the chunk’s text (page_content) along with its metadata.

Now we have to collect the page_content of each document into a list, which will serve as the data for our retrieval model.

The code for preparing the list of all the page_content present in each document is given below, —

## An array containing the page_content of each and every document
page_contents = [text.page_content for text in texts]

Creating a Retrieval Model 📳

A retrieval model retrieves the most relevant information from the whole database. The retrieved information forms the reference content that our agent uses when generating its response.

Using dspy-ai[faiss-cpu] package, we are just a line away from creating our own retrieval model, and the code is given below, —

from dspy.retrieve.faiss_rm import FaissRM
frm = FaissRM(page_contents)

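Here, frm is our Retrieval Model. It converts each chunk into a dense vector (an embedding) and indexes these vectors with FAISS, a library for efficient similarity search and clustering of dense vectors. At query time, the question is embedded the same way, and the chunks whose vectors are closest to it are returned.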
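The core idea behind this kind of retrieval can be sketched in plain Python: embed every passage, embed the query, score by cosine similarity, and keep the top k. The embed function below is a hypothetical bag-of-words stand-in for a real embedding model, and the brute-force scoring loop is what FAISS replaces with an optimized index.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for a real embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, passages, k=2):
    """Brute-force nearest-neighbour search: score every passage, keep the best k."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

passages = [
    "push-ups build chest and triceps strength",
    "sleep helps muscles recover after training",
    "squats and lunges train the legs",
]
print(top_k("how do I train my legs", passages, k=1))
# → ['squats and lunges train the legs']
```

Real embeddings capture meaning rather than exact word overlap, so a neural embedding model would also relate "training" to "train", which this toy version cannot.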

Configuring the Language Model and the Retrieval Model ⚙️

After so much build-up, now we are at the stage of configuring our Language Model and the Retrieval Model.

We will use the dspy package to do this, —

import dspy

## The gemini LM for our project
gemini = dspy.Google(model='gemini-1.5-flash', api_key="<YOUR_GOOGLE_API_KEY>", temperature=0.3)

## Configuring the dspy with the LM and RM
dspy.settings.configure(lm=gemini, rm=frm)

We are using the gemini-1.5-flash model as our language model. Replace <YOUR_GOOGLE_API_KEY> with your API key, and you would be good to go!
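Hard-coding the key works, but it is easy to leak it by sharing a notebook. A common alternative is to read it from an environment variable. This is a minimal sketch; the variable name GOOGLE_API_KEY and the helper get_google_api_key are assumptions, not something dspy requires.

```python
import os

def get_google_api_key(var_name="GOOGLE_API_KEY"):
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running the agent.")
    return key

# Usage with the configuration above (not executed here):
# gemini = dspy.Google(model='gemini-1.5-flash', api_key=get_google_api_key(), temperature=0.3)
```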

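💡TRY YOURSELF: Experiment with different values of temperature and monitor the quality of the responses generated by the model. Lower values make the output more focused and predictable, while higher values make it more varied.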
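Why does temperature have this effect? In the standard textbook formulation, the model’s raw scores (logits) for candidate next tokens are divided by the temperature before a softmax turns them into probabilities. The sketch below uses made-up logits, and it assumes Gemini’s sampling behaves like this textbook version; the provider’s exact implementation isn’t covered here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate next tokens
for t in (0.3, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.3, almost all of the probability goes to the top-scoring token; at 1.5, the alternatives get a noticeably larger share, which is where the extra variety comes from.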

Creating the signature for the Agent 🗞️

When we use DSPy, we give the RAG agent instructions describing how it should respond to the queries that come to it and how to use the information it has.

With DSPy, we can write those instructions in plain language, in the form of a DSPy signature. DSPy uses the signature (its instructions plus the declared input and output fields) to build the prompt sent to the language model. DSPy can also optimize prompts with its optimizers, but in this article we use the signature as written, without any optimization step.

The basic signature for our RAG agent would be, —

class GenerateAnswer(dspy.Signature):
    """You are a highly knowledgeable and empathetic virtual fitness coach, specializing in creating personalized fitness plans and providing guidance to users of all fitness levels. Your goal is to support, educate, and motivate users to achieve their fitness and health goals safely and sustainably. Respond to queries in a friendly, professional, and encouraging tone, ensuring your advice is actionable, evidence-based, and tailored to the user's needs.

In your responses, ensure the following principles:

Empathy and Encouragement: Always maintain a positive, non-judgmental tone, and encourage users, regardless of their fitness level or challenges they face.
Example: ‘It’s great that you’re taking the first step! Let’s work together to create a plan that fits your schedule.’

Personalization: Use information provided by the user, such as their fitness goals, current activity level, dietary preferences, and any limitations, to tailor responses.
Example: For a beginner looking to lose weight, suggest manageable workouts and meal ideas they can follow.

Evidence-Based Advice: Provide recommendations based on scientific evidence, explaining the reasoning behind your advice in simple terms.
Example: Explain why strength training complements weight loss goals or the role of hydration in recovery.

Safety First: Emphasize proper form, gradual progression, and injury prevention in all recommendations. If a user reports pain or discomfort, suggest consulting a healthcare professional.
Example: ‘If you’re new to running, start with a mix of walking and jogging to build endurance gradually.’

Actionable Steps: Break down advice into simple, actionable steps or routines users can follow easily.
Example: Provide a beginner’s workout plan with clear sets, reps, and rest intervals.

Educational Insights: Share educational tips about fitness, nutrition, or wellness in a way that is easy to understand and implement.
Example: Explain the importance of macronutrients in muscle building or the benefits of dynamic stretching before workouts.

Versatile Communication: Respond effectively to a variety of user needs, such as:

Workout Guidance: Suggest specific exercises for goals like weight loss, muscle gain, or flexibility.

Diet and Nutrition: Provide meal ideas, portion guidance, and insights into balanced eating.

Motivation and Mindset: Offer motivational support to help users stay consistent and positive.

       If you don't have context that matches the user's query, politely state that the query can't be answered well.
       Also, if the query is not related to fitness, politely refuse to answer, stating your purpose.
    """

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField(desc="The query of the user")
    answer = dspy.OutputField()

Here, all the text written within the triple quotes (""" """) is the instruction for the agent. Along with the instructions, a signature declares the input fields (context, the retrieved passages, and question, the user’s query) and the output field (answer, the agent’s response).

Yup, I know the prompt is quite big, but clear and detailed instructions generally make the responses better aligned with what we want.

💡TRY YOURSELF: Experiment with adding more instructions or changing them completely, and monitor the responses generated by the agent.
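To build intuition for how a signature becomes a prompt, here is a hypothetical, heavily simplified prompt builder: the docstring becomes the instructions and each field becomes a labeled slot. This is not DSPy’s actual template (which differs and changes between versions); build_prompt and its arguments are made up for illustration.

```python
def build_prompt(instructions, input_fields, output_fields, values):
    """Hypothetical illustration of turning a signature into a prompt.

    instructions: the signature docstring
    input_fields / output_fields: dicts of field name -> description
    values: the actual inputs for one call
    """
    lines = [instructions.strip(), "", "Follow this format:"]
    for name, desc in {**input_fields, **output_fields}.items():
        lines.append(f"{name.capitalize()}: {desc}")
    lines.append("")
    for name in input_fields:
        lines.append(f"{name.capitalize()}: {values[name]}")
    # The model is expected to continue the text after the first output label
    first_output = next(iter(output_fields))
    lines.append(f"{first_output.capitalize()}:")
    return "\n".join(lines)

prompt = build_prompt(
    "You are a virtual fitness coach.",
    {"context": "may contain relevant facts", "question": "The query of the user"},
    {"answer": "a helpful reply"},
    {"context": "Squats train the legs.", "question": "What trains the legs?"},
)
print(prompt)
```

This is also why a longer docstring directly changes the agent’s behaviour: it is sent to the model on every call.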

Creating the RAG Module🤖

Now that we have written the signature for our RAG agent, we have to declare a class whose object will be used to query our agent.

The code for creating the module is given below, —

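class RAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()

        # Retriever that returns the top-k passages from the configured RM (frm)
        self.retrieve = dspy.Retrieve(k=num_passages)
        # Generator that reasons step by step before producing the answer
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        # 1. Retrieve the passages most relevant to the question
        context = self.retrieve(question).passages
        # 2. Generate an answer from the retrieved context and the question
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)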

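Here, our RAG class inherits from dspy.Module, the base class for DSPy programs. This lets DSPy keep track of the sub-modules we declare in __init__ (dspy.Retrieve and dspy.ChainOfThought) and makes the object callable: calling it with a question runs the forward method. dspy.Prediction is simply the object we return, holding the retrieved context and the final answer.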

dspy.Retrieve calls the configured RM (frm in our case) and retrieves the top 5 passages (num_passages=5) that best match the user’s query.

The retrieved passages are then passed to the language model along with the question to generate the answer. dspy.ChainOfThought wraps our signature so that the model first writes out its step-by-step reasoning and only then produces the answer field, which tends to give more considered responses.
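The flow inside forward can be mimicked without any LLM, which makes the data flow easy to inspect. In this sketch, retrieve and generate_answer are hypothetical stubs standing in for dspy.Retrieve and dspy.ChainOfThought, and PASSAGES is toy data in place of the book chunks.

```python
# Toy passages standing in for the book chunks (hypothetical data)
PASSAGES = [
    "short stretching breaks help people who sit for long hours",
    "push-ups need no equipment",
    "sleep supports recovery",
]

def retrieve(question, passages, k):
    """Stub for dspy.Retrieve: rank passages by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(passages, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return ranked[:k]

def generate_answer(context, question):
    """Stub for dspy.ChainOfThought: produces a reasoning step, then an answer."""
    reasoning = f"{len(context)} passages retrieved for: {question}"
    answer = context[0] if context else "Sorry, I can't answer that well from my context."
    return {"reasoning": reasoning, "answer": answer}

def rag(question, k=2):
    """Mirror of RAG.forward: retrieve, then generate from context + question."""
    context = retrieve(question, PASSAGES, k)
    prediction = generate_answer(context=context, question=question)
    return {"context": context, "answer": prediction["answer"]}

result = rag("I sit for long hours at my desk")
print(result["answer"])
# → short stretching breaks help people who sit for long hours
```

Like the real module, the reasoning is produced but only the answer and the context are returned to the caller.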

Querying the Agent⁉️

Yay! 🥳 we are ready to query the agent!

We have put in the keyboard work to build our awesome agent, and now it’s time to actually query it for our specific fitness needs.

To query the agent, we create an object of the RAG module and pass the query to it as an argument. The code to do so is, —

r = RAG()
response = r("I am a software developer, and have to sit for around 5-6 hr daily. Suggest something to me")
response.answer

After running the cell, the output of the agent is, —

Also, I rendered the agent’s Markdown answer as rich text using the Markdown class from Python’s rich library.

Results from the Agent 📜

Some more results from our agent.

Query 1: I am 5’10’’ and around 80Kg. I can devote around 15mins daily, how should I proceed?

(This is not me🫣, I am more fit!)

Query 2: I feel drowsy during daytime, and after meals. I want to get fit but feel demotivated to exercise.

Complete Code Base 💨

The complete code base of the project is given below, —

from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from dspy.retrieve.faiss_rm import FaissRM
import dspy
from rich import markdown

reader = PdfReader("/content/You_Are_Your_Own_Gym.pdf")
page_texts = []

for page in reader.pages:
    # Guard against pages with no extractable text
    page_texts.append(page.extract_text() or "")

complete_text = "\n".join(page_texts)

text_splitter = RecursiveCharacterTextSplitter(
    # Maximum size of each chunk, in characters
    chunk_size=1024,
    # Number of characters shared between consecutive chunks
    chunk_overlap=100,
    length_function=len,
    is_separator_regex=False,
)

## Contains all the chunked documents of the book's complete text
texts = text_splitter.create_documents([complete_text])

## An array containing the page_content of each and every document
page_contents = [text.page_content for text in texts]

frm = FaissRM(page_contents)

## The gemini LM for our project
gemini = dspy.Google(model='gemini-1.5-flash', api_key="<YOUR_GOOGLE_API_KEY>", temperature=0.3)

## Configuring the dspy with the LM and RM
dspy.settings.configure(lm=gemini, rm=frm)

## Signature of the Agent
class GenerateAnswer(dspy.Signature):
    """You are a highly knowledgeable and empathetic virtual fitness coach, specializing in creating personalized fitness plans and providing guidance to users of all fitness levels. Your goal is to support, educate, and motivate users to achieve their fitness and health goals safely and sustainably. Respond to queries in a friendly, professional, and encouraging tone, ensuring your advice is actionable, evidence-based, and tailored to the user's needs.

In your responses, ensure the following principles:

Empathy and Encouragement: Always maintain a positive, non-judgmental tone, and encourage users, regardless of their fitness level or challenges they face.
Example: ‘It’s great that you’re taking the first step! Let’s work together to create a plan that fits your schedule.’

Personalization: Use information provided by the user, such as their fitness goals, current activity level, dietary preferences, and any limitations, to tailor responses.
Example: For a beginner looking to lose weight, suggest manageable workouts and meal ideas they can follow.

Evidence-Based Advice: Provide recommendations based on scientific evidence, explaining the reasoning behind your advice in simple terms.
Example: Explain why strength training complements weight loss goals or the role of hydration in recovery.

Safety First: Emphasize proper form, gradual progression, and injury prevention in all recommendations. If a user reports pain or discomfort, suggest consulting a healthcare professional.
Example: ‘If you’re new to running, start with a mix of walking and jogging to build endurance gradually.’

Actionable Steps: Break down advice into simple, actionable steps or routines users can follow easily.
Example: Provide a beginner’s workout plan with clear sets, reps, and rest intervals.

Educational Insights: Share educational tips about fitness, nutrition, or wellness in a way that is easy to understand and implement.
Example: Explain the importance of macronutrients in muscle building or the benefits of dynamic stretching before workouts.

Versatile Communication: Respond effectively to a variety of user needs, such as:

Workout Guidance: Suggest specific exercises for goals like weight loss, muscle gain, or flexibility.

Diet and Nutrition: Provide meal ideas, portion guidance, and insights into balanced eating.

Motivation and Mindset: Offer motivational support to help users stay consistent and positive.

       If you don't have context that matches the user's query, politely state that the query can't be answered well.
       Also, if the query is not related to fitness, politely refuse to answer, stating your purpose.
    """

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField(desc="The query of the user")
    answer = dspy.OutputField()

## RAG Module
class RAG(dspy.Module):
    def __init__(self, num_passages=5):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

## Instantiating the RAG Agent
r = RAG()

## Querying the Agent
response = r("I am a software developer, and have to sit for around 5-6 hr daily. Suggest something to me")

## Converting the markdown to rich text
b_res = markdown.Markdown(response.answer)
b_res

Conclusion 🧩

First of all, CONGRATS!🥳🎉You built an awesome Fitness agent using Python and DSPy.

And yeah, that’s all for this article!

A brief summary of everything discussed in this article, —

Learned about DSPy, a framework used to give high-level instructions to the language and retrieval models.
Extracted the text from the book and split it into overlapping chunks (documents) for storing in the database.
Created a FAISS retrieval model; FAISS is a library for efficient similarity search and clustering of dense vectors.
Created a signature for your Fitness agent, giving it instructions about how to use the available information and respond to the user query.
Created a RAG Module that ties the retrieval and answer-generation steps together.
Queried our Fitness RAG agent and analysed the results.

And, I think that’s more than enough for a single day.

Today’s Inspiration 🌠

Thousands of candles can be lighted from a single candle, and the life of the candle will not be shortened. Happiness never decreases by being shared. By — Buddha

Author’s Note ✒️

Thank you for going through this article. If you have any questions or advice, please feel free to post them in the comments section. I truly appreciate feedback, and you can subscribe here to get all such interesting and informative articles straight into your inbox.