Fine-tuning LLaMA-3 for Text to SQL Generation with Modal: A Comprehensive Guide
Fine-tuning large language models (LLMs) has traditionally been a complex endeavor, requiring significant infrastructure setup and management. However, with Modal’s cloud platform and Axolotl’s fine-tuning framework, you can now fine-tune powerful models like LLaMA-3 directly from your local machine, without dealing with infrastructure complexities.
In this guide, we’ll walk through fine-tuning LLaMA-3 8B for SQL query generation using Modal’s remote GPU capabilities and Axolotl’s state-of-the-art training optimizations.
Modal
Modal is a revolutionary cloud platform that lets you run GPU-intensive workloads without managing any infrastructure. Key benefits include:
- Run code remotely within seconds
- Access to powerful GPUs with a single line of code
- Zero infrastructure configuration — everything is defined in code
- Serverless execution with per-second pricing
- Seamless local development experience
- Built-in primitives for distributed computing
The most compelling feature is Modal’s ability to make remote GPU resources feel local. You can develop and test your fine-tuning pipeline on your laptop, then seamlessly execute it on powerful cloud GPUs without any changes to your code.
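For example, you can wrap a GPU workload in a Modal function and invoke it from your laptop as if it were local. Here is a minimal sketch of that pattern; the app name, image, GPU type, and function are illustrative and not part of the fine-tuning repo:

```python
import modal

app = modal.App("finetune-demo")

# Modal builds this image in the cloud; no local Docker needed.
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A100", image=image, timeout=3600)
def check_gpu():
    # This body executes on a remote A100; everything else runs on your machine.
    import torch
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    print(check_gpu.remote())  # looks like a local call, runs on Modal
```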
Why This Stack?
The Modal + Axolotl stack comes with everything you need for efficient fine-tuning:
- Parameter-Efficient Fine-Tuning (PEFT) via LoRA adapters for faster convergence
- Flash Attention for optimized memory usage during training
- Gradient checkpointing to reduce VRAM footprint
- Distributed training via DeepSpeed for optimal multi-GPU scaling
Best of all, Modal eliminates infrastructure headaches. No need to worry about building images, provisioning GPUs, or managing cloud storage. If your training script runs on Modal, it’s production-ready from day one.
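Under the hood, these pieces map to standard Hugging Face tooling that Axolotl wires up from a YAML config (shown later in this guide). As a rough illustration only, assuming the transformers and peft libraries, the LoRA, Flash Attention, and gradient checkpointing settings correspond to something like:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Meta-Llama-3-8B",
    attn_implementation="flash_attention_2",  # Flash Attention (requires flash-attn installed)
    torch_dtype=torch.bfloat16,
)
base.gradient_checkpointing_enable()  # trade recompute for lower VRAM

# Mirrors lora_r / lora_alpha / lora_dropout / lora_target_linear in the config below
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the small adapter is trained
```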
Preparing the Dataset
We will use a natural-language-to-SQL (NL2SQL) dataset for this task, since it is a very common use case these days: the Bird NL2SQL dataset on Hugging Face. We need to convert this dataset to the Alpaca format, which is personally my preferred format for training data; you could use any other format, but you would then need to make some small changes to the Axolotl config file.
Here is the python script used to convert the dataset to Alpaca format:
```python
import json
import re
from typing import List, Dict

from datasets import load_dataset


def extract_components(input_text: str) -> tuple:
    """Extract schema and question from input text."""
    schema_match = re.search(r'Here is a database schema:(.*?)Please write', input_text, re.DOTALL)
    schema = schema_match.group(1).strip() if schema_match else ""
    question_match = re.search(r'following question: (.*?)\[/INST\]', input_text, re.DOTALL)
    question = question_match.group(1).strip() if question_match else ""
    return schema, question


def convert_to_alpaca_format(dataset) -> List[Dict[str, str]]:
    """Convert the dataset to Alpaca format with instruction/input/output fields."""
    alpaca_data = []
    for item in dataset:
        schema, question = extract_components(item['input'])
        alpaca_entry = {
            "instruction": "Write a SQL query to answer the question based on the given database schema.",
            "input": f"Schema:\n{schema}\n\nQuestion: {question}",
            "output": item['output'].strip()
        }
        alpaca_data.append(alpaca_entry)
    return alpaca_data


def save_to_jsonl(data: List[Dict[str, str]], output_file: str):
    """Save the data to a JSONL file."""
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in data:
            json_line = json.dumps(item, ensure_ascii=False)
            f.write(json_line + '\n')


# Load your dataset (replace with your actual dataset name/path)
dataset = load_dataset("lamini/bird_text_to_sql")

# Convert to Alpaca format
alpaca_data = convert_to_alpaca_format(dataset['train'])  # or whichever split you're using

# Save to a JSONL file (this filename is referenced later in the config and training command)
save_to_jsonl(alpaca_data, 'bird_text_to_sql_alpaca.jsonl')
```
Here is a sample of what the final data looks like:
```json
{
  "instruction": "Write a SQL query to answer the question based on the given database schema.",
  "input": "Schema:\n[database schema]\n\nQuestion: [question]",
  "output": "[SQL query]"
}
```
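Before uploading, it is worth reading the JSONL back and eyeballing one entry. A small optional check, assuming the output path used in the script above:

```python
import json

# Load the first converted example and make sure the three fields look right.
with open("bird_text_to_sql_alpaca.jsonl", encoding="utf-8") as f:
    first = json.loads(f.readline())

print(first["instruction"])
print(first["input"][:300])  # schema + question preview
print(first["output"])       # the target SQL query
```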
Setting up Modal
Before we start, make sure you have:
- Set up a Modal account and CLI:

```bash
pip install modal
python -m modal setup
```

- Created a HuggingFace secret named `huggingface` in your Modal workspace (get your `HF_TOKEN` from HuggingFace settings > API tokens, then add it from the Modal dashboard or with the `modal secret create` CLI command)
- Cloned the repository:

```bash
git clone https://github.com/modal-labs/llm-finetuning.git
cd llm-finetuning
```
Project Structure
The repository provides everything needed for training and inference:
- Training script for cloud-based fine-tuning
- Inference engine for testing your results
- Configuration files for different models and tasks
- Sample datasets and data processing utilities
Modal’s built-in storage system helps manage data across functions:
- `/pretrained` volume: stores pretrained model weights (loaded once)
- `/runs` volume: stores configs, datasets, and results for each training run
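In the repo’s training code, these show up as Modal Volumes attached to the functions that need them. A rough sketch of the pattern (the app and volume names here are illustrative, not the repo’s exact ones):

```python
import modal

app = modal.App("llm-finetune-demo")
pretrained_vol = modal.Volume.from_name("pretrained-vol", create_if_missing=True)
runs_vol = modal.Volume.from_name("runs-vol", create_if_missing=True)

@app.function(volumes={"/pretrained": pretrained_vol, "/runs": runs_vol})
def train():
    # Base weights are read from /pretrained (downloaded once);
    # configs, datasets, and adapter checkpoints land under /runs/<run-name>/.
    ...
```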
Configuration for LLaMA-3
The repository includes configuration files for several LLMs, such as Mistral, CodeLlama, and Mixtral, but we will use LLaMA-3 for this example. After making some adjustments to the existing llama-3-config.yml, this is our config file:
```yaml
###
## Model Configuration: LLaMA-3 8B
###

base_model: NousResearch/Meta-Llama-3-8B
sequence_len: 4096

## base model weight quantization
load_in_8bit: false
load_in_4bit: true

quantization_config:
  load_in_4bit: true
  bnb_4bit_compute_dtype: "bfloat16"
  bnb_4bit_use_double_quant: true
  bnb_4bit_quant_type: "nf4"

## attention implementation
flash_attention: true

## finetuned adapter config
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head
## for details, see https://github.com/huggingface/peft/issues/334#issuecomment-1561727994

###
## Dataset Configuration: sqlqa
###

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: bird_text_to_sql_alpaca.jsonl
    ds_type: json
    type: alpaca

## dataset formatting config
tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[SQL]"
  - " [/SQL]"
special_tokens:
  pad_token: <|end_of_text|>

val_set_size: 0.05

###
## Training Configuration
###

## random seed for better reproducibility
seed: 117

## optimizer config
optimizer: adamw_bnb_8bit
learning_rate: 0.0001
lr_scheduler: cosine
num_epochs: 4
micro_batch_size: 2
gradient_accumulation_steps: 16
warmup_steps: 10

## axolotl saving config
dataset_prepared_path: last_run_prepared
output_dir: ./lora-out

## logging and eval config
logging_steps: 1
eval_steps: 0.05

## training performance optimization config
bf16: true
fp16: false
tf32: false
train_on_inputs: false
group_by_length: false
gradient_checkpointing: true
deepspeed: "/workspace/axolotl/deepspeed_configs/zero3.json"

###
## Miscellaneous Configuration
###

## when true, prevents over-writing the config from the CLI
strict: false

## "Don't mess with this, it's here for accelerate and torchrun" -- axolotl docs
local_rank:
```
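Note how the `tokens` and `lora_modules_to_save` entries go together: adding new control tokens grows the vocabulary, so the embedding and output layers must be trained and saved alongside the adapter. As an illustration of what happens under the hood, in plain transformers calls (not the code Axolotl actually runs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("NousResearch/Meta-Llama-3-8B")
tok.add_tokens(["[INST]", " [/INST]", "[SQL]", " [/SQL]"])  # the `tokens` list
tok.pad_token = "<|end_of_text|>"                           # the `special_tokens` entry

model = AutoModelForCausalLM.from_pretrained("NousResearch/Meta-Llama-3-8B")
# New tokens need new embedding rows, which is why embed_tokens and lm_head
# appear in lora_modules_to_save.
model.resize_token_embeddings(len(tok))
```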
Training Process
Modal’s training script has three key functions:
- `launch`: prepares a new run folder with the config and data
- `train`: executes the training job
- `merge`: combines the trained adapter with the base model
To start training:
```bash
modal run --detach src.train --config=llama-3-config.yml --data=bird_text_to_sql_alpaca.jsonl
```
The `--detach` flag lets training continue even if your local connection drops. The training run folder name appears in the command output (e.g. `axo-2025-01-04-09-19-02-92bb`). You can check that your fine-tuned model is stored properly in this folder using `modal volume ls`.
This will trigger a training job, and you’ll get a link to track its progress on the Modal dashboard; alternatively, open your Modal dashboard and check the relevant app logs.
Testing Your Fine-tuned Model
Once training completes, test your model using Modal’s inference engine:
```bash
modal run -q src.inference --run-name axo-2025-01-04-09-19-02-92bb
```
The inference engine uses vLLM, which delivers up to 24x higher generation throughput than standard Hugging Face Transformers inference.
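If you later want to run the merged model outside Modal, you can load it with vLLM directly. A minimal sketch, where the local model path and prompt are assumptions (point it at wherever you download the merged weights):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="./merged-llama-3-sql")  # merged base + LoRA weights (assumed path)
params = SamplingParams(temperature=0.0, max_tokens=256)

prompt = (
    "Write a SQL query to answer the question based on the given database schema.\n"
    "Schema:\nCREATE TABLE users (id INT, name TEXT, age INT);\n\n"
    "Question: How many users are older than 30?"
)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```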
Advanced Features
Multi-GPU Training
For larger models or datasets, enable multi-GPU training with DeepSpeed (we already set this in the config above):
- In `llama-3-config.yml`:

```yaml
deepspeed: "/workspace/axolotl/deepspeed_configs/zero3.json"
```

- In your training script:

```python
N_GPUS = int(os.environ.get("N_GPUS", 2))
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, size="80GB")
```
Weights & Biases Integration
To track training metrics:
- Create a W&B secret in your Modal dashboard
- Add to your app configuration:
```python
app = modal.App(
    "example-axolotl",
    secrets=[
        modal.Secret.from_name("huggingface"),
        modal.Secret.from_name("my-wandb-secret"),
    ],
)
```

- Update `llama-3-config.yml`:

```yaml
wandb_project: llama3-sql
wandb_watch: gradients
```
Conclusion
Modal and Axolotl make fine-tuning LLaMA-3 remarkably straightforward. You get:
- State-of-the-art training optimizations out of the box
- Zero infrastructure management
- Serverless scaling and pay-per-use pricing
- Easy deployment options
The entire process runs from your local machine with a handful of Modal commands and a single YAML config: no image building, no GPU provisioning, no infrastructure setup. Once trained, you can easily deploy your model using Modal’s web endpoint feature for production use.
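As a sketch of what such a deployment could look like (the endpoint is a stub: the request shape is an assumption, and image setup, including fastapi, and actual model loading are omitted):

```python
import modal

app = modal.App("llama3-sql-api")

@app.function(gpu="A100")
@modal.web_endpoint(method="POST")
def generate(item: dict):
    # item is the JSON request body, e.g. {"schema": "...", "question": "..."}
    schema, question = item["schema"], item["question"]
    # Placeholder: load the merged model (e.g. with vLLM, as above) and generate here.
    sql = "SELECT 1;  -- replace with real model output"
    return {"sql": sql}
```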
Feel free to reach out to me on LinkedIn if you have any questions or suggestions!
Primastat
If you wish to implement LLM applications in your company and are looking for professionals to build complex systems, look no further!
At Primastat, we aim to provide high quality data-driven AI solutions ranging from fine-tuning LLMs to Agentic AI applications. We cater to a range of sectors including Marketing, Healthcare, Legal and Fin-tech.
Drop us a mail at connect@primastat.in or reach out to us on our social media handles.