Fine-tuning LLaMA-3 for Text to SQL Generation with Modal: A Comprehensive Guide
Fine-tuning large language models (LLMs) has traditionally been a complex endeavor, requiring significant infrastructure setup and management. However, with Modal’s cloud platform and Axolotl’s fine-tuning framework, you can now fine-tune powerful models like LLaMA-3 directly from your local machine, without dealing with infrastructure complexities.
In this guide, we’ll walk through fine-tuning LLaMA-3 8B for SQL query generation using Modal’s remote GPU capabilities and Axolotl’s state-of-the-art training optimizations.
Modal
Modal is a revolutionary cloud platform that lets you run GPU-intensive workloads without managing any infrastructure. Key benefits include:
- Run code remotely within seconds
- Access to powerful GPUs with a single line of code
- Zero infrastructure configuration — everything is defined in code
- Serverless execution with per-second pricing
- Seamless local development experience
- Built-in primitives for distributed computing
The most compelling feature is Modal’s ability to make remote GPU resources feel local. You can develop and test your fine-tuning pipeline on your laptop, then seamlessly execute it on powerful cloud GPUs without any changes to your code.
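For example, you can wrap a GPU workload in a Modal function and invoke it from your laptop as if it were local. Here is a minimal sketch of that pattern; the app name, image, GPU type, and function are illustrative and not part of the fine-tuning repo:

```python
import modal

app = modal.App("finetune-demo")

# Modal builds this image in the cloud; no local Docker needed.
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A100", image=image, timeout=3600)
def check_gpu():
    # This body executes on a remote A100; everything else runs on your machine.
    import torch
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    print(check_gpu.remote())  # looks like a local call, runs on Modal
```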
Why This Stack?
The Modal + Axolotl stack comes with everything you need for efficient fine-tuning:
- Parameter-Efficient Fine-Tuning (PEFT) via LoRA adapters for faster convergence
- Flash Attention for optimized memory usage during training
- Gradient checkpointing to reduce VRAM footprint
- Distributed training via DeepSpeed for optimal multi-GPU scaling
Best of all, Modal eliminates infrastructure headaches. No need to worry about building images, provisioning GPUs, or managing cloud storage. If your training script runs on Modal, it’s production-ready from day one.
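Under the hood, these pieces map to standard Hugging Face tooling that Axolotl wires up from a YAML config (shown later in this guide). As a rough illustration only, assuming the transformers and peft libraries, the LoRA, Flash Attention, and gradient checkpointing settings correspond to something like:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Meta-Llama-3-8B",
    attn_implementation="flash_attention_2",  # Flash Attention (requires flash-attn installed)
    torch_dtype=torch.bfloat16,
)
base.gradient_checkpointing_enable()  # trade recompute for lower VRAM

# Mirrors lora_r / lora_alpha / lora_dropout / lora_target_linear in the config below
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the small adapter is trained
```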
Preparing the Dataset
We will use a natural-language-to-SQL (NL2SQL) dataset for this task, since it is a very common use case these days: the Bird NL2SQL dataset on Hugging Face. We need to convert this dataset to the Alpaca format, which is personally my preferred format for training data; you could use any other format, but you would then need to make some small changes to the Axolotl config file.
Here is the python script used to convert the dataset to Alpaca format:
```python
import json
import re
from typing import List, Dict

from datasets import load_dataset


def extract_components(input_text: str) -> tuple:
    """Extract schema and question from input text."""
    schema_match = re.search(r'Here is a database schema:(.*?)Please write', input_text, re.DOTALL)
    schema = schema_match.group(1).strip() if schema_match else ""
    question_match = re.search(r'following question: (.*?)\[/INST\]', input_text, re.DOTALL)
    question = question_match.group(1).strip() if question_match else ""
    return schema, question


def convert_to_alpaca_format(dataset) -> List[Dict[str, str]]:
    """Convert the dataset to Alpaca format with instruction/input/output fields."""
    alpaca_data = []
    for item in dataset:
        schema, question = extract_components(item['input'])
        alpaca_entry = {
            "instruction": "Write a SQL query to answer the question based on the given database schema.",
            "input": f"Schema:\n{schema}\n\nQuestion: {question}",
            "output": item['output'].strip()
        }
        alpaca_data.append(alpaca_entry)
    return alpaca_data


def save_to_jsonl(data: List[Dict[str, str]], output_file: str):
    """Save the data to a JSONL file."""
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in data:
            json_line = json.dumps(item, ensure_ascii=False)
            f.write(json_line + '\n')


# Load your dataset (replace with your actual dataset name/path)
dataset = load_dataset("lamini/bird_text_to_sql")

# Convert to Alpaca format
alpaca_data = convert_to_alpaca_format(dataset['train'])  # or whichever split you're using

# Save to a JSONL file (this filename is referenced later in the config and training command)
save_to_jsonl(alpaca_data, 'bird_text_to_sql_alpaca.jsonl')
```
Here is a sample of what the final data looks like:
```json
{
  "instruction": "Write a SQL query to answer the question based on the given database schema.",
  "input": "Schema:\n[database schema]\n\nQuestion: [question]",
  "output": "[SQL query]"
}
```
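Before uploading, it is worth reading the JSONL back and eyeballing one entry. A small optional check, assuming the output path used in the script above:

```python
import json

# Load the first converted example and make sure the three fields look right.
with open("bird_text_to_sql_alpaca.jsonl", encoding="utf-8") as f:
    first = json.loads(f.readline())

print(first["instruction"])
print(first["input"][:300])  # schema + question preview
print(first["output"])       # the target SQL query
```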
Setting up Modal
Before we start, make sure you have:
- Set up a Modal account and CLI:

```bash
pip install modal
python -m modal setup
```

- Created a HuggingFace secret named `huggingface` in your Modal workspace (get your `HF_TOKEN` from HuggingFace settings > API tokens, then add it from the Modal dashboard or with the `modal secret create` CLI command)
- Cloned the repository:

```bash
git clone https://github.com/modal-labs/llm-finetuning.git
cd llm-finetuning
```
Project Structure
The repository provides everything needed for training and inference:
- Training script for cloud-based fine-tuning
- Inference engine for testing your results
- Configuration files for different models and tasks
- Sample datasets and data processing utilities
Modal’s built-in storage system helps manage data across functions:
- `/pretrained` volume: stores pretrained model weights (loaded once)
- `/runs` volume: stores configs, datasets, and results for each training run
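In the repo’s training code, these show up as Modal Volumes attached to the functions that need them. A rough sketch of the pattern (the app and volume names here are illustrative, not the repo’s exact ones):

```python
import modal

app = modal.App("llm-finetune-demo")
pretrained_vol = modal.Volume.from_name("pretrained-vol", create_if_missing=True)
runs_vol = modal.Volume.from_name("runs-vol", create_if_missing=True)

@app.function(volumes={"/pretrained": pretrained_vol, "/runs": runs_vol})
def train():
    # Base weights are read from /pretrained (downloaded once);
    # configs, datasets, and adapter checkpoints land under /runs/<run-name>/.
    ...
```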
Configuration for LLaMA-3
The repository includes configuration files for several LLMs, such as Mistral, CodeLlama, and Mixtral, but we will use LLaMA-3 for this example. After making some adjustments to the existing llama-3-config.yml, this is our config file:
```yaml
###
## Model Configuration: LLaMA-3 8B
###

base_model: NousResearch/Meta-Llama-3-8B
sequence_len: 4096

## base model weight quantization
load_in_8bit: false
load_in_4bit: true

quantization_config:
  load_in_4bit: true
  bnb_4bit_compute_dtype: "bfloat16"
  bnb_4bit_use_double_quant: true
  bnb_4bit_quant_type: "nf4"

## attention implementation
flash_attention: true

## finetuned adapter config
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head
## for details, see https://github.com/huggingface/peft/issues/334#issuecomment-1561727994

###
## Dataset Configuration: sqlqa
###

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: bird_text_to_sql_alpaca.jsonl
    ds_type: json
    type: alpaca

## dataset formatting config
tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[SQL]"
  - " [/SQL]"
special_tokens:
  pad_token: <|end_of_text|>

val_set_size: 0.05

###
## Training Configuration
###

## random seed for better reproducibility
seed: 117

## optimizer config
optimizer: adamw_bnb_8bit
learning_rate: 0.0001
lr_scheduler: cosine
num_epochs: 4
micro_batch_size: 2
gradient_accumulation_steps: 16
warmup_steps: 10

## axolotl saving config
dataset_prepared_path: last_run_prepared
output_dir: ./lora-out

## logging and eval config
logging_steps: 1
eval_steps: 0.05

## training performance optimization config
bf16: true
fp16: false
tf32: false
train_on_inputs: false
group_by_length: false
gradient_checkpointing: true
deepspeed: "/workspace/axolotl/deepspeed_configs/zero3.json"

###
## Miscellaneous Configuration
###

## when true, prevents over-writing the config from the CLI
strict: false

## "Don't mess with this, it's here for accelerate and torchrun" -- axolotl docs
local_rank:
```
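Note how the `tokens` and `lora_modules_to_save` entries go together: adding new control tokens grows the vocabulary, so the embedding and output layers must be trained and saved alongside the adapter. As an illustration of what happens under the hood, in plain transformers calls (not the code Axolotl actually runs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("NousResearch/Meta-Llama-3-8B")
tok.add_tokens(["[INST]", " [/INST]", "[SQL]", " [/SQL]"])  # the `tokens` list
tok.pad_token = "<|end_of_text|>"                           # the `special_tokens` entry

model = AutoModelForCausalLM.from_pretrained("NousResearch/Meta-Llama-3-8B")
# New tokens need new embedding rows, which is why embed_tokens and lm_head
# appear in lora_modules_to_save.
model.resize_token_embeddings(len(tok))
```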
Training Process
Modal’s training script has three key functions:
- `launch`: prepares a new run folder with the config and data
- `train`: executes the training job
- `merge`: combines the trained adapter with the base model
To start training:
```bash
modal run --detach src.train --config=llama-3-config.yml --data=bird_text_to_sql_alpaca.jsonl
```
The `--detach` flag lets training continue even if your local connection drops. The training run folder name appears in the command output (e.g. `axo-2025-01-04-09-19-02-92bb`). You can check that your fine-tuned model is stored properly in this folder using `modal volume ls`.
This will trigger a training job, and you’ll get a link to track its progress on the Modal dashboard; alternatively, open your Modal dashboard and check the relevant app logs.
Testing Your Fine-tuned Model
Once training completes, test your model using Modal’s inference engine:
```bash
modal run -q src.inference --run-name axo-2025-01-04-09-19-02-92bb
```
The inference engine uses vLLM, which delivers up to 24x higher generation throughput than standard Hugging Face Transformers inference.
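If you later want to run the merged model outside Modal, you can load it with vLLM directly. A minimal sketch, where the local model path and prompt are assumptions (point it at wherever you download the merged weights):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="./merged-llama-3-sql")  # merged base + LoRA weights (assumed path)
params = SamplingParams(temperature=0.0, max_tokens=256)

prompt = (
    "Write a SQL query to answer the question based on the given database schema.\n"
    "Schema:\nCREATE TABLE users (id INT, name TEXT, age INT);\n\n"
    "Question: How many users are older than 30?"
)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```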
Advanced Features
Multi-GPU Training
For larger models or datasets, enable multi-GPU training with DeepSpeed (we already set this in the config above):
- In `llama-3-config.yml`:

```yaml
deepspeed: "/workspace/axolotl/deepspeed_configs/zero3.json"
```

- In your training script:

```python
N_GPUS = int(os.environ.get("N_GPUS", 2))
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, size="80GB")
```
Weights & Biases Integration
To track training metrics:
- Create a W&B secret in your Modal dashboard
- Add to your app configuration:
```python
app = modal.App(
    "example-axolotl",
    secrets=[
        modal.Secret.from_name("huggingface"),
        modal.Secret.from_name("my-wandb-secret"),
    ],
)
```

- Update `llama-3-config.yml`:

```yaml
wandb_project: llama3-sql
wandb_watch: gradients
```
Conclusion
Modal and Axolotl make fine-tuning LLaMA-3 remarkably straightforward. You get:
- State-of-the-art training optimizations out of the box
- Zero infrastructure management
- Serverless scaling and pay-per-use pricing
- Easy deployment options
The entire process runs from your local machine with a handful of Modal commands and a single YAML config: no image building, no GPU provisioning, no infrastructure setup. Once trained, you can easily deploy your model using Modal’s web endpoint feature for production use.
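As a sketch of what such a deployment could look like (the endpoint is a stub: the request shape is an assumption, and image setup, including fastapi, and actual model loading are omitted):

```python
import modal

app = modal.App("llama3-sql-api")

@app.function(gpu="A100")
@modal.web_endpoint(method="POST")
def generate(item: dict):
    # item is the JSON request body, e.g. {"schema": "...", "question": "..."}
    schema, question = item["schema"], item["question"]
    # Placeholder: load the merged model (e.g. with vLLM, as above) and generate here.
    sql = "SELECT 1;  -- replace with real model output"
    return {"sql": sql}
```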
Feel free to reach out to me on LinkedIn if you have any questions or suggestions!
Primastat
If you wish to implement LLM applications in your company and are looking for professionals to build complex systems, look no further!
At Primastat, we aim to provide high quality data-driven AI solutions ranging from fine-tuning LLMs to Agentic AI applications. We cater to a range of sectors including Marketing, Healthcare, Legal and Fin-tech.
Drop us a mail at connect@primastat.in or reach out to us on our social media handles.