Fine-tuning LLaMA-3 for Text to SQL Generation with Modal: A Comprehensive Guide

Fine-tuning large language models (LLMs) has traditionally been a complex endeavor, requiring significant infrastructure setup and management. However, with Modal’s cloud platform and Axolotl’s fine-tuning framework, you can now fine-tune powerful models like LLaMA-3 directly from your local machine, without dealing with infrastructure complexities.

In this guide, we’ll walk through fine-tuning LLaMA-3 8B for SQL query generation using Modal’s remote GPU capabilities and Axolotl’s state-of-the-art training optimizations.

Modal is a revolutionary cloud platform that lets you run GPU-intensive workloads without managing any infrastructure. Key benefits include:

  • Run code remotely within seconds
  • Access to powerful GPUs with a single line of code
  • Zero infrastructure configuration — everything is defined in code
  • Serverless execution with per-second pricing
  • Seamless local development experience
  • Built-in primitives for distributed computing

The most compelling feature is Modal’s ability to make remote GPU resources feel local. You can develop and test your fine-tuning pipeline on your laptop, then seamlessly execute it on powerful cloud GPUs without any changes to your code.
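
To make that concrete, here is a minimal sketch of what a Modal GPU function can look like. This is illustrative only, not code from the fine-tuning repo; the app name, image contents, and GPU type are placeholder choices:

import modal

app = modal.App("gpu-hello-world")
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A10G", image=image, timeout=600)
def check_gpu() -> str:
    # This body executes remotely inside a Modal container with a GPU attached.
    import torch
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    # This runs on your laptop; .remote() ships the call to Modal's cloud.
    print(check_gpu.remote())

Running modal run on this file builds the image, provisions a GPU, executes the function remotely, and streams the result back to your terminal; the fine-tuning repo automates exactly this workflow at training scale.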

Why This Stack?

The Modal + Axolotl stack comes with everything you need for efficient fine-tuning:

  • Parameter-Efficient Fine-Tuning (PEFT) via LoRA adapters for faster convergence
  • Flash Attention for optimized memory usage during training
  • Gradient checkpointing to reduce VRAM footprint
  • Distributed training via DeepSpeed for optimal multi-GPU scaling

Best of all, Modal eliminates infrastructure headaches. No need to worry about building images, provisioning GPUs, or managing cloud storage. If your training script runs on Modal, it’s production-ready from day one.
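
Axolotl wires all of this up from the YAML config shown later in this post, but if you have not used LoRA before, this is roughly what the adapter setup amounts to under the hood. A minimal sketch using Hugging Face's peft library; the target modules listed here are a common choice for LLaMA-style models, not necessarily the exact set Axolotl will use:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("NousResearch/Meta-Llama-3-8B")

# LoRA injects small low-rank matrices into selected projection layers;
# only these adapters are trained while the base weights stay frozen.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all parameters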

Preparing the Dataset

We will use a natural-language-to-SQL (NL2SQL) dataset for this task, since text-to-SQL is a very common use case these days; specifically, the BIRD NL2SQL dataset on Hugging Face. We need to convert this dataset to the Alpaca format, which is my personal preference for training data. You could use any other format; you would just need to make some small changes to the Modal config file.

Here is the Python script used to convert the dataset to Alpaca format:

import json
import re
from typing import List, Dict
from datasets import load_dataset

def extract_components(input_text: str) -> tuple:
    """Extract schema and question from input text."""
    schema_match = re.search(r'Here is a database schema:(.*?)Please write', input_text, re.DOTALL)
    schema = schema_match.group(1).strip() if schema_match else ""
    
    question_match = re.search(r'following question: (.*?)\[/INST\]', input_text, re.DOTALL)
    question = question_match.group(1).strip() if question_match else ""
    
    return schema, question

def convert_to_alpaca_format(dataset) -> List[Dict[str, str]]:
    """Convert the dataset to Alpaca format with instruction/input/output fields."""
    alpaca_data = []
    
    for item in dataset:
        schema, question = extract_components(item['input'])
        
        alpaca_entry = {
            "instruction": "Write a SQL query to answer the question based on the given database schema.",
            "input": f"Schema:\n{schema}\n\nQuestion: {question}",
            "output": item['output'].strip()
        }
        
        alpaca_data.append(alpaca_entry)
    
    return alpaca_data

def save_to_jsonl(data: List[Dict[str, str]], output_file: str):
    """Save the data to a JSONL file."""
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in data:
            json_line = json.dumps(item, ensure_ascii=False)
            f.write(json_line + '\n')

# Load your dataset (replace with your actual dataset name/path)
dataset = load_dataset("lamini/bird_text_to_sql")

# Convert to Alpaca format
alpaca_data = convert_to_alpaca_format(dataset['train'])  # or whichever split you're using

# Save to JSONL file (this filename is referenced again in the Axolotl config and training command below)
save_to_jsonl(alpaca_data, 'bird_text_to_sql_alpaca.jsonl')

Here is a sample of what the final data looks like:

{
  "instruction": "Write a SQL query to answer the question based on the given database schema.",
  "input": "Schema:\n[database schema]\n\nQuestion: [question]",
  "output": "[SQL query]"
}
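
Before handing the file to Modal, it is worth a quick sanity check that every row has the three fields the Alpaca loader expects. This snippet is optional and not part of the conversion script above:

import json

with open("bird_text_to_sql_alpaca.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]

print(f"Loaded {len(rows)} examples")
# Each Alpaca-format row must contain instruction, input, and output.
assert all({"instruction", "input", "output"} <= row.keys() for row in rows)
print(rows[0]["input"][:200])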

Setting up Modal

Before we start, make sure you have:

  • Set up a Modal account:
pip install modal
python -m modal setup
  • Create a HuggingFace secret in your Modal workspace (get your HF_TOKEN from HuggingFace settings > API tokens); a CLI alternative is shown just after this list
  • Clone the repository:
git clone https://github.com/modal-labs/llm-finetuning.git
cd llm-finetuning
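
If you prefer the CLI over the dashboard for the secret, something like the following should work; the secret name huggingface is assumed here because that is the name referenced later in the app configuration:

modal secret create huggingface HF_TOKEN=hf_xxxxxxxxxxxxxxxx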

Project Structure

The repository provides everything needed for training and inference:

  • Training script for cloud-based fine-tuning
  • Inference engine for testing your results
  • Configuration files for different models and tasks
  • Sample datasets and data processing utilities

Modal’s built-in storage system helps manage data across functions; a sketch of how these volumes are attached follows the list below:

  • /pretrained volume: Stores pretrained models (loaded once)
  • /runs volume: Stores configs, datasets, and results for each training run
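
Under the hood these are ordinary Modal Volumes attached to the training functions. Here is a minimal sketch of how such an attachment looks; the volume and function names are illustrative, not the repo's actual code:

import modal

app = modal.App("finetune-storage-demo")

# Data written to these volumes persists after the container exits.
pretrained_vol = modal.Volume.from_name("pretrained-vol", create_if_missing=True)
runs_vol = modal.Volume.from_name("runs-vol", create_if_missing=True)

@app.function(gpu="A100", volumes={"/pretrained": pretrained_vol, "/runs": runs_vol})
def train():
    # Model weights cached under /pretrained are reused across runs, while each
    # training run writes its config, data, and outputs under /runs.
    ...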

Configuration for LLaMA-3

You can find configuration files for several LLMs in this repository, such as Mistral, CodeLlama, and Mixtral, but we will use LLaMA-3 for this example. After making some adjustments to the existing llama-3-config.yml, this is our config file:

###
## Model Configuration: LLaMA-3 8B
###

base_model: NousResearch/Meta-Llama-3-8B
sequence_len: 4096

## base model weight quantization
load_in_8bit: false
load_in_4bit: true
quantization_config:
  load_in_4bit: true
  bnb_4bit_compute_dtype: "bfloat16"
  bnb_4bit_use_double_quant: true
  bnb_4bit_quant_type: "nf4"
## attention implementation
flash_attention: true

## finetuned adapter config
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head
## for details, see https://github.com/huggingface/peft/issues/334#issuecomment-1561727994

###
## Dataset Configuration: sqlqa
###

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: bird_text_to_sql_alpaca.jsonl
    ds_type: json
    type: alpaca
    
## dataset formatting config
tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[SQL]"
  - " [/SQL]"

special_tokens:
  pad_token: <|end_of_text|>

val_set_size: 0.05

###
## Training Configuration
###

## random seed for better reproducibility
seed: 117

## optimizer config
optimizer: adamw_bnb_8bit
learning_rate: 0.0001
lr_scheduler: cosine
num_epochs: 4
micro_batch_size: 2
gradient_accumulation_steps: 16
warmup_steps: 10

## axolotl saving config
dataset_prepared_path: last_run_prepared
output_dir: ./lora-out

## logging and eval config
logging_steps: 1
eval_steps: 0.05

## training performance optimization config
bf16: true
fp16: false
tf32: false
train_on_inputs: false
group_by_length: false
gradient_checkpointing: true
deepspeed: "/workspace/axolotl/deepspeed_configs/zero3.json"

###
## Miscellaneous Configuration
###

## when true, prevents over-writing the config from the CLI
strict: false

## "Don't mess with this, it's here for accelerate and torchrun" -- axolotl docs
local_rank:

Training Process

Modal’s training script has three key functions:

  1. launch: Prepares a new run folder with config and data
  2. train: Executes the training job
  3. merge: Combines the trained adapter with the base model

To start training:

modal run --detach src.train --config=llama-3-config.yml --data=bird_text_to_sql_alpaca.jsonl

The --detach flag lets training continue even if your local connection drops. The training run folder name will be in the command output (e.g. axo-2024-01-04-09-19-02-92bb). You can check if your fine-tuned model is stored properly in this folder using modal volume ls.
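
For example, assuming a runs volume named example-runs-vol (use modal volume list to see the actual volume names in your workspace):

modal volume list
modal volume ls example-runs-vol axo-2024-01-04-09-19-02-92bb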

This will trigger a training job and give you a link to track its progress on the Modal dashboard; alternatively, open your Modal dashboard directly and check the relevant app logs.

Testing Your Fine-tuned Model

Once training completes, test your model using Modal’s inference engine:

modal run -q src.inference --run-name axo-2024-01-04-09-19-02-92bb

The inference engine uses vLLM for up to 24x faster generation compared to standard implementations.

Advanced Features

Multi-GPU Training

For larger models or datasets, enable multi-GPU training by adding DeepSpeed configuration:

  • In llama-3-config.yml:
deepspeed: "/workspace/axolotl/deepspeed_configs/zero3.json"
  • In your training script:
N_GPUS = int(os.environ.get("N_GPUS", 2))
GPU_CONFIG = modal.gpu.A100(count=N_GPUS, size="80GB")

Weights & Biases Integration

To track training metrics:

  • Create a W&B secret in your Modal dashboard
  • Add to your app configuration:
app = modal.App(
    "example-axolotl",
    secrets=[
        modal.Secret.from_name("huggingface"),
        modal.Secret.from_name("my-wandb-secret"),
    ],
)
  • Update llama-3-config.yml:
wandb_project: llama3-sql
wandb_watch: gradients

Conclusion

Modal and Axolotl make fine-tuning LLaMA-3 remarkably straightforward. You get:

  • State-of-the-art training optimizations out of the box
  • Zero infrastructure management
  • Serverless scaling and pay-per-use pricing
  • Easy deployment options

The entire process runs through simple Python code and a single YAML config file: no Dockerfiles, no manual GPU provisioning, no infrastructure setup, just pure functionality. Once trained, you can easily deploy your model using Modal's web endpoint feature for production use.
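
As a rough illustration of that deployment path, serving the merged model behind an HTTP endpoint can look something like the sketch below. This is not the repo's inference code (its vLLM engine is the better starting point); the names, the volume, and the stubbed-out generation logic are all placeholders:

import modal

app = modal.App("sql-llm-endpoint")
runs_vol = modal.Volume.from_name("runs-vol")  # assumed name for the runs volume

@app.function(gpu="A100", volumes={"/runs": runs_vol})
@modal.web_endpoint(method="POST")
def generate(item: dict):
    # In a real deployment you would load the merged weights from /runs once, in a
    # container lifecycle hook, rather than on every request.
    question = item["question"]
    sql = "..."  # run the fine-tuned model here
    return {"sql": sql}

Deploying with modal deploy gives you a stable public URL you can call from your application.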

Feel free to reach out to me on LinkedIn if you have any questions or suggestions!

Primastat

If you wish to implement LLM applications in your company and are looking for professionals to build complex systems, look no further!

At Primastat, we aim to provide high quality data-driven AI solutions ranging from fine-tuning LLMs to Agentic AI applications. We cater to a range of sectors including Marketing, Healthcare, Legal and Fin-tech.

Drop us a mail at connect@primastat.in or reach out to us on our social media handles.
