Tags: fine-tuning, lora, openai, llm-training, model-customization

How to Fine-Tune an LLM in 2026: When to Do It and How


Quick answer: Fine-tuning is rarely the first thing you should try. Exhaust prompt engineering and RAG first — they're cheaper, faster to iterate, and often sufficient. Fine-tuning makes sense when: you need consistent style/format that prompting can't reliably achieve, you have 1,000+ high-quality labeled examples, and you need to reduce inference costs via a smaller fine-tuned model.


When fine-tuning is worth it

Good reasons to fine-tune:

  • Consistent brand voice or a proprietary output format that prompting gets right only ~80% of the time
  • Reducing token usage by training a small model to match a large model's output on a narrow task
  • Classifying or extracting from proprietary domain data where general models underperform
  • Meeting latency requirements by using a smaller fine-tuned model instead of a larger prompted one

Bad reasons to fine-tune:

  • "We want the model to know our company facts" → Use RAG instead
  • "We want better reasoning" → Fine-tuning doesn't improve fundamental reasoning; better base models do
  • "Prompting is inconsistent on a few examples" → More examples and better prompts first
  • You have fewer than 200 labeled examples → Not enough data


OpenAI fine-tuning (managed)

OpenAI offers the easiest fine-tuning path for GPT-4o Mini and GPT-4.1 Mini.

Step 1: Prepare training data

{"messages": [{"role": "system", "content": "Classify support tickets."}, {"role": "user", "content": "I can't log in"}, {"role": "assistant", "content": "account_access"}]}
{"messages": [{"role": "system", "content": "Classify support tickets."}, {"role": "user", "content": "Billing charged twice"}, {"role": "assistant", "content": "billing"}]}

Target: at least 100 examples to start; 1,000+ gives the best results.
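Before uploading, it's worth sanity-checking the JSONL locally — one malformed line can fail the whole job. A minimal sketch using only the standard library (`validate_example` and `validate_file` are illustrative names, not part of the OpenAI SDK):

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_example(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty = valid)."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i}: empty content")
    if messages[-1].get("role") != "assistant":
        problems.append("last message must be the assistant's target output")
    return problems

def validate_file(path: str) -> int:
    """Print problems per line and return the count of bad examples."""
    bad = 0
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            problems = validate_example(line)
            if problems:
                bad += 1
                print(f"line {n}: " + "; ".join(problems))
    return bad
```

Run `validate_file("training.jsonl")` before Step 2; a nonzero return means the file needs fixing.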

Step 2: Upload and train

import openai

client = openai.OpenAI()

# Upload training file
with open("training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

# Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini",
)

print(f"Job ID: {job.id}")

Step 3: Monitor and use

# Check status
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)  # "running", "succeeded", "failed"

# Use the fine-tuned model
response = client.chat.completions.create(
    model=job.fine_tuned_model,  # e.g. "ft:gpt-4o-mini:org:name:id"
    messages=[{"role": "user", "content": "I was charged twice"}]
)
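Jobs can take minutes to hours, so a simple polling loop is handy. A sketch assuming the `client` from Step 2 (`TERMINAL_STATES`, `is_done`, and `wait_for_job` are my own names, not SDK helpers):

```python
import time

# Statuses after which a fine-tuning job will not change again.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def is_done(status: str) -> bool:
    """True once a job has reached a terminal status."""
    return status in TERMINAL_STATES

def wait_for_job(client, job_id: str, poll_seconds: int = 60) -> str:
    """Poll until the fine-tuning job finishes; return the final status."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(f"{job_id}: {job.status}")
        if is_done(job.status):
            return job.status
        time.sleep(poll_seconds)
```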

Pricing: Training costs $0.025/1K tokens. Inference on fine-tuned GPT-4o Mini costs $0.30/1M input and $1.20/1M output — double the base model's rates.


LoRA fine-tuning for open-source models

For open-source models (Llama 4, Mistral, Phi-4), LoRA (Low-Rank Adaptation) is the standard approach — it trains a small adapter rather than all model weights, making fine-tuning feasible on a single GPU.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType

# Load base model in 4-bit (QLoRA) for memory efficiency
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto"
)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,           # Rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Typically <1% of total params
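The adapter still needs training text. A minimal sketch that flattens the chat-format JSONL from earlier into plain strings — the `<|user|>`/`<|assistant|>` markers below are placeholders of my own; a real run should apply the base model's actual chat template (e.g. via `tokenizer.apply_chat_template`):

```python
import json

def format_example(record: dict) -> str:
    """Flatten one chat-format record into a single training string."""
    parts = []
    for msg in record["messages"]:
        # Placeholder markers; swap in the model's real chat template.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    return "\n".join(parts)

def load_training_texts(path: str) -> list[str]:
    """Read a JSONL file and return one flattened string per example."""
    with open(path, encoding="utf-8") as f:
        return [format_example(json.loads(line)) for line in f if line.strip()]
```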

Tools: Unsloth (fast LoRA), Axolotl (production training), LLaMA-Factory (GUI).


Cost comparison: fine-tuned small vs prompted large

Scenario: 10B token/month classification task

Option A: Prompted GPT-4o ($2.50/1M input)

  • Monthly cost: $25,000

Option B: Fine-tuned GPT-4.1 Nano ($0.10/1M input)

  • Training: ~$25 one-time
  • Monthly cost: $1,000
  • Monthly savings: $24,000

At this volume, fine-tuning a small model pays back its training cost in the first hour of production use.
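The arithmetic is easy to sanity-check in code. A sketch assuming 10 billion input tokens per month and a 730-hour month (output-token costs ignored for simplicity):

```python
def monthly_cost(tokens_per_month: int, price_per_m_tokens: float) -> float:
    """Inference cost in dollars for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_m_tokens

def payback_hours(training_cost: float, tokens_per_month: int,
                  old_price: float, new_price: float,
                  hours_per_month: float = 730.0) -> float:
    """Hours of production traffic needed to recoup the training cost."""
    monthly_savings = monthly_cost(tokens_per_month, old_price - new_price)
    return training_cost / (monthly_savings / hours_per_month)

volume = 10_000_000_000  # 10B tokens/month
print(monthly_cost(volume, 2.50))  # 25000.0  (prompted GPT-4o)
print(monthly_cost(volume, 0.10))  # 1000.0   (fine-tuned Nano)
print(payback_hours(25.0, volume, 2.50, 0.10))  # well under one hour
```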


The decision framework

  1. Try prompting → if quality <85%, try with 10-shot examples → if still <85%, try RAG → if still <85% and you have 1,000+ examples, fine-tune
  2. Calculate the inference cost delta between a fine-tuned small model and the current model — if it exceeds fine-tuning cost within 3 months, fine-tune
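The two steps above can be sketched as two small checks (thresholds taken from the framework; the function names are illustrative):

```python
def next_step(prompt_q: float, few_shot_q: float, rag_q: float,
              labeled_examples: int) -> str:
    """Step 1: escalate only while quality stays below 85%."""
    if prompt_q >= 0.85:
        return "ship with prompting"
    if few_shot_q >= 0.85:
        return "ship with 10-shot prompting"
    if rag_q >= 0.85:
        return "ship with RAG"
    if labeled_examples >= 1000:
        return "fine-tune"
    return "collect more labeled data"

def worth_fine_tuning_for_cost(monthly_savings: float, training_cost: float) -> bool:
    """Step 2: fine-tune if savings repay the training cost within 3 months."""
    return monthly_savings * 3 >= training_cost
```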

See best LLMs for developers for the current top base models available for fine-tuning.
