How to Fine-Tune an LLM in 2026: When to Do It and How
Quick answer: Fine-tuning is rarely the first thing you should try. Exhaust prompt engineering and RAG first — they're cheaper, faster to iterate, and often sufficient. Fine-tuning makes sense when: you need consistent style/format that prompting can't reliably achieve, you have 1,000+ high-quality labeled examples, and you need to reduce inference costs via a smaller fine-tuned model.
When fine-tuning is worth it
Good reasons to fine-tune:
- Consistent brand voice or proprietary format that prompting achieves only 80% of the time
- Reducing token usage by training a small model to match a large model's output on a narrow task
- Classifying or extracting from proprietary domain data where general models underperform
- Meeting latency requirements by using a smaller fine-tuned model instead of a larger prompted one
Bad reasons to fine-tune:
- "We want the model to know our company facts" → Use RAG instead
- "We want better reasoning" → Fine-tuning doesn't improve fundamental reasoning; better base models do
- "Prompting is inconsistent on a few examples" → More examples and better prompts first
- You have fewer than 200 labeled examples → Not enough data
OpenAI fine-tuning (managed)
OpenAI offers the easiest fine-tuning path for GPT-4o Mini and GPT-4.1 Mini.
Step 1: Prepare training data
{"messages": [{"role": "system", "content": "Classify support tickets."}, {"role": "user", "content": "I can't log in"}, {"role": "assistant", "content": "account_access"}]}
{"messages": [{"role": "system", "content": "Classify support tickets."}, {"role": "user", "content": "Billing charged twice"}, {"role": "assistant", "content": "billing"}]}
Target: at least 200 examples to see meaningful gains; 1,000+ for best results.
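Before uploading, it's worth validating the file locally — a malformed line fails the whole job. A minimal checker (the `validate_jsonl` helper is illustrative, not part of the OpenAI SDK; it enforces the chat format shown above):

```python
import json

def validate_jsonl(path):
    """Return a list of problems found in an OpenAI chat-format JSONL file."""
    errors = []
    with open(path) as f:
        for i, line in enumerate(f, 1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {i}: not valid JSON")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {i}: missing 'messages' list")
                continue
            for m in messages:
                if m.get("role") not in {"system", "user", "assistant"}:
                    errors.append(f"line {i}: bad role {m.get('role')!r}")
            if messages[-1].get("role") != "assistant":
                errors.append(f"line {i}: last message must be from the assistant")
    return errors
```

Run it on `training.jsonl` and fix anything it reports before moving to step 2.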
Step 2: Upload and train
import openai

client = openai.OpenAI()

# Upload the training file
with open("training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

# Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini",
)
print(f"Job ID: {job.id}")
Step 3: Monitor and use
# Check status
job = client.fine_tuning.jobs.retrieve(job.id)
print(job.status)  # "running", "succeeded", "failed"

# Use the fine-tuned model
response = client.chat.completions.create(
    model=job.fine_tuned_model,  # e.g. "ft:gpt-4o-mini:org:name:id"
    messages=[{"role": "user", "content": "I was charged twice"}],
)
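Jobs can take minutes to hours, so in practice you poll rather than check once. A simple polling helper (`wait_for_job` is a sketch, not an SDK function; it assumes the `client` and `job` objects from the snippets above):

```python
import time

def wait_for_job(client, job_id, poll_seconds=30):
    """Poll a fine-tuning job until it reaches a terminal state."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in ("succeeded", "failed", "cancelled"):
            return job
        time.sleep(poll_seconds)

# Usage:
# job = wait_for_job(client, job.id)
# model_name = job.fine_tuned_model
```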
Pricing: Training costs $0.025/1K tokens. Inference on fine-tuned GPT-4o Mini costs $0.30/1M input, $1.20/1M output — double the base model's rates of $0.15/1M input and $0.60/1M output.
LoRA fine-tuning for open-source models
For open-source models (Llama 4, Mistral, Phi-4), LoRA (Low-Rank Adaptation) is the standard approach — it trains a small adapter rather than all model weights, making fine-tuning feasible on a single GPU.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType

# Load the base model in 4-bit (QLoRA) for memory efficiency
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.1,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically <1% of total params
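The "<1% of total params" claim follows directly from the low-rank factorization: each adapted weight matrix W (d_out × d_in) gets a trainable delta B @ A of rank r, adding r·(d_in + d_out) parameters instead of d_in·d_out. A back-of-the-envelope check (the dimensions below are illustrative, not Llama 4's actual shapes):

```python
def lora_param_fraction(d_in, d_out, r, n_layers, adapted_per_layer, total_params):
    """Fraction of parameters trainable under LoRA: each adapted matrix
    contributes r*(d_in + d_out) adapter parameters."""
    adapter = r * (d_in + d_out) * adapted_per_layer * n_layers
    return adapter / total_params

# Illustrative: 4096-dim projections, rank 16, q_proj + v_proj in 32 layers,
# against a 7B-parameter base model.
frac = lora_param_fraction(4096, 4096, 16, 32, 2, 7_000_000_000)
print(f"{frac:.4%}")  # ~0.12%, well under 1% of total parameters
```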
Tools: Unsloth (fast LoRA), Axolotl (production training), LLaMA-Factory (GUI).
Cost comparison: fine-tuned small vs prompted large
Scenario: a classification task processing 10B tokens/month
Option A: Prompted GPT-4o ($2.50/1M input)
- Monthly cost: $25,000
Option B: Fine-tuned GPT-4.1 Nano ($0.10/1M input)
- Training: ~$25 one-time
- Monthly cost: $1,000
- Monthly savings: $24,000
At this volume, fine-tuning a small model pays back its training cost in the first hour of production use.
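The arithmetic above as a reusable sketch (the `monthly_cost` helper and variable names are ours; prices are the per-million input rates quoted above, output tokens ignored for simplicity):

```python
def monthly_cost(tokens_millions, price_per_million):
    """Monthly spend in dollars for a given volume and per-1M-token rate."""
    return tokens_millions * price_per_million

volume = 10_000  # 10B tokens/month, expressed in millions

prompted = monthly_cost(volume, 2.50)   # GPT-4o input rate
finetuned = monthly_cost(volume, 0.10)  # fine-tuned GPT-4.1 Nano input rate
savings = prompted - finetuned          # $24,000/month

training_cost = 25                           # one-time, approximate
payback_hours = training_cost / (savings / 730)  # ~730 hours in a month

print(prompted, finetuned, savings, round(payback_hours, 2))
```

At roughly $33/hour in savings, the $25 training cost is recovered in under an hour, matching the claim above.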
The decision framework
- Try prompting → if quality <85%, try with 10-shot examples → if still <85%, try RAG → if still <85% and you have 1,000+ examples, fine-tune
- Calculate the inference cost delta between a fine-tuned small model and the current model — if it exceeds fine-tuning cost within 3 months, fine-tune
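The framework above can be sketched as a single function (thresholds taken from the list; `quality_*` values are your measured accuracy on an eval set, and the function name and signature are ours):

```python
def should_fine_tune(quality_prompted, quality_few_shot, quality_rag,
                     labeled_examples, monthly_savings, training_cost):
    """Walk the escalation ladder: prompting -> few-shot -> RAG -> fine-tune."""
    if quality_prompted >= 0.85:
        return "prompting is enough"
    if quality_few_shot >= 0.85:
        return "use 10-shot prompting"
    if quality_rag >= 0.85:
        return "use RAG"
    # Fine-tune only with enough data and payback within 3 months
    if labeled_examples >= 1000 and monthly_savings * 3 >= training_cost:
        return "fine-tune"
    return "collect more data or revisit the base model"
```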
See best LLMs for developers for the current top base models available for fine-tuning.