QLoRA

Quick Answer

A variant of LoRA that quantizes the frozen base model, cutting memory use so that large models can be fine-tuned on modest hardware.

QLoRA combines LoRA with quantization to push memory efficiency further. The base model's weights are quantized to low precision (int8 or, more commonly, int4) and frozen, while small higher-precision LoRA adapters are trained on top. This makes it practical to fine-tune models with around 70B parameters on a single consumer GPU, trading a small amount of quality for dramatic memory savings. Quantization can cause slight degradation, but because the adapters are trained in higher precision they compensate well, and QLoRA has become a popular route to accessible fine-tuning.
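The core idea can be sketched in a few lines: quantize a weight matrix to 4-bit integers with a scale factor, freeze it, and add a trainable low-rank update. This is a minimal NumPy illustration, not a real implementation; the function names are hypothetical, and actual QLoRA (e.g., via the bitsandbytes and PEFT libraries) uses blockwise NF4 quantization rather than the simple symmetric scheme shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w):
    """Symmetric per-tensor int4 quantization: map floats to integers in [-8, 7]."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float weight from the int4 codes."""
    return q.astype(np.float32) * scale

# Frozen base weight, stored quantized.
d_in, d_out, r = 64, 64, 8
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
W_q, scale = quantize_int4(W)

# Trainable LoRA factors, kept in full precision.
# B starts at zero, so the adapter is initially a no-op (standard LoRA init).
alpha = 16.0
A = rng.standard_normal((r, d_in)).astype(np.float32) * 0.01
B = np.zeros((d_out, r), dtype=np.float32)

def forward(x):
    # Dequantize the frozen base on the fly, then add the low-rank update.
    W_hat = dequantize(W_q, scale) + (alpha / r) * (B @ A)
    return x @ W_hat.T

x = rng.standard_normal((1, d_in)).astype(np.float32)
y = forward(x)
```

During fine-tuning only `A` and `B` receive gradients; the int4 codes `W_q` never change, which is why the memory footprint stays close to that of the quantized base model.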

Last verified: 2026-04-08
