Inference
GGUF
Quick Answer
A single-file format for storing quantized model weights plus metadata, supporting multiple quantization levels and efficient CPU/GPU inference.
GGUF is the file format used by llama.cpp (and tools built on it) for storing model weights and metadata, succeeding the older GGML format. A GGUF file is self-describing: all tensors plus key-value metadata (architecture, tokenizer, hyperparameters) live in one binary, which simplifies distribution compared with multi-file formats. Weights can be stored at various quantization levels (e.g. Q4_K_M, Q5_K_M, Q8_0), trading file size and memory use against accuracy, and files can be memory-mapped for fast loading on both CPU and GPU backends. GGUF is the de facto distribution format in the open-source LLM ecosystem, used by tools such as ollama and oobabooga's text-generation-webui.
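The self-describing layout starts with a small fixed header: the 4-byte magic `GGUF`, a uint32 format version, then uint64 tensor and metadata-KV counts, all little-endian. A minimal sketch of parsing that header (field layout per my reading of the GGUF spec in the ggml repository; verify against the spec before relying on it):

```python
import struct
from io import BytesIO

def read_gguf_header(f):
    """Parse the fixed-size GGUF header from a binary stream (sketch)."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "tensor_count": n_tensors,
            "metadata_kv_count": n_kv}

# Synthetic in-memory header so the example is self-contained
# (values here are arbitrary, not from a real model file).
fake = BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 291, 24))
print(read_gguf_header(fake))
# → {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

After the header come the metadata key-value pairs and tensor descriptors, followed by the tensor data itself; a full reader would continue parsing from this point.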
Last verified: 2026-04-08