Architecture

Byte-Pair Encoding

Quick Answer

A tokenization algorithm that iteratively merges the most frequent byte/character pairs.

Byte-pair encoding (BPE) iteratively finds the most frequent pair of tokens in the corpus and merges them into a new token. This process repeats until reaching a target vocabulary size. BPE efficiently learns subword units, enabling the tokenizer to handle unseen words by decomposing them into known subwords. BPE is widely used in LLMs. It provides a good balance between vocabulary size and compression. The BPE vocabulary learned on one corpus may not generalize well to very different text distributions.

Last verified: 2026-04-08

Compare models

See how different LLMs compare on benchmarks, pricing, and speed.

Browse all models →

← All glossary terms