Architecture

Sliding Window Attention

Quick Answer

An attention mechanism in which each token attends only to a fixed window of recent previous tokens.

Sliding window attention restricts each token to attend only to a fixed number of recent previous tokens (e.g., the past 4K tokens) rather than to the full sequence. For a fixed window size w, this reduces attention computation from O(n²) to O(n·w), which is linear in sequence length, and keeps memory growth linear as well. Sliding window attention works well when recent context matters most. The trade-off is weaker long-range dependencies, since information from distant tokens can only reach a query indirectly, by propagating across layers; in practice this is often acceptable. Some models apply sliding windows only in selected layers or combine them with global or sparse attention patterns. Sliding window attention is particularly useful for very long documents.
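The masking described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not any model's actual implementation; the window size, dimensions, and random inputs are arbitrary assumptions. Each query position i is allowed to attend to key positions j satisfying i − window < j ≤ i (causal, and within the window); all other scores are masked to −∞ before the softmax.

```python
# Minimal sketch of sliding-window causal attention (single head, NumPy).
# Window size, dimensions, and inputs below are illustrative assumptions.
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Each query attends only to itself and the previous `window - 1` keys."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n) raw attention scores
    pos = np.arange(n)
    # Allowed positions: causal (j <= i) and within the window (i - j < window).
    mask = (pos[None, :] <= pos[:, None]) & (pos[:, None] - pos[None, :] < window)
    scores = np.where(mask, scores, -np.inf)  # block everything outside the window
    # Numerically stable softmax over each row (each row has >= 1 allowed entry).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d, window = 8, 4, 3
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = sliding_window_attention(q, k, v, window)
print(out.shape)  # each token's output mixes at most `window` value vectors
```

Because the mask keeps only a band of width `window` per row, a production implementation would compute just that band instead of the full n×n score matrix, which is where the O(n·w) cost comes from.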

Last verified: 2026-04-08
