Architecture
Softmax
Quick Answer
A function that converts raw scores into a probability distribution.
Softmax converts a vector of raw scores (logits) into a probability distribution: each score is exponentiated and the results are normalized so they sum to 1. Exponentiation makes the largest scores disproportionately dominant, so softmax acts as a smooth, differentiable approximation of argmax (the "soft" refers to this smoothing, not to softening large scores). Softmax is used in attention to convert similarity scores into attention weights and in classification layers; in LLMs it is the standard way to turn logits into next-token probabilities. A temperature parameter rescales the logits before softmax: higher temperature flattens the distribution toward uniform, lower temperature sharpens it. Numerically, exponentiating large inputs can overflow, so implementations subtract the maximum score before exponentiating (the log-sum-exp / max-subtraction trick), which leaves the output unchanged.
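A minimal sketch of the computation described above, using only the Python standard library (the function name and signature are illustrative, not from any particular framework); it includes the temperature scaling and the max-subtraction stability trick:

```python
import math

def softmax(logits, temperature=1.0):
    # Scale by temperature: higher T flattens, lower T sharpens.
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating so the largest exponent is 0,
    # avoiding overflow for large logits (the standard stability trick).
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax([2.0, 1.0, 0.1])` yields probabilities that sum to 1 with the largest share on the first score; calling it again with `temperature=10.0` gives a noticeably more uniform distribution over the same scores.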
Last verified: 2026-04-08