Safety & Alignment
Toxicity
Quick Answer
Model outputs containing abusive, offensive, or hateful language.
Toxicity refers to model outputs that contain abusive, offensive, or hateful language, including profanity, slurs, and hate speech. Automated toxicity classifiers are commonly used to detect such content in model outputs. Toxicity may be demographically targeted, as with slurs aimed at a particular group, and it is context-dependent: the same phrase can be acceptable when quoted in a news report yet toxic when directed at a user. Because annotators often disagree about what counts as offensive, toxicity measurement is inherently subjective. Reducing toxicity is a core safety priority, and common mitigations include filtering toxic material out of training data and screening generated outputs with classifiers at inference time.
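The sketch below illustrates one way inference-time screening might work: run each candidate output through a pretrained toxicity classifier and block it if any toxicity label exceeds a score threshold. The model choice (`unitary/toxic-bert`), the 0.5 threshold, and the `is_toxic` helper are assumptions made for illustration, not a reference to any particular production system; only the Hugging Face `transformers` pipeline API is taken as given.

```python
# Illustrative sketch of inference-time toxicity screening.
# Assumptions: `transformers` (plus a backend such as torch) is installed;
# unitary/toxic-bert is one plausible multi-label toxicity model; the 0.5
# threshold is arbitrary and would be tuned on held-out data in practice.
from transformers import pipeline

# top_k=None returns a score for every label (toxic, insult, threat, ...)
classifier = pipeline("text-classification", model="unitary/toxic-bert", top_k=None)

def is_toxic(text: str, threshold: float = 0.5) -> bool:
    """Return True if any toxicity label scores at or above the threshold."""
    label_scores = classifier([text])[0]  # list of {"label": ..., "score": ...}
    return any(item["score"] >= threshold for item in label_scores)

if __name__ == "__main__":
    for output in ["Have a great day!", "You are worthless."]:
        print(f"{output!r} -> {'blocked' if is_toxic(output) else 'allowed'}")
```

The threshold trades false positives against false negatives; because toxicity judgments are subjective, the cutoff is a policy choice rather than a fixed property of the classifier.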
Last verified: 2026-04-08