Inference

ONNX

Quick Answer

Open Neural Network Exchange: a standard format for representing trained models across frameworks.

ONNX is an open format for machine learning models that makes them framework-agnostic: train in any framework that can export ONNX (such as PyTorch) and run inference in any ONNX-compatible runtime. The format supports graph-level optimizations such as quantization and operator fusion, and its standardized operator set lets hardware vendors supply specialized execution backends. ONNX Runtime, the most widely used runtime, is highly optimized for both CPU and GPU inference. This makes ONNX a practical choice whenever deployment requires flexibility across frameworks and hardware targets.

Last verified: 2026-04-08
