What's the Cheapest LLM for Coding?

Finding the cheapest LLM for coding requires balancing price with coding ability. Here are the most affordable coding-capable models, ranked by a weighted cost metric (60% input, 40% output):


  • Phi-4 (Microsoft): $0.070/M input, $0.140/M output. Coding ELO: 1130. Speed: 160 tok/s.
  • Gemini 2.0 Flash Lite (Google): $0.075/M input, $0.300/M output. Coding ELO: 1170. Speed: 180 tok/s.
  • Llama 4 Scout (Meta): $0.100/M input, $0.300/M output. Coding ELO: 1230. Speed: 110 tok/s.
  • Mistral Small (Mistral): $0.100/M input, $0.300/M output. Coding ELO: 1160. Speed: 120 tok/s.
  • Gemini 2.0 Flash (Google): $0.100/M input, $0.400/M output. Coding ELO: 1240. Speed: 160 tok/s.
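The ranking above can be reproduced directly from the listed prices. Below is a minimal sketch of the blended-cost calculation (60% input, 40% output, both in $ per million tokens), using the figures from the list:

```python
# Prices from the list above: (input $/M, output $/M).
PRICES = {
    "Phi-4": (0.070, 0.140),
    "Gemini 2.0 Flash Lite": (0.075, 0.300),
    "Llama 4 Scout": (0.100, 0.300),
    "Mistral Small": (0.100, 0.300),
    "Gemini 2.0 Flash": (0.100, 0.400),
}

def weighted_cost(input_price: float, output_price: float) -> float:
    """Blend input/output $/M-token prices 60/40, as in the ranking."""
    return 0.6 * input_price + 0.4 * output_price

# Sort ascending by blended cost; ties keep list order.
ranking = sorted(PRICES, key=lambda m: weighted_cost(*PRICES[m]))
for model in ranking:
    print(f"{model}: ${weighted_cost(*PRICES[model]):.3f}/M blended")
```

Running this confirms Phi-4 as the cheapest at $0.098/M blended, with Gemini 2.0 Flash the most expensive of the five at $0.220/M.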

For simple code generation and autocomplete, smaller models like GPT-4.1 Nano or Gemini 2.0 Flash Lite are extremely affordable. For complex multi-file refactoring and architecture decisions, investing in Claude Sonnet 4 or GPT-4.1 pays off in fewer iterations and better results.


Cost-saving tips for coding workloads:

  • Use prompt caching for system prompts with coding instructions.
  • Batch non-urgent code reviews through batch APIs.
  • Start with a cheaper model and only escalate to premium models for complex tasks.
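The start-cheap-then-escalate tip can be sketched as a simple router. Everything here is an assumption for illustration: `call_model` is a hypothetical stand-in for your provider's SDK, and the keyword heuristic is a toy example of how you might decide when a task warrants a premium model.

```python
# Hypothetical model identifiers chosen for illustration.
CHEAP_MODEL = "gemini-2.0-flash-lite"
PREMIUM_MODEL = "claude-sonnet-4"

def call_model(model: str, prompt: str) -> str:
    # Hypothetical placeholder; swap in your provider's actual API call.
    return f"[{model}] response to: {prompt}"

def needs_premium(prompt: str) -> bool:
    # Toy heuristic (an assumption): escalate for multi-file
    # refactoring or architecture-level requests.
    keywords = ("refactor", "architecture", "multi-file", "design")
    return any(k in prompt.lower() for k in keywords)

def route(prompt: str) -> str:
    # Default to the cheap model; escalate only when the task looks complex.
    model = PREMIUM_MODEL if needs_premium(prompt) else CHEAP_MODEL
    return call_model(model, prompt)
```

In practice you might escalate on failure (retry with the premium model when the cheap one's output doesn't pass tests) rather than guessing from the prompt, but the routing shape is the same.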
