Guide

Best Local LLMs for Coding

If you want the best local LLM for coding, start with the tasks you actually care about: writing patches, fixing bugs, and following instructions under tight VRAM limits. The benchmark data shows that a ~20B-parameter model, **gpt-oss:20b**, offers the best balance between coding quality and generation speed on modern hardware.

Section 1

What matters for coding work

Coding models need to hold context, respect instructions, and produce code that compiles cleanly. The latest benchmark data shows that larger models (such as Qwen 3.6+ or Gemma 4 31B) can produce slightly stronger code, but at a steep cost in generation speed (averaging under 10 tokens/sec). At that speed, real-time coding flow breaks down.

By contrast, gpt-oss:20b averages more than 60 tokens/sec overall and still sits in the top quality tier (average scores above 71). You don't need slow, oversized models for demanding coding work.
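To see why the speed gap matters, note that streaming time is roughly output length divided by generation speed. The Python sketch below assumes a hypothetical 600-token patch and ignores prompt processing, so real waits are somewhat longer:

```python
def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Approximate time to stream num_tokens at a steady generation speed.

    Ignores prompt processing (prefill) and any network overhead.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    patch_tokens = 600  # hypothetical patch size, not a measured value
    for speed in (60.0, 10.0):
        print(f"{speed:.0f} tok/sec: {seconds_to_generate(patch_tokens, speed):.0f} s")
```

At 60 tok/sec that patch arrives in about 10 seconds. At 10 tok/sec it takes a full minute.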

  • Quality: does the model follow the task and preserve surrounding code?
  • Latency: aim for more than 30 tok/sec; slower generation stops feeling natural in an IDE.
  • Memory: gpt-oss:20b fits inside upper-tier consumer GPUs with room left for the prompt's context.
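One way to check the latency bar on your own GPU is to time generation directly. The model tags in this guide follow Ollama's naming, so this sketch assumes a local Ollama server on its default port (`http://localhost:11434`) and reads the `eval_count` and `eval_duration` fields from its `/api/generate` response. The helper names are hypothetical. The sketch judges each model on the median of several runs rather than on the single best one:

```python
import json
import statistics
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def decode_tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's timing fields.

    eval_count is the number of tokens generated; eval_duration is the time
    spent generating them, in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9


def measure(model: str, prompt: str, runs: int = 3) -> list[float]:
    """Send the same prompt several times and return tok/sec for each run."""
    speeds = []
    for _ in range(runs):
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        request = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            speeds.append(decode_tokens_per_second(json.load(response)))
    return speeds


def meets_latency_bar(speeds: list[float], threshold: float = 30.0) -> bool:
    """Judge on the median run so a single outlier does not decide the result."""
    return statistics.median(speeds) > threshold
```

For example, call `meets_latency_bar(measure("gpt-oss:20b", prompt))` with a prompt taken from your own codebase. The first request may spend time loading the model. Because `eval_duration` covers only token generation, that load time does not distort the speed figure.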

Section 2

Practical recommendation by VRAM tier

On 8GB-class cards, stay conservative: prefer efficient 7B to 9B coder models (such as qwen3.5:9b) with aggressive quantization to get usable output.

On 12GB to 16GB cards, you can step up to 14B models (e.g. ministral-3:14b) for more reliable results, though memory can get tight.

On 24GB and above, **gpt-oss:20b** is the strongest local recommendation for developers right now. It is a mixture-of-experts model that activates only about 3.6B of its roughly 21B parameters per token. That is why it runs at small-model speeds while staying close to the quality of much larger models.

  • 8GB: keep the model compact (8B-9B class) and avoid unnecessary context bloat.
  • 12GB to 16GB: aim for 14B class models that still leave OS headroom.
  • 24GB+: gpt-oss:20b is the sweet spot.
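The tiers above follow from simple memory arithmetic. Quantized weights take about (parameters × bits per weight ÷ 8) bytes, and the KV cache, runtime buffers, and desktop need extra room on top of that. Here is a rough Python sketch. The 4.5 bits per weight and the flat 2GB overhead are assumptions, and real overhead grows with context length:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(vram_gb: float, params_billion: float,
         bits_per_weight: float = 4.5, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed overhead fit in VRAM.

    bits_per_weight=4.5 approximates a common 4-bit quantization, and
    overhead_gb is a placeholder for KV cache, runtime buffers and desktop
    use; neither is a measured value.
    """
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for vram in (8, 12, 16):
        for size in (9, 14):
            verdict = "fits" if fits(vram, size) else "too tight"
            print(f"{size}B on {vram}GB: {verdict}")
```

For models that mix quantization formats, such as gpt-oss:20b, the download size reported by your runtime plus the same overhead is a better guide than a parameter-count estimate.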

Section 3

How to choose the final candidate

Unless your GPU has very little VRAM, we strongly suggest testing gpt-oss:20b as your daily coding driver. If you lack the hardware, step down to the 14B class, then the 9B class, until you find a model that fits your memory budget and generates faster than 30 tok/sec.

  • Test gpt-oss:20b locally if you have >16GB VRAM.
  • Confirm the model fits your GPU with room to spare.
  • Prefer stable, repeatable results over one-off peaks.
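The step-down procedure in this section amounts to a short loop. Try candidates from largest to smallest, and keep the first one that fits in VRAM and clears the speed bar. This is a minimal sketch that assumes you have already estimated each model's memory needs and measured its median tok/sec on your own machine. `Candidate` and `pick_daily_driver` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str                  # model tag, e.g. "gpt-oss:20b"
    memory_gb: float           # estimated VRAM needed, including headroom
    median_tok_per_sec: float  # median over several runs on your own GPU


def pick_daily_driver(candidates: list[Candidate], vram_gb: float,
                      min_speed: float = 30.0) -> Optional[Candidate]:
    """Return the largest candidate that fits in VRAM and beats min_speed.

    Candidates are tried from the largest memory footprint down, one size
    class at a time. Returns None if no candidate qualifies.
    """
    for model in sorted(candidates, key=lambda m: m.memory_gb, reverse=True):
        if model.memory_gb <= vram_gb and model.median_tok_per_sec > min_speed:
            return model
    return None
```

Sorting by memory keeps the walk running from largest to smallest even if candidates are listed out of order. Using the median speed follows the advice to favor repeatable results over one-off peaks.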