Best Local LLMs for Coding

Section 1

What matters for coding work

Coding models need to hold context, follow instructions, and produce code that compiles cleanly. The latest benchmark data shows that very large models (such as Qwen 3.6+ or Gemma 4 31B) can produce slightly stronger code, but at a steep cost in generation speed (averaging under 10 tokens/sec), which breaks the real-time coding flow.

By contrast, gpt-oss:20b averages more than 60 tokens/sec across all benchmark runs and still sits in the top quality tier (average scores above 71), which suggests you do not need a slow, oversized model for demanding logic work.

Quality: does the model follow the task and preserve surrounding code?
Latency: a model should generate faster than 30 tokens/sec to feel natural in an IDE.
Memory: gpt-oss:20b fits on upper-tier consumer GPUs with room for prompt overhead.
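
The latency criterion above can be checked with a few lines of arithmetic. The sketch below assumes an Ollama-style runtime that reports a generated-token count (`eval_count`) and a generation time in nanoseconds (`eval_duration`) with each response; this article does not name a runtime, so treat those field names as an assumption and adapt them to whatever your tool reports. The function names are illustrative only.

```python
# Threshold quoted in the criteria above: faster than 30 tokens/sec.
LATENCY_FLOOR_TOK_PER_SEC = 30.0


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens/sec.

    Assumption: the runtime reports generation time in nanoseconds,
    as Ollama does in its `eval_duration` field.
    """
    if eval_duration_ns <= 0:
        raise ValueError("duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def feels_natural(eval_count: int, eval_duration_ns: int) -> bool:
    """True when generation is strictly faster than the 30 tok/sec floor."""
    return tokens_per_second(eval_count, eval_duration_ns) > LATENCY_FLOOR_TOK_PER_SEC
```

For example, 600 tokens generated in 10 seconds works out to 60 tokens/sec, which clears the floor, while 90 tokens in the same time (9 tokens/sec) does not.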

Browse model benchmark data

Compare model-level speed, quality, and hardware compatibility.

Read the benchmark methodology

See how coding quality and performance are measured.

Section 2

Practical recommendation by VRAM tier

On 8GB-class cards, stay conservative and prefer efficient models in the 8B to 9B class with aggressive quantization (like qwen3.5:9b) to get usable output.

On 12GB to 16GB cards, you can move up to high-value 14B models (e.g. ministral-3:14b) for a more reliable experience, although memory can be tight.

On 24GB and above, gpt-oss:20b is the strongest local recommendation for developers right now. It delivers the speed of a smaller model while staying in the top quality tier.

8GB: keep the model compact (8B-9B class) and avoid unnecessary context bloat.
12GB to 16GB: aim for 14B class models that still leave OS headroom.
24GB+: gpt-oss:20b is the definitive sweet spot.
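
The tiers above amount to a small lookup table. The following is a minimal sketch of that mapping, not a definitive rule: the function name is hypothetical, and because the tiers do not say what to do with cards that fall between them (for example 20GB), this sketch simply rounds down to the nearest listed tier.

```python
from typing import Optional

# (minimum VRAM in GB, recommendation), ordered from largest tier to smallest.
# The recommendations come from the tier list above.
VRAM_TIERS = [
    (24, "gpt-oss:20b"),
    (12, "14B class (e.g. ministral-3:14b)"),
    (8, "8B-9B class (e.g. qwen3.5:9b)"),
]


def recommend_for_vram(vram_gb: float) -> Optional[str]:
    """Return the recommendation for the largest tier the card satisfies.

    Assumption: in-between sizes round down to the nearest tier.
    Returns None for cards below 8GB, which the tiers do not cover.
    """
    for min_gb, recommendation in VRAM_TIERS:
        if vram_gb >= min_gb:
            return recommendation
    return None
```

Calling `recommend_for_vram(16)` returns the 14B entry, and `recommend_for_vram(6)` returns `None` because no tier covers that size.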

Check hardware compatibility

See how different GPUs handle the same model families.

Open benchmark results

Inspect raw benchmark rows for the fastest and strongest runs.

Section 3

How to choose the final candidate

Unless your GPU is limited to low VRAM, we strongly suggest testing gpt-oss:20b as your daily driver for coding. If you lack the hardware, step down to the 14B class and then the 8B to 9B class until you clear the 30 tokens/sec line within your available memory budget.

Test gpt-oss:20b locally if you have >16GB VRAM.
Confirm the model fits your GPU with room to spare.
Prefer stable, repeatable results over one-off peaks.
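
The checklist above can be sketched as a short selection routine: walk the candidates from most to least preferred, skip any that do not fit with headroom, and judge speed by the median of several runs so a single fast run cannot carry the decision. Everything here is an illustrative assumption rather than the benchmark's own method: the `Candidate` type, the 2GB default headroom, the use of the median, and the placeholder model names and numbers in the usage note.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    footprint_gb: float            # memory you measured: weights plus context
    runs_tok_per_sec: List[float]  # speed of each repeated run on your GPU


def pick_candidate(
    candidates: List[Candidate],
    vram_gb: float,
    headroom_gb: float = 2.0,  # assumed "room to spare"; tune for your setup
    floor: float = 30.0,       # the 30 tok/sec line from the text
) -> Optional[Candidate]:
    """Return the first candidate, in preference order, that fits and is fast.

    The median of repeated runs is used so that one-off peaks are ignored.
    """
    for c in candidates:
        fits = c.footprint_gb + headroom_gb <= vram_gb
        if fits and c.runs_tok_per_sec and median(c.runs_tok_per_sec) > floor:
            return c
    return None
```

With placeholder measurements such as `Candidate("model-a", 14.0, [25.0, 80.0, 27.0])` followed by `Candidate("model-b", 9.0, [41.0, 40.0, 43.0])`, a 16GB card selects `model-b`: `model-a` fits, but its median of 27 tokens/sec falls short even though one run peaked at 80.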