DEV Community

# quantization

Posts

How to Pick a GGUF Quant Level for Your VRAM Budget

3 min read
Gemma 4 QAT on a 1080 Ti: What 'Quantization-Aware' Actually Buys — and Fitting the 12B on 8 GB at 16k

5 min read
Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4

7 min read
INT8 Q/DQ Calibration on Blackwell: 1.8× the TRT 10 + FP16 Baseline

7 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

4 min read
1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4

2 min read
Why your quantized LLM loses its MTP heads and how to keep them

5 min read
KVQuant: Run 70B LLMs on 8GB RAM with KV Cache Quantization

1 min read
KVQuant: Run 70B LLMs on 8GB RAM with 4-bit KV Cache Quantization

1 min read
Traditional Quantization vs 1.58-Bit Ternary Models: A Practical Comparison

5 min read
The Best Result This Week Was a Failed Prediction — Phase-3a Doesn't Transfer

1 min read
Two Localizers, Both Wrong: Bounding a Quantization Cost That Wouldn't Close

1 min read
When the Sensitivity Metric Lies: A Drift-Inversion Smoking Gun in Mixed-Precision LLM Quantization

8 min read
GIMP's Posterization: Simple Quantization vs. Median Cut for Better Visuals

8 min read
Q4 KV Cache Fit 32K Context into 8GB VRAM — Only Math Broke

8 min read