DEV Community

AI/LLM Harness Series' Articles

Back to Tech_Nuggets's Series
What is an LLM evaluation harness? A deep dive into lm-eval-harness
7 min read
Building a domain-specific LLM evaluation set from scratch
8 min read
Speculative decoding: when and why it actually speeds up inference
9 min read
KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break
8 min read
Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%
9 min read
LoRA and QLoRA fine-tuning: what they actually do under the hood
7 min read
Flash Attention: what it does and why it matters
8 min read
Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4
7 min read
Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production
9 min read