DEV Community

AI/LLM Harness Series' Articles

Series by Tech_Nuggets

Jun 3

What is an LLM evaluation harness? A deep dive into lm-eval-harness

#llm #ai #evaluation #opensource

7 min read

Jun 4

Building a domain-specific LLM evaluation set from scratch

#llm #ai #evaluation #opensource

8 min read

Jun 5

Speculative decoding: when and why it actually speeds up inference

#llm #ai #inference #performance

9 min read

Jun 6

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

#llm #ai #vllm #performance

8 min read

Jun 7

Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%

#llm #ai #infrastructure #vllm

9 min read

Jun 9

LoRA and QLoRA fine-tuning: what they actually do under the hood

#lora #qlora #finetuning #llm

7 min read

Jun 10

Flash Attention: what it does and why it matters

#llm #ai #deeplearning #transformers

8 min read

Jun 10

Flash Attention: what it does and why it matters

#llm #ai #deeplearning #gpu

8 min read

Jun 11

Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4

#llm #quantization #mlops #tutorial

7 min read

Jun 12

Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production

#llm #ai #machinelearning #opensource

9 min read