AI/LLM Harness Series' Articles
What is an LLM evaluation harness? A deep dive into lm-eval-harness
Tech_Nuggets, Jun 3, 7 min read
#llm #ai #evaluation #opensource

Building a domain-specific LLM evaluation set from scratch
Tech_Nuggets, Jun 4, 8 min read
#llm #ai #evaluation #opensource

Speculative decoding: when and why it actually speeds up inference
Tech_Nuggets, Jun 5, 9 min read
#llm #ai #inference #performance

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break
Tech_Nuggets, Jun 6, 8 min read
#llm #ai #vllm #performance

Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%
Tech_Nuggets, Jun 7, 9 min read
#llm #ai #infrastructure #vllm

LoRA and QLoRA fine-tuning: what they actually do under the hood
Tech_Nuggets, Jun 9, 7 min read
#lora #qlora #finetuning #llm

Flash Attention: what it does and why it matters
Tech_Nuggets, Jun 10, 8 min read
#llm #ai #deeplearning #transformers

Flash Attention: what it does and why it matters
Tech_Nuggets, Jun 10, 8 min read
#llm #ai #deeplearning #gpu

Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4
Tech_Nuggets, Jun 11, 7 min read
#llm #quantization #mlops #tutorial

Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production
Tech_Nuggets, Jun 12, 9 min read
#llm #ai #machinelearning #opensource