DEV Community

# vllm

GaeaRuiW

Jun 9

I built an open-source alternative to Microsoft's KAITO that works on ANY Kubernetes cluster

#kubernetes #vllm #devops #opensource

2 min read

Jun 7

Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%

#llm #ai #infrastructure #vllm

9 min read

Jun 6

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

#llm #ai #vllm #performance

8 min read

xbill for Google Developer Experts

May 30

Gemma 4 Benchmarking NVIDIA Blackwell RTX 6000 vs L4 on Google Cloud Run

#googleantigravity #vllm #googlecloudrun #gemma4

14 min read

May 8

vLLM's V1 Release Fixes the Silent Killer in RL Training

#vllm #machinelearning #python

2 min read

Matthew Gladding

Apr 24

The 70B Threshold: How the RTX 5090 Rewrites the Home Lab Equation

#model #memory #models #vllm

8 min read

May 26

How RunPod FlashBoot Actually Works (4-Request Test)

#runpod #flashboot #serverless #vllm

10 min read

Grace

May 21

Rethinking Open Source Contribution in the Age of AI Agents, featuring vLLM Core Maintainer Roger Wang at MLSys'26

#vllm #ai #machinelearning #llm

3 min read

May 20

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

#ollama #llamacpp #vllm #comparison

5 min read

May 13

72B Parameters, Zero Quantization, One GPU: Benchmarking Qwen2-VL on AMD MI300X

#vllm #rocm #mi300x #genai

13 min read

Apr 1

From one model to seven — what it took to make TurboQuant model-portable

#python #vllm #gpu #triton

3 min read

Mar 28

Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1

#python #vllm #gpu #containers

2 min read

xbill for Google Developer Experts

Apr 28

Self-hosted Gemma 4 on TPU with vLLM, MCP, ADK, and Gemini CLI

#vllm #googleadk #tpu #gemini

16 min read

Apr 21

11-Second Time to First Token on a Healthy vLLM Server

#vllm #observability #ebpf #mcp

5 min read

Maksim Danilchenko

Apr 11

How to Run Gemma 4 Locally With Ollama, llama.cpp, and vLLM

#gemma4 #ollama #llamacpp #vllm

9 min read
