DEV Community

# llamacpp

Posts

How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)

Comments
3 min read
How to Tune --n-gpu-layers for Your VRAM Budget

Comments
4 min read
Hermes Agent Desktop Free With Local LLMs: The Claude Code Alternative Nobody's Billing You For [2026]

Comments
8 min read
llama.cpp b9455 Finally Caught vLLM: 70t/s on 2x3090 Qwen 27B UQ8

Comments
3 min read
Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

Comments
8 min read
GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

Comments
4 min read
Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think

Comments
4 min read
How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio

TTFT and RAG efficiency insights

12
Comments 9
24 min read
Introducing LlamaStash: a zero-overhead, terminal-native llama.cpp launcher

OpenAI proxy and VRAM-aware crash recovery

9
Comments 2
11 min read
Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090

Comments
8 min read
Benchmarking the Claude Agent SDK on a local LLM: Haiku and Sonnet tier performance

Comments 1
6 min read
First Words: LLM Inference on RISC-V

Comments
9 min read
Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey

Comments
17 min read
Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

Comments 1
5 min read
llama.cpp Settings Can Change 8GB Performance by 5x: Finding the Optimal Values for the Key Options

Comments
4 min read