Llamacpp

Patrick Hughes

Jun 9

How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)

#localllm #llamacpp #gpu #vram

3 min read

Patrick Hughes

Jun 8

How to Tune --n-gpu-layers for Your VRAM Budget

#localllm #llamacpp #gpu #vram

4 min read

Kunal

Jun 5

Hermes Agent Desktop Free With Local LLMs: The Claude Code Alternative Nobody's Billing You For [2026]

#hermesagent #localllm #claudecodealternative #llamacpp

8 min read

Storm Engine Technology.

Jun 3

llama.cpp b9455 Finally Caught vLLM: 70t/s on 2x3090 Qwen 27B UQ8

#llamacpp #llm #ai #opensource

3 min read

Rost

May 24

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU

#selfhosting #llm #ai #llamacpp

8 min read

Patrick Hughes

May 13

GGUF Quantization Explained: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)

#llamacpp #gguf #quantization #localai

4 min read

Aurora

May 13

Self-Hosted AI Agent Systems: Why Local Inference Matters More Than You Think

#rust #ai #llamacpp #selfhosted

4 min read

TTFT and RAG efficiency insights

Deepu K Sasidharan

Jun 2

How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio

#ai #llamacpp #benchmark #llm

24 min read

OpenAI proxy and VRAM-aware crash recovery

Deepu K Sasidharan

Jun 2

Introducing LlamaStash: a zero-overhead, terminal-native llama.cpp launcher

#ai #llamacpp #localllm #rust

11 min read

Umair Bilal

Apr 26

Fixing Qwen 3.6 4090 llama.cpp Bug: 18 tok/s on My RTX 4090

#llm #llamacpp #rtx4090 #qwen

8 min read

r-via

May 28

Benchmarking the Claude Agent SDK on a local LLM: Haiku and Sonnet tier performance

#llm #claude #llamacpp #benchmark

6 min read

Bruno Verachten

Apr 22

First Words: LLM Inference on RISC-V

#bananapi #benchmark #inference #llamacpp

9 min read

Bruno Verachten

Apr 22

Running a 70B LLM on Pure RISC-V: The MilkV Pioneer Deployment Journey

#cpuinference #deepseekr1 #llamacpp #llm

17 min read

Thurmon Demich

May 20

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

#ollama #llamacpp #vllm #comparison

5 min read

plasmon

Apr 14

llama.cpp Settings Can Change 8GB Performance by 5x: Finding the Optimal Values for the Main Options

#llm #llamacpp #gpu

4 min read

DEV Community
