byeongsoo kang

404 bio not found

Republic of Korea

Joined on Jun 1, 2026

https://bric.pe.kr

byeongsoo kang

Jun 11

MTP Isn't Always a Win: 1.95x on My 3090, but Speculative Decoding Is Hardware-Dependent

#llm #machinelearning #performance #gemma

3 min read

byeongsoo kang

Jun 11

Gemma 4 QAT on a 1080 Ti: What 'Quantization-Aware' Actually Buys — and Fitting the 12B on 8 GB at 16k

#llm #machinelearning #gemma #quantization

5 min read

byeongsoo kang

Jun 10

The Prefill Wall: Why MTP's 2x Barely Moves Long-Context Latency (Qwen3.6-27B, RTX 3090)

#llm #performance #machinelearning #rag

4 min read

byeongsoo kang

Jun 9

Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → ~75 tok/s)

#ollama #llm #performance #machinelearning

7 min read

byeongsoo kang

Jun 5

Building a Fully-Local Research RAG on 2x GTX 1080 Ti + an RTX 3090 — 3 Gotchas

#ollama #llm #rag #machinelearning

5 min read

byeongsoo kang

Jun 5

Running Brand-New Gemma 4 12B on an 8-Year-Old GTX 1080 Ti: Speed, 3 Gotchas, and Why Q8 Beat Q4 on My Own Field

#llm #ollama #gpu #machinelearning

5 min read

byeongsoo kang

Jun 3

Running 35B–400B LLMs on a GPU-less Cluster to Mine 10,000 Papers — and the 4 Bugs That Almost Ruined the Data

#llm #machinelearning #python #infrastructure

9 min read

byeongsoo kang

Jun 3

A MOGONET-Style Multi-Omics Biomarker Pipeline: Why a Near-Random Graph Net Still Earns Its Place

#machinelearning #bioinformatics #python #datascience

7 min read

byeongsoo kang

Jun 3

Running a 35B MoE (Qwen3.6-35B-A3B) on 2x GTX 1080 Ti in 2026 — Real Benchmarks, and Does the Second GPU Actually Help?

#llm #machinelearning #gpu #ollama

5 min read

DEV Community

byeongsoo kang

Badges

Writing Debut
