DEV Community

Thurmon Demich

Originally published at bestgpuforllm.com

How to Run Two RTX 3090s for LLM Inference in 2026

This article was originally published on Best GPU for LLM. The full version with interactive tools, FAQ, and live pricing is on the original site.

Two used RTX 3090s for $1,200 total. 48GB combined VRAM. Llama 70B at Q4 running at 18-22 tokens per second. That is the pitch — and it actually works. Dual 3090s are the cheapest way to run 70B-class models locally in 2026, and the setup is simpler than most people expect. No NVLink required. No exotic drivers. Just two cards, the right motherboard, and a beefy PSU.


Why dual 3090s?

The math is straightforward:

| Setup | VRAM | Can run 70B Q4? | Cost |
|---|---|---|---|
| 1x RTX 4090 | 24GB | No (~42GB needed) | ~$1,600 |
| 1x RTX 5090 | 32GB | No (~42GB needed) | ~$2,000 |
| 2x RTX 3090 (used) | 48GB | Yes | ~$1,200 |
| 2x RTX 4090 | 48GB | Yes | ~$3,200 |

A single RTX 4090 tops out at 24GB, and even the RTX 5090's 32GB falls short of the ~42GB needed for Llama 70B at Q4_K_M. The only way to fit a 70B model entirely in VRAM on consumer cards is to use multiple GPUs. And two used 3090s at $600 each cost less than one new 4090.
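The ~42GB figure is roughly parameter count times bits per weight. A minimal bash sketch, assuming Q4_K_M averages about 4.8 bits per weight (an approximation; the real mix varies by model) and counting weights only, not the context cache:

```bash
#!/usr/bin/env bash
# Rough weight-memory estimate: params (billions) * bits per weight / 8 = GB.
# Assumption: Q4_K_M averages ~4.8 bits/weight; KV cache and runtime buffers are extra.
weights_gb() {
  local params_b=$1 bits=$2
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Prints "yes" if NEED_GB fits in HAVE_GB, otherwise "no".
fits() {
  awk -v need="$1" -v have="$2" 'BEGIN { if (need + 0 <= have + 0) print "yes"; else print "no" }'
}

need=$(weights_gb 70 4.8)
echo "Llama 70B at Q4_K_M: ~${need} GB of weights"
for vram in 24 32 48; do
  echo "${vram} GB VRAM: $(fits "$need" "$vram")"
done
```

Only the 48GB pair clears the bar, leaving roughly 6GB for the context cache and runtime buffers.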

What you need

Hardware checklist

| Component | Requirement | Why |
|---|---|---|
| GPUs | 2x RTX 3090 | 24GB each = 48GB total |
| Motherboard | 2 physical x16 PCIe slots | Both should run at x8 or better |
| PSU | 850W minimum, 1000W recommended | Each 3090 draws up to 350W |
| CPU | Any modern 6+ core | Not the bottleneck for inference |
| RAM | 32GB minimum | 64GB recommended for large context |
| Case | Full tower with good airflow | 3090s are triple-slot cards; check clearance |
| PCIe risers | Optional | Can help with spacing if slots are too close |
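The PSU row is simple addition plus headroom. A sketch in bash integer arithmetic, where the 150W for the rest of the system and the 20% headroom are assumptions to adjust for your own CPU and parts:

```bash
#!/usr/bin/env bash
# PSU sizing sketch: (GPU count * per-GPU watts + rest of system) * (1 + headroom).
# Assumptions: 150 W for CPU, board, drives and fans; 20% headroom on top.
psu_watts() {
  local gpus=$1 gpu_w=$2 rest_w=$3 headroom_pct=$4
  local load=$(( gpus * gpu_w + rest_w ))
  echo $(( load * (100 + headroom_pct) / 100 ))
}

echo "Sustained load: $(psu_watts 2 350 150 0) W"    # no headroom
echo "Suggested PSU:  $(psu_watts 2 350 150 20) W"   # with 20% headroom
```

With these assumptions the numbers land on 850W and about 1020W, in line with the minimum and recommended sizes in the table.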

Motherboard notes

This is where most builds fail. Many consumer motherboards have two x16-length slots, but the second slot runs at x4 electrically. That works but costs ~15% performance. Look for boards where both slots run at x8/x8 minimum when populated. ATX boards with Intel Z690/Z790 or AMD X670 chipsets usually support this.
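To see what each slot actually negotiated, nvidia-smi can report PCIe link width per GPU. A small bash helper that flags anything below x8; it reads the CSV from stdin so it can be tried on sample data, and the x8 threshold simply mirrors the advice above:

```bash
#!/usr/bin/env bash
# Flag GPUs whose current PCIe link width is below x8.
# Input lines look like "0, 16, 16" (index, current width, max width), as printed by:
#   nvidia-smi --query-gpu=index,pcie.link.width.current,pcie.link.width.max --format=csv,noheader
check_pcie_width() {
  awk -F', *' '{
    status = ($2 + 0 >= 8) ? "ok" : "below x8"
    printf "GPU %s: x%s (max x%s) %s\n", $1, $2, $3, status
  }'
}

# Sample data: GPU 1 negotiated only x4.
printf '0, 8, 16\n1, 4, 16\n' | check_pcie_width
```

On a real machine, pipe the nvidia-smi command from the comment into check_pcie_width instead of the sample printf.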

Do NOT buy NVLink bridges. The RTX 3090 supports NVLink, but llama.cpp and Ollama do not need it for LLM inference. By default they split the model's layers between the cards and pass activations over PCIe, which works on any multi-GPU setup. NVLink is wasted money for this use case.
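A back-of-envelope check on why PCIe is enough: with a two-way layer split, roughly one hidden-state vector per token has to cross from one card to the other. Assuming Llama 70B's hidden size of 8192 and 4-byte float activations (the activation width is an assumption), the traffic is small. This estimates bandwidth only and ignores per-transfer latency:

```bash
#!/usr/bin/env bash
# Inter-GPU traffic for a two-way layer split: about one hidden-state vector per token.
# Assumptions: hidden size 8192 (Llama 70B) and 4-byte float activations.
handoff_kib_per_s() {
  local hidden=$1 bytes_per_value=$2 tokens_per_s=$3
  echo $(( hidden * bytes_per_value * tokens_per_s / 1024 ))
}

echo "Generation (20 tok/s):         $(handoff_kib_per_s 8192 4 20) KiB/s"
echo "Prompt processing (350 tok/s): $(handoff_kib_per_s 8192 4 350) KiB/s"
```

Even the prompt-processing figure, around 11MB/s, is a small fraction of a PCIe x4 link, so NVLink's extra bandwidth has little to carry here.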

Software setup

Option 1: Ollama (easiest)

Ollama automatically detects multiple GPUs and splits the model across them. No configuration needed.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Run a 70B model — Ollama auto-splits across both GPUs
ollama run llama3.1:70b-instruct-q4_K_M

Verify both GPUs are being used:

nvidia-smi
# Both GPUs should show VRAM usage
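For a scriptable version of that check, nvidia-smi can print per-GPU memory use as CSV. A bash sketch that counts GPUs above a usage threshold; the 1000 MiB cutoff is an arbitrary assumption meant to ignore idle desktop usage:

```bash
#!/usr/bin/env bash
# Count GPUs whose used memory exceeds a threshold (MiB).
# Input lines look like "0, 21504" (index, memory.used in MiB), as printed by:
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
count_busy_gpus() {
  local min_mib=$1
  awk -F', *' -v min="$min_mib" '$2 + 0 > min + 0 { n++ } END { print n + 0 }'
}

# Sample data for a 70B model split across both cards.
busy=$(printf '0, 21504\n1, 21248\n' | count_busy_gpus 1000)
if [ "$busy" -ge 2 ]; then
  echo "Both GPUs are holding model weights"
else
  echo "Only $busy GPU(s) holding model weights"
fi
```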

Option 2: llama.cpp (more control)

llama.cpp gives you explicit control over layer splitting:

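# Auto-split across GPUs (by default, layers are divided in proportion to each GPU's free VRAM)
./llama-server -m llama-70b-Q4_K_M.gguf --n-gpu-layers 99

# Manual even split: Llama 70B's 80 layers go about 40 per GPU.
# 99 simply means "offload every layer", including the output layer.
./llama-server -m llama-70b-Q4_K_M.gguf --n-gpu-layers 99 --tensor-split 0.5,0.5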

The --tensor-split flag controls how layers are distributed. Equal split (0.5,0.5) is usually optimal for two identical GPUs. If one card is slightly faster or has more free VRAM, adjust the ratio.
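To see what a ratio means in layers, the split is proportional: each GPU gets roughly its share of the total. A bash sketch, assuming Llama 70B's 80 layers; llama.cpp's own rounding may place a layer differently:

```bash
#!/usr/bin/env bash
# Approximate how many layers each GPU gets for a --tensor-split ratio.
# Assumption: 80 layers (Llama 70B); the last GPU takes whatever rounding leaves over.
layers_per_gpu() {
  local n_layers=$1 ratios=$2
  awk -v n="$n_layers" -v r="$ratios" 'BEGIN {
    k = split(r, parts, ",")
    for (i = 1; i <= k; i++) total += parts[i]
    out = ""; assigned = 0
    for (i = 1; i <= k; i++) {
      share = (i < k) ? int(n * parts[i] / total + 0.5) : n - assigned
      assigned += share
      out = out (i > 1 ? " " : "") share
    }
    print out
  }'
}

layers_per_gpu 80 0.5,0.5     # even split
layers_per_gpu 80 0.45,0.55   # shift a few layers toward GPU 1
```

Shifting from 0.5,0.5 to 0.45,0.55 moves about four layers, roughly 2GB at Q4_K_M, onto the second card.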


Performance expectations

Tested with Llama 3.1 70B at Q4_K_M on dual RTX 3090s:

| Metric | Value |
|---|---|
| Prompt processing | ~350 tok/s |
| Token generation | ~18-22 tok/s |
| VRAM usage (per GPU) | ~21GB each |
| Total VRAM used | ~42GB |
| Power draw (both GPUs) | ~500-600W |

18-22 tok/s on a 70B model is comfortable for interactive chat. It is not blazing fast, but responses stream smoothly and you will not feel like you are waiting.
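To turn the table into wait times, total latency is roughly prompt tokens divided by prompt speed plus output tokens divided by generation speed. A bash sketch using the measured figures above; the 2,000-token prompt and 500-token reply are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Rough end-to-end time for one reply:
#   prompt_tokens / prompt_speed + output_tokens / generation_speed
response_seconds() {
  local prompt_tok=$1 output_tok=$2 pp_speed=$3 tg_speed=$4
  awk -v p="$prompt_tok" -v o="$output_tok" -v pp="$pp_speed" -v tg="$tg_speed" \
    'BEGIN { printf "%.1f\n", p / pp + o / tg }'
}

# Illustrative request: 2,000-token prompt, 500-token reply, ~350 tok/s prompt processing.
for tg in 18 22; do
  echo "At ${tg} tok/s generation: ~$(response_seconds 2000 500 350 "$tg") s"
done
```

Because output streams as it is generated, the first words appear after the prompt-processing step (about 6 seconds here), not after the whole reply.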

For comparison:

| Setup | 70B Q4 speed | Cost |
|---|---|---|
| 2x RTX 3090 | ~18-22 tok/s | ~$1,200 |
| 2x RTX 4090 | ~30-35 tok/s | ~$3,200 |
| Cloud (RunPod A100) | ~40-50 tok/s | ~$2-4/hr |

Dual 4090s are roughly 60% faster, but cost about 2.7x as much. The 3090 setup is the value play.
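The speed and cost ratios, and how many cloud hours $1,200 buys, are quick to check. A bash sketch using the midpoints of the ranges in the comparison table; it ignores electricity, resale value, and the cloud option's higher speed:

```bash
#!/usr/bin/env bash
# Ratio of two numbers, one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Hours of cloud rental that cost as much as the hardware (ignores electricity and resale).
breakeven_hours() {
  local hardware_usd=$1 cloud_usd_per_hour=$2
  echo $(( hardware_usd / cloud_usd_per_hour ))
}

echo "4090 pair vs 3090 pair, speed: $(ratio 32.5 20)x"   # midpoints of 30-35 and 18-22 tok/s
echo "4090 pair vs 3090 pair, cost:  $(ratio 3200 1200)x"
for rate in 2 4; do
  echo "Cloud at \$${rate}/hr matches \$1,200 after $(breakeven_hours 1200 "$rate") hours"
done
```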


What models fit on 48GB?

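| Model | Q4_K_M VRAM | Fits on 2x 3090? | tok/s |
|---|---|---|---|
| Llama 3.1 70B | ~42GB | Yes | ~18-22 |
| Qwen 3 72B | ~45GB | Tight | ~15-18 |
| Llama 4 Scout (109B MoE) | ~65GB* | Not at Q4_K_M* | Varies* |
| Mixtral 8x22B (141B MoE) | ~85GB* | Not at Q4_K_M* | Varies* |
| Any model under 34B | Under 24GB | Yes (single GPU) | Varies |

*MoE models keep every expert's weights in memory, so their VRAM needs follow total parameters (109B for Scout, 141B for Mixtral 8x22B), not active parameters (17B and 39B). Only the active parameters are computed for each token, which is why MoE models generate quickly for their size. Running them on 48GB takes a lower-bit quant or offloading some experts to system RAM, and speed then depends on how much is offloaded.

The 48GB sweet spot opens up the entire 70B class of dense models, and brings larger MoE models within reach through lower-bit quants or partial offload. This is the key advantage over single-GPU setups.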
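The "Tight" rating is about what sits on top of the weights: the KV cache grows linearly with context length. A bash sketch of the standard estimate, assuming Llama 3.1 70B's published shape (80 layers, 8 KV heads, head dimension 128) and a 16-bit cache, which is llama.cpp's default:

```bash
#!/usr/bin/env bash
# KV cache size = 2 (K and V) * layers * kv_heads * head_dim * context * bytes per value.
# Assumptions: Llama 3.1 70B shape (80 layers, 8 KV heads, head_dim 128), 16-bit cache.
kv_cache_gib() {
  local layers=$1 kv_heads=$2 head_dim=$3 ctx=$4 bytes=$5
  awk -v l="$layers" -v h="$kv_heads" -v d="$head_dim" -v c="$ctx" -v b="$bytes" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * c * b / (1024 ^ 3) }'
}

for ctx in 8192 32768; do
  echo "Context ${ctx}: ~$(kv_cache_gib 80 8 128 "$ctx" 2) GiB of KV cache"
done
```

At 8K context the cache adds about 2.5GiB to ~42GB of weights; at 32K it adds about 10GiB, which no longer fits beside the weights in 48GB.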


Common issues and fixes

"Only one GPU is being used"

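Check that both GPUs are detected: nvidia-smi should list two devices. If Ollama uses only one, set CUDA_VISIBLE_DEVICES=0,1 in the environment of the Ollama server before it starts. For the Linux systemd service, add Environment="CUDA_VISIBLE_DEVICES=0,1" under [Service] with sudo systemctl edit ollama.service, then restart the service. In llama.cpp, explicitly set --n-gpu-layers 99 to force full GPU offloading.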
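If the second card is missing from nvidia-smi, no Ollama or llama.cpp setting will bring it back; the problem is in the hardware, BIOS, or driver. A quick bash check that counts the devices listed by nvidia-smi -L, shown here on sample output (the UUIDs are placeholders):

```bash
#!/usr/bin/env bash
# Count GPUs listed by `nvidia-smi -L` (each line starts with "GPU <index>:").
count_detected_gpus() {
  grep -c '^GPU [0-9]'
}

# Sample output from a dual-3090 machine (placeholder UUIDs).
sample='GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-aaaa)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-bbbb)'

n=$(printf '%s\n' "$sample" | count_detected_gpus)
echo "Detected ${n} GPU(s)"
# On a real system: nvidia-smi -L | count_detected_gpus
```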

Thermal throttling

Two 3090s generate serious heat: up to 700W combined. Ensure your case has strong front-to-back airflow, and leave at least one slot gap between the cards if possible. If temperatures consistently reach 83°C or higher, consider aftermarket GPU coolers or a case with 140mm fans.
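To catch throttling during a long session, nvidia-smi can log temperature and power at an interval. A bash filter that marks readings at or above a limit; the 83°C default follows the guidance above, and the sample readings are made up for illustration:

```bash
#!/usr/bin/env bash
# Flag GPUs at or above a temperature limit (default 83 C).
# Input lines look like "0, 78, 312.45" (index, temperature.gpu, power.draw), as printed by:
#   nvidia-smi --query-gpu=index,temperature.gpu,power.draw --format=csv,noheader,nounits -l 5
flag_hot_gpus() {
  local limit=${1:-83}
  awk -F', *' -v lim="$limit" '{
    flag = ($2 + 0 >= lim + 0) ? "  <-- hot" : ""
    printf "GPU %s: %s C, %s W%s\n", $1, $2, $3, flag
  }'
}

# Sample readings: GPU 1 (often the upper card) runs hotter.
printf '0, 76, 301.20\n1, 85, 338.90\n' | flag_hot_gpus
```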

PCIe bandwidth bottleneck

If your second slot runs at x4 electrically, expect roughly 15% lower overall throughput than with both cards at x8. Upgrading to a motherboard that splits its lanes x8/x8 fixes this. For most users, the 15% loss is acceptable given the cost savings.

Who should NOT do this?

  • Gamers who occasionally run LLMs. Dual 3090s draw up to 700W and generate significant heat. If you primarily game, a single RTX 4090 is a better all-rounder (though it cannot hold a 70B model in VRAM).
  • Anyone who needs 70B at 30+ tok/s. Dual 3090s cap at ~22 tok/s. If speed is critical, dual 4090s or cloud are your options.
  • Small form factor builders. Two triple-slot 3090s need a full tower case with good airflow. mITX and mATX builds cannot accommodate this.
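Heat and power also show up on the electricity bill. A bash sketch of the arithmetic, assuming a 30-day month; the daily hours and price per kWh are placeholders to replace with your own:

```bash
#!/usr/bin/env bash
# Monthly electricity cost for the GPUs alone:
#   watts * hours per day * 30 days / 1000 = kWh, then * price per kWh.
# Assumptions: 30-day month; hours and price below are placeholders.
monthly_cost() {
  local watts=$1 hours_per_day=$2 usd_per_kwh=$3
  awk -v w="$watts" -v h="$hours_per_day" -v p="$usd_per_kwh" \
    'BEGIN { printf "%.2f\n", w * h * 30 / 1000 * p }'
}

# ~550 W (midpoint of the measured 500-600 W) for 4 hours a day at $0.15/kWh.
echo "GPU power cost: \$$(monthly_cost 550 4 0.15) per month"
```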


For used 3090 buying tips, see our used RTX 3090 buying guide. Planning to run Llama specifically? The best GPU for Llama 70B guide covers all options. PSU sizing for multi-GPU is covered in PSU for dual GPU LLM. And for motherboard compatibility, see best motherboard for dual GPU LLM.
