This article was originally published on Best GPU for LLM. The full version with interactive tools, FAQ, and live pricing is on the original site.
Two used RTX 3090s for $1,200 total. 48GB combined VRAM. Llama 70B at Q4 running at 18-22 tokens per second. That is the pitch — and it actually works. Dual 3090s are the cheapest way to run 70B-class models locally in 2026, and the setup is simpler than most people expect. No NVLink required. No exotic drivers. Just two cards, the right motherboard, and a beefy PSU.
Why dual 3090s?
The math is straightforward:
| Setup | VRAM | Can run 70B Q4? | Cost |
|---|---|---|---|
| 1x RTX 4090 | 24GB | No (~42GB needed) | ~$1,600 |
| 1x RTX 5090 | 32GB | No (~42GB needed) | ~$2,000 |
| 2x RTX 3090 (used) | 48GB | Yes | ~$1,200 |
| 2x RTX 4090 | 48GB | Yes | ~$3,200 |
Neither a single RTX 4090 (24GB) nor a single RTX 5090 (32GB) reaches the ~42GB that Llama 70B needs at Q4_K_M, so keeping the whole model in VRAM on consumer hardware means using multiple GPUs. Two used 3090s at $600 each cost less than one new 4090.
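The ~42GB figure can be approximated as parameter count times average bits per weight, divided by eight. A minimal sketch, assuming roughly 4.8 bits per weight for Q4_K_M (an approximation; the real average varies by model) and a hypothetical helper named `est_weights_gb`:

```shell
# Rough size of quantized weights in GB: params (billions) * bits per weight / 8.
# est_weights_gb is a hypothetical helper; 4.8 bits/weight for Q4_K_M is an approximation.
est_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

est_weights_gb 70 4.8   # weights only; KV cache and runtime buffers come on top
```

The result covers the weights alone, so longer contexts need additional headroom beyond it.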
What you need
Hardware checklist
| Component | Requirement | Why |
|---|---|---|
| GPUs | 2x RTX 3090 | 24GB each = 48GB total |
| Motherboard | 2 physical x16 PCIe slots | Both should run at x8 or x16 electrically |
| PSU | 850W minimum, 1000W recommended | Each 3090 draws up to 350W |
| CPU | Any modern 6+ core | Not the bottleneck for inference |
| RAM | 32GB minimum | 64GB recommended for large context |
| Case | Full tower with good airflow | 3090s are triple-slot cards — check clearance |
| PCIe risers | Optional | Can help with spacing if slots are too close |
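The PSU row can be sanity-checked by adding up the sustained load. The sketch below is only an estimate: `system_load_w` is a hypothetical helper, and the 150W CPU and 75W for drives, fans, and the motherboard are assumed figures rather than measurements.

```shell
# Sum the sustained load in watts: GPUs plus CPU plus everything else.
# system_load_w is a hypothetical helper; 150W CPU and 75W "other" are assumed figures.
system_load_w() {
  n_gpus=$1; gpu_w=$2; cpu_w=$3; other_w=$4
  echo $(( n_gpus * gpu_w + cpu_w + other_w ))
}

system_load_w 2 300 150 75   # GPUs at roughly 300W each, a typical inference load
system_load_w 2 350 150 75   # GPUs at their 350W rated peak
```

Under these assumptions an 850W unit covers the inference case with little margin, and the rated-peak case is where the 1000W recommendation comes from.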
Motherboard notes
This is where most builds fail. Many consumer motherboards have two x16-length slots, but the second slot runs at x4 electrically. That works but costs ~15% performance. Look for boards where both slots run at x8/x8 minimum when populated. ATX boards with Intel Z690/Z790 or AMD X670 chipsets usually support this.
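Once the cards are installed, the negotiated link width can be read back from the driver. A minimal sketch, assuming the `pcie.link.width.current` query field of `nvidia-smi` and a hypothetical helper named `flag_narrow_links` that reads the CSV output from stdin:

```shell
# Flag GPUs whose current PCIe link is narrower than x8.
# Input: lines of "index, name, width" as printed by the nvidia-smi query below.
# flag_narrow_links is a hypothetical helper name.
flag_narrow_links() {
  awk -F', ' '{
    status = ($3 + 0 < 8) ? "below x8" : "ok"
    printf "GPU %s: x%s (%s)\n", $1, $3, status
  }'
}

# On a live system:
#   nvidia-smi --query-gpu=index,name,pcie.link.width.current --format=csv,noheader | flag_narrow_links
```

It is worth running the check while a model is loaded, since link state reported at idle may not reflect what the slot delivers under load.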
Do NOT buy NVLink bridges. The RTX 3090 supports NVLink, but llama.cpp and Ollama do not need it for LLM inference. By default they split the model by layers across the GPUs and pass activations between them over PCIe, which works on any multi-GPU setup. NVLink is wasted money for this use case.
Software setup
Option 1: Ollama (easiest)
Ollama automatically detects multiple GPUs and splits the model across them. No configuration needed.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Run a 70B model — Ollama auto-splits across both GPUs
ollama run llama3.1:70b-instruct-q4_K_M
Verify both GPUs are being used:
nvidia-smi
# Both GPUs should show VRAM usage
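To make that check scriptable, the per-GPU memory figures can be counted instead of read by eye. A small sketch, where `gpus_in_use` and the 1000 MiB threshold are illustrative choices rather than anything Ollama defines:

```shell
# Count GPUs with more than a threshold of VRAM in use (in MiB, default 1000).
# Input: lines of "index, used_mib". gpus_in_use and the threshold are illustrative.
gpus_in_use() {
  awk -F', ' -v min="${1:-1000}" '$2 + 0 > min { n++ } END { print n + 0 }'
}

# On a live system, while the model is loaded:
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | gpus_in_use
```

A result of 2 while the 70B model is loaded suggests the split is working; a result of 1 points to the troubleshooting steps later in this guide.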
Option 2: llama.cpp (more control)
llama.cpp gives you explicit control over layer splitting:
# Auto-split across GPUs
./llama-server -m llama-70b-Q4_K_M.gguf --n-gpu-layers 99
# Manual split: offload every layer, half to GPU 0 and half to GPU 1
./llama-server -m llama-70b-Q4_K_M.gguf --n-gpu-layers 99 --tensor-split 0.5,0.5
The --tensor-split flag sets the proportion of the model placed on each GPU. An equal split (0.5,0.5) is usually optimal for two identical GPUs. If one card is slightly faster or has more free VRAM, for example because the other also drives a display, shift the ratio toward it.
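One way to pick an uneven ratio is to make it proportional to the free VRAM on each card. This is a sketch of that idea, not something llama.cpp does for you; `split_ratio` is a hypothetical helper and the free-memory numbers are made up for illustration.

```shell
# Turn free VRAM per GPU (any unit, as long as both match) into a two-way proportion.
# split_ratio is a hypothetical helper, not part of llama.cpp.
split_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f,%.2f\n", a / (a + b), b / (a + b) }'
}

split_ratio 24 24   # identical cards, nothing else using VRAM
split_ratio 24 20   # GPU 1 has about 4GB less free

# Possible use:
#   ./llama-server -m llama-70b-Q4_K_M.gguf --n-gpu-layers 99 --tensor-split "$(split_ratio 24 20)"
```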
Performance expectations
Tested with Llama 3.1 70B at Q4_K_M on dual RTX 3090s:
| Metric | Value |
|---|---|
| Prompt processing | ~350 tok/s |
| Token generation | ~18-22 tok/s |
| VRAM usage (per GPU) | ~21GB each |
| Total VRAM used | ~42GB |
| Power draw (both GPUs) | ~500-600W |
18-22 tok/s on a 70B model is comfortable for interactive chat. It is not blazing fast, but responses stream smoothly and you will not feel like you are waiting.
For comparison:
| Setup | 70B Q4 tok/s | Cost |
|---|---|---|
| 2x RTX 3090 | ~18-22 tok/s | ~$1,200 |
| 2x RTX 4090 | ~30-35 tok/s | ~$3,200 |
| Cloud (RunPod A100) | ~40-50 tok/s | ~$2-4/hr |
Dual 4090s are ~60% faster, but at about 2.7x the cost. The 3090 setup is the value play.
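The comparison can be reproduced from the table with simple ratios. The sketch below uses the midpoints of the quoted ranges; `ratio` is a hypothetical helper, and the $3/hr cloud rate is an assumed midpoint of the $2-4/hr range that ignores electricity for the local build.

```shell
# ratio prints a / b to one decimal place; hypothetical helper for the comparison above.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 32.5 20     # speed: midpoint of 30-35 tok/s vs midpoint of 18-22 tok/s
ratio 3200 1200   # cost: dual 4090s vs dual used 3090s
ratio 1200 3      # cloud hours at an assumed $3/hr that equal the 3090 hardware cost
```

The last line is a rough break-even point: past that many hours of use, the rented A100 has cost more than the two used cards.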
VRAM chart available at the original article
What models fit on 48GB?
| Model | Q4_K_M VRAM | Fits on 2x 3090? | tok/s |
|---|---|---|---|
| Llama 3.1 70B | ~42GB | Yes | ~18-22 |
| Qwen 2.5 72B | ~45GB | Tight | ~15-18 |
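| Llama 4 Scout (109B MoE) | ~40GB* | Yes* | ~25-30 |
| Mixtral 8x22B | ~40GB* | Yes* | ~20-25 |
| Any model under 34B | Under 24GB | Yes (single GPU) | Varies |
*An MoE model computes only its active parameters per token (17B of 109B for Llama 4 Scout), which is why it generates faster than a dense model of similar size, but all expert weights must still be loaded. A ~40GB footprint for these models therefore assumes a quantization more aggressive than Q4_K_M or offloading some expert weights to system RAM; at Q4_K_M the full weights exceed 48GB.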
The 48GB sweet spot opens up the entire 70B class of dense models and, with a more aggressive quantization or partial CPU offload, some larger MoE models. This is the key advantage over single-GPU setups.
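The "Yes" and "Tight" labels in the table can be read as a headroom check: the weights must fit, and a few GB should remain for the KV cache and runtime buffers. A minimal sketch, where `fit_label` and the 4GB reserve are assumptions for illustration:

```shell
# Classify a model's VRAM need against total VRAM, keeping a reserve for KV cache and buffers.
# fit_label and the default 4GB reserve are assumptions for illustration.
fit_label() {
  awk -v need="$1" -v total="$2" -v reserve="${3:-4}" 'BEGIN {
    if (need + reserve <= total) print "Yes"
    else if (need <= total)      print "Tight"
    else                         print "No"
  }'
}

fit_label 42 48   # the Llama 3.1 70B row
fit_label 45 48   # the 72B row
```

Raising the reserve is a reasonable adjustment for long-context work, since the KV cache grows with context length.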
GPU tier list available at the original article
Common issues and fixes
"Only one GPU is being used"
Check that both GPUs are detected: nvidia-smi should show two devices. If Ollama only uses one, try setting CUDA_VISIBLE_DEVICES=0,1 before starting. In llama.cpp, explicitly set --n-gpu-layers 99 to force full GPU offloading.
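The device list for CUDA_VISIBLE_DEVICES can be built from what the driver reports rather than typed by hand. A small sketch, assuming the usual `nvidia-smi -L` line format and a hypothetical helper named `device_list`:

```shell
# Build a comma-separated device list from `nvidia-smi -L` output.
# Input lines look like: "GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-...)".
# device_list is a hypothetical helper name.
device_list() {
  awk '/^GPU [0-9]+:/ {
    sub(":", "", $2)
    ids = ids (ids == "" ? "" : ",") $2
  } END { print ids }'
}

# On a live system:
#   export CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | device_list)"
```

If Ollama runs as a background service rather than from your shell, the variable likely needs to be set in that service's environment instead; check how your install starts it.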
Thermal throttling
Two 3090s generate serious heat — up to 700W combined. Ensure your case has strong front-to-back airflow. Leave at least one slot gap between the cards if possible. Consider aftermarket GPU coolers or a case with 140mm fans if you see temperatures hitting 83C+ consistently.
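To see whether either card actually reaches that range, temperature and power can be sampled during generation. A minimal sketch, assuming the `temperature.gpu` and `power.draw` query fields of `nvidia-smi`; `hot_gpus` is a hypothetical helper that filters the CSV output:

```shell
# Print GPUs at or above a temperature limit in degrees C (default 83).
# Input: lines of "index, temp_c, power_w". hot_gpus is a hypothetical helper name.
hot_gpus() {
  awk -F', ' -v limit="${1:-83}" '$2 + 0 >= limit { printf "GPU %s: %sC at %sW\n", $1, $2, $3 }'
}

# On a live system, sampled every 5 seconds during generation:
#   nvidia-smi --query-gpu=index,temperature.gpu,power.draw --format=csv,noheader,nounits -l 5 | hot_gpus
```

In a stacked layout the upper card usually runs hotter, so output that lists only one GPU is a hint that spacing or airflow around that card is the problem.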
PCIe bandwidth bottleneck
If your second slot runs at x4, data moves to and from that GPU more slowly than it does for the card in the primary slot. The impact is ~15% on overall throughput. Upgrading to a motherboard that runs both slots at x8/x8 fixes this. For most users, the 15% loss is acceptable given the cost savings.
Who should NOT do this?
- Gamers who occasionally run LLMs. Dual 3090s can draw up to 700W and generate significant heat. If you primarily game, a single RTX 4090 is a better all-rounder (though it cannot do 70B).
- Anyone who needs 70B at 30+ tok/s. Dual 3090s cap at ~22 tok/s. If speed is critical, dual 4090s or cloud are your options.
- Small form factor builders. Two triple-slot 3090s need a full tower case with good airflow. mITX and mATX builds cannot accommodate this.
For used 3090 buying tips, see our used RTX 3090 buying guide. Planning to run Llama specifically? The best GPU for Llama 70B guide covers all options. PSU sizing for multi-GPU is covered in PSU for dual GPU LLM. And for motherboard compatibility, see best motherboard for dual GPU LLM.
Related guides on Best GPU for LLM
- Best Multi-GPU Setup for Local LLM in 2026 (Dual)
- Best Motherboard for Dual GPU LLM in 2026 (PCIe 5)
- PSU for Dual GPU LLM Setup in 2026: Wattage Guide