Denis Rybakov

Posted on Feb 15

Circumventing Internet Censorship: Protocols, Techniques, and the Arms Race

#dpi #network #architecture #security

Part 2 of the series: "Internet Censorship in Russia: A Technical Deep Dive"

Disclaimer

Educational Purpose: This article examines the technical architecture of internet filtering systems for educational purposes. The author does not encourage violation of local laws. All information is based on publicly available sources and open-source research.

Quick Recap: What We Learned in Part 1

In Part 1, we examined Russia's TSPU (Technical Means of Counteracting Threats) infrastructure.

Key points to remember:

What TSPU can do:

See SNI (Server Name Indication) in TLS Client Hello — primary blocking method
Analyze metadata: packet sizes, timing patterns, connection duration
Use ML/statistics to fingerprint VPN protocols
Actively probe suspicious servers
Send RST packets to kill connections

What TSPU cannot do:

Decrypt modern TLS without MITM (Man-in-the-Middle)
Process 100% of traffic deeply (performance constraint at 100 Gbps)
Block without collateral damage (shared IPs, CDN problem)

The fundamental trade-off is the usual one: performance versus accuracy.

Key insight from Part 1: TSPU has limitations imposed by physics (processing speed) and architecture (distributed internet).

Two Sides of the Technical Conflict

To understand circumvention approaches, we need to think from both perspectives:

"Attacker" side (conditional term, not moral judgment):

People wanting access to blocked resources
Privacy enthusiasts
Engineers researching censorship resistance
Goal: exploit architectural weaknesses of filtering infrastructure

"Defender" side (conditional term, not moral judgment):

TSPU operators
Government filtering mandate
Goal: maximize blocking effectiveness within architectural and economic constraints

In this part, we'll put ourselves in the shoes of both sides:

What weakness does the attacker exploit?
How can the defender adapt?
What is the next move in response?

This is an engineering analysis of distributed systems under adversarial conditions.

Now we look at how circumvention techniques exploit TSPU limitations. First we recall the OSI model to understand where encryption happens. Then we discuss which architectural weaknesses can be exploited.

Understanding Network Layers and Encryption

To understand why a specific technique works, it is useful to look at where encryption happens in the network stack.

Where encryption operates determines what DPI can see. DPI does not rely only on visible data; it also uses packet and traffic patterns. But the visible part is checked first because that is the simplest check.

OSI Model: Where Things Happen

The OSI model describes network communication in layers:

┌─────────────────────────────────────┐
│ L7: Application (HTTP, SSH)         │
├─────────────────────────────────────┤
│ L6: Presentation                    │
├─────────────────────────────────────┤
│ L5: Session (TLS operates here)     │
├─────────────────────────────────────┤
│ L4: Transport (TCP, UDP)            │
├─────────────────────────────────────┤
│ L3: Network (IP)                    │
├─────────────────────────────────────┤
│ L2: Data Link                       │
├─────────────────────────────────────┤
│ L1: Physical                        │
└─────────────────────────────────────┘

TLS is often mapped to the OSI Session/Presentation layers, but in the TCP/IP model it acts as a security layer between application and transport.

What DPI Sees at Different Layers

Normal HTTPS connection (TLS-encrypted):

┌──────────────────────┐
│ HTTP (L7)            │
├──────────────────────┤
│ TLS (L5)             │ ← Encrypts HTTP
├──────────────────────┤
│ TCP (L4)             │
├──────────────────────┤
│ IP (L3)              │
└──────────────────────┘

What DPI sees:
✅ IP addresses (source, destination)
✅ TCP ports (443 for HTTPS)
✅ TLS handshake (Client Hello, Server Hello)
✅ SNI field in Client Hello ← PLAINTEXT
❌ HTTP content (encrypted by TLS)

The critical point: in standard TLS (without ECH), the SNI is sent in the Client Hello, before encryption begins. This is the primary blocking method described in Part 1.
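To make "plaintext SNI" concrete, here is a minimal Python sketch that builds a stripped-down Client Hello and reads the hostname back out of the raw bytes, with no keys involved. The helper names `build_client_hello` and `extract_sni` are made up for this illustration, and the parser assumes the whole Client Hello sits in one TLS record. It is not how any particular DPI product is implemented.

```python
import struct


def build_client_hello(hostname: str) -> bytes:
    """Build a minimal TLS record holding a ClientHello with only an SNI extension."""
    name = hostname.encode("ascii")
    # server_name entry: name type 0 (host_name), 2-byte length, name
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0x0000, len(sni_data)) + sni_data
    body = (
        b"\x03\x03"              # legacy_version (TLS 1.2)
        + bytes(32)              # random (all zeros for the demo)
        + b"\x00"                # empty session_id
        + b"\x00\x02\x13\x01"    # one cipher suite
        + b"\x01\x00"            # one compression method (null)
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record: bytes):
    """Return the SNI hostname from a ClientHello record, or None if absent."""
    try:
        if record[0] != 0x16 or record[5] != 0x01:
            return None          # not a handshake record / not a ClientHello
        pos = 5 + 4              # skip record header and handshake header
        pos += 2 + 32            # skip version and random
        pos += 1 + record[pos]   # skip session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher suites
        pos += 1 + record[pos]   # skip compression methods
        end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:    # server_name extension
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None              # truncated or malformed input
    return None


print(extract_sni(build_client_hello("blocked-site.com")))  # blocked-site.com
```

The parser only skips length-prefixed fields and compares one extension type, which is why this check is cheap enough to run first.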

Nested Encryption: Protocol Inside Protocol

Here is an interesting property of the network stack: one protocol can carry another as its payload, even a protocol that normally sits at the same or a lower layer.

Example: HTTPS inside SSH tunnel

┌──────────────────────┐
│ HTTP (L7)            │
├──────────────────────┤
│ TLS (L5)             │ ← Encrypts HTTP
├──────────────────────┤
│ SSH (L7)             │ ← Encrypts TLS!
├──────────────────────┤
│ TCP (L4)             │
├──────────────────────┤
│ IP (L3)              │
└──────────────────────┘

What DPI sees:
✅ IP addresses
✅ TCP ports
✅ SSH protocol handshake
❌ TLS Client Hello (hidden inside SSH)
❌ SNI (hidden inside SSH)
❌ HTTP content (double-encrypted)

This is called nested encryption or protocol tunneling. We will return to this idea later.

Exploiting TSPU Architectural Weaknesses

Let's recall the DPI limitations and specifics from Part 1:

Cannot decrypt encrypted traffic
Must work at line speed (100 Gbps)
Can analyze the unencrypted Client Hello of TLS

Now we address each limitation and see how the "attacker" side can use it.

Weakness 1: Cannot See Inside Encrypted Traffic

Attacker approach: Build directly on the transport layer (L4) and add custom encryption on top, so that everything above L4 is encrypted.

Example: WireGuard protocol.

Defender response: Two options:

Ban all encrypted non-standard traffic (too broad, breaks legitimate apps)
Learn patterns of encrypted traffic (current approach)

How defender detects WireGuard:

At present, DPI in Russia detects plain WireGuard by its simple traffic pattern: the handshake initiation and the handshake response have fixed sizes. These sizes follow from the fixed layout of the protocol's handshake messages.

Detection logic:

Is it UDP?
Is first packet 148 bytes?
Is second packet 92 bytes?
Are data packets 16-byte aligned?

If all yes → WireGuard detected.
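The checklist above can be sketched as a small Python predicate. The `Packet` type and the function name are assumptions made for this illustration, and `size` means the UDP payload length. A real classifier would work on live flows and would likely combine more signals.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    proto: str  # "udp" or "tcp"
    size: int   # UDP payload size in bytes


def looks_like_wireguard(flow: list[Packet]) -> bool:
    """Apply the four size checks to the packets of one flow, in order."""
    if len(flow) < 2:
        return False                       # need at least the handshake pair
    if any(p.proto != "udp" for p in flow):
        return False                       # Is it UDP?
    if flow[0].size != 148:
        return False                       # Is the first packet 148 bytes?
    if flow[1].size != 92:
        return False                       # Is the second packet 92 bytes?
    # Are the data packets 16-byte aligned?
    return all(p.size % 16 == 0 for p in flow[2:])


plain = [Packet("udp", 148), Packet("udp", 92), Packet("udp", 96), Packet("udp", 1440)]
print(looks_like_wireguard(plain))  # True
```

Every check is a constant-time comparison on packet length, so this kind of rule costs the defender almost nothing at line speed.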

Attacker counter-adaptation: Add noise to the protocol to break the fixed, repeated sizes.

Example: AmneziaWG — WireGuard updated with random padding rules.

What it changes:

1. Packet size randomization:

Normal WireGuard:
[Header: 16 bytes][Payload: 132 bytes] = 148 bytes (fixed!)

AmneziaWG:
[Header: 16 bytes][Payload: 132 bytes][Random padding: 0-100 bytes]
= 148-248 bytes (varies each packet)

2. Junk packet injection:

Send random UDP packets mixed with real WireGuard packets
Real packets have correct structure
Junk packets are random data
Server knows which are real (based on crypto validation)
DPI sees inconsistent packet sizes, garbage mixed with data

3. Protocol constants modification:

User can configure custom values that control padding behavior and junk generation
Different configurations create different fingerprints
Each client-server pair has unique "noise signature"

This is the basic idea of how AmneziaWG works.
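As a rough illustration of size randomization and junk injection, here is a short Python sketch. It is not the AmneziaWG wire format: the function names, the padding placement, and the junk size range are assumptions chosen to match the simplified description above, and the receiver-side logic for stripping padding is omitted.

```python
import random

HANDSHAKE_INIT_SIZE = 148  # fixed size of a plain WireGuard handshake initiation


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random bytes (demo only, not cryptographically secure)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def pad_packet(packet: bytes, rng: random.Random, max_padding: int = 100) -> bytes:
    """Append 0..max_padding random bytes so the size is no longer fixed."""
    return packet + random_bytes(rng, rng.randint(0, max_padding))


def junk_packets(rng: random.Random, count: int,
                 min_size: int = 40, max_size: int = 300) -> list[bytes]:
    """Generate junk datagrams that the peer would drop after failed validation."""
    return [random_bytes(rng, rng.randint(min_size, max_size)) for _ in range(count)]


rng = random.Random()
handshake = bytes(HANDSHAKE_INIT_SIZE)     # placeholder for a real handshake message
padded = pad_packet(handshake, rng)
print(len(padded))                          # somewhere between 148 and 248
print([len(j) for j in junk_packets(rng, 3)])
```

With sizes drawn from a range, a rule such as "first packet is exactly 148 bytes" stops matching reliably, which pushes the defender toward more expensive statistical analysis.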

Vulnerability to government MITM:

AmneziaWG and WireGuard are not vulnerable to CA-based MITM because they:

Work over UDP (no TLS involved)
Implement their own crypto (not certificate-based)
Make government CA certificates irrelevant

This is an advantage over TLS-based approaches.

Weakness 2: TLS Client Hello Must Be Plaintext

TLS traffic cannot be decoded by DPI after the handshake, but the Client Hello is plaintext. Can the attacker still use the strong side of TLS (encrypted traffic) while making the unencrypted Hello useless for blocking?

The idea (related to nested encryption from the OSI section): Deploy a special proxy server with conditional behavior:

If authorized client → the handshake appears to be for a "white" site, then the encrypted channel is used as a tunnel
If unauthorized client (DPI probe) → just proxy to the real "white" site

This way, the Client Hello names an innocent site (and passes checks), but the encrypted channel carries the real traffic.

This idea is implemented in the VLESS+Reality protocol.

How VLESS+Reality Works

VLESS+Reality works roughly like this (simplified understanding, not exact implementation):

Architecture:

Reality is a proxy server that receives TLS connections and decides: "is this my real client or someone else (maybe a DPI probe)?"

The decision is based on a secret marker hidden in the Client Hello.

For authorized client (knows the secret):

1. Client initiates TLS to github.com
   Client Hello contains:
   - SNI: github.com ← looks innocent
   - Secret marker in TLS session_id field

2. Reality server receives Client Hello
   - Extracts session_id
   - Checks secret marker
   - If valid → this is authorized client

3. Reality proxies TLS handshake to REAL github.com
   - Connects to actual GitHub servers
   - Gets GitHub's real certificate
   - Sends it to client
   - Client validates (it's real GitHub cert!)

4. TLS handshake completes → encrypted channel established

5. Reality switches connection into VLESS tunnel mode
   - Traffic inside TLS is VLESS protocol
   - VLESS can proxy connections anywhere
   - All hidden inside encrypted channel

For unauthorized client (DPI probe or random person):

1. Client connects without correct secret marker

2. Reality server detects invalid/missing marker
   - Acts as transparent proxy to github.com
   - Passes everything to real GitHub

3. Client talks to actual GitHub
   - Gets real GitHub responses
   - Cannot tell Reality server is involved
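The branching between the two flows above can be sketched in Python. This is an illustration of the decision only, under a loud assumption: the marker here is a plain HMAC over a random nonce using a pre-shared secret, which is not taken from the Reality implementation. The function names and route labels are hypothetical as well.

```python
import hashlib
import hmac
import os

SESSION_ID_LEN = 32  # the TLS legacy session_id field holds up to 32 bytes


def make_session_id(secret: bytes) -> bytes:
    """Client side: 16 random bytes followed by a 16-byte truncated HMAC tag."""
    nonce = os.urandom(16)
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()[:16]
    return nonce + tag


def route(session_id: bytes, secret: bytes) -> str:
    """Server side: decide what to do with an incoming Client Hello."""
    if len(session_id) != SESSION_ID_LEN:
        return "forward_to_cover_site"        # missing marker: act as a plain proxy
    nonce, tag = session_id[:16], session_id[16:]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()[:16]
    if hmac.compare_digest(tag, expected):    # constant-time comparison
        return "vless_tunnel"                 # authorized client
    return "forward_to_cover_site"            # probe or random visitor


secret = os.urandom(32)
print(route(make_session_id(secret), secret))  # vless_tunnel
print(route(os.urandom(32), secret))           # forward_to_cover_site
```

To an observer without the secret, a valid marker looks like any other random session_id, so the field itself gives DPI nothing to match on. A real design would also need protection against replayed markers, which this sketch leaves out.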

Why this design is beautiful:

What TSPU sees in both cases:

Authorized client:

SNI: github.com ✓
Destination IP: the Reality server's IP (not GitHub's)
TLS handshake: normal ✓
Certificate: real GitHub certificate ✓
After handshake: encrypted traffic ✓

Unauthorized client (DPI probe):

SNI: github.com ✓
Destination IP: the Reality server's IP (not GitHub's)
TLS handshake: normal ✓
Certificate: real GitHub certificate ✓
After handshake: connection to real GitHub ✓

From outside, both look identical!

How TSPU can still detect Reality (and how the defender adapts):

Despite its sophistication, VLESS+Reality has weaknesses:

Statistical patterns: Sessions with GitHub or another innocent site are usually short and bursty. VPN tunnels are long-lived with continuous bidirectional flow. ML can flag the difference as an anomaly.

IP reputation: If many clients connect to one IP with "github.com" SNI but the IP is in a VPS range → suspicious → triggers probing.

Government MITM: If the user has the Ministry root CA installed, the government can substitute the certificate, decrypt TLS, and see VLESS inside. It works as follows:

Normal:
Client → Reality → GitHub → Real cert → Client ✓

With MITM:
Client → [DPI intercepts] → Reality
         ↓
    DPI generates substitute certificate
    Signed by Ministry CA

If the client has the Ministry root CA installed, then:
→ Accepts substitute certificate
→ TLS connection is to DPI, not Reality
→ DPI decrypts all traffic
→ Sees VLESS protocol inside
→ Connection blocked or compromised

Weakness 3: DPI Operates at Line Speed

From Part 1 we learned about TSPU's trade-off between accuracy and speed.

Defender constraint: Must process packets in microseconds at 100 Gbps.
Attacker exploit: Make analysis expensive enough that DPI cannot keep up.

Example tools: GoodbyeDPI, Zapret.

The idea: make analysis expensive enough to slow down DPI filtering. There are two basic techniques. The first is to split segments into smaller parts, which forces TSPU to reassemble the session. The second is to add short-lived segments with a suitably low TTL, which take time to analyze but never reach the destination. Other techniques build on the same principle.

TCP Segmentation

Concept: Split the TLS Client Hello across multiple TCP segments so that the complete data is not visible in a single packet.

Normal (one packet):

TCP packet (seq=1000):
┌────────────────────────────────────┐
│ TCP Header | TLS Client Hello      │
│            | SNI: blocked-site.com │
└────────────────────────────────────┘

DPI: sees complete data → fast check → BLOCK

With segmentation (two packets):

Packet 1 (seq=1000):
┌────────────────────────────────────┐
│ TCP Header | TLS partial...        │
│            | SNI: "blocked-si      │ ← incomplete!
└────────────────────────────────────┘

Packet 2 (seq=1050):
┌────────────────────────────────────┐
│ TCP Header | te.com"...            │
└────────────────────────────────────┘

What defender (DPI) must do:

Buffer packet 1 (requires memory)
Wait for packet 2 (requires state tracking)
Reassemble TCP stream (requires CPU)
Extract complete data
Check against blocklist
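The difference between a per-packet check and a reassembling check can be shown with a toy Python model. The function names are invented for this sketch, the "Client Hello" is just a byte string containing the hostname, and real DPI reassembly also has to handle reordering, retransmissions, and timeouts.

```python
def split_segments(payload: bytes, start_seq: int, split_at: int):
    """Split one payload into two TCP-like (seq, data) segments."""
    return [
        (start_seq, payload[:split_at]),
        (start_seq + split_at, payload[split_at:]),
    ]


def per_packet_match(segments, blocked: bytes) -> bool:
    """Fast path: look for the blocked name inside each segment separately."""
    return any(blocked in data for _, data in segments)


def reassembling_match(segments, blocked: bytes) -> bool:
    """Slow path: buffer all segments, order by seq, then search the stream."""
    stream = b"".join(data for _, data in sorted(segments))
    return blocked in stream


hello = b"\x16\x03\x01" + b"x" * 40 + b"blocked-site.com" + b"y" * 20
segments = split_segments(hello, start_seq=1000, split_at=50)

print(per_packet_match(segments, b"blocked-site.com"))    # False
print(reassembling_match(segments, b"blocked-site.com"))  # True
```

The slow path works, but it needs a buffer and state for every open flow. That per-flow cost is the resource segmentation tries to exhaust.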

TTL Manipulation

Concept: Send two copies with different TTL (Time To Live) values, or send junk with a short TTL.

Packet 1 (decoy):
TTL = 5 → passes the DPI node, expires before the destination
SNI: "google.com"

Packet 2 (real):
TTL = 64 → reaches destination
SNI: "blocked-site.com"

Logic:
- DPI analyzes packet 1, sees "google.com" → allow
- Packet 1 expires before destination
- Packet 2 arrives, DPI cache says "allowed" → passes
- Destination receives only packet 2
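The logic above can be replayed in a toy Python simulation. It assumes a deliberately naive DPI that caches its verdict from the first Client Hello it sees on a flow, and it simplifies TTL to "a packet reaches hop h if its TTL is at least h". The `HelloPacket` type, the `simulate` function, and the hop numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class HelloPacket:
    ttl: int
    sni: str


def simulate(packets, dpi_hop: int, dest_hop: int, blocklist: set[str]):
    """Return (cached DPI verdict, list of SNIs that reached the destination)."""
    verdict = None
    delivered = []
    for p in packets:
        if p.ttl < dpi_hop:
            continue                      # expired before reaching the DPI node
        if verdict is None:               # naive DPI: decide once per flow
            verdict = "block" if p.sni in blocklist else "allow"
        if verdict == "block":
            continue                      # dropped at the DPI node
        if p.ttl >= dest_hop:
            delivered.append(p.sni)       # survived all hops to the destination
    return verdict, delivered


blocklist = {"blocked-site.com"}
decoy = HelloPacket(ttl=5, sni="google.com")
real = HelloPacket(ttl=64, sni="blocked-site.com")

print(simulate([decoy, real], dpi_hop=3, dest_hop=12, blocklist=blocklist))
# ('allow', ['blocked-site.com'])
print(simulate([real], dpi_hop=3, dest_hop=12, blocklist=blocklist))
# ('block', [])
```

The trick depends entirely on the cached verdict. A DPI that inspects every packet, or that discards packets with implausibly low TTL, would not be fooled, at the price of more work per packet.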

There are other tricks based on the same idea: manipulating data within the bounds of what the protocol stack allows.

Current effectiveness:

Highly dependent on:

ISP's TSPU configuration
Time of day (works better during peak hours, when TSPU is under load)
Specific blocklist entries (some enforced stricter)

Also, circumvention software must be updated often to keep up with changes in TSPU.

What we've learned:

Circumvention is not about an "unbreakable" protocol. It's about:

Understanding the system's architectural constraints
Finding a weakness in a constraint
Exploiting it
Adapting when the system adapts

What's Next

In Part 3, we'll examine real-world case studies:

YouTube throttling — how selective slowdown works (GGC manipulation)
Roblox complexity — why modern cloud architecture is hard to block (WebRTC, dynamic IPs, collateral damage)
FaceTime blocking — surgical, protocol-specific filtering with minimal collateral damage

We'll also discuss the future: whitelisting scenarios, ECH status in Russia, QUIC challenges, and other constraints.

Next in series: Part 3 - "Case Studies: YouTube, Roblox, FaceTime, and what comes next"
