My journey from struggling with payment webhooks to building FerroTunnel: a deep dive into why reverse tunnels matter at every stage of your career
How It Started (2019)
Back in 2019, I was working as a backend developer integrating a payment gateway into a FinTech product. The documentation was clear:
"Configure your webhook / callback endpoint. We'll POST payment events to this URL."
The code itself was not complicated. I had the payment flow implemented locally and was confident about the logic. The only thing left was testing redirects and webhooks end-to-end.
The problem was obvious.
The gateway needed a publicly reachable URL, while my server lived on:
http://localhost:3000
I remember asking around, trying to understand the usual workflow.
A colleague suggested, “Just use ngrok.”
One command later, I had a public URL pointing to my local server. Payment redirects worked. Webhooks started arriving instantly. I could see requests live and debug everything locally.
ngrok http 3000
# → https://abc123.ngrok-free.app
At that moment, it felt almost magical.
I did not think about tunnels, networking, or protocols. I did not ask how it worked internally. It solved my problem immediately, and that was enough.
Fast-forward to 2024: I released FerroTunnel v1.0.0, a reverse tunnel written in Rust. This article is about everything that changed between those two moments.
Developer Days - When Tunnels Saved Hours
The Payment Gateway Nightmare
Building a fintech product meant integrating multiple payment providers: Stripe, Razorpay, PayPal. Each had the same pattern:
1. User makes payment
2. Provider processes it
3. Provider POSTs to your webhook
4. You update order status
Simple, except for one problem: testing.
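For concreteness, the handler in steps 3 and 4 is just an HTTP endpoint that accepts a JSON POST. Here is a minimal sketch in Rust using axum — purely illustrative, since the original service wasn't Rust, and the route and port are placeholders:

```rust
use axum::{routing::post, Json, Router};
use serde_json::Value;

// Minimal webhook receiver: the provider POSTs a JSON event here.
async fn stripe_webhook(Json(event): Json<Value>) -> &'static str {
    // A real handler must verify the provider's signature header
    // before trusting the body, then update the order status.
    println!("received event: {}", event["type"]);
    "ok" // a 200 response tells the provider not to retry
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/webhooks/stripe", post(stripe_webhook));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```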
The old workflow (without tunnels):
1. Write webhook handler locally
2. Git commit & push
3. Wait for CI/CD (3-5 minutes)
4. Deploy to staging
5. SSH into staging server
6. Tail logs
7. Trigger test payment from Stripe dashboard
8. grep through logs to find the webhook
9. Find bug (typo in field name)
10. Go back to step 1
Time per iteration: 15-20 minutes
I spent an entire afternoon debugging why webhook signatures weren't validating. The issue? Stripe sends created_at as a Unix timestamp; I was parsing it as an ISO string. Four hours to fix a one-line bug.
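The original service wasn't written in Rust, but here is the shape of that bug sketched with the chrono crate (the timestamp value is arbitrary):

```rust
use chrono::{DateTime, TimeZone, Utc};

fn main() {
    // Stripe-style field: seconds since the Unix epoch, not an ISO-8601 string.
    let created = 1_700_000_000i64;

    // The buggy assumption: parsing it as an ISO/RFC 3339 string fails.
    assert!(DateTime::parse_from_rfc3339("1700000000").is_err());

    // The one-line fix: interpret it as a Unix timestamp.
    let ts: DateTime<Utc> = Utc.timestamp_opt(created, 0).unwrap();
    println!("{ts}"); // 2023-11-14 22:13:20 UTC
}
```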
With ngrok:
ngrok http 3000
# Configure Stripe webhook: https://abc123.ngrok-free.app/webhooks/stripe
# Make test payment
# See webhook in terminal instantly
# Set breakpoint in IDE
# Fix bug
# Test again (30 seconds)
Time per iteration: 30 seconds
That 4-hour bug? Fixed in 12 minutes with a tunnel.
Beyond Payments: The Developer's Swiss Army Knife
Once I discovered tunnels, I found more uses:
OAuth flows:
GitHub OAuth → Redirect to localhost → Now works!
Mobile app development:
Phone (React Native) → Tunnel → Laptop (API)
                                     ↓
                               Live debugging
                               Console.log visible
Client demos:
Client: "Can I see the new feature?"
Me: *deploys to staging, 10 minutes*
With tunnel:
Me: *shares tunnel URL, 10 seconds*
Slack app development:
Slack → Webhook → Tunnel → Localhost
                               ↓
                       Breakpoints work!
The Developer Mindset
As a developer focused on shipping features, I valued three things about tunnels:
- Speed: One command, working immediately
- Reliability: Stayed connected while I coded
- Inspection: ngrok's web UI showed every request/response
I didn't care how they worked. They were a tool, like npm or git. Use it, move on.
The Shift - When Ownership Changes Perspective
The New World
As my role evolved and I took responsibility beyond writing code, my relationship with tunnels changed. Different title, different problems.
Instead of asking whether something worked, I started thinking through scenarios at the infrastructure level:
- How does this behave when it runs continuously?
- What happens when connections drop?
- How predictable is memory usage over time?
- How observable is this under load?
And new constraints appeared:
- Partners requiring API access
- Strict security policies (no public internet exposure)
I started seeing tunnels not as a developer shortcut, but as part of the systems I was responsible for keeping healthy.
That is when I realized how little I actually understood about something I relied on so heavily.
Suddenly, "just use ngrok" wasn't enough.
Problem 1: Questions I'd Never Asked
- How many tunnels can one server handle?
- What's the memory footprint per tunnel?
- What happens when a tunnel crashes?
- How do we monitor 50+ active tunnels?
- What's the latency overhead?
As a developer, my tunnel ran for 2 hours while I coded. In infra, these tunnels would run 24/7 for months.
Problem 2: Multi-Environment Chaos
Development environment:
20 developers
Each testing 3-5 services locally
Peak usage: ~80 concurrent tunnels
Staging environment:
10 QA engineers
Testing integrated flows across 10 microservices
Peak usage: ~40 concurrent tunnels
Production:
50 microservices exposed to partners
Must be stable, monitored, and fast
= 50 critical tunnels
Total infrastructure need: 170+ concurrent tunnels
Questions that now mattered:
| Developer Me | Infra / DevOps Me |
|---|---|
| Is my tunnel connected? | What's the aggregate memory usage? |
| Can I see incoming requests? | What's the P99 latency across all tunnels? |
| Did my webhook arrive? | Which tunnels had errors in the last hour? |
| - | What's our bandwidth cost projection? |
| - | Which developer owns which tunnel? |
| - | How do we handle tunnel server failure? |
Problem 3: The ngrok Bill
Month 1 (5 developers):
5 × $8/month = $40/month
✓ Acceptable
Month 6 (team growing):
Developers: 20 × $8 = $160/month
Staging: 10 × $8 = $80/month
Production: 5 × $20 = $100/month
──────────
$340/month
Engineering manager: "Why are we spending $340/month on tunnels?"
Evaluating Self-Hosted Options
I had already used Go-based tunneling tools extensively, including frp. They worked well and fit naturally into my ecosystem.
# Deployed frp server on AWS
# Started load testing
50 tunnels: 180 MB RAM ✓ Good
100 tunnels: 420 MB RAM ⚠️ Growing
150 tunnels: 680 MB RAM ❌ Concerning
The motivation to go deeper did not come from dissatisfaction with Go or existing tools. It came from recognizing that tunnels in this context were long-running and connection-heavy systems.
Memory behavior, tail latency, reconnection handling, and failure modes suddenly mattered much more than before.
That curiosity eventually pushed me to explore how tunnels actually work.
Down the Rabbit Hole - Why This Led to Building FerroTunnel
I decided to prototype a tunnel myself to answer questions I could not confidently answer before.
That exploration led me to compare implementations, observe long-running behavior, and experiment with different approaches, including Rust for this specific problem space.
How Do the Experts Do This?
I started researching how Cloudflare and ngrok actually work.
Cloudflare Tunnel (cloudflared):
Architecture:
Client (cloudflared) ← ──QUIC── → Cloudflare Edge (200+ locations)
                                          ↓
                                      End Users
Technology:
- Written in Go
- Uses QUIC protocol (HTTP/3)
- Protocol Buffers for serialization
- Handles millions of tunnels globally
Why Go?
- Fast development iteration
- Excellent networking libraries (net/http, gRPC)
- Goroutines make concurrency straightforward
- Google-scale battle-testing
ngrok:
Architecture:
Client ← ──Custom Protocol/TLS── → ngrok Edge
  ↓                                    ↓
Local Service                     Inspection UI
                                  Analytics
Technology:
- Custom binary protocol (optimized for their use case)
- Beautiful web UI for request inspection
- Smart reconnection with exponential backoff
- TLS 1.3 everywhere
What I learned:
"These aren't simple scripts. Cloudflare processes billions of requests through tunnels. ngrok has handled hundreds of millions of developer sessions. This is serious distributed systems engineering."
The Hard Parts (That I Didn't Appreciate as a Developer)
1. Multiplexing
The core problem:
100 HTTP requests → 1 TCP connection → Route to correct handler
Challenges I never considered:
- Stream isolation: Don't mix response from stream 5 into stream 3
- Flow control: Slow consumer shouldn't block fast ones
- Backpressure: What if server can't keep up?
- Head-of-line blocking: One slow request blocks others (HTTP/1.1 problem)
- Resource cleanup: Stream 42 disconnects mid-transfer—clean up how?
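To make stream isolation concrete, here is a minimal framing sketch in Rust. This is not FerroTunnel's actual wire format, just the simplest possible scheme: every frame carries a stream ID and a length, and the demultiplexer routes payloads by ID.

```rust
/// A minimal frame for multiplexing many logical streams
/// over one TCP connection: [stream_id][len][payload].
struct Frame {
    stream_id: u32,
    payload: Vec<u8>,
}

fn encode(frame: &Frame) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + frame.payload.len());
    buf.extend_from_slice(&frame.stream_id.to_be_bytes());
    buf.extend_from_slice(&(frame.payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(&frame.payload);
    buf
}

/// Returns the decoded frame plus the bytes consumed,
/// or None if the buffer doesn't yet hold a complete frame.
fn decode(buf: &[u8]) -> Option<(Frame, usize)> {
    if buf.len() < 8 {
        return None; // need the full 8-byte header first
    }
    let stream_id = u32::from_be_bytes(buf[0..4].try_into().unwrap());
    let len = u32::from_be_bytes(buf[4..8].try_into().unwrap()) as usize;
    if buf.len() < 8 + len {
        return None; // partial frame: wait for more bytes
    }
    let frame = Frame { stream_id, payload: buf[8..8 + len].to_vec() };
    Some((frame, 8 + len))
}

fn main() {
    let frame = Frame { stream_id: 5, payload: b"hello".to_vec() };
    let wire = encode(&frame);
    let (decoded, consumed) = decode(&wire).expect("complete frame");
    assert_eq!(consumed, wire.len());
    println!("routed {} bytes to stream {}", decoded.payload.len(), decoded.stream_id);
}
```

Everything hard about multiplexing (flow control, backpressure, cleanup) lives on top of a scheme like this; the framing itself is the easy part.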
2. Long-Lived Connections
Developer tunnel:
- Runs for 2 hours while coding
- Restart when it breaks
- Memory leaks? Restart fixes it
Platform tunnel:
- Runs for weeks or months
- Can't "just restart" (production traffic)
- Memory leak of 1MB/hour ≈ 700MB/month
- TCP keepalive tuning actually matters
- Connection state recovery is critical
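As one example of tuning that suddenly matters, here is how TCP keepalive can be configured in Rust with the socket2 crate. The values are placeholders; the right ones depend on your NATs and load balancers.

```rust
use socket2::{Domain, Socket, TcpKeepalive, Type};
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, None)?;

    // Placeholder values: probe after 30s idle, then every 10s.
    // Too long, and NATs silently drop the "idle" tunnel connection;
    // too short, and you waste bandwidth on probes.
    let keepalive = TcpKeepalive::new()
        .with_time(Duration::from_secs(30))
        .with_interval(Duration::from_secs(10));
    socket.set_tcp_keepalive(&keepalive)?;
    Ok(())
}
```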
3. Performance Requirements
Developer perspective:
"As long as it's not noticeably slow, I'm happy."
Platform perspective:
User request latency breakdown:
CDN/Edge: 10ms
Tunnel: ???ms ← This must be minimal
Service: 50ms
Database: 20ms
─────
Total: 80ms + tunnel
If tunnel adds 10ms → 12.5% increase in total latency
If tunnel adds 2ms → 2.5% increase ✓
Every millisecond matters.
4. Observability
Developer needs:
- Is it connected? ✓
- Can I see requests? ✓
Platform needs:
- Active tunnels count
- Memory per tunnel
- Bandwidth per tunnel per hour
- Error rates and types
- P50/P99/P99.9 latency
- Connection lifecycle events
- Which team owns which tunnel
- Cost attribution
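As a sketch of what the platform side looks like in code, here is a minimal Rust example with the prometheus crate. The metric names are hypothetical, not FerroTunnel's actual metrics.

```rust
use prometheus::{register_histogram, register_int_gauge, Encoder, TextEncoder};

fn main() {
    // Hypothetical metric names for illustration only.
    let active = register_int_gauge!("tunnels_active", "Active tunnel count").unwrap();
    let latency = register_histogram!(
        "tunnel_forward_seconds",
        "Per-request forwarding latency",
        vec![0.0005, 0.001, 0.002, 0.005, 0.01, 0.05] // buckets sized for ms-scale P99s
    )
    .unwrap();

    // In a real server these are updated on connect/disconnect and per request.
    active.inc();
    latency.observe(0.0012);

    // Expose in Prometheus text format (normally behind a /metrics endpoint).
    let mut buf = Vec::new();
    TextEncoder::new().encode(&prometheus::gather(), &mut buf).unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}
```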
Over time, that experiment became FerroTunnel.
Not as a replacement for existing tools, but as a way to deeply understand how tunnels behave when they are treated as infrastructure rather than a convenience.
The Rust Experiment
Why Consider Rust?
I'm a Go developer, and I'd been comfortable and productive with Go for several years. Why change?
The rumors I kept hearing:
- "Rust has no garbage collector—no GC pauses!"
- "Memory safe without runtime overhead"
- "If it compiles, it usually works"
- "Zero-cost abstractions"
My skepticism:
"Go is already fast. GC pauses are sub-millisecond. How much difference could it really make?"
The trigger:
A colleague showed me memory usage graphs:
Go service (24 hours):
Memory: Sawtooth pattern (GC cycles)
Range: 180MB → 420MB → 190MB → 380MB
Rust service (24 hours):
Memory: Flat line
Constant: 85MB
Me: "Wait, is this real?"
Him: "Deterministic memory. No GC. Worth learning for long-running services."
The Prototype Race
I decided to test it empirically: Build the same tunnel in both Go and Rust.
Go prototype (Weekend 1):
// Day 1: Working prototype
import "sync"

type Tunnel struct {
    streams map[uint32]*Stream // one entry per multiplexed stream
    mu      sync.RWMutex       // guards the streams map
}

// Day 2: Add HTTP ingress, basic multiplexing
// Total: ~800 lines of code
// Time: 2 days
Test results (100 concurrent tunnels, 24 hours):
Memory start: 140 MB
Memory 24hrs: 410 MB
P50 latency: 1.1ms
P99 latency: 3.8ms
P99.9 latency: 12.3ms ← GC spikes visible
Rust prototype (Weekend 2 + 3):
// Day 1-2: Fight borrow checker
// error[E0502]: cannot borrow `streams` as mutable
// error[E0597]: `data` does not live long enough
// ... 47 more errors
// Day 3: Read "The Rust Book" cover to cover
// Day 4-5: Rewrite with proper ownership model
pub struct Multiplexer {
streams: Arc<RwLock<HashMap<u32, Stream>>>,
}
// Day 6: Finally compiles!
// Day 7: It works!
// Total: ~950 lines of code
// Time: 5 days (2.5× slower than Go)
Test results (100 concurrent tunnels, 24 hours):
Memory start: 92 MB
Memory 24hrs: 92 MB ← Flat line!
P50 latency: 0.8ms
P99 latency: 2.1ms
P99.9 latency: 4.2ms ← Consistent, no spikes
My reaction:
"Holy shit. This is actually real. The Rust hype isn't just hype."
The Numbers That Convinced Me
After 7 days of continuous operation:
| Metric | Go | Rust | Difference |
|---|---|---|---|
| Memory (start) | 140 MB | 92 MB | -34% |
| Memory (7 days) | 520 MB | 92 MB | -82% |
| P99 latency | 3.8ms | 2.1ms | -45% |
| P99.9 latency | 12.3ms | 4.2ms | -66% |
100 tunnels × memory savings:
Go: 520 MB
Rust: 92 MB
────
Saved: 428 MB per 100 tunnels
At 500 tunnels:
Go: 2,600 MB (2.5 GB)
Rust: 460 MB
────
Saved: 2,140 MB (2.1 GB) = 82% reduction
Infrastructure cost impact:
Go: Need 4GB RAM servers
Rust: Can use 1GB RAM servers
AWS t3.small (2GB): $16.79/month
AWS t3.micro (1GB): $8.40/month
Savings: ~$100/year per server
At 10 servers: $1,000/year saved
Latency impact on user experience:
User request path:
CDN: 10ms
Tunnel (Go): 3.8ms (P99)
Service: 50ms
─────
Total: 63.8ms
With Rust:
CDN: 10ms
Tunnel (Rust): 2.1ms (P99)
Service: 50ms
─────
Total: 62.1ms
Improvement: 2.7% faster end-to-end
2.7% doesn't sound huge, but:
- Across millions of requests
- Compounding with other optimizations
- Better user experience
The Commitment
I decided to go all-in on Rust.
Why:
- Numbers don't lie: Memory and latency improvements were real
- Platform fit: Long-running infrastructure is exactly where Rust shines
- Learning value: Even if I abandon this project, I'll understand systems programming deeply
- Future-proof: Memory efficiency scales linearly with tunnel count
What I committed to:
- Learning Rust properly (no shortcuts, no fighting the borrow checker)
- Building production-quality code (not a toy project)
- Open sourcing everything (give back to community)
- Documenting learnings (this article!)
What Changed Between Developer and Platform Engineer
Developer perspective:
- Tunnels are a tool (like git or npm)
- Speed and reliability matter
- Don't care about internals
- "It works" is good enough
Platform engineer perspective:
- Tunnels are infrastructure
- Every millisecond and megabyte matters
- Must understand failure modes
- "It works 99% of the time" isn't good enough
What Building This Taught Me
1. Problems look different at scale
1 tunnel: "It's working!"
10 tunnels: "How do we monitor these?"
100 tunnels: "What's our memory budget?"
1000 tunnels: "Every optimization matters"
2. Performance is measurable, not theoretical
Everyone says "Rust is fast." But:
- How much faster?
- Does it matter for my use case?
- What are the trade-offs?
Building FerroTunnel gave me real numbers to answer these questions.
3. The best way to learn is to build
I read about multiplexing, protocols, and async I/O for years. I understood them conceptually.
Building a tunnel forced me to understand them deeply:
- How does backpressure propagate?
- What happens when a stream closes mid-transfer?
- How do you prevent resource leaks?
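For the backpressure question, the simplest answer I found was bounded channels. A minimal Tokio sketch (my illustration, not FerroTunnel code): the sender suspends when the buffer fills, so a slow consumer throttles the producer instead of exhausting memory.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // A bounded channel is the simplest backpressure primitive.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(32);

    let producer = tokio::spawn(async move {
        for i in 0u32..1000 {
            // Suspends here whenever the 32-slot buffer is full.
            if tx.send(i.to_be_bytes().to_vec()).await.is_err() {
                break; // receiver dropped: the stream closed mid-transfer
            }
        }
        // Dropping `tx` closes the channel and lets the consumer finish.
    });

    while let Some(chunk) = rx.recv().await {
        // Simulate a slow downstream consumer.
        tokio::time::sleep(std::time::Duration::from_micros(50)).await;
        let _ = chunk;
    }

    producer.await.unwrap();
}
```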
What Surprised Me
1. Rust wasn't as hard as I feared
Yes, the borrow checker fights you. But:
- The error messages are helpful
- The community is incredibly supportive
- Once you "get" ownership, it clicks
2. The ecosystem is mature
I expected to write everything from scratch. Instead:
- Tokio (async runtime): Production-grade
- Serde (serialization): Just works
- Bytes (zero-copy buffers): Perfectly designed
3. Memory efficiency compounds
92 MB vs 520 MB seems small. But:
- Over 1000 tunnels: 2.1 GB saved
- Over 24/7 operation: Stability matters
- Lower memory = cheaper hosting
Building FerroTunnel
The Journey So Far
Timeline:
- Learning Rust fundamentals
- Core multiplexer implementation
- HTTP/TCP ingress, plugin system
- Observability, dashboard, polish
- Testing, benchmarking, docs
- v1.0.0 release to crates.io
Current state:
✅ Published to crates.io
✅ Docker images on GitHub Container Registry
✅ Powers my personal projects
✅ Full documentation and examples
What FerroTunnel includes:
Core features:
- HTTP and TCP tunneling
- TLS 1.3 with mutual TLS support
- Stream multiplexing with backpressure
- Auto-reconnect with exponential backoff (sketched below)
Observability:
- Real-time dashboard (WebSocket + Server-Sent Events)
- Prometheus metrics
- Structured JSON logging
- Connection lifecycle events
Extensibility:
- Plugin system for custom auth, rate limiting
- Built-in plugins: token auth, IP allowlist, logger
- Circuit breaker pattern
Developer experience:
- Both library API and CLI
- Docker Compose ready
- Clear error messages
- Extensive examples
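To illustrate the auto-reconnect item above, here is a minimal exponential-backoff loop in Rust. The connect/serve functions are hypothetical stand-ins, not FerroTunnel's actual API, and production code would add jitter so clients don't reconnect in lockstep.

```rust
use std::time::Duration;

// Hypothetical stand-ins for the real connect/serve calls.
async fn connect() -> Result<(), ()> { Err(()) }
async fn serve(_conn: ()) {}

async fn reconnect_loop() {
    let mut delay = Duration::from_millis(250);
    let max = Duration::from_secs(30);
    loop {
        match connect().await {
            Ok(conn) => {
                delay = Duration::from_millis(250); // reset after a healthy connection
                serve(conn).await; // returns when the tunnel drops
            }
            Err(_) => {
                tokio::time::sleep(delay).await;
                delay = (delay * 2).min(max); // 250ms → 500ms → 1s → ... capped at 30s
            }
        }
    }
}
```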
What's Next - Immediate roadmap:
- v1.0.x: HTTP/2 support (native multiplexing)
- v1.0.x: gRPC tunneling
- v1.0.x: QUIC transport (like Cloudflare)
- v1.0.x: Connection pooling
- v1.0.x: Multi-region support
Closing Thoughts
This journey started with a simple need: testing callbacks on localhost.
It led to a deeper understanding of reverse tunnels, long-lived connections, and the trade-offs involved in building and operating them.
If you use tunnels regularly, I strongly recommend taking the time to understand how they work internally. The perspective you gain changes how you design, evaluate, and own systems.
What I built:
Not just a tunnel. But:
- A deep understanding of multiplexing
- Production systems programming experience in Rust
- An open-source tool others can use
- Foundation for future micro-SaaS
What you can learn from this:
- Use the right tool for the job
  - ngrok for quick development
  - Cloudflare for enterprise scale
  - Self-hosted for control and learning
- Understand what you use daily
  - I used tunnels for 3 years without understanding them
  - Building one taught me more than any tutorial
- Performance claims need testing
  - "Rust is faster" → By how much?
  - Prototype both, measure, decide
Try FerroTunnel:
- GitHub: https://github.com/MitulShah1/ferrotunnel
- Quick start:
# Install
cargo install ferrotunnel-cli
# Start server
ferrotunnel server --token secret
# Start client (in another terminal; token from env or secure prompt if omitted)
ferrotunnel client --server localhost:7835 --local-addr 127.0.0.1:8080 --tunnel-id my-app
What’s Next
Want the technical deep-dive?
Part 2 will cover:
- Why Rust specifically (Go developer's perspective)
- Multiplexer architecture and design decisions
- Protocol choices and trade-offs
- Performance benchmarks and optimization
- Key learnings from building in Rust
Let's Discuss
Questions I'd love to hear:
- How do you use tunnels in your workflow?
- Have you hit scaling issues with ngrok/frp?
- Considering Rust for infrastructure? What's holding you back?
- Built something similar? What challenges did you face?
Drop a comment or open an issue on GitHub!