Mitul Shah

From “It Just Works” to Building a Reverse Tunnel

My journey from struggling with payment webhooks to building FerroTunnel: a deep dive into why reverse tunnels matter at every stage of your career.

How It Started (2019)

Back in 2019, I was working as a backend developer integrating a payment gateway into a FinTech product. The documentation was clear:

"Configure your webhook / callback endpoint. We'll POST payment events to this URL."

The code itself was not complicated. I had the payment flow implemented locally and was confident about the logic. The only thing left was testing redirects and webhooks end-to-end.

The problem was obvious.

The gateway needed a publicly reachable URL, while my server lived on:

http://localhost:3000

I remember asking around, trying to understand the usual workflow.

A colleague suggested, “Just use ngrok.”

One command later, I had a public URL pointing to my local server. Payment redirects worked. Webhooks started arriving instantly. I could see requests live and debug everything locally.

ngrok http 3000
# → https://abc123.ngrok-free.app

At that moment, it felt almost magical.

I did not think about tunnels, networking, or protocols. I did not ask how it worked internally. It solved my problem immediately, and that was enough.

Fast-forward to 2024: I released FerroTunnel v1.0.0, a reverse tunnel written in Rust. This article is about everything that changed between those two moments.

Developer Days - When Tunnels Saved Hours

The Payment Gateway Nightmare

Building a fintech product meant integrating multiple payment providers: Stripe, Razorpay, PayPal. Each had the same pattern:

1. User makes payment
2. Provider processes it
3. Provider POSTs to your webhook
4. You update order status
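
To make step 3 concrete, here is a minimal sketch of the receiving end, using only the Rust standard library (illustrative only; a real handler would parse the HTTP request properly and verify the provider's signature before trusting the payload):

use std::io::{Read, Write};
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // The local server the provider needs to reach, e.g. http://localhost:3000
    let listener = TcpListener::bind("127.0.0.1:3000")?;
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = [0u8; 8192];
        let n = stream.read(&mut buf)?;
        // Step 4 would happen here: verify the signature, update order status.
        println!("webhook payload:\n{}", String::from_utf8_lossy(&buf[..n]));
        // ACK quickly; most providers retry on anything other than a 2xx.
        stream.write_all(b"HTTP/1.1 200 OK\r\ncontent-length: 0\r\n\r\n")?;
    }
    Ok(())
}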

Simple, except for one problem: testing.

The old workflow (without tunnels):

1. Write webhook handler locally
2. Git commit & push
3. Wait for CI/CD (3-5 minutes)
4. Deploy to staging
5. SSH into staging server
6. Tail logs
7. Trigger test payment from Stripe dashboard
8. grep through logs to find the webhook
9. Find bug (typo in field name)
10. Go back to step 1

Time per iteration: 15-20 minutes

I spent an entire afternoon debugging why webhook signatures weren't validating. The issue? Stripe sends created_at as a Unix timestamp, but I was parsing it as an ISO string. Four hours to fix a one-line bug.
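
The shape of that bug, as a small sketch (assuming serde/serde_json with the derive feature; the field name and payload are illustrative):

use serde::Deserialize;

#[derive(Deserialize)]
struct WebhookEvent {
    // The bug: this field was declared as String (ISO-8601), but the
    // provider sends Unix seconds, so deserialization failed.
    created_at: i64,
}

fn main() {
    let payload = r#"{"created_at": 1700000000}"#;
    let event: WebhookEvent = serde_json::from_str(payload).unwrap();
    println!("created_at = {} (Unix seconds)", event.created_at);
}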

With ngrok:

ngrok http 3000

# Configure Stripe webhook: https://abc123.ngrok-free.app/webhooks/stripe
# Make test payment
# See webhook in terminal instantly
# Set breakpoint in IDE
# Fix bug
# Test again (30 seconds)

Time per iteration: 30 seconds

That 4-hour bug? Fixed in 12 minutes with a tunnel.

Beyond Payments: The Developer's Swiss Army Knife

Once I discovered tunnels, I found more uses:

OAuth flows:

GitHub OAuth → Redirect to localhost → Now works!

Mobile app development:

Phone (React Native) → Tunnel → Laptop (API)
                                    ↓
                              Live debugging
                              Console.log visible

Client demos:

Client: "Can I see the new feature?"
Me: *deploys to staging, 10 minutes*

With tunnel:
Me: *shares tunnel URL, 10 seconds*

Slack app development:

Slack → Webhook → Tunnel → Localhost
                              ↓
                        Breakpoints work!

The Developer Mindset

As a developer focused on shipping features, I valued three things about tunnels:

  1. Speed: One command, working immediately
  2. Reliability: Stayed connected while I coded
  3. Inspection: ngrok's web UI showed every request/response

I didn't care how they worked. They were a tool, like npm or git. Use it, move on.

The Shift - When Ownership Changes Perspective

The New World

As my role evolved and I took responsibility beyond writing code, my relationship with tunnels changed. Different title, different problems.

Instead of asking whether something worked, I started thinking about infrastructure-level concerns:

  • How does this behave when it runs continuously?
  • What happens when connections drop?
  • How predictable is memory usage over time?
  • How observable is this under load?
  • Partners requiring API access
  • Strict security policies (no public internet exposure)

I started seeing tunnels not as a developer shortcut, but as part of the systems I was responsible for keeping healthy.

That is when I realized how little I actually understood about something I relied on so heavily.

Suddenly, "just use ngrok" wasn't enough.

Problem 1: Tunnels as Long-Running Infrastructure

New questions I'd never asked:

  • How many tunnels can one server handle?
  • What's the memory footprint per tunnel?
  • What happens when a tunnel crashes?
  • How do we monitor 50+ active tunnels?
  • What's the latency overhead?

As a developer, my tunnel ran for 2 hours while I coded. In infrastructure, these tunnels would run 24/7 for months.

Problem 2: Multi-Environment Chaos

Development environment:

20 developers
Each testing 3-5 services locally
Peak usage: ~80 concurrent tunnels

Staging environment:

10 QA engineers
Testing integrated flows across 10 microservices
Peak usage: ~40 concurrent tunnels

Production:

50 microservices exposed to partners
Must be stable, monitored, and fast
= 50 critical tunnels

Total infrastructure need: 170+ concurrent tunnels

Questions that now mattered:

Developer me                     As infra / DevOps
Is my tunnel connected?          What's the aggregate memory usage?
Can I see incoming requests?     What's the P99 latency across all tunnels?
Did my webhook arrive?           Which tunnels had errors in the last hour?
-                                What's our bandwidth cost projection?
-                                Which developer owns which tunnel?
-                                How do we handle tunnel server failure?

Problem 3: The ngrok Bill

Month 1 (5 developers):

5 × $8/month = $40/month
✓ Acceptable

Month 6 (team growing):

Developers:  20 × $8  = $160/month
Staging:     10 × $8  = $80/month
Production:  5  × $20 = $100/month
                       ──────────
                       $340/month

Engineering manager: "Why are we spending $340/month on tunnels?"

Evaluating Self-Hosted Options

I had already used Go-based tunneling tools extensively, including frp. They worked well and fit naturally into my ecosystem.

# Deployed frp server on AWS
# Started load testing

50 tunnels:  180 MB RAM  ✓ Good
100 tunnels: 420 MB RAM  ⚠️ Growing
150 tunnels: 680 MB RAM  ❌ Concerning

The motivation to go deeper did not come from dissatisfaction with Go or existing tools. It came from recognizing that tunnels in this context were long-running and connection-heavy systems.

Memory behavior, tail latency, reconnection handling, and failure modes suddenly mattered much more than before.

That curiosity eventually pushed me to explore how tunnels actually work.

Down the Rabbit Hole - Why This Led to Building FerroTunnel

I decided to prototype a tunnel myself to answer questions I could not confidently answer before.

That exploration led me to compare implementations, observe long-running behavior, and experiment with different approaches, including Rust for this specific problem space.

How Do the Experts Do This?

I started researching how Cloudflare and ngrok actually work.

Cloudflare Tunnel (cloudflared):

Architecture:

Client (cloudflared) ← ──QUIC── → Cloudflare Edge (200+ locations)
                                        ↓
                                   End Users

Technology:

  • Written in Go
  • Uses QUIC protocol (HTTP/3)
  • Protocol Buffers for serialization
  • Handles millions of tunnels globally

Why Go?

  • Fast development iteration
  • Excellent networking libraries (net/http, gRPC)
  • Goroutines make concurrency straightforward
  • Google-scale battle-testing

ngrok:

Architecture:

Client ← ──Custom Protocol/TLS── → ngrok Edge
  ↓                                  ↓
Local                           Inspection UI
Service                         Analytics

Technology:

  • Custom binary protocol (optimized for their use case)
  • Beautiful web UI for request inspection
  • Smart reconnection with exponential backoff
  • TLS 1.3 everywhere
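
That "smart reconnection" point hides real design work. A minimal sketch of capped exponential backoff, assuming tokio (jitter omitted for brevity):

use std::future::Future;
use std::time::Duration;
use tokio::time::sleep;

// Retry `connect` until it succeeds, doubling the delay up to a ceiling.
async fn reconnect_with_backoff<F, Fut, T, E>(mut connect: F) -> T
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut delay = Duration::from_millis(250);
    loop {
        match connect().await {
            Ok(conn) => return conn,
            Err(_) => {
                sleep(delay).await;
                delay = (delay * 2).min(Duration::from_secs(30)); // cap at 30s
            }
        }
    }
}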

What I learned:

"These aren't simple scripts. Cloudflare processes billions of requests through tunnels. ngrok has handled hundreds of millions of developer sessions. This is serious distributed systems engineering."

The Hard Parts (That I Didn't Appreciate as a Developer)

1. Multiplexing

The core problem:

100 HTTP requests → 1 TCP connection → Route to correct handler

Challenges I never considered:

  • Stream isolation: Don't mix response from stream 5 into stream 3
  • Flow control: Slow consumer shouldn't block fast ones
  • Backpressure: What if server can't keep up?
  • Head-of-line blocking: One slow request blocks others (HTTP/1.1 problem)
  • Resource cleanup: Stream 42 disconnects mid-transfer—clean up how?
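
A minimal sketch of the routing core, assuming tokio channels (the general idea, not FerroTunnel's or anyone's actual wire format): each frame carries a stream id, and bounded per-stream channels give you isolation and backpressure almost for free.

use std::collections::HashMap;
use tokio::sync::mpsc;

type StreamId = u32;

// One bounded channel per logical stream, all sharing one TCP connection.
struct Demux {
    streams: HashMap<StreamId, mpsc::Sender<Vec<u8>>>,
}

impl Demux {
    async fn route(&mut self, stream_id: StreamId, payload: Vec<u8>) {
        if let Some(tx) = self.streams.get(&stream_id) {
            // Bounded send: a slow consumer stalls only its own stream
            // (backpressure) instead of growing an unbounded buffer.
            if tx.send(payload).await.is_err() {
                // Receiver gone: stream closed mid-transfer; clean up.
                self.streams.remove(&stream_id);
            }
        }
        // Frames for unknown or already-closed streams are silently dropped.
    }
}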

2. Long-Lived Connections

Developer tunnel:

  • Runs for 2 hours while coding
  • Restart when it breaks
  • Memory leaks? Restart fixes it

Platform tunnel:

  • Runs for weeks or months
  • Can't "just restart" (production traffic)
  • Memory leak of 1 MB/hour ≈ 720 MB/month
  • TCP keepalive tuning actually matters
  • Connection state recovery is critical
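
As one concrete example, here is what that keepalive tuning can look like with the socket2 crate (the intervals are illustrative, not recommendations):

use std::time::Duration;
use socket2::{Domain, Socket, TcpKeepalive, Type};

fn tunnel_socket() -> std::io::Result<Socket> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, None)?;
    let keepalive = TcpKeepalive::new()
        .with_time(Duration::from_secs(60))      // idle time before first probe
        .with_interval(Duration::from_secs(10)); // gap between probes
    socket.set_tcp_keepalive(&keepalive)?;       // detect dead peers on idle links
    Ok(socket)
}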

3. Performance Requirements

Developer perspective:

"As long as it's not noticeably slow, I'm happy."

Platform perspective:

User request latency breakdown:
  CDN/Edge:        10ms
  Tunnel:          ???ms  ← This must be minimal
  Service:         50ms
  Database:        20ms
                  ─────
  Total:           80ms + tunnel

If tunnel adds 10ms → 11% increase in total latency
If tunnel adds 2ms  → 2.5% increase ✓

Every millisecond matters.

4. Observability

Developer needs:

  • Is it connected? ✓
  • Can I see requests? ✓

Platform needs:

  • Active tunnels count
  • Memory per tunnel
  • Bandwidth per tunnel per hour
  • Error rates and types
  • P50/P99/P99.9 latency
  • Connection lifecycle events
  • Which team owns which tunnel
  • Cost attribution
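
A sketch of how the first few of these might start life as in-process counters (names are made up; in practice they would back Prometheus gauges, and the latency percentiles need histograms):

use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct TunnelMetrics {
    active_tunnels: AtomicU64,
    bytes_forwarded: AtomicU64,
    errors: AtomicU64,
}

impl TunnelMetrics {
    fn tunnel_opened(&self) {
        self.active_tunnels.fetch_add(1, Ordering::Relaxed);
    }
    fn tunnel_closed(&self) {
        self.active_tunnels.fetch_sub(1, Ordering::Relaxed);
    }
    fn forwarded(&self, bytes: u64) {
        self.bytes_forwarded.fetch_add(bytes, Ordering::Relaxed);
    }
}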

Over time, that experiment became FerroTunnel.

Not as a replacement for existing tools, but as a way to deeply understand how tunnels behave when they are treated as infrastructure rather than a convenience.

The Rust Experiment

Why Consider Rust?

I'm a Go developer, and I'd been comfortable and productive with Go for several years. Why change?

The rumors I kept hearing:

  1. "Rust has no garbage collector—no GC pauses!"
  2. "Memory safe without runtime overhead"
  3. "If it compiles, it usually works"
  4. "Zero-cost abstractions"

My skepticism:

"Go is already fast. GC pauses are sub-millisecond. How much difference could it really make?"

The trigger:

A colleague showed me memory usage graphs:

Go service (24 hours):
Memory: Sawtooth pattern (GC cycles)
Range: 180MB → 420MB → 190MB → 380MB

Rust service (24 hours):
Memory: Flat line
Constant: 85MB

Me: "Wait, is this real?"

Him: "Deterministic memory. No GC. Worth learning for long-running services."

The Prototype Race

I decided to test it empirically: Build the same tunnel in both Go and Rust.

Go prototype (Weekend 1):

// Day 1: Working prototype
type Tunnel struct {
    streams map[uint32]*Stream
    mu      sync.RWMutex
}

// Day 2: Add HTTP ingress, basic multiplexing
// Total: ~800 lines of code
// Time: 2 days

Test results (100 concurrent tunnels, 24 hours):

Memory start:  140 MB
Memory 24hrs:  410 MB
P50 latency:   1.1ms
P99 latency:   3.8ms
P99.9 latency: 12.3ms ← GC spikes visible

Rust prototype (Weekend 2 + 3):

// Day 1-2: Fight borrow checker
// error[E0502]: cannot borrow `streams` as mutable
// error[E0597]: `data` does not live long enough
// ... 47 more errors

// Day 3: Read "The Rust Book" cover to cover

// Day 4-5: Rewrite with proper ownership model
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock; // async-aware lock (`Stream` is the per-stream state type)

pub struct Multiplexer {
    streams: Arc<RwLock<HashMap<u32, Stream>>>,
}

// Day 6: Finally compiles!
// Day 7: It works!

// Total: ~950 lines of code
// Time: 5 days (2.5× slower than Go)

Test results (100 concurrent tunnels, 24 hours):

Memory start:  92 MB
Memory 24hrs:  92 MB  ← Flat line!
P50 latency:   0.8ms
P99 latency:   2.1ms
P99.9 latency: 4.2ms  ← Consistent, no spikes

My reaction:

"Holy shit. This is actually real. The Rust hype isn't just hype."

The Numbers That Convinced Me

After 7 days of continuous operation:

Metric            Go        Rust     Difference
Memory (start)    140 MB    92 MB    -34%
Memory (7 days)   520 MB    92 MB    -82%
P99 latency       3.8 ms    2.1 ms   -45%
P99.9 latency     12.3 ms   4.2 ms   -66%

100 tunnels × memory savings:

Go:   520 MB
Rust: 92 MB
      ────
Saved: 428 MB per 100 tunnels

At 500 tunnels:
Go:   2,600 MB (2.5 GB)
Rust: 460 MB
      ────
Saved: 2,140 MB (2.1 GB) = 82% reduction

Infrastructure cost impact:

Go:   Need 2GB RAM servers
Rust: Can use 1GB RAM servers

AWS t3.small (2GB): $16.79/month
AWS t3.micro (1GB): $8.40/month

Savings: ~$100/year per server
At 10 servers: $1,000/year saved

Latency impact on user experience:

User request path:
  CDN:            10ms
  Tunnel (Go):    3.8ms (P99)
  Service:        50ms
                  ─────
  Total:          63.8ms

With Rust:
  CDN:            10ms
  Tunnel (Rust):  2.1ms (P99)
  Service:        50ms
                  ─────
  Total:          62.1ms

Improvement: 2.7% faster end-to-end

2.7% doesn't sound huge, but:

  • Across millions of requests
  • Compounding with other optimizations
  • Better user experience

The Commitment

I decided to go all-in on Rust.

Why:

  1. Numbers don't lie: Memory and latency improvements were real
  2. Platform fit: Long-running infrastructure is exactly where Rust shines
  3. Learning value: Even if I abandon this project, I'll understand systems programming deeply
  4. Future-proof: Memory efficiency scales linearly with tunnel count

What I committed to:

  • Learning Rust properly (no shortcuts, no fighting the borrow checker)
  • Building production-quality code (not a toy project)
  • Open sourcing everything (give back to community)
  • Documenting learnings (this article!)

What Changed Between Developer and Platform Engineer

Developer perspective:

  • Tunnels are a tool (like git or npm)
  • Speed and reliability matter
  • Don't care about internals
  • "It works" is good enough

Platform engineer perspective:

  • Tunnels are infrastructure
  • Every millisecond and megabyte matters
  • Must understand failure modes
  • "It works 99% of the time" isn't good enough

What Building This Taught Me

1. Problems look different at scale

1 tunnel:    "It's working!"
10 tunnels:  "How do we monitor these?"
100 tunnels: "What's our memory budget?"
1000 tunnels: "Every optimization matters"

2. Performance is measurable, not theoretical

Everyone says "Rust is fast." But:

  • How much faster?
  • Does it matter for my use case?
  • What are the trade-offs?

Building FerroTunnel gave me real numbers to answer these questions.

3. The best way to learn is to build

I read about multiplexing, protocols, and async I/O for years. I understood them conceptually.

Building a tunnel forced me to understand them deeply:

  • How does backpressure propagate?
  • What happens when a stream closes mid-transfer?
  • How do you prevent resource leaks?

What Surprised Me

1. Rust wasn't as hard as I feared

Yes, the borrow checker fights you. But:

  • The error messages are helpful
  • The community is incredibly supportive
  • Once you "get" ownership, it clicks

2. The ecosystem is mature

I expected to write everything from scratch. Instead:

  • Tokio (async runtime): Production-grade
  • Serde (serialization): Just works
  • Bytes (zero-copy buffers): Perfectly designed
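
To show why Bytes matters for a tunnel: cloning or slicing a Bytes shares the underlying allocation, so handing the same payload to the logger and the destination copies nothing.

use bytes::Bytes;

fn main() {
    let frame = Bytes::from_static(b"HTTP/1.1 200 OK\r\n\r\nhello");
    let body = frame.slice(frame.len() - 5..); // zero-copy view of "hello"
    let for_logging = frame.clone();           // refcount bump, no memcpy
    assert_eq!(&body[..], b"hello");
    assert_eq!(for_logging.len(), frame.len());
}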

3. Memory efficiency compounds

92 MB vs 520 MB seems small. But:

  • Over 1000 tunnels: 2.1 GB saved
  • Over 24/7 operation: Stability matters
  • Lower memory = cheaper hosting

Building FerroTunnel

The Journey So Far

Timeline:

  • Learning Rust fundamentals
  • Core multiplexer implementation
  • HTTP/TCP ingress, plugin system
  • Observability, dashboard, polish
  • Testing, benchmarking, docs
  • v1.0.0 release to crates.io

Current state:

✅ Published to crates.io
✅ Docker images on GitHub Container Registry
✅ Powers my personal projects
✅ Full documentation and examples

What FerroTunnel includes:

Core features:
- HTTP and TCP tunneling
- TLS 1.3 with mutual TLS support
- Stream multiplexing with backpressure
- Auto-reconnect with exponential backoff

Observability:
- Real-time dashboard (WebSocket + Server-Sent Events)
- Prometheus metrics
- Structured JSON logging
- Connection lifecycle events

Extensibility:
- Plugin system for custom auth, rate limiting
- Built-in plugins: token auth, IP allowlist, logger
- Circuit breaker pattern

Developer experience:
- Both library API and CLI
- Docker Compose ready
- Clear error messages
- Extensive examples
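
The circuit breaker in that list is, at heart, a small state machine. A generic sketch of the pattern (not FerroTunnel's actual implementation):

use std::time::{Duration, Instant};

// After `threshold` consecutive failures, reject calls until `cooldown` passes.
struct CircuitBreaker {
    failures: u32,
    threshold: u32,
    opened_at: Option<Instant>,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn allow(&mut self) -> bool {
        match self.opened_at {
            Some(t) if t.elapsed() < self.cooldown => false, // open: fail fast
            Some(_) => {
                // Cooldown elapsed: half-open, let one attempt through.
                self.opened_at = None;
                self.failures = 0;
                true
            }
            None => true, // closed: normal operation
        }
    }

    fn record(&mut self, ok: bool) {
        if ok {
            self.failures = 0;
            return;
        }
        self.failures += 1;
        if self.failures >= self.threshold {
            self.opened_at = Some(Instant::now()); // trip the breaker
        }
    }
}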

What's Next - Immediate roadmap:

  • v1.0.x: HTTP/2 support (native multiplexing)
  • v1.0.x: gRPC tunneling
  • v1.0.x: QUIC transport (like Cloudflare)
  • v1.0.x: Connection pooling
  • v1.0.x: Multi-region support

Closing Thoughts

This journey started with a simple need: testing callbacks on localhost.

It led to a deeper understanding of reverse tunnels, long-lived connections, and the trade-offs involved in building and operating them.

If you use tunnels regularly, I strongly recommend taking the time to understand how they work internally. The perspective you gain changes how you design, evaluate, and own systems.

What I built:

Not just a tunnel. But:

  • A deep understanding of multiplexing
  • Production systems programming experience in Rust
  • An open-source tool others can use
  • Foundation for future micro-SaaS

What you can learn from this:

  1. Use the right tool for the job

    • ngrok for quick development
    • Cloudflare for enterprise scale
    • Self-hosted for control and learning
  2. Understand what you use daily

    • I used tunnels for 3 years without understanding them
    • Building one taught me more than any tutorial
  3. Performance claims need testing

    • "Rust is faster" → By how much?
    • Prototype both, measure, decide

Try FerroTunnel:

# Install
cargo install ferrotunnel-cli

# Start server
ferrotunnel server --token secret

# Start client (in another terminal; token from env or secure prompt if omitted)
ferrotunnel client --server localhost:7835 --local-addr 127.0.0.1:8080 --tunnel-id my-app

What’s Next

Want the technical deep-dive?

Part 2 will cover:

  • Why Rust specifically (Go developer's perspective)
  • Multiplexer architecture and design decisions
  • Protocol choices and trade-offs
  • Performance benchmarks and optimization
  • Key learnings from building in Rust

Let's Discuss

Questions I'd love to hear:

  • How do you use tunnels in your workflow?
  • Have you hit scaling issues with ngrok/frp?
  • Considering Rust for infrastructure? What's holding you back?
  • Built something similar? What challenges did you face?

Drop a comment or open an issue on GitHub!
