My journey from struggling with payment webhooks to building FerroTunnel: a deep dive into why reverse tunnels matter at every stage of your career
How It Started (2019)
Back in 2019, I was working as a backend developer integrating a payment gateway into a FinTech product. The documentation was clear:
"Configure your webhook / callback endpoint. We'll POST payment events to this URL."
The code itself was not complicated. I had the payment flow implemented locally and was confident about the logic. The only thing left was testing redirects and webhooks end-to-end.
The problem was obvious.
The gateway needed a publicly reachable URL, while my server lived on:
http://localhost:3000
I remember asking around, trying to understand the usual workflow.
A colleague suggested, “Just use ngrok.”
One command later, I had a public URL pointing to my local server. Payment redirects worked. Webhooks started arriving instantly. I could see requests live and debug everything locally.
ngrok http 3000
# → https://abc123.ngrok-free.app
At that moment, it felt almost magical.
I did not think about tunnels, networking, or protocols. I did not ask how it worked internally. It solved my problem immediately, and that was enough.
Fast-forward to 2024: I released FerroTunnel v1.0.0, a reverse tunnel written in Rust. This article is about everything that changed between those two moments.
Developer Days - When Tunnels Saved Hours
The Payment Gateway Nightmare
Building a fintech product meant integrating multiple payment providers: Stripe, Razorpay, PayPal. Each had the same pattern:
1. User makes payment
2. Provider processes it
3. Provider POSTs to your webhook
4. You update order status
Simple, except for one problem: testing.
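For concreteness, the handler in steps 3 and 4 is just an HTTP endpoint that accepts a JSON POST. Here is a minimal sketch in Rust using axum — purely illustrative, since the original service wasn't Rust, and the route and port are placeholders:

```rust
use axum::{routing::post, Json, Router};
use serde_json::Value;

// Minimal webhook receiver: the provider POSTs a JSON event here.
async fn stripe_webhook(Json(event): Json<Value>) -> &'static str {
    // A real handler must verify the provider's signature header
    // before trusting the body, then update the order status.
    println!("received event: {}", event["type"]);
    "ok" // a 200 response tells the provider not to retry
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/webhooks/stripe", post(stripe_webhook));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```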
The old workflow (without tunnels):
1. Write webhook handler locally
2. Git commit & push
3. Wait for CI/CD (3-5 minutes)
4. Deploy to staging
5. SSH into staging server
6. Tail logs
7. Trigger test payment from Stripe dashboard
8. grep through logs to find the webhook
9. Find bug (typo in field name)
10. Go back to step 1
Time per iteration: 15-20 minutes
I spent an entire afternoon debugging why webhook signatures weren't validating. The issue? Stripe sends created_at as a Unix timestamp; I was parsing it as an ISO string. Four hours to fix a one-line bug.
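The original service wasn't written in Rust, but here is the shape of that bug sketched with the chrono crate (the timestamp value is arbitrary):

```rust
use chrono::{DateTime, TimeZone, Utc};

fn main() {
    // Stripe-style field: seconds since the Unix epoch, not an ISO-8601 string.
    let created = 1_700_000_000i64;

    // The buggy assumption: parsing it as an ISO/RFC 3339 string fails.
    assert!(DateTime::parse_from_rfc3339("1700000000").is_err());

    // The one-line fix: interpret it as a Unix timestamp.
    let ts: DateTime<Utc> = Utc.timestamp_opt(created, 0).unwrap();
    println!("{ts}"); // 2023-11-14 22:13:20 UTC
}
```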
With ngrok:
ngrok http 3000
# Configure Stripe webhook: https://abc123.ngrok-free.app/webhooks/stripe
# Make test payment
# See webhook in terminal instantly
# Set breakpoint in IDE
# Fix bug
# Test again (30 seconds)
Time per iteration: 30 seconds
That 4-hour bug? Fixed in 12 minutes with a tunnel.
Beyond Payments: The Developer's Swiss Army Knife
Once I discovered tunnels, I found more uses:
OAuth flows:
GitHub OAuth → Redirect to localhost → Now works!
Mobile app development:
Phone (React Native) → Tunnel → Laptop (API)
                                     ↓
                               Live debugging
                               Console.log visible
Client demos:
Client: "Can I see the new feature?"
Me: *deploys to staging, 10 minutes*
With tunnel:
Me: *shares tunnel URL, 10 seconds*
Slack app development:
Slack → Webhook → Tunnel → Localhost
                               ↓
                       Breakpoints work!
The Developer Mindset
As a developer focused on shipping features, I valued three things about tunnels:
- Speed: One command, working immediately
- Reliability: Stayed connected while I coded
- Inspection: ngrok's web UI showed every request/response
I didn't care how they worked. They were a tool, like npm or git. Use it, move on.
The Shift - When Ownership Changes Perspective
The New World
As my role evolved and I took responsibility beyond writing code, my relationship with tunnels changed. Different title, different problems.
Instead of asking whether something worked, I started thinking through scenarios at the infrastructure level:
- How does this behave when it runs continuously?
- What happens when connections drop?
- How predictable is memory usage over time?
- How observable is this under load?
And new constraints appeared:
- Partners requiring API access
- Strict security policies (no public internet exposure)
I started seeing tunnels not as a developer shortcut, but as part of the systems I was responsible for keeping healthy.
That is when I realized how little I actually understood about something I relied on so heavily.
Suddenly, "just use ngrok" wasn't enough.
Problem 1: Questions I'd Never Asked
- How many tunnels can one server handle?
- What's the memory footprint per tunnel?
- What happens when a tunnel crashes?
- How do we monitor 50+ active tunnels?
- What's the latency overhead?
As a developer, my tunnel ran for 2 hours while I coded. In infra, these tunnels would run 24/7 for months.
Problem 2: Multi-Environment Chaos
Development environment:
20 developers
Each testing 3-5 services locally
Peak usage: ~80 concurrent tunnels
Staging environment:
10 QA engineers
Testing integrated flows across 10 microservices
Peak usage: ~40 concurrent tunnels
Production:
50 microservices exposed to partners
Must be stable, monitored, and fast
= 50 critical tunnels
Total infrastructure need: 170+ concurrent tunnels
Questions that now mattered:
| Developer Me | Infra / DevOps Me |
|---|---|
| Is my tunnel connected? | What's the aggregate memory usage? |
| Can I see incoming requests? | What's the P99 latency across all tunnels? |
| Did my webhook arrive? | Which tunnels had errors in the last hour? |
| - | What's our bandwidth cost projection? |
| - | Which developer owns which tunnel? |
| - | How do we handle tunnel server failure? |
Problem 3: The ngrok Bill
Month 1 (5 developers):
5 × $8/month = $40/month
✓ Acceptable
Month 6 (team growing):
Developers: 20 × $8 = $160/month
Staging: 10 × $8 = $80/month
Production: 5 × $20 = $100/month
──────────
$340/month
Engineering manager: "Why are we spending $340/month on tunnels?"
Evaluating Self-Hosted Options
I had already used Go-based tunneling tools extensively, including frp. They worked well and fit naturally into my ecosystem.
# Deployed frp server on AWS
# Started load testing
50 tunnels: 180 MB RAM ✓ Good
100 tunnels: 420 MB RAM ⚠️ Growing
150 tunnels: 680 MB RAM ❌ Concerning
The motivation to go deeper did not come from dissatisfaction with Go or existing tools. It came from recognizing that tunnels in this context were long-running and connection-heavy systems.
Memory behavior, tail latency, reconnection handling, and failure modes suddenly mattered much more than before.
That curiosity eventually pushed me to explore how tunnels actually work.
Down the Rabbit Hole - Why This Led to Building FerroTunnel
I decided to prototype a tunnel myself to answer questions I could not confidently answer before.
That exploration led me to compare implementations, observe long-running behavior, and experiment with different approaches, including Rust for this specific problem space.
How Do the Experts Do This?
I started researching how Cloudflare and ngrok actually work.
Cloudflare Tunnel (cloudflared):
Architecture:
Client (cloudflared) ← ──QUIC── → Cloudflare Edge (200+ locations)
                                          ↓
                                      End Users
Technology:
- Written in Go
- Uses QUIC protocol (HTTP/3)
- Protocol Buffers for serialization
- Handles millions of tunnels globally
Why Go?
- Fast development iteration
- Excellent networking libraries (net/http, gRPC)
- Goroutines make concurrency straightforward
- Google-scale battle-testing
ngrok:
Architecture:
Client ← ──Custom Protocol/TLS── → ngrok Edge
  ↓                                    ↓
Local Service                     Inspection UI
                                  Analytics
Technology:
- Custom binary protocol (optimized for their use case)
- Beautiful web UI for request inspection
- Smart reconnection with exponential backoff
- TLS 1.3 everywhere
What I learned:
"These aren't simple scripts. Cloudflare processes billions of requests through tunnels. ngrok has handled hundreds of millions of developer sessions. This is serious distributed systems engineering."
The Hard Parts (That I Didn't Appreciate as a Developer)
1. Multiplexing
The core problem:
100 HTTP requests → 1 TCP connection → Route to correct handler
Challenges I never considered:
- Stream isolation: Don't mix response from stream 5 into stream 3
- Flow control: Slow consumer shouldn't block fast ones
- Backpressure: What if server can't keep up?
- Head-of-line blocking: One slow request blocks others (HTTP/1.1 problem)
- Resource cleanup: Stream 42 disconnects mid-transfer—clean up how?
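To make stream isolation concrete, here is a minimal framing sketch in Rust. This is not FerroTunnel's actual wire format, just the simplest possible scheme: every frame carries a stream ID and a length, and the demultiplexer routes payloads by ID.

```rust
/// A minimal frame for multiplexing many logical streams
/// over one TCP connection: [stream_id][len][payload].
struct Frame {
    stream_id: u32,
    payload: Vec<u8>,
}

fn encode(frame: &Frame) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + frame.payload.len());
    buf.extend_from_slice(&frame.stream_id.to_be_bytes());
    buf.extend_from_slice(&(frame.payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(&frame.payload);
    buf
}

/// Returns the decoded frame plus the bytes consumed,
/// or None if the buffer doesn't yet hold a complete frame.
fn decode(buf: &[u8]) -> Option<(Frame, usize)> {
    if buf.len() < 8 {
        return None; // need the full 8-byte header first
    }
    let stream_id = u32::from_be_bytes(buf[0..4].try_into().unwrap());
    let len = u32::from_be_bytes(buf[4..8].try_into().unwrap()) as usize;
    if buf.len() < 8 + len {
        return None; // partial frame: wait for more bytes
    }
    let frame = Frame { stream_id, payload: buf[8..8 + len].to_vec() };
    Some((frame, 8 + len))
}

fn main() {
    let frame = Frame { stream_id: 5, payload: b"hello".to_vec() };
    let wire = encode(&frame);
    let (decoded, consumed) = decode(&wire).expect("complete frame");
    assert_eq!(consumed, wire.len());
    println!("routed {} bytes to stream {}", decoded.payload.len(), decoded.stream_id);
}
```

Everything hard about multiplexing (flow control, backpressure, cleanup) lives on top of a scheme like this; the framing itself is the easy part.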
2. Long-Lived Connections
Developer tunnel:
- Runs for 2 hours while coding
- Restart when it breaks
- Memory leaks? Restart fixes it
Platform tunnel:
- Runs for weeks or months
- Can't "just restart" (production traffic)
- Memory leak of 1MB/hour ≈ 700MB/month
- TCP keepalive tuning actually matters
- Connection state recovery is critical
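As one example of tuning that suddenly matters, here is how TCP keepalive can be configured in Rust with the socket2 crate. The values are placeholders; the right ones depend on your NATs and load balancers.

```rust
use socket2::{Domain, Socket, TcpKeepalive, Type};
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, None)?;

    // Placeholder values: probe after 30s idle, then every 10s.
    // Too long, and NATs silently drop the "idle" tunnel connection;
    // too short, and you waste bandwidth on probes.
    let keepalive = TcpKeepalive::new()
        .with_time(Duration::from_secs(30))
        .with_interval(Duration::from_secs(10));
    socket.set_tcp_keepalive(&keepalive)?;
    Ok(())
}
```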
3. Performance Requirements
Developer perspective:
"As long as it's not noticeably slow, I'm happy."
Platform perspective:
User request latency breakdown:
CDN/Edge: 10ms
Tunnel: ???ms ← This must be minimal
Service: 50ms
Database: 20ms
─────
Total: 80ms + tunnel
If tunnel adds 10ms → 12.5% increase in total latency
If tunnel adds 2ms → 2.5% increase ✓
Every millisecond matters.
4. Observability
Developer needs:
- Is it connected? ✓
- Can I see requests? ✓
Platform needs:
- Active tunnels count
- Memory per tunnel
- Bandwidth per tunnel per hour
- Error rates and types
- P50/P99/P99.9 latency
- Connection lifecycle events
- Which team owns which tunnel
- Cost attribution
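As a sketch of what the platform side looks like in code, here is a minimal Rust example with the prometheus crate. The metric names are hypothetical, not FerroTunnel's actual metrics.

```rust
use prometheus::{register_histogram, register_int_gauge, Encoder, TextEncoder};

fn main() {
    // Hypothetical metric names for illustration only.
    let active = register_int_gauge!("tunnels_active", "Active tunnel count").unwrap();
    let latency = register_histogram!(
        "tunnel_forward_seconds",
        "Per-request forwarding latency",
        vec![0.0005, 0.001, 0.002, 0.005, 0.01, 0.05] // buckets sized for ms-scale P99s
    )
    .unwrap();

    // In a real server these are updated on connect/disconnect and per request.
    active.inc();
    latency.observe(0.0012);

    // Expose in Prometheus text format (normally behind a /metrics endpoint).
    let mut buf = Vec::new();
    TextEncoder::new().encode(&prometheus::gather(), &mut buf).unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}
```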
Over time, that experiment became FerroTunnel.
Not as a replacement for existing tools, but as a way to deeply understand how tunnels behave when they are treated as infrastructure rather than a convenience.
The Rust Experiment
Why Consider Rust?
I'm a Go developer, and I'd been comfortable and productive with Go for several years. Why change?
The rumors I kept hearing:
- "Rust has no garbage collector—no GC pauses!"
- "Memory safe without runtime overhead"
- "If it compiles, it usually works"
- "Zero-cost abstractions"
My skepticism:
"Go is already fast. GC pauses are sub-millisecond. How much difference could it really make?"
The trigger:
A colleague showed me memory usage graphs:
Go service (24 hours):
Memory: Sawtooth pattern (GC cycles)
Range: 180MB → 420MB → 190MB → 380MB
Rust service (24 hours):
Memory: Flat line
Constant: 85MB
Me: "Wait, is this real?"
Him: "Deterministic memory. No GC. Worth learning for long-running services."
The Prototype Race
I decided to test it empirically: Build the same tunnel in both Go and Rust.
Go prototype (Weekend 1):
// Day 1: Working prototype
import "sync"

type Tunnel struct {
    streams map[uint32]*Stream // one entry per multiplexed stream
    mu      sync.RWMutex       // guards the streams map
}

// Day 2: Add HTTP ingress, basic multiplexing
// Total: ~800 lines of code
// Time: 2 days
Test results (100 concurrent tunnels, 24 hours):
Memory start: 140 MB
Memory 24hrs: 410 MB
P50 latency: 1.1ms
P99 latency: 3.8ms
P99.9 latency: 12.3ms ← GC spikes visible
Rust prototype (Weekend 2 + 3):
// Day 1-2: Fight borrow checker
// error[E0502]: cannot borrow `streams` as mutable
// error[E0597]: `data` does not live long enough
// ... 47 more errors
// Day 3: Read "The Rust Book" cover to cover
// Day 4-5: Rewrite with proper ownership model
pub struct Multiplexer {
streams: Arc<RwLock<HashMap<u32, Stream>>>,
}
// Day 6: Finally compiles!
// Day 7: It works!
// Total: ~950 lines of code
// Time: 5 days (2.5× slower than Go)
Test results (100 concurrent tunnels, 24 hours):
Memory start: 92 MB
Memory 24hrs: 92 MB ← Flat line!
P50 latency: 0.8ms
P99 latency: 2.1ms
P99.9 latency: 4.2ms ← Consistent, no spikes
My reaction:
"Holy shit. This is actually real. The Rust hype isn't just hype."
The Numbers That Convinced Me
After 7 days of continuous operation:
| Metric | Go | Rust | Difference |
|---|---|---|---|
| Memory (start) | 140 MB | 92 MB | -34% |
| Memory (7 days) | 520 MB | 92 MB | -82% |
| P99 latency | 3.8ms | 2.1ms | -45% |
| P99.9 latency | 12.3ms | 4.2ms | -66% |
100 tunnels × memory savings:
Go: 520 MB
Rust: 92 MB
────
Saved: 428 MB per 100 tunnels
At 500 tunnels:
Go: 2,600 MB (2.5 GB)
Rust: 460 MB
────
Saved: 2,140 MB (2.1 GB) = 82% reduction
Infrastructure cost impact:
Go: Need 4GB RAM servers
Rust: Can use 1GB RAM servers
AWS t3.small (2GB): $16.79/month
AWS t3.micro (1GB): $8.40/month
Savings: ~$100/year per server
At 10 servers: $1,000/year saved
Latency impact on user experience:
User request path:
CDN: 10ms
Tunnel (Go): 3.8ms (P99)
Service: 50ms
─────
Total: 63.8ms
With Rust:
CDN: 10ms
Tunnel (Rust): 2.1ms (P99)
Service: 50ms
─────
Total: 62.1ms
Improvement: 2.7% faster end-to-end
2.7% doesn't sound huge, but:
- Across millions of requests
- Compounding with other optimizations
- Better user experience
The Commitment
I decided to go all-in on Rust.
Why:
- Numbers don't lie: Memory and latency improvements were real
- Platform fit: Long-running infrastructure is exactly where Rust shines
- Learning value: Even if I abandon this project, I'll understand systems programming deeply
- Future-proof: Memory efficiency scales linearly with tunnel count
What I committed to:
- Learning Rust properly (no shortcuts, no fighting the borrow checker)
- Building production-quality code (not a toy project)
- Open sourcing everything (give back to community)
- Documenting learnings (this article!)
What Changed Between Developer and Platform Engineer
Developer perspective:
- Tunnels are a tool (like git or npm)
- Speed and reliability matter
- Don't care about internals
- "It works" is good enough
Platform engineer perspective:
- Tunnels are infrastructure
- Every millisecond and megabyte matters
- Must understand failure modes
- "It works 99% of the time" isn't good enough
What Building This Taught Me
1. Problems look different at scale
1 tunnel: "It's working!"
10 tunnels: "How do we monitor these?"
100 tunnels: "What's our memory budget?"
1000 tunnels: "Every optimization matters"
2. Performance is measurable, not theoretical
Everyone says "Rust is fast." But:
- How much faster?
- Does it matter for my use case?
- What are the trade-offs?
Building FerroTunnel gave me real numbers to answer these questions.
3. The best way to learn is to build
I read about multiplexing, protocols, and async I/O for years. I understood them conceptually.
Building a tunnel forced me to understand them deeply:
- How does backpressure propagate?
- What happens when a stream closes mid-transfer?
- How do you prevent resource leaks?
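For the backpressure question, the simplest answer I found was bounded channels. A minimal Tokio sketch (my illustration, not FerroTunnel code): the sender suspends when the buffer fills, so a slow consumer throttles the producer instead of exhausting memory.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // A bounded channel is the simplest backpressure primitive.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(32);

    let producer = tokio::spawn(async move {
        for i in 0u32..1000 {
            // Suspends here whenever the 32-slot buffer is full.
            if tx.send(i.to_be_bytes().to_vec()).await.is_err() {
                break; // receiver dropped: the stream closed mid-transfer
            }
        }
        // Dropping `tx` closes the channel and lets the consumer finish.
    });

    while let Some(chunk) = rx.recv().await {
        // Simulate a slow downstream consumer.
        tokio::time::sleep(std::time::Duration::from_micros(50)).await;
        let _ = chunk;
    }

    producer.await.unwrap();
}
```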
What Surprised Me
1. Rust wasn't as hard as I feared
Yes, the borrow checker fights you. But:
- The error messages are helpful
- The community is incredibly supportive
- Once you "get" ownership, it clicks
2. The ecosystem is mature
I expected to write everything from scratch. Instead:
- Tokio (async runtime): Production-grade
- Serde (serialization): Just works
- Bytes (zero-copy buffers): Perfectly designed
3. Memory efficiency compounds
92 MB vs 520 MB seems small. But:
- Over 1000 tunnels: 2.1 GB saved
- Over 24/7 operation: Stability matters
- Lower memory = cheaper hosting
Building FerroTunnel
The Journey So Far
Timeline:
- Learning Rust fundamentals
- Core multiplexer implementation
- HTTP/TCP ingress, plugin system
- Observability, dashboard, polish
- Testing, benchmarking, docs
- v1.0.0 release to crates.io
Current state:
✅ Published to crates.io
✅ Docker images on GitHub Container Registry
✅ Powers my personal projects
✅ Full documentation and examples
What FerroTunnel includes:
Core features:
- HTTP and TCP tunneling
- TLS 1.3 with mutual TLS support
- Stream multiplexing with backpressure
- Auto-reconnect with exponential backoff (sketched below)
Observability:
- Real-time dashboard (WebSocket + Server-Sent Events)
- Prometheus metrics
- Structured JSON logging
- Connection lifecycle events
Extensibility:
- Plugin system for custom auth, rate limiting
- Built-in plugins: token auth, IP allowlist, logger
- Circuit breaker pattern
Developer experience:
- Both library API and CLI
- Docker Compose ready
- Clear error messages
- Extensive examples
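To illustrate the auto-reconnect item above, here is a minimal exponential-backoff loop in Rust. The connect/serve functions are hypothetical stand-ins, not FerroTunnel's actual API, and production code would add jitter so clients don't reconnect in lockstep.

```rust
use std::time::Duration;

// Hypothetical stand-ins for the real connect/serve calls.
async fn connect() -> Result<(), ()> { Err(()) }
async fn serve(_conn: ()) {}

async fn reconnect_loop() {
    let mut delay = Duration::from_millis(250);
    let max = Duration::from_secs(30);
    loop {
        match connect().await {
            Ok(conn) => {
                delay = Duration::from_millis(250); // reset after a healthy connection
                serve(conn).await; // returns when the tunnel drops
            }
            Err(_) => {
                tokio::time::sleep(delay).await;
                delay = (delay * 2).min(max); // 250ms → 500ms → 1s → ... capped at 30s
            }
        }
    }
}
```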
What's Next - Immediate roadmap:
- v1.0.x: HTTP/2 support (native multiplexing)
- v1.0.x: gRPC tunneling
- v1.0.x: QUIC transport (like Cloudflare)
- v1.0.x: Connection pooling
- v1.0.x: Multi-region support
Closing Thoughts
This journey started with a simple need: testing callbacks on localhost.
It led to a deeper understanding of reverse tunnels, long-lived connections, and the trade-offs involved in building and operating them.
If you use tunnels regularly, I strongly recommend taking the time to understand how they work internally. The perspective you gain changes how you design, evaluate, and own systems.
What I built:
Not just a tunnel. But:
- A deep understanding of multiplexing
- Production systems programming experience in Rust
- An open-source tool others can use
- Foundation for future micro-SaaS
What you can learn from this:
- Use the right tool for the job
  - ngrok for quick development
  - Cloudflare for enterprise scale
  - Self-hosted for control and learning
- Understand what you use daily
  - I used tunnels for 3 years without understanding them
  - Building one taught me more than any tutorial
- Performance claims need testing
  - "Rust is faster" → By how much?
  - Prototype both, measure, decide
Try FerroTunnel:
- GitHub: https://github.com/MitulShah1/ferrotunnel
- Quick start:
# Install
cargo install ferrotunnel-cli
# Start server
ferrotunnel server --token secret
# Start client (in another terminal; token from env or secure prompt if omitted)
ferrotunnel client --server localhost:7835 --local-addr 127.0.0.1:8080 --tunnel-id my-app
What’s Next
Want the technical deep-dive?
Part 2 will cover:
- Why Rust specifically (Go developer's perspective)
- Multiplexer architecture and design decisions
- Protocol choices and trade-offs
- Performance benchmarks and optimization
- Key learnings from building in Rust
Let's Discuss
Questions I'd love to hear:
- How do you use tunnels in your workflow?
- Have you hit scaling issues with ngrok/frp?
- Considering Rust for infrastructure? What's holding you back?
- Built something similar? What challenges did you face?
Drop a comment or open an issue on GitHub!