The conventional wisdom in Linux networking is straightforward: use XDP (Express Data Path) for stateless packet filtering such as DDoS mitigation, and rely on iptables or nftables for complex stateful inspection.
The reasoning is twofold. First, XDP runs extremely early in the receive path, before sk_buff allocation and before the netfilter and conntrack hooks, so none of the kernel's connection state is available to it. Second, the eBPF verifier imposes strict safety constraints, particularly on loops, which makes deep packet inspection appear impractical.
To test those assumptions, I built Hyperion, a stateful L7 firewall that runs entirely in XDP. It performs TCP connection tracking, bounded payload inspection, and asynchronous telemetry export while sustaining approximately 11.8 million packets per second (PPS), compared with roughly 2.1 million PPS for equivalent iptables drop rules.
This article describes the key engineering challenges involved.
Challenge 1: Connection Tracking in XDP
Because XDP executes before the kernel networking stack, it cannot access native nf_conntrack state. Flow-based filtering therefore requires a custom implementation.
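To make "custom implementation" concrete, here is a minimal sketch of the kind of per-flow state machine such a tracker needs, written as plain C so it can be compiled and tested outside the kernel. The state names, the `flow_next_state` function, and the `from_client` parameter are hypothetical and deliberately coarse; this is not Hyperion's actual code, and sequence-number validation and teardown details are left out. Only the flag bit values are taken from the TCP header layout.

```c
#include <stdint.h>

/* TCP flag bits as laid out in the TCP header's flags byte. */
#define TCP_FLAG_FIN  0x01
#define TCP_FLAG_SYN  0x02
#define TCP_FLAG_RST  0x04
#define TCP_FLAG_ACK  0x10
#define TCP_FLAG_MASK (TCP_FLAG_FIN | TCP_FLAG_SYN | TCP_FLAG_RST | TCP_FLAG_ACK)

/* Hypothetical connection states for illustration only. */
enum flow_state {
    FLOW_NEW = 0,      /* no packet recorded for this flow yet */
    FLOW_SYN_SENT,     /* client SYN seen */
    FLOW_SYN_RECV,     /* server SYN+ACK seen */
    FLOW_ESTABLISHED,  /* client ACK completed the handshake */
    FLOW_CLOSING,      /* a FIN was seen; teardown not modeled further */
    FLOW_CLOSED,       /* an RST was seen on a tracked flow */
    FLOW_INVALID       /* packet did not fit the tracked state */
};

/* Returns the next state of a flow given the flags of one packet and its
 * direction (from_client != 0 means the packet came from the initiator). */
static enum flow_state flow_next_state(enum flow_state cur, uint8_t flags,
                                       int from_client)
{
    /* Terminal states never change. */
    if (cur == FLOW_CLOSED || cur == FLOW_INVALID)
        return cur;

    /* An RST tears down any flow that is already being tracked. */
    if (cur != FLOW_NEW && (flags & TCP_FLAG_RST))
        return FLOW_CLOSED;

    switch (cur) {
    case FLOW_NEW:
        /* Only a bare SYN from the client may open a flow. */
        if (from_client && (flags & TCP_FLAG_MASK) == TCP_FLAG_SYN)
            return FLOW_SYN_SENT;
        return FLOW_INVALID;
    case FLOW_SYN_SENT:
        if (!from_client &&
            (flags & (TCP_FLAG_SYN | TCP_FLAG_ACK)) == (TCP_FLAG_SYN | TCP_FLAG_ACK))
            return FLOW_SYN_RECV;
        return FLOW_INVALID;
    case FLOW_SYN_RECV:
        if (from_client && (flags & TCP_FLAG_ACK) && !(flags & TCP_FLAG_SYN))
            return FLOW_ESTABLISHED;
        return FLOW_INVALID;
    case FLOW_ESTABLISHED:
        return (flags & TCP_FLAG_FIN) ? FLOW_CLOSING : FLOW_ESTABLISHED;
    default:
        return cur; /* FLOW_CLOSING: stay until the entry ages out */
    }
}
```

In an XDP program, a function like this would run once per packet on the value fetched from the flow map, and the verdict (pass or drop) would follow from the returned state.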
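Hyperion stores flow state in a BPF_MAP_TYPE_LRU_HASH map keyed by the connection 5-tuple:
    struct flow_key {
        __u32 src_ip;     /* IPv4 source address */
        __u32 dst_ip;     /* IPv4 destination address */
        __u16 src_port;
        __u16 dst_port;
        __u8  protocol;   /* e.g. IPPROTO_TCP */
        __u8  pad[3];     /* explicit padding; zero it before every lookup */
    };
The pad field is an addition to the original layout. Without it the compiler inserts three bytes of implicit padding after protocol, and because the map hashes and compares every byte of the key, uninitialized padding can make two lookups for the same flow miss each other.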
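The LRU eviction policy is critical under load. When the map is full, inserting a new flow evicts a least-recently-used entry instead of failing with an error, as a plain hash map would. The table therefore stays at its configured size and keeps accepting new flows during a flood, at the cost of occasionally dropping state for long-idle connections.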
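One detail of hash-map keys is worth a small sketch: the map compares keys byte for byte, so every byte of the key, including padding, has to be defined. The userspace C below uses the article's five fields plus an explicit trailing pad; the names `flow_key_padded`, `flow_key_init`, and `flow_key_equal` are hypothetical, and `memcmp` stands in for the map's own key comparison.

```c
#include <stdint.h>
#include <string.h>

/* The article's five key fields plus explicit trailing padding
 * (the padding is an assumption of this sketch). */
struct flow_key_padded {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;
    uint8_t  pad[3];   /* explicit, so no byte of the key is undefined */
};

/* 4 + 4 + 2 + 2 + 1 + 3 bytes: fails to compile if the layout ever
 * gains implicit padding. */
_Static_assert(sizeof(struct flow_key_padded) == 16,
               "flow key must not contain implicit padding");

/* Fills a key for one packet. The memset zeroes the padding, so two keys
 * built from the same 5-tuple are identical byte for byte. */
static void flow_key_init(struct flow_key_padded *k,
                          uint32_t src_ip, uint32_t dst_ip,
                          uint16_t src_port, uint16_t dst_port,
                          uint8_t protocol)
{
    memset(k, 0, sizeof(*k));
    k->src_ip = src_ip;
    k->dst_ip = dst_ip;
    k->src_port = src_port;
    k->dst_port = dst_port;
    k->protocol = protocol;
}

/* Byte-wise comparison over the whole key, padding included. */
static int flow_key_equal(const struct flow_key_padded *a,
                          const struct flow_key_padded *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Zeroing the whole struct before filling it is the simplest way to get this guarantee; setting only the named fields of a stack-allocated key leaves the padding bytes with whatever was on the stack.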
Challenge 2: Verifier Constraints and Payload Inspection
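Deep packet inspection requires iterating over packet payloads, but the eBPF verifier accepts a loop only if it can prove that the loop terminates. Kernels before Linux 5.3 rejected backward jumps entirely, so loops had to be fully unrolled at compile time; newer kernels accept bounded loops directly.
Hyperion uses bounded loops combined with compile-time unrolling: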
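    #define MAX_SCAN_LEN 16   /* number of start offsets to examine */
    #define MAX_SIG_LEN  8    /* longest signature the bounds check must cover */

    #pragma unroll
    for (int i = 0; i < MAX_SCAN_LEN; i++) {
        /* Bounds-check the same pointer that is dereferenced below. */
        if (payload + i + MAX_SIG_LEN > data_end)
            break;
        if (payload[i] == 'r' && payload[i+1] == 'o' &&
            payload[i+2] == 'o' && payload[i+3] == 't') {
            return 1;
        }
    }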
Unrolling transforms the loop into a linear instruction sequence that the verifier can statically analyze for safety.
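The same pattern can be exercised outside the kernel. The sketch below is a userspace model, not the XDP program itself: `scan_payload`, `SCAN_WINDOW`, and `SIG_LEN` are hypothetical names, and the bounds check is written as a length comparison so that it is strictly valid C. In an XDP program the equivalent check is the pointer comparison against `data_end`, which is the form the verifier tracks.

```c
#include <stddef.h>
#include <stdint.h>

#define SCAN_WINDOW 16  /* maximum number of start offsets examined */
#define SIG_LEN      4  /* length of the example signature "root" */

/* Returns 1 if "root" starts at one of the first SCAN_WINDOW offsets of
 * [payload, data_end), and 0 otherwise. The loop runs at most SCAN_WINDOW
 * times regardless of packet size, which is the property the verifier needs. */
static int scan_payload(const uint8_t *payload, const uint8_t *data_end)
{
    ptrdiff_t avail = data_end - payload;

    for (int i = 0; i < SCAN_WINDOW; i++) {
        /* Stop before any read could pass the end of the packet. In XDP
         * this is written as: payload + i + SIG_LEN > data_end */
        if ((ptrdiff_t)i + SIG_LEN > avail)
            break;
        if (payload[i] == 'r' && payload[i + 1] == 'o' &&
            payload[i + 2] == 'o' && payload[i + 3] == 't')
            return 1;
    }
    return 0;
}
```

Two limits follow directly from the constants: a signature that starts at offset 16 or later is never seen, and a payload shorter than the signature is skipped without a single read. Widening the window raises the instruction count the verifier has to analyze, so the window size is a trade-off rather than a free parameter.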
Challenge 3: Observability at High Packet Rates
Visibility is essential for production deployments. However, traditional debugging mechanisms such as bpf_trace_printk introduce significant overhead.
Hyperion instead uses the BPF ring buffer (BPF_MAP_TYPE_RINGBUF, available since Linux 5.8), which enables asynchronous event delivery to userspace. The kernel writes structured events into shared memory, while a userspace program consumes them independently.
Because the producer never waits for the consumer, telemetry stays off the critical path of packet processing.
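The property that matters here is that the producer never blocks: if the buffer is full, the event is dropped and counted. The sketch below is a single-threaded userspace model of that behavior, not the BPF API. In a real program the kernel side would call `bpf_ringbuf_reserve` and `bpf_ringbuf_submit`, and userspace would poll through libbpf's `ring_buffer__poll`. The `fw_event` layout and all function names below are hypothetical.

```c
#include <stdint.h>

#define RING_SLOTS 8  /* power of two, so masking the index wraps correctly */

/* Hypothetical telemetry record. */
struct fw_event {
    uint32_t src_ip;
    uint16_t dst_port;
    uint8_t  verdict;   /* e.g. 0 = pass, 1 = drop */
    uint8_t  pad;
};

struct event_ring {
    struct fw_event slots[RING_SLOTS];
    uint32_t head;      /* total events written by the producer */
    uint32_t tail;      /* total events read by the consumer */
    uint64_t dropped;   /* events discarded because the ring was full */
};

/* Producer side: never waits. Returns 0 on success, or -1 if the ring is
 * full, in which case the event is dropped and counted. */
static int ring_publish(struct event_ring *r, const struct fw_event *ev)
{
    if (r->head - r->tail == RING_SLOTS) {
        r->dropped++;
        return -1;
    }
    r->slots[r->head & (RING_SLOTS - 1)] = *ev;
    r->head++;
    return 0;
}

/* Consumer side: returns 1 and copies out the oldest event, or returns 0
 * if the ring is empty. */
static int ring_consume(struct event_ring *r, struct fw_event *out)
{
    if (r->head == r->tail)
        return 0;
    *out = r->slots[r->tail & (RING_SLOTS - 1)];
    r->tail++;
    return 1;
}
```

This model leaves out the memory ordering and wakeup logic that a buffer shared between kernel and userspace needs. What it does show is the failure mode worth monitoring: when the consumer falls behind, telemetry is lost while packet processing continues at full speed, so a drop counter is the natural health metric.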
Performance Observations
Testing with pktgen using 64-byte UDP floods produced the following results:
| System | Throughput | CPU Load |
|---|---|---|
| iptables drop | ~2.1M PPS | 100% SoftIRQ |
| Hyperion XDP | ~11.8M PPS | ~40% CPU |
The improvement comes primarily from eliminating sk_buff allocation and bypassing netfilter processing.
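It can help to restate these rates as an average time budget per packet. The helpers below are a hypothetical illustration that only rearranges the two throughput figures from the table. The result is an aggregate figure: the article does not say how many cores were involved, so it should not be read as a per-core latency.

```c
/* Average time available per packet, in nanoseconds, at a given
 * aggregate packet rate (packets per second). */
static double ns_per_packet(double packets_per_second)
{
    return 1e9 / packets_per_second;
}

/* Ratio between two packet rates. */
static double speedup(double fast_pps, double slow_pps)
{
    return fast_pps / slow_pps;
}
```

With the table's numbers, `ns_per_packet(11.8e6)` is roughly 84.7 ns and `ns_per_packet(2.1e6)` is roughly 476 ns, a ratio of about 5.6. Any per-packet work added to the XDP path, such as a map lookup or a payload scan, has to fit inside the smaller budget to keep the higher rate.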
Conclusion
While XDP is often considered suitable only for stateless filtering, it can support stateful inspection when designed within verifier constraints.
Key techniques include:
- Efficient map-based flow tracking
- Bounded, verifier-safe payload scanning
- Asynchronous telemetry pipelines
These approaches enable complex security logic to execute efficiently at the earliest point in the networking stack.