Darian Vance

Posted on Feb 18 • Originally published at wp.me

Solved: Does anyone else feel the Gateway API design is awkward for multi-tenancy?

#devops #programming #tutorial #cloud

🚀 Executive Summary

TL;DR: The Kubernetes Gateway API’s multi-tenancy model gets awkward on a shared Gateway: once a listener accepts routes from other namespaces, HTTPRoutes in different namespaces can claim the same hostname and potentially hijack critical traffic. To solve this, platform teams must implement explicit controls: restrict allowedRoutes on the Gateway, enforce cluster-wide hostname uniqueness with OPA Gatekeeper, or provision dedicated Gateways per tenant for robust isolation.

🎯 Key Takeaways

The Gateway API’s multi-tenancy challenge stems from its flexible design: a shared Gateway whose listeners are open to other namespaces trusts any attached HTTPRoute to claim hostnames, leading to potential conflicts and service disruptions without explicit controls.
The allowedRoutes field on a Gateway listener offers a quick fix by restricting HTTPRoute attachments to specific, labeled namespaces. This mitigates accidents from unapproved namespaces, but not conflicts within or between the approved ones.
OPA Gatekeeper enables robust, cluster-wide policy enforcement at the Kubernetes API server level, allowing platform teams to define and enforce hostname uniqueness rules for HTTPRoute resources before deployment.

Struggling with Kubernetes Gateway API’s multi-tenancy model? A senior engineer breaks down why its design feels awkward for shared clusters and provides three battle-tested solutions—from quick fixes to robust policy enforcement—to prevent route conflicts and restore order.

Is the Kubernetes Gateway API Awkward for Multi-Tenancy? Yes. Here’s How We Fix It.

I still remember the 2 AM PagerDuty alert. The incident channel on Slack was a firehose of panic. Our main payment processing endpoint, api.techresolve.com/v1/charge, was intermittently returning 503s. The weird part? No new code had been deployed to the billing service. After a frantic 20 minutes of digging, I found the culprit: a junior engineer on the marketing analytics team, working late on a new feature, had deployed an HTTPRoute for a temporary metrics dashboard. They used api.techresolve.com as the hostname, same as our production billing API. Their route, with a more specific path, somehow convinced the gateway controller to intermittently hijack traffic. We lost revenue, and a well-meaning engineer almost updated their resume. This, right here, is the crux of the multi-tenancy headache with the Gateway API.

The “Why”: A Design for Flexibility, A Recipe for Chaos

Let’s get one thing straight: the Gateway API’s design isn’t “wrong,” it’s just incredibly flexible, and that flexibility has sharp edges. The core model separates the concerns:

The Platform Team (us) owns the Gateway resource. It lives in a locked-down namespace like infra-gateways and defines the entrypoint (the load balancer, the ports, the TLS certs).
The Application Teams (tenants) own the HTTPRoute resources. They live in their own dev namespaces (e.g., prod-billing, staging-analytics) and define how traffic for a specific hostname and path gets routed to their Service.

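The problem lives in the handshake between these two resources. Strictly speaking, a listener’s allowedRoutes defaults to from: Same, which only accepts routes from the Gateway’s own namespace. But a shared Gateway in infra-gateways is useless to tenants in that state, so the usual first move is to set from: All, and from then on the Gateway is a very trusting soul. It will happily attach any HTTPRoute from any namespace that wants to claim a hostname matching the listener. When two routes overlap, the most specific match wins, with ties going to the older route and then to alphabetical order of namespace/name, but in a large, complex cluster, “winning” can mean accidentally knocking another team’s critical service offline. There’s no built-in, cluster-wide registry that says, “Sorry, api.techresolve.com is already claimed by the prod-billing namespace.” And so, we have to build the fences ourselves.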
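To make that handshake concrete, here’s a rough sketch of what a route like the one from my 2 AM incident could look like. It is not the real manifest: the route name, the path, and the backend Service are hypothetical, and it assumes the shared Gateway’s listener admits routes from this namespace. The Gateway name and namespaces are the ones used throughout this post.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: metrics-dashboard            # hypothetical route name
  namespace: staging-analytics       # the tenant's own namespace
spec:
  parentRefs:
  - name: prod-gateway-external      # the shared Gateway...
    namespace: infra-gateways        # ...owned by the platform team
  hostnames:
  - "api.techresolve.com"            # same hostname as the billing API
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/charge/metrics    # hypothetical, more specific than /v1/charge
    backendRefs:
    - name: metrics-dashboard-svc    # hypothetical Service
      port: 8080                     # hypothetical port
```

Nothing in this manifest is invalid on its own, which is exactly the problem: the hostname claim is just a string, and nothing checks who else is already using it.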

Solution 1: The Quick Fix – Tightening the Leash with `allowedRoutes`

The fastest way to stop the bleeding is to use the tools the Gateway API gives you directly. You can configure your central Gateway to only listen for HTTPRoutes from specific, approved namespaces. This is like putting a bouncer at the door.

Instead of letting anyone attach, we use a label selector. We’ll tell our main gateway, prod-gateway-external, that it should only accept routes from namespaces with the label gateway-access: "true". Your platform team now controls access by applying this label to trusted namespaces.

Example `Gateway` Manifest:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway-external
  namespace: infra-gateways  # <-- Locked-down platform namespace
spec:
  gatewayClassName: gke-l7-gxlb
  listeners:
  - name: https-main
    hostname: "*.techresolve.com"  # <-- Routes can only claim hostnames under this wildcard
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: techresolve-com-tls
    allowedRoutes:
      namespaces:
        from: Selector  # <-- Only namespaces matching the selector may attach
        selector:
          matchLabels:
            gateway-access: "true"
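
On the tenant side, the selector only matches namespaces that actually carry the label. A minimal sketch of an approved namespace, assuming the platform team manages Namespace objects declaratively (the label key and value come from the Gateway above; the rest is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod-billing
  labels:
    gateway-access: "true"   # must match the Gateway's matchLabels exactly
```

Keep in mind that anyone who can edit Namespace labels can grant themselves access, so this fence only holds if tenants can’t label their own namespaces.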

Pro Tip: This is a great first step. It stops an unlabeled namespace like staging-analytics from accidentally interfering with the prod-billing namespace. However, it does NOT stop two labeled namespaces from claiming the same hostname, and it does NOT stop two different teams inside the production-apps namespace (if you use a shared one) from fighting over it either. It’s damage control, not a permanent solution for hostname contention.

Solution 2: The Permanent Fix – Cluster-Wide Policy with OPA Gatekeeper

To truly solve the “who owns this hostname?” problem, you need to enforce rules at the Kubernetes API server level. When a developer runs kubectl apply -f my-route.yaml, we want the cluster itself to reject the request if it violates our multi-tenancy rules. This is the job for a policy engine like OPA Gatekeeper.

The idea is to create a ConstraintTemplate that defines our rule: “An HTTPRoute’s hostname must be unique across the entire cluster, unless it’s a subdomain of a domain already claimed by the same namespace.” This is much more robust, because it moves the check from the gateway controller’s runtime conflict resolution to the API server’s admission control, so a conflicting route is rejected before it ever exists.

Conceptual Rego Policy for Gatekeeper:

Writing the full policy, including the same-namespace subdomain exception, is a topic for another day, but here’s the core uniqueness check sketched in Rego:

package k8s.httproute.uniqueness

# Gatekeeper evaluates rules named "violation" (not "deny").
violation[{"msg": msg}] {
    # 1. Get the incoming HTTPRoute being created/updated
    input_route := input.review.object

    # 2. Pick each of its hostnames in turn
    hostname := input_route.spec.hostnames[_]

    # 3. Look at ALL other HTTPRoutes in the cluster. HTTPRoute is namespaced,
    #    so synced copies live under data.inventory.namespace, not .cluster
    other_route := data.inventory.namespace[_]["gateway.networking.k8s.io/v1"]["HTTPRoute"][_]

    # 4. Make sure it's not the same route we are currently checking
    not same_route(input_route, other_route)

    # 5. Check if that hostname is also claimed by the other route
    hostname == other_route.spec.hostnames[_]

    # 6. If we found a match, deny the request!
    msg := sprintf("Hostname '%v' is already claimed by HTTPRoute '%v' in namespace '%v'.", [hostname, other_route.metadata.name, other_route.metadata.namespace])
}

# Two objects are the same route when namespace and name both match.
same_route(a, b) {
    a.metadata.namespace == b.metadata.namespace
    a.metadata.name == b.metadata.name
}
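
One prerequisite the sketch glosses over: data.inventory is only populated with the kinds you ask Gatekeeper to replicate. Assuming a standard install in the gatekeeper-system namespace, the sync configuration could look roughly like this:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config                  # Gatekeeper only reconciles a Config with this name
  namespace: gatekeeper-system  # assumes the default install namespace
spec:
  sync:
    syncOnly:
    - group: "gateway.networking.k8s.io"
      version: "v1"
      kind: "HTTPRoute"         # replicate existing routes into data.inventory
```

Without this, the policy has nothing to compare the incoming route against and will quietly allow everything.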

Warning: This is the most powerful solution, but it’s also the most complex. It requires installing and managing OPA Gatekeeper, learning the Rego policy language, and carefully rolling out policies so you don’t break existing CI/CD pipelines. It’s the right long-term investment for a large, mature organization.
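
One way to take the edge off that rollout is to start the constraint in audit-only mode. The sketch below assumes the policy has already been wrapped in a ConstraintTemplate exposing a kind called K8sUniqueHTTPRouteHostname; that kind name and the constraint name are hypothetical.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sUniqueHTTPRouteHostname    # hypothetical kind defined by your ConstraintTemplate
metadata:
  name: unique-httproute-hostname   # hypothetical constraint name
spec:
  enforcementAction: dryrun         # record violations in audit, don't reject requests yet
  match:
    kinds:
    - apiGroups: ["gateway.networking.k8s.io"]
      kinds: ["HTTPRoute"]
```

Once the audit results look clean, switching enforcementAction to deny (the default) turns it into a hard gate.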

Solution 3: The ‘Nuclear’ Option – One Gateway Per Tenant

Sometimes, the teams you support have such different security postures, performance requirements, or blast radiuses that sharing a single gateway, even with policies, is too risky. When you need absolute, hard-shelled isolation, you give each tenant their own Gateway.

In this model, the prod-billing team gets their own Gateway resource living in their own prod-billing namespace. This gateway is configured to *only* listen for routes from its own namespace.

Example Tenant-Specific `Gateway`:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: billing-services-gateway
  namespace: prod-billing  # <-- Lives with the app!
spec:
  gatewayClassName: gke-l7-gxlb
  listeners:
  - name: https-billing
    hostname: "billing.techresolve.com" # <-- More specific hostname
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: billing-techresolve-com-tls
    allowedRoutes:
      namespaces:
        from: Same # <-- The magic! Only allows routes from prod-billing.

This is the “nuclear” option for a reason. Depending on your GatewayClass (your ingress controller), each new Gateway resource might provision a new, dedicated cloud load balancer. This can get expensive and complex to manage from a networking and DNS perspective. But for that high-value tenant like the billing team, the cost of total isolation can be well worth the peace of mind.
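
For completeness, here’s a sketch of a route the billing team might attach to their own Gateway. The route name, Service name, and port are assumptions; because the route lives in the same namespace as the Gateway, parentRefs doesn’t need a namespace field.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: billing-api                  # hypothetical route name
  namespace: prod-billing            # same namespace as the Gateway
spec:
  parentRefs:
  - name: billing-services-gateway   # namespace defaults to the route's own
  hostnames:
  - "billing.techresolve.com"        # must match the listener's hostname
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/charge
    backendRefs:
    - name: billing-svc              # hypothetical Service
      port: 8080                     # hypothetical port
```

A route with the same hostname in any other namespace simply can’t attach to this Gateway, no matter what it puts in parentRefs.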

Which one is right for you?

Solution	Best For	Downside
1. Allowed Routes	Small to medium teams; preventing accidents from unapproved namespaces.	Doesn’t solve conflicts within or between allowed namespaces.
2. OPA Gatekeeper	Large organizations requiring granular, automated, cluster-wide rules.	High complexity to set up and maintain.
3. Per-Tenant Gateway	High-security or high-stakes tenants needing total isolation.	Can be expensive and increase infrastructure overhead.

In the end, the awkwardness of the Gateway API in a multi-tenant world comes from its inherent trust in its users. As platform engineers, our job is to replace that trust with verification. Start with the simplest fence (allowedRoutes), and if your tenants keep finding ways to drive through it, don’t be afraid to bring in the concrete barriers of policy enforcement or dedicated infrastructure.

👉 Read the original article on TechResolve.blog

