José Gonçalves
Building Scalable Delivery Platforms in Brazil: What We Learned Powering 30+ Brands

Brazil's delivery market is one of the most competitive in the world. iFood dominates with 80%+ market share, Rappi fights for second place, and hundreds of regional players compete for what's left. But here's something most people don't realize: behind many of those regional delivery brands, there's often a single white-label platform powering everything.

At Mind Group Technologies, we built that platform. Since 2018, our multi-tenant delivery infrastructure has powered 30+ brands across Brazil — from food delivery to pharmacy logistics to grocery services. Each brand has its own identity, its own customers, and its own business rules. But under the hood, they share a battle-tested codebase.

Here's what we learned building it.

Why White-Label Delivery Works in Brazil

Brazil has 5,570 municipalities. iFood operates in roughly 1,500 of them. That leaves over 4,000 cities where local entrepreneurs see an opportunity to build delivery services tailored to their communities.

These entrepreneurs don't have $5 million to build a delivery platform from scratch. They need something that works out of the box but can be customized — their brand, their rules, their pricing model.

That's the white-label proposition: instead of building from zero, you get a proven platform with your logo on it. Your customers never know (or care) that the same technology powers the delivery app in the next city over.

The economics are compelling:

  • Custom build: $200K-500K upfront + $15K-30K/month maintenance
  • White-label platform: $2K-8K/month, operational in 2-4 weeks
  • Time to market: 12-18 months vs 2-4 weeks

The Technical Architecture That Makes It Work

Multi-Tenant Core

Every delivery platform needs to handle orders, payments, logistics, and communication. In a multi-tenant system, these are shared services with tenant-specific configuration.

┌─────────────────────────────────────────┐
│           API Gateway (Kong)             │
│    Tenant resolution via subdomain/JWT   │
├─────────────────────────────────────────┤
│                                         │
│  ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│  │  Order   │ │ Payment  │ │Logistics│ │
│  │ Service  │ │ Service  │ │ Service │ │
│  │(Node.js) │ │  (Go)    │ │  (Go)   │ │
│  └──────────┘ └──────────┘ └─────────┘ │
│                                         │
│  ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│  │  User    │ │  Store   │ │  Notif  │ │
│  │ Service  │ │ Service  │ │ Service │ │
│  │(Node.js) │ │(Node.js) │ │(Node.js)│ │
│  └──────────┘ └──────────┘ └─────────┘ │
│                                         │
├─────────────────────────────────────────┤
│    PostgreSQL (RLS) │ Redis │ RabbitMQ  │
└─────────────────────────────────────────┘

The API Gateway handles tenant resolution. Every request carries a tenant identifier — either through the subdomain (brand-a.delivery.com vs brand-b.delivery.com) or embedded as a claim in the JWT. Downstream services never operate without tenant context.
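As a rough sketch, that resolution step looks like an Express-style middleware. Assume this is illustrative, not our production code: `lookupTenant` and `verifyJwt` are placeholders for a real subdomain lookup and JWT verification.

```javascript
// Illustrative middleware: resolve tenant from subdomain or a JWT claim.
// lookupTenant(subdomain) and verifyJwt(token) are placeholder dependencies.
function tenantResolver({ lookupTenant, verifyJwt }) {
  return async function (req, res, next) {
    let tenantId = null;

    // 1. Try the subdomain: brand-a.delivery.com -> "brand-a"
    const host = req.hostname || '';
    const subdomain = host.split('.')[0];
    if (subdomain && subdomain !== 'www') {
      tenantId = await lookupTenant(subdomain);
    }

    // 2. Fall back to a tenant claim inside the JWT
    if (!tenantId && req.headers.authorization) {
      const token = req.headers.authorization.replace(/^Bearer /, '');
      const claims = verifyJwt(token);
      tenantId = claims && claims.tenant_id;
    }

    // No tenant, no service: downstream never runs without context
    if (!tenantId) {
      return res.status(400).json({ error: 'unresolved tenant' });
    }

    req.tenantId = tenantId;
    res.setHeader('X-Tenant-Id', tenantId);
    next();
  };
}
```

The important property is the hard failure at the end: a request that cannot be attributed to a tenant is rejected at the edge instead of reaching a service with ambiguous context.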

Database Strategy: PostgreSQL + Row-Level Security

We use a shared PostgreSQL database with Row-Level Security (RLS). Every table has a tenant_id column, and RLS policies ensure that queries are automatically filtered by tenant.

-- Enable RLS on the table first; policies have no effect without this
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Create policy for orders table
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Set tenant context at connection level
SET app.current_tenant = 'tenant-uuid-here';

-- This query automatically returns only current tenant's orders
SELECT * FROM orders WHERE status = 'pending';

Why RLS over application-level filtering? Because application-level filtering relies on developers never forgetting a WHERE clause. In a codebase with 50+ developers touching it over the years, someone WILL forget. RLS makes isolation a database-level guarantee, not an application-level hope.
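On the application side, the tenant context has to be set on every pooled connection before queries run. A minimal sketch with node-postgres — `withTenant` is an illustrative helper, not our actual API — wraps the work in a transaction so the setting is scoped with `SET LOCAL` semantics and never leaks back into the pool:

```javascript
// Illustrative helper: run a unit of work with the RLS tenant context set.
// set_config(..., true) behaves like SET LOCAL, so the setting dies with
// the transaction and the pooled connection comes back clean.
async function withTenant(pool, tenantId, fn) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      "SELECT set_config('app.current_tenant', $1, true)",
      [tenantId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

// Usage: RLS policies now filter automatically, no WHERE tenant_id needed
// const orders = await withTenant(pool, tenantId, (c) =>
//   c.query("SELECT * FROM orders WHERE status = 'pending'")
// );
```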

Real-Time Order Tracking

Delivery platforms live or die by real-time tracking. Customers want to see their order moving on a map. Restaurants want to know when a driver is approaching. Drivers need efficient routing.

Our stack:

  • WebSocket gateway for real-time client connections
  • Redis Pub/Sub for broadcasting location updates
  • PostGIS for geospatial queries (nearest driver, delivery radius)
  • Google Maps / Mapbox for routing and ETA calculation

// Simplified driver location update flow
async function updateDriverLocation(driverId, tenantId, lat, lng) {
  // Store in PostGIS for geospatial queries
  await db.query(`
    UPDATE drivers 
    SET location = ST_SetSRID(ST_MakePoint($1, $2), 4326),
        updated_at = NOW()
    WHERE id = $3 AND tenant_id = $4
  `, [lng, lat, driverId, tenantId]);

  // Broadcast to connected clients via Redis Pub/Sub
  await redis.publish(`location:${tenantId}:${driverId}`, 
    JSON.stringify({ lat, lng, timestamp: Date.now() })
  );

  // Update active order ETAs
  await recalculateETAs(driverId, tenantId);
}
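On the other side of that publish, the WebSocket gateway subscribes to the tenant's location channels and fans updates out to connected clients. A simplified sketch, assuming a socket registry keyed by `tenantId:driverId` (the channel naming follows the publisher above; the registry shape is illustrative):

```javascript
// Illustrative fan-out: route Redis pub/sub messages to WebSocket clients.
// subscriptions: Map<"tenantId:driverId", Set<socket>>, maintained as
// customers open and close order-tracking views.
function makeLocationFanout(subscriptions) {
  return function onMessage(channel, message) {
    // Channel format matches the publisher: location:<tenantId>:<driverId>
    const [prefix, tenantId, driverId] = channel.split(':');
    if (prefix !== 'location') return 0;

    const sockets = subscriptions.get(`${tenantId}:${driverId}`);
    if (!sockets) return 0;

    let delivered = 0;
    for (const ws of sockets) {
      if (ws.readyState === 1 /* OPEN */) {
        ws.send(message); // already JSON-encoded by the publisher
        delivered++;
      }
    }
    return delivered;
  };
}
```

Because the tenant is baked into the channel name, a client tracking an order for brand A can never receive a location update belonging to brand B.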

Payment Processing for Brazil

Payments in Brazil are unique. Pix (instant payment) now accounts for 40%+ of digital transactions. Credit card installments (parcelamento) are standard — customers expect to pay for a R$100 order in three installments ("3x") of R$33.33.

Our payment service handles:

  • Pix (instant, QR code or copy-paste)
  • Credit/debit cards via payment gateways (Mercado Pago, PagSeguro, Stripe Brazil)
  • Installments (parcelamento) up to 12x
  • Cash on delivery (still common in smaller cities)
  • Vouchers and loyalty points

Each tenant can configure which payment methods they accept and set their own installment rules. The payment service abstracts gateway differences so tenant configuration is just feature flags.
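A sketch of that abstraction: each gateway adapter exposes the same `charge` interface, and the tenant's flags decide which methods are available and how far installments can go. Class and config names here are illustrative, not our actual service:

```javascript
// Illustrative payment abstraction: gateways share one interface,
// tenant configuration decides which methods are exposed.
class PaymentService {
  constructor(gateways, tenantConfig) {
    this.gateways = gateways;   // e.g. { pix: pixAdapter, card: cardAdapter }
    this.config = tenantConfig; // per-tenant feature flags
  }

  availableMethods() {
    const methods = [];
    if (this.config.enable_pix_payment) methods.push('pix');
    if (this.config.enable_card_payment) methods.push('card');
    if (this.config.enable_cash_on_delivery) methods.push('cash');
    return methods;
  }

  async charge(method, order) {
    if (!this.availableMethods().includes(method)) {
      throw new Error(`method ${method} not enabled for tenant`);
    }
    // Installment limits come from tenant rules, not hardcoded values
    if (method === 'card' && order.installments) {
      const max = this.config.max_installments || 12;
      if (order.installments > max) {
        throw new Error(`max ${max} installments for this tenant`);
      }
    }
    return this.gateways[method].charge(order);
  }
}
```

The gateway adapters (Mercado Pago, PagSeguro, Stripe) live behind that `charge` call, so switching a tenant between gateways is a configuration change rather than a code change.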

Scaling Challenges We Didn't Expect

The Friday Night Problem

In delivery, traffic is not uniform. Friday and Saturday nights between 7 and 10 PM can see 10x the traffic of a Tuesday morning. For a multi-tenant platform, that means 30+ brands all spiking simultaneously.

Our solution: Kubernetes Horizontal Pod Autoscaler (HPA) with custom metrics based on order volume per tenant, not just CPU. We also implemented tenant-level rate limiting to prevent a single brand's promotion from degrading service for everyone else.
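The tenant-level rate limiter can be as simple as a token bucket keyed by tenant, so one brand's flash promotion burns its own budget instead of everyone's. A minimal in-memory sketch (a production version would back the buckets with Redis so all gateway replicas share state):

```javascript
// Illustrative per-tenant token bucket. Each tenant refills at its own
// rate, so one tenant's burst cannot exhaust another tenant's capacity.
class TenantRateLimiter {
  constructor({ capacity, refillPerSec }) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.buckets = new Map(); // tenantId -> { tokens, last }
  }

  allow(tenantId, now = Date.now()) {
    let bucket = this.buckets.get(tenantId);
    if (!bucket) {
      bucket = { tokens: this.capacity, last: now };
      this.buckets.set(tenantId, bucket);
    }
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (now - bucket.last) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSec * this.refillPerSec
    );
    bucket.last = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true; // request admitted
    }
    return false; // this tenant is over budget; others are unaffected
  }
}
```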

Menu Synchronization

Restaurants change their menus constantly. Items go out of stock mid-shift. Prices change for promotions. Across 30+ brands with hundreds of restaurants each, menu sync becomes a distributed systems problem.

We built a menu service with:

  • Eventual consistency model (menus sync within 30 seconds)
  • Optimistic UI (customer sees menu, stock validated at order time)
  • Webhook-based restaurant integrations for POS systems
  • Cache invalidation strategy using Redis with tenant-namespaced keys
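The key discipline is the important part: every cache key carries the tenant, so invalidating one brand's menu can never touch another's. A sketch of the scheme, assuming an ioredis-style client (the helper names and `db.loadMenu` loader are illustrative):

```javascript
// Illustrative tenant-namespaced cache keys for menus.
const menuKey = (tenantId, storeId) => `tenant:${tenantId}:menu:${storeId}`;

async function getMenu(redis, db, tenantId, storeId) {
  const key = menuKey(tenantId, storeId);
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const menu = await db.loadMenu(tenantId, storeId); // placeholder loader
  // Short TTL: even a missed invalidation converges within ~30 seconds,
  // matching the eventual-consistency window above
  await redis.set(key, JSON.stringify(menu), 'EX', 30);
  return menu;
}

// A POS webhook invalidates exactly one tenant's menu for one store
async function onMenuChanged(redis, tenantId, storeId) {
  await redis.del(menuKey(tenantId, storeId));
}
```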

Driver Assignment Algorithm

Assigning drivers to orders is an optimization problem. You want to minimize delivery time, maximize driver utilization, and keep things fair. We use a scoring algorithm:

score = (proximity_weight × distance_score) 
      + (availability_weight × idle_time_score)
      + (performance_weight × rating_score)
      + (fairness_weight × orders_today_score)

Weights are configurable per tenant. Some brands prioritize speed (higher proximity weight), others prioritize fairness for drivers (higher fairness weight).
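In code, the score is a straight weighted sum over normalized sub-scores. The normalizations below are illustrative (ours are tuned per market), but the structure mirrors the formula above:

```javascript
// Illustrative driver scoring: weighted sum of sub-scores, each in [0, 1].
function scoreDriver(driver, weights) {
  // Closer is better: decays toward 0 as distance grows
  const distanceScore = 1 / (1 + driver.distanceKm);
  // Longer idle time is better (fair rotation), saturating at 30 minutes
  const idleScore = Math.min(driver.idleMinutes / 30, 1);
  // Rating on a 1-5 scale mapped to [0, 1]
  const ratingScore = (driver.rating - 1) / 4;
  // Fewer orders today is better for fairness
  const fairnessScore = 1 / (1 + driver.ordersToday);

  return (
    weights.proximity * distanceScore +
    weights.availability * idleScore +
    weights.performance * ratingScore +
    weights.fairness * fairnessScore
  );
}

// Assign the order to the best-scoring candidate
function assignDriver(drivers, weights) {
  return drivers.reduce((best, d) =>
    scoreDriver(d, weights) > scoreDriver(best, weights) ? d : best
  );
}
```

With the per-tenant weights, the same candidate pool produces different assignments: raise `proximity` and the nearest driver wins; raise `fairness` and the driver with the fewest orders today wins.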

Feature Flag Architecture

With 30+ brands, each with different requirements, feature flags are essential. We use a hierarchical configuration system:

{
  "platform_defaults": {
    "max_delivery_radius_km": 10,
    "enable_pix_payment": true,
    "enable_installments": true,
    "enable_cash_on_delivery": true,
    "enable_driver_tips": true,
    "enable_scheduled_orders": false
  },
  "tenant_overrides": {
    "brand-pharmacy": {
      "enable_prescription_upload": true,
      "max_delivery_radius_km": 20,
      "enable_scheduled_orders": true,
      "required_documents": ["prescription"]
    },
    "brand-grocery": {
      "enable_item_substitution": true,
      "enable_weight_based_pricing": true
    }
  }
}

This means a pharmacy brand gets prescription upload and larger delivery radius, while a grocery brand gets item substitution — all from the same codebase.
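Resolving a flag is then a single merge: tenant overrides win, platform defaults fill the gaps. A minimal sketch (`resolveConfig` is an illustrative name):

```javascript
// Illustrative flag resolution: tenant overrides layered over
// platform defaults. Unknown tenants simply get the defaults.
function resolveConfig(config, tenantId) {
  const overrides = config.tenant_overrides[tenantId] || {};
  return { ...config.platform_defaults, ...overrides };
}
```

Applied to the config above, `resolveConfig(config, 'brand-pharmacy')` yields a 20 km radius and scheduled orders while still inheriting Pix support from the platform defaults.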

Monitoring and Observability

With 30+ tenants, when something breaks, you need to know WHICH tenant is affected and WHY. Our observability stack:

  • Distributed tracing (Jaeger) with tenant_id in every span
  • Metrics (Prometheus + Grafana) with tenant-level dashboards
  • Alerting based on per-tenant SLOs (99.9% order success rate)
  • Log aggregation (ELK stack) with tenant-indexed logs

The key insight: aggregate metrics hide problems. If overall order success rate is 99.5%, that sounds fine. But if one tenant has 95% success and the others have 99.9%, that one tenant is having a terrible experience. Per-tenant SLOs catch this.

Business Metrics That Matter

After 6+ years running this platform, here are the numbers that actually matter:

Metric                              Value
──────────────────────────────────  ─────────────────
Time to onboard new brand           2-4 weeks
Average monthly orders per brand    15K-50K
Platform uptime (2025)              99.95%
Cost per brand (infrastructure)     ~$800-1,200/month
Order processing latency (p95)      < 200ms
Driver assignment time (p95)        < 3 seconds

Lessons for Anyone Building Multi-Tenant Platforms

  1. Start multi-tenant from day one. We didn't, and the migration cost us 4 months. If you know you'll have multiple tenants, build for it immediately.

  2. RLS is not optional. Application-level tenant filtering will eventually fail. Database-level isolation (PostgreSQL RLS) is the only approach that scales safely.

  3. Feature flags > code forks. Never maintain separate codebases per tenant. Feature flags with a hierarchical config system handle 99% of customization needs.

  4. Per-tenant observability is essential. Aggregate metrics lie. Build dashboards that let you see each tenant's health independently.

  5. Cache namespacing saves lives. Every cache key must include tenant_id. One cache poisoning incident across tenants, and you'll wish you'd done this from the start.

  6. Plan for traffic spikes. In delivery, peak traffic is 10x baseline. Auto-scaling with custom metrics (not just CPU) is the only way to handle this reliably.

What's Next

We're currently exploring AI-powered features for the platform: demand prediction (so restaurants can prep ingredients before the rush), dynamic pricing optimization, and automated customer support through conversational AI.

The delivery market in Brazil continues to grow, especially in cities outside the major metros. The white-label model is perfectly positioned for this expansion because it gives local entrepreneurs the technology they need without the cost of building from scratch.

If you're building multi-tenant platforms — whether for delivery or any other vertical — the architectural principles are the same: strict data isolation, configurable feature flags, per-tenant observability, and infrastructure that scales with demand.


José Gonçalves is the Founder of Mind Group Technologies, a software company based in Sorocaba, Brazil, building multi-tenant platforms for delivery, healthcare, fintech, and education. Learn more at mindconsulting.com.br.
