From Monolith to Microservices: A Real Migration Story From a Brazilian Software Company

Two years ago, our monolithic application was killing us. Not metaphorically — it was literally costing us clients.

At Mind Group Technologies, we run a white-label SaaS platform that powers 30+ brands across delivery, fintech, healthcare, and education in Brazil and Latin America. By 2022, our monolith had grown into a 400K+ line codebase that took 45 minutes to deploy and crashed in ways that affected every single client simultaneously.

This is the story of how we migrated to microservices — the decisions we got right, the mistakes we made, and the actual numbers from before and after.

Why We Had to Move

Our monolith wasn't always a problem. From 2016 to 2020, it was perfect. One codebase, one deployment, one team that understood everything. We could ship features fast because there was zero coordination overhead.

The breaking point came at scale:

Deploy time: 45 minutes for a full deployment. During that time, all 30+ brands experienced downtime or degraded performance. We were deploying during off-peak hours (2 AM BRT), which meant engineers working ungodly hours.

Blast radius: A bug in the healthcare module once took down the delivery platforms. A memory leak in a reporting feature crashed the entire application. Every failure was a total failure.

Team bottlenecks: With 15 engineers working on one codebase, merge conflicts were constant. Feature branches diverged for weeks. Code reviews became archaeological expeditions through unrelated changes.

Scaling limitations: Our fintech module needed 10x more compute during month-end reconciliation. But scaling the monolith meant scaling everything — including the education module that was barely used at night.

Database contention: One massive PostgreSQL database serving all modules. Complex queries from the reporting module were slowing down real-time operations in the delivery module.

The final straw: a client threatened to leave because our deployment windows conflicted with their peak business hours. We couldn't deploy without downtime, and we couldn't avoid deploying because features and fixes were piling up.

The Migration Plan

We didn't do a big-bang rewrite. That's how companies die. Instead, we used the Strangler Fig Pattern — gradually extracting services from the monolith while keeping it running.

Phase 1: Identify Service Boundaries (Months 1-2)

We mapped our monolith's functionality into domain boundaries:

  • Auth Service: Authentication, authorization, JWT management
  • Tenant Service: Tenant configuration, feature flags, branding
  • Delivery Service: Order management, driver dispatch, tracking
  • Fintech Service: Transactions, reconciliation, KYC
  • Healthcare Service: Patient records, appointments, triage
  • Education Service: Courses, progress tracking, assessments
  • Notification Service: Email, SMS, push notifications
  • Reporting Service: Analytics, dashboards, data exports

The key insight: service boundaries should follow business domains, not technical layers. Don't create a "database service" or a "validation service." Create services that own a complete business capability.

Phase 2: Extract the First Service (Months 3-4)

We started with the Notification Service because:

  • It had the clearest boundary (sends messages, receives events)
  • Low risk — if notifications fail, the core business continues
  • High value — it was one of the biggest performance bottlenecks

The extraction process:

  1. Built the new Notification Service as a standalone Node.js application
  2. Created a message queue (RabbitMQ) between the monolith and the new service
  3. Modified the monolith to publish events instead of sending notifications directly
  4. Deployed both systems in parallel, with the monolith as the primary
  5. Gradually shifted traffic to the new service
  6. Removed notification code from the monolith

// Before: Monolith sends notifications directly
async function createOrder(orderData) {
  const order = await db.orders.create(orderData);
  await sendEmail(order.customer.email, 'Order Confirmed', template);
  await sendSMS(order.customer.phone, 'Your order is confirmed');
  await sendPush(order.customer.deviceToken, 'Order Confirmed');
  return order;
}

// After: Monolith publishes event, service handles notifications
async function createOrder(orderData) {
  const order = await db.orders.create(orderData);
  await messageQueue.publish('order.created', {
    orderId: order.id,
    tenantId: order.tenantId,
    customerEmail: order.customer.email,
    customerPhone: order.customer.phone,
    deviceToken: order.customer.deviceToken
  });
  return order;
}
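
For completeness, here is roughly what the consuming side looked like: a minimal sketch of the Notification Service subscribing with amqplib. The exchange and queue names are illustrative, and sendEmail/sendSMS/template stand in for the same helpers the monolith used.

const amqp = require('amqplib');

async function start() {
  const conn = await amqp.connect(process.env.RABBITMQ_URL);
  const channel = await conn.createChannel();

  // Durable topic exchange so each service can subscribe to event patterns
  await channel.assertExchange('events', 'topic', { durable: true });
  const { queue } = await channel.assertQueue('notifications.order-created', { durable: true });
  await channel.bindQueue(queue, 'events', 'order.created');

  channel.consume(queue, async (msg) => {
    const event = JSON.parse(msg.content.toString());
    await sendEmail(event.customerEmail, 'Order Confirmed', template);
    await sendSMS(event.customerPhone, 'Your order is confirmed');
    channel.ack(msg); // ack only after the sends succeed
  });
}

start();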

Phase 3: Extract High-Value Services (Months 5-12)

After Notification, we extracted in this order:

  1. Auth Service — critical for security isolation
  2. Tenant Service — foundation for multi-tenant microservices
  3. Fintech Service — needed independent scaling for month-end
  4. Delivery Service — needed independent scaling for lunch rush
  5. Healthcare Service — needed strict LGPD compliance isolation
  6. Education Service — lowest traffic, extracted last

Phase 4: Kill the Monolith (Months 13-18)

The remaining monolith became a thin API gateway that routed requests to the appropriate service. Eventually, we replaced it with a proper API gateway (Kong) and the monolith was officially retired.
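
Before Kong arrived, the thin-gateway stage looked something like the sketch below, using Express with http-proxy-middleware. The route table and internal hostnames are illustrative, not our production config.

const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');

const app = express();

// Route prefixes map straight to internal service hostnames
const routes = {
  '/api/auth': 'http://auth-service',
  '/api/tenants': 'http://tenant-service',
  '/api/delivery': 'http://delivery-service',
  '/api/fintech': 'http://fintech-service',
};

for (const [prefix, target] of Object.entries(routes)) {
  app.use(prefix, createProxyMiddleware({ target, changeOrigin: true }));
}

app.listen(8080);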

The Technical Stack

Services: Node.js (Express) for API services, Go for high-performance services (delivery tracking, real-time notifications)

Communication:

  • Synchronous: gRPC for service-to-service calls that need immediate responses
  • Asynchronous: RabbitMQ for event-driven communication (order created, payment processed, etc.)
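
On the synchronous side, a service-to-service call looks roughly like this with @grpc/grpc-js. The proto file, service, and method names are hypothetical stand-ins:

const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

// tenant.proto and TenantService are hypothetical
const packageDef = protoLoader.loadSync('tenant.proto');
const proto = grpc.loadPackageDefinition(packageDef).tenant;

const client = new proto.TenantService(
  'tenant-service:50051',
  grpc.credentials.createInsecure()
);

// A unary call where the caller needs the answer before proceeding
client.getTenantConfig({ tenantId: 'brand-42' }, (err, config) => {
  if (err) throw err;
  console.log(config.featureFlags);
});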

Database: Each service owns its own PostgreSQL database. No shared databases. The Fintech service also uses Redis for real-time transaction caching.
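
The Redis layer follows the standard cache-aside shape. A minimal sketch with node-redis; the key format, TTL, and db helper are illustrative:

const { createClient } = require('redis');

const redis = createClient({ url: process.env.REDIS_URL });
// redis.connect() runs once at service startup

async function getTransaction(txId) {
  const cached = await redis.get(`tx:${txId}`);
  if (cached) return JSON.parse(cached);

  const tx = await db.transactions.findById(txId); // the service's own Postgres
  await redis.setEx(`tx:${txId}`, 60, JSON.stringify(tx)); // short TTL keeps it fresh
  return tx;
}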

Deployment: Kubernetes (EKS on AWS). Each service runs as a separate deployment with independent scaling policies.
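
Independent scaling mostly comes down to one HorizontalPodAutoscaler per Deployment. A sketch of what the fintech policy could look like; names and thresholds are illustrative:

# Illustrative names and thresholds
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fintech-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fintech-service
  minReplicas: 2
  maxReplicas: 20   # headroom for month-end reconciliation spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70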

Service Mesh: Initially none — we added Istio after reaching 8 services because managing service-to-service communication, retries, and circuit breakers at the application level was becoming unsustainable.

Observability:

  • Distributed tracing: Jaeger (every request gets a trace ID that flows through all services; sketch after this list)
  • Logging: Structured JSON logs → CloudWatch → Elasticsearch
  • Metrics: Prometheus + Grafana
  • Alerting: PagerDuty for critical alerts, Slack for warnings
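
The trace propagation mentioned above is simple in principle: accept an incoming trace ID or mint one, tag every log line with it, and forward it on outbound calls. A sketch of the idea in Express, with the header name and logger illustrative (in practice the Jaeger client libraries did most of this for us):

const { randomUUID } = require('crypto');

function traceMiddleware(req, res, next) {
  // Accept the caller's trace ID or start a new trace
  req.traceId = req.headers['x-trace-id'] || randomUUID();
  res.setHeader('x-trace-id', req.traceId);
  next();
}

function log(req, message) {
  // Structured JSON logs, as shipped to CloudWatch/Elasticsearch
  console.log(JSON.stringify({ traceId: req.traceId, message, ts: Date.now() }));
}

// Outbound calls forward the same header so Jaeger can stitch the spans
async function callService(req, url) {
  return fetch(url, { headers: { 'x-trace-id': req.traceId } });
}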

The Numbers: Before vs After

Metric                        | Monolith          | Microservices
Deploy time                   | 45 min (full app) | 3-5 min (per service)
Deploy frequency              | 2x/week           | 10-15x/week
Blast radius                  | All 30+ brands    | Only affected service
Scale granularity             | All or nothing    | Per-service
Mean Time to Recovery         | 30-60 min         | 5-10 min
Monthly infrastructure cost   | $12K              | $18K
Developer velocity (PRs/week) | ~25               | ~45

The infrastructure cost went UP. This is the dirty secret of microservices that nobody tells you. Running Kubernetes, a service mesh, distributed tracing, and multiple databases costs more than a single server. But the business value — faster deploys, independent scaling, fault isolation — more than compensated for the extra spend.

The Mistakes We Made

1. Extracting too many services at once
In Month 5, we tried extracting Auth, Tenant, and Fintech simultaneously. Three major surgeries on a running system. We had cascading failures for two weeks. Lesson: one service extraction at a time.

2. Distributed transactions
We underestimated how hard distributed transactions would be. When a fintech payment requires updating the order status in the delivery service AND creating an audit log in the compliance service, you need a saga pattern or eventual consistency. We spent a month building a saga orchestrator.
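
The core idea is small even if the production version wasn't. A minimal sketch of saga-style compensation: run each step in order, and on failure undo everything already committed, in reverse. The service clients and step names are illustrative:

async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
    }
  } catch (err) {
    for (const step of completed.reverse()) {
      await step.undo(); // best-effort compensation, in reverse order
    }
    throw err;
  }
}

// The payment flow described above, expressed as saga steps
function settlePayment(tx, orderId) {
  return runSaga([
    { run: () => fintech.capturePayment(tx), undo: () => fintech.refundPayment(tx) },
    { run: () => delivery.markPaid(orderId), undo: () => delivery.markUnpaid(orderId) },
    { run: () => compliance.writeAudit(tx), undo: () => compliance.voidAudit(tx) },
  ]);
}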

3. Data duplication anxiety
We initially tried to avoid duplicating data across services by calling other services for every piece of data. This created a web of synchronous dependencies that was worse than the monolith. Eventually we embraced data duplication — each service stores the data it needs, synced via events.
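
Concretely, "synced via events" means each service consumes the events it cares about and upserts a local copy. A sketch reusing the RabbitMQ channel from earlier; the event name, fields, and db helper are illustrative:

channel.consume('delivery.customer-updated', async (msg) => {
  const event = JSON.parse(msg.content.toString());

  // Upsert into the Delivery service's OWN database; no synchronous
  // call to whichever service owns customer records
  await db.customers.upsert({
    id: event.customerId,
    name: event.name,
    phone: event.phone,
  });

  channel.ack(msg);
});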

4. Skipping contract testing
Without contract testing, Service A would change its API response format without telling Service B. Things would break in production because our integration tests didn't cover every cross-service interaction. We adopted Pact for contract testing and it eliminated this class of bugs.
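
A Pact consumer test looks roughly like the sketch below (mocha-style); the service names and endpoint are illustrative. The consumer records what it expects, and the provider's CI verifies that recorded contract before anything ships:

const assert = require('assert');
const { Pact, Matchers } = require('@pact-foundation/pact');

const provider = new Pact({ consumer: 'delivery-service', provider: 'auth-service', port: 8992 });

describe('auth-service contract', () => {
  before(() => provider.setup());
  afterEach(() => provider.verify());
  after(() => provider.finalize());

  it('validates a session token', async () => {
    await provider.addInteraction({
      state: 'a valid session exists',
      uponReceiving: 'a token validation request',
      withRequest: { method: 'GET', path: '/v1/session', headers: { Authorization: 'Bearer abc' } },
      willRespondWith: { status: 200, body: Matchers.like({ userId: 'u1', tenantId: 't1' }) },
    });

    const res = await fetch('http://localhost:8992/v1/session', {
      headers: { Authorization: 'Bearer abc' },
    });
    assert.strictEqual(res.status, 200);
  });
});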

5. Over-engineering from day one
We added Istio (service mesh) before we needed it. For 3-4 services, simple HTTP with retry logic is fine. We should have waited until the complexity justified the operational overhead.
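
For scale: this is about all the resilience we actually needed at that stage. A sketch of retry with exponential backoff:

async function withRetry(fn, attempts = 3, baseDelayMs = 200) {
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i)); // 200ms, 400ms, 800ms...
    }
  }
}

// Usage: withRetry(() => fetch('http://tenant-service/api/tenants/42').then((r) => r.json()))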

What We'd Do Differently

If we started the migration today:

  1. Start with an API gateway immediately — even before extracting the first service. This gives you a single entry point and makes routing changes invisible to clients.

  2. Invest in a shared library for cross-cutting concerns — tenant context, logging, tracing, auth validation. Every service needs these, and building them independently leads to inconsistency.

  3. Use event sourcing for the fintech module — instead of traditional CRUD, event sourcing would have given us a natural audit trail and made distributed transactions simpler (see the sketch after this list).

  4. Hire a platform engineer earlier — we treated infrastructure as everyone's responsibility, which meant it was no one's responsibility. A dedicated platform engineer from Month 1 would have saved us months of debugging Kubernetes configs.

  5. Better staging environments — our staging environment didn't replicate the multi-service topology accurately. Bugs that appeared in production weren't reproducible in staging. We now use namespaces in Kubernetes to create per-developer staging environments.
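
On point 3, here is a hypothetical sketch of the event-sourcing shape: state is derived by replaying an append-only log, which doubles as the audit trail. The table, columns, and event types are invented for illustration, using a pg-style db.query:

async function appendEvent(accountId, type, payload) {
  await db.query(
    'INSERT INTO account_events (account_id, type, payload, created_at) VALUES ($1, $2, $3, now())',
    [accountId, type, JSON.stringify(payload)]
  );
}

async function getBalance(accountId) {
  const { rows } = await db.query(
    'SELECT type, payload FROM account_events WHERE account_id = $1 ORDER BY created_at',
    [accountId]
  );
  // Replaying the log yields current state AND a complete audit trail
  return rows.reduce((balance, e) => {
    const { amount } = JSON.parse(e.payload);
    return e.type === 'credited' ? balance + amount : balance - amount;
  }, 0);
}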

Is Microservices Right for You?

Honest answer: probably not, unless you have these conditions:

  • Multiple teams (4+ engineers minimum) working on the same codebase
  • Different scaling requirements across modules
  • Independent deployment needs (can't afford whole-system downtime)
  • Regulatory requirements that demand service isolation (healthcare, fintech)

If you're a team of 3 building an MVP, stay monolithic. A well-structured monolith with clear module boundaries is far better than premature microservices. We only needed microservices because we had 30+ brands depending on the same system with conflicting requirements.

The monolith served Mind Group Technologies well for four years. Microservices will serve us well for the next four. The architecture should match your scale, not your ambition.


José Gonçalves is the Founder of Mind Group Technologies, a software company based in Sorocaba, SP, Brazil. Mind Group builds multi-tenant SaaS platforms serving 30+ brands across healthcare, fintech, delivery, and education.
