What Nobody Tells You About Golden Paths at Scale

Your platform team just celebrated hitting 85% golden path adoption. Everyone is excited. Onboarding time for new hires dropped from three weeks to two days. New services spin up in minutes. Leadership loves the improved metrics.

Six months later, you've got 23 capability requests in your backlog. Your platform team is drowning. ML teams need custom GPU scheduling. The data team wants streaming pipeline patterns. API teams are rolling their own rate limiting because yours doesn’t fit their needs.

You nailed Day 1.

You're dying on Day 50.

This is the hidden scaling problem with golden paths. And it’s not solved by building more golden paths.


Golden Path Promise vs. What Actually Happens

The platform engineering playbook says golden paths reduce cognitive load and standardize practices across teams. They give developers a blessed path from code to production through self-service, accelerating feature development.

This works well for onboarding and early development. But creating new projects and features is maybe 1% of an application’s lifetime. The remaining 99% is operations, debugging, scaling, adding features, and handling edge cases.

Golden paths excel at the first 1%. They struggle with the rest.

Netflix learned this the hard way. They built a polished developer portal with documentation, recommended tools, and curated paths. Developers said it “wasn’t compelling enough” to change habits. Why?

Because it helped them start things, not run things.

The real work happens after deployment. That’s where centralized golden paths become bottlenecks.


Why Your Platform Team Hits a Ceiling

Your platform team can’t scale linearly with the organization. It’s just math.

Imagine:

  • 200 engineers across 20 teams
  • Each team with distinct needs:
    • ML teams need GPU scheduling, Kubeflow, model serving
    • Data teams want Kafka, Airflow, stream processing
    • API teams need rate limiting, circuit breakers, tracing
    • Mobile backend teams need push notification infrastructure
  • Platform team size: 6 generalists
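
Run the numbers: six generalists against twenty teams is roughly one platform engineer for every three to four teams. If even half of those teams file one domain-specific request per quarter, that's ten new capabilities per quarter landing on a group that is also maintaining everything it has already shipped.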

What Goes Wrong

Queue problem

Every capability funnels through the platform team. Prioritization becomes about who shouts loudest, not what delivers the most value.

Expertise problem

You build “good enough” solutions. ML teams need 12 GPU configurations. They get 3. It checks the box but doesn’t solve the problem.

Maintenance trap

You ship 30 capabilities over two years. Now you maintain all 30.

  • Kubernetes upgrade? Update 30 configs
  • Security patch? Test 30 capabilities
  • Team that requested capability #17 moved on? You still own it

Rigidity issue

Abstractions cover the 80% use case. The remaining 20% fights the platform or bypasses it entirely. This is abstraction debt.

Your platform team becomes the bottleneck for every capability, edge case, and new tool. That’s not sustainable.


Go With a Marketplace Approach

At KubeCon Atlanta, I discussed a different model.

Why should the platform team be the sole provider?

Why not turn the platform into a marketplace?

At a certain point, platform teams should stop being the builders of everything and become marketplace operators.

  • ML team contributes GPU scheduling
  • Data team contributes streaming pipelines
  • API team contributes rate limiting
  • Security team contributes authorization patterns

The platform provides the infrastructure for contribution, not every capability.


How the IDP Marketplace Model Works

Define clear interfaces

Expose APIs and standards for capability integration. Teams know exactly what to implement.
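
As a concrete illustration, the "interface" can be as small as a manifest every capability must fill in before it can register. This is a minimal sketch assuming a Python-based platform toolchain; the CapabilitySpec type and its field names are illustrative, not a real platform API.

```python
# Hypothetical capability contract: every field here is an assumption made
# for illustration, not an existing platform schema.
from dataclasses import dataclass, field


@dataclass
class CapabilitySpec:
    """What a marketplace capability declares before it can register."""
    name: str                   # unique capability name, e.g. "gpu-scheduling"
    owner_team: str             # team accountable for maintenance and on-call
    version: str                # semantic version of the capability
    health_endpoint: str        # path the platform probes for liveness
    metrics_endpoint: str       # where metrics are exposed for scraping
    docs_url: str               # runbook and usage documentation
    tier: str = "experimental"  # support tier label for the catalog
    dependencies: list[str] = field(default_factory=list)  # other required capabilities
```

A contributing team fills this in once; validation, the catalog, and on-call routing can all read from the same spec.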

Build contribution templates

Provide scaffolding so teams don’t guess how to package their capability.

Automate validation

Every contribution must pass automated checks:

  • Metrics exposure
  • Security scans
  • Documentation
  • Health checks
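
To make that concrete, here is a rough sketch of what the gate could look like in CI, assuming capabilities ship a manifest like the spec sketched earlier. Every function and field name here is hypothetical; a real pipeline would call an actual scanner and probe instead of reading pre-computed flags.

```python
# Illustrative validation gate for capability contributions. The manifest
# fields and the security_scan_passed flag are assumptions for this sketch.
import sys

REQUIRED_FIELDS = ["name", "owner_team", "health_endpoint", "metrics_endpoint", "docs_url"]


def validate_manifest(manifest: dict) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for field_name in REQUIRED_FIELDS:
        if not manifest.get(field_name):
            failures.append(f"missing required field: {field_name}")
    if not str(manifest.get("docs_url", "")).startswith("https://"):
        failures.append("docs_url must point at published documentation")
    if manifest.get("security_scan_passed") is not True:
        failures.append("security scan has not passed (CVEs or leaked secrets)")
    return failures


if __name__ == "__main__":
    example = {
        "name": "gpu-scheduling",
        "owner_team": "ml-platform",
        "health_endpoint": "/healthz",
        "metrics_endpoint": "/metrics",
        "docs_url": "https://wiki.example.com/gpu-scheduling",
        "security_scan_passed": True,
    }
    problems = validate_manifest(example)
    print("\n".join(problems) if problems else "capability passed validation")
    sys.exit(1 if problems else 0)
```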

Create recognition systems

Contribution isn’t charity. Track it. Reward it. Make it count in performance reviews.


Advantages of the IDP Marketplace Model

  • Parallel capability development instead of queues
  • Domain expertise embedded where it belongs
  • Platform team focuses on primitives, not products
  • Network effects drive adoption and value

Organizations running mature marketplace models see 3–4x faster capability development compared to centralized teams.


But Here’s the Part Nobody Talks About

After my KubeCon Atlanta talk, many teams shared stories of failed attempts at this approach.

Governance Breakdown

  • No quality standards, leading to capability sprawl
  • Developers don’t trust community contributions
  • Multiple poorly maintained implementations of the same thing

One organization had three different Postgres operators, none properly maintained. Teams gave up and installed Postgres manually.


Quality Problems

Capabilities work for the original team but fail later:

  • Security CVEs
  • Kubernetes upgrades
  • Hidden network assumptions

Nobody owns the fix. Capabilities become orphaned and unusable.


Contribution Friction

Platform APIs are complex. Contributing requires understanding:

  • Service meshes
  • CI/CD pipelines
  • Monitoring
  • Security policies

Only senior engineers contribute. Participation dies out.


Maintenance Nightmare

  • Kubernetes 1.35 drops. Who updates 40 capabilities?
  • Security patch lands. Who validates everything?
  • Production breaks at 3am. Who’s on call?

Prerequisites for Making Marketplaces Work

1. Platform Primitives That Enable Contribution

Capabilities must plug in without platform code changes. If every addition requires core modifications, your platform isn’t ready.
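
One way to picture this: the core only knows about a registry, and contributed capabilities add themselves to it from their own repos. The decorator and registry names below are made up for the sketch; real platforms usually get the same effect through packaging entry points or CRDs rather than in-process imports.

```python
# Minimal registry sketch: contributed capabilities register themselves,
# and the platform core never changes when a new one appears.
from typing import Callable, Dict

CAPABILITY_REGISTRY: Dict[str, Callable[[], None]] = {}


def register_capability(name: str):
    """Decorator used by contributing teams; hypothetical, for illustration."""
    def wrapper(install_fn: Callable[[], None]) -> Callable[[], None]:
        CAPABILITY_REGISTRY[name] = install_fn
        return install_fn
    return wrapper


# --- contributed by the data team, lives in their repository ---
@register_capability("streaming-pipelines")
def install_streaming_pipelines() -> None:
    print("provisioning Kafka topics and Airflow DAG templates...")


# --- platform core: iterates whatever has been registered ---
def install_all() -> None:
    for name, install in CAPABILITY_REGISTRY.items():
        print(f"installing capability: {name}")
        install()


if __name__ == "__main__":
    install_all()
```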


2. Enforced Quality Standards

  • Automated testing
  • Mandatory metrics and health checks
  • Security scanning for CVEs and secrets
  • Documentation requirements:
    • Runbooks
    • Troubleshooting guides
    • Usage examples

No documentation means it doesn't ship.


3. Ownership Beyond Initial Contribution

  • Define maintenance responsibilities upfront
  • Clear security patching ownership
  • Deprecation and migration policies
  • Explicit handoff mechanisms

“You build it, you own it for 12 months” is a valid rule.


4. Cultural Readiness

  • Inner-source culture already exists
  • Contributions count toward goals and reviews
  • Leadership supports contribution time

If leadership sees contribution as “not real work,” the marketplace fails.


Hybrid Approach

Don’t go all-in immediately.

  • Golden capabilities for common needs (70–80% of use cases)
  • Marketplace capabilities for specialized domains

Capability Tiers

  • Platform-blessed: maintained by the platform team, SLAs guaranteed
  • Community-maintained: supported by contributors, use at your own risk
  • Experimental: no stability guarantees

Clear expectations prevent surprises.
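
A lightweight way to keep those expectations visible is to encode the tier next to each catalog entry, so consumers see the support level before they adopt anything. The enum values and support_contact field below are illustrative, not an existing catalog format.

```python
# Sketch of tier metadata attached to catalog entries; names are assumptions.
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PLATFORM_BLESSED = "platform-blessed"   # platform team on call, SLAs guaranteed
    COMMUNITY = "community-maintained"      # contributors support it, best effort
    EXPERIMENTAL = "experimental"           # no stability guarantees


@dataclass
class CatalogEntry:
    capability: str
    tier: Tier
    support_contact: str   # who to page, or "none" for experimental entries


CATALOG = [
    CatalogEntry("postgres-operator", Tier.PLATFORM_BLESSED, "#platform-oncall"),
    CatalogEntry("gpu-scheduling", Tier.COMMUNITY, "#ml-platform"),
    CatalogEntry("edge-caching", Tier.EXPERIMENTAL, "none"),
]
```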


Next Step for You

If you’re hitting scaling issues:

  • Audit your backlog for domain-specific requests
  • Identify teams with deep expertise
  • Start with a low-risk pilot capability
  • Build templates and validation, not just docs
  • Establish governance before scale

If you’re building your first platform:

  • Start centralized
  • Design extensibility from day one
  • Avoid premature marketplace complexity

Real Insight

Platform maturity isn’t “build golden paths and stop.”

It’s:

  • Build golden paths
  • Recognize when they become bottlenecks
  • Evolve your model intentionally

Centralization gives control and consistency.

Marketplaces give scale and expertise.

Neither is perfect.

The right choice depends on your organization’s stage.

I explored platform marketplaces, governance models, and real-world failure modes at KubeCon Atlanta.

Want to discuss platform scaling or share your experience?

Connect with me on LinkedIn. If you’re struggling with platform engineering, contact our consultants—we help teams build platforms that actually scale.
