Building and Maintaining Enterprise Tools as a Solo Developer

#devops #backend #tooling #programming

👋 Let's Connect! Follow me on GitHub for new projects and tips.

Introduction

Anyone who has worked in an enterprise environment can tell you that the real constraint is operational attention. This guide focuses on patterns that reduce that load: tight scope, safe defaults, automated checks, and predictable maintenance. The examples that follow are illustrative sketches, not production-ready code.

Scope, Ownership, and Enterprise Criteria

Define criteria up front.

SLOs and blast radius
- Set a basic SLO (e.g., 99.5% monthly availability), plus a maximum acceptable data loss (RPO) and recovery time (RTO).
- Identify the blast radius: which teams, which workflows, what happens if it’s down.
Non-negotiables
- Authentication + authorization (no shared logins).
- Audit trail for sensitive actions.
- Backups + restore test.
- Observability: logs + metrics + error reporting.
- CI checks and repeatable deploys.
Explicit ownership
- Write down: who approves access, who onboards users, who rotates secrets, who can disable the tool.
- If the answer is always you, automate these tasks and document them so the tool does not depend on your memory.
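To make the SLO bullet concrete, here is a minimal TypeScript sketch that turns an availability target into a downtime budget. The function name `errorBudgetMinutes` and the fixed 30-day window are assumptions for illustration; calendar months vary in length.

```typescript
// Hypothetical helper: converts an availability SLO (in percent) into allowed downtime.
// Assumes a fixed window of windowDays days (30 by default).
function errorBudgetMinutes(sloPercent: number, windowDays = 30): number {
  if (sloPercent <= 0 || sloPercent > 100) {
    throw new RangeError("sloPercent must be in (0, 100]");
  }
  const windowMinutes = windowDays * 24 * 60;
  // Fraction of the window you may be unavailable, e.g. 0.5% for a 99.5% SLO.
  const allowedFraction = (100 - sloPercent) / 100;
  return windowMinutes * allowedFraction;
}

// 99.5% over 30 days leaves 216 minutes (3.6 hours) of downtime.
console.log(errorBudgetMinutes(99.5).toFixed(0)); // prints "216"
```

Writing the number down matters more than the exact formula: "3.6 hours a month" is something stakeholders can agree or disagree with.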

Pitfall: internal tools often become “critical” without being treated as such. Add a banner in the README: support hours, escalation path, and what to do if it breaks.

Architecture and Operations That Scale Down (Solo Friendly)

Optimize for simplicity under change.

Choose boring building blocks
- One service, one database, one deployment target.
- Prefer managed services for auth, DB, and secrets if your org provides them.
Data safety
- Use migrations, constraints, and idempotent writes.
- Add “dry run” modes for destructive operations.
- Prefer append-only audit tables for critical workflows.
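Idempotent writes mean a retried request (a double-clicked button, a job re-run after a timeout) cannot apply the same change twice. Below is a minimal in-memory TypeScript sketch, assuming the caller supplies an idempotency key; `GrantStore` and its fields are hypothetical. In Postgres you would enforce the same rule with a unique constraint on the key and `INSERT ... ON CONFLICT (idempotency_key) DO NOTHING`.

```typescript
// Hypothetical record type; the in-memory map stands in for a table
// with a UNIQUE constraint on idempotency_key.
type Grant = { idempotencyKey: string; userId: string; role: string };

class GrantStore {
  private readonly byKey = new Map<string, Grant>();

  // Applies the write at most once per key; a retry returns the original grant unchanged.
  apply(grant: Grant): { created: boolean; grant: Grant } {
    const existing = this.byKey.get(grant.idempotencyKey);
    if (existing) {
      return { created: false, grant: existing };
    }
    this.byKey.set(grant.idempotencyKey, grant);
    return { created: true, grant };
  }

  count(): number {
    return this.byKey.size;
  }
}

const grants = new GrantStore();
const request = { idempotencyKey: "req-123", userId: "u_123", role: "operator" };
console.log(grants.apply(request).created); // true
console.log(grants.apply(request).created); // false: the retry is a no-op
console.log(grants.count()); // 1
```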
Security baseline
- SSO/OIDC if possible; enforce MFA and short-lived sessions.
- RBAC: start with minimum roles (viewer/operator/admin).
- Least privilege for service accounts; separate read vs write credentials.
- CSRF protection for browser apps; strict CORS; secure cookies.
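The three-role RBAC baseline can be expressed as a small deny-by-default check. This TypeScript sketch assumes the roles are strictly ordered (viewer < operator < admin); the action names are hypothetical placeholders for your tool's real operations.

```typescript
type Role = "viewer" | "operator" | "admin";

// Higher rank includes everything below it: admin can do what operator and viewer can.
const ROLE_RANK = new Map<string, number>([
  ["viewer", 0],
  ["operator", 1],
  ["admin", 2],
]);

// Hypothetical actions, each mapped to the least-privileged role allowed to run it.
type Action = "report.view" | "job.retry" | "user.grant";
const REQUIRED_ROLE: Record<Action, Role> = {
  "report.view": "viewer",
  "job.retry": "operator",
  "user.grant": "admin",
};

// Deny by default: an unknown role string never passes.
function isAllowed(role: string, action: Action): boolean {
  const have = ROLE_RANK.get(role);
  const need = ROLE_RANK.get(REQUIRED_ROLE[action]);
  if (have === undefined || need === undefined) return false;
  return have >= need;
}

console.log(isAllowed("operator", "job.retry")); // true
console.log(isAllowed("viewer", "user.grant")); // false
console.log(isAllowed("superuser", "report.view")); // false
```

Keeping the action-to-role table in one place makes the monthly access review a matter of reading a single map rather than grepping route handlers.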
Deployability
- Single command deploy (CI does it).
- Blue/green or rolling deploy if supported; otherwise maintenance window + fast rollback.
- Feature flags for risky changes.
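A feature flag does not need a vendor at this scale. A minimal TypeScript sketch, assuming flags arrive as a comma-separated string (for example, a hypothetical `FEATURE_FLAGS` environment variable); `parseFlags`, `isEnabled`, and the `new_reindex` flag are illustrative names.

```typescript
// Parses "new_reindex, bulk_import" into a normalized set of enabled flags.
function parseFlags(raw: string | undefined): ReadonlySet<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((f) => f.trim().toLowerCase())
      .filter((f) => f.length > 0),
  );
}

function isEnabled(flag: string, flags: ReadonlySet<string>): boolean {
  return flags.has(flag.toLowerCase());
}

// The risky path stays dark until the flag is set; the old path remains the default.
function reindexPath(flags: ReadonlySet<string>): string {
  return isEnabled("new_reindex", flags) ? "new reindex path" : "legacy reindex path";
}

console.log(reindexPath(parseFlags("new_reindex"))); // new reindex path
console.log(reindexPath(parseFlags(undefined))); // legacy reindex path
```

In a Node service you would build the set once at startup, e.g. `parseFlags(process.env.FEATURE_FLAGS)`, so turning a change off is a config edit and restart rather than a rollback.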
Observability
- Structured logs with request_id/user_id.
- Metrics: request rate, latency, error rate, job failures, DB errors.
- Alert on symptoms (5xx rate, job backlog), not on every exception.
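Alerting on symptoms means evaluating a rate over a window instead of reacting to each exception. A minimal TypeScript sketch, assuming you already aggregate request counts per window from your metrics; the 5% threshold and 50-request floor are placeholder values, not recommendations from any standard.

```typescript
// One aggregated window of request counts, e.g. the last 5 minutes.
interface RequestWindow {
  total: number;
  status5xx: number;
}

// Page only when the error rate is high AND there is enough traffic to trust it,
// so a single failed request overnight does not wake anyone up.
function shouldAlert(w: RequestWindow, maxErrorRate = 0.05, minRequests = 50): boolean {
  if (w.total < minRequests) return false;
  return w.status5xx / w.total > maxErrorRate;
}

console.log(shouldAlert({ total: 1000, status5xx: 80 })); // true (8% > 5%)
console.log(shouldAlert({ total: 1000, status5xx: 3 })); // false (0.3%)
console.log(shouldAlert({ total: 10, status5xx: 5 })); // false (too little traffic to judge)
```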

Example 1: CI Gate for a Solo-Maintained Tool

A minimal CI pipeline that prevents the most common solo dev regressions: failing tests, broken migrations, lint drift, and missing env config.

Step 1: Add a GitHub Actions workflow (.github/workflows/ci.yml)

name: ci

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
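          # Throwaway credentials for the disposable CI container; they must match
          # the health check above and DATABASE_URL below.
          POSTGRES_USER: app
          POSTGRES_PASSWORD: app
          POSTGRES_DB: app_test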
        ports:
          - 5432:5432
        options: >-
          --health-cmd="pg_isready -U app -d app_test"
          --health-interval=5s
          --health-timeout=5s
          --health-retries=10

    env:
      DATABASE_URL: postgresql://app:app@localhost:5432/app_test
      NODE_ENV: test

    steps:
      - uses: actions/checkout@v4

      - name: Use Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Typecheck
        run: npm run typecheck

      - name: Migrate (smoke test)
        run: npm run db:migrate

      - name: Test
        run: npm test -- --runInBand

      - name: Build
        run: npm run build

Step 2: Run the same checks locally

npm ci && npm run lint && npm run typecheck && npm run db:migrate && npm test && npm run build

Expected Output

> lint
✔ no issues found

> typecheck
✔ 0 errors

> db:migrate
Applied 3 migrations

> test
PASS 42 tests

> build
Build completed successfully

Notes:

The migration smoke test catches “works locally” schema drift early.
If you can’t run a database in CI, at least validate that migrations compile and run against a disposable container in a nightly job.
Keep the pipeline under ~10 minutes; long CI trains solo devs to bypass it.

Example 2: Structured Logging + Request Correlation (Node/Express)

Make debugging cheap: every log line should tell you who did what, where, and why it failed.

Add request_id and structured logs

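import { randomUUID } from "node:crypto";
import express, { NextFunction, Request, Response } from "express";
import pino from "pino";
import pinoHttp from "pino-http";

const app = express();
const logger = pino({ level: process.env.LOG_LEVEL ?? "info" });

app.use(express.json());

app.use(
  pinoHttp({
    logger,
    // Reuse a caller-supplied x-request-id (e.g. from a proxy) or mint one, and echo it back.
    genReqId: (req, res) => {
      const header = req.headers["x-request-id"];
      const id = typeof header === "string" && header.length > 0 ? header : randomUUID();
      res.setHeader("x-request-id", id);
      return id;
    },
    customProps: (req) => ({
      // req.user is populated by your auth middleware; null if it has not run yet.
      user_id: (req as any).user?.id ?? null,
    }),
    // Strip credentials before they reach any log sink.
    redact: {
      paths: ["req.headers.authorization", "req.body.password", "req.body.token"],
      remove: true,
    },
  })
);

app.get("/healthz", (_req, res) => res.status(200).send("ok"));

app.post("/admin/reindex", async (req, res) => {
  req.log.info({ action: "reindex_start" }, "admin action");
  // ... do work
  req.log.info({ action: "reindex_done" }, "admin action complete");
  res.json({ ok: true });
});

// Express treats a four-argument middleware as the error handler.
app.use((err: unknown, req: Request, res: Response, _next: NextFunction) => {
  req.log.error({ err }, "request failed");
  res.status(500).json({ error: "internal_error", request_id: req.id });
});

app.listen(Number(process.env.PORT ?? 3000));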

Output

{"level":30,"time":...,"req":{"id":"9c...","method":"POST","url":"/admin/reindex"},"user_id":"u_123","action":"reindex_start","msg":"admin action"}
{"level":30,"time":...,"req":{"id":"9c...","method":"POST","url":"/admin/reindex"},"user_id":"u_123","action":"reindex_done","msg":"admin action complete"}

Notes:

Always return request_id to the caller; it’s your fastest support loop.
Redact secrets at the logger level; don’t rely on developer discipline.
Add an audit table for admin actions; logs are not an audit trail.
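An audit table can start as a single append-only INSERT. This TypeScript sketch builds a parameterized query in the `{ text, values }` shape that node-postgres (`pg`) accepts; the `audit_log` table, its columns, and `buildAuditInsert` are hypothetical names, not an existing schema.

```typescript
// Shape of one audit row; field and column names are hypothetical.
interface AuditEntry {
  actorId: string;
  action: string;
  target: string;
  requestId: string;
  details?: Record<string, unknown>;
}

// Builds a parameterized INSERT (never string-concatenated values).
// Only INSERTs are generated: the table is append-only by design.
function buildAuditInsert(entry: AuditEntry): { text: string; values: unknown[] } {
  return {
    text:
      "INSERT INTO audit_log (actor_id, action, target, request_id, details, created_at) " +
      "VALUES ($1, $2, $3, $4, $5, now())",
    values: [
      entry.actorId,
      entry.action,
      entry.target,
      entry.requestId,
      JSON.stringify(entry.details ?? {}),
    ],
  };
}

const insert = buildAuditInsert({
  actorId: "u_123",
  action: "reindex_start",
  target: "search_index",
  requestId: "req-example",
});
console.log(insert.values.length); // 5
```

With `pg` you would pass the object directly, e.g. `await pool.query(insert)`. To make "append-only" more than a convention, the database role your app uses can be denied UPDATE and DELETE on the table with a Postgres `REVOKE`.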

Solution: A Solo Developer Maintenance Loop That Prevents Fires

Treat maintenance as a product feature. The goal is stability with a small weekly budget.

Weekly (30–60 min)
- Review error budget signals: 5xx rate, job failures, slow endpoints.
- Triage dependency updates (security first).
- Scan audit logs for unexpected admin actions.
Monthly (1–2 hrs)
- Restore test from backup into a scratch environment.
- Rotate secrets (or validate rotation automation).
- Review access list and remove stale accounts/roles.
Quarterly
- Chaos-lite: kill a worker, simulate DB failover (if applicable), validate alerts.
- Revisit SLOs and “critical path” workflows.
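The monthly access review is easier when the tool proposes candidates instead of you scanning a list. A minimal TypeScript sketch, assuming you can export each account's last login time; the 90-day threshold and the `Account` shape are assumptions, not a policy recommendation.

```typescript
interface Account {
  id: string;
  role: string;
  lastLoginAt: Date | null; // null = never logged in
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Flags accounts idle longer than maxIdleDays; never-used accounts are flagged too.
function findStaleAccounts(accounts: Account[], now: Date, maxIdleDays = 90): Account[] {
  const cutoff = now.getTime() - maxIdleDays * DAY_MS;
  return accounts.filter((a) => a.lastLoginAt === null || a.lastLoginAt.getTime() < cutoff);
}

const reviewDate = new Date("2024-06-30T00:00:00Z");
const stale = findStaleAccounts(
  [
    { id: "u_1", role: "admin", lastLoginAt: new Date("2024-06-01T00:00:00Z") },
    { id: "u_2", role: "operator", lastLoginAt: new Date("2024-01-15T00:00:00Z") },
    { id: "u_3", role: "viewer", lastLoginAt: null },
  ],
  reviewDate,
);
console.log(stale.map((a) => a.id)); // [ 'u_2', 'u_3' ]
```

The function only proposes removals; keeping the actual revocation a deliberate, audited step avoids locking out someone who was simply on leave.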

Automate the routine checks so you don’t rely on memory.

# Example: a simple scheduled “ops check” script you can run in CI or cron
./scripts/ops-check.sh
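If you prefer to keep ops checks in the same language as the tool, the core of such a script is a runner that executes every check and reports a single exit code. A minimal TypeScript sketch of that idea; `runOpsChecks`, the `Check` type, and `latestBackupAt` are hypothetical, and the backup lookup is stubbed where a real script would query your backup storage.

```typescript
// Each check resolves to a short failure message, or null when healthy.
type Check = { name: string; run: () => Promise<string | null> };

// Runs every check (even after a failure) and returns a process-style exit code.
async function runOpsChecks(checks: Check[]): Promise<number> {
  let failures = 0;
  for (const check of checks) {
    let problem: string | null;
    try {
      problem = await check.run();
    } catch (err) {
      problem = `threw: ${err instanceof Error ? err.message : String(err)}`;
    }
    if (problem === null) {
      console.log(`ok   ${check.name}`);
    } else {
      failures += 1;
      console.log(`FAIL ${check.name}: ${problem}`);
    }
  }
  return failures === 0 ? 0 : 1;
}

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical stub: pretend the newest backup finished two hours ago.
async function latestBackupAt(): Promise<Date> {
  return new Date(Date.now() - 2 * HOUR_MS);
}

runOpsChecks([
  {
    name: "latest backup younger than 26h",
    run: async () =>
      Date.now() - (await latestBackupAt()).getTime() > 26 * HOUR_MS ? "backup too old" : null,
  },
]).then((code) => console.log(`exit code ${code}`));
```

Under Node you would assign the result to `process.exitCode` so cron or CI marks the run as failed.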

Notes:

Your best leverage is removing manual steps: onboarding, deploys, migrations, and access changes.
If a task happens more than twice, script it; if it’s risky once, add a guardrail (dry-run, confirmation, role check).
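The dry-run and confirmation guardrails can live in the same function signature. A minimal TypeScript sketch over an in-memory list, assuming dry-run is the default and a real run requires retyping the target name; `purgeOlderThan` and the record shape are hypothetical, and a real tool would issue SQL instead.

```typescript
type StoredRecord = { id: string; createdAt: Date };

interface PurgeResult {
  matched: string[]; // ids that meet the purge condition
  applied: boolean; // false when nothing was actually deleted
  remaining: StoredRecord[]; // what is left after the call
}

function purgeOlderThan(
  tableName: string,
  records: StoredRecord[],
  cutoff: Date,
  opts: { dryRun?: boolean; confirm?: string } = {},
): PurgeResult {
  const dryRun = opts.dryRun ?? true; // safe default: report what would happen, change nothing
  const isOld = (r: StoredRecord) => r.createdAt.getTime() < cutoff.getTime();
  const matched = records.filter(isOld).map((r) => r.id);
  if (dryRun) {
    return { matched, applied: false, remaining: records };
  }
  // Confirmation guardrail: a real run requires retyping the table name.
  if (opts.confirm !== tableName) {
    throw new Error(`refusing to purge ${tableName}: confirmation does not match`);
  }
  return { matched, applied: true, remaining: records.filter((r) => !isOld(r)) };
}

const sample: StoredRecord[] = [
  { id: "evt_1", createdAt: new Date("2023-03-01T00:00:00Z") },
  { id: "evt_2", createdAt: new Date("2024-05-01T00:00:00Z") },
];
const preview = purgeOlderThan("events", sample, new Date("2024-01-01T00:00:00Z"));
console.log(preview.matched, preview.applied); // [ 'evt_1' ] false
```

Making the safe path the default means a forgotten flag produces a report, not an outage.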

Key Takeaways

Define criteria up front: auth, auditability, backups+restore, observability, and repeatable deploys.
Optimize for operational simplicity: architecture, safe data patterns, and CI gates that catch drift.
Run a lightweight maintenance loop with automated checks; solo success is about reducing attention load.

Conclusion

Solo-built tools can be enterprise-grade if you design for reliability and maintenance from day one: constrain scope, enforce security defaults, automate validation, and keep a predictable ops cadence. The payoff is fewer interruptions and a tool that earns trust across the org.

Meta Description
A pragmatic playbook for solo developers building enterprise-grade tools: scope, security, CI/CD, observability, and maintenance routines with concrete examples.

TLDR - Highlights for Skimmers

Ship with non-negotiables: auth, audit trail, backups+restore test, and observability.
Add CI gates for lint/typecheck/tests and a migration smoke test to prevent schema drift.
Maintain with a weekly/monthly ops loop and automate anything that repeats or is risky.

What’s the one solo development failure mode you’ve seen most often: auth gaps, data drift, or deploy brittleness?