linou518
OpenClaw Guide Ch9: Multi-Server Cluster Deployment

Chapter 9: Multi-Server Cluster Deployment

🎯 Learning Objective: Master OpenClaw's multi-server deployment architecture, and use it to achieve high availability and horizontal scaling

🏗️ Why Multi-Server Deployment?

Single-Server Limitations

  • 🔥 Performance Bottleneck: Limited single-machine resources
  • 💥 Single Point of Failure: Server failure takes down the entire system
  • 📈 Scaling Difficulty: Vertical scaling is expensive, horizontal scaling not possible
  • 🌍 Geographic Limits: Cannot serve global users locally
  • 🔒 Security Risk: All services concentrated on one machine

Multi-Server Benefits

  • ⚡ High Performance: Distributed computing, parallel request processing
  • 🛡️ High Availability: Server failure doesn't affect overall availability
  • 📊 Horizontal Scaling: Dynamically add servers based on load
  • 🌐 Geographic Distribution: Deploy globally, serve users locally
  • 🔐 Security Isolation: Different services run in isolated environments
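
The high-availability claim can be made concrete with a little arithmetic. The sketch below assumes that servers fail independently and that the load balancer routes around a failed node; the 99% single-server figure is purely illustrative, and `combined_availability` is a hypothetical helper, not part of OpenClaw.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` independent servers when any one can serve traffic."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


HOURS_PER_YEAR = 24 * 365

for n in (1, 2, 3):
    availability = combined_availability(0.99, n)
    downtime = (1.0 - availability) * HOURS_PER_YEAR
    print(f"{n} server(s): {availability:.6f} available, about {downtime:.2f} h/year down")
```

The gain only applies to roles that are actually replicated. A role that runs on a single server remains a single point of failure for whatever depends on it.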

🏛️ Architecture Patterns

Pattern 1: Functional Separation

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   Gateway       │  │   Agent Farm    │  │   Service       │
│   Server        │  │   Server        │  │   Server        │
├─────────────────┤  ├─────────────────┤  ├─────────────────┤
│ • Load Balancer │  │ • Agent-1       │  │ • Database      │
│ • API Gateway   │  │ • Agent-2       │  │ • File Storage  │
│ • Auth Service  │  │ • Agent-3       │  │ • Message Queue │
│ • Rate Limiting │  │ • Agent-4       │  │ • Cache         │
└─────────────────┘  └─────────────────┘  └─────────────────┘

Pattern 2: Geographic Distribution

                    ┌─────────────────┐
                    │   Global CDN    │
                    │  & Load Balancer│
                    └─────────┬───────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
┌───────▼──────┐     ┌────────▼────────┐     ┌──────▼───────┐
│  US-East     │     │    EU-Central   │     │  Asia-Pacific│
│  Region      │     │    Region       │     │   Region     │
└──────────────┘     └─────────────────┘     └──────────────┘

Pattern 3: Microservices

┌────────────────────────────────────────────────────────┐
│                    Service Mesh                        │
├────────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │Gateway  │  │ Agent   │  │ Session │  │  Tool   │    │
│  │Service  │  │Manager  │  │Manager  │  │Service  │    │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘    │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │Channel  │  │Memory   │  │ Config  │  │Monitor  │    │
│  │Service  │  │Service  │  │Service  │  │Service  │    │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘    │
└────────────────────────────────────────────────────────┘

🛠️ Hands-On: 3-Server Cluster Deployment

Server Planning

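| Server   | Role         | Specs         | Components                   |
|----------|--------------|---------------|------------------------------|
| Server-1 | Gateway + LB | 4-core, 8 GB  | Gateway, load balancer, auth |
| Server-2 | Agent Farm   | 8-core, 16 GB | Primary Agent instances      |
| Server-3 | Services     | 4-core, 8 GB  | Database, cache, storage     |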

Network Architecture

Internet
    │
    ▼
┌─────────────┐
│ Load        │
│ Balancer    │  ← Nginx/HAProxy
│ (Server-1)  │
└──────┬──────┘
       │
   ┌───┴───┐
   ▼       ▼
┌─────────────┐    ┌─────────────┐
│ OpenClaw    │    │ OpenClaw    │
│ Gateway     │    │ Agents      │
│ (Server-1)  │◄──►│ (Server-2)  │
└─────────────┘    └──────┬──────┘
       │                  │
       └──────────────────┼────────┐
                          │        │
                    ┌─────▼────────▼─┐
                    │   Services     │
                    │  • PostgreSQL  │
                    │  • Redis       │
                    │  • File Store  │
                    │  (Server-3)    │
                    └────────────────┘

🚀 Deployment Guide

Step 1: Server Preparation

Base Environment Setup (All Servers)

#!/bin/bash
# setup-base.sh: Base environment setup

# Update system
sudo apt update && sudo apt upgrade -y

# Install base software
sudo apt install -y \
    curl wget git vim htop \
    docker.io docker-compose \
    nginx certbot \
    postgresql-client redis-tools

# Configure Docker
sudo usermod -aG docker $USER
sudo systemctl enable docker
sudo systemctl start docker

# Install Node.js
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

# Install OpenClaw
sudo npm install -g @openclaw/openclaw

# Configure firewall
sudo ufw allow ssh
sudo ufw allow 80
sudo ufw allow 443
sudo ufw allow 18789  # OpenClaw Gateway
sudo ufw --force enable

Step 2: Server-1 (Gateway) Configuration

Nginx Load Balancer

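# limit_req_zone is only valid in the http context, so it is declared here,
# outside the server block (this file is assumed to be included from http {}).
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream openclaw_gateway {
    least_conn;
    server 127.0.0.1:18789 max_fails=3 fail_timeout=30s;
    # "backup" means Server-2 only receives traffic when the local gateway is unavailable
    server SERVER-2-IP:18789 max_fails=3 fail_timeout=30s backup;
}

server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    add_header Strict-Transport-Security "max-age=63072000" always;

    # Apply the "api" zone: allow short bursts of up to 20 extra requests without delay
    limit_req zone=api burst=20 nodelay;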

    location / {
        proxy_pass http://openclaw_gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 300s;
    }

    location /health {
        access_log off;
        proxy_pass http://openclaw_gateway/health;
    }
}
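
To get a feel for what `rate=10r/s` with `burst=20 nodelay` means in practice, here is a simplified model of leaky-bucket rate limiting. It is a sketch for building intuition, not nginx's actual implementation; `LimitReqModel` is a hypothetical name, and the model tracks a single client with timestamps in seconds.

```python
class LimitReqModel:
    """Simplified leaky-bucket model of a per-client request limiter."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate      # requests drained from the bucket per second
        self.burst = burst    # how far a client may run ahead of the rate
        self.excess = 0.0     # requests currently "in the bucket"
        self.last = None      # time of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            # First request from this client: always accepted.
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(self.excess - self.rate * (now - self.last) + 1.0, 0.0)
        if excess > self.burst:
            return False      # rejected; the bucket state is left unchanged
        self.excess = excess
        self.last = now
        return True


limiter = LimitReqModel(rate=10, burst=20)
accepted_at_once = sum(limiter.allow(0.0) for _ in range(30))
accepted_one_second_later = sum(limiter.allow(1.0) for _ in range(30))
print(accepted_at_once, accepted_one_second_later)
```

In this model a client that sends 30 requests at once gets the first request plus the burst allowance through, and the rest are rejected. A second later only the capacity drained in that second is available again.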

Gateway Configuration

{
  "gateway": {
    "port": 18789,
    "bind": "all",
    "cluster": {
      "enabled": true,
      "mode": "gateway",
      "peers": [
        "http://SERVER-2-IP:18789"
      ]
    }
  },
  "database": {
    "type": "postgresql",
    "host": "SERVER-3-IP",
    "port": 5432,
    "database": "openclaw",
    "username": "openclaw",
    "password": "SECURE_PASSWORD"
  },
  "cache": {
    "type": "redis",
    "host": "SERVER-3-IP",
    "port": 6379,
    "password": "REDIS_PASSWORD"
  }
}
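
The configuration above contains placeholders (`SERVER-2-IP`, `SECURE_PASSWORD`, `REDIS_PASSWORD`) that must be replaced before the gateway starts. The following sketch shows one way to catch leftovers before deploying. `find_placeholders` is a hypothetical helper, the regular expression only covers the placeholder styles used in this chapter, and the sample config is a shortened copy with one illustrative IP address filled in.

```python
import json
import re

# Matches the placeholder styles used in this chapter's examples.
PLACEHOLDER = re.compile(r"SERVER-\d+-IP|\b[A-Z]+_PASSWORD\b")


def find_placeholders(node, path=""):
    """Return (path, value) pairs for string values that still hold a placeholder."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_placeholders(value, f"{path}.{key}" if path else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str) and PLACEHOLDER.search(node):
        found.append((path, node))
    return found


sample = json.loads("""
{
  "gateway": {"port": 18789, "cluster": {"peers": ["http://SERVER-2-IP:18789"]}},
  "database": {"host": "10.0.0.3", "password": "SECURE_PASSWORD"}
}
""")

for path, value in find_placeholders(sample):
    print(f"unreplaced placeholder at {path}: {value}")
```

Running a check like this in the deployment script before `openclaw gateway start` turns a confusing connection failure into an explicit error message.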

Step 3: Server-2 (Agent Farm) Configuration

{
  "gateway": {
    "port": 18789,
    "bind": "all",
    "cluster": {
      "enabled": true,
      "mode": "agent",
      "primary": "SERVER-1-IP:18789"
    }
  },
  "agents": [
    {
      "id": "email-manager",
      "name": "Email Manager",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/email-manager"},
      "resources": {"memory": "2GB", "cpu": "2"}
    },
    {
      "id": "calendar-manager",
      "name": "Calendar Manager",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/calendar-manager"},
      "resources": {"memory": "1.5GB", "cpu": "1.5"}
    },
    {
      "id": "doc-processor",
      "name": "Document Processor",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/doc-processor"},
      "resources": {"memory": "3GB", "cpu": "2.5"}
    },
    {
      "id": "data-analyst",
      "name": "Data Analyst",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/data-analyst"},
      "resources": {"memory": "4GB", "cpu": "3"}
    }
  ]
}
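
It is worth checking the per-agent `resources` values against the Server-2 specs from the planning table (8 cores, 16 GB). The sketch below simply adds them up. Whether OpenClaw treats these values as hard limits or as scheduling hints is not stated in this chapter, so treat the result as a planning aid; `parse_gb` and `totals` are hypothetical helpers.

```python
# Server-2 specs from the Server Planning table.
SERVER_2 = {"cpu": 8.0, "memory_gb": 16.0}

# The "resources" entries copied from the agent configuration above.
AGENTS = {
    "email-manager": {"memory": "2GB", "cpu": "2"},
    "calendar-manager": {"memory": "1.5GB", "cpu": "1.5"},
    "doc-processor": {"memory": "3GB", "cpu": "2.5"},
    "data-analyst": {"memory": "4GB", "cpu": "3"},
}


def parse_gb(value: str) -> float:
    """Convert a string such as '1.5GB' to a number of gigabytes."""
    if not value.endswith("GB"):
        raise ValueError(f"unsupported memory unit in {value!r}")
    return float(value[:-2])


def totals(agents: dict) -> tuple:
    """Return (total CPU cores, total memory in GB) requested by all agents."""
    cpu = sum(float(spec["cpu"]) for spec in agents.values())
    memory = sum(parse_gb(spec["memory"]) for spec in agents.values())
    return cpu, memory


cpu, memory = totals(AGENTS)
print(f"CPU requested: {cpu} of {SERVER_2['cpu']} cores")
print(f"Memory requested: {memory} GB of {SERVER_2['memory_gb']} GB")
if cpu > SERVER_2["cpu"]:
    print("Warning: agents request more CPU than Server-2 provides")
if memory > SERVER_2["memory_gb"]:
    print("Warning: agents request more memory than Server-2 provides")
```

With the values shown, memory fits comfortably, but the CPU requests add up to slightly more than the 8 cores planned for Server-2. Either reduce one agent's CPU share or size the server up.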

Step 4: Server-3 (Services) Configuration

Database Setup

-- setup-database.sql
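-- Create the role first so that it can own the database
CREATE USER openclaw WITH PASSWORD 'SECURE_PASSWORD';
CREATE DATABASE openclaw OWNER openclaw;

\c openclaw

-- Install the extension as the superuser, then switch to the application role
-- so that the tables below are owned by (and accessible to) openclaw
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SET ROLE openclaw;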

CREATE TABLE IF NOT EXISTS agents (
    id VARCHAR(50) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    config JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS sessions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    agent_id VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    channel VARCHAR(50),
    context JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (agent_id) REFERENCES agents(id)
);

CREATE TABLE IF NOT EXISTS messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id UUID NOT NULL,
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (session_id) REFERENCES sessions(id)
);

CREATE INDEX IF NOT EXISTS idx_sessions_agent_id ON sessions(agent_id);
CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages(session_id);
CREATE INDEX IF NOT EXISTS idx_messages_created_at ON messages(created_at);
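
Once the database exists, it helps to verify from Server-1 or Server-2 that the `openclaw` role can actually connect. A PostgreSQL connection URI is convenient for that, but passwords containing characters such as `@` or `/` must be percent-encoded. The sketch below builds such a URI from the `database` section of the gateway config; `postgres_uri` is a hypothetical helper, and the host and password are illustrative values.

```python
from urllib.parse import quote


def postgres_uri(db: dict) -> str:
    """Build a postgresql:// URI, percent-encoding user, password, and database name."""
    user = quote(db["username"], safe="")
    password = quote(db["password"], safe="")
    database = quote(db["database"], safe="")
    return f"postgresql://{user}:{password}@{db['host']}:{db['port']}/{database}"


database_section = {
    "host": "10.0.0.3",          # illustrative stand-in for SERVER-3-IP
    "port": 5432,
    "database": "openclaw",
    "username": "openclaw",
    "password": "p@ss/word",     # illustrative; shows why encoding matters
}

print(postgres_uri(database_section))
```

The resulting string can be passed to `psql` as its connection argument, for example to list the tables created by the script above.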

Redis Configuration

port 6379
bind 0.0.0.0
protected-mode yes
requirepass REDIS_PASSWORD

save 900 1
save 300 10
save 60 10000

maxmemory 2gb
maxmemory-policy allkeys-lru
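
The `allkeys-lru` policy tells Redis to evict the least recently used keys once `maxmemory` is reached. The idea can be illustrated with a small cache bounded by entry count. This is a conceptual sketch only: Redis bounds the cache by memory rather than by number of keys and uses an approximated LRU based on sampling, and `LRUCache` is a hypothetical class.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache bounded by number of entries."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # reading a key marks it as recently used
        return self._items[key]

    def set(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def keys(self) -> list:
        return list(self._items)


cache = LRUCache(capacity=2)
cache.set("session:a", "alice")
cache.set("session:b", "bob")
cache.get("session:a")            # touch "a" so that "b" becomes the oldest
cache.set("session:c", "carol")   # over capacity: "b" is evicted
print(cache.keys())
```

The practical consequence for this deployment is that any key may be evicted under memory pressure, so Redis should hold only data that can be rebuilt from PostgreSQL.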

🔄 Automated Deployment

Deployment Script

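#!/bin/bash
# deploy.sh: Automated deployment script

# Exit on errors, on unset variables, and on failures inside pipelines
set -euo pipefail

SERVERS=("SERVER-1-IP" "SERVER-2-IP" "SERVER-3-IP")
SSH_USER="openclaw"
# Use $HOME here: the shell does not expand "~" inside quotes
SSH_KEY="$HOME/.ssh/openclaw-deploy"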

log_info() { echo -e "\033[0;32m[INFO]\033[0m $1"; }
log_error() { echo -e "\033[0;31m[ERROR]\033[0m $1"; }

# Check prerequisites
check_prerequisites() {
    log_info "Checking deployment prerequisites..."
    for server in "${SERVERS[@]}"; do
        # BatchMode makes ssh fail instead of waiting for a password prompt
        if ! ssh -i "$SSH_KEY" -o BatchMode=yes -o ConnectTimeout=5 "$SSH_USER@$server" "echo 'SSH OK'" &>/dev/null; then
            log_error "Cannot SSH to $server"
            exit 1
        fi
    done
    log_info "Prerequisites check passed"
}

# Deploy Services server
deploy_services() {
    log_info "Deploying Services server..."
    ssh -i $SSH_KEY $SSH_USER@${SERVERS[2]} << 'EOF'
        cd ~/openclaw
        docker-compose -f docker-compose.services.yml down
        docker-compose -f docker-compose.services.yml pull
        docker-compose -f docker-compose.services.yml up -d
EOF
    log_info "Services server deployment complete"
}

# Deploy Agent server
deploy_agents() {
    log_info "Deploying Agent server..."
    ssh -i $SSH_KEY $SSH_USER@${SERVERS[1]} << 'EOF'
        cd ~/openclaw
        openclaw gateway stop 2>/dev/null || true
        sudo npm update -g @openclaw/openclaw
        openclaw gateway start --config agents/config.json --daemon
        sleep 10
        openclaw status
EOF
    log_info "Agent server deployment complete"
}

# Deploy Gateway server
deploy_gateway() {
    log_info "Deploying Gateway server..."
    ssh -i $SSH_KEY $SSH_USER@${SERVERS[0]} << 'EOF'
        cd ~/openclaw
        sudo nginx -t && sudo systemctl reload nginx
        openclaw gateway stop 2>/dev/null || true
        sudo npm update -g @openclaw/openclaw
        openclaw gateway start --config gateway/config.json --daemon
        sleep 10
        openclaw status
        curl -f http://localhost:18789/health || exit 1
EOF
    log_info "Gateway server deployment complete"
}

# Health check
health_check() {
    log_info "Running health checks..."
    for server in "${SERVERS[@]}"; do
        ssh -i $SSH_KEY $SSH_USER@$server "openclaw status" || log_error "$server status abnormal"
    done
}

# Main deployment flow
main() {
    log_info "Starting OpenClaw cluster deployment..."
    check_prerequisites
    deploy_services
    sleep 60
    deploy_agents
    sleep 30
    deploy_gateway
    sleep 30
    health_check
    log_info "🎉 OpenClaw cluster deployment complete!"
}

main "$@"
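
The fixed `sleep 60` and `sleep 30` pauses in the script either wait longer than necessary or not long enough. A common alternative is to poll until the dependency accepts TCP connections. The sketch below shows that idea; `wait_for_port` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that the service is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Self-contained demo: listen on an ephemeral local port, then wait for it.
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))
        server.listen()
        demo_port = server.getsockname()[1]
        print(wait_for_port("127.0.0.1", demo_port, timeout=5.0))
```

In the deployment flow, the pause after `deploy_services` could become a wait on PostgreSQL (5432) and Redis (6379) on Server-3, and the pause after `deploy_agents` a wait on port 18789 on Server-2.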

📊 Cluster Monitoring

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'openclaw-gateway'
    static_configs:
      - targets: ['SERVER-1-IP:9090', 'SERVER-2-IP:9090']

  - job_name: 'system'
    static_configs:
      - targets: ['SERVER-1-IP:9100', 'SERVER-2-IP:9100', 'SERVER-3-IP:9100']

  - job_name: 'postgres'
    static_configs:
      - targets: ['SERVER-3-IP:9187']

  - job_name: 'redis'
    static_configs:
      - targets: ['SERVER-3-IP:9121']

Alert Rules

groups:
  - name: openclaw
    rules:
      - alert: GatewayDown
        expr: up{job="openclaw-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "OpenClaw Gateway is down on {{ $labels.instance }}"

      - alert: HighResponseTime
        expr: openclaw_request_duration_seconds{quantile="0.95"} > 5
        for: 2m
        labels:
          severity: warning

      - alert: HighErrorRate
        expr: rate(openclaw_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
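
Note what the `HighErrorRate` threshold measures: `rate(...[5m])` is a per-second rate, so `> 0.1` means more than 0.1 server errors per second per series, not a 10% error ratio. The arithmetic is sketched below with hypothetical helpers. The sketch ignores counter resets and the extrapolation that Prometheus applies at window boundaries.

```python
def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter over a window (no reset handling)."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return (last - first) / window_seconds


def error_ratio(error_rate: float, total_rate: float) -> float:
    """Share of requests that failed; 0.0 when there is no traffic."""
    return error_rate / total_rate if total_rate > 0 else 0.0


WINDOW = 5 * 60  # the [5m] range in the alert expression, in seconds

# A counter that grows from 1000 to 1031 errors within five minutes.
errors = per_second_rate(1000, 1031, WINDOW)
print(f"error rate: {errors:.4f}/s, alert fires: {errors > 0.1}")

# The same absolute error rate is a very different ratio at different traffic levels.
for total in (1.0, 100.0):
    print(f"at {total:>5} req/s the error ratio is {error_ratio(errors, total):.2%}")
```

In other words, roughly 30 errors in five minutes trips the alert regardless of traffic volume. If a percentage is what you want to alert on, a ratio of the error rate to the total request rate (both computed with `rate()` over the same window) is the usual approach.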

✅ Chapter Summary

After this chapter, you have mastered:

  • [x] Multi-server architecture design concepts and patterns
  • [x] Complete 3-server cluster deployment
  • [x] Load balancing and high-availability configuration
  • [x] Shared database and cache configuration
  • [x] Automated deployment and configuration management
  • [x] Monitoring and alerting system setup
  • [x] Operations and fault handling procedures

📝 Extended Exercises

  1. HA Practice: Simulate server failure, test automatic failover
  2. Performance Tuning: Use load testing tools to find cluster performance limits
  3. Disaster Recovery Drill: Execute a complete backup and recovery process
  4. Multi-Region Deployment: Extend to a geographically distributed global deployment
  5. Monitoring Optimization: Build a comprehensive SRE monitoring system

Ready to tackle even larger-scale enterprise deployments? 🎯
