linou518

Posted on Feb 14

OpenClaw Guide Ch9: Multi-Server Cluster Deployment

#openclaw #ai #agents #automation

Chapter 9: Multi-Server Cluster Deployment

🎯 Learning Objective: Understand OpenClaw's multi-server deployment architecture and use it to achieve high availability and horizontal scaling

🏗️ Why Multi-Server Deployment?

Single-Server Limitations

🔥 Performance Bottleneck: Limited single-machine resources
💥 Single Point of Failure: Server failure takes down the entire system
📈 Scaling Difficulty: Vertical scaling is expensive, and horizontal scaling is not possible
🌍 Geographic Limits: Cannot serve global users locally
🔒 Security Risk: All services concentrated on one machine

Multi-Server Benefits

⚡ High Performance: Distributed computing, parallel request processing
🛡️ High Availability: Redundant servers keep the system available when one of them fails
📊 Horizontal Scaling: Dynamically add servers based on load
🌐 Geographic Distribution: Deploy globally, serve users locally
🔐 Security Isolation: Different services run in isolated environments

🏛️ Architecture Patterns

Pattern 1: Functional Separation

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   Gateway       │  │   Agent Farm    │  │   Service       │
│   Server        │  │   Server        │  │   Server        │
├─────────────────┤  ├─────────────────┤  ├─────────────────┤
│ • Load Balancer │  │ • Agent-1       │  │ • Database      │
│ • API Gateway   │  │ • Agent-2       │  │ • File Storage  │
│ • Auth Service  │  │ • Agent-3       │  │ • Message Queue │
│ • Rate Limiting │  │ • Agent-4       │  │ • Cache         │
└─────────────────┘  └─────────────────┘  └─────────────────┘

Pattern 2: Geographic Distribution

                    ┌─────────────────┐
                    │   Global CDN    │
                    │  & Load Balancer│
                    └─────────┬───────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
┌───────▼──────┐     ┌────────▼────────┐     ┌──────▼───────┐
│  US-East     │     │    EU-Central   │     │  Asia-Pacific│
│  Region      │     │    Region       │     │   Region     │
└──────────────┘     └─────────────────┘     └──────────────┘

Pattern 3: Microservices

┌─────────────────────────────────────────────────────────┐
│                    Service Mesh                         │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │Gateway  │  │ Agent   │  │ Session │  │  Tool   │    │
│  │Service  │  │Manager  │  │Manager  │  │Service  │    │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘    │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│  │Channel  │  │Memory   │  │ Config  │  │Monitor  │    │
│  │Service  │  │Service  │  │Service  │  │Service  │    │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘    │
└─────────────────────────────────────────────────────────┘

🛠️ Hands-On: 3-Server Cluster Deployment

Server Planning

Server     Role           Specs            Components
Server-1   Gateway + LB   4-core, 8 GB     Gateway, load balancer, auth
Server-2   Agent Farm     8-core, 16 GB    Primary Agent instances
Server-3   Services       4-core, 8 GB     Database, cache, storage

Network Architecture

Internet
    │
    ▼
┌─────────────┐
│ Load        │
│ Balancer    │  ← Nginx/HAProxy
│ (Server-1)  │
└──────┬──────┘
       │
   ┌───┴───┐
   ▼       ▼
┌─────────────┐    ┌─────────────┐
│ OpenClaw    │    │ OpenClaw    │
│ Gateway     │    │ Agents      │
│ (Server-1)  │◄──►│ (Server-2)  │
└─────────────┘    └──────┬──────┘
       │                  │
       └──────────────────┼────────┐
                          │        │
                    ┌─────▼────────▼─┐
                    │   Services     │
                    │  • PostgreSQL  │
                    │  • Redis       │
                    │  • File Store  │
                    │  (Server-3)    │
                    └────────────────┘

🚀 Deployment Guide

Step 1: Server Preparation

Base Environment Setup (All Servers)

#!/bin/bash
# setup-base.sh — Base environment setup

# Update system
sudo apt update && sudo apt upgrade -y

# Install base software
sudo apt install -y \
    curl wget git vim htop \
    docker.io docker-compose \
    nginx certbot \
    postgresql-client redis-tools

# Configure Docker
sudo usermod -aG docker $USER
sudo systemctl enable docker
sudo systemctl start docker

# Install Node.js
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

# Install OpenClaw
sudo npm install -g @openclaw/openclaw

# Configure firewall
sudo ufw allow ssh
sudo ufw allow 80
sudo ufw allow 443
sudo ufw allow 18789  # OpenClaw Gateway (open to any source; restrict to cluster peer IPs in production)
sudo ufw --force enable

Step 2: Server-1 (Gateway) Configuration

Nginx Load Balancer

# limit_req_zone is only valid in the http context, so it is declared outside the server block.
# This file is assumed to be included from the http block (for example from /etc/nginx/conf.d/).
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

upstream openclaw_gateway {
    least_conn;
    server 127.0.0.1:18789 max_fails=3 fail_timeout=30s;
    server SERVER-2-IP:18789 max_fails=3 fail_timeout=30s backup;
}

server {
    listen 443 ssl http2;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    add_header Strict-Transport-Security "max-age=63072000" always;

    # Apply the zone declared above to every location in this server
    limit_req zone=api burst=20 nodelay;
    location / {
        proxy_pass http://openclaw_gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 300s;
    }

    location /health {
        access_log off;
        proxy_pass http://openclaw_gateway/health;
    }
}
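
To make the `rate=10r/s` and `burst=20 nodelay` settings concrete, the sketch below models what happens when a batch of requests from one client IP arrives at the same instant. It is a simplified model of the documented nginx behavior, not nginx's implementation, and the function names are illustrative only.

```shell
#!/bin/sh
# Simplified model of nginx limit_req with "burst=N nodelay".
# Function names are illustrative; they are not part of nginx or OpenClaw.

# accepted_at_once REQUESTS BURST
# Of REQUESTS arriving simultaneously from one IP, one is within the rate
# and up to BURST more are forwarded immediately because of nodelay.
accepted_at_once() (
    requests=$1
    burst=$2
    limit=$((burst + 1))
    if [ "$requests" -lt "$limit" ]; then
        echo "$requests"
    else
        echo "$limit"
    fi
)

# rejected_at_once REQUESTS BURST
# The remainder is rejected (nginx answers 503 unless limit_req_status is set).
rejected_at_once() (
    accepted=$(accepted_at_once "$1" "$2")
    echo $(( $1 - accepted ))
)

# burst_refill_ms RATE_PER_SECOND
# Milliseconds until one used burst slot becomes available again.
burst_refill_ms() (
    echo $((1000 / $1))
)

accepted_at_once 30 20   # prints 21
rejected_at_once 30 20   # prints 9
burst_refill_ms 10       # prints 100
```

With the values from the configuration above, a client that fires 30 requests at once gets 21 through and 9 rejected, and regains one slot every 100 ms.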

Gateway Configuration

{
  "gateway": {
    "port": 18789,
    "bind": "all",
    "cluster": {
      "enabled": true,
      "mode": "gateway",
      "peers": [
        "http://SERVER-2-IP:18789"
      ]
    }
  },
  "database": {
    "type": "postgresql",
    "host": "SERVER-3-IP",
    "port": 5432,
    "database": "openclaw",
    "username": "openclaw",
    "password": "SECURE_PASSWORD"
  },
  "cache": {
    "type": "redis",
    "host": "SERVER-3-IP",
    "port": 6379,
    "password": "REDIS_PASSWORD"
  }
}
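
The configuration above contains placeholders (`SERVER-2-IP`, `SECURE_PASSWORD`, and so on) that must be replaced before the gateway starts. A small pre-flight check along these lines can catch a forgotten placeholder. The helper names are hypothetical, and the token list covers only the placeholders used in this chapter.

```shell
#!/bin/sh
# find_placeholders FILE
# Prints (with line numbers) every line of FILE that still contains one of
# this chapter's placeholder tokens. Exit status 0 means at least one was found.
find_placeholders() {
    grep -nE 'SERVER-[0-9]+-IP|SECURE_PASSWORD|REDIS_PASSWORD|your-domain\.com' "$1"
}

# config_ready FILE
# Succeeds only when no placeholder token remains in FILE.
config_ready() {
    ! find_placeholders "$1" >/dev/null
}

# Example use before starting the gateway (path taken from the deploy script):
#   config_ready gateway/config.json || { find_placeholders gateway/config.json; exit 1; }
```

This only guards against leftover template values; it does not validate the JSON itself.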

Step 3: Server-2 (Agent Farm) Configuration

{
  "gateway": {
    "port": 18789,
    "bind": "all",
    "cluster": {
      "enabled": true,
      "mode": "agent",
      "primary": "SERVER-1-IP:18789"
    }
  },
  "agents": [
    {
      "id": "email-manager",
      "name": "Email Manager",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/email-manager"},
      "resources": {"memory": "2GB", "cpu": "2"}
    },
    {
      "id": "calendar-manager",
      "name": "Calendar Manager",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/calendar-manager"},
      "resources": {"memory": "1.5GB", "cpu": "1.5"}
    },
    {
      "id": "doc-processor",
      "name": "Document Processor",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/doc-processor"},
      "resources": {"memory": "3GB", "cpu": "2.5"}
    },
    {
      "id": "data-analyst",
      "name": "Data Analyst",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/data-analyst"},
      "resources": {"memory": "4GB", "cpu": "3"}
    }
  ]
}
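
It is worth adding up the `resources` values of the four agents and comparing them with the Server-2 specs from the planning table (8 cores, 16 GB). The sketch below does that arithmetic. It assumes the values act as reservations that must fit on the machine at the same time, which the configuration itself does not state, and the helper names are illustrative.

```shell
#!/bin/sh
# sum_values NUMBER...
# Adds decimal numbers with awk, since POSIX shell arithmetic is integer-only.
sum_values() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%g\n", total }'
}

# fits_within TOTAL CAPACITY
# Succeeds when TOTAL is less than or equal to CAPACITY.
fits_within() {
    awk -v t="$1" -v c="$2" 'BEGIN { exit !(t <= c) }'
}

cpu_total=$(sum_values 2 1.5 2.5 3)   # "cpu" values of the four agents
mem_total=$(sum_values 2 1.5 3 4)     # "memory" values in GB

echo "CPU requested: $cpu_total of 8 cores"
echo "Memory requested: $mem_total of 16 GB"
fits_within "$cpu_total" 8  || echo "CPU is oversubscribed"
fits_within "$mem_total" 16 || echo "Memory is oversubscribed"
```

Under that assumption the agents request 9 cores on an 8-core server, so either one agent's CPU value should be lowered or Server-2 needs more cores. Memory (10.5 GB of 16 GB) leaves headroom for the operating system.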

Step 4: Server-3 (Services) Configuration

Database Setup

-- setup-database.sql
-- Run with psql as a superuser; the \c line below is a psql meta-command, not SQL
CREATE DATABASE openclaw;
CREATE USER openclaw WITH PASSWORD 'SECURE_PASSWORD';
GRANT ALL PRIVILEGES ON DATABASE openclaw TO openclaw;

\c openclaw

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS agents (
    id VARCHAR(50) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    config JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS sessions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    agent_id VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    channel VARCHAR(50),
    context JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (agent_id) REFERENCES agents(id)
);

CREATE TABLE IF NOT EXISTS messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id UUID NOT NULL,
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (session_id) REFERENCES sessions(id)
);

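CREATE INDEX idx_sessions_agent_id ON sessions(agent_id);
CREATE INDEX idx_messages_session_id ON messages(session_id);
CREATE INDEX idx_messages_created_at ON messages(created_at);

-- The tables belong to the role that ran this script, so grant the application role access to them
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO openclaw;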

Redis Configuration

port 6379
bind 0.0.0.0
protected-mode yes
requirepass REDIS_PASSWORD

save 900 1
save 300 10
save 60 10000

maxmemory 2gb
maxmemory-policy allkeys-lru
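
The base firewall setup from Step 1 opens only SSH, 80, 443 and 18789, so once `ufw` is enabled on Server-3 the other servers cannot reach PostgreSQL (5432) or Redis (6379). Because Redis binds to `0.0.0.0`, it is also sensible to open these ports to the cluster members only. The sketch below prints the corresponding `ufw` commands without running them. The function name is hypothetical, and the addresses are documentation-range stand-ins for Server-1 and Server-2.

```shell
#!/bin/sh
# services_fw_rules CLIENT_IP...
# Prints (does not execute) ufw commands that open PostgreSQL (5432) and
# Redis (6379) on Server-3 to the listed client IPs only.
services_fw_rules() {
    for ip in "$@"; do
        for port in 5432 6379; do
            echo "sudo ufw allow from $ip to any port $port proto tcp"
        done
    done
}

# 192.0.2.11 and 192.0.2.12 stand in for SERVER-1-IP and SERVER-2-IP.
services_fw_rules 192.0.2.11 192.0.2.12
```

Review the printed commands, then run them on Server-3. Printing first keeps the sketch safe to try on any machine.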

🔄 Automated Deployment

Deployment Script

#!/bin/bash
# deploy.sh — Automated deployment script

set -e

SERVERS=("SERVER-1-IP" "SERVER-2-IP" "SERVER-3-IP")
SSH_USER="openclaw"
SSH_KEY="$HOME/.ssh/openclaw-deploy"  # $HOME rather than a quoted ~, which the shell does not expand

log_info() { echo -e "\033[0;32m[INFO]\033[0m $1"; }
log_error() { echo -e "\033[0;31m[ERROR]\033[0m $1"; }

# Check prerequisites
check_prerequisites() {
    log_info "Checking deployment prerequisites..."
    for server in "${SERVERS[@]}"; do
        if ! ssh -i $SSH_KEY -o ConnectTimeout=5 $SSH_USER@$server "echo 'SSH OK'" &>/dev/null; then
            log_error "Cannot SSH to $server"
            exit 1
        fi
    done
    log_info "Prerequisites check passed"
}

# Deploy Services server
deploy_services() {
    log_info "Deploying Services server..."
    ssh -i $SSH_KEY $SSH_USER@${SERVERS[2]} << 'EOF'
        cd ~/openclaw
        docker-compose -f docker-compose.services.yml down
        docker-compose -f docker-compose.services.yml pull
        docker-compose -f docker-compose.services.yml up -d
EOF
    log_info "Services server deployment complete"
}

# Deploy Agent server
deploy_agents() {
    log_info "Deploying Agent server..."
    ssh -i $SSH_KEY $SSH_USER@${SERVERS[1]} << 'EOF'
        cd ~/openclaw
        openclaw gateway stop 2>/dev/null || true
        sudo npm update -g @openclaw/openclaw
        openclaw gateway start --config agents/config.json --daemon
        sleep 10
        openclaw status
EOF
    log_info "Agent server deployment complete"
}

# Deploy Gateway server
deploy_gateway() {
    log_info "Deploying Gateway server..."
    ssh -i $SSH_KEY $SSH_USER@${SERVERS[0]} << 'EOF'
        cd ~/openclaw
        sudo nginx -t && sudo systemctl reload nginx
        openclaw gateway stop 2>/dev/null || true
        sudo npm update -g @openclaw/openclaw
        openclaw gateway start --config gateway/config.json --daemon
        sleep 10
        openclaw status
        curl -f http://localhost:18789/health || exit 1
EOF
    log_info "Gateway server deployment complete"
}

# Health check
health_check() {
    log_info "Running health checks..."
    for server in "${SERVERS[@]}"; do
        ssh -i $SSH_KEY $SSH_USER@$server "openclaw status" || log_error "$server status abnormal"
    done
}

# Main deployment flow
main() {
    log_info "Starting OpenClaw cluster deployment..."
    check_prerequisites
    deploy_services
    sleep 60
    deploy_agents
    sleep 30
    deploy_gateway
    sleep 30
    health_check
    log_info "🎉 OpenClaw cluster deployment complete!"
}

main "$@"
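
The fixed `sleep 60` and `sleep 30` pauses in `main` either wait longer than necessary or not long enough. One alternative is to poll an endpoint until it responds. The helper below is a sketch of that idea using `curl`. The function name and its defaults are assumptions, and the `/health` URL in the usage comment is the one the deploy script already probes.

```shell
#!/bin/sh
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL with curl until it returns a success status or the attempts
# run out. Returns 0 once the URL responds, 1 on timeout.
wait_for_url() (
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP errors, -sS: quiet but still show errors
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "ready after $i attempt(s): $url"
            exit 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out after $attempts attempt(s): $url" >&2
    exit 1
)

# Example use in place of a fixed sleep:
#   wait_for_url http://localhost:18789/health 30 2 || exit 1
```

The function body runs in a subshell, so its variables and the `exit` calls do not affect the calling script.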

📊 Cluster Monitoring

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'openclaw-gateway'
    static_configs:
      - targets: ['SERVER-1-IP:9090', 'SERVER-2-IP:9090']

  - job_name: 'system'
    static_configs:
      - targets: ['SERVER-1-IP:9100', 'SERVER-2-IP:9100', 'SERVER-3-IP:9100']

  - job_name: 'postgres'
    static_configs:
      - targets: ['SERVER-3-IP:9187']

  - job_name: 'redis'
    static_configs:
      - targets: ['SERVER-3-IP:9121']

Alert Rules

groups:
  - name: openclaw
    rules:
      - alert: GatewayDown
        expr: up{job="openclaw-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "OpenClaw Gateway is down on {{ $labels.instance }}"

      - alert: HighResponseTime
        expr: openclaw_request_duration_seconds{quantile="0.95"} > 5
        for: 2m
        labels:
          severity: warning

      - alert: HighErrorRate
        expr: rate(openclaw_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
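
Note that the `HighErrorRate` threshold compares a per-second rate, not a percentage: `rate(...[5m]) > 0.1` fires when a series averages more than 0.1 matching responses per second over five minutes. The arithmetic below converts between that threshold and an absolute count. The helper names are illustrative and independent of Prometheus.

```shell
#!/bin/sh
# count_for_rate RATE_PER_SECOND WINDOW_SECONDS
# Number of events in the window that corresponds to the given average rate.
count_for_rate() {
    awk -v r="$1" -v w="$2" 'BEGIN { printf "%g\n", r * w }'
}

# rate_for_count COUNT WINDOW_SECONDS
# Average per-second rate for COUNT events observed in the window.
rate_for_count() {
    awk -v n="$1" -v w="$2" 'BEGIN { printf "%g\n", n / w }'
}

count_for_rate 0.1 300   # prints 30
rate_for_count 45 300    # prints 0.15
```

So the rule fires at roughly 30 errors per five-minute window per series, regardless of total traffic. On a busy gateway a ratio of error rate to total request rate may be a more meaningful signal.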

✅ Chapter Summary

After this chapter, you have mastered:

[x] Multi-server architecture design concepts and patterns
[x] Complete 3-server cluster deployment
[x] Load balancing and high-availability configuration
[x] Database and cache configuration
[x] Automated deployment and configuration management
[x] Monitoring and alerting system setup
[x] Operations and fault handling procedures

📝 Extended Exercises

HA Practice: Simulate server failure, test automatic failover
Performance Tuning: Use load testing tools to find cluster performance limits
Disaster Recovery Drill: Execute a complete backup and recovery process
Multi-Region Deployment: Extend to a geographically distributed global deployment
Monitoring Optimization: Build a comprehensive SRE monitoring system

Ready to tackle even larger-scale enterprise deployments? 🎯
