Chapter 9: Multi-Server Cluster Deployment
Learning Objective: Master OpenClaw's multi-server deployment architecture to achieve high availability and horizontal scaling
Why Multi-Server Deployment?
Single-Server Limitations
- Performance Bottleneck: One machine's CPU, memory, and network capacity cap total throughput
- Single Point of Failure: One server failure takes down the entire system
- Scaling Difficulty: Vertical scaling is expensive and bounded by hardware limits; horizontal scaling is not possible
- Geographic Limits: Users far from the server's region see higher latency
- Security Risk: All services are concentrated on one machine
Multi-Server Benefits
- High Performance: Requests are processed in parallel across several machines
- High Availability: The failure of one server does not take down the whole system
- Horizontal Scaling: Servers can be added as load grows
- Geographic Distribution: Deploying in several regions lets users connect to a nearby one
- Security Isolation: Different services run in isolated environments
Architecture Patterns
Pattern 1: Functional Separation
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│     Gateway     │ │   Agent Farm    │ │     Service     │
│     Server      │ │     Server      │ │     Server      │
├─────────────────┤ ├─────────────────┤ ├─────────────────┤
│ • Load Balancer │ │ • Agent-1       │ │ • Database      │
│ • API Gateway   │ │ • Agent-2       │ │ • File Storage  │
│ • Auth Service  │ │ • Agent-3       │ │ • Message Queue │
│ • Rate Limiting │ │ • Agent-4       │ │ • Cache         │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Pattern 2: Geographic Distribution
                    ┌─────────────────┐
                    │   Global CDN    │
                    │ & Load Balancer │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
┌───────┴───────┐    ┌───────┴───────┐    ┌───────┴───────┐
│    US-East    │    │   EU-Central  │    │ Asia-Pacific  │
│    Region     │    │    Region     │    │    Region     │
└───────────────┘    └───────────────┘    └───────────────┘
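In Pattern 2, the global layer sends each user to the region that answers fastest. Real deployments do this with GeoDNS, anycast, or a cloud load balancer. The bash sketch below only illustrates the routing decision itself. It picks the lowest round-trip time from hypothetical `region:milliseconds` measurements; `nearest_region` is an illustrative name, not part of OpenClaw.

```bash
#!/bin/bash
# nearest_region REGION:RTT_MS...: print the region with the lowest RTT.
# The measurements are hypothetical inputs; ties keep the first region given.
nearest_region() {
  local best="" best_ms="" pair region ms
  for pair in "$@"; do
    region=${pair%%:*}
    ms=${pair##*:}
    if [[ -z $best_ms ]] || (( ms < best_ms )); then
      best=$region
      best_ms=$ms
    fi
  done
  printf '%s\n' "$best"
}
# Example: nearest_region us-east:120 eu-central:35 asia-pacific:210
```

For a user in Europe with those measurements, the function prints `eu-central`.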
Pattern 3: Microservices
┌──────────────────────────────────────────────────────┐
│                     Service Mesh                     │
├──────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ Gateway │  │  Agent  │  │ Session │  │  Tool   │  │
│  │ Service │  │ Manager │  │ Manager │  │ Service │  │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ Channel │  │ Memory  │  │ Config  │  │ Monitor │  │
│  │ Service │  │ Service │  │ Service │  │ Service │  │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
└──────────────────────────────────────────────────────┘
Hands-On: 3-Server Cluster Deployment
Server Planning
| Server | Role | Specs | Components |
|---|---|---|---|
| Server-1 | Gateway + LB | 4-core 8 GB | Gateway, load balancer, auth |
| Server-2 | Agent Farm | 8-core 16 GB | Primary Agent instances |
| Server-3 | Services | 4-core 8 GB | Database, cache, storage |
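Before you install anything, confirm that each machine matches its row in the table. The sketch below is Linux-only bash and reads `nproc` and `/proc/meminfo`. The function name and the 10% memory tolerance are choices made for this example, not OpenClaw requirements.

```bash
#!/bin/bash
# check_host_spec CORES MEMORY_GB: compare this machine with a planning-table row.
# Memory gets 10% slack: MemTotal on an "8 GB" machine is usually a little
# below 8 GiB because the kernel reserves some memory.
check_host_spec() {
  local need_cores=$1 need_gb=$2
  local cores mem_kb need_kb
  cores=$(nproc)
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  need_kb=$(( need_gb * 1024 * 1024 * 9 / 10 ))
  echo "cores=$cores (need $need_cores), mem=$(( mem_kb / 1024 )) MiB (need ~$(( need_kb / 1024 )) MiB)"
  (( cores >= need_cores && mem_kb >= need_kb ))
}
# On Server-2: check_host_spec 8 16
```

The function prints what it found and returns non-zero if the machine falls short, so a provisioning script can stop early.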
Network Architecture
                 Internet
                     │
                     ▼
              ┌─────────────┐
              │    Load     │
              │  Balancer   │ ← Nginx/HAProxy
              │ (Server-1)  │
              └──────┬──────┘
                     │
            ┌────────┴────────┐
            ▼                 ▼
     ┌─────────────┐   ┌─────────────┐
     │  OpenClaw   │   │  OpenClaw   │
     │   Gateway   │◄─►│   Agents    │
     │ (Server-1)  │   │ (Server-2)  │
     └──────┬──────┘   └──────┬──────┘
            │                 │
            ▼                 ▼
         ┌───────────────────────┐
         │       Services        │
         │  • PostgreSQL         │
         │  • Redis              │
         │  • File Store         │
         │      (Server-3)       │
         └───────────────────────┘
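Each arrow in the diagram is a TCP connection that the firewall must allow. You can check them from a given server with bash's built-in `/dev/tcp` redirection wrapped in `timeout`, so no extra tools are needed. The helper names below are hypothetical. The ports are the ones used throughout this chapter: 18789 for OpenClaw, 5432 for PostgreSQL, and 6379 for Redis.

```bash
#!/bin/bash
# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
# check_links HOST:PORT...: test each edge, print OK/FAIL, and return
# non-zero if any edge is unreachable.
check_links() {
  local edge failed=0
  for edge in "$@"; do
    if port_open "${edge%%:*}" "${edge##*:}"; then
      echo "OK   $edge"
    else
      echo "FAIL $edge"
      failed=1
    fi
  done
  return "$failed"
}
# From Server-1: check_links SERVER-2-IP:18789 SERVER-3-IP:5432 SERVER-3-IP:6379
# From Server-2: check_links SERVER-1-IP:18789 SERVER-3-IP:5432 SERVER-3-IP:6379
```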
Deployment Guide
Step 1: Server Preparation
Base Environment Setup (All Servers)
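#!/bin/bash
# setup-base.sh: base environment setup (run on every server)
# Usage: ./setup-base.sh            on Server-1 and Server-2
#        ./setup-base.sh services   on Server-3
set -euo pipefail
ROLE="${1:-}"
# Update system
sudo apt update
sudo apt upgrade -y
# Install base software
sudo apt install -y \
  curl wget git vim htop \
  docker.io docker-compose \
  nginx certbot \
  postgresql-client redis-tools
# Configure Docker (log out and back in for the docker group to take effect)
sudo usermod -aG docker "$USER"
sudo systemctl enable docker
sudo systemctl start docker
# Install Node.js 20 from NodeSource
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
# Install OpenClaw
sudo npm install -g @openclaw/openclaw
# Configure firewall
sudo ufw allow ssh
sudo ufw allow 80
sudo ufw allow 443
# OpenClaw Gateway: public traffic arrives through Nginx on 443, so only the
# cluster nodes need to reach port 18789 directly
for peer in SERVER-1-IP SERVER-2-IP; do
  sudo ufw allow proto tcp from "$peer" to any port 18789
done
# Server-3 only: let the other nodes reach PostgreSQL and Redis.
# Ports published by Docker containers bypass ufw, so also bind them to the
# private interface in docker-compose.services.yml.
if [ "$ROLE" = "services" ]; then
  for peer in SERVER-1-IP SERVER-2-IP; do
    sudo ufw allow proto tcp from "$peer" to any port 5432,6379
  done
fi
sudo ufw --force enable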
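After the script finishes, confirm that every command the later steps rely on is on `PATH`. The sketch below is a small bash helper (`missing_tools` is an illustrative name) that prints only the commands it cannot find. In the package list, `postgresql-client` provides `psql` and `redis-tools` provides `redis-cli`.

```bash
#!/bin/bash
# missing_tools CMD...: print each command that is not found on PATH.
# Empty output means everything is installed.
missing_tools() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
# Commands installed by setup-base.sh:
# missing_tools curl wget git vim htop docker docker-compose nginx certbot \
#   psql redis-cli node npm openclaw
```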
Step 2: Server-1 (Gateway) Configuration
Nginx Load Balancer
# Included inside the http block (e.g. /etc/nginx/conf.d/openclaw.conf).
# limit_req_zone and map are only allowed at http level, not inside server.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
# Send "Connection: upgrade" only for WebSocket upgrade requests
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
upstream openclaw_gateway {
    least_conn;
    server 127.0.0.1:18789 max_fails=3 fail_timeout=30s;
    # Receives traffic only while the local Gateway is marked unavailable
    server SERVER-2-IP:18789 max_fails=3 fail_timeout=30s backup;
}
server {
    listen 443 ssl http2;
    server_name your-domain.com;
    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    add_header Strict-Transport-Security "max-age=63072000" always;
    limit_req zone=api burst=20 nodelay;
    location / {
        proxy_pass http://openclaw_gateway;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300s;
    }
    location /health {
        access_log off;
        proxy_pass http://openclaw_gateway/health;
    }
}
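It is easy to misread `rate=10r/s burst=20 nodelay` as "at most 10 requests per second". Nginx's limiter behaves like a leaky bucket:

- Each accepted request adds one unit of excess.
- The excess drains at the configured rate.
- A request is rejected (503 by default) once the excess would exceed `burst`.

`nodelay` serves the burst immediately instead of spacing it out. The bash function below is a simplified model of that accounting for a single client address. It is written for this chapter, not taken from nginx, and it takes arrival times in milliseconds.

```bash
#!/bin/bash
# simulate_limit_req RATE BURST T1 T2 ...
# Simplified model of nginx limit_req with nodelay for one client key.
# RATE is requests/second, BURST the burst size, Tn arrival times in ms.
# Excess is kept in thousandths of a request to stay in integer arithmetic.
simulate_limit_req() {
  local rate=$1 burst=$2
  shift 2
  local excess=0 last="" accepted=0 rejected=0 t e
  for t in "$@"; do
    if [[ -z $last ]]; then
      # First request for this key: nothing has accumulated yet
      last=$t
      accepted=$((accepted + 1))
      continue
    fi
    # Drain RATE thousandths per elapsed ms, then add 1000 for this request
    e=$(( excess - rate * (t - last) + 1000 ))
    if (( e < 0 )); then e=0; fi
    if (( e > burst * 1000 )); then
      rejected=$((rejected + 1))   # nginx would answer 503 here
    else
      excess=$e
      last=$t
      accepted=$((accepted + 1))
    fi
  done
  echo "accepted=$accepted rejected=$rejected"
}
```

Suppose 30 requests arrive in the same millisecond. The model accepts 21 (the first request plus 20 burst slots) and rejects 9. Requests spaced exactly 100 ms apart (10 r/s) are all accepted. Because the zone is keyed on `$binary_remote_addr`, clients behind one NAT address share a single bucket.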
Gateway Configuration
{
  "gateway": {
    "port": 18789,
    "bind": "all",
    "cluster": {
      "enabled": true,
      "mode": "gateway",
      "peers": [
        "http://SERVER-2-IP:18789"
      ]
    }
  },
  "database": {
    "type": "postgresql",
    "host": "SERVER-3-IP",
    "port": 5432,
    "database": "openclaw",
    "username": "openclaw",
    "password": "SECURE_PASSWORD"
  },
  "cache": {
    "type": "redis",
    "host": "SERVER-3-IP",
    "port": 6379,
    "password": "REDIS_PASSWORD"
  }
}
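This config, like the others in the chapter, contains placeholders (`SERVER-3-IP`, `SECURE_PASSWORD`, `REDIS_PASSWORD`, `your-domain.com`). A deployment that still contains one fails in confusing ways. Below is a bash guard to run before deploying; `find_placeholders` is a hypothetical helper, and the config paths in the comment are the ones `deploy.sh` uses.

```bash
#!/bin/bash
# find_placeholders FILE...: print any unreplaced placeholder values.
# Returns 1 if placeholders were found, 0 if the files are clean,
# and 2 if grep itself failed (for example, a missing file).
find_placeholders() {
  local status=0
  grep -nE 'SERVER-[0-9]+-IP|SECURE_PASSWORD|REDIS_PASSWORD|your-domain\.com' "$@" || status=$?
  case $status in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
# Example: find_placeholders gateway/config.json agents/config.json || exit 1
```

The function explicitly maps grep's exit codes. Without that, a missing file (grep status 2) would be treated the same as a clean one.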
Step 3: Server-2 (Agent Farm) Configuration
{
  "gateway": {
    "port": 18789,
    "bind": "all",
    "cluster": {
      "enabled": true,
      "mode": "agent",
      "primary": "SERVER-1-IP:18789"
    }
  },
  "agents": [
    {
      "id": "email-manager",
      "name": "Email Manager",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/email-manager"},
      "resources": {"memory": "2GB", "cpu": "2"}
    },
    {
      "id": "calendar-manager",
      "name": "Calendar Manager",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/calendar-manager"},
      "resources": {"memory": "1.5GB", "cpu": "1.5"}
    },
    {
      "id": "doc-processor",
      "name": "Document Processor",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/doc-processor"},
      "resources": {"memory": "3GB", "cpu": "2.5"}
    },
    {
      "id": "data-analyst",
      "name": "Data Analyst",
      "model": "anthropic/claude-sonnet-4-20250514",
      "workspace": {"root": "./agents/data-analyst"},
      "resources": {"memory": "4GB", "cpu": "3"}
    }
  ]
}
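The `resources` values are worth adding up. Together, the four agents request 9 CPUs and 10.5 GB of memory, while the planning table gives Server-2 8 cores and 16 GB. This chapter does not say whether OpenClaw enforces these values as hard limits. Either way, summing them before you deploy catches over-commitment. Below is a bash/awk sketch; the per-agent figures are passed by hand as `cpu:memory` pairs, and `check_capacity` is an illustrative name.

```bash
#!/bin/bash
# check_capacity HOST_CPUS HOST_MEM_GB CPU:MEM...
# Sums per-agent requests such as 2:2GB and compares them with the host.
# Returns non-zero when either total exceeds the host's capacity.
check_capacity() {
  local host_cpus=$1 host_mem=$2
  shift 2
  printf '%s\n' "$@" | awk -F: -v hc="$host_cpus" -v hm="$host_mem" '
    { cpu += $1; sub(/GB$/, "", $2); mem += $2 }
    END {
      printf "cpu=%.1f/%s mem=%.1fGB/%sGB\n", cpu, hc, mem, hm
      if (cpu > hc + 0 || mem > hm + 0) exit 1
    }'
}
# Server-2 agents on the 8-core / 16 GB host:
# check_capacity 8 16 2:2GB 1.5:1.5GB 2.5:3GB 3:4GB
```

For the config above, the function prints `cpu=9.0/8 mem=10.5GB/16GB` and returns 1. The CPU requests exceed the host by one core, so you need to trim a request or move an agent.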
Step 4: Server-3 (Services) Configuration
Database Setup
-- setup-database.sql
-- Run with psql as a PostgreSQL superuser (\c is a psql meta-command), e.g.
-- psql -U postgres -v ON_ERROR_STOP=1 -f setup-database.sql
CREATE USER openclaw WITH PASSWORD 'SECURE_PASSWORD';
-- The database owner may create tables in its public schema, which
-- PostgreSQL 15+ no longer allows for every user by default
CREATE DATABASE openclaw OWNER openclaw;
\c openclaw
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Create the tables as openclaw so that the application role owns them
SET ROLE openclaw;
CREATE TABLE IF NOT EXISTS agents (
    id VARCHAR(50) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    config JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS sessions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    agent_id VARCHAR(50) NOT NULL,
    user_id VARCHAR(100),
    channel VARCHAR(50),
    context JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (agent_id) REFERENCES agents(id)
);
CREATE TABLE IF NOT EXISTS messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    session_id UUID NOT NULL,
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (session_id) REFERENCES sessions(id)
);
CREATE INDEX IF NOT EXISTS idx_sessions_agent_id ON sessions(agent_id);
CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages(session_id);
CREATE INDEX IF NOT EXISTS idx_messages_created_at ON messages(created_at);
Redis Configuration
port 6379
bind 0.0.0.0
protected-mode yes
requirepass REDIS_PASSWORD
save 900 1
save 300 10
save 60 10000
maxmemory 2gb
maxmemory-policy allkeys-lru
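`REDIS_PASSWORD` and `SECURE_PASSWORD` are placeholders. Redis checks `requirepass` very quickly, so anyone who can reach port 6379 can try a huge number of short passwords. Long random values are the usual recommendation.

The sketch below draws from `/dev/urandom` and keeps only alphanumeric characters. The result can be pasted into JSON, SQL string literals, and redis.conf without escaping. `gen_secret` is an illustrative name.

```bash
#!/bin/bash
# gen_secret [LENGTH]: print a random alphanumeric string (default 32 chars).
gen_secret() {
  local len=${1:-32}
  # tr -dc keeps only A-Z, a-z, 0-9; head stops after LENGTH characters.
  # Process substitution keeps tr's SIGPIPE exit out of the function's status.
  head -c "$len" < <(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom 2>/dev/null)
  echo
}
# Example: REDIS_PASSWORD=$(gen_secret 48)
```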
Automated Deployment
Deployment Script
#!/bin/bash
# deploy.sh: automated deployment script
# Assumes the deploy user has passwordless sudo on every server, because the
# remote steps run sudo without a terminal.
set -euo pipefail
SERVERS=("SERVER-1-IP" "SERVER-2-IP" "SERVER-3-IP")
SSH_USER="openclaw"
SSH_KEY="$HOME/.ssh/openclaw-deploy"   # "~" is not expanded inside quotes
log_info() { echo -e "\033[0;32m[INFO]\033[0m $1"; }
log_error() { echo -e "\033[0;31m[ERROR]\033[0m $1" >&2; }
# Check prerequisites
check_prerequisites() {
  log_info "Checking deployment prerequisites..."
  for server in "${SERVERS[@]}"; do
    # BatchMode makes ssh fail instead of hanging on a password prompt
    if ! ssh -i "$SSH_KEY" -o ConnectTimeout=5 -o BatchMode=yes "$SSH_USER@$server" "echo 'SSH OK'" &>/dev/null; then
      log_error "Cannot SSH to $server"
      exit 1
    fi
  done
  log_info "Prerequisites check passed"
}
# Deploy Services server
deploy_services() {
  log_info "Deploying Services server..."
  # "set -e" in each heredoc stops the remote shell at the first failing command
  ssh -i "$SSH_KEY" "$SSH_USER@${SERVERS[2]}" << 'EOF'
set -e
cd ~/openclaw
# Pull new images first; "up -d" then recreates only the changed containers
docker-compose -f docker-compose.services.yml pull
docker-compose -f docker-compose.services.yml up -d
EOF
  log_info "Services server deployment complete"
}
# Deploy Agent server
deploy_agents() {
  log_info "Deploying Agent server..."
  ssh -i "$SSH_KEY" "$SSH_USER@${SERVERS[1]}" << 'EOF'
set -e
cd ~/openclaw
openclaw gateway stop 2>/dev/null || true
sudo npm update -g @openclaw/openclaw
openclaw gateway start --config agents/config.json --daemon
sleep 10
openclaw status
EOF
  log_info "Agent server deployment complete"
}
# Deploy Gateway server
deploy_gateway() {
  log_info "Deploying Gateway server..."
  ssh -i "$SSH_KEY" "$SSH_USER@${SERVERS[0]}" << 'EOF'
set -e
cd ~/openclaw
# Validate the Nginx config on its own line so that set -e aborts on failure
sudo nginx -t
sudo systemctl reload nginx
openclaw gateway stop 2>/dev/null || true
sudo npm update -g @openclaw/openclaw
openclaw gateway start --config gateway/config.json --daemon
sleep 10
openclaw status
curl -fsS http://localhost:18789/health
EOF
  log_info "Gateway server deployment complete"
}
# Health check
health_check() {
  log_info "Running health checks..."
  # OpenClaw runs on Server-1 and Server-2; Server-3 only runs the Docker services
  for server in "${SERVERS[@]:0:2}"; do
    ssh -i "$SSH_KEY" "$SSH_USER@$server" "openclaw status" || log_error "$server status abnormal"
  done
  ssh -i "$SSH_KEY" "$SSH_USER@${SERVERS[2]}" "cd ~/openclaw && docker-compose -f docker-compose.services.yml ps" \
    || log_error "${SERVERS[2]} services status abnormal"
}
# Main deployment flow: services first, then agents, then the public-facing gateway
main() {
  log_info "Starting OpenClaw cluster deployment..."
  check_prerequisites
  deploy_services
  sleep 60   # give PostgreSQL and Redis time to start
  deploy_agents
  sleep 30
  deploy_gateway
  sleep 30
  health_check
  log_info "OpenClaw cluster deployment complete!"
}
main "$@"
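The fixed `sleep 60` and `sleep 30` pauses in `main` either waste time or are too short on a slow host. A common alternative is to poll until the service answers.

Below is a sketch of a generic retry helper; `retry` is an illustrative name. The commented calls show how it might replace the sleeps, using `pg_isready` (shipped with the PostgreSQL client tools from Step 1) and the Gateway's `/health` endpoint.

```bash
#!/bin/bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up, sleeping
# DELAY_SECONDS between runs. Returns COMMAND's last exit status on failure.
retry() {
  local attempts=$1 delay=$2 n=1 status
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if (( n >= attempts )); then
      echo "retry: '$*' failed after $n attempts" >&2
      return "$status"
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
# Possible replacements for the fixed sleeps in main():
# retry 30 2 pg_isready -h SERVER-3-IP -p 5432
# retry 30 2 ssh -i "$SSH_KEY" "$SSH_USER@${SERVERS[0]}" curl -fsS http://localhost:18789/health
```

Polling returns as soon as the dependency is ready and still fails loudly if it never comes up. A fixed sleep cannot do either.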
Cluster Monitoring
Prometheus Configuration
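# prometheus.yml
global:
  scrape_interval: 15s
# Load the alert rules from the Alert Rules section (file name is an example)
rule_files:
  - openclaw-alerts.yml
scrape_configs:
  # OpenClaw metrics on the Gateway and Agent servers
  - job_name: 'openclaw-gateway'
    static_configs:
      - targets: ['SERVER-1-IP:9090', 'SERVER-2-IP:9090']
  # node_exporter (default port 9100) on every server
  - job_name: 'system'
    static_configs:
      - targets: ['SERVER-1-IP:9100', 'SERVER-2-IP:9100', 'SERVER-3-IP:9100']
  # postgres_exporter (default port 9187)
  - job_name: 'postgres'
    static_configs:
      - targets: ['SERVER-3-IP:9187']
  # redis_exporter (default port 9121)
  - job_name: 'redis'
    static_configs:
      - targets: ['SERVER-3-IP:9121']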
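The same server addresses appear in `deploy.sh` and in every `targets` list, so adding a fourth server means editing several places. One option is to generate the lists from a single array. The sketch below prints a YAML flow sequence for a given exporter port. The function name is hypothetical, and the ports are the ones used in the config above.

```bash
#!/bin/bash
# prom_targets PORT HOST...: print a YAML flow list such as
# ['host-a:9100', 'host-b:9100'] for a static_configs "targets" entry.
prom_targets() {
  local port=$1 out="" host
  shift
  for host in "$@"; do
    # ${out:+, } adds a separator before every entry except the first
    out+="${out:+, }'$host:$port'"
  done
  printf '[%s]\n' "$out"
}
# SERVERS=("SERVER-1-IP" "SERVER-2-IP" "SERVER-3-IP")   # as in deploy.sh
# echo "      - targets: $(prom_targets 9100 "${SERVERS[@]}")"
```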
Alert Rules
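groups:
  - name: openclaw
    rules:
      - alert: GatewayDown
        expr: up{job="openclaw-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "OpenClaw Gateway is down on {{ $labels.instance }}"
      - alert: HighResponseTime
        expr: openclaw_request_duration_seconds{quantile="0.95"} > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "p95 response time above 5s on {{ $labels.instance }}"
      # Share of requests answered with 5xx, so the threshold does not depend on traffic volume
      - alert: HighErrorRate
        expr: |
          sum by (instance) (rate(openclaw_requests_total{status=~"5.."}[5m]))
            / sum by (instance) (rate(openclaw_requests_total[5m])) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "More than 10% of requests return 5xx on {{ $labels.instance }}"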
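The `HighErrorRate` rule is built on `rate()`. This function converts an ever-increasing counter such as `openclaw_requests_total` into a per-second average over the window (`[5m]` here). A counter that drops is treated as having restarted from zero.

Below is a simplified two-sample version of that calculation in bash/awk, including the reset handling. Prometheus itself uses every sample in the window and extrapolates to the window edges, which this sketch does not. `counter_rate` is an illustrative name.

```bash
#!/bin/bash
# counter_rate T1 V1 T2 V2: per-second increase of a counter between two
# samples (timestamps in seconds). If V2 < V1 the counter is assumed to have
# reset, so the increase since the reset is V2 itself.
counter_rate() {
  awk -v t1="$1" -v v1="$2" -v t2="$3" -v v2="$4" 'BEGIN {
    inc = (v2 + 0 >= v1 + 0) ? v2 - v1 : v2
    printf "%.3f\n", inc / (t2 - t1)
  }'
}
```

`counter_rate 0 100 300 130` prints `0.100`: 30 new 5xx responses over five minutes. `counter_rate 0 500 60 30` treats the drop from 500 to 30 as a restart and prints `0.500`.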
Chapter Summary
After this chapter, you have mastered:
- [x] Multi-server architecture design concepts and patterns
- [x] Complete 3-server cluster deployment
- [x] Load balancing and high-availability configuration
- [x] Database and cache configuration on a dedicated services server
- [x] Automated deployment and configuration management
- [x] Monitoring and alerting system setup
- [x] Health checks for day-to-day operations and fault detection
Extended Exercises
- HA Practice: Simulate server failure, test automatic failover
- Performance Tuning: Use load testing tools to find cluster performance limits
- Disaster Recovery Drill: Execute a complete backup and recovery process
- Multi-Region Deployment: Extend to a geographically distributed global deployment
- Monitoring Optimization: Build a comprehensive SRE monitoring system
Ready to tackle even larger-scale enterprise deployments?