
Scaling

Horizontal adapters, transport backends, presence sync, and multi-node deployment.

Architecture Overview

Sockudo uses a pluggable adapter system for horizontal scaling. All horizontal adapters share the same HorizontalAdapterBase<T> core, where T is the transport-specific backend.

Each node maintains local WebSocket connections and uses the adapter to synchronize state across the cluster.
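
Conceptually, the split looks like the sketch below. The trait and method names are illustrative only, not Sockudo's actual internal API:

use async_trait::async_trait;
use tokio::sync::mpsc;

// Hypothetical transport interface a horizontal adapter core could
// delegate to. Each backend (Redis, Redis Cluster, NATS) would supply
// its own implementation of these three operations.
#[async_trait]
pub trait HorizontalTransport: Send + Sync {
    // Publish a serialized message to one of the adapter channels.
    async fn publish(&self, channel: &str, payload: &[u8]) -> anyhow::Result<()>;

    // Subscribe to a channel and stream messages from other nodes.
    async fn subscribe(&self, channel: &str) -> anyhow::Result<mpsc::Receiver<Vec<u8>>>;

    // Best-effort count of subscribed nodes (used for request timeouts).
    async fn node_count(&self) -> anyhow::Result<usize>;
}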

Adapter Drivers

| Driver | Use Case | Requirements |
|---|---|---|
| local | Single-node, development | None |
| redis | Multi-node with standalone Redis | Redis 6+ |
| redis-cluster | Multi-node with Redis Cluster | Redis Cluster 6+ (7.0+ for sharded pub/sub) |
| nats | Multi-node with NATS | NATS server |

Set the adapter via config or env:

ADAPTER_DRIVER=redis
{ "adapter": { "driver": "redis" } }

Communication Model

All horizontal adapters communicate over three channels:

| Channel | Purpose |
|---|---|
| Broadcast (prefix:#broadcast) | Distribute events to subscribers across nodes |
| Request (prefix:#requests) | Node-to-node queries (socket counts, presence members, etc.) |
| Response (prefix:#responses) | Return aggregated responses to requests |
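
With the Redis adapter you can watch this traffic directly for debugging. The channel names below assume the default sockudo_adapter: prefix:

redis-cli SUBSCRIBE "sockudo_adapter:#broadcast" "sockudo_adapter:#requests" "sockudo_adapter:#responses"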

Broadcasting Flow

  1. A node receives an event (from a client or the HTTP API).
  2. It delivers the event to matching local subscribers immediately.
  3. It serializes the event and publishes it to the broadcast channel.
  4. Other nodes receive the message, apply filtering (socket exclusion, tag filtering, delta compression), and deliver it to their local subscribers.
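
A broadcast message is essentially a serialized envelope identifying the origin node and the event to deliver. The sketch below illustrates the idea; the field names are hypothetical, not Sockudo's actual wire format:

use serde::{Deserialize, Serialize};

// Hypothetical broadcast envelope, serialized in step 3 above.
#[derive(Serialize, Deserialize)]
struct BroadcastMessage {
    node_id: String,         // origin node, so receivers can ignore their own messages
    app_id: String,          // which app the event belongs to
    channel: String,         // target channel, e.g. "chat"
    event: String,           // event name
    data: serde_json::Value, // event payload
    except: Option<String>,  // socket ID to exclude (the original sender)
}

fn main() -> serde_json::Result<()> {
    let msg = BroadcastMessage {
        node_id: "node-a".into(),
        app_id: "app-1".into(),
        channel: "chat".into(),
        event: "message".into(),
        data: serde_json::json!({ "body": "hello" }),
        except: None,
    };
    println!("{}", serde_json::to_string(&msg)?);
    Ok(())
}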

Request/Response Pattern

Cross-node queries use a request/response pattern with correlation IDs and timeouts:

  • Fast queries (socket existence, channel membership): 50ms timeout per expected node.
  • Full queries (presence members, channel lists): full configured timeout.
  • Early return: positive results (e.g., "socket exists") return immediately without waiting for all nodes.
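
A minimal sketch of the pattern, assuming a tokio runtime; the types and names are illustrative, not Sockudo's implementation:

use std::{collections::HashMap, sync::Mutex, time::Duration};
use tokio::sync::oneshot;

// Pending cross-node requests, keyed by correlation ID.
struct PendingRequests {
    pending: Mutex<HashMap<String, oneshot::Sender<String>>>,
}

impl PendingRequests {
    // Issue a request and wait for a response, bounded by the timeout.
    // The caller would also publish { correlation_id, query } to
    // prefix:#requests between registering and awaiting.
    async fn request(&self, correlation_id: String, timeout: Duration) -> Option<String> {
        let (tx, rx) = oneshot::channel();
        self.pending.lock().unwrap().insert(correlation_id.clone(), tx);
        let result = tokio::time::timeout(timeout, rx).await.ok().and_then(|r| r.ok());
        self.pending.lock().unwrap().remove(&correlation_id); // clean up on timeout
        result
    }

    // Called by the prefix:#responses subscriber when a reply arrives.
    // Resolving on the first positive reply gives the early-return behavior.
    fn resolve(&self, correlation_id: &str, response: String) {
        if let Some(tx) = self.pending.lock().unwrap().remove(correlation_id) {
            let _ = tx.send(response);
        }
    }
}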

Supported request types include:

  • Socket counts and channel lists
  • Channel membership and presence member queries
  • User connection counts
  • Presence state sync (join/leave replication)
  • Heartbeat and dead node notifications
  • Terminate user connections

Local Adapter

The default adapter. All communication stays in-process with no network overhead.

{ "adapter": { "driver": "local" } }

Best for:

  • Development and testing
  • Single-node deployments
  • Environments where horizontal scaling is not needed

Redis Adapter

Uses Redis Pub/Sub for cross-node communication.

{
  "adapter": {
    "driver": "redis",
    "redis": {
      "requests_timeout": 5000,
      "prefix": "sockudo_adapter:",
      "cluster_mode": false
    }
  }
}

Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| requests_timeout | number | 5000 | Request timeout in ms |
| prefix | string | sockudo_adapter: | Redis key/channel prefix |
| redis_pub_options | object | {} | Additional options for the publish connection |
| redis_sub_options | object | {} | Additional options for the subscribe connection |
| cluster_mode | bool | false | Use cluster-aware connections |

Connection Management

  • Uses ConnectionManager for automatic reconnection (5 retries with exponential backoff).
  • Maintains a separate connection for the Events API to avoid blocking pub/sub.
  • Node counting via PUBSUB NUMSUB.
  • Publish retries: 3 attempts with exponential backoff (100ms to 1000ms).
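
The publish retry is a standard exponential backoff loop. A minimal sketch of that behavior, with the publish call left as a placeholder:

use std::time::Duration;

// Retry up to 3 attempts, backing off 100ms, then doubling, capped at
// 1000ms. `publish` stands in for the actual Redis publish operation.
async fn publish_with_retry(
    mut publish: impl FnMut() -> Result<(), String>,
) -> Result<(), String> {
    let mut delay = Duration::from_millis(100);
    for attempt in 1..=3 {
        match publish() {
            Ok(()) => return Ok(()),
            Err(e) if attempt == 3 => return Err(e), // out of attempts
            Err(_) => {
                tokio::time::sleep(delay).await;
                delay = (delay * 2).min(Duration::from_millis(1000));
            }
        }
    }
    unreachable!("the final attempt always returns")
}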

Redis Database Connection

The Redis adapter uses the connection configured in database.redis. You can override it globally with REDIS_URL:

REDIS_URL=redis://:password@redis.example.com:6379/0

This overrides adapter, cache, queue, and rate limiter Redis connections in one shot.

Redis Cluster Adapter

Uses Redis Cluster for sharded, multi-node pub/sub.

{
  "adapter": {
    "driver": "redis-cluster",
    "cluster": {
      "nodes": [
        "redis://10.0.0.11:7000",
        "redis://10.0.0.12:7001",
        "redis://10.0.0.13:7002"
      ],
      "prefix": "sockudo_adapter:",
      "request_timeout_ms": 1000,
      "use_connection_manager": true,
      "use_sharded_pubsub": false
    }
  }
}

Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| nodes | string[] | [] | Cluster seed node URLs |
| prefix | string | sockudo_adapter: | Key/channel prefix |
| request_timeout_ms | number | 1000 | Request timeout in ms |
| use_connection_manager | bool | true | Use persistent connection manager |
| use_sharded_pubsub | bool | false | Use SSUBSCRIBE/SPUBLISH (Redis 7.0+) |

Sharded Pub/Sub

When use_sharded_pubsub is enabled, the adapter uses Redis 7.0+ sharded pub/sub commands (SSUBSCRIBE/SPUBLISH). This routes messages to the specific shard that owns the channel's hash slot instead of broadcasting to all nodes.

Benefits:

  • Significantly better performance in large clusters.
  • Reduced network traffic (messages go only to relevant shards).
  • Eliminates pub/sub hotspots on single nodes.

{
  "adapter": {
    "driver": "redis-cluster",
    "cluster": {
      "use_sharded_pubsub": true
    }
  }
}

Sharded pub/sub requires Redis 7.0+. Enable only if all cluster nodes support it.
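
Before enabling the flag, you can verify that every node accepts the sharded commands, e.g. with a throwaway channel:

# Returns a receiver count on Redis 7.0+; older servers report an unknown command
redis-cli -c -h 10.0.0.11 -p 7000 SPUBLISH test-channel ping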

Connection Strategy

  • Persistent publish connection: cloned per operation (cheap, thread-safe).
  • Dedicated health check connection: prevents timeouts under high load (10K+ msg/s).
  • Reconnection with exponential backoff (500ms to 10s max).

Cluster Node Configuration

Provide seed nodes via config or env:

REDIS_CLUSTER_NODES=redis://10.0.0.11:7000,redis://10.0.0.12:7001
# or
DATABASE_REDIS_CLUSTER_NODES=redis://10.0.0.11:7000,redis://10.0.0.12:7001

Cluster auth and TLS are configured under database.redis.cluster:

{
  "database": {
    "redis": {
      "cluster": {
        "nodes": [
          { "host": "node1.example.com", "port": 7000 },
          { "host": "node2.example.com", "port": 7001 }
        ],
        "username": "cluster-user",
        "password": "cluster-secret",
        "use_tls": true
      }
    }
  }
}

NATS Adapter

Uses NATS subjects for cross-node messaging.

{
  "adapter": {
    "driver": "nats",
    "nats": {
      "servers": ["nats://localhost:4222"],
      "prefix": "sockudo_adapter:",
      "request_timeout_ms": 5000,
      "connection_timeout_ms": 5000
    }
  }
}

Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| servers | string[] | ["nats://localhost:4222"] | NATS server URLs |
| prefix | string | sockudo_adapter: | Subject prefix |
| request_timeout_ms | number | 5000 | Request timeout in ms |
| username | string? | null | NATS username |
| password | string? | null | NATS password |
| token | string? | null | NATS auth token |
| connection_timeout_ms | number | 5000 | Connection timeout in ms |
| nodes_number | number? | null | Expected cluster node count (manual) |

Authentication

NATS supports username/password or token auth:

NATS_SERVERS=nats://nats1:4222,nats://nats2:4222
NATS_USERNAME=sockudo
NATS_PASSWORD=secret
# or
NATS_TOKEN=my-auth-token

Node Counting

NATS has no equivalent of Redis's PUBSUB NUMSUB, so the adapter cannot count subscriber nodes automatically. If you need accurate node counts (used for request timeout calculation), set nodes_number manually:

{
  "adapter": {
    "nats": {
      "nodes_number": 3
    }
  }
}

Presence Synchronization

Horizontal adapters maintain a cluster-wide presence registry for presence channels. Each node tracks presence data from all other nodes.

How It Works

  1. Join: When a user joins a presence channel, the node broadcasts a PresenceMemberJoined message to all nodes.
  2. Leave: On disconnect or channel leave, PresenceMemberLeft is broadcast.
  3. New node detection: When a heartbeat is received from a previously unknown node, existing nodes send a PresenceStateSync with their full presence data.
  4. Conflict resolution: Each presence operation carries a sequence number. Higher sequence numbers win during concurrent updates.
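
The conflict-resolution rule in step 4 amounts to a last-write-wins check. A minimal sketch, with illustrative names:

// Apply an incoming presence update only if its sequence number is newer
// than what this node has already recorded for the member.
fn should_apply(stored_seq: Option<u64>, incoming_seq: u64) -> bool {
    match stored_seq {
        Some(seq) => incoming_seq > seq, // higher sequence number wins
        None => true,                    // first update seen for this member
    }
}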

Data Structure

The cluster-wide presence registry is keyed by node ID. When a node dies, the cluster health system detects it and cleans up all of that node's presence entries in O(1) by removing the entire node key.
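
A plausible shape for that registry, with hypothetical type names (not Sockudo's internal types):

use std::collections::HashMap;

type NodeId = String;
type ChannelName = String;
type UserId = String;

struct MemberInfo {
    user_info: serde_json::Value, // presence payload shown to other members
    seq: u64,                     // sequence number for conflict resolution
}

// The outer key is the node ID, which is what makes dead-node cleanup a
// single map removal.
type PresenceRegistry = HashMap<NodeId, HashMap<ChannelName, HashMap<UserId, MemberInfo>>>;

// Drop every presence entry owned by a dead node in one O(1) operation.
fn remove_node(registry: &mut PresenceRegistry, node: &NodeId) {
    registry.remove(node);
}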

Cluster Health

Cluster health monitoring detects dead nodes and cleans up their state. It is enabled by default when using a horizontal adapter.

How It Works

  1. Heartbeat: Each node periodically broadcasts a heartbeat with its timestamp.
  2. Dead node detection: A periodic task checks for nodes whose last heartbeat exceeds the timeout.
  3. Leader election: The node with the lowest node_id becomes the cleanup leader.
  4. Cleanup (leader only): The leader removes the dead node's presence data and broadcasts a NodeDead message. Follower nodes clean their local registries upon receiving this message.
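
Steps 2 and 3 reduce to a timeout scan plus a lowest-ID comparison. A sketch with illustrative names, assuming the heartbeat map includes the local node:

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Nodes whose last heartbeat is older than the timeout are considered dead.
fn find_dead_nodes(
    heartbeats: &HashMap<String, Instant>,
    node_timeout: Duration,
) -> Vec<String> {
    let now = Instant::now();
    heartbeats
        .iter()
        .filter(|(_, last)| now.duration_since(**last) > node_timeout)
        .map(|(id, _)| id.clone())
        .collect()
}

// Leader election: the node with the lowest ID performs the cleanup.
fn is_cleanup_leader(my_id: &str, heartbeats: &HashMap<String, Instant>) -> bool {
    heartbeats.keys().all(|id| my_id <= id.as_str())
}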

Configuration

{
  "cluster_health": {
    "enabled": true,
    "heartbeat_interval_ms": 10000,
    "node_timeout_ms": 30000,
    "cleanup_interval_ms": 10000
  }
}

| Field | Default | Description |
|---|---|---|
| enabled | true | Enable cluster health monitoring |
| heartbeat_interval_ms | 10000 | Heartbeat broadcast interval |
| node_timeout_ms | 30000 | Node is considered dead after this many ms without a heartbeat |
| cleanup_interval_ms | 10000 | How often to check for dead nodes |

heartbeat_interval_ms must be at most node_timeout_ms / 3 to avoid false-positive dead-node detection.

Environment Variables

CLUSTER_HEALTH_ENABLED=true
CLUSTER_HEALTH_HEARTBEAT_INTERVAL=10000
CLUSTER_HEALTH_NODE_TIMEOUT=30000
CLUSTER_HEALTH_CLEANUP_INTERVAL=10000

Adapter-Level Settings

These settings apply to all horizontal adapters.

| Setting | Env Var | Default | Description |
|---|---|---|---|
| Buffer multiplier | ADAPTER_BUFFER_MULTIPLIER_PER_CPU | 64 | Concurrent operations per CPU core |
| Socket counting | ADAPTER_ENABLE_SOCKET_COUNTING | true | Track socket counts across cluster |

Socket counting aggregates connection counts across all nodes. Disable it if you don't need cluster-wide counts and want to reduce cross-node requests:

ADAPTER_ENABLE_SOCKET_COUNTING=false

Performance Optimizations

The horizontal adapter layer includes several automatic optimizations:

  • Single-node detection: When only one node is detected (via cluster health), cross-node requests are skipped entirely.
  • Delta compression: Pre-computes deltas at the broadcast level and reuses them for all local sockets, avoiding redundant computation.
  • Concurrency control: A broadcast semaphore limits parallel socket sends to prevent overwhelming the event loop. Sockets are processed in adaptive chunks (1-8 based on socket count).
  • Connection reuse: All adapters use persistent connections with cheap cloning for concurrent operations.
  • Adaptive timeouts: Fast queries (existence checks) use short timeouts, while full queries (member lists) use the configured timeout.
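
As a sketch of the concurrency-control point above: the pattern is a semaphore bounding in-flight work plus chunked iteration. The chunk-size heuristic and the limit of 64 below are illustrative, not Sockudo's actual numbers:

use std::sync::Arc;
use tokio::sync::Semaphore;

// Broadcast to many local sockets without overwhelming the event loop:
// sockets are processed in small chunks, and a semaphore caps how many
// chunks are in flight at once.
async fn broadcast(socket_ids: Vec<u64>, send: impl Fn(u64) + Send + Sync + 'static) {
    let send = Arc::new(send);
    let chunk_size = (socket_ids.len() / 1000 + 1).clamp(1, 8); // adaptive: 1-8
    let semaphore = Arc::new(Semaphore::new(64));

    let mut tasks = Vec::new();
    for chunk in socket_ids.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        let permit = semaphore.clone().acquire_owned().await.unwrap();
        let send = send.clone();
        tasks.push(tokio::spawn(async move {
            let _permit = permit; // released when this chunk finishes
            for id in chunk {
                send(id);
            }
        }));
    }
    for task in tasks {
        let _ = task.await;
    }
}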

Adapter Comparison

| Feature | Local | Redis | Redis Cluster | NATS |
|---|---|---|---|---|
| Horizontal scaling | No | Yes | Yes | Yes |
| Auto-reconnection | N/A | Automatic (5 retries) | Manual with backoff | Built-in (NATS lib) |
| Node counting | 1 | PUBSUB NUMSUB | PUBSUB NUMSUB | Manual (nodes_number) |
| Sharded pub/sub | N/A | No | Yes (Redis 7.0+) | N/A |
| TLS support | N/A | Via REDIS_URL | Via use_tls | Via NATS URL scheme |
| Auth | N/A | Password/ACL | Password/ACL | User/pass or token |
| Health check | N/A | PING | PING | Connection state |

Cache / Queue / Rate Limiter

These backends are configured independently from the adapter:

| Backend | Drivers | Purpose |
|---|---|---|
| Cache | memory, redis, redis-cluster, none | App manager caching, channel state |
| Queue | memory, redis, redis-cluster, sqs, none | Webhook delivery |
| Rate limiter | memory, redis, redis-cluster, none | API and WebSocket rate limiting |

Use shared Redis-based backends when you need cluster-wide consistency (e.g., rate limiting across all nodes).

Build Features

Compile only the adapters you need for smaller binaries:

# Local only (default)
cargo build

# Redis adapter
cargo build --features redis

# Redis Cluster adapter
cargo build --features redis-cluster

# NATS adapter
cargo build --features nats

# Multiple backends
cargo build --features "redis,nats,postgres"

# Everything
cargo build --release --features full

Typical Production Configurations

Redis (most common)

{
  "adapter": { "driver": "redis" },
  "cache": { "driver": "redis" },
  "queue": { "driver": "redis" },
  "rate_limiter": { "driver": "redis", "enabled": true }
}

Redis Cluster (large scale)

{
  "adapter": {
    "driver": "redis-cluster",
    "cluster": {
      "use_sharded_pubsub": true
    }
  },
  "cache": { "driver": "redis-cluster" },
  "queue": { "driver": "redis-cluster" }
}

NATS

{
  "adapter": {
    "driver": "nats",
    "nats": {
      "servers": ["nats://nats1:4222", "nats://nats2:4222"],
      "nodes_number": 3
    }
  }
}