Operations · 2026-05-19

Running Realtime Infrastructure Across Nodes

How to reason about adapters, duplicate delivery, recovery windows, and push queues in a Sockudo cluster.

Single-node realtime systems are easy to reason about because local memory looks authoritative. Multi-node systems are different. Every useful guarantee has to survive retries, disconnects, duplicate messages, node death, and backend latency.

Sockudo clusters should share the dependencies that carry state across nodes, rather than leave that state in any one node's local memory.

Duplicate delivery is normal

Distributed transports retry, so the same message can arrive more than once. Clients and consumers should use message_id, application IDs, and idempotency keys to make duplicates harmless.
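
As a rough illustration of what "harmless" can look like on the consumer side, the sketch below skips any event whose message_id has already been seen. It is a minimal example, not Sockudo's client API: the Deduplicator class, the handle function, and the "data" field are hypothetical names, and the only thing taken from the text above is that each event carries a message_id.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers recently seen message IDs so repeated deliveries are skipped.

    Hypothetical helper for illustration; not part of any Sockudo library.
    """

    def __init__(self, max_entries=10_000):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # message_id -> None, oldest first

    def first_time(self, message_id):
        """Return True only the first time a given ID is observed."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # keep hot IDs from being evicted
            return False
        self._seen[message_id] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


def handle(event, dedup, applied):
    """Apply an event's side effect at most once per message_id."""
    if dedup.first_time(event["message_id"]):
        applied.append(event["data"])  # "data" is an assumed field name


dedup = Deduplicator(max_entries=2)
applied = []
for event in [
    {"message_id": "m1", "data": "hello"},
    {"message_id": "m1", "data": "hello"},  # duplicate from a transport retry
    {"message_id": "m2", "data": "world"},
]:
    handle(event, dedup, applied)
```

The memory of seen IDs is bounded on purpose. A duplicate that arrives after its ID has been evicted is applied again, so the bound should comfortably exceed the retry horizon of the transport, and writes that must never repeat still need an idempotency key enforced where the write lands.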

Recovery is a window

Replay buffers are bounded by size and TTL. They protect reconnects, not indefinite offline periods. Durable history and push notifications solve different product needs and should be configured separately.
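
To make the "window" concrete, here is a small sketch of a buffer bounded by both size and TTL. It assumes each message gets a monotonically increasing serial and that a reconnecting client reports the last serial it saw; both are illustrative assumptions, as are the ReplayBuffer name and its methods, and none of this describes Sockudo's actual recovery protocol.

```python
import time
from collections import deque


class ReplayBuffer:
    """Keeps recent messages, bounded by count and by age (hypothetical sketch)."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items = deque(maxlen=max_size)  # (serial, stored_at, payload)
        self._next_serial = 1

    def append(self, payload):
        serial = self._next_serial
        self._next_serial += 1
        # A full deque silently drops its oldest entry: the size bound.
        self._items.append((serial, self._clock(), payload))
        return serial

    def _expire(self):
        cutoff = self._clock() - self.ttl_seconds
        while self._items and self._items[0][1] < cutoff:
            self._items.popleft()  # the TTL bound

    def replay_after(self, last_serial):
        """Return payloads after last_serial, or None if the gap cannot be filled."""
        self._expire()
        newest = self._next_serial - 1
        if last_serial >= newest:
            return []  # client is already up to date
        oldest = self._items[0][0] if self._items else None
        if oldest is None or oldest > last_serial + 1:
            return None  # needed messages have left the window
        return [p for (s, _, p) in self._items if s > last_serial]


now = [0.0]  # fake clock so the example does not need to sleep
buf = ReplayBuffer(max_size=3, ttl_seconds=60, clock=lambda: now[0])
for text in ["a", "b", "c", "d"]:
    buf.append(text)  # serials 1..4; "a" is dropped by the size bound

short_gap = buf.replay_after(2)       # client missed "c" and "d"
too_far_behind = buf.replay_after(0)  # serial 1 is gone
now[0] = 120.0
expired = buf.replay_after(3)         # everything has aged out
```

The None result is the important part of the design: when the gap is larger than the window, the buffer says so explicitly instead of returning a partial replay, and the client can fall back to durable history or a full resync.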

Push changes the capacity model

Push does not scale like WebSocket fanout. Provider rate limits, platform payload sizes, queue depth, credentials, and delivery callbacks matter. A production cluster should alert on push admission, dispatch, provider failures, and status retention just like it alerts on WebSocket publish failures.
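
One way to picture the different capacity model is a bounded queue drained at a provider-imposed rate, with a counter for each stage worth alerting on. The sketch below is a simplified, single-process illustration: PushQueue, its parameters, the metric names, and fake_provider are all hypothetical, and real limits come from each push provider rather than from these numbers.

```python
from collections import Counter, deque


class PushQueue:
    """Bounded push queue drained by a token bucket (hypothetical sketch)."""

    def __init__(self, max_depth, rate_per_second, burst):
        self.max_depth = max_depth
        self.rate_per_second = rate_per_second
        self.burst = burst
        self.tokens = float(burst)
        self.pending = deque()
        self.metrics = Counter()  # counters an alerting system could scrape

    def admit(self, notification):
        """Accept a notification unless the queue is already at max depth."""
        if len(self.pending) >= self.max_depth:
            self.metrics["admission_rejected"] += 1
            return False
        self.pending.append(notification)
        self.metrics["admitted"] += 1
        return True

    def drain(self, elapsed_seconds, send):
        """Refill tokens for the elapsed time, then send while tokens remain."""
        refill = elapsed_seconds * self.rate_per_second
        self.tokens = min(float(self.burst), self.tokens + refill)
        while self.pending and self.tokens >= 1:
            self.tokens -= 1
            notification = self.pending.popleft()
            try:
                send(notification)
                self.metrics["dispatched"] += 1
            except Exception:
                self.metrics["provider_failed"] += 1


def fake_provider(notification):
    """Stand-in for a provider call; fails for one made-up device token."""
    if notification["token"] == "bad":
        raise RuntimeError("provider rejected the device token")


queue = PushQueue(max_depth=3, rate_per_second=1.0, burst=2)
results = [queue.admit({"token": t}) for t in ["ok-1", "bad", "ok-2", "ok-3"]]
queue.drain(elapsed_seconds=0.0, send=fake_provider)
```

Here the fourth notification is rejected at admission because the queue is full, the initial burst sends two notifications (one of which fails at the provider), and the third waits for the next refill. Those three outcomes map onto separate counters, which is why admission, dispatch, and provider failures each need their own alert.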