Token streaming and rollup
AI Transport append rollup settings for high-rate token streams.
Sockudo persists every versioned message.append operation exactly as received. Append rollup only changes WebSocket egress: the version store, durable history, recovery, and latest-message reads still observe the unrolled mutation log.
Enable the `ai-transport` Cargo feature and the runtime AI Transport configuration before using rollup:
```toml
[ai_transport]
enabled = true

[[ai_transport.channels]]
prefix = "private-ai-"

[ai_transport.rollup]
enabled = true
default_window_ms = 40
min_window_ms = 0
max_window_ms = 500
orphan_ttl_ms = 1000
wheel_tick_ms = 5
shards = 64
```

The WebSocket query parameter `append_rollup_window` is accepted only for Protocol V2 and must be one of the fixed values below. In the current server's v1 semantics, coalescing is server-wide, keyed per `(app_id, channel, message_serial)`, and uses the window set by `[ai_transport.rollup].default_window_ms`. The query parameter is validated for SDK compatibility but does not allocate per-subscriber rollup state.
| `append_rollup_window` (ms) | Behavior |
|---|---|
| 0 | Disable coalescing; every append is fanned out individually. |
| 20 | Short coalescing window for lower added latency. |
| 40 | Default; caps a steady stream near 25 deliveries per second. |
| 100 | Heavier coalescing for overloaded subscribers. |
| 500 | Maximum supported coalescing window. |
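The validation rule above can be sketched as a small client-side check. This is a minimal TypeScript sketch, assuming the caller already knows the negotiated protocol version as a number; `validateAppendRollupWindow` and `withAppendRollupWindow` are hypothetical helpers, not part of any Sockudo SDK, and the WebSocket URL shape is only an example.

```typescript
// Locked values for the append_rollup_window query parameter, in milliseconds.
const ALLOWED_APPEND_ROLLUP_WINDOWS_MS: readonly number[] = [0, 20, 40, 100, 500];

// Hypothetical helper: mirrors the documented rule that the parameter is
// accepted only on Protocol V2 and only with one of the locked values.
function validateAppendRollupWindow(protocolVersion: number, raw: string): number {
  if (protocolVersion !== 2) {
    throw new Error("append_rollup_window requires Protocol V2");
  }
  // Reject signs, decimals, and empty strings before numeric conversion.
  if (!/^\d+$/.test(raw)) {
    throw new Error(`append_rollup_window must be an integer, got "${raw}"`);
  }
  const windowMs = Number(raw);
  if (ALLOWED_APPEND_ROLLUP_WINDOWS_MS.indexOf(windowMs) === -1) {
    throw new Error(
      `append_rollup_window must be one of ${ALLOWED_APPEND_ROLLUP_WINDOWS_MS.join(", ")}`,
    );
  }
  return windowMs;
}

// Hypothetical helper: validates the window, then appends it to a WebSocket URL.
function withAppendRollupWindow(wsUrl: string, protocolVersion: number, windowMs: number): string {
  const checked = validateAppendRollupWindow(protocolVersion, String(windowMs));
  const separator = wsUrl.indexOf("?") === -1 ? "?" : "&";
  return `${wsUrl}${separator}append_rollup_window=${checked}`;
}
```

Because v1 coalesces server-wide at `default_window_ms`, a client that passes a valid value still gets the server's configured window; a check like this only keeps the client inside the accepted set.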
Within a stream, the first append is delivered immediately. Later appends that arrive inside the fixed window are held, and only the latest held append is delivered when the window flushes (the latest append wins on egress). A terminal append, meaning one whose `extras.ai.transport.status` is `complete` or `cancelled`, flushes any pending append state before the terminal operation is delivered; `message.update` and `message.delete` do the same.
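The rules above can be modelled as a small per-stream state machine. This is an illustrative TypeScript sketch, not Sockudo's Rust implementation: `AppendCoalescer`, its methods, and the choice to emit a held append exactly at the window boundary are assumptions. It also follows the "latest append wins" wording literally by treating each append as an opaque value; it does not model how append payload data is merged.

```typescript
type DeliveryReason = "immediate" | "window" | "flush" | "terminal";

interface Delivery<T> {
  atMs: number;
  payload: T;
  reason: DeliveryReason;
}

// Hypothetical per-stream coalescer; not Sockudo's actual implementation.
class AppendCoalescer<T> {
  private readonly windowMs: number;
  private windowEndMs: number | null = null; // end of the open fixed window; null when idle
  private pending: { payload: T } | null = null; // latest held append ("latest wins")
  readonly deliveries: Delivery<T>[] = [];

  constructor(windowMs: number) {
    if (windowMs < 0) throw new Error("windowMs must be >= 0");
    this.windowMs = windowMs;
  }

  // Ingress path: one call per received append.
  append(payload: T, nowMs: number): void {
    this.advance(nowMs);
    if (this.windowEndMs === null) {
      // No open window: deliver now and, unless coalescing is disabled, open one.
      this.emit(nowMs, payload, "immediate");
      if (this.windowMs > 0) this.windowEndMs = nowMs + this.windowMs;
      return;
    }
    this.pending = { payload }; // overwrite any earlier held append
  }

  // Timer path (e.g. a timer-wheel tick): close every window that ended by nowMs.
  advance(nowMs: number): void {
    while (this.windowEndMs !== null && nowMs >= this.windowEndMs) {
      const endMs = this.windowEndMs;
      if (this.pending !== null) {
        this.emit(endMs, this.pending.payload, "window");
        this.pending = null;
        this.windowEndMs = endMs + this.windowMs; // next fixed window starts where this one ended
      } else {
        this.windowEndMs = null; // nothing held: go idle
      }
    }
  }

  // Terminal append (status complete/cancelled), message.update, or message.delete.
  terminal(payload: T, nowMs: number): void {
    this.advance(nowMs);
    if (this.pending !== null) {
      this.emit(nowMs, this.pending.payload, "flush"); // pending state goes out first
      this.pending = null;
    }
    this.emit(nowMs, payload, "terminal");
    this.windowEndMs = null;
  }

  private emit(atMs: number, payload: T, reason: DeliveryReason): void {
    this.deliveries.push({ atMs, payload, reason });
  }
}
```

With one append every 5 ms and a 40 ms window, this model produces one delivery per window, which is where the table's figure of roughly 25 deliveries per second comes from.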
Prometheus exposes low-cardinality rollup metrics per app:
- `sockudo_appends_received_total`
- `sockudo_appends_delivered_total`
- `sockudo_rollup_ratio`
- `sockudo_active_streams`
- `sockudo_flush_latency`
Billing, rate limits, durable history, version storage, webhooks, and push accounting all count the original create/update/delete/append requests. Rollup metrics record both received appends and coalesced egress deliveries, so operators can see the reduction ratio without ingress load being hidden.
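Because both the received and delivered counters are exported, a reduction ratio can also be derived from two scrapes, for example to cross-check `sockudo_rollup_ratio` on a dashboard. This sketch assumes the ratio is oriented as received ÷ delivered and that both counters are counted at the same granularity; neither is stated above, so confirm them against the exported gauge. The `AppendCounters` shape and `rollupRatio` helper are hypothetical.

```typescript
// One scrape of the two per-app counters (hypothetical shape).
interface AppendCounters {
  appendsReceivedTotal: number; // sockudo_appends_received_total
  appendsDeliveredTotal: number; // sockudo_appends_delivered_total
}

// Received appends per delivered append over the interval between two scrapes.
// Returns null when nothing was delivered in the interval.
function rollupRatio(before: AppendCounters, after: AppendCounters): number | null {
  const received = after.appendsReceivedTotal - before.appendsReceivedTotal;
  const delivered = after.appendsDeliveredTotal - before.appendsDeliveredTotal;
  if (received < 0 || delivered < 0) {
    // Counters only go down when the process restarts; a new baseline is needed.
    throw new Error("counter reset detected; take a fresh baseline");
  }
  return delivered === 0 ? null : received / delivered;
}
```

Working from deltas rather than raw totals keeps the ratio tied to the current interval instead of averaging over the whole process lifetime.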
Important edge cases:
- A terminal append flushes pending state before delivering the terminal operation.
- `message.update` and `message.delete` flush pending append state first.
- A late subscriber reconstructs state from history/version storage, not from rollup buffers.
- A node that sees a stream go stale for `orphan_ttl_ms` claims it through the shared cache and appends a normal cancellation update. Before mutating, it re-reads the latest version so that it does not cancel a stream that has advanced on another node.
- `append_rollup_window=0` disables egress coalescing but does not change persistence or rate-limit behavior.
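The orphan-handling rule can be sketched with an in-memory map standing in for the shared cache. The record shape, `InMemorySharedCache`, `tryCancelOrphan`, and folding the claim and the cancellation into a single compare-and-set are simplifying assumptions; the text above only states that the node claims the stream through the shared cache, re-reads the latest version, and appends a normal cancellation update.

```typescript
type StreamStatus = "streaming" | "complete" | "cancelled";

interface StreamRecord {
  version: number; // latest persisted version of the message
  lastActivityMs: number; // time of the last mutation seen for the stream
  status: StreamStatus;
  owner: string; // node currently responsible for the stream
}

// Hypothetical stand-in for the shared cache: an in-memory map with compare-and-set.
class InMemorySharedCache {
  private records: { [key: string]: StreamRecord } = {};

  get(key: string): StreamRecord | undefined {
    const record = this.records[key];
    return record === undefined ? undefined : { ...record };
  }

  put(key: string, record: StreamRecord): void {
    this.records[key] = { ...record };
  }

  // Writes only if the stored version still equals expectedVersion.
  compareAndSet(key: string, expectedVersion: number, next: StreamRecord): boolean {
    const current = this.records[key];
    if (current === undefined || current.version !== expectedVersion) return false;
    this.records[key] = { ...next };
    return true;
  }
}

type OrphanOutcome = "not-stale" | "superseded" | "cancelled";

// observed: this node's last local view of the stream.
function tryCancelOrphan(
  cache: InMemorySharedCache,
  key: string,
  observed: StreamRecord,
  nodeId: string,
  nowMs: number,
  orphanTtlMs: number,
): OrphanOutcome {
  if (observed.status !== "streaming" || nowMs - observed.lastActivityMs < orphanTtlMs) {
    return "not-stale";
  }
  // Re-read the latest version before mutating: another node may have advanced the stream.
  const latest = cache.get(key);
  if (latest === undefined || latest.version !== observed.version || latest.status !== "streaming") {
    return "superseded";
  }
  // Claim and cancel in one conditional write, recorded as the next normal version.
  const claimed = cache.compareAndSet(key, latest.version, {
    version: latest.version + 1,
    lastActivityMs: nowMs,
    status: "cancelled",
    owner: nodeId,
  });
  return claimed ? "cancelled" : "superseded";
}
```

The compare-and-set closes the gap between the re-read and the write: if another node appends in between, the version check fails and the stream is left alone.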
Run scripts/ai-rollup-load-test.mjs against a local server to exercise a synthetic 200 tok/s stream and inspect delivery rate, final content, and append mutation latency.
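To have expected numbers to compare against the load test, the steady-state delivery rate can be estimated from the window alone. This sketch assumes one delivery per fixed window, as implied by the table's 40 ms / 25 deliveries per second row, and ignores the first immediate delivery and terminal flushes. Since v1 coalesces at `default_window_ms`, that is the window to plug in; `expectedDeliveriesPerSecond` is a hypothetical helper.

```typescript
// Steady-state estimate: a stream faster than one append per window is capped
// at 1000 / windowMs deliveries per second; slower streams pass through unchanged.
function expectedDeliveriesPerSecond(tokensPerSecond: number, windowMs: number): number {
  if (windowMs === 0) return tokensPerSecond; // coalescing disabled
  return Math.min(tokensPerSecond, 1000 / windowMs);
}

// The load test's synthetic 200 tok/s stream at each locked window value.
for (const windowMs of [0, 20, 40, 100, 500]) {
  const rate = expectedDeliveriesPerSecond(200, windowMs);
  console.log(`${windowMs} ms window: ~${rate} deliveries/s, ratio ~${200 / rate}`);
}
```

Under these assumptions the 40 ms default should show roughly 25 deliveries per second, an 8:1 reduction; a measured rate well above that suggests coalescing is not in effect.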