Token streaming and rollup

Sockudo persists every versioned message.append operation exactly as received. Append rollup only changes WebSocket egress: the version store, durable history, recovery, and latest-message reads still observe the unrolled mutation log.

Enable the ai-transport Cargo feature and the runtime AI Transport config before using rollup:

```toml
[ai_transport]
enabled = true

[[ai_transport.channels]]
prefix = "private-ai-"

[ai_transport.rollup]
enabled = true
default_window_ms = 40
min_window_ms = 0
max_window_ms = 500
orphan_ttl_ms = 1000
wheel_tick_ms = 5
shards = 64
```

The WebSocket query parameter append_rollup_window is accepted only for Protocol V2 and must be one of the locked values below. Under the current server's v1 semantics, coalescing is server-wide: appends are coalesced per (app_id, channel, message_serial) using [ai_transport.rollup].default_window_ms. The query parameter is validated for SDK compatibility but does not allocate per-subscriber rollup state.

| `append_rollup_window` | Behavior |
| --- | --- |
| `0` | Disable coalescing; every append fan-out counts individually. |
| `20` | Short coalescing window for lower added latency. |
| `40` | Default; caps a steady stream near 25 deliveries per second. |
| `100` | Heavier coalescing for overloaded subscribers. |
| `500` | Maximum supported coalescing window. |
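
As a rough illustration of the locked values, the sketch below validates a window before adding it to a connection URL and derives the approximate delivery ceiling of 1000 / window per second for a steady stream. Only the parameter name append_rollup_window and the five allowed values come from this page; the helper names, the example host, and the URL path are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The five locked values listed in the table, in milliseconds.
LOCKED_WINDOWS_MS = (0, 20, 40, 100, 500)


def with_rollup_window(ws_url: str, window_ms: int) -> str:
    """Return ws_url with append_rollup_window set (hypothetical helper)."""
    if window_ms not in LOCKED_WINDOWS_MS:
        raise ValueError(
            f"append_rollup_window must be one of {LOCKED_WINDOWS_MS}, got {window_ms}"
        )
    parts = urlsplit(ws_url)
    # Keep existing query parameters, replacing any earlier window value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "append_rollup_window"
    ]
    query.append(("append_rollup_window", str(window_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def max_deliveries_per_second(window_ms: int) -> Optional[float]:
    """Approximate ceiling for a steady stream; None means no coalescing."""
    if window_ms == 0:
        return None
    return 1000 / window_ms


if __name__ == "__main__":
    # Placeholder URL; substitute your own host, path, and app key.
    print(with_rollup_window("wss://example.test/app/demo-key?client=demo", 40))
    print(max_deliveries_per_second(40))
```

Rejecting an unlisted value on the client side is a design choice for this sketch: it surfaces a typo before the connection attempt rather than relying on the server's validation.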

For a stream, the first append is delivered immediately. Later appends that arrive inside the fixed window are held, and the latest append wins on egress. A terminal append (one whose extras.ai.transport.status is complete or cancelled), a message.update, or a message.delete flushes any pending append state before the operation itself is delivered.
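
The coalescing rules above can be sketched as a small offline simulation. This is not Sockudo's implementation: it assumes that each delivery (immediate or flushed) opens a fresh fixed window, that a window with nothing pending simply closes, and that event times are given in milliseconds. The names Event and simulate_rollup are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    t_ms: int
    payload: str
    # True for a complete/cancelled append, a message.update, or a message.delete.
    terminal: bool = False


def simulate_rollup(events: List[Event], window_ms: int) -> List[Tuple[int, str]]:
    """Return (delivery_time_ms, payload) pairs for one stream, sorted by time."""
    deliveries: List[Tuple[int, str]] = []
    window_end: Optional[int] = None  # None means no window is open
    pending: Optional[str] = None     # latest held append, if any

    def expire(now_ms: int) -> None:
        nonlocal window_end, pending
        # Close every window that ended at or before now_ms.
        while window_end is not None and window_end <= now_ms:
            if pending is None:
                window_end = None  # nothing held: stream goes idle
            else:
                deliveries.append((window_end, pending))  # latest append wins
                pending = None
                window_end += window_ms  # assumed: a flush opens the next window

    for ev in events:
        expire(ev.t_ms)
        if ev.terminal:
            # Flush held state first, then deliver the terminal operation.
            if pending is not None:
                deliveries.append((ev.t_ms, pending))
                pending = None
            deliveries.append((ev.t_ms, ev.payload))
            window_end = None
        elif window_end is None:
            # First append of a window is delivered immediately.
            deliveries.append((ev.t_ms, ev.payload))
            window_end = ev.t_ms + window_ms
        else:
            pending = ev.payload  # held; overwrites any earlier held append

    # Input exhausted: the open window still flushes what it holds.
    if pending is not None and window_end is not None:
        deliveries.append((window_end, pending))
    return deliveries


if __name__ == "__main__":
    # One append every 5 ms for one second (200 appends), 40 ms window.
    stream = [Event(t, str(t)) for t in range(0, 1000, 5)]
    out = simulate_rollup(stream, 40)
    print(len(out), out[:3], out[-1])
```

Under these assumptions a steady stream is delivered roughly once per window, and the final held append is never lost because the last window still flushes.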

Prometheus exposes low-cardinality rollup metrics per app:

sockudo_appends_received_total
sockudo_appends_delivered_total
sockudo_rollup_ratio
sockudo_active_streams
sockudo_flush_latency

Billing, rate limits, durable history, version storage, webhooks, and push accounting count original create/update/delete/append requests. Rollup metrics count both original append receipt and coalesced egress delivery, so operators can see the reduction ratio without hiding ingress load.
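
To make the relationship between the two counters concrete, the sketch below derives a reduction figure from sockudo_appends_received_total and sockudo_appends_delivered_total. The exact formula behind sockudo_rollup_ratio is not stated here, so the delivered / received reading is an assumption, and the function and key names are hypothetical.

```python
from typing import Dict, Optional


def rollup_summary(appends_received: int, appends_delivered: int) -> Dict[str, Optional[float]]:
    """Summarise ingress versus coalesced egress for one app (illustrative only)."""
    if appends_received < 0 or appends_delivered < 0:
        raise ValueError("counters cannot be negative")
    if appends_received == 0:
        # No ingress yet: a ratio would be undefined.
        return {"counted_for_billing": 0, "delivered_fraction": None, "egress_reduction": None}
    fraction = appends_delivered / appends_received
    return {
        # Billing and rate limits follow the original requests, not egress.
        "counted_for_billing": appends_received,
        "delivered_fraction": fraction,
        "egress_reduction": 1 - fraction,
    }


if __name__ == "__main__":
    # Illustrative figures, not measurements.
    print(rollup_summary(200, 26))
```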

Important edge cases:

A terminal append flushes pending state before delivering the terminal operation.
message.update and message.delete flush pending append state first.
A late subscriber reconstructs state from history/version storage, not from rollup buffers.
A node that sees a stale stream after orphan_ttl_ms claims it through the shared cache and appends a normal cancellation update; it re-reads the latest version before mutating, so a stream that advanced on another node is not cancelled.
append_rollup_window=0 disables egress coalescing but does not change persistence or rate-limit behavior.
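
The orphan-handling edge case follows a claim, re-read, then mutate pattern, sketched below with plain dictionaries standing in for the shared cache and the version store. Everything here (StreamVersion, try_cancel_orphan, the field names, releasing the claim on abort) is a hypothetical illustration of that pattern, not Sockudo's actual data model or API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StreamVersion:
    serial: int           # latest version number in the (stand-in) version store
    last_append_ms: int   # time of the most recent append
    status: str = "streaming"


def try_cancel_orphan(
    key: str,
    seen_serial: int,
    now_ms: int,
    versions: Dict[str, StreamVersion],
    claims: Dict[str, str],
    node_id: str,
    orphan_ttl_ms: int = 1000,
) -> bool:
    """Cancel a stale stream only if this node wins the claim and it has not advanced."""
    # Step 1: claim through the shared cache (set-if-absent stand-in).
    if claims.setdefault(key, node_id) != node_id:
        return False  # another node already owns the cleanup

    # Step 2: re-read the latest version before mutating anything.
    latest = versions[key]
    advanced = latest.serial != seen_serial
    still_fresh = now_ms - latest.last_append_ms < orphan_ttl_ms
    if advanced or still_fresh or latest.status != "streaming":
        del claims[key]  # assumed: release the claim when aborting
        return False

    # Step 3: record a normal cancellation as the next version.
    versions[key] = StreamVersion(latest.serial + 1, now_ms, "cancelled")
    return True


if __name__ == "__main__":
    store = {"app:private-ai-chat:msg1": StreamVersion(serial=7, last_append_ms=0)}
    print(try_cancel_orphan("app:private-ai-chat:msg1", 7, 1500, store, {}, "node-a"))
    print(store)
```

The re-read in step 2 is what prevents a cancellation when another node appended after this node last observed the stream.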

Run scripts/ai-rollup-load-test.mjs against a local server to exercise a synthetic 200 tok/s stream and inspect delivery rate, final content, and append mutation latency.