nanoclaw/docs/v1-vs-v2/router.md
gavrielc 47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:00:04 +03:00


# router: v1 vs v2
## Scope
- v1 (distributed across several files): `src/v1/index.ts` (startMessageLoop, trigger check), `group-queue.ts` (concurrency, retry), `router.ts` (outbound formatting, 44 LOC), `sender-allowlist.ts` (drop/allow)
- v2: `src/router.ts` (317 LOC), `src/session-manager.ts` (346 LOC), `src/container-runner.ts`, `src/access.ts`, `src/db/messaging-groups.ts` (trigger_rules schema)
## Routing-flow diff
### v1 (polling, per-group)
1. Channel receives message → `onMessage` → store in DB
2. Sender allowlist drop-mode filter → discard denied
3. `startMessageLoop` polls every POLL_INTERVAL
4. For each group: lookup channel (`findChannel` O(n)), check trigger requirement, load allowlist, scan for pattern, skip if no trigger
5. Pull messages since `lastAgentTimestamp`, XML-format with tz context
6. If active container: write JSON to IPC file; else `enqueueMessageCheck(groupJid)` → GroupQueue
7. Retry on failure (up to 5, exp. backoff); rollback cursor on agent error
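
The retry behavior in step 7 can be sketched as follows. This is a minimal illustration, not the v1 code: `MAX_RETRIES = 5` comes from this analysis, while `BASE_DELAY_MS`, `retryDelayMs`, `runWithRetry`, and the injected `rollbackCursor` callback are hypothetical names and values chosen for the example.

```typescript
// MAX_RETRIES = 5 is taken from the analysis; BASE_DELAY_MS is an assumed
// value for illustration only.
const MAX_RETRIES = 5;
const BASE_DELAY_MS = 5_000;

/** Delay before retry number `attempt` (1-based): base * 2^(attempt - 1). */
function retryDelayMs(attempt: number): number {
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

/**
 * Run `processGroup` once, then retry up to MAX_RETRIES times.
 * After every failure the cursor is rolled back so the same messages
 * are picked up again on the next attempt.
 */
async function runWithRetry(
  processGroup: () => Promise<void>,
  rollbackCursor: () => void,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    try {
      await processGroup();
      return true;
    } catch {
      rollbackCursor();
      if (attempt === MAX_RETRIES) break; // retries exhausted
      await sleep(retryDelayMs(attempt + 1));
    }
  }
  return false;
}
```

With these assumed values the waits grow 5 s, 10 s, 20 s, 40 s, 80 s before the host gives up.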
### v2 (event-driven, entity model)
1. Channel adapter → `routeInbound(platformId, threadId, message)`
2. Apply thread policy (if the adapter lacks `supportsThreads`, collapse `threadId` to null)
3. Resolve `messaging_group` (lookup or auto-create)
4. Extract sender → upsert `users` row → `userId` (namespaced `channel_type:handle`)
5. Lookup wired agent groups via `messaging_group_agents`; drop if none
6. `pickAgent` (highest priority; **trigger_rules matching is TODO**)
7. `enforceAccess`: owner/admin/member gate; `unknown_sender_policy: strict | request_approval | public`
8. `resolveSession` by `session_mode` (`agent-shared`/`shared`/`per-thread`)
9. `insertMessage` to session `inbound.db`, write session_routing + destinations
10. `startTypingRefresh`; `wakeContainer(session)` (dedup by `activeContainers` + `wakePromises`)
11. Container polls inbound.db, writes outbound.db; host `delivery.ts` polls and sends via adapter; `stopTypingRefresh` on container exit
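
The wake deduplication in step 10 might look roughly like the sketch below. The names `activeContainers` and `wakePromises` mirror this analysis, but their shapes, the `spawnContainer` placeholder, and the `spawnCount` counter are assumptions made for the example, not the contents of `container-runner.ts`.

```typescript
type SessionId = string;

// Hypothetical stand-ins for the host's bookkeeping.
const activeContainers = new Set<SessionId>();
const wakePromises = new Map<SessionId, Promise<void>>();

let spawnCount = 0; // instrumentation for this example only

/** Placeholder for the real container start-up (assumed to be async). */
async function spawnContainer(sessionId: SessionId): Promise<void> {
  spawnCount++;
  await Promise.resolve(); // stands in for the actual start-up work
  activeContainers.add(sessionId);
}

/**
 * Wake the container for a session at most once:
 * - already running -> nothing to do
 * - wake in flight  -> share the pending promise
 * - otherwise       -> start one and track the promise until it settles
 */
function wakeContainer(sessionId: SessionId): Promise<void> {
  if (activeContainers.has(sessionId)) return Promise.resolve();

  const pending = wakePromises.get(sessionId);
  if (pending) return pending;

  const wake = spawnContainer(sessionId).finally(() => {
    wakePromises.delete(sessionId);
  });
  wakePromises.set(sessionId, wake);
  return wake;
}
```

Two messages arriving back to back for the same session therefore share one start-up instead of racing to spawn two containers.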
## Capability map
| v1 behavior | v2 location | Status | Notes |
|---|---|---|---|
| Sender allowlist drop/allow modes | — | **removed** | Replaced by access gate + `unknown_sender_policy` |
| Group registration auto-creating folder on first message | `router.ts` auto-creates messaging_group; group folder via `group-init.ts` on wake | moved | Admin skill path for agent groups |
| Trigger pattern matching (`requiresTrigger`, `DEFAULT_TRIGGER`) | `messaging_group_agents.trigger_rules` JSON | **deferred** | Schema ready; `pickAgent` has TODO comment |
| `lastAgentTimestamp` cursor tracking | — | **removed** | All messages written immediately to inbound.db |
| IPC file polling (`inputDir`, `_close` sentinel) | — | **removed** | DB polling replaces |
| GroupQueue concurrency + waiting-groups | `container-runner.ts:42-82` `activeContainers` + `wakePromises` | reimplemented | Per-session not per-group |
| Task scheduler → enqueue to GroupQueue | host-sweep due-wake + delivery system-actions | preserved | |
| Session reuse rules (session mode) | `session-manager.ts` (agent-shared/shared/per-thread) | **enhanced** | Explicit per-wiring |
| Remote control command interception | — | **removed** | |
| Idle timeout + stdin close | `container-runner.ts:135-140` `resetIdle` | kept | Heartbeat instead of stdin |
| Host-level retry on agent error | — | **removed** | Container is authority; host sweep retries stale only |
| Typing indicator | `delivery.ts:startTypingRefresh` | kept | Gated on heartbeat |
## Missing from v2
1. **Trigger-rule matching** (`router.ts:198` TODO). `pickAgent` currently ignores `trigger_rules`, so every wired agent is eligible for every message and only priority breaks ties. **Without this, multi-agent wirings don't work as intended.**
2. **Sender drop mode** — v1's silent-drop for noisy users is gone. v2 only has binary allow/deny.
3. **Cursor / state recovery** — v2 writes immediately to DB. If container crashes mid-output, no host-level dedup guarantees (beyond `messages_in.id` PK)
4. **Remote control** — v1 intercepted `/remote-control` commands pre-storage; no v2 equivalent
5. **Host-level retry with backoff on agent error** — v1 had MAX_RETRIES=5 + exp. backoff on `processGroupMessages`; v2 only retries on stale heartbeat detection
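
As an illustration of item 1, trigger-aware agent selection could be sketched as below. The analysis only says that `trigger_rules` is a JSON column; the `TriggerRules` shape (a single `pattern` field), the rule that missing rules match everything, and the `AgentWiring` type are all assumptions for this example.

```typescript
// Assumed JSON shape of messaging_group_agents.trigger_rules.
interface TriggerRules {
  pattern?: string; // regular-expression source, matched case-insensitively
}

// Hypothetical in-memory view of one messaging_group_agents row.
interface AgentWiring {
  agentGroupId: string;
  priority: number;
  triggerRules: string | null; // raw JSON as stored
}

function matchesTrigger(rawRules: string | null, text: string): boolean {
  if (rawRules === null) return true; // no rules: agent accepts every message
  const rules = JSON.parse(rawRules) as TriggerRules;
  if (!rules.pattern) return true;
  return new RegExp(rules.pattern, "i").test(text);
}

/** Highest-priority wiring whose trigger rules match, or undefined if none do. */
function pickAgent(wirings: AgentWiring[], text: string): AgentWiring | undefined {
  return wirings
    .filter((w) => matchesTrigger(w.triggerRules, text)) // filter copies, so sort is safe
    .sort((a, b) => b.priority - a.priority)[0];
}
```

Under these assumptions a high-priority agent wired with `{"pattern": "^@ops\\b"}` only takes messages that start with `@ops`, and everything else falls through to a lower-priority agent with no rules.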
## Behavioral discrepancies
1. **Trigger evaluation**: v1 eager (skip group until trigger arrives, accumulate context); v2 TODO — once implemented, likely drops non-trigger messages at ingest (semantic change)
2. **Session reuse**: v1 single session per group; v2 multiple (one per thread on threaded platforms)
3. **Access control timing**: v1 pre-storage (cheap drop); v2 post-sender-resolution (requires `users` upsert)
4. **Unknown channels**: v1 silently ignored; v2 auto-creates `messaging_groups` row — no data loss but orphaned rows possible
5. **Formatting**: v1 host formats with tz + cursor-based message subset; v2 pushes raw JSON to inbound.db, container formats from full session history
## Worth preserving?
1. **Trigger rule matching (HIGH priority)** — schema is ready; 10-line implementation in `pickAgent`. Currently broken-by-default for multi-agent wirings
2. **Sender drop mode (MEDIUM)** — add `(agent_group_id, sender_pattern)` drop table; orthogonal to privilege
3. **State recovery (LOW)** — add unique constraint on `messages_in.id` if not already; v2's model is simpler + more robust
4. **Host-level retry on agent error (MEDIUM)** — currently only stale containers retry. Explicit container-exit-error retry could be valuable
5. **Remote control** — decide: restore as opt-in skill or document deletion
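
The drop table proposed in item 2 could be checked along these lines. This is a sketch under stated assumptions: the `(agent_group_id, sender_pattern)` pair comes from the proposal above, but treating `sender_pattern` as a glob over the namespaced `channel_type:handle` user id, and the helper names `globToRegExp` and `shouldDrop`, are hypothetical.

```typescript
// Hypothetical row of the proposed drop table.
interface SenderDropRule {
  agentGroupId: string;
  senderPattern: string; // assumed glob: `*` matches any run of characters
}

/** Convert a `*`-only glob into an anchored regular expression. */
function globToRegExp(glob: string): RegExp {
  // Escape every regex metacharacter except `*`, then expand `*` to `.*`.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

/** True when a message from `userId` should be silently dropped for this agent group. */
function shouldDrop(
  rules: SenderDropRule[],
  agentGroupId: string,
  userId: string,
): boolean {
  return rules.some(
    (rule) =>
      rule.agentGroupId === agentGroupId &&
      globToRegExp(rule.senderPattern).test(userId),
  );
}
```

Because the check only looks at the agent group and the sender id, it stays orthogonal to the owner/admin/member gate: a sender can pass `enforceAccess` and still be muted for one agent group.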