
gavrielc 47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool

- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 01:00:04 +03:00

5.3 KiB


container index (agent-runner entry): v1 vs v2

Scope

- v1: container/agent-runner/src/v1/index.ts (736 LOC), monolithic: arg parsing, IPC polling, SDK integration, output marshaling
- v2 (split): container/agent-runner/src/index.ts (124 LOC) + poll-loop.ts (436 LOC) + destinations.ts (118 LOC) + formatter.ts (228 LOC) + db/*.ts + providers/*.ts

Startup sequence diff

| Step | v1 (IPC) | v2 (SQLite poll) |
| --- | --- | --- |
| Arg parsing | stdin JSON via `readStdin()` (v1:105-115) | env vars: `AGENT_PROVIDER`, `NANOCLAW_*` (v2 index.ts:44-51) |
| Env setup | `sdkEnv` + `CLAUDE_CODE_AUTO_COMPACT_WINDOW` (v1:626-629) | same, delegated to provider (index.ts:109) |
| DB open | — (IPC files only) | inbound.db (RO) + outbound.db (RW) + `session_state` table |
| MCP server config | hardcoded nanoclaw server (v1:477-486) | same + `NANOCLAW_MCP_SERVERS` env for additional (index.ts:94-104) |
| Message loop | `waitForIpcMessage()` polling (v1:350-366) | `poll-loop.ts:62+` `getPendingMessages()` every 1000ms idle / 500ms active |
| Provider | Claude SDK direct | provider abstraction factory (`providers/factory.ts`, supports claude/mock/custom) |
| Message stream | `MessageStream` iterable (v1:71-103) | same pattern in `providers/claude.ts:51-80` |
| System prompt | manual CLAUDE.md load + hardcoded destinations (v1:416-420) | `buildSystemPromptAddendum()` from inbound.db destinations (`destinations.ts:76-117`) |
| Query execution | `runQuery()` with IPC polling during query (v1:374-545) | `processQuery()` polls messages_in + `provider.query()` (`poll-loop.ts:259-319`) |
| Session resumption | sessionId on stdin + `resumeAt` tracking | `getStoredSessionId()` from outbound.db; cleared on `/clear` admin command |
| Shutdown | stdout output markers + exit(1) on error | no markers; logs errors; host manages lifecycle |
| Heartbeat | — | file touch at `SESSION_HEARTBEAT_PATH` on each result |
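
The dual-rate message loop in the table can be illustrated with a minimal sketch. Only the 1000ms/500ms cadence comes from this document; `nextPollDelayMs`, `pollLoop`, and the callback parameters are hypothetical names chosen for illustration and are not taken from poll-loop.ts.

```typescript
// Sketch only: the two constants mirror the cadence quoted in the table.
// Every function and parameter name below is hypothetical.
const IDLE_POLL_MS = 1000;
const ACTIVE_POLL_MS = 500;

/** Pick the delay before the next poll, based on whether a query is in flight. */
function nextPollDelayMs(queryActive: boolean): number {
  return queryActive ? ACTIVE_POLL_MS : IDLE_POLL_MS;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Poll until `shouldStop` returns true.
 * `fetchPending` stands in for a DB read such as getPendingMessages();
 * `delay` is injectable so the loop can be exercised without real waiting.
 */
async function pollLoop<T>(
  fetchPending: () => T[],
  handle: (batch: T[]) => Promise<void>,
  isQueryActive: () => boolean,
  shouldStop: () => boolean,
  delay: (ms: number) => Promise<void> = sleep,
): Promise<void> {
  while (!shouldStop()) {
    const batch = fetchPending();
    if (batch.length > 0) {
      await handle(batch);
    }
    await delay(nextPollDelayMs(isQueryActive()));
  }
}
```

Keeping the delay choice in a pure function makes the cadence easy to test on its own, separate from the loop.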

Capability map

| v1 behavior | v2 location | Status | Notes |
| --- | --- | --- | --- |
| Parse prompt/session/group/chat/etc. from stdin | env + inbound.db | kept | |
| Env injection (ANTHROPIC_BASE_URL, proxy) | passed to provider.query() (index.ts:109) | kept | |
| Stdin JSON parsing | — | removed | |
| IPC file polling | `messages_in` table | modernized | Same semantics, DB-backed |
| IPC `_close` sentinel | implicit (process killed by host) | simplified | |
| Output wrapping markers | writes to `messages_out` | removed | |
| Session archiving PreCompact hook | `providers/claude.ts` hook | kept | |
| Session resumption by ID | `getStoredSessionId()` (poll-loop.ts:51) | persisted | Survives container restart |
| Scheduled task script execution | `task-script.ts:applyPreTaskScripts()` (poll-loop.ts:159) | kept | |
| Command filtering (`/help`, `/login`) | `categorizeMessage()` + filtered set (formatter.ts:14, poll-loop.ts:95-100) | enhanced | Explicit categories |
| Admin commands (`/clear`, etc.) | `categorizeMessage` + `NANOCLAW_ADMIN_USER_IDS` gate (poll-loop.ts:102-131) | kept | Explicit admin role from env |
| Destination routing `to=` | `destinations` table + `dispatchResultText()` (poll-loop.ts:350-432) | modernized | Named destinations instead of raw JIDs |
| Multi-destination message blocks | `MESSAGE_RE` regex (poll-loop.ts:350-414) | kept | |
| Tool allowlist | `providers/claude.ts:19-39` | kept | |
| MCP server setup | index.ts:81-104 | kept + extensible | |
| `@-syntax` additional dirs | `/workspace/extra/*` discovered at startup (index.ts:64-74) | kept | |
| Global CLAUDE.md | SDK preset append (index.ts:56-58) | kept | |
| Idle stream termination | — | new | `IDLE_END_MS` = 20s prevents zombies |
| Admin user ID prefixing (chat-sdk) | explicit `channel_type:` prefix (formatter.ts:58-66) | new | |
| Processing ACK | | new | Prevents re-processing on container restart |
| Message kind formatting | `formatMessages()` (formatter.ts) | enhanced | Routes by kind: chat/task/webhook/system |
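
The command filtering and admin gating rows can be sketched roughly as follows. This is an illustration under assumptions, not the code in formatter.ts or poll-loop.ts: the category names, the `decide` helper, the comma-separated format for `NANOCLAW_ADMIN_USER_IDS`, and the choice to drop filtered commands are all hypothetical. Only the command names (`/help`, `/login`, `/clear`) and the `channel_type:` ID prefix come from the table above.

```typescript
// Hypothetical categories; the real categorizeMessage() return type is not shown here.
type Category = "filtered" | "admin" | "passthrough";

const FILTERED_COMMANDS = new Set<string>(["/help", "/login"]);
const ADMIN_COMMANDS = new Set<string>(["/clear"]);

/** Classify a message by its leading slash command, if any. */
function categorizeSketch(text: string): Category {
  const command = text.trim().split(/\s+/)[0] ?? "";
  if (FILTERED_COMMANDS.has(command)) return "filtered";
  if (ADMIN_COMMANDS.has(command)) return "admin";
  return "passthrough";
}

/** Assumption: the env var holds comma-separated, channel-prefixed user IDs. */
function parseAdminIds(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
  );
}

type Decision = "drop" | "run-admin" | "reject" | "forward";

/** Gate admin commands on the sender's ID; everything else is dropped or forwarded. */
function decide(text: string, senderId: string, adminIds: Set<string>): Decision {
  switch (categorizeSketch(text)) {
    case "filtered":
      return "drop";
    case "admin":
      return adminIds.has(senderId) ? "run-admin" : "reject";
    default:
      return "forward";
  }
}
```

With the made-up IDs `slack:U1,slack:U2` as the admin list, `decide("/clear", "slack:U1", ...)` yields `"run-admin"`, while the same command from `slack:U9` yields `"reject"`.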

Missing from v2

None of v1's core capabilities were dropped. Notes on format/protocol shifts:

- Stdout markers removed: the host now reads the messages_out table instead of parsing stdout.
- Stdin protocol gone: follow-up messages arrive via the messages_in table.
- Script-phase fast exit removed: v1 could skip the container entirely if wakeAgent=false; v2 gates message processing, but the container keeps polling (slightly more idle cost).

Behavioral discrepancies

- Idle timeout: v1 had no query-level timeout, so zombie queries were possible. v2 ends the stream after 20s with no SDK events.
- Resume: v1 re-read sessionId from stdin on each run; v2 persists it in session_state across restarts.
- Admin gating: v1 passed everything through; v2 categorizes messages and admin-gates /clear etc.
- Destination naming: v1 used raw JIDs; v2 uses human-readable names from the destinations table.
- Poll cadence: v2 is dual-rate, 1000ms when idle and 500ms when active (CPU efficiency + responsiveness).
- Message kind routing: v1 treated all messages uniformly; v2 distinguishes chat/chat-sdk/task/webhook/system with per-kind formatting.
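
The idle-timeout behavior can be illustrated with a small wrapper around an async event stream. This is a sketch under assumptions, not the logic in poll-loop.ts or providers/claude.ts: `endWhenIdle` is a hypothetical name, and only the 20s value (`IDLE_END_MS`) comes from this document.

```typescript
const IDLE_END_MS = 20_000; // value quoted in this document

const IDLE = Symbol("idle");

/**
 * Re-yield events from `source`, but end the stream if no event
 * arrives within `idleMs`. Hypothetical helper for illustration.
 */
async function* endWhenIdle<T>(
  source: AsyncIterable<T>,
  idleMs: number = IDLE_END_MS,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const idle = new Promise<typeof IDLE>((resolve) => {
        timer = setTimeout(() => resolve(IDLE), idleMs);
      });
      let next: IteratorResult<T> | typeof IDLE;
      try {
        // Whichever settles first wins: the next event or the idle timer.
        next = await Promise.race([iterator.next(), idle]);
      } finally {
        clearTimeout(timer);
      }
      if (next === IDLE || next.done) return;
      yield next.value;
    }
  } finally {
    // Ask the underlying stream to clean up, without waiting on it.
    void iterator.return?.().catch(() => undefined);
  }
}
```

The timer is reset on every event, so a long-running query that keeps emitting events is not cut off; only a stream that goes silent for the full window is ended.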

Worth preserving?

v1 should remain a historical reference only. v2 strictly supersedes it:

- DB-backed state survives restarts
- Provider abstraction allows non-Claude agents
- Dynamic destinations from inbound.db
- Session invalidation detection + processing ACK idempotence
- Dual poll rate + idle termination prevent pathological query hangs
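
To make the processing-ACK point concrete, here is a minimal sketch of why a persisted ACK prevents re-processing after a restart. It uses an in-memory set as a stand-in for a writable table; `AckStore`, `takeUnprocessed`, and the message shape are hypothetical, and this document does not state whether v2 records the ACK before or after the query runs.

```typescript
interface Inbound {
  id: string;
  text: string;
}

/** Stand-in for a persistent, writable store that outlives the process. */
class AckStore {
  private readonly acked = new Set<string>();

  has(id: string): boolean {
    return this.acked.has(id);
  }

  add(id: string): void {
    this.acked.add(id);
  }
}

/**
 * Return messages that have not been ACKed yet and ACK them.
 * Assumption: the ACK is recorded before the batch is handed to the agent.
 */
function takeUnprocessed(inbox: readonly Inbound[], acks: AckStore): Inbound[] {
  const fresh = inbox.filter((message) => !acks.has(message.id));
  for (const message of fresh) {
    acks.add(message.id);
  }
  return fresh;
}
```

Calling `takeUnprocessed` a second time with the same inbox and store, as a restarted container would, returns an empty batch, because the ACKs live in the store rather than in process memory.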

No merge-back candidates identified.
