Files

gavrielc 47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool

- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 01:00:04 +03:00

10 KiB

Raw Blame History

v1 → v2 Deep Dive: Aggregate Summary

Per-file deep-dives were produced for every file in src/v1/ and container/agent-runner/src/v1/. This document aggregates findings across all 21 modules.

Per-file docs

Topic	File	v1 source(s)
Configuration	config.md	`src/v1/config.ts`
Environment helpers	env.md	`src/v1/env.ts`
Types	types.md	`src/v1/types.ts`
Logger	logger.md	`src/v1/logger.ts`
Timezone	timezone.md	`src/v1/timezone.ts`
Database layer	db.md	`src/v1/db.ts`
Container runner	container-runner.md	`src/v1/container-runner.ts`
Container runtime + mounts	container-runtime.md	`src/v1/container-runtime.ts`, `mount-security.ts`
Group folder	group-folder.md	`src/v1/group-folder.ts`
Group queue	group-queue.md	`src/v1/group-queue.ts`
Host index	index-host.md	`src/v1/index.ts`
IPC (host + container)	ipc.md	`src/v1/ipc.ts`, `container/.../v1/ipc-mcp-stdio.ts`
Remote control	remote-control.md	`src/v1/remote-control.ts`
Router	router.md	`src/v1/router.ts` + `index.ts` routing
Sender allowlist	sender-allowlist.md	`src/v1/sender-allowlist.ts`
Session cleanup	session-cleanup.md	`src/v1/session-cleanup.ts`
Task scheduler	task-scheduler.md	`src/v1/task-scheduler.ts`
Channels	channels.md	`src/v1/channels/*`
Agent-runner entry	container-index.md	`container/.../v1/index.ts`
Agent-runner MCP tools	container-mcp-tools.md	`container/.../v1/mcp-tools.ts`
Formatting test (orphan)	formatting-test.md	`src/v1/formatting.test.ts`

The big shift

v2 rewrote the fundamental transport between host and container. The one-line version:

v1 = IPC files + stdin/stdout + in-memory GroupQueue + polling message loop. v2 = two SQLite DBs per session + event-driven routing + 60s host sweep.

Everything else flows from that. Removing IPC forced a rewrite of the router, the container-runner, the agent-runner entry, and the MCP-tool bridge. The 60s sweep absorbed the task scheduler, session cleanup, and pending-message recovery. The entity model (users/roles/messaging_groups) replaced the flat sender allowlist and chat-level config. Provider abstraction + Chat SDK bridge replaced hardcoded Claude SDK + per-channel adapters.

Net LOC: v1 (~7.4k host + monolithic container-runner) → v2 (~5.5k host, split modules). Fewer lines, cleaner boundaries, more coverage.

What's kept (identical or near-identical)

timezone.ts — byte-identical
group-folder.ts — byte-identical validation; v2 adds group-init.ts for filesystem scaffold
container-runtime.ts — nearly identical (only logger import swapped)
mount-security.ts — same structure, one field removed (see regressions)
config.ts / env.ts — same structure, same .env surface; several constants now dead code
logger.ts — same levels/colors/routing, but API shape changed (message-first instead of data-first)
MCP send_message tool — kept + enhanced with named destinations

What's new in v2

Two-DB session model (inbound.db + outbound.db) with even/odd seq parity, journal_mode=DELETE for cross-mount visibility
Entity model — users, user_roles (owner/admin/scoped), agent_group_members, messaging_groups, messaging_group_agents, user_dms (cold-DM cache)
Host sweep (60s) — absorbs scheduler, cleanup, pending-message recovery, recurrence firing, stale detection, orphan cleanup
Chat SDK bridge — unifies Discord/Slack/Teams/other adapters through @anthropic-ai/chat
Provider abstraction — default Claude + opt-in OpenCode etc. via providers branch
OneCLI integration — credential gateway + approval flow (src/onecli-approvals.ts)
16 new MCP tools — scheduling (6), interactive (2), self-mod (3), agent mgmt (1), message manipulation (3), plus enhanced send_message
Heartbeat file mtime — replaces IPC liveness
Session persistence — session ID survives container restarts
Dual-rate polling — 1000ms idle / 500ms active inside container
Idle stream termination — 20s timeout prevents zombie queries
Processing ACK — reverse channel (outbound → inbound) for idempotence
Migration system — 9 numbered migrations vs v1's ad-hoc ALTERs
Webhook server (new for HTTP-based channels)
Container typing indicator refresh via delivery

What's removed (deliberately)

IPC transport (files, stdin/stdout JSON, MCP-over-stdio bridge) — replaced by DB polling
GroupQueue in-memory state machine — serialization via messages_in.status
Output markers (---NANOCLAW_OUTPUT_START/END---) — results land in messages_out
State persistence (router_state, lastAgentTimestamp map) — each message is independent
Per-exit container log files — only logger.debug to host log
Flat sender allowlist (JSON config) — replaced by role-based access + unknown_sender_policy
Remote control subsystem (/remote-control command → spawned CLI)
IPC watcher (dynamic group-add while running)
task_runs audit table — no task execution log
Cron/interval task types as first-class entities — tasks are messages_in rows with kind='task' + recurrence
Stdin protocol for agent input — container reads from inbound.db

Regressions worth fixing (ranked)

HIGH priority

Trigger-rule matching in pickAgent (src/router.ts:198 TODO). Without this, a messaging group wired to multiple agents fires ALL of them on every message. Schema (messaging_group_agents.trigger_rules) is ready; the check is ~10 lines. Likely broken-by-default for multi-agent setups.
nonMainReadOnly mount isolation removed (src/mount-security.ts). Non-main/shared agent groups can now mount read-write on any path the allowlist permits. v1 enforced read-only-for-non-main regardless of allowlist. Security regression for multi-tenant setups. Restore: add field + restore isMain param flow.
Pending-message recovery on startup (src/v1/index.ts:465-473). v1 explicitly scanned for unprocessed messages on restart. v2 relies on the sweep to notice. Likely works in practice, but worth a test: kill container mid-message, restart host, verify redelivery within ≤5s.

MEDIUM priority

response_scope enforcement (messaging_group_agents.response_scope stored but unused). Values 'all' | 'triggered' | 'allowlisted' are saved but nothing reads them.
request_approval flow for unknown senders (src/router.ts:295 TODO). unknown_sender_policy='request_approval' is scaffolded but doesn't actually produce an approval card.
Per-group container timeout. v1's containerConfig.timeout override is gone; all groups share IDLE_TIMEOUT. Slow-but-healthy agents get killed with fast agents' timeout.
Container streaming output. v1's marker-based pre-completion delivery is gone. v2 must wait for outbound.db poll. Latency-sensitive UX regresses.
Per-exit container logs. v1 wrote timestamped per-exit log files with full I/O + mounts + stderr. v2 only has logger.debug. Zero-cost on success, high-value on crash. Restore at least for non-zero exit.
Explicit container kill on stale detection. v2's sweep marks messages for retry but doesn't stop the stale container. Only cleanupOrphans() at startup removes them. Add stopContainer() when heartbeat stale AND processing stuck.
Host-level retry with backoff on agent error. v1 had MAX_RETRIES=5 + exp. backoff on processGroupMessages failure. v2 only retries on stale-heartbeat. Explicit agent-error retry could close the gap.

LOW priority

Process ID in logger output — lost multi-process debugging info
Task dedup via unique (kind, series_id) index — v2 can have two pending rows with same series; best-effort via atomic status update
Silent-drop mode for noisy senders — v1's mode:'drop' had a use case; orthogonal to privilege
Remote control — decide: restore as opt-in skill or document as removed
Dead config constants (POLL_INTERVAL, SCHEDULER_POLL_INTERVAL, IPC_POLL_INTERVAL) — delete from src/config.ts
Configurable retention thresholds (STALE_THRESHOLD_MS, MAX_TRIES) — move from constants to config.ts
Dynamic group-add (IPC watcher equivalent) — probably not worth; document that restart is required

Things kept as test-only regression risk

The orphan src/v1/formatting.test.ts asserted behaviors that aren't fully exercised in v2:

Timezone-aware formatted timestamps — v1 emitted locale strings ("Jan 1, 2024, 1:30 PM"); v2 emits UTC HH:MM
<context timezone="..."/> header — gone
reply_to="<id>" attribute — v2 only stores sender name + truncated preview
Trigger-pattern unit tests — no direct equivalent (logic moved to DB but isn't tested at the router level)
Internal tag stripping tests — no isolated tests in agent-runner

These are specs worth porting to v2 tests once trigger matching is implemented.

Files entirely gone in v2

src/v1/ipc.ts + src/v1/ipc-auth.test.ts — IPC is dead
container/.../v1/ipc-mcp-stdio.ts — MCP-over-stdio bridge dead
src/v1/group-queue.ts — serialization via DB
src/v1/session-cleanup.ts — merged into host-sweep.ts
src/v1/task-scheduler.ts — merged into host-sweep.ts + system actions in delivery.ts
src/v1/remote-control.ts — feature removed
src/v1/sender-allowlist.ts — entity model supersedes

Net architectural assessment

v2 is strictly simpler, more consistent, and more robust in its happy path. The remaining TODOs (trigger matching, response_scope, request_approval) reflect scaffolding that was checked in ahead of the feature — none are deep design issues. The one actual regression is nonMainReadOnly mount isolation; it was a defense-in-depth feature and deserves to come back. The removal of per-exit container logs and streaming output markers are judgment calls that traded observability for simplicity — both can be restored cheaply if needed.

No file in v1 contains a behavior that v2 is architecturally unable to express. The outstanding work is feature-completion, not architecture.

10 KiB Raw Blame History