# v1 → v2 Action Items Working doc for each finding from [SUMMARY.md](SUMMARY.md). Decisions were made one-by-one; this rollup summarizes the outcome. **Status legend**: `pending` · `discussing` · `decided` · `deferred` · `dropped` · `done` --- ## Rollup ### To implement (~800 LOC total, roughly) | # | Topic | LOC | Notes | |---|---|---|---| | 1 | Engage modes + sender scope + accumulate/drop + fan-out + tool blocklist | ~315 | DB migration drops `trigger_rules`/`response_scope`, adds `engage_mode`/`engage_pattern`/`sender_scope`/`ignored_message_policy` + `trigger` column on `messages_in`; router `pickAgents` fan-out; adapter-level gating via new hooks | | 5 | `request_approval` flow for unknown senders (default policy flips from `strict` to `request_approval`) | ~175 | New `pending_sender_approvals` table; reuses existing `pickApprover` + card infra | | 9 | Stuck detection (60s claim-age rule), heartbeat-based lifecycle, `max(30m, bash_timeout)` absolute ceiling, SDK tool blocklist (`AskUserQuestion`, `EnterPlanMode`, `ExitPlanMode`, `EnterWorktree`, `ExitWorktree`), remove `IDLE_TIMEOUT` setTimeout + `IDLE_END_MS` machinery | ~115 | Container state row for Bash timeout tracking | | 15 | Delete three dead config constants from `src/config.ts` | 3 | `POLL_INTERVAL`, `SCHEDULER_POLL_INTERVAL`, `IPC_POLL_INTERVAL` | | 18 | Timezone + formatting recreation — port v1 bit-for-bit (`formatLocalTime`, `` header, `reply_to` + `` XML, `stripInternalTags`) + scheduling tool TZ normalization + cron TZ parsing | ~195 (75 prod + 120 tests) | Full spec in [timezone-formatting-v1-recreation.md](timezone-formatting-v1-recreation.md) | ### Deferred (wait for trigger) | # | Topic | Trigger | |---|---|---| | 2 | `nonMainReadOnly` mount isolation | If multi-tenant / untrusted-group use ever surfaces. In the meantime, mount-declaration skill must explicitly prompt RO/RW when added | | 3a | End-to-end recovery test | When next touching `host-sweep.ts` / `index.ts` startup | | 14 | Remote control subsystem | When someone needs it. Opt-in skill, provider-specific (Claude SDK only) | | 17 | Dynamic group-add (bridge conversations cache refresh) | When implementing dynamic group registration feature. Code comment added at `chat-sdk-bridge.ts:73` | ### Dropped (won't implement / not-a-regression) | # | Topic | Why | |---|---|---| | 3 | Explicit pending-message recovery | Working as designed via sweep's immediate first tick + `cleanupOrphans` | | 4 | `response_scope` enforcement | Folded into item 1 migration (column deleted, values backfilled) | | 6 | Per-group container timeout | Not a regression — v1's hard-kill was worse than v2's keep-alive-after-idle | | 7 | Container streaming output markers | Replaced by `send_message` MCP tool; latency ~1s is fine for chat UX | | 8 | Per-exit container log files | Underlying info still recoverable (session DBs, heartbeat mtime, exit code) | | 10 | Host-level retry on agent error | Folded into item 9's kill + sweep-reset loop | | 11 | Process ID in logger output | Single host process; container stderr already tagged with `agentGroup.folder` | | 12 | Task dedup via unique series_id index | Recurrence logic is structurally dedup-safe; not a real issue | | 13 | Silent-drop sender mode | Admin can use `unknown_sender_policy='strict'` or remove from members instead | | 16 | Configurable retention thresholds | Personal-assistant scale; source constants are fine | ### Extras recorded during discussion - **1a**: Implementation-ordering plan for item 1 - **6a**: Remove `IDLE_END_MS` from `poll-loop.ts` (folded into item 9) - **3a**: E2E recovery test (deferred) ### Follow-up PRs (scoped, not in this branch) | # | Topic | Why later | |---|---|---| | 22 | Unknown-channel wiring approval flow (card to owner when bot receives inbound in an unwired messaging group) | Gap surfaced after item 5 landed — item 5's `request_approval` covers unknown senders but presupposes a wired channel. See item 22 for the full design. | --- ## HIGH ### 1. Trigger-rule matching in `pickAgent` **Finding**: `src/router.ts:246` TODO. Confirmed trigger filtering is non-functional end-to-end: `trigger_rules` JSON is parsed into `ConversationConfig` and passed to adapters, but the Chat SDK bridge never reads it, and router's `pickAgent` picks by priority only. `response_scope` on `messaging_group_agents` is stored but never enforced. Chat SDK bridge hard-subscribes on every mention (bridge:173) and every DM (bridge:189). **Status**: decided — design locked; implementation pending **Decision**: replace `trigger_rules` JSON + `response_scope` with four explicit orthogonal columns on `messaging_group_agents`. Fan out inbound messages to all matching agents (N containers for N agents). Adapter-level gating in the bridge. `sender_scope` enforcement moves to the permissions module. **Schema** (`messaging_group_agents`): ``` engage_mode TEXT NOT NULL DEFAULT 'mention' -- 'pattern' | 'mention' | 'mention-sticky' engage_pattern TEXT -- required when mode='pattern'; '.' = always sender_scope TEXT NOT NULL DEFAULT 'all' -- 'all' | 'known' ignored_message_policy TEXT NOT NULL DEFAULT 'drop' -- 'drop' | 'accumulate' ``` Drop `trigger_rules` + `response_scope`. **No per-wiring accumulate cap** — storage is unbounded. **Global wake cap** (not a column): reuse `MAX_MESSAGES_PER_PROMPT` in `src/config.ts` (already defined, default 10, currently dead code from v1). Pass to container via `NANOCLAW_MAX_MESSAGES_PER_PROMPT`. Container applies `ORDER BY seq DESC LIMIT $N` when pulling pending messages on wake. **Session DB** (`messages_in`): ``` trigger INTEGER NOT NULL DEFAULT 1 -- 0 = context-only, 1 = wake agent ``` Host's `countDueMessages` / wake logic gates on `trigger=1`. Container reads all messages for context regardless. **Decisions locked**: - `always` collapses into `pattern` with `engage_pattern='.'` (three modes total) - `mention` and `mention-sticky` are separate modes (stickiness is user-visible) - `pattern` is a JS regex string — applied as `new RegExp(pattern).test(text)` - Accumulate cap = last N messages, default 10 - Fan-out: each matching agent gets its own session + container - Per-channel defaults live in the setup/register flow, not in the schema: - DM → `pattern` with `.` - Threaded group → `mention-sticky` - Non-threaded group → `mention` **Routing flow** (future): 1. Inbound → resolve messaging_group → group-level `unknown_sender_policy` gate 2. `pickAgents()` returns all wired agents (not just priority 0) 3. For each agent: a. `sender_scope` check (permissions module) b. `engage_mode` check (regex / mention / mention-sticky) c. Matched → write with `trigger=1`, wake container d. Not matched + `accumulate` → write with `trigger=0`, don't wake (no cap — stored forever) e. Not matched + `drop` → skip On wake, container pulls pending messages with `ORDER BY seq DESC LIMIT MAX_MESSAGES_PER_PROMPT` so only the most recent N reach the prompt regardless of accumulation depth. **Adapter bridge**: - Read `conversations.get(channelId)` before `setupConfig.onInbound(...)` - For `pattern` mode: test regex - For `mention` / `mention-sticky`: require bot to be mentioned - Only `thread.subscribe()` when mode is `mention-sticky` (today it subscribes unconditionally) **LOC estimate**: ~315 (~255 prod + ~60 test) - schema migration + backfill: 40 - session DB `trigger` column: 25 - types + adapter contract: 20 - DB helpers (CRUD): 20 - host→adapter plumbing (including `NANOCLAW_MAX_MESSAGES_PER_PROMPT` env): 10 - router fan-out + gating: 70 - sender-scope in permissions module: 15 - Chat SDK bridge gating + subscribe control: 40 - container-side `LIMIT N` on pending-message pull: 5 - smart defaults in setup/register flow: 15 - tests: 60 (Note: earlier plan's "accumulate prune-to-N in router" is dropped — host doesn't prune. Cap is container-side only.) **Core vs module split**: - Core (`src/`): schema, engage_mode enforcement, pickAgents fan-out, bridge gating, `trigger` column, accumulate/drop - Permissions module: `sender_scope` enforcement (extends existing access gate). Default `sender_scope='all'` → no-op when permissions module absent **Next step**: new action item for implementation — see item 1a. --- ### 1a. Implementation plan for engage/sender/ignored columns **Status**: pending — ready to implement **Order**: (a) migration + backfill, (b) types + DB helpers, (c) router fan-out + gating, (d) bridge gating, (e) permissions sender_scope, (f) setup-flow defaults, (g) tests **Next step**: draft the migration + write up the PR plan when ready ### 2. `nonMainReadOnly` mount isolation **Finding**: `mount-security.ts` moved to `src/modules/mount-security/index.ts` during the refactor. `validateMount(mount)` no longer takes an `isMain` param; `MountAllowlist` has no `nonMainReadOnly` field. Regression is real. But v1's "main vs non-main" concept doesn't map cleanly to v2 — `agent_groups` has no `is_main` flag. **Status**: deferred **Decision**: do not restore the v1 flag. Trust admin-declared `readonly` values in `container.json`. The allowlist's per-root `allowReadWrite` + path gating is sufficient for the current threat model (personal-assistant use, single admin). If multi-tenant / untrusted auxiliary groups become a real use case, prefer framing B (add `agent_groups.mount_access: 'rw' | 'ro'` column) over resurrecting `isMain`. **Rationale**: v2 deliberately dropped the "main" concept. Reintroducing `isMain` to restore a defense-in-depth check that was designed for a different entity model is the wrong trade. Admin already has to opt-in twice (allowlist `allowReadWrite: true` + container.json `readonly: false`) to get RW — that's two deliberate keys. The v1 flag was a triple-check for a rare class of admin mistakes in a shared-infra setup. **Follow-up (required)**: when building the skill / guide / setup flow that lets admins declare additional mounts (e.g. self-customize, manage-mounts, or a new `/add-mount` skill), the flow **must clearly surface the RO vs RW distinction** to the admin — explicit choice, explicit warning when RW is selected, and default to RO. This replaces v1's automatic enforcement with informed consent. **Next step**: when the mount-declaration skill/flow is next touched, add explicit RO/RW prompting. Track as a sub-item if a skill exists yet. ### 3. Explicit pending-message recovery on startup **Finding**: v1 had a named `recoverPendingMessages()` function at startup. v2 relies on the host sweep. Verified: the recovery path exists and is correct — just renamed/relocated. **Status**: decided — working as designed, no code change **Current mechanism** (verified against tree): 1. `cleanupOrphans()` at startup kills any leftover container from the previous run (`src/index.ts:69`) 2. `startHostSweep()` runs its first sweep **immediately** — no 60s delay (`src/host-sweep.ts:38`) 3. Sweep per session: `syncProcessingAcks` → `countDueMessages` → `wakeContainer` if work pending and no container → `detectStaleContainers` resets stuck `processing` rows with backoff **Scenarios covered**: - Host crashed while container idle with pending messages → orphan cleanup + first sweep re-wakes - Host crashed mid-processing → stale detection resets `processing → pending`, next sweep wakes - Container crashed with host alive → heartbeat mtime catches it inside 10 min `STALE_THRESHOLD_MS` **Rationale**: the function got renamed (recovery → sweep) but the behavior is equivalent or better. Sweep is continuous; recovery used to be one-shot. **Next step**: see item 3a. --- ### 22. Unknown-channel wiring approval flow **Finding** (post-item-5 discussion): item 5's `request_approval` only fires when a messaging group already has agents wired. Three scenarios slip through to the earlier `no_agent_wired` structural-drop branch in `src/router.ts` and get silent-dropped with no signal to the owner: 1. A new user DMs the agent directly (the DM's messaging group auto-creates but has no wiring) 2. The agent is @mentioned in a group the admin hasn't registered 3. The agent is added to a new group and someone there addresses it In all three, the user sees no response and the owner has no signal anything happened. **Status**: decided — companion PR to item 5, scoped separately **Decision**: when the router hits `no_agent_wired` for a non-public event, **instead of silent-dropping, pick the owner and DM them a wiring card**. Two flavors depending on who triggered it: - **Sender IS an owner/admin** (the common "I just added the bot" case) → auto-wire IF exactly one agent group exists. Silent seamless flow. If multiple agent groups exist, fall through to the card so the owner picks. - **Sender is anyone else** (stranger, or owner in a multi-agent install) → deliver a card: - Title: `🔌 New channel — wire it?` - Body: ` is trying to reach you in on . Wire to which agent?` - Options: one button per existing `agent_groups` row, plus `➕ Create new` and `Ignore` **On approve (existing agent group)**: 1. `createMessagingGroupAgent(...)` with channel-kind defaults — DM→`pattern` + `'.'`, threaded group→`mention-sticky`, non-threaded group→`mention` (same defaults as `scripts/init-first-agent.ts`) 2. Replay the stored event via `routeInbound` (sender-approval pattern) 3. Delete pending row **On approve "Create new"**: [OPEN SCOPE] — needs name/folder input. Options: - Follow-up ask_question card asking for a name → auto-derive folder from slug → create group + wire - Or: skill-backed flow — the button dispatches to `/init-agent` or similar and the card just links out - Punt until implementation; mention in the PR brief that we'll decide when building **On ignore**: delete pending row; future attempts re-prompt fresh (consistent with sender-approval deny; no denial persistence). **Failure cases** (drop silently with log, don't leave a pending row): - No owner configured (fresh install) — same behaviour as sender-approval - No reachable DM for any owner/admin - Delivery adapter missing **New table**: ``` pending_channel_approvals ( id TEXT PRIMARY KEY, messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id), sender_identity TEXT, -- NULL when triggered by a non-identifiable event sender_name TEXT, original_message TEXT NOT NULL, -- JSON InboundEvent for replay approver_user_id TEXT NOT NULL, created_at TEXT NOT NULL, UNIQUE(messaging_group_id) -- one pending wiring per channel ) ``` Dedup is narrower than sender-approval's `(mg_id, sender_id)` — one pending wiring per channel, period. A second stranger writing into the same unwired channel piggybacks on the existing card instead of spawning a new one. Latest event replaces the stored `original_message` (we only replay one anyway, and latest is most useful). **Card action id prefix**: `nca-:` where value is `agent-group-` / `create` / `ignore`. Response handler lives in `src/modules/permissions/` alongside `handleSenderApprovalResponse`. **Owner-sender auto-wire logic**: ``` if sender is owner/admin AND getAllAgentGroups().length === 1: auto-wire to that group, replay event, done — no card else: deliver card ``` Don't auto-create a new agent group silently — always require a prompt for that. **LOC estimate**: ~145 - Migration + CRUD: 45 - Router hook before `no_agent_wired` drop → try channel approval: 15 - Owner-sender auto-wire fast path: 20 - Card delivery (scope `pickApprover(null)`; build buttons from `getAllAgentGroups()`): 25 - Response handler: 25 - Tests: 15 **Open scopes (flag at PR time)**: - "Create new" sub-flow — pick between follow-up card vs skill link - Do we also react to bot-added-to-group platform events? Simpler to stay lazy (first-message-triggered only). Platform lifecycle events are inconsistent across Discord/Slack/Telegram anyway. - Worth scanning the `channels` branch for any existing channel-lifecycle handlers that might conflict. **Next step**: open a follow-up PR off this branch once #1869 lands. --- ### 3a. End-to-end recovery test **Finding**: no test confirms the host-crash-restart scenario produces timely re-delivery. **Status**: pending — nice-to-have **Decision**: add an integration test: (1) write a pending message to inbound.db, (2) kill the host simulating crash, (3) start host, (4) assert container is woken and message processed within a bounded time (≤5s? ≤ one sweep interval). **Rationale**: the sweep logic is correct as written, but a regression here would be silent (messages just sit). Worth a safety net. **Next step**: draft test when touching `host-sweep.ts` or `index.ts` startup flow next. --- ## MEDIUM ### 4. `response_scope` enforcement **Finding**: `messaging_group_agents.response_scope` stores `'all' | 'triggered' | 'allowlisted'` but nothing reads it. **Status**: decided — folded into item 1 **Decision**: delete the `response_scope` column as part of the item-1 migration. Values backfill into the new explicit columns: | Old `response_scope` | New columns | |---|---| | `all` | `engage_mode='pattern'`, `engage_pattern='.'`, `sender_scope='all'` | | `triggered` | `engage_mode='mention'` (or `'pattern'` if legacy row has a pattern), `sender_scope='all'` | | `allowlisted` | `engage_mode` derived from `trigger_rules`, `sender_scope='known'` | **Rationale**: `response_scope` conflated two orthogonal axes (engage + sender). Splitting them is the whole point of item 1. **Next step**: ensure the item-1 migration includes the `response_scope` backfill in its UP step. ### 5. `request_approval` flow for unknown senders **Finding**: `unknown_sender_policy='request_approval'` is scaffolded in `src/modules/permissions/index.ts:100-108` but falls through to log-and-drop (explicit TODO comment). Current default is `'strict'`, which silently drops — user has no signal that their agent isn't responding. **Status**: decided — implement, keep simple **Decision**: implement full approval flow **and** flip the schema default from `'strict'` to `'request_approval'`. UX rationale: users wire their DM during setup; silent drops create a mystery when the agent doesn't respond. Public is unsafe. Approval default → admin sees a card and explicitly decides. **Flow**: 1. Unknown sender writes to wired messaging group with policy `'request_approval'` 2. If pending approval for `(messaging_group, sender)` already exists → drop this message silently (in-flight dedup; not persistence) 3. Otherwise: insert into `pending_sender_approvals` with original message + timestamp 4. `pickApprover(agent_group_id)` + `pickApprovalDelivery(approverUserId)` — existing machinery in `src/access.ts` 5. Deliver a card via adapter's `deliver()` with `Card`/`Actions`/`Button` primitives (already in chat-sdk-bridge) 6. Card action id prefix `nsa::` (parallels existing `ncq:` prefix for `ask_user_question`) 7. On `allow`: upsert `users` row, insert into `agent_group_members`, deliver stored message through normal routing (original timestamp preserved), cleanup pending row 8. On `deny`: cleanup pending row, drop the message. No denial persistence — next attempt from same sender triggers a fresh card. **No denial persistence** explicit rationale: personal-assistant scale, admin can switch policy to `'strict'` per messaging group if a hostile sender starts spamming. Avoids a new table column and a TTL config. **New table**: ``` pending_sender_approvals ( id TEXT PRIMARY KEY, messaging_group_id TEXT NOT NULL, agent_group_id TEXT NOT NULL, sender_identity TEXT NOT NULL, -- channel_type:handle sender_name TEXT, original_message TEXT NOT NULL, -- JSON of the InboundEvent approver_user_id TEXT NOT NULL, created_at TEXT NOT NULL, UNIQUE(messaging_group_id, sender_identity) -- enforces in-flight dedup ) ``` Dedicated (not reusing `pending_approvals` which is OneCLI-specific). **Reuse**: - `pickApprover` / `pickApprovalDelivery` in `src/access.ts` - Card rendering primitives already in `src/channels/chat-sdk-bridge.ts` - `onAction` dispatch — add the `nsa:` prefix handler alongside existing `ncq:` **LOC estimate**: ~175 - Migration + CRUD for `pending_sender_approvals`: 55 - `handleUnknownSender` request_approval branch + in-flight dedup: 25 - Host-side card dispatcher (pick approver + deliver card): 25 - `onAction` handler for `nsa:` prefix (allow/deny): 30 - Schema default flip + router auto-create update: 5 - Tests: 35 **Module location**: all in `src/modules/permissions/`. Module stays optional; default-allow fallback behavior when not loaded is preserved. **Open gotchas noted**: - The router's auto-create at `router.ts:123` currently hardcodes `'strict'` — change to omit the field so schema default applies - `pickApprover` may return null if no admin/owner exists (e.g. fresh install before first user registered). In that case: log + drop silently, treat as effectively `'strict'` for safety. Don't block message forever. **Scope boundary** (important): this item covers **unknown sender in a wired channel**. The parallel case — **unknown channel** (new DM / unwired group / bot-added-to-group) — short-circuits at the `no_agent_wired` structural drop before this flow ever runs. Tracked as item 22. **Next step**: implement alongside item 1 or as a follow-up. Same migration window is fine (one migration for engage columns + request_approval default change + new table). ### 6. Per-group container timeout **Finding**: v1's `containerConfig.timeout` override is gone. All groups share `IDLE_TIMEOUT`. Original framing (slow-but-healthy agents getting killed) was wrong — v1's `timeout` was a hard wall-clock kill on the whole spawn, totally different from v2's `IDLE_TIMEOUT` (keep-alive after last activity). V2's behavior is strictly better for long-running agents. **Status**: dropped — not a regression **Decision**: don't restore per-group timeout override. `IDLE_TIMEOUT=30min` global is the right model. If per-group idle tuning ever becomes useful it's ~15 LOC (new column, env injection at spawn) — small feature add, not a regression to repair. **Rationale**: v2 lets long-running agents finish; v1 would have hard-killed them at 30min. Current behavior is an improvement. **Next step**: see 6a. --- ### 6a. Remove IDLE_END_MS (container-side query idle termination) **Finding**: `container/agent-runner/src/poll-loop.ts:11` defines `IDLE_END_MS = 20_000`. Inside `processQuery`, a concurrent interval ends the active Agent SDK `query()` stream after 20s of SDK silence. Any SDK event (tool use, tool result, streamed text, new pushed message) resets the timer. This is a general "SDK silence detector," not specifically post-result. It fires any time: - Mid-work: slow tool call with no intermediate SDK events (`npm install`, `pytest`, long `WebFetch`, etc.) - Post-result: agent finished, stream waiting for potential follow-up - Any other pause in the SDK stream **Status**: decided — remove, pending SDK verification **Decision**: delete `IDLE_END_MS` and its setInterval check. Let the `query()` stream stay open as long as the container is active. Container's 30-min `IDLE_TIMEOUT` (host-side in `container-runner.ts`) is the single source of truth for "when to let go." **Rationale**: - When new messages arrive mid-stream, they push in via `query.push()` with no reconnect — stream-open is cheaper per-message than close-and-reopen - Closing early forces a reconnect + cold prompt cache for the next message - Container stays alive anyway; ending the stream doesn't save resources at the container level - `CLAUDE_CODE_AUTO_COMPACT_WINDOW=165000` already handles context window growth within a long-lived query - Anthropic API's own stream timeout will fire if needed; SDK should handle it transparently - Avoids the false-positive kill during legitimate slow tool calls (common case: agent running `npm install` gets cut off at 20s) **Caveat (must verify before removal)**: confirm Claude Agent SDK doesn't require explicit `query.end()` for prompt-cache commit or session-state persistence. Expected to be fine (SDK checkpoints per turn) but double-check docs / run a quick test where container idles with stream open, then processes a follow-up. **LOC estimate**: ~−15 (net deletion — remove constant, setInterval idle check, the `done` flag plumbing may also simplify) **Next step**: when implementing item 1's changes (or standalone), verify SDK behavior with stream-open-indefinite, then delete IDLE_END_MS block. Watch for any test assertions on it. ### 7. Container streaming output (marker-based pre-delivery) **Finding**: v1's `---NANOCLAW_OUTPUT_START/END---` markers enabled pre-completion delivery. v2's two paths (final-result `dispatchResultText` + mid-turn `send_message` MCP tool) both write to outbound.db; host polls every `ACTIVE_POLL_MS = 1000ms`. **Status**: dropped — not a regression **Decision**: v2's `send_message` MCP tool is the correct replacement for v1's marker-based streaming. Latency is ≤1s (poll interval), which is fine for chat UX. **Rationale**: v1's marker model required the agent and host to share a fragile state machine over stdout. v2 uses explicit tool calls and a DB surface — cleaner architecture, comparable latency, and control stays with the agent. If perceived latency ever becomes a real complaint, tune `ACTIVE_POLL_MS` down (500ms / 250ms) — low-cost knob. **Next step**: none. ### 8. Per-exit container log files **Finding**: v1 wrote timestamped per-exit logs with full I/O + mounts + stderr. v2: stderr → `log.debug` (invisible at default `LOG_LEVEL=info`), container close → `log.info` with exit code, session DBs preserved on disk. Real gap: stderr on abnormal exit isn't auto-surfaced. **Status**: dropped **Decision**: skip — no per-exit file restoration, no stderr-on-crash buffer. **Rationale**: underlying forensic info is still recoverable (session DBs on disk, heartbeat mtime, exit code in log). `LOG_LEVEL=debug` surfaces stderr when needed. The cost of adding buffered crash-log promotion (~15 LOC) isn't justified by the frequency of post-mortem cases. **Next step**: none. ### 9. Stuck detection + heartbeat-based container lifecycle **Finding**: v2's sweep detects stale heartbeats (10 min) and resets messages with backoff, but doesn't kill the container. Idle timeout is delivery-count-based (30 min since last messages_out). Together these produce a gap where a stuck container holds resources + blocks new wakes for up to 30 min. **Empirical findings from SDK probe** (`container/agent-runner/scripts/sdk-signal-probe.ts`, runs logged in `/tmp/probe-*.jsonl`): - Silent Bash tools (e.g. `sleep 30`) produce 30+ seconds of zero SDK events — heartbeat goes stale during legitimate work - Natural intra-stream silences up to ~12s observed mid-tool-use JSON streaming - `PreToolUse` / `PostToolUse` hook pair is reliable; `PostToolUseFailure` fires on blocked requests - `SubagentStart`/`SubagentStop` and `system/task_started`/`system/task_notification` pairs also reliable - **Pushing a new message mid-active-turn does NOT fire `UserPromptSubmit`** (fires only at start of a new turn, after `result`) - SDK's built-in `AskUserQuestion` doesn't actually block; returns placeholder - Bash tool's declared `timeout` param is visible in `tool_use` input — we can read it container-side - Stuck tools (hook that never resolves) produce indefinite silence — no SDK-side timeout **Status**: decided **Decision**: replace existing IDLE_TIMEOUT setTimeout + STALE_THRESHOLD=10min combo with message-scoped stuck detection + absolute 30-min ceiling. Reset messages inline when we kill. Blocklist SDK tools that don't fit our async model. **Sweep logic** (per active session): If container isn't running → reset any `'processing'` rows in processing_ack to `'pending'` + tries++ + backoff. Done. If container IS running, apply in order: 1. **Absolute ceiling**: if `heartbeat_mtime` older than `max(30 min, current_bash_timeout)` → kill + reset any processing to pending + retry++. Rationale: 30 min idle ceiling, extended only if agent is currently inside a Bash tool with longer declared timeout. Agents needing >30 min should use `run_in_background`. 2. **Message-scoped stuck**: for each `processing_ack` row with status=`'processing'`: - `claim_age = now - status_changed` - `tolerance = max(60s, current_bash_timeout)` if Bash in flight, else `60s` - If `claim_age > tolerance` AND `heartbeat_mtime <= status_changed` → kill + reset this message + retry++ Semantics: "container claimed a message and went silent for >tolerance since claim." No separate idle rule — rule 1 covers it. An idle container hits 30-min stale with no processing rows; kill has nothing to reset. **Container state surface** (for Bash timeout tracking): New table in outbound.db (or session_state row): ``` container_state ( session_id TEXT PRIMARY KEY, current_tool TEXT, -- null when no tool in flight tool_declared_timeout_ms INTEGER, tool_started_at TEXT ) ``` Container writes on `PreToolUse` (reads Bash `timeout` from tool input), clears on `PostToolUse` / `PostToolUseFailure`. Host reads in sweep decision. **Tool blocklist** (initial): - `AskUserQuestion` — SDK built-in; we have our own DB-backed MCP version - `EnterPlanMode` / `ExitPlanMode` — Claude Code UI only - `EnterWorktree` / `ExitWorktree` — Claude Code UI only Enforcement: - Pass `disallowedTools: [...]` to `query()` options — agent never sees them in its tool list - `PreToolUse` hook guard (defense-in-depth): if a blocklisted tool name somehow fires, immediately reset the current message + kill (treat as stuck) **Kill old machinery**: - Remove `setTimeout` + `resetIdle` plumbing in `container-runner.ts:128-140` - Remove `resetContainerIdleTimer` export + its caller in `delivery.ts:26` - Remove `IDLE_END_MS = 20_000` in `poll-loop.ts:11` (item 6a decision) — stream stays open as long as container alive - Existing `detectStaleContainers` logic merges into the new sweep rules; the heartbeat-stale-10-min path disappears **LOC estimate**: ~115 - New sweep decision logic replacing existing detectStaleContainers + IDLE_TIMEOUT path: 50 - Container state table + PreToolUse/PostToolUse write, host read: 25 - Tool blocklist (disallowedTools + hook guard): 15 - Deletions (IDLE_TIMEOUT setTimeout, IDLE_END_MS): −25 - Tests (kill paths, Bash-timeout grace, blocklist hit): 50 **Why this converged here** (rationale summary): - Empirical data showed we can't reliably tell stuck from legitimate-silent-work without state. Bash-declared-timeout is the cleanest per-tool signal available. - 60s-since-claim is tight enough for normal work (WebSearch/WebFetch finish in ~8s) but generous enough for reasonable delays. Exception for Bash covers agents running scripts with user-declared timeouts. - 30-min absolute ceiling prevents infinitely-stuck containers; agents needing longer have `run_in_background`. - Pushing messages can't serve as a liveness probe (they're silent mid-turn), so detection is state-driven, not push-driven. - Blocklist prevents a whole class of "SDK tool designed for interactive UI" footguns that would appear stuck in our async model. **Next step**: implement as a focused PR. Order: (a) tool blocklist — safe to ship alone, (b) container state table + PreToolUse writes, (c) sweep rewrite + message reset, (d) delete old IDLE_TIMEOUT + IDLE_END_MS machinery, (e) tests. ### 10. Host-level retry with backoff on agent error **Finding**: v1 had MAX_RETRIES=5 + exp. backoff on `processGroupMessages` failure. v2's equivalent is now covered by item 9's sweep logic — any time the container isn't running with `'processing'` rows present, they get reset to pending with backoff + retry++. **Status**: folded into item 9 **Decision**: no separate action. Agent-error retry happens via container-exit → sweep reset. Container errors also surface via provider-side session invalidation check (`poll-loop.ts:200-211` — `provider.isSessionInvalid(err)` → clears stored session id → fresh retry). Both paths preserved. **Next step**: none. --- ### 11. Process ID in logger output **Finding**: v1 emitted `(${process.pid})` after the level tag. v2 dropped it. **Status**: dropped **Decision**: don't restore. Host is single-process (PID is constant). Container stderr already gets tagged with `{ container: agentGroup.folder }` at `container-runner.ts:121`, which is more informative than a PID. **Next step**: none. --- ## LOW ### 11. Process ID in logger output **Finding**: v1 emitted `(${process.pid})` after the level tag. v2 dropped it. **Status**: pending **Decision**: **Rationale**: **Next step**: ### 12. Task dedup via unique `(kind, series_id)` index **Finding**: verified — `messages_in.series_id` column exists with a non-unique index. Concern was theoretical: two pending rows with same series could coexist. **Status**: dropped **Decision**: not a real issue. Recurrence logic at `src/modules/scheduling/recurrence.ts` is structurally dedup-safe: only `completed` rows with `recurrence` get cloned, and after cloning `recurrence` is cleared on the original so it can't re-clone. Plus container's atomic `markProcessing` prevents double-execution at claim time. **Next step**: none. ### 13. Silent-drop mode for noisy senders **Finding**: v1's `mode:'drop'` let you ignore specific users without logging. v2 only has binary allow/deny via access gate. **Status**: dropped — won't implement **Decision**: not worth the table + gate complexity for a personal-assistant scale. If a specific sender becomes a problem, admin can switch the messaging_group's `unknown_sender_policy` to `'strict'` or remove the sender from `agent_group_members`. **Next step**: none. ### 14. Remote control subsystem **Finding**: v1's `/remote-control` command spawned `claude remote-control` CLI detached, polled stdout for session URL, persisted PID/URL state. Entirely gone in v2. **Status**: deferred — opt-in skill when needed **Decision**: reintroduce as an opt-in install skill (e.g. `/add-remote-control`), not on trunk. Provider-specific: only works with `claude` provider (Claude Agent SDK); not supported by OpenCode or other providers. Skill should check `agent_group.provider` at install time and bail gracefully with an error message if not `'claude'`. **Rationale**: niche feature valuable only for direct agent SDK attachment during dev/debugging. Keeping it off trunk matches v2's "infra-only trunk, features-via-skills" philosophy. Also avoids carrying code for a feature that simply doesn't exist in non-Claude providers. **Next step**: none until someone needs it. When implementing, likely lives on the `providers` branch (since it's provider-specific) or its own branch, installed via skill that copies files + checks provider. ### 15. Dead config constants **Finding**: verified — `POLL_INTERVAL` (line 13), `SCHEDULER_POLL_INTERVAL` (line 14), and `IPC_POLL_INTERVAL` (line 32) in `src/config.ts` have zero imports elsewhere in v2. Container's `POLL_INTERVAL_MS` in `poll-loop.ts` is a distinct local constant, unrelated. **Status**: decided — delete **Decision**: remove the three constants from `src/config.ts`. Trivial 3-line deletion. **Next step**: do as part of any sweep-touching PR, or standalone. ### 16. Configurable retention thresholds **Finding**: `STALE_THRESHOLD_MS` (10 min) and `MAX_TRIES` (5) in `host-sweep.ts` are hardcoded. Item 9's redesign replaces `STALE_THRESHOLD_MS` with new constants (60s claim-age, 30 min ceiling). **Status**: dropped — keep as constants **Decision**: leave the new item-9 thresholds + `MAX_TRIES` as source constants. Adding config surface for them isn't worth it at personal-assistant scale. If operational tuning ever becomes a real need, revisit — they're small centralized constants, one-line change each. **Next step**: none. ### 17. Dynamic group-add (IPC watcher equivalent) **Finding**: not actually a restart requirement — investigation showed: - Router reads `messaging_groups` + `messaging_group_agents` fresh per inbound (dynamic by design) - Chat SDK bridge has a `conversations: Map` populated at setup + `updateConversations()` method - **Nothing in the bridge currently reads the map**, and no code calls `updateConversations()` after startup - Today: stale map has no observable effect (dead state) - After item 1 ships (adapter-level gating): stale map would matter; new wirings wouldn't apply in the adapter gate until restart **Status**: deferred — comment added now, implement alongside dynamic group registration feature **Decision**: don't refactor the adapter interface now. Added a NOTE comment at `src/channels/chat-sdk-bridge.ts:73` flagging the staleness issue so the next person touching the bridge or adding dynamic-registration sees it. When dynamic group registration is implemented (admin adds a new messaging_group_agents row while host is running), handle cache refresh then — most likely by calling `adapter.updateConversations(freshConfigs)` after the mutation, keyed off the adapter's `channelType`. **Rationale**: item 1's initial landing can keep the adapter gating responsibilities small or skip adapter-side gating entirely. Refactoring ConversationConfig now would add scope; better to ship item 1 first, see if over-subscription bites, address if it does. **Next step**: when building the admin-skill path for adding messaging_group ↔ agent_group wirings, include a `refreshAdapterConversations(channelType)` call after the INSERT. ~10 LOC when needed. --- ## Test regressions (v1 `formatting.test.ts` assertions) ### 18+19+20+21. Timezone + formatting recreation (merged) **Finding**: v1 had a full timezone-aware formatting pipeline. v2 lost most of it, producing real bugs where the agent misinterprets user intent (scheduling for wrong times, suggesting time-inappropriate things). **Scope** — recreate v1 behavior faithfully wherever times touch the agent: - Timestamp formatting on inbound messages: `formatLocalTime(utcIso, TIMEZONE)` producing "Jan 1, 2024, 1:30 PM" format via `Intl.DateTimeFormat('en-US', {...})` (v1 `timezone.ts`) - `` header prepended to message block (v1 `router.ts:20-22`) - Reply-to with message ID: `......` (v1 `router.ts:10-18`) - `stripInternalTags()`: regex `/[\s\S]*?<\/internal>/g` applied to outbound text, then `.trim()` (v1 `router.ts:25-27`) - Cron expressions parsed with explicit user TZ: `CronExpressionParser.parse(expr, { tz: TIMEZONE })` (v1 `task-scheduler.ts:20-49`) - User-specified times normalized via the user's TZ: in v1 this was the host-side task scheduler; in v2 it's the new-in-v2 scheduling MCP tool (`mcp-tools/scheduling.ts`). Same principle — accept user-local times, normalize to UTC for storage, interpret cron in user's TZ. **Status**: decided — recreate with tests **Decision**: port v1's formatter + timezone behavior faithfully. Full recreation spec at [`timezone-formatting-v1-recreation.md`](timezone-formatting-v1-recreation.md) — includes exact v1 code, line numbers at commit `27c5220`, complete test inventory from `src/v1/formatting.test.ts` and `src/v1/task-scheduler.test.ts`. **Core principle** (per Gavriel): the agent operates in the user's timezone. Every timestamp the agent sees is user-local. Every time the agent outputs is interpreted as user-local. This is load-bearing for correctness, not a nice-to-have. **Porting plan** (from recreation spec): 1. `container/agent-runner/src/formatter.ts` — replace `formatTime` with `formatLocalTime(ts, TIMEZONE)` call; add reply_to attribute + `` element exactly as v1 2. Prepend `\n` to the messages block at formatter entry 3. Extract `stripInternalTags` as a named function; apply in outbound dispatch path (`poll-loop.ts:389` currently uses inline regex) 4. `container/agent-runner/src/mcp-tools/scheduling.ts` — clarify `processAfter` description, normalize to UTC ISO in handler 5. `src/modules/scheduling/recurrence.ts` — pass `{ tz: TIMEZONE }` to `CronExpressionParser.parse()` explicitly 6. Port all test cases from v1's `formatting.test.ts` and `task-scheduler.test.ts` to v2's test tree **LOC estimate**: ~75 prod + ~120 tests (reproducing v1's 40+ test cases) **Next step**: implement as a focused PR. Order: (a) formatter changes + tests, (b) context header + tests, (c) reply_to + tests, (d) stripInternalTags extraction + tests, (e) scheduling tool + cron TZ + tests. ### 19, 20, 21 — merged into 18 above See item 18 for the full recreation plan and spec reference. --- ## Notes - `src/v1/` was deleted upstream (commit 86becf8) after this analysis was written. v2 tree has since had a major module extraction (approvals, interactive, scheduling, permissions, agent-to-agent, self-mod) and a new CLI channel. **Verify each item against the current tree before deciding** — some may already be addressed.