Files

gavrielc 5d5f72e117 docs(action-items): add item 22 (unknown-channel wiring approval flow)

Covers the gap item 5 left open: request_approval presupposes a wired
channel, so unknown-channel cases (new DM, @mention in unwired group,
bot added to fresh group) short-circuit at no_agent_wired before the
approval flow runs.

Design:
- Owner-sender auto-wire fast path (exactly one agent group → wire
  silently; multiple → card)
- Card with one button per existing agent group + "Create new" + "Ignore"
- New pending_channel_approvals table, UNIQUE(messaging_group_id)
- nca- action-id prefix paralleling nsa- / ncq-
- Handler lives alongside handleSenderApprovalResponse
- "Create new" sub-flow is intentionally open scope

Cross-reference added to item 5 so the scope boundary is explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 09:55:16 +03:00

41 KiB

Raw Blame History

v1 → v2 Action Items

Working doc for each finding from SUMMARY.md. Decisions were made one-by-one; this rollup summarizes the outcome.

Status legend: pending · discussing · decided · deferred · dropped · done

Rollup

To implement (~800 LOC total, roughly)

#	Topic	LOC	Notes
1	Engage modes + sender scope + accumulate/drop + fan-out + tool blocklist	~315	DB migration drops `trigger_rules`/`response_scope`, adds `engage_mode`/`engage_pattern`/`sender_scope`/`ignored_message_policy` + `trigger` column on `messages_in`; router `pickAgents` fan-out; adapter-level gating via new hooks
5	`request_approval` flow for unknown senders (default policy flips from `strict` to `request_approval`)	~175	New `pending_sender_approvals` table; reuses existing `pickApprover` + card infra
9	Stuck detection (60s claim-age rule), heartbeat-based lifecycle, `max(30m, bash_timeout)` absolute ceiling, SDK tool blocklist (`AskUserQuestion`, `EnterPlanMode`, `ExitPlanMode`, `EnterWorktree`, `ExitWorktree`), remove `IDLE_TIMEOUT` setTimeout + `IDLE_END_MS` machinery	~115	Container state row for Bash timeout tracking
15	Delete three dead config constants from `src/config.ts`	3	`POLL_INTERVAL`, `SCHEDULER_POLL_INTERVAL`, `IPC_POLL_INTERVAL`
18	Timezone + formatting recreation — port v1 bit-for-bit (`formatLocalTime`, `<context timezone="..."/>` header, `reply_to` + `<quoted_message>` XML, `stripInternalTags`) + scheduling tool TZ normalization + cron TZ parsing	~195 (75 prod + 120 tests)	Full spec in timezone-formatting-v1-recreation.md

Deferred (wait for trigger)

#	Topic	Trigger
2	`nonMainReadOnly` mount isolation	If multi-tenant / untrusted-group use ever surfaces. In the meantime, mount-declaration skill must explicitly prompt RO/RW when added
3a	End-to-end recovery test	When next touching `host-sweep.ts` / `index.ts` startup
14	Remote control subsystem	When someone needs it. Opt-in skill, provider-specific (Claude SDK only)
17	Dynamic group-add (bridge conversations cache refresh)	When implementing dynamic group registration feature. Code comment added at `chat-sdk-bridge.ts:73`

Dropped (won't implement / not-a-regression)

#	Topic	Why
3	Explicit pending-message recovery	Working as designed via sweep's immediate first tick + `cleanupOrphans`
4	`response_scope` enforcement	Folded into item 1 migration (column deleted, values backfilled)
6	Per-group container timeout	Not a regression — v1's hard-kill was worse than v2's keep-alive-after-idle
7	Container streaming output markers	Replaced by `send_message` MCP tool; latency ~1s is fine for chat UX
8	Per-exit container log files	Underlying info still recoverable (session DBs, heartbeat mtime, exit code)
10	Host-level retry on agent error	Folded into item 9's kill + sweep-reset loop
11	Process ID in logger output	Single host process; container stderr already tagged with `agentGroup.folder`
12	Task dedup via unique series_id index	Recurrence logic is structurally dedup-safe; not a real issue
13	Silent-drop sender mode	Admin can use `unknown_sender_policy='strict'` or remove from members instead
16	Configurable retention thresholds	Personal-assistant scale; source constants are fine

Extras recorded during discussion

1a: Implementation-ordering plan for item 1
6a: Remove IDLE_END_MS from poll-loop.ts (folded into item 9)
3a: E2E recovery test (deferred)

Follow-up PRs (scoped, not in this branch)

#	Topic	Why later
22	Unknown-channel wiring approval flow (card to owner when bot receives inbound in an unwired messaging group)	Gap surfaced after item 5 landed — item 5's `request_approval` covers unknown senders but presupposes a wired channel. See item 22 for the full design.

HIGH

1. Trigger-rule matching in `pickAgent`

Finding: src/router.ts:246 TODO. Confirmed trigger filtering is non-functional end-to-end: trigger_rules JSON is parsed into ConversationConfig and passed to adapters, but the Chat SDK bridge never reads it, and router's pickAgent picks by priority only. response_scope on messaging_group_agents is stored but never enforced. Chat SDK bridge hard-subscribes on every mention (bridge:173) and every DM (bridge:189).

Status: decided — design locked; implementation pending

Decision: replace trigger_rules JSON + response_scope with four explicit orthogonal columns on messaging_group_agents. Fan out inbound messages to all matching agents (N containers for N agents). Adapter-level gating in the bridge. sender_scope enforcement moves to the permissions module.

Schema (messaging_group_agents):

engage_mode            TEXT NOT NULL DEFAULT 'mention'
                       -- 'pattern' | 'mention' | 'mention-sticky'
engage_pattern         TEXT            -- required when mode='pattern'; '.' = always
sender_scope           TEXT NOT NULL DEFAULT 'all'     -- 'all' | 'known'
ignored_message_policy TEXT NOT NULL DEFAULT 'drop'    -- 'drop' | 'accumulate'

Drop trigger_rules + response_scope. No per-wiring accumulate cap — storage is unbounded.

Global wake cap (not a column): reuse MAX_MESSAGES_PER_PROMPT in src/config.ts (already defined, default 10, currently dead code from v1). Pass to container via NANOCLAW_MAX_MESSAGES_PER_PROMPT. Container applies ORDER BY seq DESC LIMIT $N when pulling pending messages on wake.

Session DB (messages_in):

trigger INTEGER NOT NULL DEFAULT 1   -- 0 = context-only, 1 = wake agent

Host's countDueMessages / wake logic gates on trigger=1. Container reads all messages for context regardless.

Decisions locked:

always collapses into pattern with engage_pattern='.' (three modes total)
mention and mention-sticky are separate modes (stickiness is user-visible)
pattern is a JS regex string — applied as new RegExp(pattern).test(text)
Accumulate cap = last N messages, default 10
Fan-out: each matching agent gets its own session + container
Per-channel defaults live in the setup/register flow, not in the schema:
- DM → pattern with .
- Threaded group → mention-sticky
- Non-threaded group → mention

Routing flow (future):

Inbound → resolve messaging_group → group-level unknown_sender_policy gate
pickAgents() returns all wired agents (not just priority 0)
For each agent: a. sender_scope check (permissions module) b. engage_mode check (regex / mention / mention-sticky) c. Matched → write with trigger=1, wake container d. Not matched + accumulate → write with trigger=0, don't wake (no cap — stored forever) e. Not matched + drop → skip

On wake, container pulls pending messages with ORDER BY seq DESC LIMIT MAX_MESSAGES_PER_PROMPT so only the most recent N reach the prompt regardless of accumulation depth.

Adapter bridge:

Read conversations.get(channelId) before setupConfig.onInbound(...)
For pattern mode: test regex
For mention / mention-sticky: require bot to be mentioned
Only thread.subscribe() when mode is mention-sticky (today it subscribes unconditionally)

LOC estimate: ~315 (~255 prod + ~60 test)

schema migration + backfill: 40
session DB trigger column: 25
types + adapter contract: 20
DB helpers (CRUD): 20
host→adapter plumbing (including NANOCLAW_MAX_MESSAGES_PER_PROMPT env): 10
router fan-out + gating: 70
sender-scope in permissions module: 15
Chat SDK bridge gating + subscribe control: 40
container-side LIMIT N on pending-message pull: 5
smart defaults in setup/register flow: 15
tests: 60

(Note: earlier plan's "accumulate prune-to-N in router" is dropped — host doesn't prune. Cap is container-side only.)

Core vs module split:

Core (src/): schema, engage_mode enforcement, pickAgents fan-out, bridge gating, trigger column, accumulate/drop
Permissions module: sender_scope enforcement (extends existing access gate). Default sender_scope='all' → no-op when permissions module absent

Next step: new action item for implementation — see item 1a.

1a. Implementation plan for engage/sender/ignored columns

Status: pending — ready to implement Order: (a) migration + backfill, (b) types + DB helpers, (c) router fan-out + gating, (d) bridge gating, (e) permissions sender_scope, (f) setup-flow defaults, (g) tests Next step: draft the migration + write up the PR plan when ready

2. `nonMainReadOnly` mount isolation

Finding: mount-security.ts moved to src/modules/mount-security/index.ts during the refactor. validateMount(mount) no longer takes an isMain param; MountAllowlist has no nonMainReadOnly field. Regression is real. But v1's "main vs non-main" concept doesn't map cleanly to v2 — agent_groups has no is_main flag.

Status: deferred

Decision: do not restore the v1 flag. Trust admin-declared readonly values in container.json. The allowlist's per-root allowReadWrite + path gating is sufficient for the current threat model (personal-assistant use, single admin). If multi-tenant / untrusted auxiliary groups become a real use case, prefer framing B (add agent_groups.mount_access: 'rw' | 'ro' column) over resurrecting isMain.

Rationale: v2 deliberately dropped the "main" concept. Reintroducing isMain to restore a defense-in-depth check that was designed for a different entity model is the wrong trade. Admin already has to opt-in twice (allowlist allowReadWrite: true + container.json readonly: false) to get RW — that's two deliberate keys. The v1 flag was a triple-check for a rare class of admin mistakes in a shared-infra setup.

Follow-up (required): when building the skill / guide / setup flow that lets admins declare additional mounts (e.g. self-customize, manage-mounts, or a new /add-mount skill), the flow must clearly surface the RO vs RW distinction to the admin — explicit choice, explicit warning when RW is selected, and default to RO. This replaces v1's automatic enforcement with informed consent.

Next step: when the mount-declaration skill/flow is next touched, add explicit RO/RW prompting. Track as a sub-item if a skill exists yet.

3. Explicit pending-message recovery on startup

Finding: v1 had a named recoverPendingMessages() function at startup. v2 relies on the host sweep. Verified: the recovery path exists and is correct — just renamed/relocated.

Status: decided — working as designed, no code change

Current mechanism (verified against tree):

cleanupOrphans() at startup kills any leftover container from the previous run (src/index.ts:69)
startHostSweep() runs its first sweep immediately — no 60s delay (src/host-sweep.ts:38)
Sweep per session: syncProcessingAcks → countDueMessages → wakeContainer if work pending and no container → detectStaleContainers resets stuck processing rows with backoff

Scenarios covered:

Host crashed while container idle with pending messages → orphan cleanup + first sweep re-wakes
Host crashed mid-processing → stale detection resets processing → pending, next sweep wakes
Container crashed with host alive → heartbeat mtime catches it inside 10 min STALE_THRESHOLD_MS

Rationale: the function got renamed (recovery → sweep) but the behavior is equivalent or better. Sweep is continuous; recovery used to be one-shot.

Next step: see item 3a.

22. Unknown-channel wiring approval flow

Finding (post-item-5 discussion): item 5's request_approval only fires when a messaging group already has agents wired. Three scenarios slip through to the earlier no_agent_wired structural-drop branch in src/router.ts and get silent-dropped with no signal to the owner:

A new user DMs the agent directly (the DM's messaging group auto-creates but has no wiring)
The agent is @mentioned in a group the admin hasn't registered
The agent is added to a new group and someone there addresses it

In all three, the user sees no response and the owner has no signal anything happened.

Status: decided — companion PR to item 5, scoped separately

Decision: when the router hits no_agent_wired for a non-public event, instead of silent-dropping, pick the owner and DM them a wiring card. Two flavors depending on who triggered it:

Sender IS an owner/admin (the common "I just added the bot" case) → auto-wire IF exactly one agent group exists. Silent seamless flow. If multiple agent groups exist, fall through to the card so the owner picks.
Sender is anyone else (stranger, or owner in a multi-agent install) → deliver a card:
- Title: 🔌 New channel — wire it?
- Body: <senderName> is trying to reach you in <channelName> on <platform>. Wire to which agent?
- Options: one button per existing agent_groups row, plus ➕ Create new and Ignore

On approve (existing agent group):

createMessagingGroupAgent(...) with channel-kind defaults — DM→pattern + '.', threaded group→mention-sticky, non-threaded group→mention (same defaults as scripts/init-first-agent.ts)
Replay the stored event via routeInbound (sender-approval pattern)
Delete pending row

On approve "Create new": [OPEN SCOPE] — needs name/folder input. Options:

Follow-up ask_question card asking for a name → auto-derive folder from slug → create group + wire
Or: skill-backed flow — the button dispatches to /init-agent or similar and the card just links out
Punt until implementation; mention in the PR brief that we'll decide when building

On ignore: delete pending row; future attempts re-prompt fresh (consistent with sender-approval deny; no denial persistence).

Failure cases (drop silently with log, don't leave a pending row):

No owner configured (fresh install) — same behaviour as sender-approval
No reachable DM for any owner/admin
Delivery adapter missing

New table:

pending_channel_approvals (
  id                 TEXT PRIMARY KEY,
  messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id),
  sender_identity    TEXT,                 -- NULL when triggered by a non-identifiable event
  sender_name        TEXT,
  original_message   TEXT NOT NULL,        -- JSON InboundEvent for replay
  approver_user_id   TEXT NOT NULL,
  created_at         TEXT NOT NULL,
  UNIQUE(messaging_group_id)               -- one pending wiring per channel
)

Dedup is narrower than sender-approval's (mg_id, sender_id) — one pending wiring per channel, period. A second stranger writing into the same unwired channel piggybacks on the existing card instead of spawning a new one. Latest event replaces the stored original_message (we only replay one anyway, and latest is most useful).

Card action id prefix: nca-<approvalId>:<value> where value is agent-group-<id> / create / ignore. Response handler lives in src/modules/permissions/ alongside handleSenderApprovalResponse.

Owner-sender auto-wire logic:

if sender is owner/admin AND getAllAgentGroups().length === 1:
    auto-wire to that group, replay event, done — no card
else:
    deliver card

Don't auto-create a new agent group silently — always require a prompt for that.

LOC estimate: ~145

Migration + CRUD: 45
Router hook before no_agent_wired drop → try channel approval: 15
Owner-sender auto-wire fast path: 20
Card delivery (scope pickApprover(null); build buttons from getAllAgentGroups()): 25
Response handler: 25
Tests: 15

Open scopes (flag at PR time):

"Create new" sub-flow — pick between follow-up card vs skill link
Do we also react to bot-added-to-group platform events? Simpler to stay lazy (first-message-triggered only). Platform lifecycle events are inconsistent across Discord/Slack/Telegram anyway.
Worth scanning the channels branch for any existing channel-lifecycle handlers that might conflict.

Next step: open a follow-up PR off this branch once #1869 lands.

3a. End-to-end recovery test

Finding: no test confirms the host-crash-restart scenario produces timely re-delivery.

Status: pending — nice-to-have

Decision: add an integration test: (1) write a pending message to inbound.db, (2) kill the host simulating crash, (3) start host, (4) assert container is woken and message processed within a bounded time (≤5s? ≤ one sweep interval).

Rationale: the sweep logic is correct as written, but a regression here would be silent (messages just sit). Worth a safety net.

Next step: draft test when touching host-sweep.ts or index.ts startup flow next.

MEDIUM

4. `response_scope` enforcement

Finding: messaging_group_agents.response_scope stores 'all' | 'triggered' | 'allowlisted' but nothing reads it.

Status: decided — folded into item 1

Decision: delete the response_scope column as part of the item-1 migration. Values backfill into the new explicit columns:

Old `response_scope`	New columns
`all`	`engage_mode='pattern'`, `engage_pattern='.'`, `sender_scope='all'`
`triggered`	`engage_mode='mention'` (or `'pattern'` if legacy row has a pattern), `sender_scope='all'`
`allowlisted`	`engage_mode` derived from `trigger_rules`, `sender_scope='known'`

Rationale: response_scope conflated two orthogonal axes (engage + sender). Splitting them is the whole point of item 1.

Next step: ensure the item-1 migration includes the response_scope backfill in its UP step.

5. `request_approval` flow for unknown senders

Finding: unknown_sender_policy='request_approval' is scaffolded in src/modules/permissions/index.ts:100-108 but falls through to log-and-drop (explicit TODO comment). Current default is 'strict', which silently drops — user has no signal that their agent isn't responding.

Status: decided — implement, keep simple

Decision: implement full approval flow and flip the schema default from 'strict' to 'request_approval'. UX rationale: users wire their DM during setup; silent drops create a mystery when the agent doesn't respond. Public is unsafe. Approval default → admin sees a card and explicitly decides.

Flow:

Unknown sender writes to wired messaging group with policy 'request_approval'
If pending approval for (messaging_group, sender) already exists → drop this message silently (in-flight dedup; not persistence)
Otherwise: insert into pending_sender_approvals with original message + timestamp
pickApprover(agent_group_id) + pickApprovalDelivery(approverUserId) — existing machinery in src/access.ts
Deliver a card via adapter's deliver() with Card/Actions/Button primitives (already in chat-sdk-bridge)
Card action id prefix nsa:<approval_id>:<allow|deny> (parallels existing ncq: prefix for ask_user_question)
On allow: upsert users row, insert into agent_group_members, deliver stored message through normal routing (original timestamp preserved), cleanup pending row
On deny: cleanup pending row, drop the message. No denial persistence — next attempt from same sender triggers a fresh card.

No denial persistence explicit rationale: personal-assistant scale, admin can switch policy to 'strict' per messaging group if a hostile sender starts spamming. Avoids a new table column and a TTL config.

New table:

pending_sender_approvals (
  id                TEXT PRIMARY KEY,
  messaging_group_id TEXT NOT NULL,
  agent_group_id    TEXT NOT NULL,
  sender_identity   TEXT NOT NULL,  -- channel_type:handle
  sender_name       TEXT,
  original_message  TEXT NOT NULL,  -- JSON of the InboundEvent
  approver_user_id  TEXT NOT NULL,
  created_at        TEXT NOT NULL,
  UNIQUE(messaging_group_id, sender_identity)  -- enforces in-flight dedup
)

Dedicated (not reusing pending_approvals which is OneCLI-specific).

Reuse:

pickApprover / pickApprovalDelivery in src/access.ts
Card rendering primitives already in src/channels/chat-sdk-bridge.ts
onAction dispatch — add the nsa: prefix handler alongside existing ncq:

LOC estimate: ~175

Migration + CRUD for pending_sender_approvals: 55
handleUnknownSender request_approval branch + in-flight dedup: 25
Host-side card dispatcher (pick approver + deliver card): 25
onAction handler for nsa: prefix (allow/deny): 30
Schema default flip + router auto-create update: 5
Tests: 35

Module location: all in src/modules/permissions/. Module stays optional; default-allow fallback behavior when not loaded is preserved.

Open gotchas noted:

The router's auto-create at router.ts:123 currently hardcodes 'strict' — change to omit the field so schema default applies
pickApprover may return null if no admin/owner exists (e.g. fresh install before first user registered). In that case: log + drop silently, treat as effectively 'strict' for safety. Don't block message forever.

Scope boundary (important): this item covers unknown sender in a wired channel. The parallel case — unknown channel (new DM / unwired group / bot-added-to-group) — short-circuits at the no_agent_wired structural drop before this flow ever runs. Tracked as item 22.

Next step: implement alongside item 1 or as a follow-up. Same migration window is fine (one migration for engage columns + request_approval default change + new table).

6. Per-group container timeout

Finding: v1's containerConfig.timeout override is gone. All groups share IDLE_TIMEOUT. Original framing (slow-but-healthy agents getting killed) was wrong — v1's timeout was a hard wall-clock kill on the whole spawn, totally different from v2's IDLE_TIMEOUT (keep-alive after last activity). V2's behavior is strictly better for long-running agents.

Status: dropped — not a regression

Decision: don't restore per-group timeout override. IDLE_TIMEOUT=30min global is the right model. If per-group idle tuning ever becomes useful it's ~15 LOC (new column, env injection at spawn) — small feature add, not a regression to repair.

Rationale: v2 lets long-running agents finish; v1 would have hard-killed them at 30min. Current behavior is an improvement.

Next step: see 6a.

6a. Remove IDLE_END_MS (container-side query idle termination)

Finding: container/agent-runner/src/poll-loop.ts:11 defines IDLE_END_MS = 20_000. Inside processQuery, a concurrent interval ends the active Agent SDK query() stream after 20s of SDK silence. Any SDK event (tool use, tool result, streamed text, new pushed message) resets the timer.

This is a general "SDK silence detector," not specifically post-result. It fires any time:

Mid-work: slow tool call with no intermediate SDK events (npm install, pytest, long WebFetch, etc.)
Post-result: agent finished, stream waiting for potential follow-up
Any other pause in the SDK stream

Status: decided — remove, pending SDK verification

Decision: delete IDLE_END_MS and its setInterval check. Let the query() stream stay open as long as the container is active. Container's 30-min IDLE_TIMEOUT (host-side in container-runner.ts) is the single source of truth for "when to let go."

Rationale:

When new messages arrive mid-stream, they push in via query.push() with no reconnect — stream-open is cheaper per-message than close-and-reopen
Closing early forces a reconnect + cold prompt cache for the next message
Container stays alive anyway; ending the stream doesn't save resources at the container level
CLAUDE_CODE_AUTO_COMPACT_WINDOW=165000 already handles context window growth within a long-lived query
Anthropic API's own stream timeout will fire if needed; SDK should handle it transparently
Avoids the false-positive kill during legitimate slow tool calls (common case: agent running npm install gets cut off at 20s)

Caveat (must verify before removal): confirm Claude Agent SDK doesn't require explicit query.end() for prompt-cache commit or session-state persistence. Expected to be fine (SDK checkpoints per turn) but double-check docs / run a quick test where container idles with stream open, then processes a follow-up.

LOC estimate: ~−15 (net deletion — remove constant, setInterval idle check, the done flag plumbing may also simplify)

Next step: when implementing item 1's changes (or standalone), verify SDK behavior with stream-open-indefinite, then delete IDLE_END_MS block. Watch for any test assertions on it.

7. Container streaming output (marker-based pre-delivery)

Finding: v1's ---NANOCLAW_OUTPUT_START/END--- markers enabled pre-completion delivery. v2's two paths (final-result dispatchResultText + mid-turn send_message MCP tool) both write to outbound.db; host polls every ACTIVE_POLL_MS = 1000ms.

Status: dropped — not a regression

Decision: v2's send_message MCP tool is the correct replacement for v1's marker-based streaming. Latency is ≤1s (poll interval), which is fine for chat UX.

Rationale: v1's marker model required the agent and host to share a fragile state machine over stdout. v2 uses explicit tool calls and a DB surface — cleaner architecture, comparable latency, and control stays with the agent. If perceived latency ever becomes a real complaint, tune ACTIVE_POLL_MS down (500ms / 250ms) — low-cost knob.

Next step: none.

8. Per-exit container log files

Finding: v1 wrote timestamped per-exit logs with full I/O + mounts + stderr. v2: stderr → log.debug (invisible at default LOG_LEVEL=info), container close → log.info with exit code, session DBs preserved on disk. Real gap: stderr on abnormal exit isn't auto-surfaced.

Status: dropped

Decision: skip — no per-exit file restoration, no stderr-on-crash buffer.

Rationale: underlying forensic info is still recoverable (session DBs on disk, heartbeat mtime, exit code in log). LOG_LEVEL=debug surfaces stderr when needed. The cost of adding buffered crash-log promotion (~15 LOC) isn't justified by the frequency of post-mortem cases.

Next step: none.

9. Stuck detection + heartbeat-based container lifecycle

Finding: v2's sweep detects stale heartbeats (10 min) and resets messages with backoff, but doesn't kill the container. Idle timeout is delivery-count-based (30 min since last messages_out). Together these produce a gap where a stuck container holds resources + blocks new wakes for up to 30 min.

Empirical findings from SDK probe (container/agent-runner/scripts/sdk-signal-probe.ts, runs logged in /tmp/probe-*.jsonl):

Silent Bash tools (e.g. sleep 30) produce 30+ seconds of zero SDK events — heartbeat goes stale during legitimate work
Natural intra-stream silences up to ~12s observed mid-tool-use JSON streaming
PreToolUse / PostToolUse hook pair is reliable; PostToolUseFailure fires on blocked requests
SubagentStart/SubagentStop and system/task_started/system/task_notification pairs also reliable
Pushing a new message mid-active-turn does NOT fire UserPromptSubmit (fires only at start of a new turn, after result)
SDK's built-in AskUserQuestion doesn't actually block; returns placeholder
Bash tool's declared timeout param is visible in tool_use input — we can read it container-side
Stuck tools (hook that never resolves) produce indefinite silence — no SDK-side timeout

Status: decided

Decision: replace existing IDLE_TIMEOUT setTimeout + STALE_THRESHOLD=10min combo with message-scoped stuck detection + absolute 30-min ceiling. Reset messages inline when we kill. Blocklist SDK tools that don't fit our async model.

Sweep logic (per active session):

If container isn't running → reset any 'processing' rows in processing_ack to 'pending' + tries++ + backoff. Done.

If container IS running, apply in order:

Absolute ceiling: if heartbeat_mtime older than max(30 min, current_bash_timeout) → kill + reset any processing to pending + retry++. Rationale: 30 min idle ceiling, extended only if agent is currently inside a Bash tool with longer declared timeout. Agents needing >30 min should use run_in_background.
Message-scoped stuck: for each processing_ack row with status='processing':
- claim_age = now - status_changed
- tolerance = max(60s, current_bash_timeout) if Bash in flight, else 60s
- If claim_age > tolerance AND heartbeat_mtime <= status_changed → kill + reset this message + retry++
Semantics: "container claimed a message and went silent for >tolerance since claim."

No separate idle rule — rule 1 covers it. An idle container hits 30-min stale with no processing rows; kill has nothing to reset.

Container state surface (for Bash timeout tracking): New table in outbound.db (or session_state row):

container_state (
  session_id          TEXT PRIMARY KEY,
  current_tool        TEXT,      -- null when no tool in flight
  tool_declared_timeout_ms INTEGER,
  tool_started_at     TEXT
)

Container writes on PreToolUse (reads Bash timeout from tool input), clears on PostToolUse / PostToolUseFailure. Host reads in sweep decision.

Tool blocklist (initial):

AskUserQuestion — SDK built-in; we have our own DB-backed MCP version
EnterPlanMode / ExitPlanMode — Claude Code UI only
EnterWorktree / ExitWorktree — Claude Code UI only

Enforcement:

Pass disallowedTools: [...] to query() options — agent never sees them in its tool list
PreToolUse hook guard (defense-in-depth): if a blocklisted tool name somehow fires, immediately reset the current message + kill (treat as stuck)

Kill old machinery:

Remove setTimeout + resetIdle plumbing in container-runner.ts:128-140
Remove resetContainerIdleTimer export + its caller in delivery.ts:26
Remove IDLE_END_MS = 20_000 in poll-loop.ts:11 (item 6a decision) — stream stays open as long as container alive
Existing detectStaleContainers logic merges into the new sweep rules; the heartbeat-stale-10-min path disappears

LOC estimate: ~115

New sweep decision logic replacing existing detectStaleContainers + IDLE_TIMEOUT path: 50
Container state table + PreToolUse/PostToolUse write, host read: 25
Tool blocklist (disallowedTools + hook guard): 15
Deletions (IDLE_TIMEOUT setTimeout, IDLE_END_MS): −25
Tests (kill paths, Bash-timeout grace, blocklist hit): 50

Why this converged here (rationale summary):

Empirical data showed we can't reliably tell stuck from legitimate-silent-work without state. Bash-declared-timeout is the cleanest per-tool signal available.
60s-since-claim is tight enough for normal work (WebSearch/WebFetch finish in ~8s) but generous enough for reasonable delays. Exception for Bash covers agents running scripts with user-declared timeouts.
30-min absolute ceiling prevents infinitely-stuck containers; agents needing longer have run_in_background.
Pushing messages can't serve as a liveness probe (they're silent mid-turn), so detection is state-driven, not push-driven.
Blocklist prevents a whole class of "SDK tool designed for interactive UI" footguns that would appear stuck in our async model.

Next step: implement as a focused PR. Order: (a) tool blocklist — safe to ship alone, (b) container state table + PreToolUse writes, (c) sweep rewrite + message reset, (d) delete old IDLE_TIMEOUT + IDLE_END_MS machinery, (e) tests.

10. Host-level retry with backoff on agent error

Finding: v1 had MAX_RETRIES=5 + exp. backoff on processGroupMessages failure. v2's equivalent is now covered by item 9's sweep logic — any time the container isn't running with 'processing' rows present, they get reset to pending with backoff + retry++.

Status: folded into item 9

Decision: no separate action. Agent-error retry happens via container-exit → sweep reset. Container errors also surface via provider-side session invalidation check (poll-loop.ts:200-211 — provider.isSessionInvalid(err) → clears stored session id → fresh retry). Both paths preserved.

Next step: none.

11. Process ID in logger output

Finding: v1 emitted (${process.pid}) after the level tag. v2 dropped it.

Status: dropped

Decision: don't restore. Host is single-process (PID is constant). Container stderr already gets tagged with { container: agentGroup.folder } at container-runner.ts:121, which is more informative than a PID.

Next step: none.

LOW

11. Process ID in logger output

Finding: v1 emitted (${process.pid}) after the level tag. v2 dropped it. Status: pending Decision: Rationale: Next step:

12. Task dedup via unique `(kind, series_id)` index

Finding: verified — messages_in.series_id column exists with a non-unique index. Concern was theoretical: two pending rows with same series could coexist.

Status: dropped

Decision: not a real issue. Recurrence logic at src/modules/scheduling/recurrence.ts is structurally dedup-safe: only completed rows with recurrence get cloned, and after cloning recurrence is cleared on the original so it can't re-clone. Plus container's atomic markProcessing prevents double-execution at claim time.

Next step: none.

13. Silent-drop mode for noisy senders

Finding: v1's mode:'drop' let you ignore specific users without logging. v2 only has binary allow/deny via access gate.

Status: dropped — won't implement

Decision: not worth the table + gate complexity for a personal-assistant scale. If a specific sender becomes a problem, admin can switch the messaging_group's unknown_sender_policy to 'strict' or remove the sender from agent_group_members.

Next step: none.

14. Remote control subsystem

Finding: v1's /remote-control command spawned claude remote-control CLI detached, polled stdout for session URL, persisted PID/URL state. Entirely gone in v2.

Status: deferred — opt-in skill when needed

Decision: reintroduce as an opt-in install skill (e.g. /add-remote-control), not on trunk. Provider-specific: only works with claude provider (Claude Agent SDK); not supported by OpenCode or other providers. Skill should check agent_group.provider at install time and bail gracefully with an error message if not 'claude'.

Rationale: niche feature valuable only for direct agent SDK attachment during dev/debugging. Keeping it off trunk matches v2's "infra-only trunk, features-via-skills" philosophy. Also avoids carrying code for a feature that simply doesn't exist in non-Claude providers.

Next step: none until someone needs it. When implementing, likely lives on the providers branch (since it's provider-specific) or its own branch, installed via skill that copies files + checks provider.

15. Dead config constants

Finding: verified — POLL_INTERVAL (line 13), SCHEDULER_POLL_INTERVAL (line 14), and IPC_POLL_INTERVAL (line 32) in src/config.ts have zero imports elsewhere in v2. Container's POLL_INTERVAL_MS in poll-loop.ts is a distinct local constant, unrelated.

Status: decided — delete

Decision: remove the three constants from src/config.ts. Trivial 3-line deletion.

Next step: do as part of any sweep-touching PR, or standalone.

16. Configurable retention thresholds

Finding: STALE_THRESHOLD_MS (10 min) and MAX_TRIES (5) in host-sweep.ts are hardcoded. Item 9's redesign replaces STALE_THRESHOLD_MS with new constants (60s claim-age, 30 min ceiling).

Status: dropped — keep as constants

Decision: leave the new item-9 thresholds + MAX_TRIES as source constants. Adding config surface for them isn't worth it at personal-assistant scale. If operational tuning ever becomes a real need, revisit — they're small centralized constants, one-line change each.

Next step: none.

17. Dynamic group-add (IPC watcher equivalent)

Finding: not actually a restart requirement — investigation showed:

Router reads messaging_groups + messaging_group_agents fresh per inbound (dynamic by design)
Chat SDK bridge has a conversations: Map<platformId, ConversationConfig> populated at setup + updateConversations() method
Nothing in the bridge currently reads the map, and no code calls updateConversations() after startup
Today: stale map has no observable effect (dead state)
After item 1 ships (adapter-level gating): stale map would matter; new wirings wouldn't apply in the adapter gate until restart

Status: deferred — comment added now, implement alongside dynamic group registration feature

Decision: don't refactor the adapter interface now. Added a NOTE comment at src/channels/chat-sdk-bridge.ts:73 flagging the staleness issue so the next person touching the bridge or adding dynamic-registration sees it. When dynamic group registration is implemented (admin adds a new messaging_group_agents row while host is running), handle cache refresh then — most likely by calling adapter.updateConversations(freshConfigs) after the mutation, keyed off the adapter's channelType.

Rationale: item 1's initial landing can keep the adapter gating responsibilities small or skip adapter-side gating entirely. Refactoring ConversationConfig now would add scope; better to ship item 1 first, see if over-subscription bites, address if it does.

Next step: when building the admin-skill path for adding messaging_group ↔ agent_group wirings, include a refreshAdapterConversations(channelType) call after the INSERT. ~10 LOC when needed.

Test regressions (v1 `formatting.test.ts` assertions)

18+19+20+21. Timezone + formatting recreation (merged)

Finding: v1 had a full timezone-aware formatting pipeline. v2 lost most of it, producing real bugs where the agent misinterprets user intent (scheduling for wrong times, suggesting time-inappropriate things).

Scope — recreate v1 behavior faithfully wherever times touch the agent:

Timestamp formatting on inbound messages: formatLocalTime(utcIso, TIMEZONE) producing "Jan 1, 2024, 1:30 PM" format via Intl.DateTimeFormat('en-US', {...}) (v1 timezone.ts)
<context timezone="<IANA_NAME>" /> header prepended to message block (v1 router.ts:20-22)
Reply-to with message ID: <message ... reply_to="<id>">...<quoted_message from="...">...</quoted_message></message> (v1 router.ts:10-18)
stripInternalTags(): regex /<internal>[\s\S]*?<\/internal>/g applied to outbound text, then .trim() (v1 router.ts:25-27)
Cron expressions parsed with explicit user TZ: CronExpressionParser.parse(expr, { tz: TIMEZONE }) (v1 task-scheduler.ts:20-49)
User-specified times normalized via the user's TZ: in v1 this was the host-side task scheduler; in v2 it's the new-in-v2 scheduling MCP tool (mcp-tools/scheduling.ts). Same principle — accept user-local times, normalize to UTC for storage, interpret cron in user's TZ.

Status: decided — recreate with tests

Decision: port v1's formatter + timezone behavior faithfully. Full recreation spec at timezone-formatting-v1-recreation.md — includes exact v1 code, line numbers at commit 27c5220, complete test inventory from src/v1/formatting.test.ts and src/v1/task-scheduler.test.ts.

Core principle (per Gavriel): the agent operates in the user's timezone. Every timestamp the agent sees is user-local. Every time the agent outputs is interpreted as user-local. This is load-bearing for correctness, not a nice-to-have.

Porting plan (from recreation spec):

container/agent-runner/src/formatter.ts — replace formatTime with formatLocalTime(ts, TIMEZONE) call; add reply_to attribute + <quoted_message> element exactly as v1
Prepend <context timezone="<IANA>" />\n to the messages block at formatter entry
Extract stripInternalTags as a named function; apply in outbound dispatch path (poll-loop.ts:389 currently uses inline regex)
container/agent-runner/src/mcp-tools/scheduling.ts — clarify processAfter description, normalize to UTC ISO in handler
src/modules/scheduling/recurrence.ts — pass { tz: TIMEZONE } to CronExpressionParser.parse() explicitly
Port all test cases from v1's formatting.test.ts and task-scheduler.test.ts to v2's test tree

LOC estimate: ~75 prod + ~120 tests (reproducing v1's 40+ test cases)

Next step: implement as a focused PR. Order: (a) formatter changes + tests, (b) context header + tests, (c) reply_to + tests, (d) stripInternalTags extraction + tests, (e) scheduling tool + cron TZ + tests.

19, 20, 21 — merged into 18 above

See item 18 for the full recreation plan and spec reference.

Notes

src/v1/ was deleted upstream (commit 86becf8) after this analysis was written. v2 tree has since had a major module extraction (approvals, interactive, scheduling, permissions, agent-to-agent, self-mod) and a new CLI channel. Verify each item against the current tree before deciding — some may already be addressed.

41 KiB Raw Blame History Unescape Escape

v1 → v2 Action Items

Rollup

To implement (~800 LOC total, roughly)

Deferred (wait for trigger)

Dropped (won't implement / not-a-regression)

Extras recorded during discussion

Follow-up PRs (scoped, not in this branch)

HIGH

1. Trigger-rule matching in pickAgent

1a. Implementation plan for engage/sender/ignored columns

2. nonMainReadOnly mount isolation

3. Explicit pending-message recovery on startup

22. Unknown-channel wiring approval flow

3a. End-to-end recovery test

MEDIUM

4. response_scope enforcement

5. request_approval flow for unknown senders

6. Per-group container timeout

6a. Remove IDLE_END_MS (container-side query idle termination)

7. Container streaming output (marker-based pre-delivery)

8. Per-exit container log files

9. Stuck detection + heartbeat-based container lifecycle

10. Host-level retry with backoff on agent error

11. Process ID in logger output

LOW

11. Process ID in logger output

12. Task dedup via unique (kind, series_id) index

13. Silent-drop mode for noisy senders

14. Remote control subsystem

15. Dead config constants

16. Configurable retention thresholds

17. Dynamic group-add (IPC watcher equivalent)

Test regressions (v1 formatting.test.ts assertions)

18+19+20+21. Timezone + formatting recreation (merged)

19, 20, 21 — merged into 18 above

Notes

41 KiB

Raw Blame History

1. Trigger-rule matching in `pickAgent`

2. `nonMainReadOnly` mount isolation

4. `response_scope` enforcement

5. `request_approval` flow for unknown senders

12. Task dedup via unique `(kind, series_id)` index

Test regressions (v1 `formatting.test.ts` assertions)