Commit Graph

336 Commits

Author SHA1 Message Date
gavrielc
c8fc1da719 refactor(claude-md): compose per-group CLAUDE.md from shared base + fragments
Replace the per-group "written once at init, owned by the group" CLAUDE.md
with a host-regenerated entry point that imports:

  - a shared base (`container/CLAUDE.md` mounted RO at `/app/CLAUDE.md`)
  - optional per-skill fragments (skills that ship `instructions.md`)
  - optional per-MCP-server fragments (inline `instructions` field in
    `container.json`)
  - per-group agent memory (`CLAUDE.local.md`, auto-loaded by Claude Code)

Principle: RW = per-group memory, RO = shared content. Source/skills/base
are shared; personality, config, working files, and Claude state stay
per-group.

Key changes:

  - New `src/claude-md-compose.ts` — per-spawn composition +
    `migrateGroupsToClaudeLocal()` one-time cutover.
  - New `container/CLAUDE.md` — shared base, seeded verbatim from the
    former `groups/global/CLAUDE.md`.
  - `src/container-runner.ts` — swap `/workspace/global` mount for RO
    `/app/CLAUDE.md`; call `composeGroupClaudeMd()` after
    `initGroupFilesystem()`.
  - `src/group-init.ts` — drop `.claude-global.md` symlink + initial
    `CLAUDE.md` write; seed `CLAUDE.local.md` from `opts.instructions`.
  - `src/index.ts` — call `migrateGroupsToClaudeLocal()` at startup.
  - `src/container-config.ts` — add optional `instructions` field to
    `McpServerConfig` (inline per-MCP guidance fragment).
  - `container/Dockerfile` — drop dead `/workspace/global` mkdir.
  - Remove obsolete `scripts/migrate-group-claude-md.ts`.

Migration (runs once at host startup, idempotent):

  - Delete `.claude-global.md` symlinks in each group.
  - Rename each `groups/<folder>/CLAUDE.md` → `CLAUDE.local.md`
    (preserves existing per-group content as memory).
  - Delete `groups/global/` directory.

Design docs: `docs/claude-md-composition.md` and `docs/shared-source.md`
(the latter is the sibling design discussion this refactor builds on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:58:43 +03:00
exe.dev user
8a12fa61ac refactor: shared source — replace per-group agent-runner copies with single RO mount
Replace the per-group agent-runner-src copy model with a single shared
read-only mount. Source and skills are now RO + shared; personality,
config, working files, and Claude state stay RW + per-group.

Key changes:
- Mount container/agent-runner/src/ RO at /app/src (all groups share one copy)
- Mount container/skills/ RO at /app/skills; per-group skill selection via
  symlinks in .claude-shared/skills/ based on container.json "skills" field
- Mount container.json as nested RO bind on top of RW group dir
- Move all NANOCLAW_* env vars to container.json (runner reads at startup)
- New runner config.ts module replaces process.env reads
- Move command gate (filtered/admin) from container to host router
- Dockerfile: remove source COPY, split CLI installs (claude-code last),
  move agent-runner deps above CLIs for better layer caching
- Add writeOutboundDirect for router denial responses
- Design doc at docs/shared-src.md

Not included (follow-up): DB migration to drop agent_provider columns,
cleanup of orphaned agent-runner-src directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 12:58:43 +03:00
gavrielc
1858ef35f0 Merge pull request #1908 from qwibitai/setup-auto
feat(setup): scripted branded setup flow (nanoclaw.sh)
2026-04-22 03:06:30 +03:00
gavrielc
416fe01855 refactor(setup): drop CLI-bonus wiring from init-first-agent
init-first-agent used to double-wire the CLI channel to every new DM
agent as a convenience for `pnpm run chat`, gated by --no-cli-bonus.
With the /new-setup-2 flow gone and a dedicated scratch CLI agent
created earlier in setup:auto, that bonus just stomps on CLI routing
the user already set up. Remove the CLI_CHANNEL/CLI_PLATFORM_ID
constants, ensureCliMessagingGroup, the --no-cli-bonus flag, and the
cli-bonus wiring block.

Pass the paired user's identity through to the welcome delivery so
the sender resolver sees the real owner (e.g. telegram:<id>) instead
of cli:local. Extend the CLI channel's admin-transport payload to
accept optional sender/senderId overrides — falls back to the old
cli/cli:local defaults when omitted, so existing callers are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:13:22 +03:00
Dave Kim
91c668e0cc fix: persist SDK session_id on init + split long messages before adapter truncation
Two related bugs that surfaced together when a Discord response exceeded
2000 chars:

1. **Session id lost on mid-turn container exit.** `runPollLoop` was
   calling `setStoredSessionId` only after `processQuery` returned. If
   the container died between the SDK's `init` event (where session_id
   arrives) and the stream completing, the id was never persisted. The
   next wake called `getStoredSessionId()` → undefined and started a
   fresh Claude session, dropping all prior context. Fix: persist
   immediately in the `init` branch inside `processQuery`. The existing
   post-query store becomes a harmless no-op.

2. **Silent truncation past adapter limits.** `chat-sdk-bridge.deliver`
   handed full text straight to `adapter.postMessage`. Discord's adapter
   hard-truncates at 2000 chars; Telegram's at 4096. Responses longer
   than that were cut off without any signal to the user or host. Fix:
   add `maxTextLength` to `ChatSdkBridgeConfig` and a `splitForLimit`
   helper that breaks on paragraph → line → hard-char boundaries, then
   posts chunks sequentially. Files ride on the first chunk; the
   returned id is the first chunk's so edits and reactions still target
   the reply head.

Channel adapter files (Discord, Telegram, …) live on the `channels`
branch — a companion PR wires `maxTextLength: 1900` for Discord and
`4000` for Telegram so the splitter actually engages in those installs.
Without wiring, behavior is unchanged.
2026-04-21 13:04:57 +00:00
gavrielc
01ffce6f74 Revert "fix(permissions): welcome new approved channels via /welcome, route to them"
This reverts commit 9776dd4f32.
2026-04-21 15:20:06 +03:00
Koshkoshinsk
9776dd4f32 fix(permissions): welcome new approved channels via /welcome, route to them
When the unknown-channel approval flow completes, seed a /welcome task
into the newly-wired session so the agent greets the new user on first
contact. The replayed /start (Telegram's default first-message) is filtered
by the agent-runner's command-command filter, so without an explicit
onboarding trigger the first interaction produced nothing.

Pin the destination by its local_name from agent_destinations to avoid the
agent picking the wrong named destination (previously it greeted the owner,
whose DM is in CLAUDE.md).

Also guard dispatchResultText against echoing trailing status lines when
the agent has already sent messages explicitly via send_message. Otherwise
a task-triggered flow that calls send_message then emits "welcome message
sent" produces a duplicate chat to the recipient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:40:12 +00:00
gavrielc
d8d61d3695 fix: Teams user-id prefix + defer cli:local owner grant
parseUserId now falls back to user.kind when the id prefix isn't a
registered adapter — Teams uses `29:` rather than `teams:`, so the
literal prefix wouldn't resolve the channel adapter for cold DMs.

init-cli-agent no longer claims the first-owner slot on `cli:local`.
The CLI identity is scratch; owner promotion belongs to
init-first-agent once the real channel user is wired.
2026-04-21 10:16:13 +03:00
gavrielc
0f6a1ba1ed style: apply prettier formatting to touched files
Pre-commit hook reflowed imports on files changed in the previous commit.
Unrelated format drift on other files intentionally left unstaged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 23:31:42 +03:00
gavrielc
6c26c0413a feat(router,cli): replyTo override + CLI admin-transport flows
- InboundEvent gains an optional replyTo; router stamps the row's address
  fields from it when set, so replies can route to a different channel than
  the one the inbound came in on.
- ChannelSetup adds onInboundEvent for admin-transport adapters that build
  the full event themselves.
- CLI wire format accepts {text, to, reply_to}. Routed messages go through
  onInboundEvent and do not evict an active chat client.
- init-first-agent hands the DM welcome to the running service via
  data/cli.sock — synchronous wake, no sweep wait. Fails loudly if the
  service is down; no silent fallback.
- Split the CLI scratch-agent bootstrap into scripts/init-cli-agent.ts;
  init-first-agent is DM-only.

Agents cannot set replyTo: it lives only on the inbound/router seam and is
consumed once when writing messages_in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 23:30:47 +03:00
gavrielc
719f97e483 feat(permissions): unknown-channel registration flow with owner approval
When the router sees a mention or DM on a messaging group that isn't wired
to any agent, it now escalates to an owner for approval instead of silently
dropping. Mirrors the existing unknown-sender approval pattern (ACTION-ITEMS
item 22).

Schema (migration 012):
- `messaging_groups.denied_at TEXT NULL` — timestamp set on deny so future
  mentions stop escalating. ALTER TABLE ADD COLUMN, FK-safe (unlike the
  rebuild that bit migration 011).
- `pending_channel_approvals` — PK on `messaging_group_id` gives free
  in-flight dedup. One card per channel, no spam on rapid retries.

Router:
- New hook `setChannelRequestGate(mg, event) => Promise<void>`, invoked
  from the no-wirings branch when the message was addressed to the bot
  (isMention=true). Hook is fire-and-forget.
- Checks `mg.denied_at` before escalating — denied channels drop silently
  and do not re-prompt.
- The two "no-wirings" branches (fresh auto-create and existing mg with
  no agents) are consolidated into one escalation path that calls the
  gate once. Without the module, behavior is log + record (no regression).

Permissions module:
- `channel-approval.ts::requestChannelApproval` — MVP picker: target
  agent is `getAllAgentGroups()[0]`, card names it explicitly ("Wire it
  to <Andy>?"). Approver via existing `pickApprover` + `pickApprovalDelivery`
  primitives.
- Response handler: same click-auth pattern as sender-approval (clicker
  must be the designated approver OR have admin privilege over the
  target agent group).
- Approve defaults per the feature spec:
    engage_mode = 'mention-sticky' for groups, 'pattern' + '.' for DMs
    sender_scope = 'known'
    ignored_message_policy = 'accumulate'
    session_mode = 'shared'
  DM vs group inferred from the original event's threadId (non-null →
  group) because the auto-created mg has a placeholder is_group=0 until
  the adapter fills it in.
- Triggering sender is auto-added to agent_group_members so sender_scope=
  'known' doesn't bounce the replayed message into a sender-approval
  cascade.
- Deny: stamps messaging_groups.denied_at, clears pending row.
- Failure modes — no owner, no agent groups, no reachable DM — log and
  drop without creating a pending row, letting a future attempt try
  again (same as sender-approval).

9 new integration tests cover every branch: mention triggers card, DM
triggers card, dedup, approve creates correct wiring + admits sender +
replays, approve-on-DM uses pattern/'.' defaults, deny sets denied_at
and future mentions drop silently, unauthorized clicker rejected,
no-owner drops, no-agent-groups drops.

168 tests pass (was 159; +9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:34:00 +03:00
gavrielc
a4061a0012 refactor(channels,router): move all policy to router; bridge is transport
Follow-up to b159722. That shrank the bridge's shouldEngage to a flood
gate + coarse sticky-subscribe signal. This completes the move —
policy lives exclusively in the router, the bridge is transport-only,
and the conversations map + ChannelSetup.conversations +
ChannelAdapter.updateConversations are all gone.

Key shifts:

1. Subscribe moves from bridge to router.

   Bridge used to call `thread.subscribe()` from its onNewMention /
   onDirectMessage handlers based on a coarse "any mention-sticky wiring
   exists on this channel" check. That forced the decision before the
   router could apply per-wiring engage logic, and it relied on the
   conversations map being current (staleness risk).

   ChannelAdapter gains `subscribe?(platformId, threadId)`. The Chat
   SDK bridge implements it via SqliteStateAdapter.subscribe(threadId)
   (idempotent — a repeat call on an already-subscribed thread is a
   no-op). The router's fan-out loop calls it once per message when
   the first mention-sticky wiring actually engages. Precise, not
   coarse.

2. Short-circuit the drop path with one combined query.

   New `getMessagingGroupWithAgentCount(channelType, platformId)` does
   the messaging_groups lookup AND counts wirings in a single SELECT,
   using the existing UNIQUE(channel_type, platform_id) index on
   messaging_groups and UNIQUE(messaging_group_id, agent_group_id) on
   messaging_group_agents for the JOIN. No new indexes needed.

   routeInbound now short-circuits:
     - No messaging_groups row AND not addressed (no mention/DM)
       → return silently. One DB read, nothing written. This is the
       Discord-bot-in-a-big-guild case; we no longer auto-create rows
       for every plain message in every channel the bot can see.
     - Messaging group exists but no wirings AND not addressed
       → return silently. One DB read.
     - Otherwise fall through to sender resolution + fan-out as before.

   Behavioral change: plain chatter on unwired channels no longer gets
   dropped_messages audit rows, which used to bloat the table. Audit
   still fires on addressed-to-bot drops where the admin cares
   ("someone @-mentioned us but nobody's wired").

3. Bridge is now purely transport.

   Deleted entirely: ConversationConfig, ChannelSetup.conversations,
   ChannelAdapter.updateConversations?, bridge's `conversations` map,
   buildConversationMap, shouldEngage, EngageSource, engageDecision,
   bridge.updateConversations method, src/index.ts
   buildConversationConfigs. Four handlers reduce to "resolve channel
   id, build InboundMessage with isMention, call onInbound". Net
   ~130 LOC deleted from the bridge.

   Collateral: the conversations-map staleness problem is gone. The
   upcoming channel-registration feature doesn't need any map-refresh
   plumbing — when an approval creates a new wiring, the next message
   hits the DB fresh and just works.

Bridge tests prune to the narrow platform-adjacent surface (openDM
delegation, subscribe presence). Host-core test that asserted the
old "auto-create on every unknown message" behavior updates to
reflect the new escalation-gated semantics: plain messages on
unknown channels don't auto-create, mentions do.

159 tests pass (was 172 — net -13, almost entirely from
bridge-engage-mode tests that covered logic now owned by the router
and exercised through host-core.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:55:49 +03:00
gavrielc
b15972284b refactor(channels): shrink bridge shouldEngage to flood gate + subscribe signal
Before this change the bridge and the router both owned engage_mode
policy. Bridge's shouldEngage had a full switch over mention /
mention-sticky / pattern + source-based rules + engage_pattern regex
test + ignored_message_policy accumulate fallback. Router's
evaluateEngage had the same switch against the same fields. Two
parallel logic paths with subtle vocabulary differences (bridge: "which
SDK handler fired"; router: "what isMention says"). Every time we
touched one we had to reason about the other — the Telegram
hasMention bug and the "pattern mode silently drops in group chats"
bug were both drift between the two.

Refactor to one place. Router keeps all per-wiring policy — engage
mode, pattern regex, sender scope, ignored-message policy — unchanged.
Bridge drops to a coarse flood gate + subscribe signal:

  - forward: does this channel have ANY wiring? Forward if yes.
    Unknown channels still forward for subscribed/mention/dm (they may
    be newly auto-created, or will trigger the coming
    channel-registration flow). Unknown channels DROP for new-message
    so we don't flood from every unsubscribed thread the bot happens
    to sit in.

  - stickySubscribe: any mention-sticky wiring on the channel AND the
    source is mention or dm. Coarse union — subscribe is idempotent
    and one call serves every sticky wiring.

The `text` param on shouldEngage is gone (pattern regex lives in the
router now). Four bridge handler sites simplify accordingly. messageToInbound
still carries the SDK-confirmed isMention flag through to the router
unchanged.

Behavioral delta: pure-mention-wired channels (no pattern, no
accumulate) will now see every plain group message reach the router
before being dropped there, where before the bridge dropped at the
transport boundary. Extra DB lookup per dropped message in this
specific case; acceptable for the cleaner seam and can be optimized
back at the bridge if it ever matters in practice.

Bridge tests prune the 10 engage_mode-specific cases that covered
logic now owned by evaluateEngage in the router (host-core.test.ts
covers it end-to-end through routeInbound). Bridge tests keep only
what's bridge-specific: the flood gate and the stickySubscribe
coarse union.

172 tests pass (was 182 — net -10 redundant bridge tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:32:08 +03:00
gavrielc
68058cbc4a fix(permissions): authorize unknown-sender approval clicks
The approval click handler trusted row.approver_user_id as the actor
regardless of who actually clicked the card. A random user who received
the forwarded card could click Approve and get the stranger admitted
to the agent group — their click was simply not checked.

Separately, payload.userId arrives as the raw platform userId from
Chat SDK onAction (e.g. "6037840640"), not the namespaced form
("telegram:6037840640") that matches users(id). Without namespacing,
users-table lookups miss.

Namespace the clicker id with payload.channelType, then authorize: the
clicker must be either the designated approver OR have
owner / admin privilege over the agent group (hasAdminPrivilege covers
owner, global admin, scoped admin). Unauthorized clicks return true
(claim the response so the registry doesn't log it as unclaimed) but
take no action — the pending row stays in place so a legitimate
approver can still act on it.

Existing tests passed a pre-namespaced userId directly, masking the
first bug. Fixed the fixtures to match production plumbing and added
two tests: one asserts a random bystander's click is rejected (row
stays pending, no member added), the other asserts a global admin can
approve even when they weren't the designated approver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:16:35 +03:00
gavrielc
f74df3b0d3 fix(router): trust SDK isMention signal; drop broken hasMention regex
The router's mention / mention-sticky engage check was regex-matching
@<agent_group.name> (e.g. @Andy) against message text. Platforms don't
work that way — users address bots via the bot's platform username
(@nanoclaw_v2_refactr_1_bot on Telegram, user-id mentions on Slack /
Discord). The regex matched only coincidentally and never on Telegram,
so mention-mode wirings silently never fired there.

Two parallel mention detectors existed: the Chat SDK's onNewMention,
which correctly resolves the bot's platform identity, and the router's
hasMention text regex, which ignored the SDK verdict and invented its
own heuristic. The router's detector was wrong in principle — the agent
group's display name is a NanoClaw-side nickname, not a platform
address.

Thread the SDK signal through: InboundMessage gains an optional
`isMention` field, the bridge sets it from each handler (onNewMention →
true, onDirectMessage → true, onSubscribedMessage → message.isMention,
onNewMessage(/./) → false), src/index.ts forwards it into InboundEvent,
and evaluateEngage now checks `isMention === true` for mention modes.
hasMention deleted entirely — there is only one source of truth for
"did the user mention this bot": the platform / SDK.

Agent-name-in-text matching for disambiguating multiple agents wired to
one chat is a separate feature; users can express it today with
engage_mode='pattern' + the agent's name as the regex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:16:20 +03:00
gavrielc
0105de0257 fix(host-sweep): skip ceiling check when heartbeat file is absent
decideStuckAction treated a missing heartbeat file as heartbeatAge =
Infinity, which always exceeded the 30-minute ceiling. Result: every
freshly-spawned container got killed within seconds of spawn on the
first sweep pass because it hadn't produced an SDK event yet (heartbeat
is only touched on SDK events inside processQuery, not on boot).

Skip the ceiling branch when heartbeatMtimeMs === 0. Containers that
genuinely never wrote a heartbeat because they died are caught by the
separate "container process not running" cleanup path. Containers that
boot, claim a message, but hang at the gate are caught by the
claim-stuck check below — which correctly fires regardless of heartbeat
presence once claimAge exceeds tolerance.

Updates the "absent heartbeat → kill-ceiling" test (which was encoding
the bug) and adds a companion that the claim-stuck path still fires for
absent-heartbeat containers with aged claims.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:15:52 +03:00
gavrielc
c38e5b11a8 fix(channels): wire accumulate mode through the bridge
The router + session DB were already fully plumbed for
ignored_message_policy='accumulate' — fan-out in routeInbound calls
deliverToAgent(wake=false) for non-engaging agents on accumulate wirings,
writeSessionMessage writes trigger=0, countDueMessages filters trigger=1,
container formatter includes all messages regardless of trigger. But the
Chat SDK bridge dropped non-engaging messages before the router ever saw
them, so accumulate was dead on arrival for every adapter that goes
through the bridge.

Expose ignored_message_policy on ConversationConfig, project it in
buildConversationConfigs, and widen shouldEngage's "forward" decision to
"engage OR accumulate" with the union taken across all wirings on a
conversation. stickySubscribe stays gated on a real engage — subscribing
a thread we'd only silently accumulate on would misrepresent the bot's
presence.

shouldEngage return shape is now { forward, stickySubscribe } — engage
was an internal concept the caller never needed, and conflating it with
forward was the source of this bug.

7 new tests cover: non-engaging messages forwarding under accumulate,
mixed drop/accumulate wirings taking the union, accumulate not
triggering sticky subscribe, unknown-conversation drop precedence over
accumulate, and drop policy preserving existing behavior on engaging
messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:18:43 +03:00
gavrielc
ce25e1e97c style(channels): prettier line-wrap in chat-sdk-bridge.test.ts
Post-commit reformat picked up by format:fix hook on the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:12:40 +03:00
gavrielc
52c6223292 fix(channels): register onNewMessage(/./) to fix pattern mode in group chats
Chat SDK dispatch (per handling-events.mdx) is exclusive and prioritized:
subscribed → onSubscribedMessage; unsubscribed + mention → onNewMention;
unsubscribed + pattern match → onNewMessage. We never registered the third,
so engage_mode='pattern' silently dropped every message in unsubscribed
group threads — the SDK simply never surfaced them anywhere.

Register chat.onNewMessage(/./, …) and route it through shouldEngage with
a new 'new-message' source. Unknown-conversation policy drops for this
source (would otherwise flood from every unwired channel the bot can see).
mention / mention-sticky wirings ignore 'new-message' — they require an
explicit @mention to start a conversation. Pattern wirings evaluate
normally.

Extracted shouldEngage from a closure to an exported function with an
EngageSource type so it's unit-testable. Added 17 tests covering every
source × engage-mode combination, unknown-conversation behavior, invalid
regex fail-open, and multi-wiring union.

Accumulate (ignored_message_policy='accumulate') is still not plumbed —
the bridge drops non-engaging messages entirely instead of forwarding
them as context-only. That requires a trigger: 0 | 1 field on
InboundMessage → router → writeSessionMessage (schema already has the
column). Separate change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:11:56 +03:00
gavrielc
57e0cda9e5 Revert "fix(channels): pre-subscribe group threads for pattern / accumulate wirings"
This reverts commit 73b20880ff.
2026-04-20 10:35:33 +03:00
gavrielc
73b20880ff fix(channels): pre-subscribe group threads for pattern / accumulate wirings
The engage modes shipped in #1869 included `pattern` (regex match any
message) and the `accumulate` ignored-message policy, but neither could
fire in group chats because Chat SDK only surfaces:

  - DMs (onDirectMessage)
  - @mentions in unsubscribed threads (onNewMention)
  - every message in subscribed threads (onSubscribedMessage)

A bot sitting in a Discord/Slack channel hears *nothing* from a plain
message unless the thread is already subscribed. So `pattern '.'` on a
group wiring → silent. `pattern /urgent/i` → silent. `mention +
accumulate` → the non-mention messages that should be stored as context
were never received, so nothing to accumulate.

Fix: call `chat.subscribe(platformId)` at setup time for every wiring
whose `engageMode === 'pattern'` or `ignoredMessagePolicy === 'accumulate'`.
Failures logged + swallowed per-conversation so one un-subscribable
channel doesn't crash startup.

## Knock-on: SDK stops firing onNewMention once subscribed

Per SDK types:1468, `onNewMention` only fires in unsubscribed threads.
Once we pre-subscribe a channel for a pattern wiring, subsequent mentions
arrive as `onSubscribedMessage` with `message.isMention === true`.

Before: a `mention` wiring coexisting with a `pattern` wiring in the
same channel would silently stop firing after pre-subscribe.

After: `shouldEngage` accepts the `isMention` flag independently from
`source`, so the `mention` mode matches on (dm OR mention-new OR
subscribed-with-isMention). Source shape changed
`'subscribed' | 'mention' | 'dm'` → `'subscribed' | 'mention-new' | 'dm'`
to make the "unsubscribed-mention event" distinction explicit.

## New fields

- `ConversationConfig.ignoredMessagePolicy` — projected from the
  messaging_group_agents row so the bridge knows which wirings need
  pre-subscription. buildConversationConfigs in src/index.ts populates
  it.

Tests: host 153/153, container 46/46. No new tests yet — the subscribe
call path needs a Chat mock, deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:34:15 +03:00
gavrielc
fca3d8de70 fix(migrations): drop 011 table-rebuild; keep only pending_sender_approvals
The original 011 also rebuilt `messaging_groups` to flip the
`unknown_sender_policy` column DEFAULT from "strict" to "request_approval".
On live DBs the DROP TABLE step fails SQLite's foreign-key integrity
check because `sessions`, `user_dms`, and `pending_sender_approvals` all
reference `messaging_groups(id)`. `PRAGMA foreign_keys=OFF` /
`defer_foreign_keys` can't be toggled inside the implicit migration
transaction, so the rebuild can't be made to apply cleanly.

The default-flip was cosmetic anyway: every `createMessagingGroup`
callsite passes `unknown_sender_policy` explicitly. Router auto-create
was already updated to hardcode "request_approval" (router.ts:151), and
setup / seed scripts pick per context.

Changes:
- Migration 011 now only creates the `pending_sender_approvals` table +
  index. The rebuild block is gone.
- Reference `SCHEMA` in src/db/schema.ts updated to reflect what the
  DB actually has: DEFAULT 'strict' (from migration 001), with a note
  about the effective policy applied at insert sites.

Discovered on v2 post-merge during live restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:08:35 +03:00
gavrielc
9882c94530 fix(channels): use Chat SDK ChatMessage.text, not .content
The engage-mode gating added in #1869 read `message.content` from the
Chat SDK's ChatMessage in all three inbound handlers (onSubscribedMessage,
onNewMention, onDirectMessage). ChatMessage exposes the user-visible
string as `.text` — `.content` exists on the underlying nested structure
but isn't the plain-text field. Result: `shouldEngage` always saw an
empty string, pattern gating never matched, non-wildcard regex wirings
silently dropped every inbound.

Fix: use `message.text` in all three gates.

Discovered during live smoke-test on v2 post-merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:08:35 +03:00
gavrielc
622a370815 feat(permissions): unknown-sender request_approval flow + flipped default policy
When an unknown sender writes into a wired messaging group, surface the
situation to an admin instead of silently dropping. Flow:

  1. Router → access gate → handleUnknownSender (policy='request_approval')
  2. Fire-and-forget requestSenderApproval: pickApprover + pickApprovalDelivery
     pick a reachable admin DM; deliver an Approve / Deny card; insert a
     pending_sender_approvals row carrying the original InboundEvent JSON.
  3. In-flight dedup: UNIQUE(messaging_group_id, sender_identity) — a retry
     from the same stranger while pending is silently dropped, not re-carded.
  4. Admin clicks → Chat SDK bridge → onAction → host response-registry.
     The new handleSenderApprovalResponse in the permissions module claims
     responses whose questionId matches a pending_sender_approvals row.
  5. approve: addMember(stranger, agent_group) + replay the stored event via
     routeInbound — the second attempt clears the gate because the user is
     now known.
  6. deny: delete the pending row. No denial persistence (ACTION-ITEMS item 5
     decision) — a future attempt triggers a fresh card.

Schema:
- Migration 011 adds pending_sender_approvals (id, mg_id, agent_group_id,
  sender_identity, sender_name, original_message JSON, approver_user_id,
  created_at, UNIQUE(mg_id, sender_identity)).
- Also flips messaging_groups.unknown_sender_policy default from 'strict'
  to 'request_approval' (rebuild-table). Existing rows unchanged — only
  the default applied to new rows flips.
- Router auto-create for unknown platform/chat drops the hardcoded
  'strict' override; schema default applies.
- src/db/schema.ts reference updated to match.

Why default-flip: users wire their DM during setup and don't discover that
'strict' means "silent drop of everyone not in user_roles/members". The
approval flow is the safe default — the admin sees the stranger, explicitly
decides. 'public' stays opt-in for truly open channels.

Failure modes (row NOT created so a future attempt can try again):
- No eligible approver configured (fresh install before first owner).
- No reachable DM for any approver.
- Delivery adapter missing.

Tests (src/modules/permissions/sender-approval.test.ts, 4 cases):
- First unknown message → card delivered + row created
- Retry while pending → dedup'd (1 card, 1 row)
- Approve → member added + message replayed + container woken
- Deny → row cleared + no member added

Closes: ACTION-ITEMS item 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:36:11 +03:00
gavrielc
16b9499532 feat(routing): engage modes + sender scope + accumulate/drop + per-agent fan-out
Replaces the opaque trigger_rules JSON + response_scope enum on
messaging_group_agents with four explicit orthogonal columns:

    engage_mode            'pattern' | 'mention' | 'mention-sticky'
    engage_pattern         regex source; required when mode='pattern';
                           '.' is the "always" sentinel
    sender_scope           'all' | 'known'
    ignored_message_policy 'drop' | 'accumulate'

Inbound routing becomes a fan-out — every wired agent is evaluated
independently. A match gets its own session + container wake. A miss
with accumulate keeps the message as context-only (trigger=0) in that
agent's session, so when the agent does eventually engage it sees the
prior chatter.

## Schema

- Migration 010 (`engage-modes`): adds the 4 new columns, backfills
  from trigger_rules.pattern + requiresTrigger + response_scope, drops
  the legacy columns.
- messages_in gains `trigger INTEGER NOT NULL DEFAULT 1` (session DB
  schema + `migrateMessagesInTable` forward-compat).
- countDueMessages gates waking on `trigger = 1`.

## Routing

- `pickAgent` (returns one) → loop over all wired agents. Per agent:
  evaluate engage_mode; run access gate + sender-scope gate; on full
  match → resolveSession + writeSessionMessage(trigger=1) + wake. On
  miss with accumulate → writeSessionMessage(trigger=0), no wake. On
  miss with drop → skip.
- New `findSessionForAgent(agentGroupId, mgId, threadId)` scopes
  session lookup by agent so fan-out doesn't cross sessions.
- `messageIdForAgent` namespaces inbound message ids by agent_group_id
  so PRIMARY KEY doesn't collide across per-agent session DBs.

## Adapter layer

- `ConversationConfig` replaces `triggerPattern` + `requiresTrigger`
  with `engageMode` + `engagePattern`.
- Chat SDK bridge stores `Map<platformId, ConversationConfig[]>` (multi-
  agent per conversation) and applies union gating pre-onInbound:
    * onSubscribedMessage: engage if any wiring keeps firing in
      subscribed state (mention-sticky or pattern)
    * onNewMention: engage on mention; only subscribes the thread if
      at least one wiring is `mention-sticky`
    * onDirectMessage: engage per mode; sticky follows same rule
- Bridge no longer unconditionally calls `thread.subscribe()`.

## Sender scope

- Permissions module registers a second hook `setSenderScopeGate` that
  runs per-wiring after the existing access gate. `sender_scope='known'`
  requires canAccessAgentGroup(); `'all'` is a no-op. Not installed →
  no-op everywhere (default allow).

## Container side

- Host passes `NANOCLAW_MAX_MESSAGES_PER_PROMPT` (reuses existing
  MAX_MESSAGES_PER_PROMPT config; was dead code from v1).
- `getPendingMessages` queries `ORDER BY seq DESC LIMIT N`, reverses to
  chronological order for the prompt — accumulated context rides along
  with trigger rows up to the cap.
- `MessageInRow` gains `trigger: number` so the container can tell them
  apart in downstream code (container still processes both; only the
  host uses `trigger=0` for don't-wake).

## Defaults (per ACTION-ITEMS item 1 decision)

- DM (is_group=0): `engage_mode='pattern'`, `engage_pattern='.'` (always)
- Threaded group: `engage_mode='mention-sticky'` (seed-discord)
- Non-threaded group / CLI: pattern '.' in bootstrap scripts

## Tests

- src/host-core.test.ts: 3 new cases — fan-out (2 agents, 2 sessions,
  2 wakes), accumulate (trigger=0 + no wake), drop (no session created).
- Existing 10 host-core tests still pass.
- Migration 010 runs on an empty DB in 0-row path — verified.

Closes: ACTION-ITEMS items 1, 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:30:04 +03:00
gavrielc
6a815190c0 feat(lifecycle): stuck detection + heartbeat lifecycle + SDK tool blocklist
Replaces the two overlapping old mechanisms (30-min setTimeout kill in
container-runner, 10-min heartbeat STALE_THRESHOLD reset in host-sweep)
with message-scoped stuck detection anchored to the processing_ack claim
age + an absolute 30-min ceiling that extends for long-declared Bash
tools.

Old model problems:
- IDLE_TIMEOUT setTimeout fired on plain wall-clock time; slow-but-alive
  agents got killed at 30min regardless of activity
- 10-min STALE_THRESHOLD in the sweep was unreliable — the heartbeat is
  only touched on SDK events, so legitimate silent tool work (sleep 30,
  long WebFetch, npm install) looked identical to a hung container
- Two overlapping sources of truth for "when to let go of a container"

New model:
- Host sweep is the single source of truth.
- Container exposes a new `container_state` single-row table in outbound.db
  (schema added; container writes, host reads). PreToolUse hook writes
  current_tool + tool_declared_timeout_ms (read from Bash's tool_input);
  PostToolUse / PostToolUseFailure clear it.
- Sweep decides with a pure helper `decideStuckAction`:
    * absolute ceiling — kill if heartbeat age > max(30min, bash_timeout)
    * per-claim stuck  — kill if any processing_ack row has claim_age >
      max(60s, bash_timeout) AND heartbeat hasn't been touched since claim
    * otherwise ok
  Kill paths reset leftover processing rows with exponential backoff,
  reusing the existing retry machinery.

Tool blocklist expanded:
- AskUserQuestion (SDK placeholder; we have mcp__nanoclaw__ask_user_question)
- EnterPlanMode, ExitPlanMode, EnterWorktree, ExitWorktree (Claude Code UI
  affordances; would hang in headless containers)
PreToolUse hook is also defense-in-depth: if a disallowed tool name slips
through, it returns `{ decision: 'block' }` so the agent sees a clear
error instead of appearing stuck.

Removed:
- container-runner.ts: IDLE_TIMEOUT setTimeout, resetIdle callback on
  activeContainers entry, resetContainerIdleTimer export.
- delivery.ts: the resetContainerIdleTimer call on successful delivery.
- poll-loop.ts: IDLE_END_MS + its setInterval. Keeping the query open is
  cheaper than close+reopen (no cold prompt cache). Liveness is now a
  host-side concern.
- host-sweep.ts: 10-min STALE_THRESHOLD_MS + getStuckProcessingIds in the
  stale-detection path (still exported for kill reset).

Tests:
- src/host-sweep.test.ts — 9 tests for decideStuckAction covering: fresh
  heartbeat, absolute ceiling, absent heartbeat, Bash-timeout extension
  (both ceiling and per-claim), claim age below tolerance, heartbeat
  touched after claim, unparseable timestamps.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md items 9, 6a, 10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:16:57 +03:00
gavrielc
dcfa12ea06 feat(timezone): recreate v1 TZ-aware formatting + scheduling behavior
The agent needs to perceive times in the user's timezone, not UTC.
Dropping this in the v1→v2 port produced a class of bugs where the agent
would schedule tasks for the wrong hour, suggest dinner at midnight, etc.
This restores v1 parity.

Container side:
- New container/agent-runner/src/timezone.ts mirrors src/timezone.ts with
  isValidTimezone / resolveTimezone / formatLocalTime, plus:
  * TIMEZONE constant resolved at load from process.env.TZ (host sets this
    from src/container-runner.ts:254)
  * parseZonedToUtc(input, tz) — treats a naive ISO as wall-clock time in
    `tz`, returns the corresponding UTC Date. Strings with Z or offset
    are passed through.

- formatter.ts:
  * formatMessages() now prepends <context timezone="IANA"/>\n — matches
    v1 src/v1/router.ts:20-22
  * formatSingleChat uses formatLocalTime(ts, TIMEZONE) instead of a
    home-rolled HH:MM 24h formatter → outputs like "Jun 15, 2026, 8:00 AM"
  * reply_to="<id>" attribute + <quoted_message from="X">Y</quoted_message>
    element — matches v1 format exactly; old <reply-to/> shape is gone
  * stripInternalTags() exported for the dispatch path to reuse

- poll-loop.ts uses the exported stripInternalTags() instead of inline regex.

- mcp-tools/scheduling.ts:
  * schedule_task/update_task descriptions now explicitly document that
    processAfter accepts either UTC or naive local time (interpreted in
    the user's TZ from the context header)
  * handlers normalize through parseZonedToUtc() and store a UTC ISO

Host side:
- src/modules/scheduling/recurrence.ts passes { tz: TIMEZONE } to
  CronExpressionParser.parse. Without this, "0 9 * * *" fires at 09:00
  UTC instead of 09:00 user-local — this was the v1 behavior
  (src/v1/task-scheduler.ts:20-49).

Tests:
- container/agent-runner/src/timezone.test.ts — mirror of src/timezone.test.ts
  + new parseZonedToUtc cases
- container/agent-runner/src/formatter.test.ts — context header, reply_to,
  quoted_message, XML escaping, stripInternalTags (ported from v1
  formatting.test.ts)
- src/modules/scheduling/recurrence.test.ts — cron TZ respected, completed
  rows only cloned when recurrence is set

Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 18 + timezone-formatting-v1-recreation.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:09:14 +03:00
gavrielc
0283391e0a chore(config): remove dead POLL_INTERVAL / SCHEDULER_POLL_INTERVAL / IPC_POLL_INTERVAL
These three constants were carried over from v1's polling + IPC architecture
and have zero callers in the v2 runtime:

- POLL_INTERVAL (2000ms) — v1 message loop; replaced by event-driven
  delivery + delivery.ts's ACTIVE_POLL_MS (hardcoded 1000ms)
- SCHEDULER_POLL_INTERVAL (60000ms) — v1 task scheduler; replaced by
  host-sweep.ts's SWEEP_INTERVAL_MS (hardcoded 60_000)
- IPC_POLL_INTERVAL (1000ms) — v1 file-based IPC; meaningless in v2's
  session-DB architecture

Grep confirms no imports in src/, container/, or tests. Docs/SPEC.md
updated to match.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:01:47 +03:00
gavrielc
47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:00:04 +03:00
gavrielc
131fc99700 feat(channels): add CLI channel — talk to your agent from the terminal
First default channel that ships with main. Unix-socket adapter + thin
client; plugs into the running daemon rather than spawning its own host.

## src/channels/cli.ts

- ChannelAdapter with channelType='cli', platformId='local'.
- setup() unlinks any stale socket, listens on $DATA_DIR/cli.sock (mode 0600
  so only the local user can connect).
- On client connect: reads newline-delimited JSON ({"text": "..."}) and
  calls config.onInbound('local', null, {id, kind:'chat', content, ts}).
- deliver() writes {"text": <body>} back to the connected socket; silently
  no-ops when no client is attached (outbound row still persists).
- Single-client policy: a second connection supersedes the first with a
  [superseded] notice.
- teardown() closes the client, closes the server, removes the socket file.

## scripts/chat.ts + pnpm run chat

One-shot client:
- pnpm run chat <message...>
- Connects to the socket, writes one JSON line with the message.
- Reads replies; exits 2s after the first reply lands (hard timeout 120s).
- ENOENT/ECONNREFUSED prints a hint to start the daemon.

## scripts/init-first-agent.ts

- Fix stale imports after earlier module extractions (permissions +
  agent-to-agent moved their DB helpers into modules/).
- After wiring the DM channel, also create cli/local messaging_group
  (unknown_sender_policy='public' — local socket perms handle auth) and
  wire it to the same agent. User can `pnpm run chat` immediately.

## package.json

- Add "chat": "tsx scripts/chat.ts" script.

## Validation

- pnpm run build clean.
- pnpm test — 137 host tests pass.
- bun test in container/agent-runner — 17 pass.
- Service boot logs: "CLI channel listening" + "Channel adapter started
  channel=cli type=cli". Clean SIGTERM shutdown; socket file removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:51:04 +03:00
gavrielc
7169c25e70 refactor: relocate outbox I/O to session-manager + dead-code sweep
## Outbox extraction (delivery.ts → session-manager.ts)

File I/O for outbound attachments now lives in session-manager.ts alongside
the symmetric inbound extractAttachmentFiles. delivery.ts no longer touches
the filesystem — it hands buffers to the adapter and calls clearOutbox on
success.

- New `readOutboxFiles(agentGroupId, sessionId, messageId, filenames)` and
  `clearOutbox(agentGroupId, sessionId, messageId)` in session-manager.ts.
- deliverMessage in delivery.ts loses ~35 lines of fs/path code and its
  `fs`/`path` imports.

## Dead-code sweep

TypeScript's --noUnusedLocals surfaced several cruft imports. Fixed:

- src/container-runner.ts: drop unused `markContainerIdle` import; drop
  unused `session` parameter from `buildContainerArgs` signature.
- src/delivery.ts: drop unused `getSession`, `writeSessionMessage`,
  `wakeContainer` imports.
- src/host-sweep.ts: drop unused `updateSession`, `outboundDbPath` imports.
- container/agent-runner/src/poll-loop.ts: drop unused `config`,
  `processingIds` params from `processQuery`.
- Test files: drop unused imports in channel-registry.test, db-v2.test,
  host-core.test.

Skipped: `conversations` state in chat-sdk-bridge.ts (never read but
tangled with public `updateConversations` method; cleaning it risks a
merge conflict with the channels branch at the next sync).

## Validation

- `pnpm run build` clean
- `pnpm test` — 137 host tests pass
- `bun test` in container/agent-runner — 17 tests pass
- Service boots (`NanoClaw running`, `OneCLI approval handler started`)
  and shuts down cleanly on SIGTERM

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:34:08 +03:00
gavrielc
95fdec335a refactor(modules): re-tier approvals as default; extract self-mod as optional
Promotes approvals to the default tier with a public API (requestApproval +
registerApprovalHandler) that other modules consume. Self-modification
(install_packages / request_rebuild / add_mcp_server) moves into a new
optional module that registers delivery actions + matching approval handlers
via the new API.

## Approvals (default tier)

- Adds `src/modules/approvals/primitive.ts` exporting `requestApproval`,
  `registerApprovalHandler`, `notifyAgent`. Absorbs `pickApprover` /
  `pickApprovalDelivery` / `channelTypeOf` from the deleted `src/access.ts`.
- Rewrites `response-handler.ts` to dispatch to registered approval handlers
  on approve (action-keyed Map). Reject path is centralized.
- Drops the three self-mod-specific delivery-action registrations from
  `approvals/index.ts`; they belong to self-mod now.
- `onecli-approvals.ts` now imports picks from the primitive instead of
  `src/access.ts`.

## Self-mod (optional tier)

- New `src/modules/self-mod/` with request handlers (validate input + call
  requestApproval) and apply handlers (orchestration on approve).
- `apply.ts` owns updateContainerConfig + buildAgentGroupImage + killContainer
  calls. Self-mod depends on approvals (via registerApprovalHandler +
  requestApproval + notifyAgent) and on core (container-runner, container-config).
- Registers 3 delivery actions + 3 approval handlers at import time.

## Other changes

- `src/access.ts` and `src/access.test.ts` deleted. Tests split across
  `src/modules/approvals/picks.test.ts` (approver selection) and
  `src/modules/permissions/permissions.test.ts` (access + roles + DM).
- `src/modules/index.ts` barrel: approvals loads before self-mod so
  registerApprovalHandler is bound when self-mod registers at import time.

## Validation

- `pnpm run build` clean
- `pnpm test` — 137 host tests pass
- `bun test` in container/agent-runner — 17 tests pass
- Service starts; boot log shows `OneCLI approval handler started`,
  `NanoClaw running`; clean SIGTERM shutdown

Resolves the transitional tier violation flagged in PR #5 where core
imported from the permissions optional module via `src/access.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:41:26 +03:00
gavrielc
46b19dcf9c refactor(modules): extract agent-to-agent as registry-based module
Last extraction of Phase 3. Moves inter-agent messaging + create_agent +
destination projection into src/modules/agent-to-agent/. Core retains:

- `channel_type === 'agent'` dispatch in delivery.ts, guarded by
  hasTable('agent_destinations') + dynamic import into module.
- Channel-permission ACL in delivery.ts, guarded by hasTable, with
  inlined SQL (no module import from core).
- writeDestinations call in container-runner.ts, guarded by hasTable +
  dynamic import into module.
- createMessagingGroupAgent's destination side effect in db/messaging-groups.ts,
  guarded by hasTable. This is a documented transitional tier violation
  (core imports from optional module), analogous to src/access.ts.

Migration `004-agent-destinations.ts` renamed to `module-agent-to-agent-
destinations.ts` preserving `name: 'agent-destinations'` so existing DBs
don't re-run it.

delivery.ts: 600 → 449 lines. handleSystemAction's last switch case gone
(just registry + default log-and-drop). notifyAgent helper removed (only
create_agent used it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:00:10 +03:00
gavrielc
32bcc2c5ae refactor(permissions): preserve pre-PR behavior in three spots
PR #5 review flagged three behavior changes that shouldn't have slipped
in. This commit reverts each to match the pre-refactor behavior exactly.

1. User upsert ordering. Split the router hook into two setters:
   setSenderResolver (runs before agent resolution) and setAccessGate
   (runs after). Restores the pre-PR sequence where the users row is
   upserted even if the message is dropped by wiring or trigger rules.

2. dropped_messages audit. Moved src/modules/permissions/db/dropped-messages.ts
   back to src/db/dropped-messages.ts. The table is core audit infra, not
   permissions-specific. Router re-writes rows for no_agent_wired and
   no_trigger_match; the access gate writes rows for policy refusals.

3. Permissionless container fallback. Dropped. poll-loop restores the
   original deny-all check when NANOCLAW_ADMIN_USER_IDS is empty.

Module contract doc updated with the two-hook shape.

Validation: host build clean, 137/137 host tests, 17/17 container
tests, typecheck clean, service boots to "NanoClaw running" with
permissions module registering both hooks and clean SIGTERM shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:00:10 +03:00
gavrielc
7cc4ecc3be refactor(modules): extract permissions as optional module
Moves user-roles / users / agent-group-members / user-dms /
dropped-messages / user-dm / canAccessAgentGroup into
src/modules/permissions/. Module registers a single inbound-gate that
owns sender resolution, access decision, unknown-sender policy, and
drop-audit recording.

Router slimmed from 357 → 179 lines; the inline fallback chain
(extractAndUpsertUser / enforceAccess / handleUnknownSender /
recordDroppedMessage) is gone — without the permissions module core
defaults to allow-all with userId=null.

container-runner's admin-ID query is now inline SQL guarded by
sqlite_master on user_roles, keeping core free of any import from the
permissions module. The container-side formatter falls back to
permissionless mode when NANOCLAW_ADMIN_USER_IDS is empty: every sender
with an identifiable senderId is treated as admin.

Module contract doc formalizes the tier model and the dependency rule
(core ← default modules ← optional modules). One transitional violation
flagged: src/access.ts (core) imports from the permissions module for
its remaining approver-picking helpers; resolves in the planned PR #7
re-tier.

Validation: host build clean, 137/137 host tests, 17/17 container
tests, typecheck clean, service boots to "NanoClaw running" with
permissions module registering its gate and clean SIGTERM shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:42:14 +03:00
gavrielc
473f766585 refactor(modules): extract scheduling as registry-based module
Moves the scheduling surface — 5 delivery actions (schedule_task,
cancel_task, pause_task, resume_task, update_task), handleRecurrence,
applyPreTaskScripts, and task DB helpers — out of core and into
src/modules/scheduling/ (host) and container/agent-runner/src/scheduling/
(container).

First PR to fill the MODULE-HOOK markers introduced in PR #2:
  - src/host-sweep.ts MODULE-HOOK:scheduling-recurrence now dynamically
    imports handleRecurrence from the module each sweep tick.
  - container/agent-runner/src/poll-loop.ts MODULE-HOOK:scheduling-pre-task
    dynamically imports applyPreTaskScripts before the provider call.
    When the marker block is empty (scheduling uninstalled), `keep`
    falls back to `normalMessages` so non-task messages still flow.

The 5 task cases are removed from delivery.ts's handleSystemAction
switch — the registry now routes them. Task DB helpers moved out of
src/db/session-db.ts (which kept `nextEvenSeq` as a named export so
the module can uphold the host-writes-even-seq invariant). Test suite
split to match: scheduling-specific tests live in the module.

No migration — tasks are messages_in rows with kind='task'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:17:47 +03:00
gavrielc
626c565a70 fix(modules): break circular import TDZ between index.ts and modules
PR #3 introduced a circular-import temporal-dead-zone bug that didn't
surface in unit tests but crashed the service at startup:

  src/index.ts imports './modules/index.js' for side effects
  → src/modules/interactive/index.ts calls registerResponseHandler()
  → that function is declared in src/index.ts
  → but src/index.ts's const responseHandlers = [] hasn't been
    initialized yet (we're in the middle of its module-init)
  → ReferenceError: Cannot access 'responseHandlers' before initialization

Same issue for registerResponseHandler itself (the function reference
resolves to undefined) and for onShutdown in the approvals module.

Caught when the operator started the service and systemd flagged the
process as crashing in auto-restart loop.

Fix: extract responseHandlers + registerResponseHandler + shutdownCallbacks
+ onShutdown into src/response-registry.ts, which has no dependencies on
src/index.ts or on modules. index.ts re-exports the same surface for any
existing consumers; modules import directly from response-registry.js.

The bug was latent because:
- Unit tests import pieces, never src/index.ts's main() flow.
- Host builds clean because TypeScript doesn't catch runtime circular
  init order.
- Only surfaces when the ES module loader actually executes src/index.ts
  as the entry point.

Verified: service boots on Linux host with approvals + interactive
loaded; OneCLI handler starts via onDeliveryAdapterReady callback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:48:43 +03:00
gavrielc
a4573395d9 refactor(modules): extract approvals + interactive as registry-based modules
Phase 2 / PR #3 of the module refactor. Moves the approval and interactive-
question flows out of core and into src/modules/, wired through the response
dispatcher and delivery action registries.

New modules:
- src/modules/interactive/ — registers a response handler that claims
  pending_questions rows, writes question_response to the session DB, wakes
  the container. createPendingQuestion call stays inline in delivery.ts
  (guarded by hasTable) per plan.
- src/modules/approvals/ — registers 3 delivery actions (install_packages,
  request_rebuild, add_mcp_server), a response handler for pending_approvals
  (including OneCLI action fall-through), an adapter-ready hook that boots
  the OneCLI manual-approval handler, and a shutdown hook that stops it.
  OneCLI implementation (src/onecli-approvals.ts) moves into the module.

Core lifecycle hooks added (narrow, not registries):
- onDeliveryAdapterReady(cb) in delivery.ts — fires when setDeliveryAdapter
  runs (or immediately if already set). Used by approvals for OneCLI boot.
- onShutdown(cb) in index.ts — fires on SIGTERM/SIGINT. Used by approvals
  for OneCLI teardown.
- getDeliveryAdapter() getter in delivery.ts — for live-flow adapter access
  in registered delivery actions.

Core shrinks: delivery.ts 911 → 665 lines, index.ts 405 → 224 lines.
dispatchResponse now logs "Unclaimed response" instead of falling through
to an inline handler — the inline fallback moved into the two modules.

Migration files renamed to the module-<name>-<short>.ts convention:
- 003-pending-approvals.ts → module-approvals-pending-approvals.ts
- 007-pending-approvals-title-options.ts → module-approvals-title-options.ts
Migration.name fields unchanged so existing DBs treat them as already-applied.

Degradation verified: emptying the modules barrel builds clean and 137/137
tests pass. Actions would log "Unknown system action"; button clicks would
log "Unclaimed response".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:16:53 +03:00
gavrielc
4202041d0b refactor: scaffold module registries and default-module layout
Additive change — existing code paths still run via inline fallbacks.
Prepares core for per-module extractions in PR #3 onward.

Four registries added with empty defaults:
  - delivery action handlers (delivery.ts)
  - router inbound gate (router.ts)
  - response dispatcher (index.ts)
  - MCP tool self-registration (container/agent-runner/src/mcp-tools/server.ts)

Default modules moved to src/modules/ for signaling:
  - src/modules/typing/       (extracted from delivery.ts)
  - src/modules/mount-security/ (moved from src/mount-security.ts)

Both are imported directly by core — no hook, no registry. Removal
requires editing core imports.

Migrator now keys applied rows by name (uniqueness) so module
migrations can pick arbitrary version numbers. Stored version column
is auto-assigned as an applied-order sequence.

sqlite_master guards added around core calls into module-owned tables
(user_roles, agent_destinations, pending_questions). No-ops today;
load-bearing after the owning modules are extracted.

MODULE-HOOK markers placed at scheduling's two skill-edit sites
(host-sweep.ts recurrence call, poll-loop.ts pre-task gate). PR #4
replaces the marked blocks when scheduling moves to its module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:46:19 +03:00
gavrielc
86becf8dea chore: delete v1 reference code
Removes src/v1/ (37 files) and container/agent-runner/src/v1/ (3 files)
along with the v1 reference note in CLAUDE.md and the now-obsolete
tsconfig exclude. v1 was already out of the runtime path; this just
removes the dead weight.

~8,800 LOC removed, zero runtime change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:23:47 +03:00
gavrielc
27c52205f9 fix(channels): bridge openDM delegates to adapter directly
chat.openDM dispatches via inferAdapterFromUserId, which only recognizes
Discord/Slack/Teams/gChat formats and throws for everything else —
breaking approval delivery on Telegram (numeric IDs) and the other
direct-addressable channels the bridge now wraps. Delegate straight to
adapter.openDM + channelIdFromThreadId, and only expose openDM when the
underlying adapter implements it. Preserves the adapter's native
platform_id encoding (e.g. "telegram:<chatId>") so user_dms caches align
with the messaging_groups rows onInbound wrote.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 18:30:38 +03:00
gavrielc
bd659fd7d6 style(v2/delivery): prettier reflow of test import
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:17:07 +03:00
gavrielc
ac9535baa8 fix(v2/delivery): prevent double-send from overlapping delivery polls
The active poll (1s, running sessions) and sweep poll (60s, all active
sessions) both call deliverSessionMessages, and a running session is in
both result sets. Without locking they race on the same outbound row:
both read it as undelivered, both call the channel adapter, both
markDelivered. INSERT OR IGNORE hides the DB collision but the user has
already received the message twice.

Adds a per-session inflight guard; the second concurrent caller skips
and picks up any leftover work on the next poll tick.

Also makes outbox cleanup in deliverMessage best-effort: the message is
already on the user's screen, so a cleanup throw must not propagate to
the retry path (which would resend).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:17:06 +03:00
gavrielc
c37609ffc8 docs(skills): drop "v2" from skill content + src/index.ts log lines
Cleans up the prose-level v2 references that the rename commit didn't
touch. Skills now describe themselves and the codebase without "v2"
versioning language. /add-X-v2 cross-references in setup, init-first-agent,
and manage-channels updated to /add-X.

Runtime path identifiers (data/v2.db, data/v2-sessions/, container name
nanoclaw-v2) deliberately left as-is — renaming them breaks live installs
without commensurate benefit.

Verified: pnpm run build clean, 326 host tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:42:55 +03:00
gavrielc
712720eef4 refactor(v2): drop @chat-adapter/shared dep — duck-type NetworkError
The only use was channel-registry.ts checking `err instanceof NetworkError`
to retry transient setup failures. Switched to a duck-type predicate
(`err.name === 'NetworkError'`) so the dep is no longer needed at trunk
level. Channel skills bring it in transitively when they install their
Chat SDK adapter package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:33:22 +03:00
gavrielc
437ba63a2f refactor(v2): move channel adapters off v2 trunk
v2 ships with no channels baked in. All channel adapters (discord, slack,
telegram + helpers, whatsapp, whatsapp-cloud, gchat, github, imessage,
linear, matrix, resend, teams, webex) and their channel-specific setup
steps (pair-telegram, whatsapp-auth) now live on the `channels` branch
and get copied in via /add-*-v2 skills.

Removed:
- src/channels/{discord,slack,telegram*,whatsapp*,gchat,github,imessage,linear,matrix,resend,teams,webex}.ts
- setup/{pair-telegram,whatsapp-auth}.ts
- 14 channel-specific deps from package.json (@chat-adapter/*, @beeper/*,
  @bitbasti/*, @resend/chat-sdk-adapter, @whiskeysockets/baileys,
  chat-adapter-imessage, qrcode, @chat-adapter/state-memory unused)
- Their corresponding STEPS entries from setup/index.ts
- Channel imports from src/channels/index.ts

Kept:
- Channel infra: adapter.ts, channel-registry.ts (+ test), chat-sdk-bridge.ts,
  ask-question.ts, an empty-imports index.ts
- Chat SDK runtime (`chat`) for channels that copy in via Chat SDK bridge
- @chat-adapter/shared promoted from transitive to direct dep
  (channel-registry.ts uses NetworkError from it)

Verified: pnpm run build clean, 326 host tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:20:28 +03:00
gavrielc
03dd559786 style: trim trailing whitespace in providers barrel
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:12:35 +03:00
gavrielc
e0258e8c1b refactor(v2): move opencode provider off v2 trunk
v2 ships with only claude baked in. opencode now lives on the `providers`
branch and gets copied in via the /add-opencode skill.

Removed:
- src/providers/opencode.ts
- container/agent-runner/src/providers/{opencode,mcp-to-opencode}.ts + test
- @opencode-ai/sdk from agent-runner package.json + bun.lock
- opencode-ai global install + OPENCODE_VERSION ARG from Dockerfile
- opencode self-registration imports from both provider barrels
- opencode test case from factory.test.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:10:56 +03:00
Tal Moskovich
22150261c5 feat(v2): OpenCode agent provider
- Add OpenCodeProvider (SSE, session resume, MCP via mcp-to-opencode)
- Register opencode in factory; AGENT_PROVIDER passthrough from DB
- Host: XDG mount, NO_PROXY merge, OPENCODE_* env for opencode sessions
- Dockerfile: opencode-ai CLI; docs checklist + architecture diagram
- Skill add-opencode for v2; AgentProviderName in src/types.ts

Made-with: Cursor
2026-04-17 12:20:22 +03:00
gavrielc
7639f7b1bb style(v2/providers): prettier reflow of provider-container-registry signatures
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:17:09 +03:00