Commit Graph

366 Commits

Author SHA1 Message Date
gavrielc
2825f657ca Merge branch 'main' into fix/register-channel-wiring 2026-04-24 17:20:29 +03:00
gavrielc
f804ebf2e9 Merge branch 'main' into fix/session-state-per-provider-and-agent-route-files 2026-04-24 17:13:06 +03:00
grtwrn
fc375ca72b fix(register): wire channels with correct engage fields, skip prefix for native IDs
setup/register.ts had two bugs that prevented new channels from being
registered via `/manage-channels`:

1. createMessagingGroupAgent was called with the legacy field names
   `trigger_rules` and `response_scope`. The SQL INSERT expects
   `engage_mode` / `engage_pattern` / `sender_scope` / `ignored_message_policy`
   (migration 010). Every register call failed with
   `RangeError: Missing named parameter "engage_mode"` after the agent
   and messaging group were partially created — leaving an orphaned pair.

   Now mirrors scripts/init-first-agent.ts:wireIfMissing:
   - Groups (is_group=1) default to engage_mode='mention' (bot only
     responds when addressed).
   - DMs (is_group=0) default to engage_mode='pattern' with '.' (respond
     to every message).
   - An explicit --trigger overrides the pattern regex.

2. The "normalize platform_id" block unconditionally prefixed
   "<channel>:" even for native IDs like WhatsApp JIDs
   ("120363408974444974@g.us"), iMessage emails ("user@example.com"),
   or Signal phones ("+15551234567") / Signal groups ("group:abc"). But
   the router (src/router.ts:158) looks up messaging_groups by the raw
   event.platformId from the adapter, which for these native adapters
   never has a prefix. So the prefixed row was never matched — the
   message was silently dropped with no "Message routed" log.

   Extracted scripts/init-first-agent.ts:namespacedPlatformId into
   src/platform-id.ts so both setup paths use the same heuristic (skip
   the prefix for IDs containing '@', starting with '+', or starting
   with 'group:'). Prevents future drift between the two paths.

Tested by: re-running `setup/index.ts --step register` for a WhatsApp
group JID, confirming the row is created with correct engage fields
and matching platform_id, then sending a test message and observing
"Message routed" with the right agent group.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 17:06:10 +03:00
glifocat
3d6837c411 chore(format): apply prettier to chat-sdk-bridge.ts
Two long-line violations introduced in d121cd1 (isGroup plumbing)
exceed the printWidth limit. CI format:check fails on every PR
opened against main until this is fixed; the fix is isolated here
so no behavior change is mixed in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 12:12:05 +02:00
Adam
fd03b89333 fix(agent-route): reject unsafe attachment filenames to prevent path traversal
Filenames in forwardAttachedFiles arrived from the source agent's
messages_out content and were used directly in path.join on both
source outbox read and target inbox write. A value like `../evil.sh`
could escape `inbox/<a2a-id>/` on the target session (and similarly
the source outbox on read), breaking session isolation — an
adversarial or hallucinating sub-agent could overwrite files in
a sibling session.

Adds isSafeAttachmentName(name) — exported so it's unit-testable —
which rejects empty, `.`, `..`, anything containing `/`, `\`, or
NUL, and anything path.basename would strip. Guard runs before any
I/O. Unsafe names are dropped with a warning log, same pattern as
missing-source-file handling; a bad filename in one attachment
doesn't kill the whole route's text delivery.

Addresses Codex Review P1 on qwibitai/nanoclaw#1967.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:45:08 +10:00
Adam
672e228876 fix(agent-route): forward file attachments between agents
Before: `send_file(to='parent')` from a sub-agent wrote the bytes to
the sub-agent's own session outbox, but agent-to-agent routing copied
only the content JSON — the target's inbound message referenced
`files: ['x.png']` but the bytes lived in a session directory the
target couldn't mount. Parent agents orchestrating sub-agents (e.g.
Design Team delegating illustration work to an Illustrator sub-agent
on Codex) received file-reference messages with nothing to forward.

Fix: on route, if the source's content has `files`, copy each referenced
file from `<source>/outbox/<src-msg-id>/` to
`<target>/inbox/<a2a-msg-id>/`, and emit `attachments` (the existing
formatter convention — see formatter.ts:223) with `localPath` relative
to `/workspace/`. The target formatter already renders these as
`[file: <name> — saved to /workspace/inbox/<a2a-id>/<name>]`, so the
target agent sees the path and can call `send_file(path=…, to=…)` to
forward onward.

Convention matches what session-manager.ts:256 already does for
base64-encoded channel-inbound attachments — same inbox layout, same
content shape. Nothing on the formatter/agent side needed to change.

## Scope

- `forwardAttachedFiles(source, target)` — pure-ish helper that copies
  files and returns the attachments array.
- `forwardFileAttachments(msg, …)` — wraps the helper for the route
  path: parses content, copies files if present, merges into any
  existing `attachments`, re-serialises.
- `routeAgentMessage` — uses the rewritten content when writing the
  target's inbound row.
- Log line now includes `forwardedFileCount` for observability.

Missing source files are skipped with a warning rather than killing
the route — a bad filename in a batch shouldn't drop the
accompanying text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:34:29 +10:00
exe.dev user
5845a5a980 fix(container-runner): honor agent_provider DB columns with session override
resolveProviderContribution read only containerConfig.provider (from each
group's container.json) and ignored both agent_groups.agent_provider and
sessions.agent_provider. The provider-install skills (opencode, codex)
and CLAUDE.md document those DB columns as the source of truth with
session-overrides-group precedence, but the code never consulted them —
so setting `agent_provider = 'codex'` on a group had no effect, and the
only way to route to a non-default provider was to edit the per-group
JSON directly. Discovered while wiring up Codex: DB update landed but
the spawned container kept running Claude.

Extract a pure `resolveProviderName(session, group, containerConfig)`
with the documented precedence:

    sessions.agent_provider
      → agent_groups.agent_provider
      → container.json `provider`
      → 'claude'

`resolveProviderContribution` now calls it. The container.json fallback
stays so existing installs that only set provider in JSON keep working.
Empty strings treated as unset to avoid footguns when a DB-backed form
writes '' for "no override."

Added unit tests covering precedence, null-fallthrough, empty-string
fallthrough, and case normalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:47:10 +00:00
gavrielc
c1d0395d11 Merge branch 'main' into main 2026-04-23 23:04:35 +03:00
gavrielc
f351e46008 refactor(approvals): persist title+options on channel/sender approval tables
getAskQuestionRender used to hardcode the card title and option labels
for pending_channel_approvals and pending_sender_approvals in the
DB-access layer, duplicating wording that already lived in the approval
modules. That caused a visible drift between the initial card title —
picked per event in channel-approval.ts ("📣 Bot mentioned in new chat"
vs. "💬 New direct message") — and the post-click render, which
always showed the constant "📣 Channel registration".

Mirror the pattern already used by pending_approvals: add title /
options_json columns on both pending_*_approvals tables via migration
013, have the approval modules write them at creation time, and let
getAskQuestionRender just SELECT.

- Migration 013 ALTERs the two tables to add title + options_json.
- PendingChannelApproval / PendingSenderApproval types and their
  create functions grow the two fields.
- channel-approval.ts / sender-approval.ts normalize options once
  and pass both title and options_json into the insert.
- getAskQuestionRender drops the hardcoded render objects and reads
  the stored values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:54:47 +03:00
gavrielc
ffd38f660a Merge branch 'main' into fix/pending-rows-idempotent 2026-04-23 22:37:22 +03:00
exe.dev user
97868af5a7 fix(delivery): make pending_questions/approvals insert idempotent
createPendingQuestion and createPendingApproval both run before the
adapter delivery call. When delivery fails and the retry loop reinvokes
deliverMessage with the same questionId/approvalId, the second attempt
hit UNIQUE constraint on the pending_questions.question_id (or
pending_approvals.approval_id) and threw — so the retry never reached
the send step, and every subsequent retry failed the same way until
max-attempts marked the message permanently failed.

Switch both inserts to INSERT OR IGNORE. Return bool indicating whether
a new row was actually inserted so delivery.ts can avoid logging
"Pending question created" twice for the same card.

Symptom that surfaced this: a send-layer ValidationError on one attempt
followed by SqliteError on every subsequent attempt, with the user
seeing neither the card nor a follow-up. Seen in conjunction with the
Telegram 64-byte callback_data limit (fixed separately in
#1942/chat-sdk-bridge), but the idempotency gap applies to any
transient delivery failure — rate limits, network blips, adapter 5xx —
and is worth fixing on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:05:41 +00:00
exe.dev user
ff277c0d49 fix(chat-sdk-bridge): encode option index in callback_data for Telegram 64-byte cap
ask_question cards failed to deliver on Telegram whenever any option had
a non-trivial value (e.g. an ISO datetime, a URL, or a long token).
Telegram limits inline-keyboard callback_data to 64 bytes, and the
previous encoding embedded both the questionId and the full option
value in each button's actionId plus a second copy as value, producing
payloads well over the cap. The adapter threw ValidationError, delivery
was marked permanently failed, and the agent sat waiting on an answer
that never reached the user.

Fix:
  - Button id is now `ncq:<questionId>:<index>` and button value is the
    stringified index. Callback payloads shrink from ~100 bytes to ~40
    and fit Telegram's cap for any option list with <100 items.
  - Both callback-decode sites (Chat SDK `onAction` for Telegram/Slack/
    etc., and the Discord Gateway interaction handler) resolve the
    index back to the real option value via
    `getAskQuestionRender(questionId)` before dispatching to the host's
    onAction — so response handlers (pending_questions, pending_approvals)
    are unchanged and still receive the canonical value.
  - `resolveSelectedOption` helper has a backward-compat fallback:
    non-numeric tails are treated as literal values so any card
    delivered under the old encoding still resolves if the user clicks
    it after deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:56:21 +00:00
Gabi Simons
a8eb82d529 Merge branch 'main' into main 2026-04-23 18:24:24 +03:00
exe.dev user
237876c2c6 chore(format): wrap session-manager import in container-runner
Pre-commit prettier reformatted this in the working tree but didn't
re-stage. Keeping it in a separate commit to avoid amending a prior
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:12:56 +00:00
exe.dev user
209061f54f fix(sweep): wake before reset + idempotent retry for orphan claims
When a container exits with an unresolved processing_ack claim, the
sweep's crashed-container cleanup would reset the matching inbound
message with tries++ and a future process_after. dueCount then dropped
to 0, so the wake step never fired — and the next sweep tick found the
same orphan claim, bumped tries again, and pushed process_after further
out. The message reached MAX_TRIES and was marked failed without any
container ever being spawned.

Two changes:

1. Reorder sweep so the wake step runs before crashed-container
   cleanup. A fresh container clears orphan 'processing' rows on its
   own startup (container/agent-runner/src/db/connection.ts), so once
   we get it running the claim resolves itself.

2. Make resetStuckProcessingRows idempotent: if a message already has
   process_after set to a future time, skip the retry bump. The wake
   path will pick it up when the backoff elapses. Requires returning
   process_after from getMessageForRetry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:12:16 +00:00
exe.dev user
bee80b0072 fix(container): clear orphan heartbeat before spawn
After a container exits, its .heartbeat file is left behind with the
mtime of its last SDK activity. When the same session spawns a new
container, the host sweep's ceiling check reads that stale mtime and
kills the freshly-spawned container within seconds — before the new
instance has had time to touch the file itself.

The sweep already has a carve-out for "no heartbeat file" (treated as a
fresh spawn, given grace), so simply removing the orphan at spawn time
restores the intended semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:12:02 +00:00
gavrielc
dd5bc85b02 refactor(skill/atomic-chat-tool): ship MCP file in skill folder, revert src edits
The initial /add-atomic-chat-tool merge added src edits directly to main.
That conflicts with the utility-skill pattern used elsewhere (e.g. /claw):
the skill folder should ship the file and SKILL.md should instruct copy +
idempotent edits at install time, not a git merge that carries src diffs.

- Move container/agent-runner/src/atomic-chat-mcp-stdio.ts →
  .claude/skills/add-atomic-chat-tool/atomic-chat-mcp-stdio.ts
- Revert the atomic_chat mcpServers entry in agent-runner index.ts
- Revert mcp__atomic_chat__* from TOOL_ALLOWLIST in providers/claude.ts
- Revert ATOMIC_CHAT_* env forwarding and [ATOMIC] log elevation in
  src/container-runner.ts
- Empty .env.example back out
- Rewrite SKILL.md: copy the shipped file, then apply deterministic Edits
  (index.ts, providers/claude.ts, container-runner.ts, .env.example)
  with exact before/after snippets the installer agent can match.

Main is now back to its pre-PR state for the tool; /add-atomic-chat-tool
re-applies everything at install time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:29:10 +03:00
Misha Skvortsov
3a9b98f1a4 feat: add Atomic Chat MCP tool skill
Exposes local Atomic Chat models (OpenAI-compatible API at
127.0.0.1:1337/v1) as tools to the container agent. Adds
atomic_chat_list_models and atomic_chat_generate alongside
the existing Ollama skill.

Rebased on current main:
- MCP server registered in agent-runner index.ts using bun (no tsc
  step in-image), sibling path to index.ts, env: {} with ATOMIC_CHAT_*
  forwarded when set.
- allowedTools entry moved to providers/claude.ts TOOL_ALLOWLIST.
- SKILL.md: drop obsolete per-group copy step (single RO mount
  supersedes it); use pnpm build.

Made-with: Cursor
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 16:18:34 +03:00
exe.dev user
40f5683c36 fix(approvals): show correct post-click labels on channel/sender cards
getAskQuestionRender only checked pending_questions and
pending_approvals, missing the channel and sender approval tables.
Approval button clicks showed the raw value ("approve") instead of
the selectedLabel (" Wired"). Extend the lookup to also check
pending_channel_approvals and pending_sender_approvals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 12:23:45 +00:00
exe.dev user
15f30682d7 fix(approvals): show human-readable names in approval cards
Channel and sender approval cards showed raw platform IDs
(e.g. discord:1475578393738219540:...) instead of readable context.
Extract sender name from the event content for channel approvals,
and use the channel type name for sender approvals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 12:23:34 +00:00
exe.dev user
d121cd1cd6 fix(router): pass isGroup from adapter through to messaging group creation
The router hardcoded is_group=0 when auto-creating messaging groups,
causing channel mentions to be misclassified as DMs. The Chat SDK
bridge knows which handler fired (onDirectMessage vs onNewMention)
so thread the signal through InboundMessage → InboundEvent → router.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 12:23:23 +00:00
exe.dev user
61ca43d193 fix(discord): resolve user ID from DM interactions for approval clicks
Discord puts the clicking user at interaction.member.user for guild
interactions but interaction.user for DM interactions. The Gateway
handler only checked interaction.member, so DM button clicks resolved
to an empty user ID and were silently rejected as unauthorized.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 12:23:12 +00:00
Lazer Cohen
2383bde80f fix(container): scope orphan reaper by install label so peers don't kill each other
Two installs on the same host could trash each other's containers: the
reaper used `docker ps --filter name=nanoclaw-`, a substring match that
picked up every install's containers. A crash-looping peer (e.g. a legacy
v1 plist respawning ~6k times) would call cleanupOrphans on every boot and
kill the healthy install's session containers within seconds of spawn.

- Stamp `--label nanoclaw-install=<slug>` onto every spawned container.
- cleanupOrphans filters by that label; healthy peers are left alone.
- Setup preflight enumerates `com.nanoclaw*` launchd plists / nanoclaw
  user systemd units, probes state/runs, and unloads any that are
  crash-looping (state != running AND runs > 10) before installing
  this install's service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:12:30 +03:00
gavrielc
5f1b3e5cad style: apply prettier formatting to install-slug additions 2026-04-23 10:10:48 +03:00
gavrielc
7a9401ddf2 feat(setup): per-checkout service name and docker image tag
Two NanoClaw installs on the same host used to fight over the shared `com.nanoclaw` launchd label / `nanoclaw.service` systemd unit and the `nanoclaw-agent:latest` docker tag — the second install silently rewrote the service pointer and rebuilt the image out from under the first. Introduces a deterministic per-checkout slug (sha1(projectRoot)[:8]) and namespaces everything off it:

- Service: `com.nanoclaw-v2-<slug>` / `nanoclaw-v2-<slug>.service`
- Image:   `nanoclaw-agent-v2-<slug>:latest` (base), `nanoclaw-agent-v2-<slug>:<agentGroupId>` (per-group)

New shared helpers: src/install-slug.ts (host) + setup/lib/install-slug.sh (bash). Both compute the same slug so verify/probe/add-*.sh/build.sh/container-runner all agree. Any v1 `com.nanoclaw` service left on the host stays untouched and can coexist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:10:09 +03:00
gavrielc
3b8240a91b refactor(self-mod): drop request_rebuild — approvals now bundle rebuild+restart
install_packages and add_mcp_server already did the right thing on approve
(install auto-rebuilt+killed, add_mcp_server just killed), so request_rebuild
was redundant plumbing agents sometimes called after an install — wasting an
admin approval round-trip. Delete it end-to-end:

- container/agent-runner/src/mcp-tools/self-mod.ts: remove requestRebuild
  tool + registration; update install_packages description.
- src/modules/self-mod/{request,apply,index}.ts: drop handleRequestRebuild
  + applyRequestRebuild + registrations; rewrite the rebuild-failed notify
  to point admins at retrying install_packages instead.
- src/modules/{approvals,self-mod}/{agent,project}.md and skill/self-
  customize/SKILL.md: scrub agent-facing references; clarify that
  add_mcp_server needs no rebuild (bun runs TS directly).
- docs/{module-contract,architecture-diagram,checklist,db-central,shared-
  source,v1-vs-v2/*}.md, CLAUDE.md, pending-approvals migration comment,
  approvals/index.ts docstring, REFACTOR.md: trailing references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:28:36 +03:00
gavrielc
e64bdb3016 refactor(claude-md): split shared base into module fragments, inject name at runtime
Move every agent-specific instruction out of the shared container/CLAUDE.md
so the base is genuinely universal. Persona/identity now comes from the
system-prompt addendum (buildSystemPromptAddendum now takes assistantName
and prepends "# You are {name}"). Per-module instructions live alongside
each MCP tool source:

  container/agent-runner/src/mcp-tools/core.instructions.md
  container/agent-runner/src/mcp-tools/scheduling.instructions.md
  container/agent-runner/src/mcp-tools/self-mod.instructions.md

composeGroupClaudeMd() scans that directory and emits `module-<name>.md`
fragments as symlinks to /app/src/mcp-tools/<name>.instructions.md (valid
via the existing RO source mount). Skill fragments renamed to
`skill-<name>.md` for naming consistency with `module-*` and `mcp-*`.

Mount tightening so composer-managed files can't be clobbered by agent
writes: nested RO mounts for /workspace/agent/CLAUDE.md and
/workspace/agent/.claude-fragments/. CLAUDE.local.md (per-group memory)
stays RW as the only writable CLAUDE.md-family file.

.gitignore: ignore CLAUDE.local.md, .claude-shared.md, .claude-fragments/
everywhere, and simplify groups/ rules to ignore the whole tree (per-
installation state, not tracked).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:14:51 +03:00
gavrielc
95e74d8383 docs(onecli): expand secrets section; correct stale admin-roles refs
Document the selective-mode gotcha for auto-created OneCLI agents
(no secrets injected by default) with the CLI commands to inspect
and fix it. Note that approval policies are not configurable via
the SDK or `onecli@1.3.0` CLI — web UI only.

Replace stale `NANOCLAW_ADMIN_USER_IDS` / `src/access.ts` references
across CLAUDE.md, docs/architecture.md, docs/checklist.md, and
docs/module-contract.md. Admin gating now runs host-side in
src/command-gate.ts against `user_roles`; approver picks live in
src/modules/approvals/primitive.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 16:46:17 +03:00
gavrielc
3db66c0ced fix: forward ONECLI_API_KEY to OneCLI SDK for authenticated container config
Ports the v1 fix from PR #1777 (originally 8b5b581 by @johnnyfish).
Cherry-pick did not apply cleanly because v2 reformatted the surrounding
code and split OneCLI usage into two sites — manual port was needed.

v2-specific adaptations:
- Also forward apiKey at the second OneCLI call site in
  src/modules/approvals/onecli-approvals.ts (v2 split the approvals
  module out of container-runner).
- Skipped the companion test-mock commit (38163bc) — it patches
  src/container-runner.test.ts, which no longer exists in v2 (tests
  consolidated into host-core.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: johnnyfish <jonathanfishner11@gmail.com>
2026-04-22 15:16:59 +03:00
gavrielc
8e1c8f8f61 style: apply prettier formatting to touched files
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:57:09 +03:00
gavrielc
c8fc1da719 refactor(claude-md): compose per-group CLAUDE.md from shared base + fragments
Replace the per-group "written once at init, owned by the group" CLAUDE.md
with a host-regenerated entry point that imports:

  - a shared base (`container/CLAUDE.md` mounted RO at `/app/CLAUDE.md`)
  - optional per-skill fragments (skills that ship `instructions.md`)
  - optional per-MCP-server fragments (inline `instructions` field in
    `container.json`)
  - per-group agent memory (`CLAUDE.local.md`, auto-loaded by Claude Code)

Principle: RW = per-group memory, RO = shared content. Source/skills/base
are shared; personality, config, working files, and Claude state stay
per-group.

Key changes:

  - New `src/claude-md-compose.ts` — per-spawn composition +
    `migrateGroupsToClaudeLocal()` one-time cutover.
  - New `container/CLAUDE.md` — shared base, seeded verbatim from the
    former `groups/global/CLAUDE.md`.
  - `src/container-runner.ts` — swap `/workspace/global` mount for RO
    `/app/CLAUDE.md`; call `composeGroupClaudeMd()` after
    `initGroupFilesystem()`.
  - `src/group-init.ts` — drop `.claude-global.md` symlink + initial
    `CLAUDE.md` write; seed `CLAUDE.local.md` from `opts.instructions`.
  - `src/index.ts` — call `migrateGroupsToClaudeLocal()` at startup.
  - `src/container-config.ts` — add optional `instructions` field to
    `McpServerConfig` (inline per-MCP guidance fragment).
  - `container/Dockerfile` — drop dead `/workspace/global` mkdir.
  - Remove obsolete `scripts/migrate-group-claude-md.ts`.

Migration (runs once at host startup, idempotent):

  - Delete `.claude-global.md` symlinks in each group.
  - Rename each `groups/<folder>/CLAUDE.md` → `CLAUDE.local.md`
    (preserves existing per-group content as memory).
  - Delete `groups/global/` directory.

Design docs: `docs/claude-md-composition.md` and `docs/shared-source.md`
(the latter is the sibling design discussion this refactor builds on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:58:43 +03:00
exe.dev user
8a12fa61ac refactor: shared source — replace per-group agent-runner copies with single RO mount
Replace the per-group agent-runner-src copy model with a single shared
read-only mount. Source and skills are now RO + shared; personality,
config, working files, and Claude state stay RW + per-group.

Key changes:
- Mount container/agent-runner/src/ RO at /app/src (all groups share one copy)
- Mount container/skills/ RO at /app/skills; per-group skill selection via
  symlinks in .claude-shared/skills/ based on container.json "skills" field
- Mount container.json as nested RO bind on top of RW group dir
- Move all NANOCLAW_* env vars to container.json (runner reads at startup)
- New runner config.ts module replaces process.env reads
- Move command gate (filtered/admin) from container to host router
- Dockerfile: remove source COPY, split CLI installs (claude-code last),
  move agent-runner deps above CLIs for better layer caching
- Add writeOutboundDirect for router denial responses
- Design doc at docs/shared-src.md

Not included (follow-up): DB migration to drop agent_provider columns,
cleanup of orphaned agent-runner-src directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 12:58:43 +03:00
gavrielc
1858ef35f0 Merge pull request #1908 from qwibitai/setup-auto
feat(setup): scripted branded setup flow (nanoclaw.sh)
2026-04-22 03:06:30 +03:00
gavrielc
416fe01855 refactor(setup): drop CLI-bonus wiring from init-first-agent
init-first-agent used to double-wire the CLI channel to every new DM
agent as a convenience for `pnpm run chat`, gated by --no-cli-bonus.
With the /new-setup-2 flow gone and a dedicated scratch CLI agent
created earlier in setup:auto, that bonus just stomps on CLI routing
the user already set up. Remove the CLI_CHANNEL/CLI_PLATFORM_ID
constants, ensureCliMessagingGroup, the --no-cli-bonus flag, and the
cli-bonus wiring block.

Pass the paired user's identity through to the welcome delivery so
the sender resolver sees the real owner (e.g. telegram:<id>) instead
of cli:local. Extend the CLI channel's admin-transport payload to
accept optional sender/senderId overrides — falls back to the old
cli/cli:local defaults when omitted, so existing callers are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 02:13:22 +03:00
Dave Kim
91c668e0cc fix: persist SDK session_id on init + split long messages before adapter truncation
Two related bugs that surfaced together when a Discord response exceeded
2000 chars:

1. **Session id lost on mid-turn container exit.** `runPollLoop` was
   calling `setStoredSessionId` only after `processQuery` returned. If
   the container died between the SDK's `init` event (where session_id
   arrives) and the stream completing, the id was never persisted. The
   next wake called `getStoredSessionId()` → undefined and started a
   fresh Claude session, dropping all prior context. Fix: persist
   immediately in the `init` branch inside `processQuery`. The existing
   post-query store becomes a harmless no-op.

2. **Silent truncation past adapter limits.** `chat-sdk-bridge.deliver`
   handed full text straight to `adapter.postMessage`. Discord's adapter
   hard-truncates at 2000 chars; Telegram's at 4096. Responses longer
   than that were cut off without any signal to the user or host. Fix:
   add `maxTextLength` to `ChatSdkBridgeConfig` and a `splitForLimit`
   helper that breaks on paragraph → line → hard-char boundaries, then
   posts chunks sequentially. Files ride on the first chunk; the
   returned id is the first chunk's so edits and reactions still target
   the reply head.

Channel adapter files (Discord, Telegram, …) live on the `channels`
branch — a companion PR wires `maxTextLength: 1900` for Discord and
`4000` for Telegram so the splitter actually engages in those installs.
Without wiring, behavior is unchanged.
2026-04-21 13:04:57 +00:00
gavrielc
01ffce6f74 Revert "fix(permissions): welcome new approved channels via /welcome, route to them"
This reverts commit 9776dd4f32.
2026-04-21 15:20:06 +03:00
Koshkoshinsk
9776dd4f32 fix(permissions): welcome new approved channels via /welcome, route to them
When the unknown-channel approval flow completes, seed a /welcome task
into the newly-wired session so the agent greets the new user on first
contact. The replayed /start (Telegram's default first-message) is filtered
by the agent-runner's command-command filter, so without an explicit
onboarding trigger the first interaction produced nothing.

Pin the destination by its local_name from agent_destinations to avoid the
agent picking the wrong named destination (previously it greeted the owner,
whose DM is in CLAUDE.md).

Also guard dispatchResultText against echoing trailing status lines when
the agent has already sent messages explicitly via send_message. Otherwise
a task-triggered flow that calls send_message then emits "welcome message
sent" produces a duplicate chat to the recipient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:40:12 +00:00
gavrielc
d8d61d3695 fix: Teams user-id prefix + defer cli:local owner grant
parseUserId now falls back to user.kind when the id prefix isn't a
registered adapter — Teams uses `29:` rather than `teams:`, so the
literal prefix wouldn't resolve the channel adapter for cold DMs.

init-cli-agent no longer claims the first-owner slot on `cli:local`.
The CLI identity is scratch; owner promotion belongs to
init-first-agent once the real channel user is wired.
2026-04-21 10:16:13 +03:00
gavrielc
0f6a1ba1ed style: apply prettier formatting to touched files
Pre-commit hook reflowed imports on files changed in the previous commit.
Unrelated format drift on other files intentionally left unstaged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 23:31:42 +03:00
gavrielc
6c26c0413a feat(router,cli): replyTo override + CLI admin-transport flows
- InboundEvent gains an optional replyTo; router stamps the row's address
  fields from it when set, so replies can route to a different channel than
  the one the inbound came in on.
- ChannelSetup adds onInboundEvent for admin-transport adapters that build
  the full event themselves.
- CLI wire format accepts {text, to, reply_to}. Routed messages go through
  onInboundEvent and do not evict an active chat client.
- init-first-agent hands the DM welcome to the running service via
  data/cli.sock — synchronous wake, no sweep wait. Fails loudly if the
  service is down; no silent fallback.
- Split the CLI scratch-agent bootstrap into scripts/init-cli-agent.ts;
  init-first-agent is DM-only.

Agents cannot set replyTo: it lives only on the inbound/router seam and is
consumed once when writing messages_in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 23:30:47 +03:00
gavrielc
719f97e483 feat(permissions): unknown-channel registration flow with owner approval
When the router sees a mention or DM on a messaging group that isn't wired
to any agent, it now escalates to an owner for approval instead of silently
dropping. Mirrors the existing unknown-sender approval pattern (ACTION-ITEMS
item 22).

Schema (migration 012):
- `messaging_groups.denied_at TEXT NULL` — timestamp set on deny so future
  mentions stop escalating. ALTER TABLE ADD COLUMN, FK-safe (unlike the
  rebuild that bit migration 011).
- `pending_channel_approvals` — PK on `messaging_group_id` gives free
  in-flight dedup. One card per channel, no spam on rapid retries.

Router:
- New hook `setChannelRequestGate(mg, event) => Promise<void>`, invoked
  from the no-wirings branch when the message was addressed to the bot
  (isMention=true). Hook is fire-and-forget.
- Checks `mg.denied_at` before escalating — denied channels drop silently
  and do not re-prompt.
- The two "no-wirings" branches (fresh auto-create and existing mg with
  no agents) are consolidated into one escalation path that calls the
  gate once. Without the module, behavior is log + record (no regression).

Permissions module:
- `channel-approval.ts::requestChannelApproval` — MVP picker: target
  agent is `getAllAgentGroups()[0]`, card names it explicitly ("Wire it
  to <Andy>?"). Approver via existing `pickApprover` + `pickApprovalDelivery`
  primitives.
- Response handler: same click-auth pattern as sender-approval (clicker
  must be the designated approver OR have admin privilege over the
  target agent group).
- Approve defaults per the feature spec:
    engage_mode = 'mention-sticky' for groups, 'pattern' + '.' for DMs
    sender_scope = 'known'
    ignored_message_policy = 'accumulate'
    session_mode = 'shared'
  DM vs group inferred from the original event's threadId (non-null →
  group) because the auto-created mg has a placeholder is_group=0 until
  the adapter fills it in.
- Triggering sender is auto-added to agent_group_members so sender_scope=
  'known' doesn't bounce the replayed message into a sender-approval
  cascade.
- Deny: stamps messaging_groups.denied_at, clears pending row.
- Failure modes — no owner, no agent groups, no reachable DM — log and
  drop without creating a pending row, letting a future attempt try
  again (same as sender-approval).

9 new integration tests cover every branch: mention triggers card, DM
triggers card, dedup, approve creates correct wiring + admits sender +
replays, approve-on-DM uses pattern/'.' defaults, deny sets denied_at
and future mentions drop silently, unauthorized clicker rejected,
no-owner drops, no-agent-groups drops.

168 tests pass (was 159; +9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:34:00 +03:00
gavrielc
a4061a0012 refactor(channels,router): move all policy to router; bridge is transport
Follow-up to b159722. That shrank the bridge's shouldEngage to a flood
gate + coarse sticky-subscribe signal. This completes the move —
policy lives exclusively in the router, the bridge is transport-only,
and the conversations map + ChannelSetup.conversations +
ChannelAdapter.updateConversations are all gone.

Key shifts:

1. Subscribe moves from bridge to router.

   Bridge used to call `thread.subscribe()` from its onNewMention /
   onDirectMessage handlers based on a coarse "any mention-sticky wiring
   exists on this channel" check. That forced the decision before the
   router could apply per-wiring engage logic, and it relied on the
   conversations map being current (staleness risk).

   ChannelAdapter gains `subscribe?(platformId, threadId)`. The Chat
   SDK bridge implements it via SqliteStateAdapter.subscribe(threadId)
   (idempotent — a repeat call on an already-subscribed thread is a
   no-op). The router's fan-out loop calls it once per message when
   the first mention-sticky wiring actually engages. Precise, not
   coarse.

2. Short-circuit the drop path with one combined query.

   New `getMessagingGroupWithAgentCount(channelType, platformId)` does
   the messaging_groups lookup AND counts wirings in a single SELECT,
   using the existing UNIQUE(channel_type, platform_id) index on
   messaging_groups and UNIQUE(messaging_group_id, agent_group_id) on
   messaging_group_agents for the JOIN. No new indexes needed.

   routeInbound now short-circuits:
     - No messaging_groups row AND not addressed (no mention/DM)
       → return silently. One DB read, nothing written. This is the
       Discord-bot-in-a-big-guild case; we no longer auto-create rows
       for every plain message in every channel the bot can see.
     - Messaging group exists but no wirings AND not addressed
       → return silently. One DB read.
     - Otherwise fall through to sender resolution + fan-out as before.

   Behavioral change: plain chatter on unwired channels no longer gets
   dropped_messages audit rows, which used to bloat the table. Audit
   still fires on addressed-to-bot drops where the admin cares
   ("someone @-mentioned us but nobody's wired").

3. Bridge is now purely transport.

   Deleted entirely: ConversationConfig, ChannelSetup.conversations,
   ChannelAdapter.updateConversations?, bridge's `conversations` map,
   buildConversationMap, shouldEngage, EngageSource, engageDecision,
   bridge.updateConversations method, src/index.ts
   buildConversationConfigs. Four handlers reduce to "resolve channel
   id, build InboundMessage with isMention, call onInbound". Net
   ~130 LOC deleted from the bridge.

   Collateral: the conversations-map staleness problem is gone. The
   upcoming channel-registration feature doesn't need any map-refresh
   plumbing — when an approval creates a new wiring, the next message
   hits the DB fresh and just works.

Bridge tests prune to the narrow platform-adjacent surface (openDM
delegation, subscribe presence). Host-core test that asserted the
old "auto-create on every unknown message" behavior updates to
reflect the new escalation-gated semantics: plain messages on
unknown channels don't auto-create, mentions do.

159 tests pass (was 172 — net -13, almost entirely from
bridge-engage-mode tests that covered logic now owned by the router
and exercised through host-core.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:55:49 +03:00
gavrielc
b15972284b refactor(channels): shrink bridge shouldEngage to flood gate + subscribe signal
Before this change the bridge and the router both owned engage_mode
policy. Bridge's shouldEngage had a full switch over mention /
mention-sticky / pattern + source-based rules + engage_pattern regex
test + ignored_message_policy accumulate fallback. Router's
evaluateEngage had the same switch against the same fields. Two
parallel logic paths with subtle vocabulary differences (bridge: "which
SDK handler fired"; router: "what isMention says"). Every time we
touched one we had to reason about the other — the Telegram
hasMention bug and the "pattern mode silently drops in group chats"
bug were both drift between the two.

Refactor to one place. Router keeps all per-wiring policy — engage
mode, pattern regex, sender scope, ignored-message policy — unchanged.
Bridge drops to a coarse flood gate + subscribe signal:

  - forward: does this channel have ANY wiring? Forward if yes.
    Unknown channels still forward for subscribed/mention/dm (they may
    be newly auto-created, or will trigger the coming
    channel-registration flow). Unknown channels DROP for new-message
    so we don't flood from every unsubscribed thread the bot happens
    to sit in.

  - stickySubscribe: any mention-sticky wiring on the channel AND the
    source is mention or dm. Coarse union — subscribe is idempotent
    and one call serves every sticky wiring.

The `text` param on shouldEngage is gone (pattern regex lives in the
router now). Four bridge handler sites simplify accordingly. messageToInbound
still carries the SDK-confirmed isMention flag through to the router
unchanged.

Behavioral delta: pure-mention-wired channels (no pattern, no
accumulate) will now see every plain group message reach the router
before being dropped there, where before the bridge dropped at the
transport boundary. Extra DB lookup per dropped message in this
specific case; acceptable for the cleaner seam and can be optimized
back at the bridge if it ever matters in practice.

Bridge tests prune the 10 engage_mode-specific cases that covered
logic now owned by evaluateEngage in the router (host-core.test.ts
covers it end-to-end through routeInbound). Bridge tests keep only
what's bridge-specific: the flood gate and the stickySubscribe
coarse union.

172 tests pass (was 182 — net -10 redundant bridge tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:32:08 +03:00
gavrielc
68058cbc4a fix(permissions): authorize unknown-sender approval clicks
The approval click handler trusted row.approver_user_id as the actor
regardless of who actually clicked the card. A random user who received
the forwarded card could click Approve and get the stranger admitted
to the agent group — their click was simply not checked.

Separately, payload.userId arrives as the raw platform userId from
Chat SDK onAction (e.g. "6037840640"), not the namespaced form
("telegram:6037840640") that matches users(id). Without namespacing,
users-table lookups miss.

Namespace the clicker id with payload.channelType, then authorize: the
clicker must be either the designated approver OR have
owner / admin privilege over the agent group (hasAdminPrivilege covers
owner, global admin, scoped admin). Unauthorized clicks return true
(claim the response so the registry doesn't log it as unclaimed) but
take no action — the pending row stays in place so a legitimate
approver can still act on it.

Existing tests passed a pre-namespaced userId directly, masking the
first bug. Fixed the fixtures to match production plumbing and added
two tests: one asserts a random bystander's click is rejected (row
stays pending, no member added), the other asserts a global admin can
approve even when they weren't the designated approver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:16:35 +03:00
gavrielc
f74df3b0d3 fix(router): trust SDK isMention signal; drop broken hasMention regex
The router's mention / mention-sticky engage check was regex-matching
@<agent_group.name> (e.g. @Andy) against message text. Platforms don't
work that way — users address bots via the bot's platform username
(@nanoclaw_v2_refactr_1_bot on Telegram, user-id mentions on Slack /
Discord). The regex matched only coincidentally and never on Telegram,
so mention-mode wirings silently never fired there.

Two parallel mention detectors existed: the Chat SDK's onNewMention,
which correctly resolves the bot's platform identity, and the router's
hasMention text regex, which ignored the SDK verdict and invented its
own heuristic. The router's detector was wrong in principle — the agent
group's display name is a NanoClaw-side nickname, not a platform
address.

Thread the SDK signal through: InboundMessage gains an optional
`isMention` field, the bridge sets it from each handler (onNewMention →
true, onDirectMessage → true, onSubscribedMessage → message.isMention,
onNewMessage(/./) → false), src/index.ts forwards it into InboundEvent,
and evaluateEngage now checks `isMention === true` for mention modes.
hasMention deleted entirely — there is only one source of truth for
"did the user mention this bot": the platform / SDK.

Agent-name-in-text matching for disambiguating multiple agents wired to
one chat is a separate feature; users can express it today with
engage_mode='pattern' + the agent's name as the regex.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:16:20 +03:00
gavrielc
0105de0257 fix(host-sweep): skip ceiling check when heartbeat file is absent
decideStuckAction treated a missing heartbeat file as heartbeatAge =
Infinity, which always exceeded the 30-minute ceiling. Result: every
freshly-spawned container got killed within seconds of spawn on the
first sweep pass because it hadn't produced an SDK event yet (heartbeat
is only touched on SDK events inside processQuery, not on boot).

Skip the ceiling branch when heartbeatMtimeMs === 0. Containers that
genuinely never wrote a heartbeat because they died are caught by the
separate "container process not running" cleanup path. Containers that
boot, claim a message, but hang at the gate are caught by the
claim-stuck check below — which correctly fires regardless of heartbeat
presence once claimAge exceeds tolerance.

Updates the "absent heartbeat → kill-ceiling" test (which was encoding
the bug) and adds a companion that the claim-stuck path still fires for
absent-heartbeat containers with aged claims.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:15:52 +03:00
gavrielc
c38e5b11a8 fix(channels): wire accumulate mode through the bridge
The router + session DB were already fully plumbed for
ignored_message_policy='accumulate' — fan-out in routeInbound calls
deliverToAgent(wake=false) for non-engaging agents on accumulate wirings,
writeSessionMessage writes trigger=0, countDueMessages filters trigger=1,
container formatter includes all messages regardless of trigger. But the
Chat SDK bridge dropped non-engaging messages before the router ever saw
them, so accumulate was dead on arrival for every adapter that goes
through the bridge.

Expose ignored_message_policy on ConversationConfig, project it in
buildConversationConfigs, and widen shouldEngage's "forward" decision to
"engage OR accumulate" with the union taken across all wirings on a
conversation. stickySubscribe stays gated on a real engage — subscribing
a thread we'd only silently accumulate on would misrepresent the bot's
presence.

shouldEngage return shape is now { forward, stickySubscribe } — engage
was an internal concept the caller never needed, and conflating it with
forward was the source of this bug.

7 new tests cover: non-engaging messages forwarding under accumulate,
mixed drop/accumulate wirings taking the union, accumulate not
triggering sticky subscribe, unknown-conversation drop precedence over
accumulate, and drop policy preserving existing behavior on engaging
messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:18:43 +03:00
gavrielc
ce25e1e97c style(channels): prettier line-wrap in chat-sdk-bridge.test.ts
Post-commit reformat picked up by format:fix hook on the previous commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:12:40 +03:00
gavrielc
52c6223292 fix(channels): register onNewMessage(/./) to fix pattern mode in group chats
Chat SDK dispatch (per handling-events.mdx) is exclusive and prioritized:
subscribed → onSubscribedMessage; unsubscribed + mention → onNewMention;
unsubscribed + pattern match → onNewMessage. We never registered the third,
so engage_mode='pattern' silently dropped every message in unsubscribed
group threads — the SDK simply never surfaced them anywhere.

Register chat.onNewMessage(/./, …) and route it through shouldEngage with
a new 'new-message' source. Unknown-conversation policy drops for this
source (would otherwise flood from every unwired channel the bot can see).
mention / mention-sticky wirings ignore 'new-message' — they require an
explicit @mention to start a conversation. Pattern wirings evaluate
normally.

Extracted shouldEngage from a closure to an exported function with an
EngageSource type so it's unit-testable. Added 17 tests covering every
source × engage-mode combination, unknown-conversation behavior, invalid
regex fail-open, and multi-wiring union.

Accumulate (ignored_message_policy='accumulate') is still not plumbed —
the bridge drops non-engaging messages entirely instead of forwarding
them as context-only. That requires a trigger: 0 | 1 field on
InboundMessage → router → writeSessionMessage (schema already has the
column). Separate change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:11:56 +03:00
gavrielc
57e0cda9e5 Revert "fix(channels): pre-subscribe group threads for pattern / accumulate wirings"
This reverts commit 73b20880ff.
2026-04-20 10:35:33 +03:00