nanoclaw

Author	SHA1	Message	Date
gavrielc	2825f657ca	Merge branch 'main' into fix/register-channel-wiring	2026-04-24 17:20:29 +03:00
gavrielc	f804ebf2e9	Merge branch 'main' into fix/session-state-per-provider-and-agent-route-files	2026-04-24 17:13:06 +03:00
grtwrn	fc375ca72b	fix(register): wire channels with correct engage fields, skip prefix for native IDs setup/register.ts had two bugs that prevented new channels from being registered via `/manage-channels`: 1. createMessagingGroupAgent was called with the legacy field names `trigger_rules` and `response_scope`. The SQL INSERT expects `engage_mode` / `engage_pattern` / `sender_scope` / `ignored_message_policy` (migration 010). Every register call failed with `RangeError: Missing named parameter "engage_mode"` after the agent and messaging group were partially created — leaving an orphaned pair. Now mirrors scripts/init-first-agent.ts:wireIfMissing: - Groups (is_group=1) default to engage_mode='mention' (bot only responds when addressed). - DMs (is_group=0) default to engage_mode='pattern' with '.' (respond to every message). - An explicit --trigger overrides the pattern regex. 2. The "normalize platform_id" block unconditionally prefixed "<channel>:" even for native IDs like WhatsApp JIDs ("120363408974444974@g.us"), iMessage emails ("user@example.com"), or Signal phones ("+15551234567") / Signal groups ("group:abc"). But the router (src/router.ts:158) looks up messaging_groups by the raw event.platformId from the adapter, which for these native adapters never has a prefix. So the prefixed row was never matched — the message was silently dropped with no "Message routed" log. Extracted scripts/init-first-agent.ts:namespacedPlatformId into src/platform-id.ts so both setup paths use the same heuristic (skip the prefix for IDs containing '@', starting with '+', or starting with 'group:'). Prevents future drift between the two paths. Tested by: re-running `setup/index.ts --step register` for a WhatsApp group JID, confirming the row is created with correct engage fields and matching platform_id, then sending a test message and observing "Message routed" with the right agent group. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 17:06:10 +03:00
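The prefix heuristic described in fc375ca72b can be sketched as follows. This is a minimal illustration, not the contents of src/platform-id.ts: the function name comes from the commit message, but the signature and the guard against double-prefixing are assumptions.

```typescript
// Sketch of the "skip the prefix for native IDs" heuristic.
// Assumed signature: (channel, platformId) => string.
function namespacedPlatformId(channel: string, platformId: string): string {
  const isNativeId =
    platformId.includes("@") || // WhatsApp JIDs, iMessage emails
    platformId.startsWith("+") || // Signal phone numbers
    platformId.startsWith("group:"); // Signal groups
  if (isNativeId) return platformId;

  // Assumption: an ID that already carries the channel prefix is left alone.
  const prefix = `${channel}:`;
  return platformId.startsWith(prefix) ? platformId : prefix + platformId;
}

console.log(namespacedPlatformId("whatsapp", "120363408974444974@g.us"));
console.log(namespacedPlatformId("discord", "1475578393738219540"));
```

Because the router looks up rows by the raw adapter ID, the first call must return the JID unchanged, while the second gains the `discord:` prefix.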
glifocat	3d6837c411	chore(format): apply prettier to chat-sdk-bridge.ts Two long-line violations introduced in `d121cd1` (isGroup plumbing) exceed the printWidth limit. CI format:check fails on every PR opened against main until this is fixed; the fix is isolated here so no behavior change is mixed in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 12:12:05 +02:00
Adam	fd03b89333	fix(agent-route): reject unsafe attachment filenames to prevent path traversal Filenames in forwardAttachedFiles arrived from the source agent's messages_out content and were used directly in path.join on both source outbox read and target inbox write. A value like `../evil.sh` could escape `inbox/<a2a-id>/` on the target session (and similarly the source outbox on read), breaking session isolation — an adversarial or hallucinating sub-agent could overwrite files in a sibling session. Adds isSafeAttachmentName(name) — exported so it's unit-testable — which rejects empty, `.`, `..`, anything containing `/`, `\`, or NUL, and anything path.basename would strip. Guard runs before any I/O. Unsafe names are dropped with a warning log, same pattern as missing-source-file handling; a bad filename in one attachment doesn't kill the whole route's text delivery. Addresses Codex Review P1 on qwibitai/nanoclaw#1967. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:45:08 +10:00
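A guard with the rejection rules listed in fd03b89333 might look like the sketch below. The function name is taken from the commit message; the body is an assumption reconstructed from the description, not the shipped code.

```typescript
import * as path from "node:path";

// Returns true only for a bare filename that cannot escape its directory
// when passed to path.join.
function isSafeAttachmentName(name: string): boolean {
  // Reject empty names and the directory self/parent references.
  if (name === "" || name === "." || name === "..") return false;
  // Reject both separator styles and NUL bytes.
  if (name.includes("/") || name.includes("\\") || name.includes("\0")) return false;
  // Reject anything path.basename would alter.
  return path.basename(name) === name;
}

for (const candidate of ["x.png", "../evil.sh", "", "a\\b", ".."]) {
  console.log(JSON.stringify(candidate), isSafeAttachmentName(candidate));
}
```

Running the check before any I/O means an unsafe name can be dropped with a warning while the rest of the route continues.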
Adam	672e228876	fix(agent-route): forward file attachments between agents Before: `send_file(to='parent')` from a sub-agent wrote the bytes to the sub-agent's own session outbox, but agent-to-agent routing copied only the content JSON — the target's inbound message referenced `files: ['x.png']` but the bytes lived in a session directory the target couldn't mount. Parent agents orchestrating sub-agents (e.g. Design Team delegating illustration work to an Illustrator sub-agent on Codex) received file-reference messages with nothing to forward. Fix: on route, if the source's content has `files`, copy each referenced file from `<source>/outbox/<src-msg-id>/` to `<target>/inbox/<a2a-msg-id>/`, and emit `attachments` (the existing formatter convention — see formatter.ts:223) with `localPath` relative to `/workspace/`. The target formatter already renders these as `[file: <name> — saved to /workspace/inbox/<a2a-id>/<name>]`, so the target agent sees the path and can call `send_file(path=…, to=…)` to forward onward. Convention matches what session-manager.ts:256 already does for base64-encoded channel-inbound attachments — same inbox layout, same content shape. Nothing on the formatter/agent side needed to change. ## Scope - `forwardAttachedFiles(source, target)` — pure-ish helper that copies files and returns the attachments array. - `forwardFileAttachments(msg, …)` — wraps the helper for the route path: parses content, copies files if present, merges into any existing `attachments`, re-serialises. - `routeAgentMessage` — uses the rewritten content when writing the target's inbound row. - Log line now includes `forwardedFileCount` for observability. Missing source files are skipped with a warning rather than killing the route — a bad filename in a batch shouldn't drop the accompanying text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:34:29 +10:00
exe.dev user	5845a5a980	fix(container-runner): honor agent_provider DB columns with session override resolveProviderContribution read only containerConfig.provider (from each group's container.json) and ignored both agent_groups.agent_provider and sessions.agent_provider. The provider-install skills (opencode, codex) and CLAUDE.md document those DB columns as the source of truth with session-overrides-group precedence, but the code never consulted them — so setting `agent_provider = 'codex'` on a group had no effect, and the only way to route to a non-default provider was to edit the per-group JSON directly. Discovered while wiring up Codex: DB update landed but the spawned container kept running Claude. Extract a pure `resolveProviderName(session, group, containerConfig)` with the documented precedence: sessions.agent_provider → agent_groups.agent_provider → container.json `provider` → 'claude' `resolveProviderContribution` now calls it. The container.json fallback stays so existing installs that only set provider in JSON keep working. Empty strings treated as unset to avoid footguns when a DB-backed form writes '' for "no override." Added unit tests covering precedence, null-fallthrough, empty-string fallthrough, and case normalization. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:47:10 +00:00
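The precedence chain from 5845a5a980 can be illustrated with a small pure function. The name `resolveProviderName` and the argument order come from the commit message; the type shapes, the trimming, and lower-casing as the form of "case normalization" are assumptions.

```typescript
// Hypothetical minimal shapes; the real row types carry more fields.
type ProviderRow = { agent_provider?: string | null };
type ContainerConfig = { provider?: string | null };

function resolveProviderName(
  session: ProviderRow | null,
  group: ProviderRow | null,
  containerConfig: ContainerConfig | null,
): string {
  // Documented precedence: session, then group, then container.json.
  const candidates = [session?.agent_provider, group?.agent_provider, containerConfig?.provider];
  for (const candidate of candidates) {
    // Empty strings and null are treated as "unset" and fall through.
    const normalized = (candidate ?? "").trim().toLowerCase();
    if (normalized !== "") return normalized;
  }
  return "claude"; // default provider
}

console.log(resolveProviderName({ agent_provider: "" }, { agent_provider: "Codex" }, null));
```

In the usage line the session override is an empty string, so resolution falls through to the group value.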
gavrielc	c1d0395d11	Merge branch 'main' into main	2026-04-23 23:04:35 +03:00
gavrielc	f351e46008	refactor(approvals): persist title+options on channel/sender approval tables getAskQuestionRender used to hardcode the card title and option labels for pending_channel_approvals and pending_sender_approvals in the DB-access layer, duplicating wording that already lived in the approval modules. That caused a visible drift between the initial card title — picked per event in channel-approval.ts ("📣 Bot mentioned in new chat" vs. "💬 New direct message") — and the post-click render, which always showed the constant "📣 Channel registration". Mirror the pattern already used by pending_approvals: add title / options_json columns on both pending_*_approvals tables via migration 013, have the approval modules write them at creation time, and let getAskQuestionRender just SELECT. - Migration 013 ALTERs the two tables to add title + options_json. - PendingChannelApproval / PendingSenderApproval types and their create functions grow the two fields. - channel-approval.ts / sender-approval.ts normalize options once and pass both title and options_json into the insert. - getAskQuestionRender drops the hardcoded render objects and reads the stored values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:54:47 +03:00
gavrielc	ffd38f660a	Merge branch 'main' into fix/pending-rows-idempotent	2026-04-23 22:37:22 +03:00
exe.dev user	97868af5a7	fix(delivery): make pending_questions/approvals insert idempotent createPendingQuestion and createPendingApproval both run before the adapter delivery call. When delivery fails and the retry loop reinvokes deliverMessage with the same questionId/approvalId, the second attempt hit UNIQUE constraint on the pending_questions.question_id (or pending_approvals.approval_id) and threw — so the retry never reached the send step, and every subsequent retry failed the same way until max-attempts marked the message permanently failed. Switch both inserts to INSERT OR IGNORE. Return bool indicating whether a new row was actually inserted so delivery.ts can avoid logging "Pending question created" twice for the same card. Symptom that surfaced this: a send-layer ValidationError on one attempt followed by SqliteError on every subsequent attempt, with the user seeing neither the card nor a follow-up. Seen in conjunction with the Telegram 64-byte callback_data limit (fixed separately in #1942/chat-sdk-bridge), but the idempotency gap applies to any transient delivery failure — rate limits, network blips, adapter 5xx — and is worth fixing on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 17:05:41 +00:00
exe.dev user	ff277c0d49	fix(chat-sdk-bridge): encode option index in callback_data for Telegram 64-byte cap ask_question cards failed to deliver on Telegram whenever any option had a non-trivial value (e.g. an ISO datetime, a URL, or a long token). Telegram limits inline-keyboard callback_data to 64 bytes, and the previous encoding embedded both the questionId and the full option value in each button's actionId plus a second copy as value, producing payloads well over the cap. The adapter threw ValidationError, delivery was marked permanently failed, and the agent sat waiting on an answer that never reached the user. Fix: - Button id is now `ncq:<questionId>:<index>` and button value is the stringified index. Callback payloads shrink from ~100 bytes to ~40 and fit Telegram's cap for any option list with <100 items. - Both callback-decode sites (Chat SDK `onAction` for Telegram/Slack/etc., and the Discord Gateway interaction handler) resolve the index back to the real option value via `getAskQuestionRender(questionId)` before dispatching to the host's onAction — so response handlers (pending_questions, pending_approvals) are unchanged and still receive the canonical value. - `resolveSelectedOption` helper has a backward-compat fallback: non-numeric tails are treated as literal values so any card delivered under the old encoding still resolves if the user clicks it after deploy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:56:21 +00:00
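The index-based encoding in ff277c0d49 can be sketched roughly as below. `resolveSelectedOption` and the `ncq:<questionId>:<index>` format come from the commit message; `encodeButtonId`, the signatures, and the out-of-range fallback are assumptions made for illustration.

```typescript
// Hypothetical helper: builds the short button id that goes into callback_data.
function encodeButtonId(questionId: string, index: number): string {
  return `ncq:${questionId}:${index}`;
}

// Maps the tail of a callback back to the canonical option value.
function resolveSelectedOption(tail: string, options: string[]): string {
  if (/^\d+$/.test(tail)) {
    const option = options[Number(tail)];
    if (option !== undefined) return option;
  }
  // Backward-compat path: cards sent under the old encoding carry the literal value.
  return tail;
}

const options = ["2026-04-24T17:00:00+03:00", "https://example.com/a/very/long/url"];
const id = encodeButtonId("q-7f3a9c", 1);
console.log(id, new TextEncoder().encode(id).length, resolveSelectedOption("1", options));
```

The id length now depends only on the question id and the index digits, not on the option text, which is what keeps it under Telegram's 64-byte cap.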
Gabi Simons	a8eb82d529	Merge branch 'main' into main	2026-04-23 18:24:24 +03:00
exe.dev user	237876c2c6	chore(format): wrap session-manager import in container-runner Pre-commit prettier reformatted this in the working tree but didn't re-stage. Keeping it in a separate commit to avoid amending a prior commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:12:56 +00:00
exe.dev user	209061f54f	fix(sweep): wake before reset + idempotent retry for orphan claims When a container exits with an unresolved processing_ack claim, the sweep's crashed-container cleanup would reset the matching inbound message with tries++ and a future process_after. dueCount then dropped to 0, so the wake step never fired — and the next sweep tick found the same orphan claim, bumped tries again, and pushed process_after further out. The message reached MAX_TRIES and was marked failed without any container ever being spawned. Two changes: 1. Reorder sweep so the wake step runs before crashed-container cleanup. A fresh container clears orphan 'processing' rows on its own startup (container/agent-runner/src/db/connection.ts), so once we get it running the claim resolves itself. 2. Make resetStuckProcessingRows idempotent: if a message already has process_after set to a future time, skip the retry bump. The wake path will pick it up when the backoff elapses. Requires returning process_after from getMessageForRetry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:12:16 +00:00
exe.dev user	bee80b0072	fix(container): clear orphan heartbeat before spawn After a container exits, its .heartbeat file is left behind with the mtime of its last SDK activity. When the same session spawns a new container, the host sweep's ceiling check reads that stale mtime and kills the freshly-spawned container within seconds — before the new instance has had time to touch the file itself. The sweep already has a carve-out for "no heartbeat file" (treated as a fresh spawn, given grace), so simply removing the orphan at spawn time restores the intended semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:12:02 +00:00
gavrielc	dd5bc85b02	refactor(skill/atomic-chat-tool): ship MCP file in skill folder, revert src edits The initial /add-atomic-chat-tool merge added src edits directly to main. That conflicts with the utility-skill pattern used elsewhere (e.g. /claw): the skill folder should ship the file and SKILL.md should instruct copy + idempotent edits at install time, not a git merge that carries src diffs. - Move container/agent-runner/src/atomic-chat-mcp-stdio.ts → .claude/skills/add-atomic-chat-tool/atomic-chat-mcp-stdio.ts - Revert the atomic_chat mcpServers entry in agent-runner index.ts - Revert mcp__atomic_chat__* from TOOL_ALLOWLIST in providers/claude.ts - Revert ATOMIC_CHAT_* env forwarding and [ATOMIC] log elevation in src/container-runner.ts - Empty .env.example back out - Rewrite SKILL.md: copy the shipped file, then apply deterministic Edits (index.ts, providers/claude.ts, container-runner.ts, .env.example) with exact before/after snippets the installer agent can match. Main is now back to its pre-PR state for the tool; /add-atomic-chat-tool re-applies everything at install time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:29:10 +03:00
Misha Skvortsov	3a9b98f1a4	feat: add Atomic Chat MCP tool skill Exposes local Atomic Chat models (OpenAI-compatible API at 127.0.0.1:1337/v1) as tools to the container agent. Adds atomic_chat_list_models and atomic_chat_generate alongside the existing Ollama skill. Rebased on current main: - MCP server registered in agent-runner index.ts using bun (no tsc step in-image), sibling path to index.ts, env: {} with ATOMIC_CHAT_* forwarded when set. - allowedTools entry moved to providers/claude.ts TOOL_ALLOWLIST. - SKILL.md: drop obsolete per-group copy step (single RO mount supersedes it); use pnpm build. Made-with: Cursor Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:18:34 +03:00
exe.dev user	40f5683c36	fix(approvals): show correct post-click labels on channel/sender cards getAskQuestionRender only checked pending_questions and pending_approvals, missing the channel and sender approval tables. Approval button clicks showed the raw value ("approve") instead of the selectedLabel ("✅ Wired"). Extend the lookup to also check pending_channel_approvals and pending_sender_approvals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 12:23:45 +00:00
exe.dev user	15f30682d7	fix(approvals): show human-readable names in approval cards Channel and sender approval cards showed raw platform IDs (e.g. discord:1475578393738219540:...) instead of readable context. Extract sender name from the event content for channel approvals, and use the channel type name for sender approvals. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 12:23:34 +00:00
exe.dev user	d121cd1cd6	fix(router): pass isGroup from adapter through to messaging group creation The router hardcoded is_group=0 when auto-creating messaging groups, causing channel mentions to be misclassified as DMs. The Chat SDK bridge knows which handler fired (onDirectMessage vs onNewMention) so thread the signal through InboundMessage → InboundEvent → router. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 12:23:23 +00:00
exe.dev user	61ca43d193	fix(discord): resolve user ID from DM interactions for approval clicks Discord puts the clicking user at interaction.member.user for guild interactions but interaction.user for DM interactions. The Gateway handler only checked interaction.member, so DM button clicks resolved to an empty user ID and were silently rejected as unauthorized. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-23 12:23:12 +00:00
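The lookup order described in 61ca43d193 amounts to a two-step fallback. The sketch below uses a hypothetical, trimmed-down interaction type and a hypothetical function name; only the `member.user` and `user` fields are taken from the commit message.

```typescript
// Minimal assumed shape of a Discord interaction payload.
type InteractionLike = {
  member?: { user?: { id: string } }; // present for guild interactions
  user?: { id: string }; // present for DM interactions
};

function resolveInteractionUserId(interaction: InteractionLike): string | undefined {
  // Guild clicks carry the user under member; DM clicks carry it at the top level.
  return interaction.member?.user?.id ?? interaction.user?.id;
}

console.log(resolveInteractionUserId({ member: { user: { id: "111" } } }));
console.log(resolveInteractionUserId({ user: { id: "222" } }));
```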
Lazer Cohen	2383bde80f	fix(container): scope orphan reaper by install label so peers don't kill each other Two installs on the same host could trash each other's containers: the reaper used `docker ps --filter name=nanoclaw-`, a substring match that picked up every install's containers. A crash-looping peer (e.g. a legacy v1 plist respawning ~6k times) would call cleanupOrphans on every boot and kill the healthy install's session containers within seconds of spawn. - Stamp `--label nanoclaw-install=<slug>` onto every spawned container. - cleanupOrphans filters by that label; healthy peers are left alone. - Setup preflight enumerates `com.nanoclaw*` launchd plists / nanoclaw user systemd units, probes state/runs, and unloads any that are crash-looping (state != running AND runs > 10) before installing this install's service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 12:12:30 +03:00
gavrielc	5f1b3e5cad	style: apply prettier formatting to install-slug additions	2026-04-23 10:10:48 +03:00
gavrielc	7a9401ddf2	feat(setup): per-checkout service name and docker image tag Two NanoClaw installs on the same host used to fight over the shared `com.nanoclaw` launchd label / `nanoclaw.service` systemd unit and the `nanoclaw-agent:latest` docker tag — the second install silently rewrote the service pointer and rebuilt the image out from under the first. Introduces a deterministic per-checkout slug (sha1(projectRoot)[:8]) and namespaces everything off it: - Service: `com.nanoclaw-v2-<slug>` / `nanoclaw-v2-<slug>.service` - Image: `nanoclaw-agent-v2-<slug>:latest` (base), `nanoclaw-agent-v2-<slug>:<agentGroupId>` (per-group) New shared helpers: src/install-slug.ts (host) + setup/lib/install-slug.sh (bash). Both compute the same slug so verify/probe/add-*.sh/build.sh/container-runner all agree. Any v1 `com.nanoclaw` service left on the host stays untouched and can coexist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 10:10:09 +03:00
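The slug derivation in 7a9401ddf2, sha1(projectRoot)[:8], can be sketched with Node's crypto module. The function names and the returned object shape are hypothetical, and whether the real helper normalizes the path before hashing is not stated in the log; this sketch hashes the string as given.

```typescript
import { createHash } from "node:crypto";

// First 8 hex characters of the SHA-1 of the checkout path.
function installSlug(projectRoot: string): string {
  return createHash("sha1").update(projectRoot).digest("hex").slice(0, 8);
}

// Names derived from the slug, following the patterns listed in the commit.
function installNames(projectRoot: string) {
  const slug = installSlug(projectRoot);
  return {
    launchdLabel: `com.nanoclaw-v2-${slug}`,
    systemdUnit: `nanoclaw-v2-${slug}.service`,
    baseImage: `nanoclaw-agent-v2-${slug}:latest`,
  };
}

console.log(installNames("/home/user/nanoclaw"));
```

Since the slug is a pure function of the path, the host helper and the bash helper can agree as long as both hash the same string.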
gavrielc	3b8240a91b	refactor(self-mod): drop request_rebuild — approvals now bundle rebuild+restart install_packages and add_mcp_server already did the right thing on approve (install auto-rebuilt+killed, add_mcp_server just killed), so request_rebuild was redundant plumbing agents sometimes called after an install — wasting an admin approval round-trip. Delete it end-to-end: - container/agent-runner/src/mcp-tools/self-mod.ts: remove requestRebuild tool + registration; update install_packages description. - src/modules/self-mod/{request,apply,index}.ts: drop handleRequestRebuild + applyRequestRebuild + registrations; rewrite the rebuild-failed notify to point admins at retrying install_packages instead. - src/modules/{approvals,self-mod}/{agent,project}.md and skill/self-customize/SKILL.md: scrub agent-facing references; clarify that add_mcp_server needs no rebuild (bun runs TS directly). - docs/{module-contract,architecture-diagram,checklist,db-central,shared-source,v1-vs-v2/*}.md, CLAUDE.md, pending-approvals migration comment, approvals/index.ts docstring, REFACTOR.md: trailing references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:28:36 +03:00
gavrielc	e64bdb3016	refactor(claude-md): split shared base into module fragments, inject name at runtime Move every agent-specific instruction out of the shared container/CLAUDE.md so the base is genuinely universal. Persona/identity now comes from the system-prompt addendum (buildSystemPromptAddendum now takes assistantName and prepends "# You are {name}"). Per-module instructions live alongside each MCP tool source: container/agent-runner/src/mcp-tools/core.instructions.md container/agent-runner/src/mcp-tools/scheduling.instructions.md container/agent-runner/src/mcp-tools/self-mod.instructions.md composeGroupClaudeMd() scans that directory and emits `module-<name>.md` fragments as symlinks to /app/src/mcp-tools/<name>.instructions.md (valid via the existing RO source mount). Skill fragments renamed to `skill-<name>.md` for naming consistency with `module-` and `mcp-`. Mount tightening so composer-managed files can't be clobbered by agent writes: nested RO mounts for /workspace/agent/CLAUDE.md and /workspace/agent/.claude-fragments/. CLAUDE.local.md (per-group memory) stays RW as the only writable CLAUDE.md-family file. .gitignore: ignore CLAUDE.local.md, .claude-shared.md, .claude-fragments/ everywhere, and simplify groups/ rules to ignore the whole tree (per-installation state, not tracked). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:14:51 +03:00
gavrielc	95e74d8383	docs(onecli): expand secrets section; correct stale admin-roles refs Document the selective-mode gotcha for auto-created OneCLI agents (no secrets injected by default) with the CLI commands to inspect and fix it. Note that approval policies are not configurable via the SDK or `onecli@1.3.0` CLI — web UI only. Replace stale `NANOCLAW_ADMIN_USER_IDS` / `src/access.ts` references across CLAUDE.md, docs/architecture.md, docs/checklist.md, and docs/module-contract.md. Admin gating now runs host-side in src/command-gate.ts against `user_roles`; approver picks live in src/modules/approvals/primitive.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 16:46:17 +03:00
gavrielc	3db66c0ced	fix: forward ONECLI_API_KEY to OneCLI SDK for authenticated container config Ports the v1 fix from PR #1777 (originally `8b5b581` by @johnnyfish). Cherry-pick did not apply cleanly because v2 reformatted the surrounding code and split OneCLI usage into two sites — manual port was needed. v2-specific adaptations: - Also forward apiKey at the second OneCLI call site in src/modules/approvals/onecli-approvals.ts (v2 split the approvals module out of container-runner). - Skipped the companion test-mock commit (`38163bc`) — it patches src/container-runner.test.ts, which no longer exists in v2 (tests consolidated into host-core.test.ts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-Authored-By: johnnyfish <jonathanfishner11@gmail.com>	2026-04-22 15:16:59 +03:00
gavrielc	8e1c8f8f61	style: apply prettier formatting to touched files Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 14:57:09 +03:00
gavrielc	c8fc1da719	refactor(claude-md): compose per-group CLAUDE.md from shared base + fragments Replace the per-group "written once at init, owned by the group" CLAUDE.md with a host-regenerated entry point that imports: - a shared base (`container/CLAUDE.md` mounted RO at `/app/CLAUDE.md`) - optional per-skill fragments (skills that ship `instructions.md`) - optional per-MCP-server fragments (inline `instructions` field in `container.json`) - per-group agent memory (`CLAUDE.local.md`, auto-loaded by Claude Code) Principle: RW = per-group memory, RO = shared content. Source/skills/base are shared; personality, config, working files, and Claude state stay per-group. Key changes: - New `src/claude-md-compose.ts` — per-spawn composition + `migrateGroupsToClaudeLocal()` one-time cutover. - New `container/CLAUDE.md` — shared base, seeded verbatim from the former `groups/global/CLAUDE.md`. - `src/container-runner.ts` — swap `/workspace/global` mount for RO `/app/CLAUDE.md`; call `composeGroupClaudeMd()` after `initGroupFilesystem()`. - `src/group-init.ts` — drop `.claude-global.md` symlink + initial `CLAUDE.md` write; seed `CLAUDE.local.md` from `opts.instructions`. - `src/index.ts` — call `migrateGroupsToClaudeLocal()` at startup. - `src/container-config.ts` — add optional `instructions` field to `McpServerConfig` (inline per-MCP guidance fragment). - `container/Dockerfile` — drop dead `/workspace/global` mkdir. - Remove obsolete `scripts/migrate-group-claude-md.ts`. Migration (runs once at host startup, idempotent): - Delete `.claude-global.md` symlinks in each group. - Rename each `groups/<folder>/CLAUDE.md` → `CLAUDE.local.md` (preserves existing per-group content as memory). - Delete `groups/global/` directory. Design docs: `docs/claude-md-composition.md` and `docs/shared-source.md` (the latter is the sibling design discussion this refactor builds on). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 12:58:43 +03:00
exe.dev user	8a12fa61ac	refactor: shared source — replace per-group agent-runner copies with single RO mount Replace the per-group agent-runner-src copy model with a single shared read-only mount. Source and skills are now RO + shared; personality, config, working files, and Claude state stay RW + per-group. Key changes: - Mount container/agent-runner/src/ RO at /app/src (all groups share one copy) - Mount container/skills/ RO at /app/skills; per-group skill selection via symlinks in .claude-shared/skills/ based on container.json "skills" field - Mount container.json as nested RO bind on top of RW group dir - Move all NANOCLAW_* env vars to container.json (runner reads at startup) - New runner config.ts module replaces process.env reads - Move command gate (filtered/admin) from container to host router - Dockerfile: remove source COPY, split CLI installs (claude-code last), move agent-runner deps above CLIs for better layer caching - Add writeOutboundDirect for router denial responses - Design doc at docs/shared-src.md Not included (follow-up): DB migration to drop agent_provider columns, cleanup of orphaned agent-runner-src directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-22 12:58:43 +03:00
gavrielc	1858ef35f0	Merge pull request #1908 from qwibitai/setup-auto feat(setup): scripted branded setup flow (nanoclaw.sh)	2026-04-22 03:06:30 +03:00
gavrielc	416fe01855	refactor(setup): drop CLI-bonus wiring from init-first-agent init-first-agent used to double-wire the CLI channel to every new DM agent as a convenience for `pnpm run chat`, gated by --no-cli-bonus. With the /new-setup-2 flow gone and a dedicated scratch CLI agent created earlier in setup:auto, that bonus just stomps on CLI routing the user already set up. Remove the CLI_CHANNEL/CLI_PLATFORM_ID constants, ensureCliMessagingGroup, the --no-cli-bonus flag, and the cli-bonus wiring block. Pass the paired user's identity through to the welcome delivery so the sender resolver sees the real owner (e.g. telegram:<id>) instead of cli:local. Extend the CLI channel's admin-transport payload to accept optional sender/senderId overrides — falls back to the old cli/cli:local defaults when omitted, so existing callers are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:13:22 +03:00
Dave Kim	91c668e0cc	fix: persist SDK session_id on init + split long messages before adapter truncation Two related bugs that surfaced together when a Discord response exceeded 2000 chars: 1. Session id lost on mid-turn container exit. `runPollLoop` was calling `setStoredSessionId` only after `processQuery` returned. If the container died between the SDK's `init` event (where session_id arrives) and the stream completing, the id was never persisted. The next wake called `getStoredSessionId()` → undefined and started a fresh Claude session, dropping all prior context. Fix: persist immediately in the `init` branch inside `processQuery`. The existing post-query store becomes a harmless no-op. 2. Silent truncation past adapter limits. `chat-sdk-bridge.deliver` handed full text straight to `adapter.postMessage`. Discord's adapter hard-truncates at 2000 chars; Telegram's at 4096. Responses longer than that were cut off without any signal to the user or host. Fix: add `maxTextLength` to `ChatSdkBridgeConfig` and a `splitForLimit` helper that breaks on paragraph → line → hard-char boundaries, then posts chunks sequentially. Files ride on the first chunk; the returned id is the first chunk's so edits and reactions still target the reply head. Channel adapter files (Discord, Telegram, …) live on the `channels` branch — a companion PR wires `maxTextLength: 1900` for Discord and `4000` for Telegram so the splitter actually engages in those installs. Without wiring, behavior is unchanged.	2026-04-21 13:04:57 +00:00
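The paragraph → line → hard-char fallback that 91c668e0cc describes for `splitForLimit` can be sketched as below. The function name is from the commit message; the signature and the exact boundary handling are assumptions, and the sketch counts UTF-16 code units rather than whatever unit each adapter enforces.

```typescript
function splitForLimit(text: string, maxLength: number): string[] {
  if (maxLength <= 0) throw new RangeError("maxLength must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLength) {
    const head = rest.slice(0, maxLength);
    // Prefer the last paragraph break inside the window...
    let cut = head.lastIndexOf("\n\n");
    let skip = 2;
    if (cut <= 0) {
      // ...then the last line break...
      cut = head.lastIndexOf("\n");
      skip = 1;
    }
    if (cut <= 0) {
      // ...and fall back to a hard cut at the limit.
      cut = maxLength;
      skip = 0;
    }
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut + skip); // drop the separator that was used as the break
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

console.log(splitForLimit("first paragraph\n\nsecond paragraph", 20));
```

Every chunk is at most `maxLength` long, so posting the chunks sequentially stays under the adapter's truncation threshold when the configured limit is set slightly below it, as the companion PR does with 1900 for Discord.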
gavrielc	01ffce6f74	Revert "fix(permissions): welcome new approved channels via /welcome, route to them" This reverts commit `9776dd4f32`.	2026-04-21 15:20:06 +03:00
Koshkoshinsk	9776dd4f32	fix(permissions): welcome new approved channels via /welcome, route to them When the unknown-channel approval flow completes, seed a /welcome task into the newly-wired session so the agent greets the new user on first contact. The replayed /start (Telegram's default first-message) is filtered by the agent-runner's command filter, so without an explicit onboarding trigger the first interaction produced nothing. Pin the destination by its local_name from agent_destinations to avoid the agent picking the wrong named destination (previously it greeted the owner, whose DM is in CLAUDE.md). Also guard dispatchResultText against echoing trailing status lines when the agent has already sent messages explicitly via send_message. Otherwise a task-triggered flow that calls send_message then emits "welcome message sent" produces a duplicate chat to the recipient. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 11:40:12 +00:00
gavrielc	d8d61d3695	fix: Teams user-id prefix + defer cli:local owner grant parseUserId now falls back to user.kind when the id prefix isn't a registered adapter — Teams uses `29:` rather than `teams:`, so the literal prefix wouldn't resolve the channel adapter for cold DMs. init-cli-agent no longer claims the first-owner slot on `cli:local`. The CLI identity is scratch; owner promotion belongs to init-first-agent once the real channel user is wired.	2026-04-21 10:16:13 +03:00
gavrielc	0f6a1ba1ed	style: apply prettier formatting to touched files Pre-commit hook reflowed imports on files changed in the previous commit. Unrelated format drift on other files intentionally left unstaged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 23:31:42 +03:00
gavrielc	6c26c0413a	feat(router,cli): replyTo override + CLI admin-transport flows - InboundEvent gains an optional replyTo; router stamps the row's address fields from it when set, so replies can route to a different channel than the one the inbound came in on. - ChannelSetup adds onInboundEvent for admin-transport adapters that build the full event themselves. - CLI wire format accepts {text, to, reply_to}. Routed messages go through onInboundEvent and do not evict an active chat client. - init-first-agent hands the DM welcome to the running service via data/cli.sock — synchronous wake, no sweep wait. Fails loudly if the service is down; no silent fallback. - Split the CLI scratch-agent bootstrap into scripts/init-cli-agent.ts; init-first-agent is DM-only. Agents cannot set replyTo: it lives only on the inbound/router seam and is consumed once when writing messages_in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 23:30:47 +03:00
gavrielc	719f97e483	feat(permissions): unknown-channel registration flow with owner approval When the router sees a mention or DM on a messaging group that isn't wired to any agent, it now escalates to an owner for approval instead of silently dropping. Mirrors the existing unknown-sender approval pattern (ACTION-ITEMS item 22). Schema (migration 012): - `messaging_groups.denied_at TEXT NULL` — timestamp set on deny so future mentions stop escalating. ALTER TABLE ADD COLUMN, FK-safe (unlike the rebuild that bit migration 011). - `pending_channel_approvals` — PK on `messaging_group_id` gives free in-flight dedup. One card per channel, no spam on rapid retries. Router: - New hook `setChannelRequestGate(mg, event) => Promise<void>`, invoked from the no-wirings branch when the message was addressed to the bot (isMention=true). Hook is fire-and-forget. - Checks `mg.denied_at` before escalating — denied channels drop silently and do not re-prompt. - The two "no-wirings" branches (fresh auto-create and existing mg with no agents) are consolidated into one escalation path that calls the gate once. Without the module, behavior is log + record (no regression). Permissions module: - `channel-approval.ts::requestChannelApproval` — MVP picker: target agent is `getAllAgentGroups()[0]`, card names it explicitly ("Wire it to <Andy>?"). Approver via existing `pickApprover` + `pickApprovalDelivery` primitives. - Response handler: same click-auth pattern as sender-approval (clicker must be the designated approver OR have admin privilege over the target agent group). - Approve defaults per the feature spec: engage_mode = 'mention-sticky' for groups, 'pattern' + '.' for DMs sender_scope = 'known' ignored_message_policy = 'accumulate' session_mode = 'shared' DM vs group inferred from the original event's threadId (non-null → group) because the auto-created mg has a placeholder is_group=0 until the adapter fills it in. 
- Triggering sender is auto-added to agent_group_members so sender_scope= 'known' doesn't bounce the replayed message into a sender-approval cascade. - Deny: stamps messaging_groups.denied_at, clears pending row. - Failure modes — no owner, no agent groups, no reachable DM — log and drop without creating a pending row, letting a future attempt try again (same as sender-approval). 9 new integration tests cover every branch: mention triggers card, DM triggers card, dedup, approve creates correct wiring + admits sender + replays, approve-on-DM uses pattern/'.' defaults, deny sets denied_at and future mentions drop silently, unauthorized clicker rejected, no-owner drops, no-agent-groups drops. 168 tests pass (was 159; +9). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:34:00 +03:00
gavrielc	a4061a0012	refactor(channels,router): move all policy to router; bridge is transport Follow-up to `b159722`. That shrank the bridge's shouldEngage to a flood gate + coarse sticky-subscribe signal. This completes the move — policy lives exclusively in the router, the bridge is transport-only, and the conversations map + ChannelSetup.conversations + ChannelAdapter.updateConversations are all gone. Key shifts: 1. Subscribe moves from bridge to router. Bridge used to call `thread.subscribe()` from its onNewMention / onDirectMessage handlers based on a coarse "any mention-sticky wiring exists on this channel" check. That forced the decision before the router could apply per-wiring engage logic, and it relied on the conversations map being current (staleness risk). ChannelAdapter gains `subscribe?(platformId, threadId)`. The Chat SDK bridge implements it via SqliteStateAdapter.subscribe(threadId) (idempotent — a repeat call on an already-subscribed thread is a no-op). The router's fan-out loop calls it once per message when the first mention-sticky wiring actually engages. Precise, not coarse. 2. Short-circuit the drop path with one combined query. New `getMessagingGroupWithAgentCount(channelType, platformId)` does the messaging_groups lookup AND counts wirings in a single SELECT, using the existing UNIQUE(channel_type, platform_id) index on messaging_groups and UNIQUE(messaging_group_id, agent_group_id) on messaging_group_agents for the JOIN. No new indexes needed. routeInbound now short-circuits: - No messaging_groups row AND not addressed (no mention/DM) → return silently. One DB read, nothing written. This is the Discord-bot-in-a-big-guild case; we no longer auto-create rows for every plain message in every channel the bot can see. - Messaging group exists but no wirings AND not addressed → return silently. One DB read. - Otherwise fall through to sender resolution + fan-out as before. 
Behavioral change: plain chatter on unwired channels no longer gets dropped_messages audit rows, which used to bloat the table. Audit still fires on addressed-to-bot drops where the admin cares ("someone @-mentioned us but nobody's wired"). 3. Bridge is now purely transport. Deleted entirely: ConversationConfig, ChannelSetup.conversations, ChannelAdapter.updateConversations?, bridge's `conversations` map, buildConversationMap, shouldEngage, EngageSource, engageDecision, bridge.updateConversations method, src/index.ts buildConversationConfigs. Four handlers reduce to "resolve channel id, build InboundMessage with isMention, call onInbound". Net ~130 LOC deleted from the bridge. Collateral: the conversations-map staleness problem is gone. The upcoming channel-registration feature doesn't need any map-refresh plumbing — when an approval creates a new wiring, the next message hits the DB fresh and just works. Bridge tests prune to the narrow platform-adjacent surface (openDM delegation, subscribe presence). Host-core test that asserted the old "auto-create on every unknown message" behavior updates to reflect the new escalation-gated semantics: plain messages on unknown channels don't auto-create, mentions do. 159 tests pass (was 172 — net -13, almost entirely from bridge-engage-mode tests that covered logic now owned by the router and exercised through host-core.test.ts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:55:49 +03:00
gavrielc	b15972284b	refactor(channels): shrink bridge shouldEngage to flood gate + subscribe signal Before this change the bridge and the router both owned engage_mode policy. Bridge's shouldEngage had a full switch over mention / mention-sticky / pattern + source-based rules + engage_pattern regex test + ignored_message_policy accumulate fallback. Router's evaluateEngage had the same switch against the same fields. Two parallel logic paths with subtle vocabulary differences (bridge: "which SDK handler fired"; router: "what isMention says"). Every time we touched one we had to reason about the other — the Telegram hasMention bug and the "pattern mode silently drops in group chats" bug were both drift between the two. Refactor to one place. Router keeps all per-wiring policy — engage mode, pattern regex, sender scope, ignored-message policy — unchanged. Bridge drops to a coarse flood gate + subscribe signal: - forward: does this channel have ANY wiring? Forward if yes. Unknown channels still forward for subscribed/mention/dm (they may be newly auto-created, or will trigger the coming channel-registration flow). Unknown channels DROP for new-message so we don't flood from every unsubscribed thread the bot happens to sit in. - stickySubscribe: any mention-sticky wiring on the channel AND the source is mention or dm. Coarse union — subscribe is idempotent and one call serves every sticky wiring. The `text` param on shouldEngage is gone (pattern regex lives in the router now). Four bridge handler sites simplify accordingly. messageToInbound still carries the SDK-confirmed isMention flag through to the router unchanged. Behavioral delta: pure-mention-wired channels (no pattern, no accumulate) will now see every plain group message reach the router before being dropped there, where before the bridge dropped at the transport boundary. 
Extra DB lookup per dropped message in this specific case; acceptable for the cleaner seam and can be optimized back at the bridge if it ever matters in practice. Bridge tests prune the 10 engage_mode-specific cases that covered logic now owned by evaluateEngage in the router (host-core.test.ts covers it end-to-end through routeInbound). Bridge tests keep only what's bridge-specific: the flood gate and the stickySubscribe coarse union. 172 tests pass (was 182 — net -10 redundant bridge tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:32:08 +03:00
gavrielc	68058cbc4a	fix(permissions): authorize unknown-sender approval clicks The approval click handler trusted row.approver_user_id as the actor regardless of who actually clicked the card. A random user who received the forwarded card could click Approve and get the stranger admitted to the agent group — their click was simply not checked. Separately, payload.userId arrives as the raw platform userId from Chat SDK onAction (e.g. "6037840640"), not the namespaced form ("telegram:6037840640") that matches users(id). Without namespacing, users-table lookups miss. Namespace the clicker id with payload.channelType, then authorize: the clicker must be either the designated approver OR have owner / admin privilege over the agent group (hasAdminPrivilege covers owner, global admin, scoped admin). Unauthorized clicks return true (claim the response so the registry doesn't log it as unclaimed) but take no action — the pending row stays in place so a legitimate approver can still act on it. Existing tests passed a pre-namespaced userId directly, masking the first bug. Fixed the fixtures to match production plumbing and added two tests: one asserts a random bystander's click is rejected (row stays pending, no member added), the other asserts a global admin can approve even when they weren't the designated approver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:16:35 +03:00
gavrielc	f74df3b0d3	fix(router): trust SDK isMention signal; drop broken hasMention regex The router's mention / mention-sticky engage check was regex-matching @<agent_group.name> (e.g. @Andy) against message text. Platforms don't work that way — users address bots via the bot's platform username (@nanoclaw_v2_refactr_1_bot on Telegram, user-id mentions on Slack / Discord). The regex matched only coincidentally and never on Telegram, so mention-mode wirings silently never fired there. Two parallel mention detectors existed: the Chat SDK's onNewMention, which correctly resolves the bot's platform identity, and the router's hasMention text regex, which ignored the SDK verdict and invented its own heuristic. The router's detector was wrong in principle — the agent group's display name is a NanoClaw-side nickname, not a platform address. Thread the SDK signal through: InboundMessage gains an optional `isMention` field, the bridge sets it from each handler (onNewMention → true, onDirectMessage → true, onSubscribedMessage → message.isMention, onNewMessage(/./) → false), src/index.ts forwards it into InboundEvent, and evaluateEngage now checks `isMention === true` for mention modes. hasMention deleted entirely — there is only one source of truth for "did the user mention this bot": the platform / SDK. Agent-name-in-text matching for disambiguating multiple agents wired to one chat is a separate feature; users can express it today with engage_mode='pattern' + the agent's name as the regex. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:16:20 +03:00
gavrielc	0105de0257	fix(host-sweep): skip ceiling check when heartbeat file is absent decideStuckAction treated a missing heartbeat file as heartbeatAge = Infinity, which always exceeded the 30-minute ceiling. Result: every freshly-spawned container got killed within seconds of spawn on the first sweep pass because it hadn't produced an SDK event yet (heartbeat is only touched on SDK events inside processQuery, not on boot). Skip the ceiling branch when heartbeatMtimeMs === 0. Containers that genuinely never wrote a heartbeat because they died are caught by the separate "container process not running" cleanup path. Containers that boot, claim a message, but hang at the gate are caught by the claim-stuck check below — which correctly fires regardless of heartbeat presence once claimAge exceeds tolerance. Updates the "absent heartbeat → kill-ceiling" test (which was encoding the bug) and adds a companion that the claim-stuck path still fires for absent-heartbeat containers with aged claims. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:15:52 +03:00
gavrielc	c38e5b11a8	fix(channels): wire accumulate mode through the bridge The router + session DB were already fully plumbed for ignored_message_policy='accumulate' — fan-out in routeInbound calls deliverToAgent(wake=false) for non-engaging agents on accumulate wirings, writeSessionMessage writes trigger=0, countDueMessages filters trigger=1, container formatter includes all messages regardless of trigger. But the Chat SDK bridge dropped non-engaging messages before the router ever saw them, so accumulate was dead on arrival for every adapter that goes through the bridge. Expose ignored_message_policy on ConversationConfig, project it in buildConversationConfigs, and widen shouldEngage's "forward" decision to "engage OR accumulate" with the union taken across all wirings on a conversation. stickySubscribe stays gated on a real engage — subscribing a thread we'd only silently accumulate on would misrepresent the bot's presence. shouldEngage return shape is now { forward, stickySubscribe } — engage was an internal concept the caller never needed, and conflating it with forward was the source of this bug. 7 new tests cover: non-engaging messages forwarding under accumulate, mixed drop/accumulate wirings taking the union, accumulate not triggering sticky subscribe, unknown-conversation drop precedence over accumulate, and drop policy preserving existing behavior on engaging messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 11:18:43 +03:00
gavrielc	ce25e1e97c	style(channels): prettier line-wrap in chat-sdk-bridge.test.ts Post-commit reformat picked up by format:fix hook on the previous commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 11:12:40 +03:00
gavrielc	52c6223292	fix(channels): register onNewMessage(/./) to fix pattern mode in group chats Chat SDK dispatch (per handling-events.mdx) is exclusive and prioritized: subscribed → onSubscribedMessage; unsubscribed + mention → onNewMention; unsubscribed + pattern match → onNewMessage. We never registered the third, so engage_mode='pattern' silently dropped every message in unsubscribed group threads — the SDK simply never surfaced them anywhere. Register chat.onNewMessage(/./, …) and route it through shouldEngage with a new 'new-message' source. Unknown-conversation policy drops for this source (would otherwise flood from every unwired channel the bot can see). mention / mention-sticky wirings ignore 'new-message' — they require an explicit @mention to start a conversation. Pattern wirings evaluate normally. Extracted shouldEngage from a closure to an exported function with an EngageSource type so it's unit-testable. Added 17 tests covering every source × engage-mode combination, unknown-conversation behavior, invalid regex fail-open, and multi-wiring union. Accumulate (ignored_message_policy='accumulate') is still not plumbed — the bridge drops non-engaging messages entirely instead of forwarding them as context-only. That requires a trigger: 0 | 1 field on InboundMessage → router → writeSessionMessage (schema already has the column). Separate change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 11:11:56 +03:00
gavrielc	57e0cda9e5	Revert "fix(channels): pre-subscribe group threads for pattern / accumulate wirings" This reverts commit `73b20880ff`.	2026-04-20 10:35:33 +03:00

366 Commits