Commit Graph

96 Commits

Author SHA1 Message Date
gavrielc
aeeb54a495 Merge pull request #2351 from qwibitai/feat/container-config-to-db
feat(db): move container config from filesystem to DB
2026-05-09 20:26:17 +03:00
gavrielc
aebcffe180 feat: per-group CLI scope (disabled/group/global)
Add cli_scope column to container_configs with three levels:
- disabled: agent never learns about ncl (instructions excluded from
  CLAUDE.md) and host dispatch rejects any cli_request
- group (default): agent can only access groups, sessions, destinations,
  and members resources, scoped to its own agent group with auto-filled
  --id/--agent_group_id/--group args. Help output reflects the scope.
- global: unrestricted access (current behavior)

Enforcement is host-side only — no image rebuild or env var needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-09 20:02:31 +03:00
gavrielc
be3a8a97c6 feat: race-free on-wake messages and explicit restart CLI
Decouple container restart from config updates — config CLI ops now only
write to the DB; restart is a separate `ncl groups restart` command with
--rebuild and --message flags. Add on_wake column to messages_in so wake
messages are only picked up by a fresh container's first poll, preventing
dying containers from stealing them during the SIGTERM grace window.
killContainer accepts an onExit callback for race-free respawn. Agent-
called restart auto-scopes to the calling session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-09 19:02:15 +03:00
MoBot
78cf2433a3 fix(container-runner): raise install_packages build timeout to 15min
The 5-minute timeout in buildAgentGroupImage was tight for first-time
apt + pnpm global installs on slow networks (the exact scenario
install_packages triggers, since the image hasn't pre-installed the
requested packages). Hit ETIMEDOUT on a real install with apt + npm
packages.

900_000ms gives realistic headroom without masking genuinely hung builds.
2026-05-08 16:10:59 -04:00
gavrielc
31ccc61b27 feat(db): move container config from filesystem to DB
Source of truth for container runtime config moves from
groups/<folder>/container.json to a new container_configs table.
The file becomes a materialized view written at spawn time.

- New container_configs table with scalar columns (provider, model,
  effort, image_tag, assistant_name, max_messages_per_prompt) and
  JSON columns (mcp_servers, packages_apt, packages_npm, skills,
  additional_mounts)
- Startup backfill seeds DB from existing container.json files
- materializeContainerJson() replaces readContainerConfig + ensureRuntimeFields
- Self-mod handlers (install_packages, add_mcp_server) write to DB
- Provider cascade simplified: session -> container_configs -> 'claude'
- ncl groups config-{get,update,add-mcp-server,remove-mcp-server,
  add-package,remove-package} custom operations
- restartAgentGroupContainers() helper for config change propagation
- Container side unchanged (still reads /workspace/agent/container.json)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-08 22:27:55 +03:00
gavrielc
d5b48e4742 fix(credentials): address review feedback
- wakeContainer now never throws — returns Promise<boolean>, catches
  internally. Closes the regression risk for the 5 awaited callers in
  agent-to-agent, interactive, and approvals/response-handler that the
  previous version left unwrapped. Router uses the boolean to stop the
  typing indicator on transient failure; host-sweep just awaits.
- Tighten AUTH_REQUIRED_RE: anchor to start-of-string with the specific
  `·` (U+00B7) separator the CLI uses, so an agent that quotes the
  banner mid-sentence in a normal reply doesn't trip the classifier.
- Log a one-line note from writeAuthRequiredMessage so substitutions
  are visible when debugging "user got the credentials message but I
  don't see why."
- Add unit tests for ClaudeProvider.isAuthRequired covering both banner
  variants, trailing content, mid-sentence quoting, leading-prose
  quoting, alternate separators, and unrelated text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:51:32 +03:00
gavrielc
5f34e26240 fix(credentials): translate auth errors and require OneCLI for spawn
Two related fixes for the case where credentials aren't usable:

1. Replace Claude Code's "Not logged in / Invalid API key · Please run
   /login" output with a host-aware message. The user can't run /login
   from chat, so the raw text is unhelpful. Provider gains an optional
   isAuthRequired() classifier; the poll-loop substitutes the message
   on both result-text and error paths.

2. Treat OneCLI gateway failure as a transient hard error instead of
   spawning a credential-less container. The catch in container-runner
   now propagates; router and host-sweep wrap wakeContainer to log and
   leave the inbound row pending so the next 60s sweep tick retries.
   Router also stops the typing indicator on failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:02:15 +03:00
exe.dev user
5845a5a980 fix(container-runner): honor agent_provider DB columns with session override
resolveProviderContribution read only containerConfig.provider (from each
group's container.json) and ignored both agent_groups.agent_provider and
sessions.agent_provider. The provider-install skills (opencode, codex)
and CLAUDE.md document those DB columns as the source of truth with
session-overrides-group precedence, but the code never consulted them —
so setting `agent_provider = 'codex'` on a group had no effect, and the
only way to route to a non-default provider was to edit the per-group
JSON directly. Discovered while wiring up Codex: DB update landed but
the spawned container kept running Claude.

Extract a pure `resolveProviderName(session, group, containerConfig)`
with the documented precedence:

    sessions.agent_provider
      → agent_groups.agent_provider
      → container.json `provider`
      → 'claude'

`resolveProviderContribution` now calls it. The container.json fallback
stays so existing installs that only set provider in JSON keep working.
Empty strings treated as unset to avoid footguns when a DB-backed form
writes '' for "no override."

Added unit tests covering precedence, null-fallthrough, empty-string
fallthrough, and case normalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:47:10 +00:00
exe.dev user
237876c2c6 chore(format): wrap session-manager import in container-runner
Pre-commit prettier reformatted this in the working tree but didn't
re-stage. Keeping it in a separate commit to avoid amending a prior
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:12:56 +00:00
exe.dev user
bee80b0072 fix(container): clear orphan heartbeat before spawn
After a container exits, its .heartbeat file is left behind with the
mtime of its last SDK activity. When the same session spawns a new
container, the host sweep's ceiling check reads that stale mtime and
kills the freshly-spawned container within seconds — before the new
instance has had time to touch the file itself.

The sweep already has a carve-out for "no heartbeat file" (treated as a
fresh spawn, given grace), so simply removing the orphan at spawn time
restores the intended semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:12:02 +00:00
Lazer Cohen
2383bde80f fix(container): scope orphan reaper by install label so peers don't kill each other
Two installs on the same host could trash each other's containers: the
reaper used `docker ps --filter name=nanoclaw-`, a substring match that
picked up every install's containers. A crash-looping peer (e.g. a legacy
v1 plist respawning ~6k times) would call cleanupOrphans on every boot and
kill the healthy install's session containers within seconds of spawn.

- Stamp `--label nanoclaw-install=<slug>` onto every spawned container.
- cleanupOrphans filters by that label; healthy peers are left alone.
- Setup preflight enumerates `com.nanoclaw*` launchd plists / nanoclaw
  user systemd units, probes state/runs, and unloads any that are
  crash-looping (state != running AND runs > 10) before installing
  this install's service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:12:30 +03:00
gavrielc
5f1b3e5cad style: apply prettier formatting to install-slug additions 2026-04-23 10:10:48 +03:00
gavrielc
7a9401ddf2 feat(setup): per-checkout service name and docker image tag
Two NanoClaw installs on the same host used to fight over the shared `com.nanoclaw` launchd label / `nanoclaw.service` systemd unit and the `nanoclaw-agent:latest` docker tag — the second install silently rewrote the service pointer and rebuilt the image out from under the first. Introduces a deterministic per-checkout slug (sha1(projectRoot)[:8]) and namespaces everything off it:

- Service: `com.nanoclaw-v2-<slug>` / `nanoclaw-v2-<slug>.service`
- Image:   `nanoclaw-agent-v2-<slug>:latest` (base), `nanoclaw-agent-v2-<slug>:<agentGroupId>` (per-group)

New shared helpers: src/install-slug.ts (host) + setup/lib/install-slug.sh (bash). Both compute the same slug so verify/probe/add-*.sh/build.sh/container-runner all agree. Any v1 `com.nanoclaw` service left on the host stays untouched and can coexist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 10:10:09 +03:00
gavrielc
e64bdb3016 refactor(claude-md): split shared base into module fragments, inject name at runtime
Move every agent-specific instruction out of the shared container/CLAUDE.md
so the base is genuinely universal. Persona/identity now comes from the
system-prompt addendum (buildSystemPromptAddendum now takes assistantName
and prepends "# You are {name}"). Per-module instructions live alongside
each MCP tool source:

  container/agent-runner/src/mcp-tools/core.instructions.md
  container/agent-runner/src/mcp-tools/scheduling.instructions.md
  container/agent-runner/src/mcp-tools/self-mod.instructions.md

composeGroupClaudeMd() scans that directory and emits `module-<name>.md`
fragments as symlinks to /app/src/mcp-tools/<name>.instructions.md (valid
via the existing RO source mount). Skill fragments renamed to
`skill-<name>.md` for naming consistency with `module-*` and `mcp-*`.

Mount tightening so composer-managed files can't be clobbered by agent
writes: nested RO mounts for /workspace/agent/CLAUDE.md and
/workspace/agent/.claude-fragments/. CLAUDE.local.md (per-group memory)
stays RW as the only writable CLAUDE.md-family file.

.gitignore: ignore CLAUDE.local.md, .claude-shared.md, .claude-fragments/
everywhere, and simplify groups/ rules to ignore the whole tree (per-
installation state, not tracked).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:14:51 +03:00
gavrielc
3db66c0ced fix: forward ONECLI_API_KEY to OneCLI SDK for authenticated container config
Ports the v1 fix from PR #1777 (originally 8b5b581 by @johnnyfish).
Cherry-pick did not apply cleanly because v2 reformatted the surrounding
code and split OneCLI usage into two sites — manual port was needed.

v2-specific adaptations:
- Also forward apiKey at the second OneCLI call site in
  src/modules/approvals/onecli-approvals.ts (v2 split the approvals
  module out of container-runner).
- Skipped the companion test-mock commit (38163bc) — it patches
  src/container-runner.test.ts, which no longer exists in v2 (tests
  consolidated into host-core.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: johnnyfish <jonathanfishner11@gmail.com>
2026-04-22 15:16:59 +03:00
gavrielc
8e1c8f8f61 style: apply prettier formatting to touched files
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 14:57:09 +03:00
gavrielc
c8fc1da719 refactor(claude-md): compose per-group CLAUDE.md from shared base + fragments
Replace the per-group "written once at init, owned by the group" CLAUDE.md
with a host-regenerated entry point that imports:

  - a shared base (`container/CLAUDE.md` mounted RO at `/app/CLAUDE.md`)
  - optional per-skill fragments (skills that ship `instructions.md`)
  - optional per-MCP-server fragments (inline `instructions` field in
    `container.json`)
  - per-group agent memory (`CLAUDE.local.md`, auto-loaded by Claude Code)

Principle: RW = per-group memory, RO = shared content. Source/skills/base
are shared; personality, config, working files, and Claude state stay
per-group.

Key changes:

  - New `src/claude-md-compose.ts` — per-spawn composition +
    `migrateGroupsToClaudeLocal()` one-time cutover.
  - New `container/CLAUDE.md` — shared base, seeded verbatim from the
    former `groups/global/CLAUDE.md`.
  - `src/container-runner.ts` — swap `/workspace/global` mount for RO
    `/app/CLAUDE.md`; call `composeGroupClaudeMd()` after
    `initGroupFilesystem()`.
  - `src/group-init.ts` — drop `.claude-global.md` symlink + initial
    `CLAUDE.md` write; seed `CLAUDE.local.md` from `opts.instructions`.
  - `src/index.ts` — call `migrateGroupsToClaudeLocal()` at startup.
  - `src/container-config.ts` — add optional `instructions` field to
    `McpServerConfig` (inline per-MCP guidance fragment).
  - `container/Dockerfile` — drop dead `/workspace/global` mkdir.
  - Remove obsolete `scripts/migrate-group-claude-md.ts`.

Migration (runs once at host startup, idempotent):

  - Delete `.claude-global.md` symlinks in each group.
  - Rename each `groups/<folder>/CLAUDE.md` → `CLAUDE.local.md`
    (preserves existing per-group content as memory).
  - Delete `groups/global/` directory.

Design docs: `docs/claude-md-composition.md` and `docs/shared-source.md`
(the latter is the sibling design discussion this refactor builds on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:58:43 +03:00
exe.dev user
8a12fa61ac refactor: shared source — replace per-group agent-runner copies with single RO mount
Replace the per-group agent-runner-src copy model with a single shared
read-only mount. Source and skills are now RO + shared; personality,
config, working files, and Claude state stay RW + per-group.

Key changes:
- Mount container/agent-runner/src/ RO at /app/src (all groups share one copy)
- Mount container/skills/ RO at /app/skills; per-group skill selection via
  symlinks in .claude-shared/skills/ based on container.json "skills" field
- Mount container.json as nested RO bind on top of RW group dir
- Move all NANOCLAW_* env vars to container.json (runner reads at startup)
- New runner config.ts module replaces process.env reads
- Move command gate (filtered/admin) from container to host router
- Dockerfile: remove source COPY, split CLI installs (claude-code last),
  move agent-runner deps above CLIs for better layer caching
- Add writeOutboundDirect for router denial responses
- Design doc at docs/shared-src.md

Not included (follow-up): DB migration to drop agent_provider columns,
cleanup of orphaned agent-runner-src directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 12:58:43 +03:00
gavrielc
16b9499532 feat(routing): engage modes + sender scope + accumulate/drop + per-agent fan-out
Replaces the opaque trigger_rules JSON + response_scope enum on
messaging_group_agents with four explicit orthogonal columns:

    engage_mode            'pattern' | 'mention' | 'mention-sticky'
    engage_pattern         regex source; required when mode='pattern';
                           '.' is the "always" sentinel
    sender_scope           'all' | 'known'
    ignored_message_policy 'drop' | 'accumulate'

Inbound routing becomes a fan-out — every wired agent is evaluated
independently. A match gets its own session + container wake. A miss
with accumulate keeps the message as context-only (trigger=0) in that
agent's session, so when the agent does eventually engage it sees the
prior chatter.

## Schema

- Migration 010 (`engage-modes`): adds the 4 new columns, backfills
  from trigger_rules.pattern + requiresTrigger + response_scope, drops
  the legacy columns.
- messages_in gains `trigger INTEGER NOT NULL DEFAULT 1` (session DB
  schema + `migrateMessagesInTable` forward-compat).
- countDueMessages gates waking on `trigger = 1`.

## Routing

- `pickAgent` (returns one) → loop over all wired agents. Per agent:
  evaluate engage_mode; run access gate + sender-scope gate; on full
  match → resolveSession + writeSessionMessage(trigger=1) + wake. On
  miss with accumulate → writeSessionMessage(trigger=0), no wake. On
  miss with drop → skip.
- New `findSessionForAgent(agentGroupId, mgId, threadId)` scopes
  session lookup by agent so fan-out doesn't cross sessions.
- `messageIdForAgent` namespaces inbound message ids by agent_group_id
  so PRIMARY KEY doesn't collide across per-agent session DBs.

## Adapter layer

- `ConversationConfig` replaces `triggerPattern` + `requiresTrigger`
  with `engageMode` + `engagePattern`.
- Chat SDK bridge stores `Map<platformId, ConversationConfig[]>` (multi-
  agent per conversation) and applies union gating pre-onInbound:
    * onSubscribedMessage: engage if any wiring keeps firing in
      subscribed state (mention-sticky or pattern)
    * onNewMention: engage on mention; only subscribes the thread if
      at least one wiring is `mention-sticky`
    * onDirectMessage: engage per mode; sticky follows same rule
- Bridge no longer unconditionally calls `thread.subscribe()`.

## Sender scope

- Permissions module registers a second hook `setSenderScopeGate` that
  runs per-wiring after the existing access gate. `sender_scope='known'`
  requires canAccessAgentGroup(); `'all'` is a no-op. Not installed →
  no-op everywhere (default allow).

## Container side

- Host passes `NANOCLAW_MAX_MESSAGES_PER_PROMPT` (reuses existing
  MAX_MESSAGES_PER_PROMPT config; was dead code from v1).
- `getPendingMessages` queries `ORDER BY seq DESC LIMIT N`, reverses to
  chronological order for the prompt — accumulated context rides along
  with trigger rows up to the cap.
- `MessageInRow` gains `trigger: number` so the container can tell them
  apart in downstream code (container still processes both; only the
  host uses `trigger=0` for don't-wake).

## Defaults (per ACTION-ITEMS item 1 decision)

- DM (is_group=0): `engage_mode='pattern'`, `engage_pattern='.'` (always)
- Threaded group: `engage_mode='mention-sticky'` (seed-discord)
- Non-threaded group / CLI: pattern '.' in bootstrap scripts

## Tests

- src/host-core.test.ts: 3 new cases — fan-out (2 agents, 2 sessions,
  2 wakes), accumulate (trigger=0 + no wake), drop (no session created).
- Existing 10 host-core tests still pass.
- Migration 010 runs on an empty DB in 0-row path — verified.

Closes: ACTION-ITEMS items 1, 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:30:04 +03:00
gavrielc
6a815190c0 feat(lifecycle): stuck detection + heartbeat lifecycle + SDK tool blocklist
Replaces the two overlapping old mechanisms (30-min setTimeout kill in
container-runner, 10-min heartbeat STALE_THRESHOLD reset in host-sweep)
with message-scoped stuck detection anchored to the processing_ack claim
age + an absolute 30-min ceiling that extends for long-declared Bash
tools.

Old model problems:
- IDLE_TIMEOUT setTimeout fired on plain wall-clock time; slow-but-alive
  agents got killed at 30min regardless of activity
- 10-min STALE_THRESHOLD in the sweep was unreliable — the heartbeat is
  only touched on SDK events, so legitimate silent tool work (sleep 30,
  long WebFetch, npm install) looked identical to a hung container
- Two overlapping sources of truth for "when to let go of a container"

New model:
- Host sweep is the single source of truth.
- Container exposes a new `container_state` single-row table in outbound.db
  (schema added; container writes, host reads). PreToolUse hook writes
  current_tool + tool_declared_timeout_ms (read from Bash's tool_input);
  PostToolUse / PostToolUseFailure clear it.
- Sweep decides with a pure helper `decideStuckAction`:
    * absolute ceiling — kill if heartbeat age > max(30min, bash_timeout)
    * per-claim stuck  — kill if any processing_ack row has claim_age >
      max(60s, bash_timeout) AND heartbeat hasn't been touched since claim
    * otherwise ok
  Kill paths reset leftover processing rows with exponential backoff,
  reusing the existing retry machinery.

Tool blocklist expanded:
- AskUserQuestion (SDK placeholder; we have mcp__nanoclaw__ask_user_question)
- EnterPlanMode, ExitPlanMode, EnterWorktree, ExitWorktree (Claude Code UI
  affordances; would hang in headless containers)
PreToolUse hook is also defense-in-depth: if a disallowed tool name slips
through, it returns `{ decision: 'block' }` so the agent sees a clear
error instead of appearing stuck.

Removed:
- container-runner.ts: IDLE_TIMEOUT setTimeout, resetIdle callback on
  activeContainers entry, resetContainerIdleTimer export.
- delivery.ts: the resetContainerIdleTimer call on successful delivery.
- poll-loop.ts: IDLE_END_MS + its setInterval. Keeping the query open is
  cheaper than close+reopen (no cold prompt cache). Liveness is now a
  host-side concern.
- host-sweep.ts: 10-min STALE_THRESHOLD_MS + getStuckProcessingIds in the
  stale-detection path (still exported for kill reset).

Tests:
- src/host-sweep.test.ts — 9 tests for decideStuckAction covering: fresh
  heartbeat, absolute ceiling, absent heartbeat, Bash-timeout extension
  (both ceiling and per-claim), claim age below tolerance, heartbeat
  touched after claim, unparseable timestamps.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md items 9, 6a, 10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:16:57 +03:00
gavrielc
7169c25e70 refactor: relocate outbox I/O to session-manager + dead-code sweep
## Outbox extraction (delivery.ts → session-manager.ts)

File I/O for outbound attachments now lives in session-manager.ts alongside
the symmetric inbound extractAttachmentFiles. delivery.ts no longer touches
the filesystem — it hands buffers to the adapter and calls clearOutbox on
success.

- New `readOutboxFiles(agentGroupId, sessionId, messageId, filenames)` and
  `clearOutbox(agentGroupId, sessionId, messageId)` in session-manager.ts.
- deliverMessage in delivery.ts loses ~35 lines of fs/path code and its
  `fs`/`path` imports.

## Dead-code sweep

TypeScript's --noUnusedLocals surfaced several cruft imports. Fixed:

- src/container-runner.ts: drop unused `markContainerIdle` import; drop
  unused `session` parameter from `buildContainerArgs` signature.
- src/delivery.ts: drop unused `getSession`, `writeSessionMessage`,
  `wakeContainer` imports.
- src/host-sweep.ts: drop unused `updateSession`, `outboundDbPath` imports.
- container/agent-runner/src/poll-loop.ts: drop unused `config`,
  `processingIds` params from `processQuery`.
- Test files: drop unused imports in channel-registry.test, db-v2.test,
  host-core.test.

Skipped: `conversations` state in chat-sdk-bridge.ts (never read but
tangled with public `updateConversations` method; cleaning it risks a
merge conflict with the channels branch at the next sync).

## Validation

- `pnpm run build` clean
- `pnpm test` — 137 host tests pass
- `bun test` in container/agent-runner — 17 tests pass
- Service boots (`NanoClaw running`, `OneCLI approval handler started`)
  and shuts down cleanly on SIGTERM

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:34:08 +03:00
gavrielc
46b19dcf9c refactor(modules): extract agent-to-agent as registry-based module
Last extraction of Phase 3. Moves inter-agent messaging + create_agent +
destination projection into src/modules/agent-to-agent/. Core retains:

- `channel_type === 'agent'` dispatch in delivery.ts, guarded by
  hasTable('agent_destinations') + dynamic import into module.
- Channel-permission ACL in delivery.ts, guarded by hasTable, with
  inlined SQL (no module import from core).
- writeDestinations call in container-runner.ts, guarded by hasTable +
  dynamic import into module.
- createMessagingGroupAgent's destination side effect in db/messaging-groups.ts,
  guarded by hasTable. This is a documented transitional tier violation
  (core imports from optional module), analogous to src/access.ts.

Migration `004-agent-destinations.ts` renamed to `module-agent-to-agent-
destinations.ts` preserving `name: 'agent-destinations'` so existing DBs
don't re-run it.

delivery.ts: 600 → 449 lines. handleSystemAction's last switch case gone
(just registry + default log-and-drop). notifyAgent helper removed (only
create_agent used it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:00:10 +03:00
gavrielc
7cc4ecc3be refactor(modules): extract permissions as optional module
Moves user-roles / users / agent-group-members / user-dms /
dropped-messages / user-dm / canAccessAgentGroup into
src/modules/permissions/. Module registers a single inbound-gate that
owns sender resolution, access decision, unknown-sender policy, and
drop-audit recording.

Router slimmed from 357 → 179 lines; the inline fallback chain
(extractAndUpsertUser / enforceAccess / handleUnknownSender /
recordDroppedMessage) is gone — without the permissions module core
defaults to allow-all with userId=null.

container-runner's admin-ID query is now inline SQL guarded by
sqlite_master on user_roles, keeping core free of any import from the
permissions module. The container-side formatter falls back to
permissionless mode when NANOCLAW_ADMIN_USER_IDS is empty: every sender
with an identifiable senderId is treated as admin.

Module contract doc formalizes the tier model and the dependency rule
(core ← default modules ← optional modules). One transitional violation
flagged: src/access.ts (core) imports from the permissions module for
its remaining approver-picking helpers; resolves in the planned PR #7
re-tier.

Validation: host build clean, 137/137 host tests, 17/17 container
tests, typecheck clean, service boots to "NanoClaw running" with
permissions module registering its gate and clean SIGTERM shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:42:14 +03:00
gavrielc
4202041d0b refactor: scaffold module registries and default-module layout
Additive change — existing code paths still run via inline fallbacks.
Prepares core for per-module extractions in PR #3 onward.

Four registries added with empty defaults:
  - delivery action handlers (delivery.ts)
  - router inbound gate (router.ts)
  - response dispatcher (index.ts)
  - MCP tool self-registration (container/agent-runner/src/mcp-tools/server.ts)

Default modules moved to src/modules/ for signaling:
  - src/modules/typing/       (extracted from delivery.ts)
  - src/modules/mount-security/ (moved from src/mount-security.ts)

Both are imported directly by core — no hook, no registry. Removal
requires editing core imports.

Migrator now keys applied rows by name (uniqueness) so module
migrations can pick arbitrary version numbers. Stored version column
is auto-assigned as an applied-order sequence.

sqlite_master guards added around core calls into module-owned tables
(user_roles, agent_destinations, pending_questions). No-ops today;
load-bearing after the owning modules are extracted.

MODULE-HOOK markers placed at scheduling's two skill-edit sites
(host-sweep.ts recurrence call, poll-loop.ts pre-task gate). PR #4
replaces the marked blocks when scheduling moves to its module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:46:19 +03:00
gavrielc
1f3b023a5a refactor(v2/providers): self-registration barrel + host container-config registry
Providers now mirror the channels pattern: each module calls
registerProvider() at top level, and providers/index.ts is a barrel of
side-effect imports. createProvider() becomes a thin registry lookup;
the closed ProviderName union is gone (now a string alias, since the
env var is a runtime string anyway).

Also adds a host-side provider-container-registry so providers can
declare their own mounts and env passthrough in src/providers/<name>.ts
instead of the container-runner having to know about each one. The
resolver runs once per spawn and threads provider + contribution
through buildMounts and buildContainerArgs so side effects (mkdir,
etc.) fire exactly once.

Both barrels are append-only — adding a new provider is a new file
+ one import line per barrel, no edits to existing files. The built-in
providers (claude, mock) don't need host-side config, so src/providers/
ships with an empty barrel; the container-side barrel imports both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:17:09 +03:00
gavrielc
c5d0ef8b4f feat(v2): migrate container runtime to Bun, improve image build surface
Container side:
- agent-runner switches to Bun. Drops better-sqlite3 (native compile gone),
  drops tsc build step in-image AND the tsc-on-every-session-wake in the
  entrypoint — bun runs src/index.ts directly. bun:sqlite replaces
  better-sqlite3; cross-mount DB invariants (journal_mode=DELETE, busy_timeout)
  preserved. Named params converted from @name to $name because bun:sqlite
  does not auto-strip the prefix the way better-sqlite3 does.
- Tests ported from vitest to bun:test (only describe/it/expect/before/afterEach
  used, API-compatible). vitest.config.ts excludes container/agent-runner/.
- bun.lock replaces pnpm-lock.yaml + pnpm-workspace.yaml under
  container/agent-runner/. Host pnpm workspace does NOT include this tree.

Dockerfile improvements (independent of Bun but bundled while touching the file):
- tini as PID 1 for correct SIGTERM propagation (prevents half-written
  outbound.db on shutdown).
- Extracted entrypoint.sh — readable and diffable vs the old inline printf.
- BuildKit cache mounts for apt + bun install + pnpm install.
- --no-install-recommends on apt, pinned CLAUDE_CODE_VERSION, AGENT_BROWSER,
  VERCEL, BUN_VERSION.
- CJK fonts (~200MB) behind ARG INSTALL_CJK_FONTS=false; build.sh reads from
  .env; setup/container.ts reads the same .env so /setup and manual rebuild
  stay in sync.
- PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 in case any postinstall tries to pull a
  redundant Chromium.
- /home/node 755 (was 777).

Host side:
- src/container-runner.ts dynamic spawn command collapses from
  `pnpm exec tsc --outDir /tmp/dist … && node /tmp/dist/index.js` to
  `exec bun run /app/src/index.ts` — cold start ~200-500ms faster per wake.

CI:
- oven-sh/setup-bun@v2 alongside Node/pnpm. Adds explicit container
  typecheck (was documented in CLAUDE.md, not enforced) and `bun test` for
  agent-runner tests.
2026-04-17 11:38:01 +03:00
gavrielc
504e5296c9 fix: pnpm-lock sync, per-group build-script allowlist, ppnpm typos
- Regenerate pnpm-lock.yaml to match v2 package.json (Baileys 6.17.16,
  @chat-adapter/linear 4.26.0)
- src/container-runner.ts: when install_packages rebuilds a per-group
  image, append each installed package to /root/.npmrc's
  only-built-dependencies before pnpm install -g, so packages with
  postinstall scripts (playwright, puppeteer, native addons) don't
  install silently broken
- Fix stray 'ppnpm uninstall' in 13 skill files (REMOVE.md + SKILL.md)
  left over from the npm→pnpm sed pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 09:25:59 +03:00
meeech
0b06343323 chore: use pnpm in dynamic container commands
Replace npx tsc with pnpm exec tsc in container entrypoint command,
and npm install -g with pnpm install -g in dynamic Dockerfile
generation. Variable names (npm, npmPackages) intentionally kept as
they refer to npm-registry packages, not the CLI.
2026-04-17 09:18:29 +03:00
gavrielc
81d45b5be9 refactor(v2): remove builder-agent dev-agent/worktree/swap flow
The dev-agent-in-worktree approach for source self-modification is abandoned
in favor of a direct draft/activate flow with OS-level RO enforcement
(planned, not yet implemented). Strip the whole subgraph:
src/builder-agent/, pending-swaps DB module + migration 006, builder-agent
MCP tools, and all host wiring (startup sweep, approval routing, deadman,
worktree mount, freeze gate). Tool descriptions in self-mod.ts / agents.ts
no longer cross-reference create_dev_agent. CLAUDE.md + v2-checklist updated
to describe the new direction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:15:13 +03:00
gavrielc
75c2fde2b5 feat(v2): builder-agent self-modification WIP + container-config as per-group file
Checkpoints the builder-agent dev-agent/worktree/swap flow (create_dev_agent,
request_swap, classifier, deadman, promote) before pivoting to a unified
draft-activate approach with OS-level RO enforcement. Lifts container_config
out of the agent_groups row into groups/<folder>/container.json so install_packages,
add_mcp_server, and rebuild flows can eventually route through the same draft
path as source edits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:15:13 +03:00
gavrielc
0d3326aae5 feat(v2): user-level privilege model + cold DM infra + init-first-agent skill
Replaces the agent-group-centric "main group" concept with user-level
privileges and adds the cold-DM infrastructure needed for proactive
outbound messaging (pairing, approvals, welcome flows).

Privilege model
- New tables: users, user_roles (owner global-only; admin global or
  scoped to an agent_group), agent_group_members (explicit non-
  privileged access; admin/owner imply membership), user_dms (cold-DM
  resolution cache).
- Removed agent_groups.is_admin, messaging_groups.admin_user_id. Replaced
  with messaging_groups.unknown_sender_policy (strict | request_approval
  | public) for per-chat unknown-sender gating.
- src/access.ts: canAccessAgentGroup, pickApprover, pickApprovalDelivery.
- src/router.ts: access gate on every inbound, honoring
  unknown_sender_policy for unknown senders.
- src/channels/telegram.ts: pairing interceptor upserts the paired user
  and promotes them to owner if hasAnyOwner() is false (first-pair-wins).

Cold DM infrastructure
- ChannelAdapter.openDM?(handle) — optional method. Chat-SDK-bridge wires
  it to chat.openDM() for resolution-required channels (Discord, Slack,
  Teams, Webex, gChat); direct-addressable channels (Telegram, WhatsApp,
  iMessage, Matrix, Resend) fall through to the handle directly.
- src/user-dm.ts: ensureUserDm(userId) — resolves + caches via user_dms.

Approval routing
- onecli-approvals + delivery use pickApprover + pickApprovalDelivery:
  scoped admins → global admins → owners (dedup), first reachable via
  ensureUserDm, same-channel-kind tie-break. Approvals land in the
  approver's DM, not the origin chat.

Delivery fixes
- delivery.ts ACL rejection now throws instead of returning undefined —
  the outer loop previously marked rejected messages as delivered.
- Implicit-origin allow: session.messaging_group_id === target skips the
  destination check.
- createMessagingGroupAgent auto-creates the companion agent_destinations
  row (normalized local_name from the messaging group's name, collision-
  broken within the agent's namespace).

Container
- container-runner.ts: /workspace/global always read-only; drops
  NANOCLAW_IS_ADMIN; adds NANOCLAW_ADMIN_USER_IDS (owners + global admins
  + scoped admins for this agent group). Agent-runner poll-loop gates
  slash commands against that set.

New skill: /init-first-agent
- Walks the operator through standing up the first agent for a channel:
  channel pick → identity lookup (reads each channel SKILL.md's
  ## Channel Info > how-to-find-id) → DM platform_id resolution (direct-
  addressable, cold-DM via "user DMs bot first + sqlite lookup", or
  Telegram pair-code fallback) → run scripts/init-first-agent.ts →
  verify via tail of nanoclaw.log.
- scripts/init-first-agent.ts: parameterized helper that upserts the
  user + grants owner (if none), creates dm-with-<display-name> agent
  group + initGroupFilesystem, reuses/creates the DM messaging_group,
  wires it (auto-creates destination), resolves the session, and writes
  a kind:'chat' / sender:'system' welcome message into inbound.db. Host
  sweep wakes the container and the agent DMs the operator via the
  normal delivery path.

/manage-channels rewrite
- Drops --is-main / --jid / main-vs-non-main isolation references.
- First-channel flow delegates to /init-first-agent.
- Explains createMessagingGroupAgent auto-creates destinations.
- Adds a privileged-users show section.

setup/
- register.ts: drop --is-main, --jid, --local-name, --trigger
  requiresTrigger defaults; call initGroupFilesystem; normalize to
  v2 schema (no is_admin, no admin_user_id, sets unknown_sender_policy
  'strict'); let createMessagingGroupAgent handle the destination row.
- pair-telegram.ts: emit PAIRED_USER_ID (namespaced "telegram:<id>")
  instead of ADMIN_USER_ID; update header comment.
- register.test.ts deleted — was v1-only, tested a registered_groups
  table that no longer exists.

Docs
- v2-architecture-diagram.{md,html}: ER diagram updated to drop
  is_admin/admin_user_id, add unknown_sender_policy, and include
  users/user_roles/agent_group_members/user_dms.
- v2-architecture-draft.md: approval-routing paragraph rewritten for
  pickApprover/pickApprovalDelivery/ensureUserDm; SQL schema block
  updated; admin-verification paragraph references
  NANOCLAW_ADMIN_USER_IDS.
- v2-setup-wiring.md: entity-model sketch rewritten.
- v2-checklist.md: marked privilege refactor / container filtering /
  approval routing / unknown-sender gating done; removed obsolete
  admin_user_id and main-vs-non-main items.

Scripts
- scripts/init-first-agent.ts (new) replaces scripts/welcome-owner-dm.ts
  (removed; welcome-owner was a Discord-specific one-off).
- test-v2-host.ts, test-v2-channel-e2e.ts, seed-discord.ts: drop
  is_admin + admin_user_id, use unknown_sender_policy.

Tests
- src/access.test.ts (new): 14 tests for canAccessAgentGroup, role
  helpers, pickApprover, ensureUserDm, pickApprovalDelivery.
- src/db/db-v2.test.ts: adds 3 tests for the auto-created
  agent_destinations row (normalized name, no duplicates, collision
  break within an agent group).
- host-core.test.ts, channel-registry.test.ts: updated fixtures to
  use unknown_sender_policy: 'public' where the test exercises routing
  rather than the access gate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:03:51 +03:00
gavrielc
2e6dc21748 refactor(v2): per-group filesystem init, persistent across spawns
Each group's on-disk state (CLAUDE.md, .claude-shared/, agent-runner-src/)
is now initialized exactly once at group creation and owned by the group
forever after. Spawn does only mounts — no copies, no settings.json
overwrites, no skill clobbers, no source resyncs.

Global memory composition switches from "host reads /workspace/global/CLAUDE.md
at bootstrap and stuffs it into systemPrompt.append" to "group CLAUDE.md
imports it via @/workspace/global/CLAUDE.md at the top." Edits to global
propagate instantly through the existing read-only mount; no copy, no
restart.

- src/group-init.ts: new initGroupFilesystem(group, opts?) — idempotent,
  populates groups/<folder>/, .claude-shared/, agent-runner-src/ only when
  paths don't already exist.
- src/container-runner.ts: buildMounts() calls init defensively at the
  top (catches existing groups on first spawn after this change), drops
  the inline settings.json write, skills cpSync loop, and agent-runner-src
  rm-then-copy. Just mounts now.
- src/delivery.ts: create_agent flow uses initGroupFilesystem with
  optional instructions, replacing the inline mkdirSync + writeFileSync.
- container/agent-runner/src/index.ts: drops GLOBAL_CLAUDE_MD reading.
  systemContext.instructions is now only the runtime-generated
  destinations addendum.
- scripts/migrate-group-claude-md.ts: one-shot migration that prepends
  the @-import to existing groups' CLAUDE.md. Skips if global doesn't
  exist or if the @-import is already present (regex match on the @ form
  to avoid false positives from prose mentions of the path).
- groups/main/CLAUDE.md: prepended by the migration.

Existing groups need a one-time wipe of their agent-runner-src/ dir so
init re-populates from current host source — done locally before this
commit. Future host-side updates to container/skills/ or
container/agent-runner/src/ won't auto-propagate; that's the trade-off
for unconditional persistence and will be covered by host-mediated
refresh tools in a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:17:50 +03:00
gavrielc
d4aacfe416 fix(v2): clear per-group agent-runner src before copy
fs.cpSync never removes files that disappeared from the source, so
renamed or deleted files linger in data/v2-sessions/<group>/agent-runner-src/.
The container's entrypoint runs tsc over the whole mounted src via
tsconfig's `include: ["src/**/*"]`, so a single stale file fails the
compile and the container exits 2.

Latent since the dir was introduced — surfaced when the provider
interface refactor made a leftover index-v2.ts stop typechecking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:25:40 +03:00
gavrielc
e92b245399 feat(v2): OneCLI 0.3.1 — approvals, credential collection, threaded routing
Three features built on top of @onecli-sh/sdk 0.3.1, landed together because
they share wiring surfaces (session DB schema, delivery dispatcher, Chat SDK
bridge, channel adapter contract).

## OneCLI manual-approval handler

* `src/onecli-approvals.ts` — long-polls OneCLI via the SDK's
  `configureManualApproval`; on each request, delivers an `ask_question` card
  to the admin agent group's first messaging group, persists a
  `pending_approvals` row, and waits on an in-memory Promise resolved by the
  admin's button click or an expiry timer. Expired cards are edited to
  "Expired (...)" and a startup sweep flushes any rows left over from a
  previous process.
* Short 11-byte approval id (`oa-<8 base36>`) instead of the SDK's UUID so the
  Telegram 64-byte `callback_data` limit is respected; the OneCLI UUID stays
  in the persisted payload for audit.
* Migration 003 consolidated: `pending_approvals` now has the OneCLI-aware
  columns from the start (`agent_group_id`, `channel_type`, `platform_id`,
  `platform_message_id`, `expires_at`, `status`), `session_id` relaxed to
  nullable so cross-session approvals fit.
* `handleQuestionResponse` in `src/index.ts` now routes OneCLI approvals
  through `resolveOneCLIApproval` before falling back to the
  session-bound approval path.

## Credential collection from chat

New `trigger_credential_collection` MCP tool — the agent researches a
third-party API, calls the tool with `{name, hostPattern, headerName,
valueFormat, description}`, and blocks until the host reports saved, rejected,
or failed. The credential value never enters the agent's context: the user
submits it into a Chat SDK Modal on the host side, the host writes it to
OneCLI via a thin facade (`src/onecli-secrets.ts` — shells out to
`onecli secrets create`, shape mirrors the SDK we expect upstream), and only
the status string flows back to the container via a system message.

* `src/credentials.ts` — host-side handler: delivers the card to the
  conversation's own channel (not the admin channel — credential collection
  is a user-facing flow, distinct from admin approval), persists a
  `pending_credentials` row, drives the submit → `createSecret` → notify
  pipeline. Falls back gracefully when the channel doesn't support modals.
* `src/db/credentials.ts` + migration 005: `pending_credentials` table.
* `src/channels/chat-sdk-bridge.ts`: renders a `credential_request` card,
  handles the `nccr:` action prefix by opening a Modal with a TextInput,
  registers an `onModalSubmit` handler for the `nccm:` callback prefix.
* `container/agent-runner/src/mcp-tools/credentials.ts`: the blocking MCP
  tool, mirroring the `ask_user_question` polling pattern.
* `container/agent-runner/src/db/messages-in.ts`: `findCredentialResponse`
  helper to pick up the system message the host writes back.

## Threaded adapter routing

The destination layer previously didn't carry thread context, so agent replies
to Discord always landed in the root channel regardless of which thread the
inbound came from.

* `ChannelAdapter.supportsThreads: boolean` — declared by every channel skill
  at `createChatSdkBridge`. Threaded: Discord, Slack, Teams, Google Chat,
  Linear, GitHub, Webex. Non-threaded: Telegram, WhatsApp Cloud, Matrix,
  Resend, iMessage.
* `src/router.ts`: non-threaded adapters strip `threadId` at ingest (threads
  collapse to channel-level sessions). Threaded adapters override the
  wiring's `session_mode` to `'per-thread'` so each thread = a session
  (except `agent-shared`, which is preserved as a cross-channel intent the
  adapter can't know about).
* `session_routing` table in `inbound.db` — single-row default reply routing
  written by the host on every container wake from
  `session.messaging_group_id` + `session.thread_id`. Forward-compat
  `CREATE TABLE IF NOT EXISTS` handles older session DBs lazily.
* `container/agent-runner/src/db/session-routing.ts` — container-side reader.
* `send_message` / `send_file` / `ask_user_question` / `send_card` /
  scheduling tools all default their routing (channel, platform, **and**
  thread) from the session when no explicit `to` is given. Explicit `to`
  uses the destination's channel with `thread_id = null` (cross-destination
  sends start a new conversation elsewhere).
* `poll-loop.ts::sendToDestination` (the final-text single-destination
  shortcut) now inherits `thread_id` from `RoutingContext` too — this was
  the root cause of Discord replies landing in the root channel even after
  `send_message` was wired correctly.

## Related cleanups

* `src/container-runner.ts`: OneCLI agent identifier switched from the lossy
  folder-derived string to `agent_group.id`, making `getAgentGroup(externalId)`
  a trivial reverse lookup for per-agent scoping.
* `wakeContainer` race fix via an in-flight promise map — concurrent wakes
  during the async buildContainerArgs / OneCLI `applyContainerConfig` window
  no longer double-spawn containers against the same session directory.
* `src/db/db-v2.test.ts`: dropped the brittle `expect(row.v).toBe(N)` schema
  version assertion — it had to be bumped on every migration addition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:18:21 +03:00
gavrielc
b591d7ce96 refactor: move destinations from JSON file into inbound.db
The per-session destination map was being written as a sidecar JSON file
(/workspace/.nanoclaw-destinations.json) — inconsistent with the rest of
v2, where all host↔container IO goes through inbound.db / outbound.db.

Move it into a `destinations` table in INBOUND_SCHEMA. The host writes
it before every container wake AND on demand (e.g. after create_agent)
so the creator sees the new child destination mid-session without a
restart. The container queries the table live on every lookup — no
cache, no staleness window.

- src/db/schema.ts: add `destinations` table to INBOUND_SCHEMA.
- src/session-manager.ts: writeDestinationsFile → writeDestinations,
  writes via DELETE + INSERT inside a transaction.
- src/delivery.ts: create_agent handler calls writeDestinations on the
  creator's session after inserting the new destination rows.
- container/agent-runner/src/destinations.ts: queries inbound.db
  directly in every findByName/getAllDestinations/findByRouting call.
  No more cache. No setDestinationsForTest (obsolete). No fs import.
- container/agent-runner/src/index.ts and mcp-tools/index.ts: remove
  loadDestinations() calls — no longer needed.
- Test helper initTestSessionDb creates the destinations table.
  Integration test inserts a row directly instead of mocking the cache.

No backwards compatibility: sessions predating the schema update must
be recreated. This is fine on the v2 branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:45:53 +03:00
gavrielc
e83ffbc103 feat: named destinations + permission enforcement + fire-and-forget self-mod
Replaces implicit routing context (NANOCLAW_PLATFORM_ID env vars) with
per-agent named destination maps. Agents reference channels and peer
agents by local names; the host re-validates every outbound route against
a new agent_destinations table that is both the routing map and the ACL.

Model changes:
- New migration 004 adds agent_destinations (agent_group_id, local_name,
  target_type, target_id). Backfills from existing messaging_group_agents.
- Host writes /workspace/.nanoclaw-destinations.json before every container
  wake so admin changes take effect on next start.
- Container loads map at startup, appends system-prompt addendum listing
  available destinations and the <message to="name">…</message> syntax.
- Agent main output is parsed for <message to="..."> blocks; each block
  becomes a messages_out row with routing resolved via the local map.
  Untagged text and <internal>…</internal> are scratchpad (logged only).
- send_message MCP tool now takes `to` (destination name) instead of raw
  routing fields. send_to_agent deleted (redundant — agents are just
  destinations). send_file/edit_message/add_reaction route via map too.
- Inbound formatter adds from="name" attribute via reverse-lookup so the
  agent sees a consistent namespace in both directions.

Permission enforcement:
- Host checks hasDestination() before every channel delivery AND every
  agent-to-agent route. Unauthorized messages dropped and logged.
- routeAgentMessage simplified: ~15 lines, no JSON parse, content copied
  verbatim (target formatter resolves the sender via its own local map).
- create_agent is admin-only, checked at both the container (tool not
  registered for non-admins) and the host (re-check on receive). Inserts
  bidirectional destination rows so parent↔child comms work immediately.
  Includes path-traversal guard on folder name.

Self-modification cleanup:
- add_mcp_server now requires admin approval (previously had none).
- install_packages validates package names on BOTH sides (container tool
  + host receiver) with strict regex. Max 20 packages per request.
- All three self-mod tools are fire-and-forget: write request, return
  immediately with "submitted" message. Admin approval triggers a chat
  notification to the requesting agent — no tool-call polling, no 5-min
  holds. On rebuild/mcp_server approval, the container is killed so the
  next wake picks up new config/image.
- Approval delivery extracted into requestApproval() helper (the one
  place where three call sites were literally identical).

Also folded in the phase-1 dynamic import cleanup (create_agent no longer
does `await import('./db/agent-groups.js')`) and removes NANOCLAW_PLATFORM_ID
/ CHANNEL_TYPE / THREAD_ID env-var routing entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:31:37 +03:00
gavrielc
d8fbd3b239 feat: agent-to-agent communication, dynamic agent creation, self-modification tools
Agent-to-agent: host routes messages with channel_type='agent' to target
agent's inbound.db, enriches with sender info, wakes target container.
Bidirectional routing works via inherited routing context.

Dynamic agents: create_agent MCP tool + system action handler creates
agent groups, folders, and optional CLAUDE.md on the fly.

Self-modification: install_packages (apt/npm, requires admin approval),
add_mcp_server (no approval), request_rebuild (builds per-agent-group
Docker image with approved packages). Approval flow reuses interactive
card infrastructure with pending_approvals table.

Also includes fixes from prior session: attachment download, reply context
extraction, message editing (platform message ID tracking), delivery retry
limits, and card update on button click.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:11:06 +03:00
gavrielc
b76fd425c8 style: prettier formatting fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:18:31 +03:00
gavrielc
82cb363f84 v2: split session DB into inbound/outbound for write isolation
Eliminates SQLite write contention across the host-container mount
boundary by splitting the single session.db into two files, each with
exactly one writer:

  inbound.db  — host writes (messages_in, delivered tracking)
  outbound.db — container writes (messages_out, processing_ack)

Key changes:
- Host uses even seq numbers, container uses odd (collision-free)
- Container heartbeat via file touch instead of DB UPDATE
- Scheduling MCP tools now emit system actions via messages_out
  (host applies them to inbound.db during delivery)
- Host sweep reads processing_ack + heartbeat file for stale detection
- OneCLI ensureAgent() call added (was missing from v2, caused
  applyContainerConfig to reject unknown agent identifiers)

Verified: tsc clean, 327 tests pass, real e2e through Docker works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:17:31 +03:00
gavrielc
9486d56b01 v2: make v2 the main entry point, move v1 to src/v1/
- Move all v1 files (index, router, container-runner, db, ipc, types,
  logger, channels/registry, and all utilities) to src/v1/ as a
  fully self-contained archive with no shared dependencies
- Rename v2 files to remove -v2 suffix (index-v2.ts → index.ts, etc.)
- Update all imports across v2 source, tests, and setup files
- Migrate shared utilities (config, env, container-runtime, mount-security,
  timezone, group-folder) from pino logger to v2 log module
- Migrate setup/ files from logger to log with argument order swap
- Container agent-runner: move v1 entry to v1/, rename v2 to index.ts
- Update setup skill to offer all 13 v2 channels
- Install all Chat SDK adapter packages
- dist/index.js now runs v2; dist/v1/index.js runs v1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:40:36 +03:00
gavrielc
90acff28ad chore: set printWidth to 120 and reformat
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 23:34:03 +03:00
Sargun Vohra
1488c5b251 fix: add writable global memory mount for main agent
Main group had no mount for the global memory directory
(/workspace/global), so it could only reach it through the read-only
project root. This meant the main agent couldn't write to global
memory despite groups/main/CLAUDE.md instructing it to do so.

Add a writable mount at /workspace/global for the isMain branch,
matching the read-only mount that non-main groups already have.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:11:48 -07:00
gavrielc
032ba77a7f feat: mount store rw for main agent and add requiresTrigger to register_group
- Mount store/ separately as read-write so the main agent can access
  the SQLite database directly.
- Add requiresTrigger parameter to the register_group MCP tool
  (host IPC already supported it, but the tool never exposed it).
  Defaults to false (no trigger).
- Update group registration instructions to ask user about trigger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 16:17:57 +03:00
gavrielc
acb0abaf8b fix: broken tests and stale .env.example
- Fix container-runner bug: stopContainer() returns void but was
  passed to exec() as a command string. Replace with direct call
  and try/catch.
- Mock container-runtime in tests so they don't need Docker running.
- Increase claw-skill test timeout to handle slower python startup.
- Clear .env.example (telegram token was added by mistake).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 23:19:07 +03:00
gavrielc
deece6bf35 Merge branch 'main' into feat/scheduled-task-scripts-clean 2026-03-25 17:27:59 +02:00
gavrielc
b112fafff4 Merge branch 'main' into fix/agent-runner-cache-refresh 2026-03-25 17:27:23 +02:00
Gabi Simons
1b18d50ae4 Merge branch 'main' into feat/scheduled-task-scripts-clean 2026-03-25 02:25:23 -07:00
Daniel M
d05a8dec49 fix: refresh stale agent-runner source cache on code changes
Closes #1361

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 15:21:13 +00:00
Gabi Simons
35722801e3 style: fix prettier formatting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 15:49:42 +02:00
Guy Ben Aharon
e9369617fb feat: replace credential proxy with OneCLI gateway for secret injection 2026-03-23 14:45:58 +02:00