Commit Graph

878 Commits

Author SHA1 Message Date
gavrielc
16b9499532 feat(routing): engage modes + sender scope + accumulate/drop + per-agent fan-out
Replaces the opaque trigger_rules JSON + response_scope enum on
messaging_group_agents with four explicit orthogonal columns:

    engage_mode            'pattern' | 'mention' | 'mention-sticky'
    engage_pattern         regex source; required when mode='pattern';
                           '.' is the "always" sentinel
    sender_scope           'all' | 'known'
    ignored_message_policy 'drop' | 'accumulate'

Inbound routing becomes a fan-out — every wired agent is evaluated
independently. A match gets its own session + container wake. A miss
with accumulate keeps the message as context-only (trigger=0) in that
agent's session, so when the agent does eventually engage it sees the
prior chatter.

## Schema

- Migration 010 (`engage-modes`): adds the 4 new columns, backfills
  from trigger_rules.pattern + requiresTrigger + response_scope, drops
  the legacy columns.
- messages_in gains `trigger INTEGER NOT NULL DEFAULT 1` (session DB
  schema + `migrateMessagesInTable` forward-compat).
- countDueMessages gates waking on `trigger = 1`.

## Routing

- `pickAgent` (returns one) → loop over all wired agents. Per agent:
  evaluate engage_mode; run access gate + sender-scope gate; on full
  match → resolveSession + writeSessionMessage(trigger=1) + wake. On
  miss with accumulate → writeSessionMessage(trigger=0), no wake. On
  miss with drop → skip.
- New `findSessionForAgent(agentGroupId, mgId, threadId)` scopes
  session lookup by agent so fan-out doesn't cross sessions.
- `messageIdForAgent` namespaces inbound message ids by agent_group_id
  so PRIMARY KEY doesn't collide across per-agent session DBs.

## Adapter layer

- `ConversationConfig` replaces `triggerPattern` + `requiresTrigger`
  with `engageMode` + `engagePattern`.
- Chat SDK bridge stores `Map<platformId, ConversationConfig[]>` (multi-
  agent per conversation) and applies union gating pre-onInbound:
    * onSubscribedMessage: engage if any wiring keeps firing in
      subscribed state (mention-sticky or pattern)
    * onNewMention: engage on mention; only subscribes the thread if
      at least one wiring is `mention-sticky`
    * onDirectMessage: engage per mode; sticky follows same rule
- Bridge no longer unconditionally calls `thread.subscribe()`.

## Sender scope

- Permissions module registers a second hook `setSenderScopeGate` that
  runs per-wiring after the existing access gate. `sender_scope='known'`
  requires canAccessAgentGroup(); `'all'` is a no-op. Not installed →
  no-op everywhere (default allow).

## Container side

- Host passes `NANOCLAW_MAX_MESSAGES_PER_PROMPT` (reuses existing
  MAX_MESSAGES_PER_PROMPT config; was dead code from v1).
- `getPendingMessages` queries `ORDER BY seq DESC LIMIT N`, reverses to
  chronological order for the prompt — accumulated context rides along
  with trigger rows up to the cap.
- `MessageInRow` gains `trigger: number` so the container can tell them
  apart in downstream code (container still processes both; only the
  host uses `trigger=0` for don't-wake).

## Defaults (per ACTION-ITEMS item 1 decision)

- DM (is_group=0): `engage_mode='pattern'`, `engage_pattern='.'` (always)
- Threaded group: `engage_mode='mention-sticky'` (seed-discord)
- Non-threaded group / CLI: pattern '.' in bootstrap scripts

## Tests

- src/host-core.test.ts: 3 new cases — fan-out (2 agents, 2 sessions,
  2 wakes), accumulate (trigger=0 + no wake), drop (no session created).
- Existing 10 host-core tests still pass.
- Migration 010 runs on an empty DB in 0-row path — verified.

Closes: ACTION-ITEMS items 1, 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:30:04 +03:00
gavrielc
6a815190c0 feat(lifecycle): stuck detection + heartbeat lifecycle + SDK tool blocklist
Replaces the two overlapping old mechanisms (30-min setTimeout kill in
container-runner, 10-min heartbeat STALE_THRESHOLD reset in host-sweep)
with message-scoped stuck detection anchored to the processing_ack claim
age + an absolute 30-min ceiling that extends for long-declared Bash
tools.

Old model problems:
- IDLE_TIMEOUT setTimeout fired on plain wall-clock time; slow-but-alive
  agents got killed at 30min regardless of activity
- 10-min STALE_THRESHOLD in the sweep was unreliable — the heartbeat is
  only touched on SDK events, so legitimate silent tool work (sleep 30,
  long WebFetch, npm install) looked identical to a hung container
- Two overlapping sources of truth for "when to let go of a container"

New model:
- Host sweep is the single source of truth.
- Container exposes a new `container_state` single-row table in outbound.db
  (schema added; container writes, host reads). PreToolUse hook writes
  current_tool + tool_declared_timeout_ms (read from Bash's tool_input);
  PostToolUse / PostToolUseFailure clear it.
- Sweep decides with a pure helper `decideStuckAction`:
    * absolute ceiling — kill if heartbeat age > max(30min, bash_timeout)
    * per-claim stuck  — kill if any processing_ack row has claim_age >
      max(60s, bash_timeout) AND heartbeat hasn't been touched since claim
    * otherwise ok
  Kill paths reset leftover processing rows with exponential backoff,
  reusing the existing retry machinery.

Tool blocklist expanded:
- AskUserQuestion (SDK placeholder; we have mcp__nanoclaw__ask_user_question)
- EnterPlanMode, ExitPlanMode, EnterWorktree, ExitWorktree (Claude Code UI
  affordances; would hang in headless containers)
PreToolUse hook is also defense-in-depth: if a disallowed tool name slips
through, it returns `{ decision: 'block' }` so the agent sees a clear
error instead of appearing stuck.

Removed:
- container-runner.ts: IDLE_TIMEOUT setTimeout, resetIdle callback on
  activeContainers entry, resetContainerIdleTimer export.
- delivery.ts: the resetContainerIdleTimer call on successful delivery.
- poll-loop.ts: IDLE_END_MS + its setInterval. Keeping the query open is
  cheaper than close+reopen (no cold prompt cache). Liveness is now a
  host-side concern.
- host-sweep.ts: 10-min STALE_THRESHOLD_MS + getStuckProcessingIds in the
  stale-detection path (still exported for kill reset).

Tests:
- src/host-sweep.test.ts — 9 tests for decideStuckAction covering: fresh
  heartbeat, absolute ceiling, absent heartbeat, Bash-timeout extension
  (both ceiling and per-claim), claim age below tolerance, heartbeat
  touched after claim, unparseable timestamps.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md items 9, 6a, 10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:16:57 +03:00
gavrielc
dcfa12ea06 feat(timezone): recreate v1 TZ-aware formatting + scheduling behavior
The agent needs to perceive times in the user's timezone, not UTC.
Dropping this in the v1→v2 port produced a class of bugs where the agent
would schedule tasks for the wrong hour, suggest dinner at midnight, etc.
This restores v1 parity.

Container side:
- New container/agent-runner/src/timezone.ts mirrors src/timezone.ts with
  isValidTimezone / resolveTimezone / formatLocalTime, plus:
  * TIMEZONE constant resolved at load from process.env.TZ (host sets this
    from src/container-runner.ts:254)
  * parseZonedToUtc(input, tz) — treats a naive ISO as wall-clock time in
    `tz`, returns the corresponding UTC Date. Strings with Z or offset
    are passed through.

- formatter.ts:
  * formatMessages() now prepends <context timezone="IANA"/>\n — matches
    v1 src/v1/router.ts:20-22
  * formatSingleChat uses formatLocalTime(ts, TIMEZONE) instead of a
    home-rolled HH:MM 24h formatter → outputs like "Jun 15, 2026, 8:00 AM"
  * reply_to="<id>" attribute + <quoted_message from="X">Y</quoted_message>
    element — matches v1 format exactly; old <reply-to/> shape is gone
  * stripInternalTags() exported for the dispatch path to reuse

- poll-loop.ts uses the exported stripInternalTags() instead of inline regex.

- mcp-tools/scheduling.ts:
  * schedule_task/update_task descriptions now explicitly document that
    processAfter accepts either UTC or naive local time (interpreted in
    the user's TZ from the context header)
  * handlers normalize through parseZonedToUtc() and store a UTC ISO

Host side:
- src/modules/scheduling/recurrence.ts passes { tz: TIMEZONE } to
  CronExpressionParser.parse. Without this, "0 9 * * *" fires at 09:00
  UTC instead of 09:00 user-local — this was the v1 behavior
  (src/v1/task-scheduler.ts:20-49).

Tests:
- container/agent-runner/src/timezone.test.ts — mirror of src/timezone.test.ts
  + new parseZonedToUtc cases
- container/agent-runner/src/formatter.test.ts — context header, reply_to,
  quoted_message, XML escaping, stripInternalTags (ported from v1
  formatting.test.ts)
- src/modules/scheduling/recurrence.test.ts — cron TZ respected, completed
  rows only cloned when recurrence is set

Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 18 + timezone-formatting-v1-recreation.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:09:14 +03:00
gavrielc
0283391e0a chore(config): remove dead POLL_INTERVAL / SCHEDULER_POLL_INTERVAL / IPC_POLL_INTERVAL
These three constants were carried over from v1's polling + IPC architecture
and have zero callers in the v2 runtime:

- POLL_INTERVAL (2000ms) — v1 message loop; replaced by event-driven
  delivery + delivery.ts's ACTIVE_POLL_MS (hardcoded 1000ms)
- SCHEDULER_POLL_INTERVAL (60000ms) — v1 task scheduler; replaced by
  host-sweep.ts's SWEEP_INTERVAL_MS (hardcoded 60_000)
- IPC_POLL_INTERVAL (1000ms) — v1 file-based IPC; meaningless in v2's
  session-DB architecture

Grep confirms no imports in src/, container/, or tests. Docs/SPEC.md
updated to match.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:01:47 +03:00
gavrielc
47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:00:04 +03:00
gavrielc
5ed5b72f10 docs: consolidate refactor follow-ups into a single REFACTOR.md
Single forward-looking reference that replaces the two untracked planning
docs (REFACTOR_PLAN.md + REFACTOR_EXECUTION.md) which had become a mix of
historical PR timeline and still-relevant decisions.

Keeps only what's actionable going forward:
- Module tiers, the four registries, and the module distribution model
  (architecture summary).
- Remaining work: Phase 5 (v2 → main) and the modules-branch decision.
- Operational patterns worth preserving (standing checks, TDZ rule,
  branch-sync file-presence diff procedure, prettier drift pattern).
- 17 curated open questions across design, distribution, core slotting,
  and documentation.

Canonical references (docs/module-contract.md, docs/architecture.md, etc.)
are linked but not duplicated. This doc is transient — retire when the
refactor is fully behind us.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 22:14:09 +03:00
gavrielc
60ffe34396 Merge pull request #1853 from qwibitai/refactor/pr9-cli-channel
feat(channels): add CLI channel — talk to your agent from the terminal
2026-04-18 22:02:02 +03:00
gavrielc
131fc99700 feat(channels): add CLI channel — talk to your agent from the terminal
First default channel that ships with main. Unix-socket adapter + thin
client; plugs into the running daemon rather than spawning its own host.

## src/channels/cli.ts

- ChannelAdapter with channelType='cli', platformId='local'.
- setup() unlinks any stale socket, listens on $DATA_DIR/cli.sock (mode 0600
  so only the local user can connect).
- On client connect: reads newline-delimited JSON ({"text": "..."}) and
  calls config.onInbound('local', null, {id, kind:'chat', content, ts}).
- deliver() writes {"text": <body>} back to the connected socket; silently
  no-ops when no client is attached (outbound row still persists).
- Single-client policy: a second connection supersedes the first with a
  [superseded] notice.
- teardown() closes the client, closes the server, removes the socket file.

## scripts/chat.ts + pnpm run chat

One-shot client:
- pnpm run chat <message...>
- Connects to the socket, writes one JSON line with the message.
- Reads replies; exits 2s after the first reply lands (hard timeout 120s).
- ENOENT/ECONNREFUSED prints a hint to start the daemon.

## scripts/init-first-agent.ts

- Fix stale imports after earlier module extractions (permissions +
  agent-to-agent moved their DB helpers into modules/).
- After wiring the DM channel, also create cli/local messaging_group
  (unknown_sender_policy='public' — local socket perms handle auth) and
  wire it to the same agent. User can `pnpm run chat` immediately.

## package.json

- Add "chat": "tsx scripts/chat.ts" script.

## Validation

- pnpm run build clean.
- pnpm test — 137 host tests pass.
- bun test in container/agent-runner — 17 pass.
- Service boot logs: "CLI channel listening" + "Channel adapter started
  channel=cli type=cli". Clean SIGTERM shutdown; socket file removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:51:04 +03:00
gavrielc
e5319b3e1e Merge pull request #1851 from qwibitai/refactor/pr8-slim-and-cli
refactor: relocate outbox I/O, dead-code sweep, add CLI harness
2026-04-18 21:34:37 +03:00
gavrielc
7169c25e70 refactor: relocate outbox I/O to session-manager + dead-code sweep
## Outbox extraction (delivery.ts → session-manager.ts)

File I/O for outbound attachments now lives in session-manager.ts alongside
the symmetric inbound extractAttachmentFiles. delivery.ts no longer touches
the filesystem — it hands buffers to the adapter and calls clearOutbox on
success.

- New `readOutboxFiles(agentGroupId, sessionId, messageId, filenames)` and
  `clearOutbox(agentGroupId, sessionId, messageId)` in session-manager.ts.
- deliverMessage in delivery.ts loses ~35 lines of fs/path code and its
  `fs`/`path` imports.

## Dead-code sweep

TypeScript's --noUnusedLocals surfaced several cruft imports. Fixed:

- src/container-runner.ts: drop unused `markContainerIdle` import; drop
  unused `session` parameter from `buildContainerArgs` signature.
- src/delivery.ts: drop unused `getSession`, `writeSessionMessage`,
  `wakeContainer` imports.
- src/host-sweep.ts: drop unused `updateSession`, `outboundDbPath` imports.
- container/agent-runner/src/poll-loop.ts: drop unused `config`,
  `processingIds` params from `processQuery`.
- Test files: drop unused imports in channel-registry.test, db-v2.test,
  host-core.test.

Skipped: `conversations` state in chat-sdk-bridge.ts (never read but
tangled with public `updateConversations` method; cleaning it risks a
merge conflict with the channels branch at the next sync).

## Validation

- `pnpm run build` clean
- `pnpm test` — 137 host tests pass
- `bun test` in container/agent-runner — 17 tests pass
- Service boots (`NanoClaw running`, `OneCLI approval handler started`)
  and shuts down cleanly on SIGTERM

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:34:08 +03:00
gavrielc
3d945db6eb Merge pull request #1849 from qwibitai/refactor/pr7-retier-approvals
refactor(modules): re-tier approvals as default, extract self-mod as optional
2026-04-18 19:54:49 +03:00
gavrielc
95fdec335a refactor(modules): re-tier approvals as default; extract self-mod as optional
Promotes approvals to the default tier with a public API (requestApproval +
registerApprovalHandler) that other modules consume. Self-modification
(install_packages / request_rebuild / add_mcp_server) moves into a new
optional module that registers delivery actions + matching approval handlers
via the new API.

## Approvals (default tier)

- Adds `src/modules/approvals/primitive.ts` exporting `requestApproval`,
  `registerApprovalHandler`, `notifyAgent`. Absorbs `pickApprover` /
  `pickApprovalDelivery` / `channelTypeOf` from the deleted `src/access.ts`.
- Rewrites `response-handler.ts` to dispatch to registered approval handlers
  on approve (action-keyed Map). Reject path is centralized.
- Drops the three self-mod-specific delivery-action registrations from
  `approvals/index.ts`; they belong to self-mod now.
- `onecli-approvals.ts` now imports picks from the primitive instead of
  `src/access.ts`.

## Self-mod (optional tier)

- New `src/modules/self-mod/` with request handlers (validate input + call
  requestApproval) and apply handlers (orchestration on approve).
- `apply.ts` owns updateContainerConfig + buildAgentGroupImage + killContainer
  calls. Self-mod depends on approvals (via registerApprovalHandler +
  requestApproval + notifyAgent) and on core (container-runner, container-config).
- Registers 3 delivery actions + 3 approval handlers at import time.

## Other changes

- `src/access.ts` and `src/access.test.ts` deleted. Tests split across
  `src/modules/approvals/picks.test.ts` (approver selection) and
  `src/modules/permissions/permissions.test.ts` (access + roles + DM).
- `src/modules/index.ts` barrel: approvals loads before self-mod so
  registerApprovalHandler is bound when self-mod registers at import time.

## Validation

- `pnpm run build` clean
- `pnpm test` — 137 host tests pass
- `bun test` in container/agent-runner — 17 tests pass
- Service starts; boot log shows `OneCLI approval handler started`,
  `NanoClaw running`; clean SIGTERM shutdown

Resolves the transitional tier violation flagged in PR #5 where core
imported from the permissions optional module via `src/access.ts`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:41:26 +03:00
gavrielc
1f179e07f2 Merge pull request #1848 from qwibitai/refactor/pr6-agent-to-agent
refactor(modules): extract agent-to-agent as registry-based module
2026-04-18 19:12:17 +03:00
gavrielc
46b19dcf9c refactor(modules): extract agent-to-agent as registry-based module
Last extraction of Phase 3. Moves inter-agent messaging + create_agent +
destination projection into src/modules/agent-to-agent/. Core retains:

- `channel_type === 'agent'` dispatch in delivery.ts, guarded by
  hasTable('agent_destinations') + dynamic import into module.
- Channel-permission ACL in delivery.ts, guarded by hasTable, with
  inlined SQL (no module import from core).
- writeDestinations call in container-runner.ts, guarded by hasTable +
  dynamic import into module.
- createMessagingGroupAgent's destination side effect in db/messaging-groups.ts,
  guarded by hasTable. This is a documented transitional tier violation
  (core imports from optional module), analogous to src/access.ts.

Migration `004-agent-destinations.ts` renamed to `module-agent-to-agent-
destinations.ts` preserving `name: 'agent-destinations'` so existing DBs
don't re-run it.

delivery.ts: 600 → 449 lines. handleSystemAction's last switch case gone
(just registry + default log-and-drop). notifyAgent helper removed (only
create_agent used it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:00:10 +03:00
gavrielc
c80a23e24f Merge pull request #1847 from qwibitai/refactor/pr5-permissions
refactor(modules): extract permissions as optional module
2026-04-18 18:41:52 +03:00
gavrielc
32bcc2c5ae refactor(permissions): preserve pre-PR behavior in three spots
PR #5 review flagged three behavior changes that shouldn't have slipped
in. This commit reverts each to match the pre-refactor behavior exactly.

1. User upsert ordering. Split the router hook into two setters:
   setSenderResolver (runs before agent resolution) and setAccessGate
   (runs after). Restores the pre-PR sequence where the users row is
   upserted even if the message is dropped by wiring or trigger rules.

2. dropped_messages audit. Moved src/modules/permissions/db/dropped-messages.ts
   back to src/db/dropped-messages.ts. The table is core audit infra, not
   permissions-specific. Router re-writes rows for no_agent_wired and
   no_trigger_match; the access gate writes rows for policy refusals.

3. Permissionless container fallback. Dropped. poll-loop restores the
   original deny-all check when NANOCLAW_ADMIN_USER_IDS is empty.

Module contract doc updated with the two-hook shape.

Validation: host build clean, 137/137 host tests, 17/17 container
tests, typecheck clean, service boots to "NanoClaw running" with
permissions module registering both hooks and clean SIGTERM shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:00:10 +03:00
gavrielc
7cc4ecc3be refactor(modules): extract permissions as optional module
Moves user-roles / users / agent-group-members / user-dms /
dropped-messages / user-dm / canAccessAgentGroup into
src/modules/permissions/. Module registers a single inbound-gate that
owns sender resolution, access decision, unknown-sender policy, and
drop-audit recording.

Router slimmed from 357 → 179 lines; the inline fallback chain
(extractAndUpsertUser / enforceAccess / handleUnknownSender /
recordDroppedMessage) is gone — without the permissions module core
defaults to allow-all with userId=null.

container-runner's admin-ID query is now inline SQL guarded by
sqlite_master on user_roles, keeping core free of any import from the
permissions module. The container-side formatter falls back to
permissionless mode when NANOCLAW_ADMIN_USER_IDS is empty: every sender
with an identifiable senderId is treated as admin.

Module contract doc formalizes the tier model and the dependency rule
(core ← default modules ← optional modules). One transitional violation
flagged: src/access.ts (core) imports from the permissions module for
its remaining approver-picking helpers; resolves in the planned PR #7
re-tier.

Validation: host build clean, 137/137 host tests, 17/17 container
tests, typecheck clean, service boots to "NanoClaw running" with
permissions module registering its gate and clean SIGTERM shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:42:14 +03:00
gavrielc
e75af5e44d Merge pull request #1842 from qwibitai/refactor/pr4-scheduling
refactor: extract scheduling as registry-based module (PR #4)
2026-04-18 17:23:27 +03:00
gavrielc
473f766585 refactor(modules): extract scheduling as registry-based module
Moves the scheduling surface — 5 delivery actions (schedule_task,
cancel_task, pause_task, resume_task, update_task), handleRecurrence,
applyPreTaskScripts, and task DB helpers — out of core and into
src/modules/scheduling/ (host) and container/agent-runner/src/scheduling/
(container).

First PR to fill the MODULE-HOOK markers introduced in PR #2:
  - src/host-sweep.ts MODULE-HOOK:scheduling-recurrence now dynamically
    imports handleRecurrence from the module each sweep tick.
  - container/agent-runner/src/poll-loop.ts MODULE-HOOK:scheduling-pre-task
    dynamically imports applyPreTaskScripts before the provider call.
    When the marker block is empty (scheduling uninstalled), `keep`
    falls back to `normalMessages` so non-task messages still flow.

The 5 task cases are removed from delivery.ts's handleSystemAction
switch — the registry now routes them. Task DB helpers moved out of
src/db/session-db.ts (which kept `nextEvenSeq` as a named export so
the module can uphold the host-writes-even-seq invariant). Test suite
split to match: scheduling-specific tests live in the module.

No migration — tasks are messages_in rows with kind='task'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:17:47 +03:00
gavrielc
71aab8c316 Merge pull request #1840 from qwibitai/refactor/pr3-approvals-interactive
refactor(modules): extract approvals + interactive as registry-based modules
2026-04-18 15:57:29 +03:00
gavrielc
626c565a70 fix(modules): break circular import TDZ between index.ts and modules
PR #3 introduced a circular-import temporal-dead-zone bug that didn't
surface in unit tests but crashed the service at startup:

  src/index.ts imports './modules/index.js' for side effects
  → src/modules/interactive/index.ts calls registerResponseHandler()
  → that function is declared in src/index.ts
  → but src/index.ts's const responseHandlers = [] hasn't been
    initialized yet (we're in the middle of its module-init)
  → ReferenceError: Cannot access 'responseHandlers' before initialization

Same issue for registerResponseHandler itself (the function reference
resolves to undefined) and for onShutdown in the approvals module.

Caught when the operator started the service and systemd flagged the
process as crashing in auto-restart loop.

Fix: extract responseHandlers + registerResponseHandler + shutdownCallbacks
+ onShutdown into src/response-registry.ts, which has no dependencies on
src/index.ts or on modules. index.ts re-exports the same surface for any
existing consumers; modules import directly from response-registry.js.

The bug was latent because:
- Unit tests import pieces, never src/index.ts's main() flow.
- Host builds clean because TypeScript doesn't catch runtime circular
  init order.
- Only surfaces when the ES module loader actually executes src/index.ts
  as the entry point.

Verified: service boots on Linux host with approvals + interactive
loaded; OneCLI handler starts via onDeliveryAdapterReady callback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:48:43 +03:00
gavrielc
a4573395d9 refactor(modules): extract approvals + interactive as registry-based modules
Phase 2 / PR #3 of the module refactor. Moves the approval and interactive-
question flows out of core and into src/modules/, wired through the response
dispatcher and delivery action registries.

New modules:
- src/modules/interactive/ — registers a response handler that claims
  pending_questions rows, writes question_response to the session DB, wakes
  the container. createPendingQuestion call stays inline in delivery.ts
  (guarded by hasTable) per plan.
- src/modules/approvals/ — registers 3 delivery actions (install_packages,
  request_rebuild, add_mcp_server), a response handler for pending_approvals
  (including OneCLI action fall-through), an adapter-ready hook that boots
  the OneCLI manual-approval handler, and a shutdown hook that stops it.
  OneCLI implementation (src/onecli-approvals.ts) moves into the module.

Core lifecycle hooks added (narrow, not registries):
- onDeliveryAdapterReady(cb) in delivery.ts — fires when setDeliveryAdapter
  runs (or immediately if already set). Used by approvals for OneCLI boot.
- onShutdown(cb) in index.ts — fires on SIGTERM/SIGINT. Used by approvals
  for OneCLI teardown.
- getDeliveryAdapter() getter in delivery.ts — for live-flow adapter access
  in registered delivery actions.

Core shrinks: delivery.ts 911 → 665 lines, index.ts 405 → 224 lines.
dispatchResponse now logs "Unclaimed response" instead of falling through
to an inline handler — the inline fallback moved into the two modules.

Migration files renamed to the module-<name>-<short>.ts convention:
- 003-pending-approvals.ts → module-approvals-pending-approvals.ts
- 007-pending-approvals-title-options.ts → module-approvals-title-options.ts
Migration.name fields unchanged so existing DBs treat them as already-applied.

Degradation verified: emptying the modules barrel builds clean and 137/137
tests pass. Actions would log "Unknown system action"; button clicks would
log "Unclaimed response".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:16:53 +03:00
gavrielc
a612c2ca24 Merge pull request #1839 from qwibitai/refactor/pr2-scaffolding
refactor: scaffold module registries + default-module layout (PR #2)
2026-04-18 14:48:50 +03:00
gavrielc
4202041d0b refactor: scaffold module registries and default-module layout
Additive change — existing code paths still run via inline fallbacks.
Prepares core for per-module extractions in PR #3 onward.

Four registries added with empty defaults:
  - delivery action handlers (delivery.ts)
  - router inbound gate (router.ts)
  - response dispatcher (index.ts)
  - MCP tool self-registration (container/agent-runner/src/mcp-tools/server.ts)

Default modules moved to src/modules/ for signaling:
  - src/modules/typing/       (extracted from delivery.ts)
  - src/modules/mount-security/ (moved from src/mount-security.ts)

Both are imported directly by core — no hook, no registry. Removal
requires editing core imports.

Migrator now keys applied rows by name (uniqueness) so module
migrations can pick arbitrary version numbers. Stored version column
is auto-assigned as an applied-order sequence.

sqlite_master guards added around core calls into module-owned tables
(user_roles, agent_destinations, pending_questions). No-ops today;
load-bearing after the owning modules are extracted.

MODULE-HOOK markers placed at scheduling's two skill-edit sites
(host-sweep.ts recurrence call, poll-loop.ts pre-task gate). PR #4
replaces the marked blocks when scheduling moves to its module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:46:19 +03:00
gavrielc
1888ecc1e9 Merge pull request #1838 from qwibitai/refactor/pr1-prep
refactor: PR #1 prep — v1 removal, module contract, sdk bump
2026-04-18 14:26:30 +03:00
gavrielc
a1b227269e chore(deps): bump @onecli-sh/sdk to 0.3.1
Lockfile was pinned to 0.2.0 while package.json already declared
^0.3.1. The code depends on types added in 0.3.x (ApprovalRequest,
ManualApprovalHandle, configureManualApproval), so the host build
was failing on v2. Refreshing the lockfile resolves it.

0.3.1 was published 2026-04-10, well clear of minimumReleaseAge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:24:04 +03:00
gavrielc
53e8135102 docs: add module contract for refactor
Codifies the interface between core and modules: the four registries
(delivery actions, inbound gate, response dispatcher, MCP tool
self-registration), default modules (typing, mount-security),
guarded-inline fallbacks, MODULE-HOOK skill-edit markers, and module
migration naming.

Authoritative reference for downstream extraction PRs and install
skills. See REFACTOR_PLAN.md for broader context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:23:56 +03:00
gavrielc
86becf8dea chore: delete v1 reference code
Removes src/v1/ (37 files) and container/agent-runner/src/v1/ (3 files)
along with the v1 reference note in CLAUDE.md and the now-obsolete
tsconfig exclude. v1 was already out of the runtime path; this just
removes the dead weight.

~8,800 LOC removed, zero runtime change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:23:47 +03:00
gavrielc
27c52205f9 fix(channels): bridge openDM delegates to adapter directly
chat.openDM dispatches via inferAdapterFromUserId, which only recognizes
Discord/Slack/Teams/gChat formats and throws for everything else —
breaking approval delivery on Telegram (numeric IDs) and the other
direct-addressable channels the bridge now wraps. Delegate straight to
adapter.openDM + channelIdFromThreadId, and only expose openDM when the
underlying adapter implements it. Preserves the adapter's native
platform_id encoding (e.g. "telegram:<chatId>") so user_dms caches align
with the messaging_groups rows onInbound wrote.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 18:30:38 +03:00
gavrielc
e93292de2a fix(agent-runner): spawn built-in MCP server with bun, not node
The Bun migration (c5d0ef8) dropped the in-image tsc build step, so
/app/src/mcp-tools/index.js never exists — only index.ts. The spawn
config in container/agent-runner/src/index.ts still pointed at
index.js and invoked it with `node`, which can't execute TypeScript
anyway. Net effect: every session failed to start the `nanoclaw`
MCP server, so scheduling, send_to_agent, interactive questions,
and self-mod tools were silently absent from the agent's toolset.

Matches entrypoint.sh and src/container-runner.ts, which already
use `exec bun run /app/src/index.ts` for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 17:17:45 +03:00
gavrielc
bd659fd7d6 style(v2/delivery): prettier reflow of test import
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:17:07 +03:00
gavrielc
ac9535baa8 fix(v2/delivery): prevent double-send from overlapping delivery polls
The active poll (1s, running sessions) and sweep poll (60s, all active
sessions) both call deliverSessionMessages, and a running session is in
both result sets. Without locking they race on the same outbound row:
both read it as undelivered, both call the channel adapter, both
markDelivered. INSERT OR IGNORE hides the DB collision but the user has
already received the message twice.

Adds a per-session inflight guard; the second concurrent caller skips
and picks up any leftover work on the next poll tick.

Also makes outbox cleanup in deliverMessage best-effort: the message is
already on the user's screen, so a cleanup throw must not propagate to
the retry path (which would resend).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:17:06 +03:00
gavrielc
bfc626be82 docs: drop v2 framing across CLAUDE.md and 12 docs
Renamed 12 docs/v2-*.md → docs/*.md (already in index from earlier git mv).
Rewrote CLAUDE.md to describe the codebase as just "the codebase" rather
than "v2"; added a "Channels and Providers (skill-installed)" section
reflecting the new model and updated the docs index links.

Agent (general-purpose) cleaned the 12 doc bodies:
- Dropped "NanoClaw v2" / "v2 schema" / "(v2)" prose throughout
- Rewrote inter-doc cross-references docs/v2-X.md → docs/X.md
- Architecture, agent-runner-details: collapsed v1↔v2 comparison tables
  into present-tense facts; added notes that trunk only ships `claude`
  and that channel adapters are skill-installed from the `channels` branch
- Setup-wiring, checklist: dropped v1→v2 migration items that no longer
  apply
- Frozen runtime paths preserved: data/v2.db, data/v2-sessions/,
  container name nanoclaw-v2

git grep confirms remaining `\bv2\b` matches in docs/ are only those
runtime paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:53:21 +03:00
gavrielc
c37609ffc8 docs(skills): drop "v2" from skill content + src/index.ts log lines
Cleans up the prose-level v2 references that the rename commit didn't
touch. Skills now describe themselves and the codebase without "v2"
versioning language. /add-X-v2 cross-references in setup, init-first-agent,
and manage-channels updated to /add-X.

Runtime path identifiers (data/v2.db, data/v2-sessions/, container name
nanoclaw-v2) deliberately left as-is — renaming them breaks live installs
without commensurate benefit.

Verified: pnpm run build clean, 326 host tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:42:55 +03:00
gavrielc
00fb1bee4a chore(skills): rename add-*-v2 → add-* and drop dead v1 channel skills
Renamed 13 skill folders to drop the -v2 suffix (the v2/v1 distinction
isn't load-bearing anymore — there is no v1 runtime). Deleted the four
v1 channel skills that occupied the rename target paths (add-discord,
add-slack, add-telegram, add-whatsapp); they targeted src/v1 which is
reference-only per CLAUDE.md.

Skill content still says "v2" in places — that's a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:38:19 +03:00
gavrielc
4857512267 refactor(v2): move setup/groups.ts off trunk + drop pino dep
setup/groups.ts is whatsapp-only — its inline syncScript imports baileys
and pino to fetch group metadata via Baileys.groupFetchAllParticipating.
On trunk it was a no-op for non-whatsapp users (returned early without
auth) and the only thing keeping pino alive.

Removed:
- setup/groups.ts (lives on `channels` branch; restored by /add-whatsapp-v2)
- `groups` STEPS entry from setup/index.ts
- pino from package.json (no longer used outside the moved file)

/add-whatsapp-v2 skill updated to copy setup/groups.ts and register both
groups + whatsapp-auth in setup/index.ts STEPS, install pino@9.6.0 along
with baileys + qrcode.

Verified: build clean, 326 host tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:34:16 +03:00
gavrielc
712720eef4 refactor(v2): drop @chat-adapter/shared dep — duck-type NetworkError
The only use was channel-registry.ts checking `err instanceof NetworkError`
to retry transient setup failures. Switched to a duck-type predicate
(`err.name === 'NetworkError'`) so the dep is no longer needed at trunk
level. Channel skills bring it in transitively when they install their
Chat SDK adapter package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:33:22 +03:00
gavrielc
a3376c25df docs(channel-skills): rewrite all 13 /add-*-v2 skills for copy-from-channels-branch pattern
Each skill now:
- Pre-flight: checks for file present + import line + dep listed (idempotent)
- git fetch origin channels
- git show origin/channels:<paths> > <paths> to copy adapter (and helpers/tests/setup-step where applicable)
- Append `import './<chan>.js';` to src/channels/index.ts
- pnpm install <pkg>@<pinned-version>
- pnpm run build

Telegram additionally copies 4 helper/test files + setup/pair-telegram.ts
and registers `'pair-telegram':` in setup/index.ts STEPS.

WhatsApp (native) additionally copies setup/whatsapp-auth.ts and
registers `'whatsapp-auth':` in setup/index.ts STEPS, installs
@whiskeysockets/baileys + qrcode + @types/qrcode pinned.

All credential / next-steps / channel-info / troubleshooting sections
preserved verbatim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:24:20 +03:00
gavrielc
437ba63a2f refactor(v2): move channel adapters off v2 trunk
v2 ships with no channels baked in. All channel adapters (discord, slack,
telegram + helpers, whatsapp, whatsapp-cloud, gchat, github, imessage,
linear, matrix, resend, teams, webex) and their channel-specific setup
steps (pair-telegram, whatsapp-auth) now live on the `channels` branch
and get copied in via /add-*-v2 skills.

Removed:
- src/channels/{discord,slack,telegram*,whatsapp*,gchat,github,imessage,linear,matrix,resend,teams,webex}.ts
- setup/{pair-telegram,whatsapp-auth}.ts
- 14 channel-specific deps from package.json (@chat-adapter/*, @beeper/*,
  @bitbasti/*, @resend/chat-sdk-adapter, @whiskeysockets/baileys,
  chat-adapter-imessage, qrcode, @chat-adapter/state-memory unused)
- Their corresponding STEPS entries from setup/index.ts
- Channel imports from src/channels/index.ts

Kept:
- Channel infra: adapter.ts, channel-registry.ts (+ test), chat-sdk-bridge.ts,
  ask-question.ts, an empty-imports index.ts
- Chat SDK runtime (`chat`) for channels that copy in via Chat SDK bridge
- @chat-adapter/shared promoted from transitive to direct dep
  (channel-registry.ts uses NetworkError from it)

Verified: pnpm run build clean, 326 host tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:20:28 +03:00
gavrielc
32a973f1cd docs(add-opencode): rewrite skill for copy-from-providers-branch pattern
v2 no longer ships opencode on trunk. The skill now:
- Fetches origin/providers
- Copies opencode source files to their target paths
- Appends self-registration imports to both provider barrels
- Adds @opencode-ai/sdk@1.4.3 as a pinned agent-runner dep
- Adds OPENCODE_VERSION ARG + opencode-ai pnpm global install to Dockerfile
- Rebuilds host + container

All steps idempotent. Credential/env/Zen docs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:14:10 +03:00
gavrielc
03dd559786 style: trim trailing whitespace in providers barrel
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:12:35 +03:00
gavrielc
e0258e8c1b refactor(v2): move opencode provider off v2 trunk
v2 ships with only claude baked in. opencode now lives on the `providers`
branch and gets copied in via the /add-opencode skill.

Removed:
- src/providers/opencode.ts
- container/agent-runner/src/providers/{opencode,mcp-to-opencode}.ts + test
- @opencode-ai/sdk from agent-runner package.json + bun.lock
- opencode-ai global install + OPENCODE_VERSION ARG from Dockerfile
- opencode self-registration imports from both provider barrels
- opencode test case from factory.test.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:10:56 +03:00
gavrielc
2529c3e387 Merge pull request #1776 from talmosko-code/feat/add-opencode-v2
feat(v2): OpenCode agent provider
2026-04-17 12:22:48 +03:00
gavrielc
f18d0561a6 Merge pull request #1814 from qwibitai/refactor/provider-barrel-v2
refactor(v2/providers): self-registration barrel + host container-config registry
2026-04-17 12:22:12 +03:00
Tal Moskovich
c9dd6e050e docs: update OpenCode Zen integration details in SKILL.md
- Clarified the use of `x-api-key` for Zen's HTTP API, addressing common issues with Bearer tokens.
- Added configuration examples for `.env` and OneCLI registration for Zen keys.
- Provided guidance on naming conventions for OpenCode agent and provider settings.
- Included a note on the difference in authentication methods between OpenCode and OpenRouter.
2026-04-17 12:20:22 +03:00
Tal Moskovich
22150261c5 feat(v2): OpenCode agent provider
- Add OpenCodeProvider (SSE, session resume, MCP via mcp-to-opencode)
- Register opencode in factory; AGENT_PROVIDER passthrough from DB
- Host: XDG mount, NO_PROXY merge, OPENCODE_* env for opencode sessions
- Dockerfile: opencode-ai CLI; docs checklist + architecture diagram
- Skill add-opencode for v2; AgentProviderName in src/types.ts

Made-with: Cursor
2026-04-17 12:20:22 +03:00
gavrielc
7639f7b1bb style(v2/providers): prettier reflow of provider-container-registry signatures
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:17:09 +03:00
gavrielc
1f3b023a5a refactor(v2/providers): self-registration barrel + host container-config registry
Providers now mirror the channels pattern: each module calls
registerProvider() at top level, and providers/index.ts is a barrel of
side-effect imports. createProvider() becomes a thin registry lookup;
the closed ProviderName union is gone (now a string alias, since the
env var is a runtime string anyway).

Also adds a host-side provider-container-registry so providers can
declare their own mounts and env passthrough in src/providers/<name>.ts
instead of the container-runner having to know about each one. The
resolver runs once per spawn and threads provider + contribution
through buildMounts and buildContainerArgs so side effects (mkdir,
etc.) fire exactly once.

Both barrels are append-only — adding a new provider is a new file
+ one import line per barrel, no edits to existing files. The built-in
providers (claude, mock) don't need host-side config, so src/providers/
ships with an empty barrel; the container-side barrel imports both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 12:17:09 +03:00
gavrielc
a38c6a962f Merge pull request #1813 from qwibitai/feat/v2-bun-container
feat(v2): Bun container runtime + build-surface improvements
2026-04-17 12:14:48 +03:00
gavrielc
f04921deee docs(v2): runtime-split guide, CLAUDE.md gotchas, setup CJK autodetect
- docs/v2-build-and-runtime.md: new — runtime split rationale (Node host,
  Bun container), lockfile topology, supply-chain trade-offs, image build
  surface, two session-wake paths, CI shape, key invariants. Indexed from
  CLAUDE.md v2 Docs Index.
- CLAUDE.md: Container Runtime (Bun) section with trigger/action gotchas
  a contributor editing the container must know (named-param prefix rule,
  bun:test vs vitest, bun.lock regeneration, no minimumReleaseAge for the
  Bun tree, no tsc build step, DELETE pragma invariant). CJK font support
  section for Claude sessions outside of /setup to proactively offer when
  they detect CJK signals. Development section updated with Bun commands.
- .claude/skills/setup/SKILL.md: step 3b — auto-enable CJK fonts without
  asking if the user is already writing in CJK; otherwise ask only on clear
  signals (CJK timezone from step 2a). 3c renumbered from old 3b.
2026-04-17 11:38:20 +03:00