Commit Graph

1412 Commits

Author SHA1 Message Date
Gavriel Cohen
416c283dcb fix(migrate-v2): bash 3.2 compatibility + reset coverage
migrate-v2.sh
  Replace `declare -A STEP_RESULTS` with two parallel indexed arrays
  (STEP_NAMES + STEP_STATUSES) plus a `record_step` helper. macOS ships
  bash 3.2 which has no associative arrays — `declare -A` errored out
  silently and every `STEP_RESULTS["1a-env"]=...` triggered a fatal
  bash arithmetic error (interpreting "1a" as a number). Visible
  symptom: `steps: {}` in handoff.json. Latent symptom: phase 2c's
  install loop sometimes bailed mid-iteration before invoking the
  channel install script, leaving channel code uninstalled while
  reporting `overall_status: success`.

migrate-v2-reset.sh
  Cover the gaps that left install side-effects in place between
  iterations:
    - Remove untracked adapter files in src/channels/ (mirror the
      pattern already used for container/skills/).
    - Restore tracked setup helpers that channel installs overwrite
      (setup/whatsapp-auth.ts, setup/pair-telegram.ts, setup/index.ts)
      and remove untracked ones they create (setup/groups.ts).
    - Restore package.json + pnpm-lock.yaml (channel installs add
      deps like @whiskeysockets/baileys).
  Setup/migrate-v2/* is intentionally not touched — that's where user
  WIP lives.

Verified end-to-end: reset → migrate → all 9 steps reported in
handoff.json with status "success", phase 2c install actually runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:50:21 +03:00
Gavriel Cohen
aec7ddd099 fix(migrate-v2): correct JID parsing, Discord guildId lookup, silent failures
- shared.ts: parseJid now recognizes raw Baileys WhatsApp JIDs
  (`<id>@s.whatsapp.net`, `@g.us`, etc.); v2PlatformId returns the raw
  JID for whatsapp to match what the runtime adapter emits. Without this,
  every WhatsApp group in a v1 install was silently skipped.

- discord-resolver.ts: new helper that uses DISCORD_BOT_TOKEN to look up
  channelId → guildId via the Discord API, since v1 stored only the
  channel id but v2 needs `discord:<guildId>:<channelId>`. Best-effort:
  on missing/invalid token or network error, returns empty resolver and
  the affected groups are skipped with the reason surfaced per channel.

- db.ts, tasks.ts: route Discord groups through the resolver; other
  channels go through v2PlatformId unchanged. Resolver only built when
  at least one Discord group exists, so non-Discord installs incur no
  network.

- db.ts: when every v1 group is skipped, exit non-zero with a FAIL line
  instead of `OK:groups=N,...,skipped=N`, so the wrapper doesn't hide
  total failure under a successful-looking summary.

- migrate-v2.sh: run_step now surfaces ERROR: lines from successful
  steps (with count + first 3 + raw log path); phase 2c install loop
  populates STEP_RESULTS so install failures show in handoff.json
  instead of silently passing.

- sessions.ts: copyTree skips dangling symlinks (e.g. v1's
  `.claude/debug/latest`) instead of crashing the entire step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:32:34 +03:00
Mike Nolet
ceb0b9cf5f fix(test-infra): openInboundDb honors in-memory test DB
initTestSessionDb() creates an in-memory inbound singleton, but
openInboundDb() always opened the hardcoded /workspace/inbound.db
path. Every test that exercised getPendingMessages — directly, or via
test fixtures that load data through it (e.g. poll-loop.test.ts:29
loads formatter test rows via getPendingMessages) — failed with
SQLITE_CANTOPEN under `bun test` outside a real container.

Baseline on main: 34 pass, 25 fail across 6 files. After this fix:
59 pass, 0 fail.

In test mode, openInboundDb returns the in-memory singleton. The
singleton's .close() is no-op'd in initTestSessionDb so caller
try/finally cleanup doesn't tear down the shared DB; closeSessionDb
invokes the saved original close to do the real teardown.

Production behavior is unchanged — _inboundIsTest only flips inside
initTestSessionDb, which is never called outside the test runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:45:23 +02:00
Charlie Savage
8d022fd9da fix(host-sweep): reopen outbound DB as writable for orphan claim cleanup
PR #2151 added deleteOrphanProcessingClaims() to resetStuckProcessingRows(),
but outDb is always opened readonly (openOutboundDb uses immutable: true).
The write silently failed, leaving orphan processing_ack rows that block
future message delivery for the session.

Fix: add openOutboundDbRw() alongside the existing readonly opener and use
it in resetStuckProcessingRows() to open a short-lived writable handle just
for the delete. The readonly handle is still used for all reads above.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 23:44:07 -07:00
Mike Nolet
1ebb2dc8d2 fix(poll-loop): slash commands silently broken on warm containers
The follow-up poller filtered /clear out of every tick without acking
the row, and pushed every other slash command through plain
formatMessages() (XML wrapping). On a warm container the outer
while(true) loop never regains control, so:

  - /clear sat pending in messages_in forever (no response at all)
  - /compact, /cost, /context, /files, /remote-control arrived at the
    SDK as XML-wrapped user text and were never dispatched as commands

Both modes are invisible to host monitoring: rows are either left
pending without a processing_ack claim, or marked completed normally;
heartbeat keeps firing inside the SDK event loop.

When the follow-up poller observes any slash command (admin or
passthrough — categorizeMessage decides), end the active query so the
current turn winds down cleanly and the outer loop wakes, re-fetches
the same pending set, and runs them through the canonical path
(/clear handler + formatMessagesWithCommands raw dispatch). Leave the
rows untouched so the outer-loop fetch sees the same set the poller
saw.

Cost: each slash command on a warm container forces close+reopen of
the SDK stream — a few seconds of subprocess startup. The Anthropic
prompt cache is server-side with a 5-min TTL keyed on prefix hash, so
stream lifecycle does not affect cache lifetime; close+reopen within
5 min still gets cache hits.

Also corrects the warm-stream rationale comment on processQuery, which
implied keeping the stream open preserved cache warmth — it doesn't.

Testing evidence — cache stays warm across stream close+reopen:

  Turn 1 (warm session):
    Usage: in=6 out=245 cache_create=92 cache_read=22996
    Full cache hit (22996 tokens).

  Turn 2 — /clear arrives:
    Pending slash command — ending stream so outer loop can process
    Clearing session (resetting continuation)
    Usage: in=6 out=95 cache_create=9393 cache_read=13600
    System prompt + tool defs (~13600 tokens) still hit cache;
    conversation history is gone (continuation reset) so the new turn
    writes fresh context.

  Turn 3 — /cost arrives:
    Pending slash command — ending stream so outer loop can process
    Usage: in=0 out=0 cache_create=0 cache_read=0 wall=0.0s api=0.0s
    /cost is a CLI built-in: dispatched locally by the SDK, no API
    call. Pre-fix this would have arrived as XML-wrapped user text
    and never dispatched — confirms the broader fix works.

  Turn 4 (next chat after /cost):
    Usage: in=6 out=142 cache_create=328 cache_read=22993
    Full cache hit again (22993 tokens read, 328 written). Despite the
    /cost-induced stream close+reopen, the server-side prompt cache
    survived: the new sdkQuery() resumed the same continuation, the
    request prefix matched the cached entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:35:14 +02:00
exe.dev user
ce9f175238 fix: reorder phase 3 — Docker before OneCLI
OneCLI runs in a Docker container, so Docker must be installed first.
Reordered: Docker (3a) → OneCLI (3b) → Auth (3c) → Skills (3d) →
Build (3e). OneCLI install now skips with a clear message if Docker
isn't available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 20:28:45 +00:00
exe.dev user
cf3fcc18d4 fix: install Docker if missing, don't skip container build
migrate-v2.sh now runs setup/install-docker.sh when Docker isn't
found instead of just printing a message. The container build step
reports failure (not skip) when Docker is unavailable so the skill
can triage it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 20:28:04 +00:00
exe.dev user
00a30e3eff docs: update changelog, remove experimental label from migration
The migration is no longer experimental — it's been tested end-to-end
with service switchover, session continuity, and revert. Updated the
changelog entry to reflect the new migrate-v2.sh flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 20:24:39 +00:00
exe.dev user
f35be24aef chore: move shared helpers to migrate-v2/, delete migrate-v1/
Extracted the helpers we use (JID parsing, trigger mapping, channel
auth registry, generateId, v2PlatformId) into setup/migrate-v2/shared.ts.
Deleted setup/migrate-v1/ entirely — no code references it anymore.

Updated README, CLAUDE.md, docs/v1-to-v2-changes.md, and
docs/migration-dev.md to reference the new paths and migrate-v2.sh
entry point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 20:23:34 +00:00
exe.dev user
67eb85d818 chore: remove old setup-embedded migration steps
The old migration flow (detect → validate → db → groups → env →
channel-auth → channels → tasks) ran inside `bash nanoclaw.sh` via
setup/auto.ts. Replaced by the standalone `bash migrate-v2.sh` flow.

Deleted:
- setup/migrate-v1.ts (orchestrator)
- setup/migrate-v1/{detect,validate,db,env,groups,channel-auth,channels,tasks}.ts

Kept:
- setup/migrate-v1/shared.ts (used by new migrate-v2/ steps)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 20:20:06 +00:00
exe.dev user
1d73b2986a feat: add migrate-v2.sh — standalone v1 → v2 migration script
New entry point: `bash migrate-v2.sh` from the v2 checkout.
Replaces the old setup-embedded migration flow with a standalone
4-phase script + rewritten Claude skill for the interactive parts.

Phase 0: Bootstrap (Node/pnpm/deps via setup.sh) + find v1
Phase 1: Core state (env, DB, groups, sessions, tasks)
Phase 2: Channels (clack multiselect, auth copy, code install)
Phase 3: Infrastructure (OneCLI, auth, Docker, skills, container build)
Service switchover: stop v1 → start v2 → test → keep or revert
Phase 4: Handoff → exec claude "/migrate-from-v1"

The skill handles: owner seeding, access policy, CLAUDE.local.md
cleanup, container config validation, fork customization porting.

Key fixes found during testing:
- triggerToEngage: requires_trigger=0 must override non-empty pattern
- unknown_sender_policy defaults to 'public' (strict drops all msgs
  before owner is seeded)
- Service revert must stop v2 (parse unit name from step log, not
  early tsx one-liner that can fail)
- Session continuity: copy JSONL from -workspace-group/ to
  -workspace-agent/ and write continuation:claude into outbound.db
- container_config.additionalMounts written directly to container.json
  (same shape in v1 and v2)
- EXIT trap writes handoff.json; explicit write_handoff before exec

Includes migrate-v2-reset.sh for dev iteration and docs/migration-dev.md
for testing/debugging reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 20:13:38 +00:00
exe.dev user
1b08b58fcd setup: drop redundant agent ping; harden auth detection and OAuth paste
- verify: remove the CLI ping; cli-agent step earlier in setup already
  proved the round-trip works, and the test agent gets cleaned up before
  verify runs — so the ping was guaranteed to fail on installs that wired
  a messaging app instead of staying CLI-only. Status now collapses to
  service-running ∧ credentials ∧ ≥1 wired group.
- agent-ping: catch Claude Code's "Please run /login" / "Not logged in" /
  "Invalid API key" banners so a successfully-spawned agent that has no
  credentials no longer reports as 'ok'.
- auth paste: validate the full sk-ant-oat…AA shape; when the cleaned
  input is under 90 chars, surface a truncation-specific hint pointing at
  terminal wrap as the likely cause. Strip internal whitespace at both
  validate and assignment so multi-line pastes that survive clack also
  go through cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:03:02 +00:00
github-actions[bot]
897b770296 chore: bump version to 2.0.25 2026-05-01 16:03:19 +00:00
github-actions[bot]
a71d2a4e2c docs: update token count to 140k tokens · 70% of context window 2026-05-01 16:03:16 +00:00
gavrielc
39c579ba2a Merge pull request #2151 from glifocat/fix/host-sweep-orphan-processing-ack
fix(host-sweep): clear orphan processing_ack rows on kill to prevent claim-stuck respawn loop
2026-05-01 19:03:00 +03:00
gavrielc
dab4fb497b Merge branch 'main' into fix/host-sweep-orphan-processing-ack 2026-05-01 18:42:04 +03:00
github-actions[bot]
663d9a4091 docs: update token count to 139k tokens · 70% of context window 2026-05-01 13:30:25 +00:00
github-actions[bot]
a590fbd830 chore: bump version to 2.0.24 2026-05-01 13:30:19 +00:00
gavrielc
20a17cbc44 Merge pull request #2160 from kky/pr/inbound-db-fresh-open
fix(agent-runner): open inbound.db fresh per messages_in read
2026-05-01 16:30:07 +03:00
gavrielc
0d836220d9 Merge branch 'main' into pr/inbound-db-fresh-open 2026-05-01 16:29:46 +03:00
gabi-simons
36e731c02d Merge branch 'main' into feat/migrate-from-v1
Resolve import conflict in setup/auto.ts — keep runMigrateV1 import,
deduplicate runWindowedStep and getLaunchdLabel/getSystemdUnit imports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-01 04:52:41 +00:00
github-actions[bot]
8c962d3f73 chore: bump version to 2.0.23 2026-04-30 23:00:24 +00:00
exe.dev user
28c38ae28b fix(container): pin vercel to 52.2.1 to dodge broken 53.0.1 publish
vercel@53.0.1 declares a dep on @vercel/static-build@2.9.22 which is not
published on npm (only 2.9.21 exists), breaking every fresh container
build that resolves vercel@latest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:00:02 +00:00
github-actions[bot]
7ac8dd0f6d docs: update token count to 139k tokens · 69% of context window 2026-04-30 22:28:25 +00:00
gavrielc
7814e45570 Merge pull request #2001 from Hinotoi-agent/fix/outbox-path-confinement
[security] fix(container): prevent host file read/delete via container-controlled outbox paths
2026-05-01 01:28:07 +03:00
gavrielc
fc3c11b6b9 fix(session-manager): apply outbox path-confinement to inbound attachments
Mirrors the four defenses on the outbound side onto extractAttachmentFiles:

  1. Reject unsafe messageId via isSafeAttachmentName before any inbox path
     is built. WhatsApp passes msg.key.id through raw and that field is
     client generated, so a peer can craft it; future end to end encrypted
     adapters will have the same property.
  2. lstatSync on the inbox dir refuses a pre placed symlink before
     mkdirSync would silently follow it.
  3. realpathSync + isPathInside contains the resolved dir under the
     session inbox root.
  4. writeFileSync uses the wx flag so a pre placed symlink at the file
     path is refused atomically by the kernel; EEXIST surfaces as a
     logged skip.

Threat: the session dir is mounted writable into the container at
/workspace, so a compromised agent can pre place inbox/<future msgId>/
as a symlink and wait for a chat message with a matching id to redirect
the host write. The four guards together close that window.

Consolidates with the existing isSafeAttachmentName helper from
attachment-safety.ts rather than introducing a duplicate basename
validator inside session-manager.

Co-Authored-By: Daisuke Tsuji <dim0627@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 01:27:09 +03:00
hinotoi-agent
852009dcb1 fix(container): confine outbound attachment paths 2026-05-01 01:27:09 +03:00
gavrielc
212281ba8e Merge pull request #2055 from dooha333/pr/setup-local-bin-path
fix(setup): inject ~/.local/bin into PATH so post-install onecli is reachable
2026-05-01 01:20:07 +03:00
gavrielc
6db6bf9c40 Merge branch 'main' into pr/setup-local-bin-path 2026-05-01 01:19:58 +03:00
github-actions[bot]
8977f0d0be chore: bump version to 2.0.22 2026-04-30 21:57:45 +00:00
gavrielc
d13f338af9 Merge pull request #2114 from robbyczgw-cla/fix/poll-loop-prescripts-on-followups
fix(poll-loop): apply pre-task scripts to follow-up injections too
2026-05-01 00:57:34 +03:00
gavrielc
5ab1a2733c review: catch follow-up poll errors + re-check done before push
Two fixes on top of the follow-up pre-task-script work:

1. The void async IIFE inside the interval handler had no catch, so a
   throw from the dynamic import or applyPreTaskScripts escaped as an
   unhandled rejection — terminating the container. The initial-batch
   path is wrapped by processQuery's outer try/catch; the follow-up
   path needs its own. Now logs the error and lets the next tick retry.

2. Re-check `done` immediately before query.push. The flag can flip
   true while applyPreTaskScripts is awaited (outer stream finishes
   during the script execution); without the re-check we'd push into a
   closed query. Claimed messages get released by the host's
   processing-claim sweep — same recovery posture as the rest of the
   poller.

Co-Authored-By: Michael Zazon <mzazon@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 00:55:46 +03:00
gavrielc
7d29888e59 Merge branch 'main' into fix/poll-loop-prescripts-on-followups 2026-05-01 00:34:45 +03:00
github-actions[bot]
58d875b3c3 chore: bump version to 2.0.21 2026-04-30 21:31:18 +00:00
gavrielc
3e7fea0fde Merge pull request #2142 from mnolet/fix/schedule-task-routing
fix(scheduling): include routing in schedule_task content JSON
2026-05-01 00:31:04 +03:00
gavrielc
d418f830db Merge branch 'main' into fix/schedule-task-routing 2026-05-01 00:30:11 +03:00
Mohamed Khedr
32daf607c1 Merge branch 'main' into pr/setup-local-bin-path 2026-04-30 21:57:55 +01:00
gavrielc
524ac221e1 Merge pull request #2111 from qwibitai/setup-scratch-agent-cleanup
feat(setup): delete scratch agent after ping-pong, simplify flow
2026-04-30 23:20:54 +03:00
gavrielc
69b4225916 Merge branch 'main' into setup-scratch-agent-cleanup 2026-04-30 23:20:32 +03:00
gavrielc
3d6a9b74f3 review: surface ping-test cleanup failures + restore copy
Routes the post-ping `_ping-test` cleanup through `spawnQuiet` +
`setupLog.step` so a non-zero exit from `delete-cli-agent.ts` lands
in `logs/setup-steps/cleanup-cli-agent.log` and the progression log,
and prints a one-line warn to the user. Previously the spawnSync was
fire-and-forget with `stdio: 'ignore'`, leaving an orphan agent group
silently if cleanup failed.

Restores the original copy on the cli-agent step labels, the ping
explainer paragraph, and the post-ping spinner stop line — those
copy changes are out of scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 23:16:34 +03:00
gavrielc
dcc625f2b8 Merge pull request #2155 from qwibitai/setup-root-warning-v2
Add root user warning gate to Linux setup
2026-04-30 23:09:36 +03:00
gavrielc
99a8559b14 Merge remote-tracking branch 'origin/main' into setup-root-warning-v2
# Conflicts:
#	setup/auto.ts
2026-04-30 23:07:38 +03:00
gavrielc
3dc772cca0 Merge branch 'main' into setup-scratch-agent-cleanup 2026-04-30 23:05:09 +03:00
gavrielc
5ebad280ce Merge pull request #1502 from Koshkoshinsk/docs/pr-hygiene-check
Add PR hygiene check to CLAUDE.md and contributing guidelines
2026-04-30 23:00:43 +03:00
gavrielc
d73b9e14ad Merge branch 'main' into docs/pr-hygiene-check 2026-04-30 23:00:10 +03:00
gavrielc
681a5b51c8 Merge pull request #2157 from qwibitai/setup-lazy-env-reuse
refactor(setup): per-step env var reuse instead of upfront all-or-nothing
2026-04-30 22:59:03 +03:00
gavrielc
8e45f4e964 Merge branch 'main' into setup-lazy-env-reuse 2026-04-30 22:58:53 +03:00
gavrielc
eb9a5d706d Merge branch 'main' into setup-scratch-agent-cleanup 2026-04-30 22:54:48 +03:00
github-actions[bot]
46cd91c306 docs: update token count to 138k tokens · 69% of context window 2026-04-30 19:54:27 +00:00
github-actions[bot]
0218159ef0 chore: bump version to 2.0.20 2026-04-30 19:54:21 +00:00