Trust the agent to figure out which failed steps actually stop
routing. The rule is the goal ("can the bot route one message?"),
not a hardcoded list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2b-channel-auth: copies the Baileys keystore + channel-specific env
keys. Without it WhatsApp can't connect — saw this firsthand when
the original candidatePaths bug left env_keys=0,files=0.
3c-auth: registers Anthropic credentials in OneCLI. 3b installs the
gateway; 3c puts the secret in the vault. Without 3c every agent
request 401s regardless of 3b's status.
1c-groups stays deferred — agent runs on stock CLAUDE.md without it,
but routing works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous version spelled out launchctl/systemctl commands, log lines
to grep for, diagnostic recipes — the agent reading this skill knows
all of that. Keep only the parts that aren't obvious from the rest of
the codebase: which steps are blocking vs deferred, the smoke-test
ordering, and the non-destructive framing for the user.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 used to be "triage every failed step before doing anything
else", which front-loaded a bunch of fixes for things that don't
actually block the user from proving v2 works. Restructure:
- 0a — fix blockers only (1b/1d/2c/2d/3a/3b/3e). Defer non-blockers
(1a, 1c, 1e, 2b, 3c) — most surface naturally in later phases.
- 0b — smoke test: switch v1 → v2, send a real message, verify the
routing chain in logs/nanoclaw.log. AskUserQuestion gates whether
to continue.
- Revert recipe (launchctl/systemctl) called out as always-available,
not destructive — v1 process, data, and credentials are untouched.
Up-front list of what the script handled now also mentions the
WhatsApp LID resolution and Baileys keystore copy, so users see
exactly what continuity they're getting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README: replace the one-line v1 migration note with a collapsed
<details> block. Quick Start stays compact for the common case (fresh
install) while v1 users get the actual instructions. Calls out
explicitly that the script must be run from a real terminal — not from
inside a Claude session — so the channel-select / switchover prompts
and the Node/pnpm/Docker bootstrap all work.
migrate-from-v1 skill: add a Preflight section that aborts if
logs/setup-migration/handoff.json is missing. Without this, invoking
the skill before the script just leads Claude to start guessing and
running shell commands. The new message redirects the user to the
script and tells them it will hand back to Claude on completion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1 stored every WhatsApp DM as `<phone>@s.whatsapp.net`. v2's WA
adapter sometimes resolves the chat to `<lid>@lid` instead — when
WhatsApp delivers via the LID protocol and Baileys hasn't yet learned
a LID→phone mapping for that contact (cold cache after migration).
The router then can't find the phone-keyed messaging_group and
silently drops the message at router.ts:184.
Baileys persists every LID↔phone pair it has ever learned to disk as
`store/auth/lid-mapping-<phone>.json` (forward) and
`lid-mapping-<lid>_reverse.json` (reverse). v1 will already have these
populated for every contact it has talked to. New step 2d-whatsapp-lids
parses the reverse files and writes paired LID-keyed `messaging_groups`
+ `messaging_group_agents` rows so both `<phone>@s.whatsapp.net` and
`<lid>@lid` route to the same agent_group with the same engage rules.
No Baileys boot or WhatsApp connectivity is required: the step is a
pure filesystem read of files already copied by 2b-channel-auth. It
skips as a no-op if either store/auth or the WhatsApp DM rows are
missing.
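A minimal sketch of the reverse-file parsing described above, not the
actual 2d-whatsapp-lids code. It assumes each reverse file holds a
JSON-encoded phone-number string; `LidAlias`, `parseReverseMapping`, and
`aliasesForExistingDms` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape: one LID <-> phone pair recovered from disk.
interface LidAlias {
  lidJid: string;   // e.g. "<lid>@lid"
  phoneJid: string; // e.g. "<phone>@s.whatsapp.net"
}

// Parse a single `lid-mapping-<lid>_reverse.json` file.
// Assumption: the file body is a JSON string containing the phone number.
function parseReverseMapping(fileName: string, content: string): LidAlias | null {
  const match = /^lid-mapping-(\d+)_reverse\.json$/.exec(fileName);
  if (!match) return null; // forward files and unrelated files are ignored

  let phone: unknown;
  try {
    phone = JSON.parse(content);
  } catch {
    return null; // unreadable mapping: skip rather than fail the step
  }
  if (typeof phone !== 'string' || !/^\d+$/.test(phone)) return null;

  return { lidJid: `${match[1]}@lid`, phoneJid: `${phone}@s.whatsapp.net` };
}

// Keep only the pairs whose phone-keyed DM row already exists, since those
// are the rows that need a LID-keyed twin.
function aliasesForExistingDms(mappings: LidAlias[], phoneKeyed: Set<string>): LidAlias[] {
  return mappings.filter((m) => phoneKeyed.has(m.phoneJid));
}
```

Each returned pair would then get a LID-keyed `messaging_groups` row plus a
matching `messaging_group_agents` row copied from the phone-keyed original.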
Anything that slips through (a contact whose LID v1 never learned)
falls back to the runtime approval flow once the WA adapter sets
isMention=true on DMs — each unknown LID DM auto-creates an
approval-required messaging_group and the owner gets a one-tap
register prompt.
Verified end-to-end on a 12-group v1 install: 3 DM rows aliased,
inbound DM routed via the LID-keyed row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1 didn't track is_group separately; db.ts hardcoded `is_group: 1` for
every messaging_group. v2 uses is_group=0 to collapse DM sub-thread
sessions and to drive routing decisions, so getting it wrong is a
latent risk on otherwise-working installs.
New helper inferIsGroup(channelType, platformId) lives in shared.ts so
tasks.ts and any future migration step can reuse it. Inferred per
channel:
- whatsapp: `<id>@g.us` is a group, anything else is a DM
- telegram: negative chat IDs are groups, positive are DMs
- everything else: default to 1 (least surprising for chats v1 chose
to register, where DM auto-create paths weren't used)
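The per-channel rules above can be sketched roughly as follows. This is an
illustration rather than the code in shared.ts; in particular, treating the
Telegram chat ID as the last colon-separated segment of platformId is an
assumption.

```typescript
// Sketch of the inference rules; returns 1 for a group and 0 for a DM.
function inferIsGroup(channelType: string, platformId: string): 0 | 1 {
  switch (channelType) {
    case 'whatsapp':
      // WhatsApp group JIDs end in @g.us; anything else is treated as a DM.
      return platformId.endsWith('@g.us') ? 1 : 0;
    case 'telegram': {
      // Assumption: the chat ID is the last colon-separated segment, so both
      // "-100123" and a prefixed form would be handled the same way.
      const chatId = platformId.split(':').pop() ?? '';
      return chatId.startsWith('-') ? 1 : 0;
    }
    default:
      // Other channels default to group, matching v1's hardcoded behaviour.
      return 1;
  }
}
```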
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
migrate-v2.sh
Replace `declare -A STEP_RESULTS` with two parallel indexed arrays
(STEP_NAMES + STEP_STATUSES) plus a `record_step` helper. macOS ships
bash 3.2 which has no associative arrays — `declare -A` errored out
silently and every `STEP_RESULTS["1a-env"]=...` triggered a fatal
bash arithmetic error (interpreting "1a" as a number). Visible
symptom: `steps: {}` in handoff.json. Latent symptom: phase 2c's
install loop sometimes bailed mid-iteration before invoking the
channel install script, leaving channel code uninstalled while
reporting `overall_status: success`.
migrate-v2-reset.sh
Cover the gaps that left install side-effects in place between
iterations:
- Remove untracked adapter files in src/channels/ (mirror the
pattern already used for container/skills/).
- Restore tracked setup helpers that channel installs overwrite
(setup/whatsapp-auth.ts, setup/pair-telegram.ts, setup/index.ts)
and remove untracked ones they create (setup/groups.ts).
- Restore package.json + pnpm-lock.yaml (channel installs add
deps like @whiskeysockets/baileys).
setup/migrate-v2/* is intentionally not touched, since that is where
user WIP lives.
Verified end-to-end: reset → migrate → all 9 steps reported in
handoff.json with status "success", phase 2c install actually runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- shared.ts: parseJid now recognizes raw Baileys WhatsApp JIDs
(`<id>@s.whatsapp.net`, `@g.us`, etc.); v2PlatformId returns the raw
JID for whatsapp to match what the runtime adapter emits. Without this,
every WhatsApp group in a v1 install was silently skipped.
- discord-resolver.ts: new helper that uses DISCORD_BOT_TOKEN to look up
channelId → guildId via the Discord API, since v1 stored only the
channel id but v2 needs `discord:<guildId>:<channelId>`. Best-effort:
on missing/invalid token or network error, returns empty resolver and
the affected groups are skipped with the reason surfaced per channel.
- db.ts, tasks.ts: route Discord groups through the resolver; other
channels go through v2PlatformId unchanged. The resolver is only built
when at least one Discord group exists, so non-Discord installs make
no network calls.
- db.ts: when every v1 group is skipped, exit non-zero with a FAIL line
instead of `OK:groups=N,...,skipped=N`, so the wrapper doesn't hide
total failure under a successful-looking summary.
- migrate-v2.sh: run_step now surfaces ERROR: lines from successful
steps (with count + first 3 + raw log path); phase 2c install loop
populates STEP_RESULTS so install failures show in handoff.json
instead of silently passing.
- sessions.ts: copyTree skips dangling symlinks (e.g. v1's
`.claude/debug/latest`) instead of crashing the entire step.
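To illustrate the last bullet, here is one way a recursive copy could skip
dangling symlinks. It is a sketch under assumptions, not the copyTree in
sessions.ts: the signature and the returned list of skipped paths are
hypothetical.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Recursively copy src into dest, skipping symlinks whose target is missing.
// Returns the skipped paths so the caller can report them.
function copyTree(src: string, dest: string): string[] {
  const skipped: string[] = [];
  fs.mkdirSync(dest, { recursive: true });

  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);

    // existsSync follows symlinks, so it is false for a dangling link.
    if (entry.isSymbolicLink() && !fs.existsSync(from)) {
      skipped.push(from);
      continue;
    }

    // statSync also follows links, so a valid link is copied as its target.
    // Note: this sketch does not guard against symlink cycles.
    if (fs.statSync(from).isDirectory()) {
      skipped.push(...copyTree(from, to));
    } else {
      fs.copyFileSync(from, to);
    }
  }
  return skipped;
}
```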
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OneCLI runs in a Docker container, so Docker must be installed first.
Reordered: Docker (3a) → OneCLI (3b) → Auth (3c) → Skills (3d) →
Build (3e). OneCLI install now skips with a clear message if Docker
isn't available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
migrate-v2.sh now runs setup/install-docker.sh when Docker isn't
found instead of just printing a message. The container build step
reports failure (not skip) when Docker is unavailable so the skill
can triage it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The migration is no longer experimental — it's been tested end-to-end
with service switchover, session continuity, and revert. Updated the
changelog entry to reflect the new migrate-v2.sh flow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracted the helpers we use (JID parsing, trigger mapping, channel
auth registry, generateId, v2PlatformId) into setup/migrate-v2/shared.ts.
Deleted setup/migrate-v1/ entirely — no code references it anymore.
Updated README, CLAUDE.md, docs/v1-to-v2-changes.md, and
docs/migration-dev.md to reference the new paths and migrate-v2.sh
entry point.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old migration flow (detect → validate → db → groups → env →
channel-auth → channels → tasks) ran inside `bash nanoclaw.sh` via
setup/auto.ts. Replaced by the standalone `bash migrate-v2.sh` flow.
Deleted:
- setup/migrate-v1.ts (orchestrator)
- setup/migrate-v1/{detect,validate,db,env,groups,channel-auth,channels,tasks}.ts
Kept:
- setup/migrate-v1/shared.ts (used by new migrate-v2/ steps)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New entry point: `bash migrate-v2.sh` from the v2 checkout.
Replaces the old setup-embedded migration flow with a standalone
script (phases 0-4) + rewritten Claude skill for the interactive parts.
Phase 0: Bootstrap (Node/pnpm/deps via setup.sh) + find v1
Phase 1: Core state (env, DB, groups, sessions, tasks)
Phase 2: Channels (clack multiselect, auth copy, code install)
Phase 3: Infrastructure (OneCLI, auth, Docker, skills, container build)
Service switchover: stop v1 → start v2 → test → keep or revert
Phase 4: Handoff → exec claude "/migrate-from-v1"
The skill handles: owner seeding, access policy, CLAUDE.local.md
cleanup, container config validation, fork customization porting.
Key fixes found during testing:
- triggerToEngage: requires_trigger=0 must override non-empty pattern
- unknown_sender_policy defaults to 'public' (strict drops all msgs
before owner is seeded)
- Service revert must stop v2 (parse unit name from step log, not
early tsx one-liner that can fail)
- Session continuity: copy JSONL from -workspace-group/ to
-workspace-agent/ and write continuation:claude into outbound.db
- container_config.additionalMounts written directly to container.json
(same shape in v1 and v2)
- EXIT trap writes handoff.json; explicit write_handoff before exec
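The triggerToEngage fix in the list above amounts to a precedence rule. A
hedged sketch follows; the `Engage` shape and the fallback for a missing
pattern are assumptions made for illustration, not the real v2 types.

```typescript
// Hypothetical engage rule: either respond to everything or match a pattern.
type Engage = { mode: 'always' } | { mode: 'pattern'; pattern: string };

function triggerToEngage(requiresTrigger: number, triggerPattern: string | null): Engage {
  // requires_trigger=0 wins even when v1 stored a non-empty pattern.
  if (requiresTrigger === 0) return { mode: 'always' };

  if (triggerPattern !== null && triggerPattern.trim() !== '') {
    return { mode: 'pattern', pattern: triggerPattern };
  }

  // Assumption: with no usable pattern there is nothing to match on,
  // so this sketch falls back to always engaging.
  return { mode: 'always' };
}
```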
Includes migrate-v2-reset.sh for dev iteration and docs/migration-dev.md
for testing/debugging reference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve import conflict in setup/auto.ts — keep runMigrateV1 import,
deduplicate runWindowedStep and getLaunchdLabel/getSystemdUnit imports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vercel@53.0.1 declares a dep on @vercel/static-build@2.9.22 which is not
published on npm (only 2.9.21 exists), breaking every fresh container
build that resolves vercel@latest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the four defenses on the outbound side onto extractAttachmentFiles:
1. Reject unsafe messageId via isSafeAttachmentName before any inbox path
is built. WhatsApp passes msg.key.id through raw and that field is
client-generated, so a peer can craft it; future end-to-end encrypted
adapters will have the same property.
2. lstatSync on the inbox dir refuses a pre-placed symlink before
mkdirSync would silently follow it.
3. realpathSync + isPathInside contains the resolved dir under the
session inbox root.
4. writeFileSync uses the wx flag so a pre-placed symlink at the file
path is refused atomically by the kernel; EEXIST surfaces as a
logged skip.
Threat: the session dir is mounted writable into the container at
/workspace, so a compromised agent can pre-place inbox/<future msgId>/
as a symlink and wait for a chat message with a matching id to redirect
the host write. The four guards together close that window.
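The four guards could be combined roughly as in the sketch below. It is not
the extractAttachmentFiles implementation: `writeAttachment` is a
hypothetical wrapper, and `isSafeName` and `isPathInside` are simplified
stand-ins for the repo helpers, whose real rules may differ.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Stand-in for isSafeAttachmentName: plain basename characters only.
function isSafeName(name: string): boolean {
  return /^[A-Za-z0-9._-]+$/.test(name) && name !== '.' && name !== '..';
}

// Stand-in for isPathInside: child must be strictly below parent.
function isPathInside(child: string, parent: string): boolean {
  const rel = path.relative(parent, child);
  return rel !== '' && !rel.startsWith('..') && !path.isAbsolute(rel);
}

// Returns true when the file was written, false when a guard refused it.
function writeAttachment(inboxRoot: string, messageId: string, fileName: string, data: string): boolean {
  // 1. Reject unsafe names before any path is built.
  if (!isSafeName(messageId) || !isSafeName(fileName)) return false;
  const dir = path.join(inboxRoot, messageId);

  // 2. lstat does not follow links, so a pre-placed symlink is not a directory.
  const existing = fs.lstatSync(dir, { throwIfNoEntry: false });
  if (existing && !existing.isDirectory()) return false;
  fs.mkdirSync(dir, { recursive: true });

  // 3. The resolved directory must still sit under the resolved inbox root.
  if (!isPathInside(fs.realpathSync(dir), fs.realpathSync(inboxRoot))) return false;

  // 4. 'wx' creates exclusively, so an existing file or symlink yields EEXIST.
  try {
    fs.writeFileSync(path.join(dir, fileName), data, { flag: 'wx' });
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === 'EEXIST') return false; // logged skip
    throw err;
  }
}
```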
Consolidates with the existing isSafeAttachmentName helper from
attachment-safety.ts rather than introducing a duplicate basename
validator inside session-manager.
Co-Authored-By: Daisuke Tsuji <dim0627@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes on top of the follow-up pre-task-script work:
1. The void async IIFE inside the interval handler had no catch, so a
throw from the dynamic import or applyPreTaskScripts escaped as an
unhandled rejection — terminating the container. The initial-batch
path is wrapped by processQuery's outer try/catch; the follow-up
path needs its own. Now logs the error and lets the next tick retry.
2. Re-check `done` immediately before query.push. The flag can flip
true while applyPreTaskScripts is awaited (outer stream finishes
during the script execution); without the re-check we'd push into a
closed query. Claimed messages get released by the host's
processing-claim sweep — same recovery posture as the rest of the
poller.
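Both fixes follow a common pattern, sketched here under assumptions. The
names `handleFollowUp`, `startPoller`, and `PollerState` are hypothetical,
and `prepare` stands in for the dynamic import plus applyPreTaskScripts.

```typescript
type TickResult = 'pushed' | 'skipped' | 'failed';

interface PollerState {
  done: boolean; // flips to true when the outer stream finishes
}

// One follow-up tick. It never rejects: errors are logged and reported as
// 'failed' so the next tick can retry.
async function handleFollowUp(
  state: PollerState,
  prepare: () => Promise<string>,
  push: (message: string) => void,
  logError: (err: unknown) => void,
): Promise<TickResult> {
  try {
    const prepared = await prepare();
    // Re-check after the await: the query may have closed in the meantime.
    if (state.done) return 'skipped';
    push(prepared);
    return 'pushed';
  } catch (err) {
    logError(err);
    return 'failed';
  }
}

// The interval handler discards the promise with `void`; that is safe only
// because handleFollowUp catches everything internally.
function startPoller(
  state: PollerState,
  prepare: () => Promise<string>,
  push: (message: string) => void,
  logError: (err: unknown) => void,
  intervalMs: number,
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    void handleFollowUp(state, prepare, push, logError);
  }, intervalMs);
}
```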
Co-Authored-By: Michael Zazon <mzazon@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes the post-ping `_ping-test` cleanup through `spawnQuiet` +
`setupLog.step` so a non-zero exit from `delete-cli-agent.ts` lands
in `logs/setup-steps/cleanup-cli-agent.log` and the progression log,
and prints a one-line warn to the user. Previously the spawnSync was
fire-and-forget with `stdio: 'ignore'`, leaving an orphan agent group
silently if cleanup failed.
Restores the original copy on the cli-agent step labels, the ping
explainer paragraph, and the post-ping spinner stop line — those
copy changes are out of scope for this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>