Gavriel Cohen 2617313f19 docs(migrate-from-v1): blockers-first + smoke test before deeper work

Phase 0 used to be "triage every failed step before doing anything
else", which front-loaded a bunch of fixes for things that don't
actually block the user from proving v2 works. Restructure:

- 0a — fix blockers only (1b/1d/2c/2d/3a/3b/3e). Defer non-blockers
  (1a, 1c, 1e, 2b, 3c) — most surface naturally in later phases.
- 0b — smoke test: switch v1 → v2, send a real message, verify the
  routing chain in logs/nanoclaw.log. AskUserQuestion gates whether
  to continue.
- Revert recipe (launchctl/systemctl) called out as always-available,
  not destructive — v1 process, data, and credentials are untouched.

Up-front list of what the script handled now also mentions the
WhatsApp LID resolution and Baileys keystore copy, so users see
exactly what continuity they're getting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 18:28:46 +03:00

16 KiB

name	description
migrate-from-v1	Finish migrating a NanoClaw v1 install into v2. Run after `bash migrate-v2.sh` completes. Seeds the owner, cleans up CLAUDE.local.md files, reconciles container configs, and helps port custom v1 code. Triggers on "migrate from v1", "finish migration", "v1 migration".

Finish v1 → v2 migration

bash migrate-v2.sh already ran the deterministic migration. It handled:

.env keys merged
v2 DB seeded (agent_groups, messaging_groups, wiring)
Group folders copied (v1 CLAUDE.md → v2 CLAUDE.local.md)
Session data copied with conversation continuity (incl. Claude Code memory + JSONL transcripts)
Scheduled tasks ported
Channel code installed and auth state copied (incl. WhatsApp Baileys keystore)
WhatsApp LIDs resolved from store/auth and aliased into messaging_groups
Container skills copied
Container image built

Your job is the parts that need human judgment: triage any failed steps, seed the owner, clean up CLAUDE.local.md files, reconcile configs, and port any fork customizations.

Read logs/setup-migration/handoff.json first — it has overall_status, per-step results in steps, and a followups list.

Preflight: was the script run?

Before anything else, check that logs/setup-migration/handoff.json exists. If it doesn't, the user is invoking this skill before migrate-v2.sh ran. Stop and tell them, verbatim:

This skill finishes a migration that migrate-v2.sh started. Run that first, in your terminal — not from inside Claude:
bash migrate-v2.sh
It needs interactive prompts (channel selection, service switchover) and runs Node/pnpm bootstrap, Docker, OneCLI setup, and a container build that don't fit inside a Claude session. When it finishes, it'll hand control back to Claude automatically — at which point this skill picks up.

Do not attempt to run the script yourself, simulate its effects, or pick up the migration mid-stream. The deterministic side has dependencies on a real interactive shell.

Once handoff.json exists, proceed to Phase 0.

Phase 0: Get v2 routing real messages

Goal: get from "the script finished" to "the user sent a message and v2 answered" as fast as possible, before spending tokens on CLAUDE.local.md cleanup, fork customizations, or anything that requires deeper engagement. v1 is paused, not touched; flipping back is a one-line restart.

0a — Fix blockers (only the blockers)

Walk handoff.steps. A step is blocking only if its failure prevents v2 from routing a single message. Treat these as blockers:

| Step | Why blocking |
| --- | --- |
| `1b-db` | No `messaging_groups` → router has nothing to match |
| `1d-sessions` | No session → no inbound DB to write into |
| `2c-install-<channel>` | No adapter for the channel the user wants to test |
| `2d-whatsapp-lids` | WhatsApp DMs may arrive as `<lid>@lid` and miss migrated phone-keyed rows |
| `3a-docker` / `3e-build` | No container image → agent can't run |
| `3b-onecli` | Anthropic credentials not injected → first agent call 401s |

Defer these — they don't block a smoke test, and most surface naturally in later phases:

1a-env, 1c-groups, 1e-tasks, 2b-channel-auth, 3c-auth

For each blocker: read handoff.step_logs_dir/<step>.log, identify the cause, re-run the underlying script directly (pnpm exec tsx setup/migrate-v2/<step>.ts <v1_path>) or hand-fix mechanically. Use AskUserQuestion for judgment calls. Don't simulate the script's work.
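
The blocker/deferred split above can be sketched as a small classifier. This is an illustration only: `isBlocker` and `triage` are hypothetical helper names, and the input is assumed to be a plain list of failed step names taken from handoff.steps (the exact shape of that field is not specified here).

```typescript
// Step names (or prefixes ending in "-") treated as blockers in Phase 0a.
const BLOCKER_PREFIXES: string[] = [
  "1b-db",
  "1d-sessions",
  "2c-install-", // prefix: matches 2c-install-<channel>
  "2d-whatsapp-lids",
  "3a-docker",
  "3e-build",
  "3b-onecli",
];

// Hypothetical helper: true when a failed step must be fixed before the smoke test.
function isBlocker(step: string): boolean {
  return BLOCKER_PREFIXES.some(
    (p) => step === p || (p.endsWith("-") && step.startsWith(p)),
  );
}

// Split failed step names into "fix now" and "defer to later phases".
function triage(failedSteps: string[]): { blockers: string[]; deferred: string[] } {
  return {
    blockers: failedSteps.filter(isBlocker),
    deferred: failedSteps.filter((s) => !isBlocker(s)),
  };
}
```

Any step name not in the blocker list lands in `deferred`, which matches the "only the blockers" intent. An unrecognized step name is still worth a second look before the smoke test.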

Common blockers:

1b-db failed: JID couldn't be parsed. Insert agent_groups + messaging_groups for the user's confirmed channel.
2c-install-<channel> failed: git fetch origin channels issue. The user can run bash setup/install-<channel>.sh directly.
3e-build failed: usually stale builder cache. docker buildx prune -f && ./container/build.sh.

0b — Smoke test before any further migration work

Tell the user, verbatim:

Before we touch CLAUDE.local.md or fork customizations, let's confirm v2 actually answers your real messages. This is non-destructive: v1 is just paused, not touched. v1 and v2 share your WhatsApp identity (we copied store/auth/ over), so only one can be online at a time, but flipping back is instant.

Find the v2 service unit (per-checkout hash):

# macOS
launchctl list | grep nanoclaw
# Linux
systemctl --user list-units 'nanoclaw*'

Switch v1 → v2:

# macOS
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load   ~/Library/LaunchAgents/com.nanoclaw-v2-<hash>.plist

# Linux
systemctl --user stop nanoclaw
systemctl --user start nanoclaw-v2-<hash>

Tail the log and confirm clean boot:

tail -f logs/nanoclaw.log

Watch for "NanoClaw running" plus "Channel adapter started" for each installed channel (and "Connected to <channel>" for native adapters like WhatsApp).

Ask the user to send a real test message — a DM to the bot, or a post in a known group from a non-bot account. A working route logs an inbound event → session resolution → container spawn → outbound delivery.
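
When reading the log by eye is awkward, the four-stage chain above can be checked in order with a sketch like the one below. The stage substrings are placeholders chosen for illustration, not the real log strings, which are not given here. Swap in whatever logs/nanoclaw.log actually prints. `firstMissingStage` is a hypothetical helper name.

```typescript
// Placeholder substrings, one per routing stage. These are NOT real log strings.
const STAGES: string[] = ["inbound", "session", "container", "outbound"];

// Walk the log once and return the first stage that never shows up in order,
// or null when every stage was seen in sequence.
function firstMissingStage(logLines: string[], stages: string[] = STAGES): string | null {
  let next = 0; // index of the stage we are still waiting for
  for (const line of logLines) {
    if (next < stages.length && line.toLowerCase().includes(stages[next])) {
      next++;
    }
  }
  return next < stages.length ? stages[next] : null;
}
```

The returned stage name points at where the chain broke, which narrows down which of the common patterns to check first.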

AskUserQuestion: "Did v2 respond? — Yes / No, here's what happened."

If yes: continue to Phase 1.

If no: do not proceed. Read logs/nanoclaw.log + logs/nanoclaw.error.log and diagnose. Common patterns:

WhatsApp DM with no routing chain in the log → check SELECT platform_id FROM messaging_groups WHERE platform_id LIKE '%@lid'. If empty, re-run setup/migrate-v2/whatsapp-resolve-lids.ts.
Agent inside container fails on Anthropic 401 → OneCLI agents start in selective secret mode. Switch the agent to all secrets: onecli agents set-secret-mode --id <agent-id> --mode all.
Channel disconnected silently → restart the service: launchctl kickstart -k gui/$(id -u)/com.nanoclaw-v2-<hash> (macOS) or systemctl --user restart nanoclaw-v2-<hash> (Linux).

Re-test before continuing.

Reverting (anytime — not just now)

# macOS — back to v1
launchctl unload ~/Library/LaunchAgents/com.nanoclaw-v2-<hash>.plist
launchctl load   ~/Library/LaunchAgents/com.nanoclaw.plist

# Linux
systemctl --user stop nanoclaw-v2-<hash>
systemctl --user start nanoclaw

v1's process, data, credentials, and groups are untouched the whole time. Reverting is just a service restart.

Deferred non-blocker failures

If you skipped non-blocker failures in 0a (1a-env, 1c-groups, 1e-tasks, 2b-channel-auth, 3c-auth), they still need fixing — most surface naturally in later phases (1c-groups ↔ Phase 2 CLAUDE.local.md cleanup, 1e-tasks ↔ task verification). Re-run any that don't get covered before declaring the migration done.

Phase 1: Owner and access

v2 auto-creates a users row for every sender it sees (via extractAndUpsertUser in src/modules/permissions/index.ts). By the time this skill runs, the owner's row likely already exists — it just needs the owner role granted.

User ID format: always <channel_type>:<platform_handle>. Each channel populates this differently:

Telegram: telegram:<numeric_user_id> (e.g. telegram:6037840640)
Discord: discord:<snowflake_user_id> (e.g. discord:123456789012345678)
WhatsApp: whatsapp:<phone>@s.whatsapp.net (e.g. whatsapp:14155551234@s.whatsapp.net)
Slack: slack:<user_id> (e.g. slack:U04ABCDEF)
Others: <channel_type>:<platform_id>
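
As a minimal sketch of the ID format above, the helpers below build and split a user ID. `buildUserId` and `parseUserId` are hypothetical names for illustration and are not functions from the v2 codebase.

```typescript
// Build a v2 user ID of the form <channel_type>:<platform_handle>.
function buildUserId(channelType: string, platformHandle: string): string {
  if (!channelType || !platformHandle) {
    throw new Error("channel type and platform handle are both required");
  }
  return `${channelType}:${platformHandle}`;
}

// Split at the first colon only, so handles containing "@" or "." stay intact.
function parseUserId(id: string): { channelType: string; platformHandle: string } {
  const i = id.indexOf(":");
  if (i <= 0 || i === id.length - 1) {
    throw new Error(`not a <channel_type>:<platform_handle> ID: ${id}`);
  }
  return { channelType: id.slice(0, i), platformHandle: id.slice(i + 1) };
}
```

For example, `parseUserId("whatsapp:14155551234@s.whatsapp.net")` yields the channel type `whatsapp` and the handle `14155551234@s.whatsapp.net`.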

Steps:

Query users table: SELECT id, kind, display_name FROM users.
If exactly one user exists, confirm: AskUserQuestion: "Is <display_name> (<id>) you?" — Yes / No, let me type it.
If multiple users exist, present them as options in AskUserQuestion.
If no users exist yet (service hasn't received a message), ask the user to send a test message first, then re-query.

Once confirmed, check user_roles — if the owner role already exists, skip. Otherwise insert:

INSERT INTO user_roles (user_id, role, agent_group_id, granted_by, granted_at)
VALUES ('<user_id>', 'owner', NULL, NULL, datetime('now'))

Use the DB helpers in src/db/user-roles.ts — they keep indexes correct. Init the DB first:

import { initDb } from '../src/db/connection.js';
import { runMigrations } from '../src/db/migrations/index.js';
import { DATA_DIR } from '../src/config.js';
import path from 'path';
// Open the v2 database under DATA_DIR.
const db = initDb(path.join(DATA_DIR, 'v2.db'));
// Run migrations so the schema is current before touching user_roles.
runMigrations(db);

Access policy

After seeding the owner, discuss the access policy. v2's messaging_groups.unknown_sender_policy controls who can interact with the bot. migrate-v2.sh set it to public so the bot would respond during the switchover test, but the user may want to tighten it.

Present the options via AskUserQuestion:

Public (current) — anyone can message the bot. Good for personal DM bots.
Known users only — only users in agent_group_members can trigger the bot. Others are silently dropped.
Approval required — unknown senders trigger an approval request to the owner. Good for group chats where you want to vet new members.

If the user picks "Known users only" or "Approval required", seed the known users from v1's message history. The v1 database is at <handoff.v1_path>/store/messages.db. It has a messages table with sender and sender_name columns. For each group:

-- v1: unique senders per chat (excluding bot messages)
SELECT DISTINCT sender, sender_name
FROM messages
WHERE chat_jid = '<v1_jid>' AND is_from_me = 0 AND sender IS NOT NULL

The sender value is a platform handle (e.g. 6037840640 for Telegram). Build the v2 user ID by inferring the channel type from the chat JID prefix (use parseJid from setup/migrate-v2/shared.ts) and combining: <channel_type>:<sender>.

For each sender:

Upsert into users(id, kind, display_name) if not already present.
Insert into agent_group_members(user_id, agent_group_id) for each agent group wired to that messaging group.

Show the user the list of senders being imported and let them deselect any they don't want.
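
The sender import above can be sketched as a pure mapping step. Because the query selects DISTINCT on both sender and sender_name, the same sender can come back more than once with different names, so the sketch deduplicates by ID. `toV2Users` and the two interfaces are hypothetical; the channel type is assumed to have been inferred from the chat JID already.

```typescript
// Row shape returned by the v1 senders query.
interface V1Sender { sender: string; sender_name: string | null }
// Minimal shape for the list shown to the user before importing.
interface V2User { id: string; display_name: string | null }

// Map v1 senders to v2 user IDs (<channel_type>:<sender>), one entry per ID.
function toV2Users(channelType: string, rows: V1Sender[]): V2User[] {
  const byId = new Map<string, V2User>();
  for (const row of rows) {
    const id = `${channelType}:${row.sender}`;
    const existing = byId.get(id);
    if (!existing) {
      byId.set(id, { id, display_name: row.sender_name });
    } else if (!existing.display_name && row.sender_name) {
      // Keep the first non-empty display name seen for this sender.
      existing.display_name = row.sender_name;
    }
  }
  return Array.from(byId.values());
}
```

The resulting list is what gets shown for deselection. The actual upserts into users and agent_group_members happen only after the user approves it.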

Then update the messaging groups:

UPDATE messaging_groups SET unknown_sender_policy = '<chosen_policy>'
WHERE channel_type IN (<migrated_channels>)

Phase 2: Clean up CLAUDE.local.md

The migration copied v1's entire CLAUDE.md into CLAUDE.local.md for each group. This file now contains v1 boilerplate that v2 handles through its own composed fragments (container/CLAUDE.md + .claude-fragments/module-*.md). The user's customizations are buried inside.

For each group that has a CLAUDE.local.md:

Read the file.
Read the v1 template it was based on. Determine which template by checking the v1 install:
- If the group had is_main=1 in v1's registered_groups, the template was groups/main/CLAUDE.md
- Otherwise, the template was groups/global/CLAUDE.md
- The v1 path is in handoff.json → v1_path
Diff the file against the template. Identify sections that are:
- Stock boilerplate (identical to template) — remove. v2's fragments cover this.
- User customizations (added sections, modified sections) — keep.
The following v1 sections are now handled by v2 fragments and should be removed even if slightly modified:
- "What You Can Do" → v2 runtime system prompt
- "Communication" / "Internal thoughts" / "Sub-agents" → container/CLAUDE.md + module-core.md
- "Your Workspace" / workspace path references → container/CLAUDE.md
- "Memory" (the stock version) → container/CLAUDE.md
- "Message Formatting" → container/CLAUDE.md
- "Admin Context" → v2 uses user_roles, not is_main
- "Authentication" → v2 uses OneCLI
- "Container Mounts" → v2 mounts are different
- "Managing Groups" / "Finding Available Groups" / "Registered Groups Config" → v2 entity model, no IPC
- "Global Memory" → v2 has .claude-shared.md symlink
- "Scheduling for Other Groups" → module-scheduling.md
- "Task Scripts" → module-scheduling.md
- "Sender Allowlist" → v2 uses unknown_sender_policy + user_roles
Fix path references in kept sections:
- /workspace/group/ → /workspace/agent/
- /workspace/project/ → these paths don't exist in v2; discuss with the user
- /workspace/ipc/ → gone; remove references
- /workspace/extra/ → v2 uses container.json additionalMounts; keep but note the path may change
Keep the # Name heading and first paragraph (identity) — this is the user's agent personality.
Show the user the proposed new CLAUDE.local.md before writing it. Use AskUserQuestion: "Here's what I'd keep — look right?" with options to approve, edit, or keep the original.

If a CLAUDE.local.md has no user customizations (pure template copy), write a minimal file with just the identity heading.
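
The path fixes in the list above split into one mechanical rename and three prefixes that need a human decision. A minimal sketch of that split, with `rewriteWorkspacePaths` as a hypothetical helper name:

```typescript
interface PathRewriteResult {
  text: string;          // content with the mechanical rename applied
  needsReview: string[]; // prefixes still present that need a decision
}

// Rename /workspace/group/ to /workspace/agent/ and report the other v1 prefixes.
function rewriteWorkspacePaths(text: string): PathRewriteResult {
  const rewritten = text.split("/workspace/group/").join("/workspace/agent/");
  const reviewPrefixes = ["/workspace/project/", "/workspace/ipc/", "/workspace/extra/"];
  const needsReview = reviewPrefixes.filter((p) => rewritten.includes(p));
  return { text: rewritten, needsReview };
}
```

Only the `/workspace/group/` rename is applied automatically. Everything in `needsReview` is surfaced so it can be discussed with the user, removed, or annotated as the list describes.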

Phase 3: Container config

migrate-v2.sh writes container.json directly from v1's container_config (the additionalMounts shape is identical). If the v1 config was unparseable, it falls back to a .v1-container-config.json sidecar.

For each group, check:

If container.json exists, read it and verify the additionalMounts host paths are still valid on this machine. Flag any that don't exist.
If .v1-container-config.json exists (parse failure fallback), read it, discuss with the user, and write a proper container.json. Then delete the sidecar.
Check for env or packages fields — env may overlap with OneCLI vault, packages (apt/npm) are portable.
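
The host-path check above can be sketched as follows. The `hostPath` field name is an assumption about the additionalMounts entry shape, so confirm it against a real container.json before relying on it. The existence check is passed in so the function stays pure; `missingMountPaths` is a hypothetical helper name.

```typescript
// Assumed shape: each additionalMounts entry carries a hostPath (field name is an assumption).
interface Mount { hostPath: string }
interface ContainerConfig { additionalMounts?: Mount[] }

// Return the host paths that the supplied existence check rejects.
function missingMountPaths(
  config: ContainerConfig,
  exists: (path: string) => boolean,
): string[] {
  return (config.additionalMounts ?? [])
    .map((m) => m.hostPath)
    .filter((p) => !exists(p));
}
```

In a Node script, `existsSync` from `fs` can be passed as the `exists` argument. Flag whatever comes back to the user instead of editing container.json silently.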

Phase 4: Fork customizations

Check whether the user's v1 install was a customized fork.

cd <v1_path>
git remote -v
git log --oneline <upstream>/main..HEAD 2>/dev/null

If no commits ahead of upstream: stock v1, skip this phase.

If there are commits:

Show the commit list to the user.
AskUserQuestion: "How do you want to handle your v1 customizations?"
- Copy portable items (recommended) — copy container/skills/*, .claude/skills/*, docs/*. Scan each with scanForV1Patterns from setup/migrate-v2/shared.ts.
- Full walkthrough — go commit by commit, decide together.
- Reference only — stash to docs/v1-fork-reference/ for later.
Source code (src/*, container/agent-runner/src/*) is NOT portable — v2's architecture is fundamentally different. Stash to docs/v1-fork-reference/ with a README explaining what each file did. Don't translate.

Principles

v1 checkout is read-only. Never modify files under handoff.v1_path.
Show before writing. Show diffs/proposed content before modifying CLAUDE.local.md or container.json.
Mask credentials when displaying (first 4 + ... + last 4 characters).
handoff.json is the recovery point. If context gets compacted, re-read it and git status to recover state.
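
The masking rule above (first 4 + ... + last 4) can be sketched as below. The short-value guard is an added safeguard, not something stated in this document: a value of 8 characters or fewer would otherwise be shown in full. `maskCredential` is a hypothetical helper name.

```typescript
// Mask a secret as first 4 characters + "..." + last 4 characters.
function maskCredential(secret: string): string {
  // Assumption: hide short values entirely, since 4 + 4 would reveal all of them.
  if (secret.length <= 8) return "...";
  return `${secret.slice(0, 4)}...${secret.slice(-4)}`;
}
```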

Setup steps you can run

The setup flow at setup/index.ts has individual steps you can invoke if something is missing or failed:

pnpm exec tsx setup/index.ts --step <name>

| Step | When to use |
| --- | --- |
| `onecli` | OneCLI not installed or not healthy |
| `auth` | No Anthropic credential in vault |
| `container` | Container image needs rebuild |
| `service` | Service not installed or not running |
| `mounts` | Mount allowlist missing |
| `verify` | End-to-end health check (run after everything else) |
| `environment` | System check (Node, dirs) |

When done

Run the verify step to confirm everything works:
```
pnpm exec tsx setup/index.ts --step verify
```
Delete logs/setup-migration/handoff.json — offer to save as docs/migration-<date>.md first.

Restart the service if running so changes take effect:

# Linux
systemctl --user restart nanoclaw-v2-<hash>
# macOS (launchctl does not expand wildcards; use the full label)
launchctl kickstart -k gui/$(id -u)/com.nanoclaw-v2-<hash>
