Files
nanoclaw/docs/v1-to-v2-changes.md
gabi-simons 3ee7d2147e feat: add v1 → v2 migration to setup flow (experimental)
`bash nanoclaw.sh` detects a v1 install before channel pairing and does a
best-effort automated port of operationally important state. Hands off to
a new `/migrate-from-v1` skill for owner seeding and fork customizations.

Between the timezone and channel steps, `setup/auto.ts` calls
`runMigrateV1()` which orchestrates these registered sub-steps (each a
separate entry in the progression log with its own raw log + status
block — failures never abort the chain):

- **migrate-detect** — scans siblings of the v2 checkout + common $HOME
  locations; `$NANOCLAW_V1_PATH` overrides authoritatively. Relaxed
  `package.json` check lets forks + partial installs still match; DB
  presence is the strongest signal.
- **migrate-validate** — asserts v1 DB shape (tables + required
  columns); writes `schema-mismatch.json` on failure. Subsequent steps
  short-circuit their DB-dependent parts but still run.
- **migrate-db** — seeds `agent_groups` + `messaging_groups` +
  `messaging_group_agents` from v1's `registered_groups`. JID
  decomposition (`dc:123` → `channel_type='discord'`,
  `platform_id='discord:123'`); `trigger_pattern` + `requires_trigger`
  → `engage_mode` + `engage_pattern` (mirrors migration 010 backfill).
  Users + user_roles are NOT seeded — the skill does that with an owner
  interview. Idempotent: existing rows reused, not duplicated.
- **migrate-groups** — rsync group folders. v1 `CLAUDE.md` → v2
  `CLAUDE.local.md` (v2 composes `CLAUDE.md` at container spawn); v1
  `container_config` JSON → `.v1-container-config.json` sidecar for the
  skill to translate. Tight v1-pattern scan (`/workspace/ipc/tasks`,
  `store/messages.db`, `[PR_CONTEXT:`, etc.) flags files referencing
  v1-specific infrastructure — content is NOT modified, just flagged in
  the handoff.
- **migrate-env** — merges v1 `.env` into v2 `.env`, never overwriting
  existing v2 keys.
- **migrate-channel-auth** — per-channel registry tracks v1 env keys,
  v2 required keys (with source-of-key instructions — e.g. Discord
  needs `DISCORD_PUBLIC_KEY` which v1 never stored), and candidate
  on-disk auth state paths (Baileys keystore, matrix sync state,
  etc.). Missing required v2 keys surface as actionable followups and
  flip the step to `partial`.
- **migrate-channels** — runs `setup/install-<channel>.sh` for each
  detected channel in non-interactive mode. Install-script output is
  captured to `logs/setup-migration/install-<channel>.log` sidecars
  (silent under the parent spinner). Channels with no v2 adapter get
  a `not_supported` followup but don't degrade status.
- **migrate-tasks** — v1 `scheduled_tasks` → `messages_in` rows with
  `kind='task'` in each session's `inbound.db`. `schedule_type`
  mapping (cron / interval / once → v2 cron). Idempotent: skips v1
  task ids already present. Inactive rows dumped to
  `inactive-tasks.json` for reference.

Everything writes to `logs/setup-migration/handoff.json` — the source
of truth the skill consumes.

`.claude/skills/migrate-from-v1/SKILL.md`:

- **Phase A** (always): owner seeding + v1 access policy flip
  (`unknown_sender_policy` public/strict) via `AskUserQuestion`. Pulls
  sender candidates from v1's `messages` table as hints.
- **Phase B** (if followups exist): walks
  `handoff.followups` — translates `.v1-container-config.json`
  sidecars, handles `not_supported` channels, fills in missing
  required keys with instructions on where to get them.
- **Phase C** (fork-aware): `git log <upstream>..HEAD` in v1. Empty →
  "no customizations to port." Non-empty → scope choice (mechanical /
  full interview / reference-only). Portable categories
  (`container/skills/*`, `.claude/skills/*`, docs) scan+copy with
  `scanForV1Patterns`. Non-portable (`src/*`,
  `container/agent-runner/src/*`) stash to `docs/v1-fork-reference/`
  — explicit "don't translate v1 infra to v2" warning because v1's
  IPC file queue / single DB don't exist in v2.

Clearly marked in README, CLAUDE.md, SKILL.md header, and via a `p.warn`
that fires once per run when v1 is detected. Users with no v1 install
see a silent skip — no prompts, no noise.

Verified end-to-end against a live v1 install (300 discord + 1
discord-supervisor groups, fork with ~15 commits of PR-factory work):
- Detect → validate → db (301 rows seeded) → groups (301 CLAUDE.local.md
  + 178 other files + 1 container_config sidecar) → env (4 keys copied)
  → channel-auth (flagged missing `DISCORD_APPLICATION_ID` +
  `DISCORD_PUBLIC_KEY`) → channels (discord installed, discord-supervisor
  → not_supported) → tasks (0 rows, skipped)
- Idempotent re-run: 0 rows created, 903 rows reused; tasks skip if
  id already present
- Fresh-user case: silent skip, no prompts, straight to "You're ready!"
- Schema-mismatch case: recorded to `schema-mismatch.json`, chain
  continues

- Unit tests for the pure transforms (`parseJid`,
  `inferChannelType`, `triggerToEngage`, `scanForV1Patterns`,
  `looksLikeV1Install`)
- Validate `requiredV2Keys` for telegram/slack/matrix/teams/webex/
  resend/linear against the actual Chat SDK packages (Discord was
  verified from real error output)
- Widen candidate auth file paths for WhatsApp/Matrix/iMessage based
  on real non-Discord v1 installs once we have some

See docs/v1-to-v2-changes.md for the v1 → v2 architecture diff.
2026-04-23 13:06:14 +00:00

173 lines
12 KiB
Markdown

# NanoClaw v1 → v2 — what changed
Big-picture differences between NanoClaw v1 (the `~/nanoclaw` checkout you've been running) and v2 (this rewrite). Not a migration guide — that's what `bash nanoclaw.sh` and the `/migrate-from-v1` skill are for. This doc is the **vocabulary**: when something has moved or been renamed, find it here.
Read this before touching the migration code or porting customizations forward.
---
## One-line summary
v1 was one Node process with one SQLite file and native channel adapters. v2 is a host that spawns per-session Docker containers, splits state across a central DB + per-session DB pair, routes through an explicit entity model, and installs channels as skills from a sibling branch.
---
## Entity model — the biggest shift
**v1:** one flat table `registered_groups(jid, name, folder, trigger_pattern, requires_trigger, is_main, channel_name)`. A group folder is the unit of agent identity. A chat (JID) is wired to exactly one folder, and `trigger_pattern` is an opaque regex the router applies to every incoming message.
**v2:** three tables, with a deliberate many-to-many in the middle:
```
agent_groups ─┐
├─ messaging_group_agents ─┬─ messaging_groups
│ (engage_mode, │ (channel_type,
│ engage_pattern, │ platform_id,
│ sender_scope, │ unknown_sender_policy)
│ ignored_message_policy,
│ session_mode, priority)
```
Consequences:
- **One agent can answer on many chats, and one chat can fan out to many agents.** v1 couldn't do either.
- **No `is_main` flag.** Privilege is now explicit via `user_roles` (owner/admin, global or scoped). See below.
- **No `trigger_pattern` regex.** Replaced with four orthogonal columns. Mapping rule used by the automated migration and by the `/migrate-from-v1` skill:
- v1 `trigger_pattern` non-empty → v2 `engage_mode='pattern'`, `engage_pattern = <the regex>`
- v1 `requires_trigger=0` or pattern was `.`/`.*` → v2 `engage_mode='pattern'`, `engage_pattern='.'` (the "always" flavor)
- no pattern and requires a trigger → v2 `engage_mode='mention'`
- `sender_scope` and `ignored_message_policy` are new; defaults `all` / `drop`
- **JID decomposition.** v1's `jid` column stored `dc:12345` / `tg:67890`. v2 splits this into `channel_type` + `platform_id`. Concretely: `dc:12345` becomes `channel_type='discord'`, `platform_id='discord:12345'`. Prefix aliases (`dc``discord`, `tg``telegram`, `wa``whatsapp`) are in `setup/migrate-v1/shared.ts`.
- **`channel_name` was unreliable in v1.** Many rows had it empty; the actual channel had to be guessed from the JID prefix. v2's `channel_type` is always explicit.
---
## Central DB vs session DBs
**v1:** one SQLite file at `store/messages.db`. Every chat, message, registered group, scheduled task, and session lived there. Host and any agent processes all opened the same file.
**v2:** three DB shapes.
1. `data/v2.db`**central**. Everything that isn't per-session: users, roles, agent groups, messaging groups, wirings, pending approvals, user DMs, schema migrations.
2. `data/v2-sessions/<session_id>/inbound.db`**host writes, container reads**. `messages_in`, routing, destinations, pending questions, processing_ack. This is where scheduled tasks live (see "Scheduling" below).
3. `data/v2-sessions/<session_id>/outbound.db`**container writes, host reads**. `messages_out`, session_state.
Exactly one writer per file. No cross-mount lock contention. Heartbeat is a file touch at `/workspace/.heartbeat`, not a DB update. Host uses even `seq` numbers, container uses odd.
Message history (v1 `messages` table, v1 `chats` table) is **not migrated**. The migration copies operationally important state forward (agents, channels, wirings, scheduled tasks, group folders) and leaves chat logs behind.
---
## Scheduling
**v1:** dedicated `scheduled_tasks` table in `store/messages.db` with its own columns (`schedule_type`, `schedule_value`, `next_run`, `last_run`, `context_mode`, `script`, `status`). A separate cron-ish scheduler process read from it.
**v2:** scheduled tasks are **`messages_in` rows with `kind='task'`** in a session's `inbound.db`. Relevant columns:
- `process_after` (ISO8601) — host sweep wakes the container when `datetime(process_after) <= datetime('now')`
- `recurrence` — cron string; `NULL` = one-shot
- `series_id` — groups recurring occurrences; set to the task id on first insert
- `status``pending` | `processing` | `completed` | `failed` | `paused`
The public API is `insertTask()` in `src/modules/scheduling/db.ts`. Recurrence is computed in the user's TZ via `cron-parser` (see `src/modules/scheduling/recurrence.ts`). The migration maps v1's `schedule_type`+`schedule_value` pair into a single cron string before calling `insertTask()`.
Tasks can exist before a session is awake — the host sweep creates/wakes the container on the first due tick.
---
## Credentials
**v1:** `.env` — plain environment variables. `DISCORD_BOT_TOKEN`, `ANTHROPIC_API_KEY`, etc. The host read them directly and passed them in to any code that needed them.
**v2:** OneCLI Agent Vault. A separate local service at `http://127.0.0.1:10254` holds secrets. Agents are *scoped* to specific secrets and the vault injects them into approved API requests as they leave the container. The container never sees the raw secret value.
Gotcha: auto-created agents default to `selective` secret mode — no secrets attached, even if matching secrets exist in the vault. See the "auto-created agents start in selective secret mode" section of the root CLAUDE.md for the fix (`onecli agents set-secret-mode --mode all`).
**What the automated migration does:** copies every v1 `.env` key verbatim into v2 `.env`, never overwriting existing v2 keys. The OneCLI vault migration is a separate step owned by the `/init-onecli` skill, which knows how to pull from `.env`.
---
## Channel adapters
**v1:** native adapters (e.g. `discord.js` used directly) imported in `src/channels/`. Installing a channel meant editing code, adding a dependency, and setting env vars.
**v2:** channel adapters live on a sibling `channels` branch. Each `/add-<channel>` skill:
1. `git fetch origin channels`
2. `git show channels:src/channels/<name>.ts > src/channels/<name>.ts`
3. Appends `import './<name>.js';` to `src/channels/index.ts`
4. `pnpm install @chat-adapter/<name>@<pinned>`
5. `pnpm run build`
Idempotent — re-running is a no-op. Pinned versions keep the supply chain honest. The automated migration detects which channels were wired in v1 (via distinct `channel_name` / JID prefix) and runs the matching `setup/install-<channel>.sh` for each. Channels in v1 that don't have a v2 skill (rare now, more common as v2 catches up) are recorded in the handoff file for the `/migrate-from-v1` skill to raise with the user.
**Channel auth beyond `.env`.** Some channels store session state on disk (Baileys WhatsApp keystore, Matrix sync state, iMessage tokens). The `channel-auth` sub-step has a per-channel registry (`setup/migrate-v1/shared.ts: CHANNEL_AUTH_REGISTRY`) that knows which file globs to copy alongside env keys.
---
## Privilege — from implicit to explicit
**v1:** `registered_groups.is_main = 1` flagged one group as the privileged one. No `users` table. Permissions were conventions, not enforced.
**v2:** explicit tables.
- `users(id = "<channel_type>:<handle>", kind, display_name)` — one row per messaging-platform identifier
- `user_roles(user_id, role ∈ {owner, admin}, agent_group_id nullable, granted_by, granted_at)` — owner is always global; admin can be global or scoped
- `agent_group_members(user_id, agent_group_id, ...)` — "known" membership for the `sender_scope='known'` gate
Owner gets seeded during the `/migrate-from-v1` skill's interview phase ("Which handle is you?"). The automated migration doesn't guess — v1 has no source of truth for it.
**Default access — "anyone can talk to the bot" vs "only known users".** v1 stored this implicitly (via trigger regex + `is_main`). v2 exposes it as `messaging_groups.unknown_sender_policy ∈ {'strict', 'request_approval', 'public'}`. The skill asks the user which mode v1 ran in and flips the migrated messaging groups accordingly.
---
## Group folders on disk
**v1:** `groups/<folder>/CLAUDE.md` and optional `logs/`. `CLAUDE.md` was a plain instruction file, group-specific.
**v2:** each group still lives at `groups/<folder>/`, but the shape is richer:
- `CLAUDE.md`**composed at container spawn** from `.claude-shared.md` (symlink to global) + `.claude-fragments/*.md` (module fragments) + `CLAUDE.local.md`. **Don't edit `CLAUDE.md` directly.**
- `CLAUDE.local.md` — per-group content. The migration writes v1's old `CLAUDE.md` here.
- `container.json` — optional per-group container config (apt deps, env, mounts). v1's `registered_groups.container_config` JSON is close but not identical — the migration stores the v1 payload at `groups/<folder>/.v1-container-config.json` for the skill to reconcile, rather than silently mapping it.
- `.claude-fragments/` and `.claude-shared.md` are installed by `initGroupFilesystem()` the first time the host touches the group, so the migration only has to write `CLAUDE.local.md` and leave the scaffolding to the host.
---
## Host process vs containers
**v1:** single Node process. The "agent" was the same process as the router.
**v2:** Node host at top, Bun-runtime Docker container per session. They communicate only via the two session DBs. No shared modules, no IPC, no stdin piping. If you wrote custom code that reached from the agent into host internals (or vice versa), that surface no longer exists — porting it is a `/migrate-from-v1` skill topic, not a mechanical copy.
Lockfiles: host uses `pnpm-lock.yaml`, agent-runner uses `bun.lock`. `minimumReleaseAge: 4320` on the host side (3-day supply-chain wait); agent-runner has no release-age gate.
---
## Self-modification and MCP tools
**v1:** if you added MCP servers or self-modification plumbing, it was usually direct edits to the long-running process.
**v2:**
- MCP servers register through `container/agent-runner/src/mcp-tools/*.ts` and load per-session. There's also `install_packages` and `add_mcp_server` self-mod tools that go through an admin-approval flow (`src/modules/self-mod/apply.ts`) before rebuilding the container image.
- Custom MCP tools you wrote in v1 map cleanly to the v2 tool registry, but the import paths, runtime (Bun vs Node), and SQL helper differences (`bun:sqlite` uses `$name`-prefixed params) may need adjustment. The skill walks through this.
---
## Things that are gone or don't map
- **`scheduled_tasks` as a separate table** — moved into session `inbound.db` under `kind='task'`. Migration ports active rows; inactive/completed are exported to `logs/setup-migration/inactive-tasks.json` for reference.
- **`messages` / `chats` tables (chat history)** — not migrated. Stay in the v1 checkout if you need them.
- **`router_state` (key/value)** — not migrated. v2 state lives in the explicit tables above.
- **`sessions` (v1 group→session_id)** — v1 sessions don't map; v2 sessions are keyed by `(agent_group_id, messaging_group_id, thread_id)` and are created on demand.
- **Raw access to the old `store/messages.db`** — the v1 DB is left in place and untouched. If migration goes wrong you can re-run it (the migration sub-steps are idempotent for agents/channels/wirings; folders use rsync semantics).
---
## Migration surface — where the code lives
- `setup/migrate-v1.ts` — orchestrator called from `setup/auto.ts` between the timezone and channel steps.
- `setup/migrate-v1/<sub-step>.ts` — registered in `setup/index.ts` STEPS; each runs under `runQuietStep`.
- `logs/setup.log` — progression log. Each sub-step appends one entry.
- `logs/setup-steps/NN-migrate-<x>.log` — raw per-sub-step stdout/stderr.
- `logs/setup-migration/handoff.json` — summary of what was migrated, what failed, what was deferred. Read by the `/migrate-from-v1` skill.
- `logs/setup-migration/schema-mismatch.json` — written only if `migrate-validate` finds a v1 DB that doesn't match the expected shape. The skill uses this to decide what to hand back to the user.
- `logs/setup-migration/inactive-tasks.json` — completed or stopped v1 `scheduled_tasks`, exported for reference.
- `.claude/skills/migrate-from-v1/SKILL.md` — tells Claude how to finish anything the automation couldn't and then interview the user about custom code changes.