From 53c11a2d53a97565e18acffc2eeac90005b7e554 Mon Sep 17 00:00:00 2001 From: gavrielc Date: Fri, 17 Apr 2026 15:10:17 +0300 Subject: [PATCH] chore(skills): delete 9 irrelevant legacy skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These shipped with the old v1 architecture and are no longer needed: - add-reactions, add-voice-transcription, add-image-vision, add-pdf-reader, use-local-whisper — Chat SDK channels handle these natively now; the WhatsApp native (Baileys) adapter on the channels branch covers attachments and reactions out of the box. - add-compact — no longer needed. - add-telegram-swarm — Chat SDK Teams adapter handles multi-bot identity. - channel-formatting — Chat SDK does per-channel formatting natively. - add-gmail — was built on a legacy MCP server; deprecated. add-emacs and use-native-credential-proxy are kept and will be ported to the current architecture in follow-up commits. Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/skills/add-compact/SKILL.md | 135 ------ .claude/skills/add-gmail/SKILL.md | 236 ----------- .claude/skills/add-image-vision/SKILL.md | 94 ----- .claude/skills/add-pdf-reader/SKILL.md | 104 ----- .claude/skills/add-reactions/SKILL.md | 117 ------ .claude/skills/add-telegram-swarm/SKILL.md | 384 ------------------ .../skills/add-voice-transcription/SKILL.md | 148 ------- .claude/skills/channel-formatting/SKILL.md | 137 ------- .claude/skills/use-local-whisper/SKILL.md | 152 ------- 9 files changed, 1507 deletions(-) delete mode 100644 .claude/skills/add-compact/SKILL.md delete mode 100644 .claude/skills/add-gmail/SKILL.md delete mode 100644 .claude/skills/add-image-vision/SKILL.md delete mode 100644 .claude/skills/add-pdf-reader/SKILL.md delete mode 100644 .claude/skills/add-reactions/SKILL.md delete mode 100644 .claude/skills/add-telegram-swarm/SKILL.md delete mode 100644 .claude/skills/add-voice-transcription/SKILL.md delete mode 100644 
.claude/skills/channel-formatting/SKILL.md delete mode 100644 .claude/skills/use-local-whisper/SKILL.md diff --git a/.claude/skills/add-compact/SKILL.md b/.claude/skills/add-compact/SKILL.md deleted file mode 100644 index ee9674a..0000000 --- a/.claude/skills/add-compact/SKILL.md +++ /dev/null @@ -1,135 +0,0 @@ ---- -name: add-compact -description: Add /compact command for manual context compaction. Solves context rot in long sessions by forwarding the SDK's built-in /compact slash command. Main-group or trusted sender only. ---- - -# Add /compact Command - -Adds a `/compact` session command that compacts conversation history to fight context rot in long-running sessions. Uses the Claude Agent SDK's built-in `/compact` slash command — no synthetic system prompts. - -**Session contract:** `/compact` keeps the same logical session alive. The SDK returns a new session ID after compaction (via the `init` system message), which the agent-runner forwards to the orchestrator as `newSessionId`. No destructive reset occurs — the agent retains summarized context. - -## Phase 1: Pre-flight - -Check if `src/session-commands.ts` exists: - -```bash -test -f src/session-commands.ts && echo "Already applied" || echo "Not applied" -``` - -If already applied, skip to Phase 3 (Verify). - -## Phase 2: Apply Code Changes - -Merge the skill branch: - -```bash -git fetch upstream skill/compact -git merge upstream/skill/compact -``` - -> **Note:** `upstream` is the remote pointing to `qwibitai/nanoclaw`. If using a different remote name, substitute accordingly. 
- -This adds: -- `src/session-commands.ts` (extract and authorize session commands) -- `src/session-commands.test.ts` (unit tests for command parsing and auth) -- Session command interception in `src/index.ts` (both `processGroupMessages` and `startMessageLoop`) -- Slash command handling in `container/agent-runner/src/index.ts` - -### Validate - -```bash -pnpm test -pnpm run build -``` - -### Rebuild container - -```bash -./container/build.sh -``` - -### Restart service - -```bash -launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS -# Linux: systemctl --user restart nanoclaw -``` - -## Phase 3: Verify - -### Integration Test - -1. Start NanoClaw in dev mode: `pnpm run dev` -2. From the **main group** (self-chat), send exactly: `/compact` -3. Verify: - - The agent acknowledges compaction (e.g., "Conversation compacted.") - - The session continues — send a follow-up message and verify the agent responds coherently - - A conversation archive is written to `groups/{folder}/conversations/` (by the PreCompact hook) - - Container logs show `Compact boundary observed` (confirms SDK actually compacted) - - If `compact_boundary` was NOT observed, the response says "compact_boundary was not observed" -4. From a **non-main group** as a non-admin user, send: `@ /compact` -5. Verify: - - The bot responds with "Session commands require admin access." - - No compaction occurs, no container is spawned for the command -6. From a **non-main group** as the admin (device owner / `is_from_me`), send: `@ /compact` -7. Verify: - - Compaction proceeds normally (same behavior as main group) -8. While an **active container** is running for the main group, send `/compact` -9. Verify: - - The active container is signaled to close (authorized senders only — untrusted senders cannot kill in-flight work) - - Compaction proceeds via a new container once the active one exits - - The command is not dropped (no cursor race) -10. 
Send a normal message, then `/compact`, then another normal message in quick succession (same polling batch): -11. Verify: - - Pre-compact messages are sent to the agent first (check container logs for two `runAgent` calls) - - Compaction proceeds after pre-compact messages are processed - - Messages **after** `/compact` in the batch are preserved (cursor advances to `/compact`'s timestamp only) and processed on the next poll cycle -12. From a **non-main group** as a non-admin user, send `@ /compact`: -13. Verify: - - Denial message is sent ("Session commands require admin access.") - - The `/compact` is consumed (cursor advanced) — it does NOT replay on future polls - - Other messages in the same batch are also consumed (cursor is a high-water mark — this is an accepted tradeoff for the narrow edge case of denied `/compact` + other messages in the same polling interval) - - No container is killed or interrupted -14. From a **non-main group** (with `requiresTrigger` enabled) as a non-admin user, send bare `/compact` (no trigger prefix): -15. Verify: - - No denial message is sent (trigger policy prevents untrusted bot responses) - - The `/compact` is consumed silently - - Note: in groups where `requiresTrigger` is `false`, a denial message IS sent because the sender is considered reachable -16. 
After compaction, verify **no auto-compaction** behavior — only manual `/compact` triggers it - -### Validation on Fresh Clone - -```bash -git clone /tmp/nanoclaw-test -cd /tmp/nanoclaw-test -claude # then run /add-compact -pnpm run build -pnpm test -./container/build.sh -# Manual: send /compact from main group, verify compaction + continuation -# Manual: send @ /compact from non-main as non-admin, verify denial -# Manual: send @ /compact from non-main as admin, verify allowed -# Manual: verify no auto-compaction behavior -``` - -## Security Constraints - -- **Main-group or trusted/admin sender only.** The main group is the user's private self-chat and is trusted (see `docs/SECURITY.md`). Non-main groups are untrusted — a careless or malicious user could wipe the agent's short-term memory. However, the device owner (`is_from_me`) is always trusted and can compact from any group. -- **No auto-compaction.** This skill implements manual compaction only. Automatic threshold-based compaction is a separate concern and should be a separate skill. -- **No config file.** NanoClaw's philosophy is customization through code changes, not configuration sprawl. -- **Transcript archived before compaction.** The existing `PreCompact` hook in the agent-runner archives the full transcript to `conversations/` before the SDK compacts it. -- **Session continues after compaction.** This is not a destructive reset. The conversation continues with summarized context. - -## What This Does NOT Do - -- No automatic compaction threshold (add separately if desired) -- No `/clear` command (separate skill, separate semantics — `/clear` is a destructive reset) -- No cross-group compaction (each group's session is isolated) -- No changes to the container image, Dockerfile, or build script - -## Troubleshooting - -- **"Session commands require admin access"**: Only the device owner (`is_from_me`) or main-group senders can use `/compact`. Other users are denied. 
-- **No compact_boundary in logs**: The SDK may not emit this event in all versions. Check the agent-runner logs for the warning message. Compaction may still have succeeded. -- **Pre-compact failure**: If messages before `/compact` fail to process, the error message says "Failed to process messages before /compact." The cursor advances past sent output to prevent duplicates; `/compact` remains pending for the next attempt. diff --git a/.claude/skills/add-gmail/SKILL.md b/.claude/skills/add-gmail/SKILL.md deleted file mode 100644 index 6a13291..0000000 --- a/.claude/skills/add-gmail/SKILL.md +++ /dev/null @@ -1,236 +0,0 @@ ---- -name: add-gmail -description: Add Gmail integration to NanoClaw. Can be configured as a tool (agent reads/sends emails when triggered from WhatsApp) or as a full channel (emails can trigger the agent, schedule tasks, and receive replies). Guides through GCP OAuth setup and implements the integration. ---- - -# Add Gmail Integration - -This skill adds Gmail support to NanoClaw — either as a tool (read, send, search, draft) or as a full channel that polls the inbox. - -## Phase 1: Pre-flight - -### Check if already applied - -Check if `src/channels/gmail.ts` exists. If it does, skip to Phase 3 (Setup). The code changes are already in place. - -### Ask the user - -Use `AskUserQuestion`: - -AskUserQuestion: Should incoming emails be able to trigger the agent? - -- **Yes** — Full channel mode: the agent listens on Gmail and responds to incoming emails automatically -- **No** — Tool-only: the agent gets full Gmail tools (read, send, search, draft) but won't monitor the inbox. No channel code is added. 
- -## Phase 2: Apply Code Changes - -### Ensure channel remote - -```bash -git remote -v -``` - -If `gmail` is missing, add it: - -```bash -git remote add gmail https://github.com/qwibitai/nanoclaw-gmail.git -``` - -### Merge the skill branch - -```bash -git fetch gmail main -git merge gmail/main || { - git checkout --theirs pnpm-lock.yaml - git add pnpm-lock.yaml - git merge --continue -} -``` - -This merges in: -- `src/channels/gmail.ts` (GmailChannel class with self-registration via `registerChannel`) -- `src/channels/gmail.test.ts` (unit tests) -- `import './gmail.js'` appended to the channel barrel file `src/channels/index.ts` -- Gmail credentials mount (`~/.gmail-mcp`) in `src/container-runner.ts` -- Gmail MCP server (`@gongrzhe/server-gmail-autoauth-mcp`) and `mcp__gmail__*` allowed tool in `container/agent-runner/src/index.ts` -- `googleapis` npm dependency in `package.json` - -If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides. - -### Add email handling instructions (Channel mode only) - -If the user chose channel mode, append the following to `groups/main/CLAUDE.md` (before the formatting section): - -```markdown -## Email Notifications - -When you receive an email notification (messages starting with `[Email from ...`), inform the user about it but do NOT reply to the email unless specifically asked. You have Gmail tools available — use them only when the user explicitly asks you to reply, forward, or take action on an email. -``` - -### Validate code changes - -```bash -pnpm install -pnpm run build -pnpm exec vitest run src/channels/gmail.test.ts -``` - -All tests must pass (including the new Gmail tests) and build must be clean before proceeding. 
- -## Phase 3: Setup - -### Check existing Gmail credentials - -```bash -ls -la ~/.gmail-mcp/ 2>/dev/null || echo "No Gmail config found" -``` - -If `credentials.json` already exists with real tokens (not `onecli-managed` values), skip to "Build and restart" below. - -### GCP Project Setup - -Check if OneCLI is configured: - -```bash -grep -q 'ONECLI_URL=.' .env 2>/dev/null && echo "onecli" || echo "manual" -``` - -**If OneCLI:** Tell the user to open `${ONECLI_URL}/connections?connect=gmail` to set up their Gmail connection. The dashboard walks them through creating a Google Cloud OAuth app and authorizing it. Ask them to let you know when done. - -Once the user confirms, run: - -```bash -onecli apps get --provider gmail -``` - -Check that `config.hasCredentials` is `true` or `connection` is not null. The response `hint` field has instructions and a docs URL for what stub credential files to create under `~/.gmail-mcp/`. Follow the hint — never overwrite existing files that don't contain `onecli-managed` values. - -**If manual:** Tell the user: - -> I need you to set up Google Cloud OAuth credentials: -> -> 1. Open https://console.cloud.google.com — create a new project or select existing -> 2. Go to **APIs & Services > Library**, search "Gmail API", click **Enable** -> 3. Go to **APIs & Services > Credentials**, click **+ CREATE CREDENTIALS > OAuth client ID** -> - If prompted for consent screen: choose "External", fill in app name and email, save -> - Application type: **Desktop app**, name: anything (e.g., "NanoClaw Gmail") -> 4. Click **DOWNLOAD JSON** and save as `gcp-oauth.keys.json` -> -> Where did you save the file? (Give me the full path, or paste the file contents here) - -If user provides a path, copy it: - -```bash -mkdir -p ~/.gmail-mcp -cp "/path/user/provided/gcp-oauth.keys.json" ~/.gmail-mcp/gcp-oauth.keys.json -``` - -If user pastes JSON content, write it to `~/.gmail-mcp/gcp-oauth.keys.json`. 
- -### OAuth Authorization - -Tell the user: - -> I'm going to run Gmail authorization. A browser window will open — sign in and grant access. If you see an "app isn't verified" warning, click "Advanced" then "Go to [app name] (unsafe)" — this is normal for personal OAuth apps. - -Run the authorization: - -```bash -pnpm dlx @gongrzhe/server-gmail-autoauth-mcp auth -``` - -If that fails (some versions don't have an auth subcommand), try `timeout 60 pnpm dlx @gongrzhe/server-gmail-autoauth-mcp || true`. Verify with `ls ~/.gmail-mcp/credentials.json`. - -### Build and restart - -Clear stale per-group agent-runner copies (they only get re-created if missing, so existing copies won't pick up the new Gmail server): - -```bash -rm -r data/sessions/*/agent-runner-src 2>/dev/null || true -``` - -Rebuild the container (agent-runner changed): - -```bash -cd container && ./build.sh -``` - -Then compile and restart: - -```bash -pnpm run build -launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS -# Linux: systemctl --user restart nanoclaw -``` - -## Phase 4: Verify - -### Test tool access (both modes) - -Tell the user: - -> Gmail is connected! Send this in your main channel: -> -> `@Andy check my recent emails` or `@Andy list my Gmail labels` - -### Test channel mode (Channel mode only) - -Tell the user to send themselves a test email. The agent should pick it up within a minute. Monitor: `tail -f logs/nanoclaw.log | grep -iE "(gmail|email)"`. - -Once verified, offer filter customization via `AskUserQuestion` — by default, only emails in the Primary inbox trigger the agent (Promotions, Social, Updates, and Forums are excluded). The user can keep this default or narrow further by sender, label, or keywords. No code changes needed for filters. 
- -### Check logs if needed - -```bash -tail -f logs/nanoclaw.log -``` - -## Troubleshooting - -### Gmail connection not responding - -Test directly: - -```bash -pnpm dlx @gongrzhe/server-gmail-autoauth-mcp -``` - -### OAuth token expired - -Re-authorize: - -```bash -rm ~/.gmail-mcp/credentials.json -pnpm dlx @gongrzhe/server-gmail-autoauth-mcp -``` - -### Container can't access Gmail - -- Verify `~/.gmail-mcp` is mounted: check `src/container-runner.ts` for the `.gmail-mcp` mount -- Check container logs: `cat groups/main/logs/container-*.log | tail -50` - -### Emails not being detected (Channel mode only) - -- By default, the channel polls unread Primary inbox emails (`is:unread category:primary`) -- Check logs for Gmail polling errors - -## Removal - -### Tool-only mode - -1. Remove `~/.gmail-mcp` mount from `src/container-runner.ts` -2. Remove `gmail` MCP server and `mcp__gmail__*` from `container/agent-runner/src/index.ts` -3. Rebuild and restart -4. Clear stale agent-runner copies: `rm -r data/sessions/*/agent-runner-src 2>/dev/null || true` -5. Rebuild: `cd container && ./build.sh && cd .. && pnpm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `systemctl --user restart nanoclaw` (Linux) - -### Channel mode - -1. Delete `src/channels/gmail.ts` and `src/channels/gmail.test.ts` -2. Remove `import './gmail.js'` from `src/channels/index.ts` -3. Remove `~/.gmail-mcp` mount from `src/container-runner.ts` -4. Remove `gmail` MCP server and `mcp__gmail__*` from `container/agent-runner/src/index.ts` -5. Uninstall: `pnpm uninstall googleapis` -6. Rebuild and restart -7. Clear stale agent-runner copies: `rm -r data/sessions/*/agent-runner-src 2>/dev/null || true` -8. Rebuild: `cd container && ./build.sh && cd .. 
&& pnpm run build && launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `systemctl --user restart nanoclaw` (Linux) diff --git a/.claude/skills/add-image-vision/SKILL.md b/.claude/skills/add-image-vision/SKILL.md deleted file mode 100644 index 4a9da26..0000000 --- a/.claude/skills/add-image-vision/SKILL.md +++ /dev/null @@ -1,94 +0,0 @@ ---- -name: add-image-vision -description: Add image vision to NanoClaw agents. Resizes and processes WhatsApp image attachments, then sends them to Claude as multimodal content blocks. ---- - -# Image Vision Skill - -Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks. - -## Phase 1: Pre-flight - -1. Check if `src/image.ts` exists — skip to Phase 3 if already applied -2. Confirm `sharp` is installable (native bindings require build tools) - -**Prerequisite:** WhatsApp must be installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files. 
- -## Phase 2: Apply Code Changes - -### Ensure WhatsApp fork remote - -```bash -git remote -v -``` - -If `whatsapp` is missing, add it: - -```bash -git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git -``` - -### Merge the skill branch - -```bash -git fetch whatsapp skill/image-vision -git merge whatsapp/skill/image-vision || { - git checkout --theirs pnpm-lock.yaml - git add pnpm-lock.yaml - git merge --continue -} -``` - -This merges in: -- `src/image.ts` (image download, resize via sharp, base64 encoding) -- `src/image.test.ts` (8 unit tests) -- Image attachment handling in `src/channels/whatsapp.ts` -- Image passing to agent in `src/index.ts` and `src/container-runner.ts` -- Image content block support in `container/agent-runner/src/index.ts` -- `sharp` npm dependency in `package.json` - -If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides. - -### Validate code changes - -```bash -pnpm install -pnpm run build -pnpm exec vitest run src/image.test.ts -``` - -All tests must pass and build must be clean before proceeding. - -## Phase 3: Configure - -1. Rebuild the container (agent-runner changes need a rebuild): - ```bash - ./container/build.sh - ``` - -2. Sync agent-runner source to group caches: - ```bash - for dir in data/sessions/*/agent-runner-src/; do - cp container/agent-runner/src/*.ts "$dir" - done - ``` - -3. Restart the service: - ```bash - launchctl kickstart -k gui/$(id -u)/com.nanoclaw - ``` - -## Phase 4: Verify - -1. Send an image in a registered WhatsApp group -2. Check the agent responds with understanding of the image content -3. Check logs for "Processed image attachment": - ```bash - tail -50 groups/*/logs/container-*.log - ``` - -## Troubleshooting - -- **"Image - download failed"**: Check WhatsApp connection stability. The download may timeout on slow connections. -- **"Image - processing failed"**: Sharp may not be installed correctly. 
Run `pnpm ls sharp` to verify. -- **Agent doesn't mention image content**: Check container logs for "Loaded image" messages. If missing, ensure agent-runner source was synced to group caches. diff --git a/.claude/skills/add-pdf-reader/SKILL.md b/.claude/skills/add-pdf-reader/SKILL.md deleted file mode 100644 index aecc347..0000000 --- a/.claude/skills/add-pdf-reader/SKILL.md +++ /dev/null @@ -1,104 +0,0 @@ ---- -name: add-pdf-reader -description: Add PDF reading to NanoClaw agents. Extracts text from PDFs via pdftotext CLI. Handles WhatsApp attachments, URLs, and local files. ---- - -# Add PDF Reader - -Adds PDF reading capability to all container agents using poppler-utils (pdftotext/pdfinfo). PDFs sent as WhatsApp attachments are auto-downloaded to the group workspace. - -## Phase 1: Pre-flight - -1. Check if `container/skills/pdf-reader/pdf-reader` exists — skip to Phase 3 if already applied -2. Confirm WhatsApp is installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files. - -## Phase 2: Apply Code Changes - -### Ensure WhatsApp fork remote - -```bash -git remote -v -``` - -If `whatsapp` is missing, add it: - -```bash -git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git -``` - -### Merge the skill branch - -```bash -git fetch whatsapp skill/pdf-reader -git merge whatsapp/skill/pdf-reader || { - git checkout --theirs pnpm-lock.yaml - git add pnpm-lock.yaml - git merge --continue -} -``` - -This merges in: -- `container/skills/pdf-reader/SKILL.md` (agent-facing documentation) -- `container/skills/pdf-reader/pdf-reader` (CLI script) -- `poppler-utils` in `container/Dockerfile` -- PDF attachment download in `src/channels/whatsapp.ts` -- PDF tests in `src/channels/whatsapp.test.ts` - -If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides. 
- -### Validate - -```bash -pnpm run build -pnpm exec vitest run src/channels/whatsapp.test.ts -``` - -### Rebuild container - -```bash -./container/build.sh -``` - -### Restart service - -```bash -launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS -# Linux: systemctl --user restart nanoclaw -``` - -## Phase 3: Verify - -### Test PDF extraction - -Send a PDF file in any registered WhatsApp chat. The agent should: -1. Download the PDF to `attachments/` -2. Respond acknowledging the PDF -3. Be able to extract text when asked - -### Test URL fetching - -Ask the agent to read a PDF from a URL. It should use `pdf-reader fetch `. - -### Check logs if needed - -```bash -tail -f logs/nanoclaw.log | grep -i pdf -``` - -Look for: -- `Downloaded PDF attachment` — successful download -- `Failed to download PDF attachment` — media download issue - -## Troubleshooting - -### Agent says pdf-reader command not found - -Container needs rebuilding. Run `./container/build.sh` and restart the service. - -### PDF text extraction is empty - -The PDF may be scanned (image-based). pdftotext only handles text-based PDFs. Consider using the agent-browser to open the PDF visually instead. - -### WhatsApp PDF not detected - -Verify the message has `documentMessage` with `mimetype: application/pdf`. Some file-sharing apps send PDFs as generic files without the correct mimetype. diff --git a/.claude/skills/add-reactions/SKILL.md b/.claude/skills/add-reactions/SKILL.md deleted file mode 100644 index 435bef9..0000000 --- a/.claude/skills/add-reactions/SKILL.md +++ /dev/null @@ -1,117 +0,0 @@ ---- -name: add-reactions -description: Add WhatsApp emoji reaction support — receive, send, store, and search reactions. ---- - -# Add Reactions - -This skill adds emoji reaction support to NanoClaw's WhatsApp channel: receive and store reactions, send reactions from the container agent via MCP tool, and query reaction history from SQLite. 
- -## Phase 1: Pre-flight - -### Check if already applied - -Check if `src/status-tracker.ts` exists: - -```bash -test -f src/status-tracker.ts && echo "Already applied" || echo "Not applied" -``` - -If already applied, skip to Phase 3 (Verify). - -## Phase 2: Apply Code Changes - -### Ensure WhatsApp fork remote - -```bash -git remote -v -``` - -If `whatsapp` is missing, add it: - -```bash -git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git -``` - -### Merge the skill branch - -```bash -git fetch whatsapp skill/reactions -git merge whatsapp/skill/reactions || { - git checkout --theirs pnpm-lock.yaml - git add pnpm-lock.yaml - git merge --continue -} -``` - -This adds: -- `scripts/migrate-reactions.ts` (database migration for `reactions` table with composite PK and indexes) -- `src/status-tracker.ts` (forward-only emoji state machine for message lifecycle signaling, with persistence and retry) -- `src/status-tracker.test.ts` (unit tests for StatusTracker) -- `container/skills/reactions/SKILL.md` (agent-facing documentation for the `react_to_message` MCP tool) -- Reaction support in `src/db.ts`, `src/channels/whatsapp.ts`, `src/types.ts`, `src/ipc.ts`, `src/index.ts`, `src/group-queue.ts`, and `container/agent-runner/src/ipc-mcp-stdio.ts` - -### Run database migration - -```bash -pnpm exec tsx scripts/migrate-reactions.ts -``` - -### Validate code changes - -```bash -pnpm test -pnpm run build -``` - -All tests must pass and build must be clean before proceeding. - -## Phase 3: Verify - -### Build and restart - -```bash -pnpm run build -``` - -Linux: -```bash -systemctl --user restart nanoclaw -``` - -macOS: -```bash -launchctl kickstart -k gui/$(id -u)/com.nanoclaw -``` - -### Test receiving reactions - -1. Send a message from your phone -2. React to it with an emoji on WhatsApp -3. 
Check the database: - -```bash -sqlite3 store/messages.db "SELECT * FROM reactions ORDER BY timestamp DESC LIMIT 5;" -``` - -### Test sending reactions - -Ask the agent to react to a message via the `react_to_message` MCP tool. Check your phone — the reaction should appear on the message. - -## Troubleshooting - -### Reactions not appearing in database - -- Check NanoClaw logs for `Failed to process reaction` errors -- Verify the chat is registered -- Confirm the service is running - -### Migration fails - -- Ensure `store/messages.db` exists and is accessible -- If "table reactions already exists", the migration already ran — skip it - -### Agent can't send reactions - -- Check IPC logs for `Unauthorized IPC reaction attempt blocked` — the agent can only react in its own group's chat -- Verify WhatsApp is connected: check logs for connection status diff --git a/.claude/skills/add-telegram-swarm/SKILL.md b/.claude/skills/add-telegram-swarm/SKILL.md deleted file mode 100644 index 8f6a4fc..0000000 --- a/.claude/skills/add-telegram-swarm/SKILL.md +++ /dev/null @@ -1,384 +0,0 @@ ---- -name: add-telegram-swarm -description: Add Agent Swarm (Teams) support to Telegram. Each subagent gets its own bot identity in the group. Requires Telegram channel to be set up first (use /add-telegram). Triggers on "agent swarm", "agent teams telegram", "telegram swarm", "bot pool". ---- - -# Add Agent Swarm to Telegram - -This skill adds Agent Teams (Swarm) support to an existing Telegram channel. Each subagent in a team gets its own bot identity in the Telegram group, so users can visually distinguish which agent is speaking. - -**Prerequisite**: Telegram must already be set up via the `/add-telegram` skill. If `src/telegram.ts` does not exist or `TELEGRAM_BOT_TOKEN` is not configured, tell the user to run `/add-telegram` first. 
- -## How It Works - -- The **main bot** receives messages and sends lead agent responses (already set up by `/add-telegram`) -- **Pool bots** are send-only — each gets a Grammy `Api` instance (no polling) -- When a subagent calls `send_message` with a `sender` parameter, the host assigns a pool bot and renames it to match the sender's role -- Messages appear in Telegram from different bot identities - -``` -Subagent calls send_message(text: "Found 3 results", sender: "Researcher") - → MCP writes IPC file with sender field - → Host IPC watcher picks it up - → Assigns pool bot #2 to "Researcher" (round-robin, stable per-group) - → Renames pool bot #2 to "Researcher" via setMyName - → Sends message via pool bot #2's Api instance - → Appears in Telegram from "Researcher" bot -``` - -## Prerequisites - -### 1. Create Pool Bots - -Tell the user: - -> I need you to create 3-5 Telegram bots to use as the agent pool. These will be renamed dynamically to match agent roles. -> -> 1. Open Telegram and search for `@BotFather` -> 2. Send `/newbot` for each bot: -> - Give them any placeholder name (e.g., "Bot 1", "Bot 2") -> - Usernames like `myproject_swarm_1_bot`, `myproject_swarm_2_bot`, etc. -> 3. Copy all the tokens -> 4. Add all bots to your Telegram group(s) where you want agent teams - -Wait for user to provide the tokens. - -### 2. Disable Group Privacy for Pool Bots - -Tell the user: - -> **Important**: Each pool bot needs Group Privacy disabled so it can send messages in groups. -> -> For each pool bot in `@BotFather`: -> 1. Send `/mybots` and select the bot -> 2. Go to **Bot Settings** > **Group Privacy** > **Turn off** -> -> Then add all pool bots to your Telegram group(s). 
- -## Implementation - -### Step 1: Update Configuration - -Read `src/config.ts` and add the bot pool config near the other Telegram exports: - -```typescript -export const TELEGRAM_BOT_POOL = (process.env.TELEGRAM_BOT_POOL || '') - .split(',') - .map((t) => t.trim()) - .filter(Boolean); -``` - -### Step 2: Add Bot Pool to Telegram Module - -Read `src/telegram.ts` and add the following: - -1. **Update imports** — add `Api` to the Grammy import: - -```typescript -import { Api, Bot } from 'grammy'; -``` - -2. **Add pool state** after the existing `let bot` declaration: - -```typescript -// Bot pool for agent teams: send-only Api instances (no polling) -const poolApis: Api[] = []; -// Maps "{groupFolder}:{senderName}" → pool Api index for stable assignment -const senderBotMap = new Map(); -let nextPoolIndex = 0; -``` - -3. **Add pool functions** — place these before the `isTelegramConnected` function: - -```typescript -/** - * Initialize send-only Api instances for the bot pool. - * Each pool bot can send messages but doesn't poll for updates. - */ -export async function initBotPool(tokens: string[]): Promise { - for (const token of tokens) { - try { - const api = new Api(token); - const me = await api.getMe(); - poolApis.push(api); - logger.info( - { username: me.username, id: me.id, poolSize: poolApis.length }, - 'Pool bot initialized', - ); - } catch (err) { - logger.error({ err }, 'Failed to initialize pool bot'); - } - } - if (poolApis.length > 0) { - logger.info({ count: poolApis.length }, 'Telegram bot pool ready'); - } -} - -/** - * Send a message via a pool bot assigned to the given sender name. - * Assigns bots round-robin on first use; subsequent messages from the - * same sender in the same group always use the same bot. - * On first assignment, renames the bot to match the sender's role. 
- */ -export async function sendPoolMessage( - chatId: string, - text: string, - sender: string, - groupFolder: string, -): Promise { - if (poolApis.length === 0) { - // No pool bots — fall back to main bot - await sendTelegramMessage(chatId, text); - return; - } - - const key = `${groupFolder}:${sender}`; - let idx = senderBotMap.get(key); - if (idx === undefined) { - idx = nextPoolIndex % poolApis.length; - nextPoolIndex++; - senderBotMap.set(key, idx); - // Rename the bot to match the sender's role, then wait for Telegram to propagate - try { - await poolApis[idx].setMyName(sender); - await new Promise((r) => setTimeout(r, 2000)); - logger.info({ sender, groupFolder, poolIndex: idx }, 'Assigned and renamed pool bot'); - } catch (err) { - logger.warn({ sender, err }, 'Failed to rename pool bot (sending anyway)'); - } - } - - const api = poolApis[idx]; - try { - const numericId = chatId.replace(/^tg:/, ''); - const MAX_LENGTH = 4096; - if (text.length <= MAX_LENGTH) { - await api.sendMessage(numericId, text); - } else { - for (let i = 0; i < text.length; i += MAX_LENGTH) { - await api.sendMessage(numericId, text.slice(i, i + MAX_LENGTH)); - } - } - logger.info({ chatId, sender, poolIndex: idx, length: text.length }, 'Pool message sent'); - } catch (err) { - logger.error({ chatId, sender, err }, 'Failed to send pool message'); - } -} -``` - -### Step 3: Add sender Parameter to MCP Tool - -Read `container/agent-runner/src/ipc-mcp-stdio.ts` and update the `send_message` tool to accept an optional `sender` parameter: - -Change the tool's schema from: -```typescript -{ text: z.string().describe('The message text to send') }, -``` - -To: -```typescript -{ - text: z.string().describe('The message text to send'), - sender: z.string().optional().describe('Your role/identity name (e.g. "Researcher"). 
When set, messages appear from a dedicated bot in Telegram.'), -}, -``` - -And update the handler to include `sender` in the IPC data: - -```typescript -async (args) => { - const data: Record = { - type: 'message', - chatJid, - text: args.text, - sender: args.sender || undefined, - groupFolder, - timestamp: new Date().toISOString(), - }; - - writeIpcFile(MESSAGES_DIR, data); - - return { content: [{ type: 'text' as const, text: 'Message sent.' }] }; - }, -``` - -### Step 4: Update Host IPC Routing - -Read `src/ipc.ts` and make these changes: - -1. **Add imports** — add `sendPoolMessage` and `initBotPool` from the Telegram swarm module, and `TELEGRAM_BOT_POOL` from config. - -2. **Update IPC message routing** — in `src/ipc.ts`, find where the `sendMessage` dependency is called to deliver IPC messages (inside `processIpcFiles`). The `sendMessage` is passed in via the `IpcDeps` parameter. Wrap it to route Telegram swarm messages through the bot pool: - -```typescript -if (data.sender && data.chatJid.startsWith('tg:')) { - await sendPoolMessage( - data.chatJid, - data.text, - data.sender, - sourceGroup, - ); -} else { - await deps.sendMessage(data.chatJid, data.text); -} -``` - -Note: The assistant name prefix is handled by `formatOutbound()` in the router — Telegram channels have `prefixAssistantName = false` so no prefix is added for `tg:` JIDs. - -3. **Initialize pool in `main()` in `src/index.ts`** — after creating the Telegram channel, add: - -```typescript -if (TELEGRAM_BOT_POOL.length > 0) { - await initBotPool(TELEGRAM_BOT_POOL); -} -``` - -### Step 5: Update CLAUDE.md Files - -#### 5a. Add global message formatting rules - -Read `groups/global/CLAUDE.md` and add a Message Formatting section: - -```markdown -## Message Formatting - -NEVER use markdown. Only use WhatsApp/Telegram formatting: -- *single asterisks* for bold (NEVER **double asterisks**) -- _underscores_ for italic -- • bullet points -- ```triple backticks``` for code - -No ## headings. 
No [links](url). No **double stars**. -``` - -#### 5b. Update existing group CLAUDE.md headings - -In any group CLAUDE.md that has a "WhatsApp Formatting" section (e.g. `groups/main/CLAUDE.md`), rename the heading to reflect multi-channel support: - -``` -## WhatsApp Formatting (and other messaging apps) -``` - -#### 5c. Add Agent Teams instructions to Telegram groups - -For each Telegram group that will use agent teams, create or update its `groups/{folder}/CLAUDE.md` with these instructions. Read the existing CLAUDE.md first (or `groups/global/CLAUDE.md` as a base) and add the Agent Teams section: - -```markdown -## Agent Teams - -When creating a team to tackle a complex task, follow these rules: - -### CRITICAL: Follow the user's prompt exactly - -Create *exactly* the team the user asked for — same number of agents, same roles, same names. Do NOT add extra agents, rename roles, or use generic names like "Researcher 1". If the user says "a marine biologist, a physicist, and Alexander Hamilton", create exactly those three agents with those exact names. - -### Team member instructions - -Each team member MUST be instructed to: - -1. *Share progress in the group* via `mcp__nanoclaw__send_message` with a `sender` parameter matching their exact role/character name (e.g., `sender: "Marine Biologist"` or `sender: "Alexander Hamilton"`). This makes their messages appear from a dedicated bot in the Telegram group. -2. *Also communicate with teammates* via `SendMessage` as normal for coordination. -3. Keep group messages *short* — 2-4 sentences max per message. Break longer content into multiple `send_message` calls. No walls of text. -4. Use the `sender` parameter consistently — always the same name so the bot identity stays stable. -5. NEVER use markdown formatting. Use ONLY WhatsApp/Telegram formatting: single *asterisks* for bold (NOT **double**), _underscores_ for italic, • for bullets, ```backticks``` for code. 
No ## headings, no [links](url), no **double asterisks**. - -### Example team creation prompt - -When creating a teammate, include instructions like: - -\``` -You are the Marine Biologist. When you have findings or updates for the user, send them to the group using mcp__nanoclaw__send_message with sender set to "Marine Biologist". Keep each message short (2-4 sentences max). Use emojis for strong reactions. ONLY use single *asterisks* for bold (never **double**), _underscores_ for italic, • for bullets. No markdown. Also communicate with teammates via SendMessage. -\``` - -### Lead agent behavior - -As the lead agent who created the team: - -- You do NOT need to react to or relay every teammate message. The user sees those directly from the teammate bots. -- Send your own messages only to comment, share thoughts, synthesize, or direct the team. -- When processing an internal update from a teammate that doesn't need a user-facing response, wrap your *entire* output in `` tags. -- Focus on high-level coordination and the final synthesis. -``` - -### Step 6: Update Environment - -Add pool tokens to `.env`: - -```bash -TELEGRAM_BOT_POOL=TOKEN1,TOKEN2,TOKEN3,... -``` - -**Important**: Sync to all required locations: - -```bash -cp .env data/env/env -``` - -Also add `TELEGRAM_BOT_POOL` to the launchd plist (`~/Library/LaunchAgents/com.nanoclaw.plist`) in the `EnvironmentVariables` dict if using launchd. - -### Step 7: Rebuild and Restart - -```bash -pnpm run build -./container/build.sh # Required — MCP tool changed -# macOS: -launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist -launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist -# Linux: -# systemctl --user restart nanoclaw -``` - -Must use `unload/load` (macOS) or `restart` (Linux) because the service env vars changed. 
- -### Step 8: Test - -Tell the user: - -> Send a message in your Telegram group asking for a multi-agent task, e.g.: -> "Assemble a team of a researcher and a coder to build me a hello world app" -> -> You should see: -> - The lead agent (main bot) acknowledging and creating the team -> - Each subagent messaging from a different bot, renamed to their role -> - Short, scannable messages from each agent -> -> Check logs: `tail -f logs/nanoclaw.log | grep -i pool` - -## Architecture Notes - -- Pool bots use Grammy's `Api` class — lightweight, no polling, just send -- Bot names are set via `setMyName` — changes are global to the bot, not per-chat -- A 2-second delay after `setMyName` allows Telegram to propagate the name change before the first message -- Sender→bot mapping is stable within a group (keyed as `{groupFolder}:{senderName}`) -- Mapping resets on service restart — pool bots get reassigned fresh -- If pool runs out, bots are reused (round-robin wraps) - -## Troubleshooting - -### Pool bots not sending messages - -1. Verify tokens: `curl -s "https://api.telegram.org/botTOKEN/getMe"` -2. Check pool initialized: `grep "Pool bot" logs/nanoclaw.log` -3. Ensure all pool bots are members of the Telegram group -4. Check Group Privacy is disabled for each pool bot - -### Bot names not updating - -Telegram caches bot names client-side. The 2-second delay after `setMyName` helps, but users may need to restart their Telegram client to see updated names immediately. - -### Subagents not using send_message - -Check the group's `CLAUDE.md` has the Agent Teams instructions. The lead agent reads this when creating teammates and must include the `send_message` + `sender` instructions in each teammate's prompt. - -## Removal - -To remove Agent Swarm support while keeping basic Telegram: - -1. Remove `TELEGRAM_BOT_POOL` from `src/config.ts` -2. Remove pool code from `src/telegram.ts` (`poolApis`, `senderBotMap`, `initBotPool`, `sendPoolMessage`) -3. 
Remove pool routing from IPC handler in `src/index.ts` (revert to plain `sendMessage`) -4. Remove `initBotPool` call from `main()` -5. Remove `sender` param from MCP tool in `container/agent-runner/src/ipc-mcp-stdio.ts` -6. Remove Agent Teams section from group CLAUDE.md files -7. Remove `TELEGRAM_BOT_POOL` from `.env`, `data/env/env`, and launchd plist/systemd unit -8. Rebuild: `pnpm run build && ./container/build.sh && launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist && launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist` (macOS) or `pnpm run build && ./container/build.sh && systemctl --user restart nanoclaw` (Linux) diff --git a/.claude/skills/add-voice-transcription/SKILL.md b/.claude/skills/add-voice-transcription/SKILL.md deleted file mode 100644 index cae1e47..0000000 --- a/.claude/skills/add-voice-transcription/SKILL.md +++ /dev/null @@ -1,148 +0,0 @@ ---- -name: add-voice-transcription -description: Add voice message transcription to NanoClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them. ---- - -# Add Voice Transcription - -This skill adds automatic voice message transcription to NanoClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as `[Voice: ]`. - -## Phase 1: Pre-flight - -### Check if already applied - -Check if `src/transcription.ts` exists. If it does, skip to Phase 3 (Configure). The code changes are already in place. - -### Ask the user - -Use `AskUserQuestion` to collect information: - -AskUserQuestion: Do you have an OpenAI API key for Whisper transcription? - -If yes, collect it now. If no, direct them to create one at https://platform.openai.com/api-keys. - -## Phase 2: Apply Code Changes - -**Prerequisite:** WhatsApp must be installed first (`skill/whatsapp` merged). This skill modifies WhatsApp channel files. 
- -### Ensure WhatsApp fork remote - -```bash -git remote -v -``` - -If `whatsapp` is missing, add it: - -```bash -git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git -``` - -### Merge the skill branch - -```bash -git fetch whatsapp skill/voice-transcription -git merge whatsapp/skill/voice-transcription || { - git checkout --theirs pnpm-lock.yaml - git add pnpm-lock.yaml - git merge --continue -} -``` - -This merges in: -- `src/transcription.ts` (voice transcription module using OpenAI Whisper) -- Voice handling in `src/channels/whatsapp.ts` (isVoiceMessage check, transcribeAudioMessage call) -- Transcription tests in `src/channels/whatsapp.test.ts` -- `openai` npm dependency in `package.json` -- `OPENAI_API_KEY` in `.env.example` - -If the merge reports conflicts, resolve them by reading the conflicted files and understanding the intent of both sides. - -### Validate code changes - -```bash -pnpm install -pnpm run build -pnpm exec vitest run src/channels/whatsapp.test.ts -``` - -All tests must pass and build must be clean before proceeding. - -## Phase 3: Configure - -### Get OpenAI API key (if needed) - -If the user doesn't have an API key: - -> I need you to create an OpenAI API key: -> -> 1. Go to https://platform.openai.com/api-keys -> 2. Click "Create new secret key" -> 3. Give it a name (e.g., "NanoClaw Transcription") -> 4. Copy the key (starts with `sk-`) -> -> Cost: ~$0.006 per minute of audio (~$0.003 per typical 30-second voice note) - -Wait for the user to provide the key. - -### Add to environment - -Add to `.env`: - -```bash -OPENAI_API_KEY= -``` - -Sync to container environment: - -```bash -mkdir -p data/env && cp .env data/env/env -``` - -The container reads environment from `data/env/env`, not `.env` directly. 
- -### Build and restart - -```bash -pnpm run build -launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS -# Linux: systemctl --user restart nanoclaw -``` - -## Phase 4: Verify - -### Test with a voice note - -Tell the user: - -> Send a voice note in any registered WhatsApp chat. The agent should receive it as `[Voice: ]` and respond to its content. - -### Check logs if needed - -```bash -tail -f logs/nanoclaw.log | grep -i voice -``` - -Look for: -- `Transcribed voice message` — successful transcription with character count -- `OPENAI_API_KEY not set` — key missing from `.env` -- `OpenAI transcription failed` — API error (check key validity, billing) -- `Failed to download audio message` — media download issue - -## Troubleshooting - -### Voice notes show "[Voice Message - transcription unavailable]" - -1. Check `OPENAI_API_KEY` is set in `.env` AND synced to `data/env/env` -2. Verify key works: `curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200` -3. Check OpenAI billing — Whisper requires a funded account - -### Voice notes show "[Voice Message - transcription failed]" - -Check logs for the specific error. Common causes: -- Network timeout — transient, will work on next message -- Invalid API key — regenerate at https://platform.openai.com/api-keys -- Rate limiting — wait and retry - -### Agent doesn't respond to voice notes - -Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups. diff --git a/.claude/skills/channel-formatting/SKILL.md b/.claude/skills/channel-formatting/SKILL.md deleted file mode 100644 index 8d27ffc..0000000 --- a/.claude/skills/channel-formatting/SKILL.md +++ /dev/null @@ -1,137 +0,0 @@ ---- -name: channel-formatting -description: Convert Claude's Markdown output to each channel's native text syntax before delivery. Adds zero-dependency formatting for WhatsApp, Telegram, and Slack (marker substitution). 
Also ships a Signal rich-text helper (parseSignalStyles) used by the Signal skill. ---- - -# Channel Formatting - -This skill wires channel-aware Markdown conversion into the outbound pipeline so Claude's -responses render natively on each platform — no more literal `**asterisks**` in WhatsApp or -Telegram. - -| Channel | Transformation | -|---------|---------------| -| WhatsApp | `**bold**` → `*bold*`, `*italic*` → `_italic_`, headings → bold, links → `text (url)` | -| Telegram | same as WhatsApp, but `[text](url)` links are preserved (Markdown v1 renders them natively) | -| Slack | same as WhatsApp, but links become `<url\|text>` | -| Discord | passthrough (Discord already renders Markdown) | -| Signal | passthrough for `parseTextStyles`; `parseSignalStyles` in `src/text-styles.ts` produces plain text + native `textStyle` ranges for use by the Signal skill | - -Code blocks (fenced and inline) are always protected — their content is never transformed. - -## Phase 1: Pre-flight - -### Check if already applied - -```bash -test -f src/text-styles.ts && echo "already applied" || echo "not yet applied" -``` - -If `already applied`, skip to Phase 3 (Verify). - -## Phase 2: Apply Code Changes - -### Ensure the upstream remote - -```bash -git remote -v -``` - -If an `upstream` remote pointing to `https://github.com/qwibitai/nanoclaw.git` is missing, -add it: - -```bash -git remote add upstream https://github.com/qwibitai/nanoclaw.git -``` - -### Merge the skill branch - -```bash -git fetch upstream skill/channel-formatting -git merge upstream/skill/channel-formatting -``` - -If there are merge conflicts on `pnpm-lock.yaml`, resolve them by accepting the incoming -version and continuing: - -```bash -git checkout --theirs pnpm-lock.yaml -git add pnpm-lock.yaml -git merge --continue -``` - -For any other conflict, read the conflicted file and reconcile both sides manually. 
- -This merge adds: - -- `src/text-styles.ts` — `parseTextStyles(text, channel)` for marker substitution and - `parseSignalStyles(text)` for Signal native rich text -- `src/router.ts` — `formatOutbound` gains an optional `channel` parameter; when provided - it calls `parseTextStyles` after stripping `` tags -- `src/index.ts` — both outbound `sendMessage` paths pass `channel.name` to `formatOutbound` -- `src/formatting.test.ts` — test coverage for both functions across all channels - -### Validate - -```bash -pnpm install -pnpm run build -pnpm exec vitest run src/formatting.test.ts -``` - -All 73 tests should pass and the build should be clean before continuing. - -## Phase 3: Verify - -### Rebuild and restart - -```bash -pnpm run build -launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS -# Linux: systemctl --user restart nanoclaw -``` - -### Spot-check formatting - -Send a message through any registered WhatsApp or Telegram chat that will trigger a -response from Claude. Ask something that will produce formatted output, such as: - -> Summarise the three main advantages of TypeScript using bullet points and **bold** headings. - -Confirm that the response arrives with native bold (`*text*`) rather than raw double -asterisks. - -### Check logs if needed - -```bash -tail -f logs/nanoclaw.log -``` - -## Signal Skill Integration - -If you have the Signal skill installed, `src/channels/signal.ts` can import -`parseSignalStyles` from the newly present `src/text-styles.ts`: - -```typescript -import { parseSignalStyles, SignalTextStyle } from '../text-styles.js'; -``` - -`parseSignalStyles` returns `{ text: string, textStyle: SignalTextStyle[] }` where -`textStyle` is an array of `{ style, start, length }` objects suitable for the -`signal-cli` JSON-RPC `textStyles` parameter (format: `"start:length:STYLE"`). 
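As a rough illustration of the `"start:length:STYLE"` format mentioned above, the sketch below serializes style ranges into strings. The `SignalTextStyle` shape here is inferred from the description and is an assumption, as are the helper name `toSignalCliTextStyles` and the sample style names; the real type in `src/text-styles.ts` may differ.

```typescript
// Assumed shape, inferred from the prose above; not copied from src/text-styles.ts.
interface SignalTextStyle {
  style: string; // e.g. "BOLD" or "ITALIC" (sample values, assumed)
  start: number; // offset of the styled range in the plain text
  length: number; // number of characters covered by the style
}

// Hypothetical helper: turn each range into a "start:length:STYLE" string.
function toSignalCliTextStyles(ranges: SignalTextStyle[]): string[] {
  return ranges.map(({ style, start, length }) => `${start}:${length}:${style}`);
}

// Example ranges for the plain text "Hello world".
const exampleRanges: SignalTextStyle[] = [
  { style: 'BOLD', start: 0, length: 5 },
  { style: 'ITALIC', start: 6, length: 5 },
];
const exampleStyles = toSignalCliTextStyles(exampleRanges);
```

Keeping the serialization in one small function means the channel code only deals with structured ranges until the moment the request is built.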
- -## Removal - -```bash -# Remove the new file -rm src/text-styles.ts - -# Revert router.ts to remove the channel param -git diff upstream/main src/router.ts # review changes -git checkout upstream/main -- src/router.ts - -# Revert the index.ts sendMessage call sites to plain formatOutbound(rawText) -# (edit manually or: git checkout upstream/main -- src/index.ts) - -pnpm run build -``` \ No newline at end of file diff --git a/.claude/skills/use-local-whisper/SKILL.md b/.claude/skills/use-local-whisper/SKILL.md deleted file mode 100644 index 664cafa..0000000 --- a/.claude/skills/use-local-whisper/SKILL.md +++ /dev/null @@ -1,152 +0,0 @@ ---- -name: use-local-whisper -description: Use when the user wants local voice transcription instead of OpenAI Whisper API. Switches to whisper.cpp running on Apple Silicon. WhatsApp only for now. Requires voice-transcription skill to be applied first. ---- - -# Use Local Whisper - -Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost. - -**Channel support:** Currently WhatsApp only. The transcription module (`src/transcription.ts`) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them. - -**Note:** The Homebrew package is `whisper-cpp`, but the CLI binary it installs is `whisper-cli`. 
- -## Prerequisites - -- `voice-transcription` skill must be applied first (WhatsApp channel) -- macOS with Apple Silicon (M1+) recommended -- `whisper-cpp` installed: `brew install whisper-cpp` (provides the `whisper-cli` binary) -- `ffmpeg` installed: `brew install ffmpeg` -- A GGML model file downloaded to `data/models/` - -## Phase 1: Pre-flight - -### Check if already applied - -Check if `src/transcription.ts` already uses `whisper-cli`: - -```bash -grep 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied" -``` - -If already applied, skip to Phase 3 (Verify). - -### Check dependencies are installed - -```bash -whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING" -ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING" -``` - -If missing, install via Homebrew: -```bash -brew install whisper-cpp ffmpeg -``` - -### Check for model file - -```bash -ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL" -``` - -If no model exists, download the base model (148MB, good balance of speed and accuracy): -```bash -mkdir -p data/models -curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin" -``` - -For better accuracy at the cost of speed, use `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB). - -## Phase 2: Apply Code Changes - -### Ensure WhatsApp fork remote - -```bash -git remote -v -``` - -If `whatsapp` is missing, add it: - -```bash -git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git -``` - -### Merge the skill branch - -```bash -git fetch whatsapp skill/local-whisper -git merge whatsapp/skill/local-whisper || { - git checkout --theirs pnpm-lock.yaml - git add pnpm-lock.yaml - git merge --continue -} -``` - -This modifies `src/transcription.ts` to use the `whisper-cli` binary instead of the OpenAI API. 
- -### Validate - -```bash -pnpm run build -``` - -## Phase 3: Verify - -### Ensure launchd PATH includes Homebrew - -The NanoClaw launchd service runs with a restricted PATH. `whisper-cli` and `ffmpeg` are in `/opt/homebrew/bin/` (Apple Silicon) or `/usr/local/bin/` (Intel), which may not be in the plist's PATH. - -Check the current PATH: -```bash -grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist -``` - -If `/opt/homebrew/bin` is missing, add it to the `<string>` value inside the `PATH` key in the plist. Then reload: -```bash -launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist -launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist -``` - -### Build and restart - -```bash -pnpm run build -launchctl kickstart -k gui/$(id -u)/com.nanoclaw -``` - -### Test - -Send a voice note in any registered group. The agent should receive it as `[Voice: ]`. - -### Check logs - -```bash -tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper" -``` - -Look for: -- `Transcribed voice message` — successful transcription -- `whisper.cpp transcription failed` — check model path, ffmpeg, or PATH - -## Configuration - -Environment variables (optional, set in `.env`): - -| Variable | Default | Description | -|----------|---------|-------------| -| `WHISPER_BIN` | `whisper-cli` | Path to whisper.cpp binary | -| `WHISPER_MODEL` | `data/models/ggml-base.bin` | Path to GGML model file | - -## Troubleshooting - -**"whisper.cpp transcription failed"**: Ensure both `whisper-cli` and `ffmpeg` are in PATH. The launchd service uses a restricted PATH — see Phase 3 above. Test manually: -```bash -ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y -whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps -nt -``` - -**Transcription works in dev but not as service**: The launchd plist PATH likely doesn't include `/opt/homebrew/bin`. See "Ensure launchd PATH includes Homebrew" in Phase 3. 
- -**Slow transcription**: The base model processes ~30s of audio in <1s on M1+. If slower, check CPU usage — another process may be competing. - -**Wrong language**: whisper.cpp auto-detects language. To force a language, you can set `WHISPER_LANG` and modify `src/transcription.ts` to pass `-l $WHISPER_LANG`.
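The language override described above could look roughly like the sketch below. It is an assumption about how the change might be made, not the contents of `src/transcription.ts`: `buildWhisperArgs` is a hypothetical helper, and the environment is passed in as a plain object so the example stays self-contained. The flags and the default model path are the ones shown elsewhere in this skill.

```typescript
// Hypothetical helper: build the whisper-cli argument list from the environment.
// Adds "-l <lang>" only when WHISPER_LANG is set, so auto-detection stays the default.
function buildWhisperArgs(
  wavPath: string,
  env: Record<string, string | undefined>,
): string[] {
  const model = env.WHISPER_MODEL || 'data/models/ggml-base.bin';
  const args = ['-m', model, '-f', wavPath, '--no-timestamps'];
  if (env.WHISPER_LANG) {
    args.push('-l', env.WHISPER_LANG);
  }
  return args;
}

// Example: force Hebrew for a converted WAV file.
const forcedLangArgs = buildWhisperArgs('/tmp/test.wav', { WHISPER_LANG: 'he' });
```

In the real module the resulting array would be handed to whatever process-spawning call already invokes `WHISPER_BIN`; leaving `WHISPER_LANG` unset keeps the current behavior.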