From 54bf4543f27f7d2152e4e5eaacc82788682e6d9d Mon Sep 17 00:00:00 2001
From: gavrielc
Date: Sun, 5 Apr 2026 10:07:48 +0300
Subject: [PATCH] refactor: rework wiki skill to use Karpathy's original text as reference

Remove pre-written container skill. Instead, include llm-wiki.md (Karpathy's gist) as the reference material and have the setup skill guide the user through collaboratively building their own wiki schema, container skill, and directory structure based on the pattern.

Add NanoClaw-specific notes: image vision, PDF reader, voice transcription, curl for full document fetch, file attachment handling.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/skills/add-wiki/SKILL.md    | 161 ++++++++++++----------------
 .claude/skills/add-wiki/llm-wiki.md |  75 +++++++++++++
 container/skills/wiki/SKILL.md      |  67 ------------
 3 files changed, 144 insertions(+), 159 deletions(-)
 create mode 100644 .claude/skills/add-wiki/llm-wiki.md
 delete mode 100644 container/skills/wiki/SKILL.md

diff --git a/.claude/skills/add-wiki/SKILL.md b/.claude/skills/add-wiki/SKILL.md
index 4c6a20b..afaa678 100644
--- a/.claude/skills/add-wiki/SKILL.md
+++ b/.claude/skills/add-wiki/SKILL.md
@@ -1,114 +1,91 @@
 ---
 name: add-wiki
-description: Add a persistent wiki knowledge base to a NanoClaw group. The agent ingests sources (URLs, files, attachments), builds interlinked wiki pages, answers questions from accumulated knowledge, and runs periodic health checks. Based on the LLM Wiki pattern. Triggers on "add wiki", "wiki", "knowledge base", "llm wiki".
+description: Add a persistent wiki knowledge base to a NanoClaw group. Based on Karpathy's LLM Wiki pattern. Triggers on "add wiki", "wiki", "knowledge base", "llm wiki".
 ---
 
 # Add Wiki
 
-Adds a persistent wiki knowledge base to a NanoClaw group. The agent builds and maintains structured, interlinked markdown pages from sources you provide. Knowledge compounds over time rather than being re-derived on every question.
+Set up a persistent wiki knowledge base on NanoClaw, based on Karpathy's LLM Wiki pattern.
 
-Based on the [LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).
+## Step 1: Read the pattern
 
-## Phase 1: Pre-flight
+Read `${CLAUDE_SKILL_DIR}/llm-wiki.md` — this is the full LLM Wiki idea as written by Karpathy. Understand it thoroughly before proceeding. Summarize the core idea to the user briefly, then discuss what they want to build.
 
-Check if `container/skills/wiki/SKILL.md` exists. If it does, skip to Phase 3.
+## Step 2: Choose a group
 
-## Phase 2: Apply Code Changes
+AskUserQuestion: "Which group should have the wiki?"
+
+1. **Main group** — add to your existing main chat
+2. **Dedicated group** — create a new group just for the wiki
+3. **Other** — pick an existing group
+
+If dedicated: ask which channel and chat, then register with `npx tsx setup/index.ts --step register`.
+
+## Step 3: Design collaboratively
+
+Discuss with the user based on the pattern:
+- What's the wiki's domain or topic?
+- What kinds of sources will they add? (URLs, PDFs, images, voice notes, books, transcripts)
+- Do they want the full three-layer architecture or a lighter version?
+- Any specific conventions they care about? (The pattern intentionally leaves this open.)
+
+Based on this discussion, create three things:
+
+### 3a. Directory structure
+
+Create `wiki/` and `sources/` directories in the group folder. Create initial `index.md` and `log.md` per the pattern's Indexing and Logging section. Adapt to the user's domain.
+
+### 3b. Container skill
+
+Create a `container/skills/wiki/SKILL.md` tailored to this user's wiki. This is the schema layer from the pattern — it tells the agent how to maintain the wiki. Base it on the pattern's Operations section (ingest, query, lint) and the conventions you agreed on with the user. Don't over-prescribe — the pattern says "your LLM figures out the rest."
+
+### 3c. Group CLAUDE.md
+
+Add a wiki section to the group's CLAUDE.md that activates the wiki behavior and points to the container skill.
+
+## Step 4: Source handling skills
+
+Check which source-handling capabilities are installed and offer to add missing ones based on what the user plans to ingest:
+
+| Source type | Skill needed | Check |
+|---|---|---|
+| Images | `/add-image-vision` | `src/channels/image-vision.ts` or similar exists |
+| PDFs | `/add-pdf-reader` | `container/skills/pdf-reader/` exists |
+| Voice notes | `/add-voice-transcription` | `container/skills/voice-transcription/` exists |
+
+For each missing skill the user needs, invoke it.
+
+### URL handling note
+
+The agent has built-in `WebFetch`, but it returns a summary, not the full document. For wiki ingestion where the full text matters, the container skill should instruct the agent to use `curl` piped through an HTML-to-text conversion instead:
 
 ```bash
-git fetch origin skill/wiki
-git merge origin/skill/wiki
+curl -sL "" | sed 's/<[^>]*>//g'
 ```
 
-If merge conflicts, resolve them. Then:
+Or better, use `agent-browser` to open the page and extract full text if available. The container skill should note this so the agent gets full content for sources rather than summaries.
+
+### File attachments
+
+If the user's channel supports file attachments (WhatsApp documents, Telegram files, Slack uploads), these arrive in the container's workspace. The container skill should note that attached files can be read directly and saved to `sources/`.
+
+## Step 5: Optional lint schedule
+
+AskUserQuestion: "Want periodic wiki health checks?"
+
+1. **Weekly**
+2. **Monthly**
+3. **Skip** — lint manually
+
+If yes, schedule via `mcp__nanoclaw__schedule_task` with a prompt based on the pattern's Lint operation.
+
+## Step 6: Build and restart
 
 ```bash
 npm run build
 ./container/build.sh
-```
-
-## Phase 3: Setup
-
-### Choose target group
-
-AskUserQuestion: "Which group should have the wiki?"
-
-1. **Main group** — add wiki to your existing main chat
-2. **Dedicated wiki group** — create a new group just for the wiki (recommended for focused research)
-3. **Other** — pick an existing group
-
-If dedicated: ask which channel and chat to use, then register with `npx tsx setup/index.ts --step register`.
-
-### Wiki topic
-
-Ask the user: "What's this wiki for?" (e.g. AI research, health tracking, competitive analysis, trip planning, book companion, general knowledge base)
-
-This shapes the initial index categories and the CLAUDE.md additions.
-
-### Create directory structure
-
-In the target group folder:
-
-```bash
-mkdir -p groups//wiki groups//sources
-```
-
-Create initial `wiki/index.md`:
-
-```markdown
-# Index
-
-_Last updated: _
-
-(Pages will appear here as sources are added.)
-```
-
-Create initial `wiki/log.md`:
-
-```markdown
-# Log
-
-## [] setup | Wiki initialized
-Wiki created. Topic: .
-```
-
-### Update group CLAUDE.md
-
-Add a wiki section to the group's CLAUDE.md. Keep it brief — the container skill has the full workflow:
-
-```markdown
-## Wiki
-
-You maintain a persistent wiki on . When sources arrive (URLs, files, attachments), ingest them into the wiki — don't just answer and move on. The `/wiki` container skill has the full ingest/query/lint workflow.
-
-- Wiki pages: `wiki/` (start with `wiki/index.md`)
-- Raw sources: `sources/` (immutable — never modify)
-```
-
-### Optional: Schedule lint
-
-AskUserQuestion: "Want periodic wiki health checks?"
-
-1. **Weekly** — every Sunday at 10am
-2. **Monthly** — first of each month
-3. **Skip** — lint manually when needed
-
-If yes, use `mcp__nanoclaw__schedule_task`:
-- prompt: "Run a wiki lint. Check for contradictions, orphan pages, stale content, missing cross-references, and gaps. Report findings."
-- schedule_type: "cron"
-- schedule_value: `"0 10 * * 0"` (weekly) or `"0 10 1 * *"` (monthly)
-
-### Optional: Obsidian
-
-If the user uses Obsidian, mention they can point a vault at `groups//` for graph view, backlinks, and visual browsing. The wiki is just markdown files on disk.
-
-## Phase 4: Verify
-
-Restart the service to pick up the new container skill:
-
-```bash
 launchctl kickstart -k gui/$(id -u)/com.nanoclaw  # macOS
 # Linux: systemctl --user restart nanoclaw
 ```
 
-Tell the user to test: send a URL to the wiki group. The agent should ingest it, create wiki pages, and update the index.
+Tell the user to test by sending a source to the wiki group.
diff --git a/.claude/skills/add-wiki/llm-wiki.md b/.claude/skills/add-wiki/llm-wiki.md
new file mode 100644
index 0000000..829d21c
--- /dev/null
+++ b/.claude/skills/add-wiki/llm-wiki.md
@@ -0,0 +1,75 @@
+# LLM Wiki
+
+> Source: [karpathy/llm-wiki.md](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
+
+A pattern for building personal knowledge bases using LLMs.
+
+This is an idea file, designed to be copied to your own LLM Agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea, with your agent building out specifics through collaboration with you.
+
+## The Core Idea
+
+Most interactions with LLMs and documents follow RAG patterns: upload files, retrieve relevant chunks at query time, generate answers. The knowledge is re-derived on each question with no accumulation.
+
+The concept here differs fundamentally. Rather than just retrieving from raw documents, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked markdown collection sitting between you and raw sources. When adding new material, the LLM reads it, extracts key information, and integrates it into existing wiki pages—updating entities, revising summaries, flagging contradictions, strengthening synthesis. Knowledge compiles once and stays current rather than re-deriving on every query.
+
+The wiki becomes a persistent, compounding artifact. Cross-references already exist. Contradictions are flagged. Synthesis reflects everything read. The wiki enriches with every source added and question asked.
+
+You source material and ask questions; the LLM maintains everything—summarizing, cross-referencing, filing, and organizing. The LLM acts as programmer; Obsidian serves as IDE; the wiki functions as codebase.
+
+**Applications include:**
+- Personal: tracking goals, health, self-improvement
+- Research: deep dives over weeks/months
+- Reading: building companion wikis while progressing through books
+- Business/teams: internal wikis fed by Slack, transcripts, documents
+- Analysis: competitive research, due diligence, trip planning, hobby deep-dives
+
+## Architecture
+
+Three layers comprise the system:
+
+**Raw sources** — immutable curated documents (articles, papers, images, data). The LLM reads but never modifies these.
+
+**The wiki** — LLM-generated markdown directories containing summaries, entity pages, concept pages, comparisons, syntheses. The LLM owns this entirely, creating and updating pages while maintaining cross-references and consistency.
+
+**The schema** — configuration document (e.g., CLAUDE.md) explaining wiki structure, conventions, and workflows for ingestion, querying, and maintenance. This key file transforms the LLM into disciplined wiki maintainer rather than generic chatbot.
+
+## Operations
+
+**Ingest:** Drop new sources into the raw collection; the LLM processes them. The agent reads sources, discusses takeaways, writes summaries, updates indexes, refreshes entity and concept pages, logs entries. Single sources might touch 10-15 wiki pages. Prefer ingesting individually while staying involved, though batch ingestion with less oversight is possible.
+
+**Query:** Ask questions against the wiki. The LLM searches relevant pages, synthesizes answers with citations. Answers take various forms—markdown pages, comparison tables, slide decks, charts, canvas. Good answers can be filed back into the wiki as new pages—explorations compound in the knowledge base rather than disappearing into chat history.
+
+**Lint:** Periodically health-check the wiki. Look for contradictions, stale claims superseded by newer sources, orphan pages lacking inbound links, important concepts lacking dedicated pages, missing cross-references, data gaps. The LLM suggests investigations and sources to pursue, keeping the wiki healthy as it grows.
+
+## Indexing and Logging
+
+Two special files help navigate the growing wiki:
+
+**index.md** — content-oriented catalog of everything (each page with link, one-line summary, optional metadata like dates or source counts), organized by category. The LLM updates it on every ingest. When answering queries, read the index first to locate relevant pages before drilling deeper. This approach works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) while avoiding embedding-based RAG infrastructure needs.
+
+**log.md** — append-only chronological record of what happened and when (ingests, queries, lint passes). Each entry beginning with consistent prefix (e.g., `## [2026-04-02] ingest | Article Title`) becomes parseable with simple tools—`grep "^## \[" log.md | tail -5` yields last 5 entries. The log shows wiki evolution timeline and helps the LLM understand recent activity.
+
+## Optional: CLI Tools
+
+At scale, small tools help the LLM operate more efficiently. Search engine over wiki pages is most obvious—at small scale the index suffices, but as the wiki grows, proper search becomes necessary. qmd (https://github.com/tobi/qmd) offers local search with hybrid BM25/vector search and LLM re-ranking, entirely on-device. It includes both CLI (so LLMs can shell out) and MCP server (native tool integration). Build simpler custom search scripts as needs arise.
+
+## Tips and Tricks
+
+- **Obsidian Web Clipper** converts web articles to markdown for quick source collection
+- **Download images locally:** Set attachment folder in Obsidian Settings, bind download hotkey. All images store locally; LLM views and references directly instead of relying on potentially broken URLs
+- **Obsidian's graph view** visualizes wiki connectivity—what connects to what, hub pages, orphans
+- **Marp** provides markdown-based slide deck format with Obsidian plugin integration
+- **Dataview** plugin queries page frontmatter, generating dynamic tables/lists when LLM adds YAML frontmatter
+- The wiki is simply a git-backed markdown directory—version history, branching, collaboration included
+
+## Why This Works
+
+Knowledge base maintenance's tedious part is bookkeeping, not reading/thinking: updating cross-references, keeping summaries current, noting data contradictions, maintaining consistency across pages. Humans abandon wikis as maintenance burden outpaces value. LLMs don't bore, don't forget updates, can touch 15 files in one pass. Wiki maintenance becomes nearly free.
+
+Humans curate sources, direct analysis, ask good questions, think about meaning. LLMs handle everything else.
+
+This relates in spirit to Vannevar Bush's 1945 Memex—personal curated knowledge stores with associative document trails. Bush's vision resembled this more than what the web became: private, actively curated, with connections between documents as valuable as documents themselves. Bush couldn't solve maintenance; LLMs handle that.
+
+## Note
+
+This document intentionally remains abstract, describing the idea rather than specific implementation. Directory structure, schema conventions, page formats, tooling—all depend on domain, preferences, and LLM choice. Everything is optional and modular. Pick what's useful; ignore what isn't. Your sources might be text-only (no image handling needed). Your wiki might stay small enough that index files suffice (no search engine required). You might want different output formats entirely. Share this with your LLM agent and work collaboratively to instantiate a version fitting your needs. This document's sole purpose is communicating the pattern; your LLM figures out the rest.
diff --git a/container/skills/wiki/SKILL.md b/container/skills/wiki/SKILL.md
deleted file mode 100644
index a390a51..0000000
--- a/container/skills/wiki/SKILL.md
+++ /dev/null
@@ -1,67 +0,0 @@
----
-name: wiki
-description: Maintain a persistent wiki knowledge base. Ingest sources (URLs, files, attachments), build and update interlinked wiki pages, answer questions from the wiki, and run periodic health checks. Use when the user sends sources to add, asks questions the wiki can answer, or requests wiki maintenance.
----
-
-# Wiki Knowledge Base
-
-You maintain a persistent wiki in your workspace. The wiki sits between you and raw sources — when new material arrives, you read it and integrate it into structured, interlinked pages. Knowledge compounds over time rather than being re-derived on every question.
-
-## Directory Structure
-
-```
-wiki/           # LLM-generated pages (you own this entirely)
-  index.md      # Content catalog — updated on every ingest
-  log.md        # Append-only activity log
-  ...           # Entity pages, concept pages, comparisons, syntheses
-sources/        # Raw immutable material (never modify these)
-  ...           # Fetched articles, clipped pages, uploaded files
-```
-
-## Operations
-
-### Ingest
-
-When the user sends a URL, file, or says to add something:
-
-1. Fetch or read the source material
-2. Save a copy to `sources/` (URLs: fetch and save as markdown; files: copy as-is)
-3. Discuss key takeaways with the user
-4. Create or update wiki pages — summaries, entity pages, concept pages, cross-references
-5. Flag contradictions with existing wiki content
-6. Update `index.md` with new and changed pages
-7. Append to `log.md`
-
-A single source often touches many wiki pages. Prefer ingesting one source at a time with user involvement, though batch ingestion works for bulk imports.
-
-### Query
-
-When the user asks a question:
-
-1. Read `index.md` to locate relevant pages
-2. Read those pages and synthesize an answer
-3. Cite which wiki pages informed the answer
-4. If the answer is substantial, offer to file it back as a new wiki page — explorations should compound in the wiki, not disappear into chat history
-
-### Lint
-
-When asked to health-check the wiki (or triggered by a scheduled task):
-
-- Contradictions between pages
-- Stale claims superseded by newer sources
-- Orphan pages with no inbound links
-- Important concepts that lack dedicated pages
-- Missing cross-references
-- Data gaps — suggest sources to pursue
-
-Report findings and offer to fix issues.
-
-## Conventions
-
-- Markdown with YAML frontmatter (`date_created`, `last_updated`, `sources`, `tags`)
-- Link between pages with relative markdown links: `[Page Title](page-title.md)`
-- One entity or concept per page — split pages over ~500 lines
-- `index.md`: organized by category, each entry is `- [Page Title](path.md) — one-line summary`
-- `log.md`: append-only, each entry starts with `## [YYYY-MM-DD] | `
-
-These are defaults. Adapt the structure to the domain — the user's wiki, their conventions.