docs: add v1→v2 action-items analysis + SDK signal probe tool

- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
gavrielc
2026-04-20 01:00:04 +03:00
parent 5ed5b72f10
commit 47950671fa
26 changed files with 3653 additions and 0 deletions

View File

@@ -0,0 +1,154 @@
# formatting (test-only) : v1 vs v2
## Scope
- **v1**: `/Users/gavriel/nanoclaw4/src/v1/formatting.test.ts` (316 lines)
- **v1 production sibling**: `/Users/gavriel/nanoclaw4/src/v1/router.ts` (43 lines) — `escapeXml()`, `formatMessages()`, `stripInternalTags()`, `formatOutbound()`, plus `/Users/gavriel/nanoclaw4/src/v1/config.ts` (63 lines) — `getTriggerPattern()`, `TRIGGER_PATTERN`, `buildTriggerPattern()`, `DEFAULT_TRIGGER`
- **v2 counterparts**:
- Inbound message formatting: `/Users/gavriel/nanoclaw4/container/agent-runner/src/formatter.ts` (228 lines) — `formatMessages()`, `categorizeMessage()`, `extractRouting()`
- Outbound tag stripping: embedded in container delivery logic
- Trigger patterns: moved to DB model (`messaging_group_agents.trigger_rules` JSON) — no code-level function
- v2 tests: `/Users/gavriel/nanoclaw4/container/agent-runner/src/poll-loop.test.ts:2684` (formatter section only)
---
## Test-case map
| v1 Test Case | v2 Formatter Handling | Status | Notes |
|---|---|---|---|
| **escapeXml: ampersands** (src/v1/formatting.test.ts:2223) | `/container/agent-runner/src/formatter.ts:225` `escapeXml()` with `&``&amp;` | ✅ Preserved | Both use identical regex replacement. V2 escaping is used in `formatSingleChat()` for sender, time, text. |
| **escapeXml: less-than** (test:2627) | `formatter.ts:225` `escapeXml()` with `<``&lt;` | ✅ Preserved | Used in XML attributes and content. |
| **escapeXml: greater-than** (test:3031) | `formatter.ts:225` with `>``&gt;` | ✅ Preserved | Same. |
| **escapeXml: double quotes** (test:3435) | `formatter.ts:225` with `"``&quot;` | ✅ Preserved | Same. |
| **escapeXml: multiple special characters** (test:3839) | `formatter.ts:225` (regex composition) | ✅ Preserved | Single pass through all four replacements. |
| **escapeXml: passthrough clean text** (test:4243) | `formatter.ts:225` (no-op if no specials) | ✅ Preserved | Same. |
| **escapeXml: empty string** (test:4647) | `formatter.ts:225` (no-op on empty) | ✅ Preserved | Same. |
| **formatMessages: single message with context header & time** (test:5662) | `/container/agent-runner/src/formatter.ts:124158` `formatChatMessages()` & `formatSingleChat()` | ⚠️ Changed | v1 formats as `<context timezone="UTC" />\n<messages>...\n</messages>` with full timestamp in US locale. v2 uses `<message id="seq" from="dest-name" sender="..." time="HH:MM">...` with 24-hour time only. No context header. |
| **formatMessages: multiple messages** (test:6484) | `formatter.ts:124134` (batch wrapping in `<messages>` tag) | ⚠️ Changed | v2 wraps multiple chat messages in `<messages>` tags but structure differs: no timezone attribute, different time format, `from` attribute added. |
| **formatMessages: escape sender names** (test:8688) | `formatter.ts:157` `sender="${escapeXml(sender)}"` | ✅ Preserved | Same escaping strategy. |
| **formatMessages: escape content** (test:9193) | `formatter.ts:157` `${escapeXml(text)}` | ✅ Preserved | Same. |
| **formatMessages: empty array** (test:9699) | `formatter.ts:98` — returns empty string if no messages | ❌ Incompatible | v1 returns `<context>\n<messages>\n\n</messages>` even for empty. v2 returns empty string. Different expected output. |
| **formatMessages: reply context (quoted_message)** (test:102116) | `formatter.ts:143, 183188` `formatReplyContext()` | ⚠️ Changed | v1 renders `reply_to="42"` attribute + `<quoted_message from="Bob">text</quoted_message>` child. v2 renders as `<reply-to sender="..." >preview</reply-to>` without message ID attribute. |
| **formatMessages: omit reply when absent** (test:119122) | `formatter.ts:183` (conditional) | ✅ Preserved | Both check for presence before rendering. |
| **formatMessages: omit quoted_message when content missing** (test:125136) | `formatter.ts:184` (check `replyTo.text`) | ✅ Preserved | Both guard against missing content. |
| **formatMessages: escape reply context** (test:139151) | `formatter.ts:188` `escapeXml()` on sender and text | ✅ Preserved | Same escaping applied. |
| **formatMessages: timezone conversion** (test:154160) | `formatter.ts:216223` `formatTime()` — HH:MM UTC only | ❌ Incompatible | v1 uses `formatLocalTime()` (full locale string with date, month, am/pm) from `timezone.ts:2637`. v2 uses 24-hour `HH:MM` UTC only; no timezone localization. |
| **TRIGGER_PATTERN: matches @name at start** (test:170171) | No v2 code equivalent | ❌ Not in v2 | v2 moved trigger rules to DB; no regex pattern in code. Router evaluates `messaging_group_agents.trigger_rules` JSON. |
| **TRIGGER_PATTERN: case-insensitive** (test:174176) | DB model (applied at runtime by router) | ❌ Not in v2 | Same behavior (case-insensitive in router) but no test coverage for trigger logic in v2. |
| **TRIGGER_PATTERN: word boundary checks** (test:179192) | DB model (router enforces) | ❌ Not in v2 | Router evaluates trigger rules; no unit tests for pattern matching. |
| **getTriggerPattern: custom per-group trigger** (test:201206) | `/src/router.ts` evaluates `messaging_group_agents.trigger_rules` at delivery time | ❌ Not tested in v2 | v2 has no unit test for custom trigger selection. Behavior preserved in router but untested. |
| **getTriggerPattern: regex characters literal** (test:215219) | DB-stored rule (router uses literal match or regex) | ❌ Not tested | v2 stores trigger as string in DB; runtime evaluation depends on router implementation (not inspected here). |
| **stripInternalTags: single-line** (test:226227) | No direct v2 function — embedded in polling | ❌ Not isolated | v1 regex `/<internal>[\s\S]*?<\/internal>/g` with `.trim()`. v2 container poll-loop does not test this; no dedicated outbound function in v2 agent-runner. |
| **stripInternalTags: multi-line** (test:230231) | Not tested in v2 | ❌ Not isolated | v1 regex handles `[\s\S]*?` (newlines included). |
| **stripInternalTags: multiple blocks** (test:234235) | Not tested in v2 | ❌ Not isolated | Regex global flag `/g` handles multiple. Not verified in v2 tests. |
| **stripInternalTags: only internal tags** (test:238239) | Not tested in v2 | ❌ Not isolated | v1 returns empty after trim; behavior not verified in v2. |
| **formatOutbound: passthrough clean text** (test:244245) | Not tested in v2 | ❌ Not isolated | v1 calls `stripInternalTags()` then returns. v2 does not have isolated test. |
| **formatOutbound: empty after strip** (test:248249) | Not tested in v2 | ❌ Not isolated | v1 returns empty if all was internal. |
| **formatOutbound: strip tags from text** (test:252253) | Not tested in v2 | ❌ Not isolated | v1 example: `<internal>thinking</internal>The answer is 42``The answer is 42`. |
| **trigger gating: main group always processes** (test:277279) | No unit test in v2; logic in `/src/router.ts` routing decision | ❌ Not tested | v1 shows that main groups bypass trigger check. Behavior likely preserved (main group always forwards to agent) but not verified by test. |
| **trigger gating: main group ignores requiresTrigger flag** (test:282284) | Not tested in v2 | ❌ Not tested | v1 shows `isMainGroup=true` overrides `requiresTrigger` flag. No v2 test. |
| **trigger gating: non-main needs trigger (default)** (test:287289) | Not tested in v2 | ❌ Not tested | v1 default behavior: non-main group requires trigger unless explicitly disabled. |
| **trigger gating: custom per-group trigger enforcement** (test:302309) | Not tested in v2 | ❌ Not tested | v1 shows per-group trigger override. Behavior in v2 DB but no test. |
| **trigger gating: requiresTrigger=false disables check** (test:312314) | Not tested in v2 | ❌ Not tested | v1 allows opting out of trigger requirement per group. |
---
## Missing from v2
1. **Timezone-aware time formatting**
- v1: `formatLocalTime(utcIso, timezone)` in `src/v1/timezone.ts:2637` converts UTC ISO timestamp to user's local timezone with full locale formatting (date, month, am/pm).
- v2: `formatTime()` in `container/agent-runner/src/formatter.ts:216223` only extracts `HH:MM` in UTC, no localization.
- **Impact**: v2 loses per-agent timezone context. Timestamps appear in UTC only, potentially confusing users in different timezones.
2. **Context header with timezone attribute**
- v1: Every message batch includes `<context timezone="..."/>` header.
- v2: No context header; timestamp is a message attribute only.
- **Impact**: Agent sees no explicit timezone declaration; must infer from message times or system prompt.
3. **Reply context with message ID attribute**
- v1: `reply_to="<message_id>"` attribute on message; separate `<quoted_message from="...">content</quoted_message>` child.
- v2: Consolidated into `<reply-to sender="...">preview</reply-to>` without message ID; preview truncated to 100 chars.
- **Impact**: v2 loses structured reply tracking; agent can't reference specific message IDs in follow-ups.
4. **Message ID sequence in XML**
- v1: No `id` attribute on messages (WhatsApp-era design).
- v2: Each message has `id="seq"` (database sequence number).
- **Impact**: Allows agent to reference messages by ID, but v1 tests do not verify this.
5. **Trigger pattern unit tests**
- v1: Comprehensive tests for `getTriggerPattern()`, `TRIGGER_PATTERN`, case-insensitivity, word boundaries, regex escaping.
- v2: No unit tests; trigger logic moved to DB and router. Untested.
- **Impact**: Trigger matching behavior not verified by tests; regression risk if router changes.
6. **Internal tag stripping tests**
- v1: `stripInternalTags()` and `formatOutbound()` tested for single-line, multi-line, multiple blocks, edge cases.
- v2: No isolated tests for outbound tag stripping.
- **Impact**: No verification that internal tags are reliably removed before delivery.
7. **Trigger gating (requiresTrigger flag) tests**
- v1: Detailed tests of main-group bypass, per-group override, default behavior, flag combinations.
- v2: No tests; logic moved to DB schema and router evaluation.
- **Impact**: Trigger enforcement behavior not verified.
8. **Empty message batch handling**
- v1: Explicitly returns `<context>\n<messages>\n\n</messages>` for empty array.
- v2: Returns empty string.
- **Impact**: No clear protocol for "no messages to process" signals.
---
## Behavioral discrepancies
### 1. Message XML structure (formatMessages)
- **v1**: `<context timezone="..."/>\n<messages>\n<message sender="..." time="...">content</message>\n</messages>`
- **v2**: `<message id="seq" from="dest-name" sender="..." time="HH:MM">content</message>` (no wrapper for single message)
- **v1 line**: `src/v1/router.ts:923`
- **v2 line**: `container/agent-runner/src/formatter.ts:124158`
### 2. Time formatting
- **v1**: Full locale string (e.g., "Jan 1, 2024, 1:30 PM") using `Intl.DateTimeFormat` with timezone localization (`src/v1/timezone.ts:2637`)
- **v2**: 24-hour UTC only (e.g., "13:30") without timezone info (`container/agent-runner/src/formatter.ts:216223`)
- **Impact**: v2 loses timezone awareness; agent cannot distinguish between user's local time and server time.
### 3. Reply context structure
- **v1**: Two-part — `reply_to="<id>"` attribute + `<quoted_message from="...">text</quoted_message>` child element
- **v2**: Single element — `<reply-to sender="...">100-char preview</reply-to>` (no ID, preview truncated)
- **v1 line**: `src/v1/router.ts:1216`
- **v2 line**: `container/agent-runner/src/formatter.ts:143, 183188`
- **Impact**: v2 cannot support message-ID-based threading; loses structured reply metadata.
### 4. Trigger pattern matching
- **v1**: Implemented as regex returned by `getTriggerPattern()` with word-boundary enforcement (`config.ts:4049`)
- **v2**: Stored in DB as JSON in `messaging_group_agents.trigger_rules`; evaluated by router at delivery time
- **v1 line**: `src/v1/config.ts:4049`
- **v2 line**: `/src/router.ts` (router logic, not inspected in detail here)
- **Impact**: v1 enforces word boundaries via regex (`\b`); v2 implementation unknown (DB-driven).
### 5. Empty message handling
- **v1**: Returns `<context>\n<messages>\n\n</messages>` — preserves structure
- **v2**: Returns empty string
- **v1 line**: `src/v1/router.ts:22`
- **v2 line**: `container/agent-runner/src/formatter.ts:98`
### 6. Internal tag stripping
- **v1**: Regex-based, `.trim()` called after removal
- **v2**: Not isolated; no dedicated function or test in v2 formatter
- **v1 line**: `src/v1/router.ts:2526`
- **v2 line**: No equivalent
---
## Worth preserving?
**Partially.** The v1 formatting test suite is **essential for documenting lost functionality**, not for v2 regression. Key behaviors that should be preserved in v2 but are currently missing:
1. **Timezone-aware message timestamps** — v2 should restore `formatLocalTime()` from `src/v1/timezone.ts` and include timezone context in the XML header. Without this, agents cannot reason about when messages arrived relative to the user's clock.
2. **Reply context with message IDs** — v2's truncated reply preview is lossy. Consider restoring the `reply_to="<id>"` attribute so agents can reference prior messages by sequence number for structured threading.
3. **Trigger pattern unit tests** — v2 moved trigger logic to the DB but lost test coverage. The DB schema and router must enforce the same invariants (word boundaries, case-insensitivity, custom per-group overrides) that v1 tested. Recommend adding integration tests to `src/router.ts` or `src/channels/adapter.ts` to verify trigger matching.
4. **Internal tag stripping tests** — v2 agent-runner should include unit tests for `stripInternalTags()` (if the skill applies) to prevent regression when Claude adds `<internal>` thinking tags.
The v1 test file serves as a **specification document** for channel formatting and trigger gating that v2 partially refactored away. Keeping it in the repo (even unpowered) documents the intended semantics.