fix(poll-loop): slash commands silently broken on warm containers
The follow-up poller filtered /clear out of every tick without acking
the row, and pushed every other slash command through plain
formatMessages() (XML wrapping). On a warm container the outer
while(true) loop never regains control, so:
- /clear sat pending in messages_in forever (no response at all)
- /compact, /cost, /context, /files, /remote-control arrived at the
SDK as XML-wrapped user text and were never dispatched as commands
Both modes are invisible to host monitoring: rows are either left
pending without a processing_ack claim, or marked completed normally;
heartbeat keeps firing inside the SDK event loop.
When the follow-up poller observes any slash command (admin or
passthrough — categorizeMessage decides), end the active query so the
current turn winds down cleanly and the outer loop wakes, re-fetches
the same pending set, and runs them through the canonical path
(/clear handler + formatMessagesWithCommands raw dispatch). Leave the
rows untouched so the outer-loop fetch sees the same set the poller
saw.
Cost: each slash command on a warm container forces close+reopen of
the SDK stream — a few seconds of subprocess startup. The Anthropic
prompt cache is server-side with a 5-min TTL keyed on prefix hash, so
stream lifecycle does not affect cache lifetime; close+reopen within
5 min still gets cache hits.
Also corrects the warm-stream rationale comment on processQuery, which
implied keeping the stream open preserved cache warmth — it doesn't.
Testing evidence — cache stays warm across stream close+reopen:
Turn 1 (warm session):
Usage: in=6 out=245 cache_create=92 cache_read=22996
Full cache hit (22996 tokens).
Turn 2 — /clear arrives:
Pending slash command — ending stream so outer loop can process
Clearing session (resetting continuation)
Usage: in=6 out=95 cache_create=9393 cache_read=13600
System prompt + tool defs (~13600 tokens) still hit cache;
conversation history is gone (continuation reset) so the new turn
writes fresh context.
Turn 3 — /cost arrives:
Pending slash command — ending stream so outer loop can process
Usage: in=0 out=0 cache_create=0 cache_read=0 wall=0.0s api=0.0s
/cost is a CLI built-in: dispatched locally by the SDK, no API
call. Pre-fix this would have arrived as XML-wrapped user text
and never dispatched — confirms the broader fix works.
Turn 4 (next chat after /cost):
Usage: in=6 out=142 cache_create=328 cache_read=22993
Full cache hit again (22993 tokens read, 328 written). Despite the
/cost-induced stream close+reopen, the server-side prompt cache
survived: the new sdkQuery() resumed the same continuation, the
request prefix matched the cached entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -66,6 +66,18 @@ export function isClearCommand(msg: MessageInRow): boolean {
|
||||
return text.toLowerCase().startsWith('/clear');
|
||||
}
|
||||
|
||||
/**
|
||||
* True for any chat that needs the outer loop's command path: /clear plus
|
||||
* admin/passthrough slash commands the SDK can only dispatch when they are
|
||||
* a query's first input. Used by the follow-up poller to bail out and let
|
||||
* the outer loop reopen the query.
|
||||
*/
|
||||
export function isRunnerCommand(msg: MessageInRow): boolean {
|
||||
if (msg.kind !== 'chat' && msg.kind !== 'chat-sdk') return false;
|
||||
const cat = categorizeMessage(msg).category;
|
||||
return cat === 'admin' || cat === 'passthrough';
|
||||
}
|
||||
|
||||
// eslint-disable-next-line @typescript-eslint/no-explicit-any
|
||||
function extractSenderId(msg: MessageInRow, content: any): string | null {
|
||||
const raw: string | null = content?.senderId || content?.author?.userId || null;
|
||||
|
||||
@@ -7,7 +7,7 @@ import {
|
||||
migrateLegacyContinuation,
|
||||
setContinuation,
|
||||
} from './db/session-state.js';
|
||||
import { formatMessages, extractRouting, categorizeMessage, isClearCommand, stripInternalTags, type RoutingContext } from './formatter.js';
|
||||
import { formatMessages, extractRouting, categorizeMessage, isClearCommand, isRunnerCommand, stripInternalTags, type RoutingContext } from './formatter.js';
|
||||
import type { AgentProvider, AgentQuery, ProviderEvent } from './providers/types.js';
|
||||
|
||||
const POLL_INTERVAL_MS = 1000;
|
||||
@@ -255,30 +255,46 @@ async function processQuery(
|
||||
let done = false;
|
||||
|
||||
// Concurrent polling: push follow-ups into the active query as they arrive.
|
||||
// We do NOT force-end the stream on silence — keeping the query open is
|
||||
// strictly cheaper than close+reopen (no cold prompt cache, no reconnect).
|
||||
// We do NOT force-end the stream on silence — keeping the query open avoids
|
||||
// re-spawning the SDK subprocess (~few seconds) and re-loading the .jsonl
|
||||
// transcript on every turn. The Anthropic prompt cache is server-side with
|
||||
// a 5-min TTL keyed on prefix hash, so stream lifecycle does NOT affect
|
||||
// cache lifetime — close+reopen within 5 min still gets cache hits.
|
||||
// Stream liveness is decided host-side via the heartbeat file + processing
|
||||
// claim age (see src/host-sweep.ts); if something is truly stuck, the host
|
||||
// will kill the container and messages get reset to pending.
|
||||
let pollInFlight = false;
|
||||
let endedForCommand = false;
|
||||
const pollHandle = setInterval(() => {
|
||||
if (done || pollInFlight) return;
|
||||
if (done || pollInFlight || endedForCommand) return;
|
||||
pollInFlight = true;
|
||||
|
||||
void (async () => {
|
||||
try {
|
||||
// Skip system messages (MCP tool responses) and /clear (needs fresh query).
|
||||
const pending = getPendingMessages();
|
||||
|
||||
// Slash commands need a fresh query: /clear resets the SDK's
|
||||
// resume id (fixed at sdkQuery() time); admin/passthrough commands
|
||||
// (/compact, /cost, …) only dispatch when they're the first input
|
||||
// of a query — pushed mid-stream they arrive as plain text and
|
||||
// the SDK never runs them. End the stream and leave the rows
|
||||
// pending; the outer loop handles them on next iteration via the
|
||||
// canonical command path + formatMessagesWithCommands.
|
||||
if (pending.some((m) => isRunnerCommand(m))) {
|
||||
log('Pending slash command — ending stream so outer loop can process');
|
||||
endedForCommand = true;
|
||||
query.end();
|
||||
return;
|
||||
}
|
||||
|
||||
// Skip system messages (MCP tool responses).
|
||||
// Thread routing is the router's concern — if a message landed in this
|
||||
// session, the agent should see it. Per-thread sessions already isolate
|
||||
// threads into separate containers; shared sessions intentionally merge
|
||||
// everything. Filtering on thread_id here caused deadlocks when the
|
||||
// initial batch and follow-ups had mismatched thread_ids (e.g. a
|
||||
// host-generated welcome trigger with null thread vs a Discord DM reply).
|
||||
const newMessages = getPendingMessages().filter((m) => {
|
||||
if (m.kind === 'system') return false;
|
||||
if ((m.kind === 'chat' || m.kind === 'chat-sdk') && isClearCommand(m)) return false;
|
||||
return true;
|
||||
});
|
||||
const newMessages = pending.filter((m) => m.kind !== 'system');
|
||||
if (newMessages.length === 0) return;
|
||||
|
||||
const newIds = newMessages.map((m) => m.id);
|
||||
|
||||
Reference in New Issue
Block a user