fix: auto-recover from stale Claude Code session on exit code 1

When Claude Code exits with code 1 during a session resume because the
session transcript file no longer exists (ENOENT on .jsonl), clear the
stale session from SQLite and retry once with a fresh session.
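The recovery flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code. `runWithSession` and `AgentResult` are hypothetical names. The in-memory `sessions` map stands in for the SQLite-backed store. The stale-session pattern is the one used in the diff below.

```typescript
// Hypothetical result shape; the real container output type is not shown here.
interface AgentResult {
  status: 'success' | 'error';
  error?: string;
}

// Same pattern as the diff: ENOENT on a .jsonl transcript, or "session ... not found".
const STALE_SESSION_RE = /ENOENT.*\.jsonl|session.*not found/i;

// Hypothetical in-memory stand-in for the persisted session store (SQLite in the real code).
const sessions: Record<string, string | undefined> = {};

// Runs the agent; if a resumed session turns out to be stale,
// clears it and retries exactly once without a session ID.
async function runWithSession(
  folder: string,
  run: (sessionId: string | undefined) => Promise<AgentResult>,
): Promise<AgentResult> {
  const sessionId = sessions[folder];
  const result = await run(sessionId);
  const isStale =
    result.status === 'error' &&
    sessionId !== undefined &&
    result.error !== undefined &&
    STALE_SESSION_RE.test(result.error);
  // Success, or an error that belongs to the normal backoff path: return as-is.
  if (!isStale) return result;
  delete sessions[folder]; // drop the stale session
  return run(undefined); // single retry with a fresh session
}
```

Because the retry calls `run` directly rather than recursing into `runWithSession`, a second failure cannot trigger another clear-and-retry cycle.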

Detection is targeted: it only triggers on an ENOENT error whose message
references a .jsonl file, or on a "session ... not found" error (matched
case-insensitively). Transient failures (network, API) fall through to
the normal backoff retry path.
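To make the targeting concrete, the sketch below runs the same pattern (copied from the `isStaleSession` check in the diff) against a few sample messages. The sample strings are invented for illustration and are not captured Claude Code or SDK output. `isStaleSessionError` is a hypothetical helper name.

```typescript
// Pattern copied from the isStaleSession check in the diff.
const STALE_SESSION_RE = /ENOENT.*\.jsonl|session.*not found/i;

// Hypothetical helper: true only for errors that look like a missing session.
function isStaleSessionError(error: string | undefined): boolean {
  return error !== undefined && STALE_SESSION_RE.test(error);
}

// Illustrative messages (invented, not real SDK output) with the expected classification.
const samples: Array<[string, boolean]> = [
  ["ENOENT: no such file or directory, open '/tmp/abc.jsonl'", true],
  ['Error: Session abc123 not found', true],
  ['ENOENT: no such file or directory, open /tmp/config.json', false], // .json, not .jsonl
  ['fetch failed: ECONNRESET', false],
  ['API error 529: overloaded', false],
  ['sessionStorage key not found', true], // broad: any "session ... not found" matches
];

for (const [msg, expected] of samples) {
  console.log(isStaleSessionError(msg) === expected ? 'ok  ' : 'FAIL', msg);
}
```

The pattern is unanchored and `.*` spans any characters, so it is narrower than "any error during resume" but broader than an exact message match, as the last sample shows.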

Also removes unrelated ollama files that were mixed in during rebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gary Walker
2026-03-30 23:03:44 +11:00
parent 3098f28b74
commit 38009be263
5 changed files with 14 additions and 338 deletions

@@ -400,12 +400,7 @@ export async function runContainerAgent(
 const chunk = data.toString();
 const lines = chunk.trim().split('\n');
 for (const line of lines) {
-  if (!line) continue;
-  if (line.includes('[OLLAMA]')) {
-    logger.info({ container: group.folder }, line);
-  } else {
-    logger.debug({ container: group.folder }, line);
-  }
+  if (line) logger.debug({ container: group.folder }, line);
 }
 // Don't reset timeout on stderr — SDK writes debug logs continuously.
 // Timeout only resets on actual output (OUTPUT_MARKER in stdout).
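The two comments above describe the timeout policy: stderr never extends the idle timeout, and only real output on stdout does. A minimal sketch of that policy follows, under a few assumptions. The marker string is a placeholder, since the actual `OUTPUT_MARKER` value is not shown in this diff. `IdleDeadline`, `handleStderr`, and `handleStdout` are hypothetical names. The sketch also swaps the real timer for an explicit clock, so the behavior can be checked without waiting.

```typescript
// Placeholder value: the real OUTPUT_MARKER constant is not shown in this diff.
const OUTPUT_MARKER = '---HYPOTHETICAL_OUTPUT_MARKER---';

// Idle deadline driven by an explicit clock in ms; it only moves when touch() is called.
class IdleDeadline {
  private readonly timeoutMs: number;
  private deadline: number;
  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.deadline = now + timeoutMs;
  }
  touch(now: number): void {
    this.deadline = now + this.timeoutMs;
  }
  expired(now: number): boolean {
    return now >= this.deadline;
  }
}

// stderr: split into non-empty lines for debug logging; never touches the deadline.
function handleStderr(chunk: string): string[] {
  return chunk.trim().split('\n').filter((line) => line.length > 0);
}

// stdout: extend the deadline only when the chunk contains the output marker.
function handleStdout(chunk: string, idle: IdleDeadline, now: number): void {
  if (chunk.includes(OUTPUT_MARKER)) idle.touch(now);
}
```

Keeping `handleStderr` free of any reference to the deadline makes the "stderr never resets the timeout" rule hold by construction rather than by convention.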

@@ -403,12 +403,20 @@ async function runAgent(
}
if (output.status === 'error') {
// Detect stale/corrupt session: container failed while resuming an existing session.
// Clear the session and retry once with a fresh session to avoid infinite retry loops.
if (sessionId) {
// Detect stale/corrupt session: the SDK throws ENOENT when the session
// transcript file (.jsonl) doesn't exist inside the container. This
// happens after container restarts since the filesystem is ephemeral.
// Only clear + retry for this specific signal — transient errors
// (network, API) should fall through to the normal backoff path.
const isStaleSession =
sessionId &&
output.error &&
/ENOENT.*\.jsonl|session.*not found/i.test(output.error);
if (isStaleSession) {
logger.warn(
{ group: group.name, staleSessionId: sessionId, error: output.error },
'Container failed with existing session — clearing stale session and retrying with fresh session',
'Stale session detected (ENOENT on session transcript) — clearing and retrying with fresh session',
);
delete sessions[group.folder];
deleteSession(group.folder);