fix(host-sweep): clear orphan processing_ack on kill to prevent claim-stuck loop

When the host kills a container (absolute-ceiling, claim-stuck, or crashed),
resetStuckProcessingRows reset messages_in but left orphan rows in
processing_ack. The next sweep tick spawned a fresh container and, on the
same tick, ran enforceRunningContainerSla against outbound.db that still
contained the previous container's claim with a hours-old status_changed
timestamp — instant kill-claim, before the agent-runner could open
outbound.db to run its own clearStaleProcessingAcks(). Loop until tries
hit MAX_TRIES.

Add deleteOrphanProcessingClaims() in session-db and call it at the end of
resetStuckProcessingRows. Safe to write outbound.db here because the host
only enters this path after killContainer (or when no container is running).

Tests in host-sweep.test.ts cover the helper plus the regression: orphan
claim from a 2h-old kill is now removed atomically with the messages_in
reset, so the next sweep tick sees an empty claims list and the freshly
respawned container survives long enough to start its agent-runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Ethan
2026-04-30 12:54:42 +02:00
parent f828e2971c
commit 7ce9922cde
3 changed files with 183 additions and 1 deletions

View File

@@ -180,6 +180,19 @@ export function getProcessingClaims(outDb: Database.Database): ProcessingClaim[]
.all() as ProcessingClaim[];
}
/**
* Delete orphan 'processing' rows. Called by the host after killing a
* container so the leftover claim doesn't trip claim-stuck on the next sweep
* tick (which would kill the freshly respawned container before its
* agent-runner can run its own startup cleanup).
*
* Safe because the host only writes to outbound.db when no container is
* running (we just killed it). Returns the number of rows deleted.
*/
export function deleteOrphanProcessingClaims(outDb: Database.Database): number {
return outDb.prepare("DELETE FROM processing_ack WHERE status = 'processing'").run().changes;
}
export interface ContainerState {
current_tool: string | null;
tool_declared_timeout_ms: number | null;