fix(host-sweep): clear orphan processing_ack on kill to prevent claim-stuck loop
When the host kills a container (absolute-ceiling, claim-stuck, or crashed), resetStuckProcessingRows reset messages_in but left orphan rows in processing_ack.

The next sweep tick spawned a fresh container and, on the same tick, ran enforceRunningContainerSla against an outbound.db that still contained the previous container's claim with an hours-old status_changed timestamp. The result was an instant kill-claim, before the agent-runner could open outbound.db to run its own clearStaleProcessingAcks(). This repeated until tries hit MAX_TRIES.

Add deleteOrphanProcessingClaims() in session-db and call it at the end of resetStuckProcessingRows. Writing to outbound.db is safe here because the host only enters this path after killContainer (or when no container is running).

Tests in host-sweep.test.ts cover the helper plus the regression: an orphan claim from a 2h-old kill is now removed atomically with the messages_in reset, so the next sweep tick sees an empty claims list and the freshly respawned container survives long enough to start its agent-runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -180,6 +180,19 @@ export function getProcessingClaims(outDb: Database.Database): ProcessingClaim[]
     .all() as ProcessingClaim[];
 }
 
+/**
+ * Delete orphan 'processing' rows. Called by the host after killing a
+ * container so the leftover claim doesn't trip claim-stuck on the next sweep
+ * tick (which would kill the freshly respawned container before its
+ * agent-runner can run its own startup cleanup).
+ *
+ * Safe because the host only writes to outbound.db when no container is
+ * running (we just killed it). Returns the number of rows deleted.
+ */
+export function deleteOrphanProcessingClaims(outDb: Database.Database): number {
+  return outDb.prepare("DELETE FROM processing_ack WHERE status = 'processing'").run().changes;
+}
+
 export interface ContainerState {
   current_tool: string | null;
   tool_declared_timeout_ms: number | null;
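
The failure mode described in the commit message can be illustrated with a small in-memory model. This is only a sketch, not the project's code: the `Claim` shape, the `CLAIM_STUCK_MS` threshold, the `completed` status value, and the helpers `hasStuckClaim` and `deleteOrphanClaims` are all hypothetical stand-ins for the real claim-stuck check and for deleteOrphanProcessingClaims(), which operates on outbound.db rather than on an array.

```typescript
// Hypothetical in-memory model of the sweep behaviour; names and values are assumptions.
interface Claim {
  message_id: string;
  status: string; // 'processing' marks an active claim; other values are illustrative
  status_changed: number; // epoch milliseconds
}

// Assumed threshold. The real claim-stuck limit is not shown in this commit.
const CLAIM_STUCK_MS = 10 * 60 * 1000;

// Stand-in for the claim-stuck check: true if any 'processing' claim is older than the threshold.
function hasStuckClaim(claims: Claim[], now: number): boolean {
  return claims.some(
    (c) => c.status === "processing" && now - c.status_changed > CLAIM_STUCK_MS,
  );
}

// Array analogue of deleteOrphanProcessingClaims: removes 'processing' rows in place
// and returns how many were removed.
function deleteOrphanClaims(claims: Claim[]): number {
  const kept = claims.filter((c) => c.status !== "processing");
  const removed = claims.length - kept.length;
  claims.length = 0;
  claims.push(...kept);
  return removed;
}

const now = Date.now();
const twoHoursAgo = now - 2 * 60 * 60 * 1000;

// Without the fix: the killed container's claim is still present on the next tick.
const claimsWithoutFix: Claim[] = [
  { message_id: "m1", status: "processing", status_changed: twoHoursAgo },
];
const killedWithoutFix = hasStuckClaim(claimsWithoutFix, now);

// With the fix: orphan claims are cleared as part of the reset, before the next tick.
const claimsWithFix: Claim[] = [
  { message_id: "m1", status: "processing", status_changed: twoHoursAgo },
  { message_id: "m0", status: "completed", status_changed: twoHoursAgo },
];
const removedCount = deleteOrphanClaims(claimsWithFix);
const killedWithFix = hasStuckClaim(claimsWithFix, now);

console.log({ killedWithoutFix, removedCount, killedWithFix });
```

In this model the fresh container is flagged on its first tick only when the stale claim survives the reset; rows with any other status are left untouched, matching the `WHERE status = 'processing'` filter in the diff.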