fix(container): scope orphan reaper by install label so peers don't kill each other

Two installs on the same host could trash each other's containers: the
reaper used `docker ps --filter name=nanoclaw-`, a substring match that
picked up every install's containers. A crash-looping peer (e.g. a legacy
v1 plist respawning ~6k times) would call cleanupOrphans on every boot and
kill the healthy install's session containers within seconds of spawn.

- Stamp `--label nanoclaw-install=<slug>` onto every spawned container.
- cleanupOrphans filters by that label; healthy peers are left alone.
- Setup preflight enumerates `com.nanoclaw*` launchd plists / nanoclaw
  user systemd units, probes state/runs, and unloads any that are
  crash-looping (state != running AND runs > 10) before installing
  this install's service.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Lazer Cohen
2026-04-23 12:12:30 +03:00
parent 990d243dbd
commit 2383bde80f
6 changed files with 234 additions and 7 deletions

View File

@@ -12,6 +12,7 @@ import { OneCLI } from '@onecli-sh/sdk';
import {
CONTAINER_IMAGE,
CONTAINER_IMAGE_BASE,
CONTAINER_INSTALL_LABEL,
DATA_DIR,
GROUPS_DIR,
ONECLI_API_KEY,
@@ -389,7 +390,7 @@ async function buildContainerArgs(
providerContribution: ProviderContainerContribution,
agentIdentifier?: string,
): Promise<string[]> {
const args: string[] = ['run', '--rm', '--name', containerName];
const args: string[] = ['run', '--rm', '--name', containerName, '--label', CONTAINER_INSTALL_LABEL];
// Environment — only vars read by code we don't own.
// Everything NanoClaw-specific is in container.json (read by runner at startup).