# Next Session: Phone API Subprocess Supervision

## What
phone_api.py (aiohttp on Panda:8770) crashed silently on 2026-04-05, causing Annie to ask for Reethi's phone number instead of auto-resolving from the contact book. Implement Option C: the phone daemon (`phone_call.py auto`) spawns and supervises phone_api.py as a child subprocess with auto-restart and circuit breaker.

## Plan
`~/.claude/plans/whimsical-jingling-blanket.md`

Read the plan first — it has the full implementation (v3, post-adversarial review), all review findings addressed, state machine, pre-mortem table, and detailed code for all 3 files.

## Key Design Decisions (from adversarial review)

1. **Signal handler is flag-only** — NO lock acquisition, NO blocking wait in SIGTERM handler. This eliminates the deadlock risk between the signal handler and monitor thread. Cleanup happens after the while loop on the main thread.
2. **Circuit breaker** — After 5 consecutive short-lived restarts (alive < 30s each), the monitor logs CRITICAL and stops retrying. Prevents infinite crash-loop.
3. **pgrep uses bracket trick** — `[p]hone_api.py --port` prevents the classic pgrep self-match bug.
4. **Log file opened once** — The `/tmp/phone-api.log` file handle is opened once at supervision startup and reused across spawns. No `open()` calls under lock.
5. **ADB/PipeWire env vars included** — `_api_env()` allowlist includes `DBUS_SESSION_BUS_ADDRESS` and `XDG_RUNTIME_DIR` (needed by `phone_adb.py`'s `PANDA_ENV`).
6. **`stop.sh phone)` has orphan cleanup** — Individual `./stop.sh phone` now also kills phone_api.py, not just the daemon.
7. **start.sh gates "Started" on both checks** — Success message only after BOTH daemon log check AND API port health check pass.
8. **`last_start` passed as argument** — Monitor thread receives the actual spawn timestamp, not its own start time.
9. **PHONE_CALL_TOKEN added to start_phone** — Was previously only in start_phone_api (which is being removed from start_all_panda).
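
Decisions 1, 2, and 8 can be sketched together as follows. This is a minimal illustration, not the code from the plan: `stop_requested`, `on_sigterm`, `CircuitBreaker`, and `monitor` are hypothetical names, and the plan's `_monitor_api` may differ in structure. It shows a handler that only sets a plain flag, a breaker that counts consecutive runs shorter than 30s, and a monitor that receives the real spawn timestamp as an argument.

```python
import signal
import subprocess
import sys
import time

MAX_SHORT_RUNS = 5        # decision 2: consecutive short-lived runs before giving up
MIN_HEALTHY_SECS = 30.0   # decision 2: a run shorter than this counts as short-lived

stop_requested = False    # plain flag; the only state the handler touches


def on_sigterm(signum, frame):
    """Flag-only handler (decision 1): no locks, no waits, no cleanup."""
    global stop_requested
    stop_requested = True


def install_handler():
    """Register the handler; call this from the main thread."""
    signal.signal(signal.SIGTERM, on_sigterm)


class CircuitBreaker:
    """Counts consecutive short-lived runs and opens at a limit."""

    def __init__(self, limit=MAX_SHORT_RUNS, min_healthy=MIN_HEALTHY_SECS):
        self.limit = limit
        self.min_healthy = min_healthy
        self.short_runs = 0

    def record_exit(self, uptime):
        """Record one child exit; return True once the breaker is open."""
        if uptime < self.min_healthy:
            self.short_runs += 1
        else:
            self.short_runs = 0  # a healthy run resets the streak
        return self.short_runs >= self.limit


def monitor(spawn, proc, last_start, breaker, poll=1.0):
    """Restart the child until a stop is requested or the breaker opens.

    `last_start` is the actual spawn time of `proc` (decision 8), so the
    first uptime measurement does not depend on when this thread began.
    Returns the number of restarts performed.
    """
    restarts = 0
    while not stop_requested:
        if proc.poll() is None:       # child still running
            time.sleep(poll)
            continue
        uptime = time.monotonic() - last_start
        if breaker.record_exit(uptime):
            print("CRITICAL: API is crash-looping, giving up", file=sys.stderr)
            break
        proc, last_start = spawn(), time.monotonic()
        restarts += 1
    return restarts
```

In this sketch the main thread would call `install_handler()`, run `monitor` in a background thread, and terminate the child only after its own loop exits on `stop_requested`, so the handler never competes with the monitor for a lock. Whether the fifth short exit or the fifth restart opens the breaker is a counting choice; here it is the fifth short exit.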

## Files to Modify
1. `scripts/phone_call.py` — Add supervision code in auto block (imports, _api_env, _spawn_api, _monitor_api with circuit breaker, pgrep guard, simplified SIGTERM handler, cleanup after loop)
2. `start.sh` — Add PHONE_CALL_TOKEN to start_phone env, remove start_phone_api from start_all_panda, add gated port-8770 health check
3. `stop.sh` — Fix kill order in stop_panda_services (daemon first, sleep 3s, orphan cleanup), update phone) individual case, add warning to phone-api) case
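
The `_api_env` helper named in item 1 is an environment allowlist. A rough sketch of that idea is below, assuming a function called `api_env` (hypothetical) and an allowlist in which only `DBUS_SESSION_BUS_ADDRESS` and `XDG_RUNTIME_DIR` are confirmed by these notes; `PATH`, `HOME`, and `PHONE_CALL_TOKEN` are guesses at what the child would also need.

```python
import os

# Assumed allowlist. Only DBUS_SESSION_BUS_ADDRESS and XDG_RUNTIME_DIR are
# named in the notes; the other entries are illustrative guesses.
ALLOWED_ENV = (
    "PATH",
    "HOME",
    "PHONE_CALL_TOKEN",
    "DBUS_SESSION_BUS_ADDRESS",
    "XDG_RUNTIME_DIR",
)


def api_env(source=None):
    """Return a new dict holding only allowlisted variables that are set."""
    source = os.environ if source is None else source
    return {name: source[name] for name in ALLOWED_ENV if name in source}
```

The result can be passed as `env=` to `subprocess.Popen`, so the child sees nothing from the daemon's environment beyond the listed names.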

## Start Command
```
cat ~/.claude/plans/whimsical-jingling-blanket.md
```
Then implement the plan. All adversarial findings are already addressed in it.

## Verification
1. `git push` → `ssh panda "cd ~/workplace/her/her-os && git pull && find . -name __pycache__ -exec rm -rf {} + 2>/dev/null"`
2. `./stop.sh panda` → `./start.sh panda` — verify "auto-answering + API on port 8770"
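3. `ssh panda 'pstree -p $(pgrep -f "[p]hone_call.py auto" | head -1)'` → verify phone_api.py appears as a child of the daemon (single quotes make `$(...)` run on panda rather than locally; the bracket pattern keeps `pgrep -f` from matching the remote shell)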
4. `ssh panda "pkill -f '[p]hone_api.py'"` → wait 10s → verify auto-restart in logs
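5. Contact lookup after restart: `ssh panda "curl -s -H 'X-Phone-Token: $(cat ~/.her-os-phone-token)' 'http://localhost:8770/v1/phone/contacts/lookup?name=Reethi'"` (the token file is read locally; the URL is quoted so the remote shell does not treat `?` as a glob)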
6. Telegram: "call Reethi" — verify auto-resolve works
7. `./stop.sh panda` → verify no orphan processes
8. `./stop.sh phone` → verify phone_api.py also stopped (orphan cleanup)
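
For steps 2 and 4, the wait-then-check can be scripted instead of using a fixed sleep. The helper below is a sketch under assumptions: `wait_for_port` is a hypothetical name, and it only confirms that something accepts TCP connections on the port, not that the contact lookup itself works.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.25):
    """Return True once a TCP connect to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is up; close immediately.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Run on Panda, `wait_for_port("localhost", 8770)` would return `True` once the restarted API is listening again; the lookup in step 5 is still needed to confirm the endpoint responds correctly.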
