# Next Session: Implement Streaming Vision Navigation

## What

Replace the per-request polling navigation architecture (Annie orchestrates 5 HTTP round-trips per cycle, ~1-3s per cycle, and is blind to mid-nav state) with a continuous streaming vision pipeline. The Pi streams camera frames (MJPEG) and sensor data (WebSocket) to Panda. Panda composites each frame with sensor overlays, runs the VLM at 2-5 Hz, and drives the Pi directly. Annie becomes a goal-setting commander that starts nav, polls status, and reads snapshots.

**Why this matters:** Session 77 E2E testing proved the current architecture can't navigate effectively. The robot, facing a curtain at 27cm, couldn't escape: each cycle took ~1-3s, the alternating search undid escape turns, and Annie's LLM never saw what was happening.

## Plan

**Read the plan first:** `~/.claude/plans/partitioned-sniffing-swing.md`

It contains the full implementation (5 phases), all 30 adversarial review findings (all addressed), the state machine, a pre-mortem analysis, 13 known gotchas, exact code for all components, and verification steps.

## Key Design Decisions (from adversarial review)

1. **Legacy fallback uses `_navigate_legacy()`** — NOT `handle_navigate_robot.__wrapped__` (doesn't exist). Extract existing body to `_navigate_legacy(args, user_message)` and call that. (CRITICAL-1/BUG-2)
2. **Zombie MJPEG generator prevention** — Pi `/stream` generator checks `request.is_disconnected()` every iteration. Without this, each WiFi reconnect leaks a generator, accumulating CPU until YOLO safety daemon starves. (CRITICAL-2)
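3. **`asyncio.Lock` lazy init** — `NavController._start_lock` starts as `None` and is created on first async use. On Python <3.10 a Lock built in `__init__` binds to whatever event loop is current at construction, so using it from a different loop (e.g. one created per test fixture) fails with `RuntimeError` ("attached to a different loop"). (CRITICAL-3/BUG-5)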
4. **Sonar zero check**: `if sonar_mm is not None` — NOT `if sonar_mm`. Sonar reading of 0mm is the most dangerous distance; the falsy check disables sonar safety gates. (BUG-3)
5. **MJPEG client uses `async with httpx.AsyncClient(...)`** — prevents fd leak on every reconnect. (BUG-4)
6. **Fix `build_vlm_prompt()` line 75**: change `{goal}` → `{safe_goal}` — pre-existing prompt injection bug amplified by 2-5Hz continuous calls. (SEC-2)
7. **Remove `safe_forward` from sensor WS** — the field was computed as `not _hardware_estop.is_set()`, which ignores YOLO detections. NavController derives safe_forward from the lidar Forward sector directly. (HIGH-1/BUG-6)
8. **Compositor overlays avoid bottom of frame** — sonar bar moved to right side only, status text to top-left. Bottom preserved for VLM goal detection. Semi-transparent blending. (HIGH-2)
9. **Separate VLM client for NavController** — don't share `_llama_client` with `/v1/nav/decide`. Independent lifecycle, no shutdown race. (HIGH-3)
10. **Annie detects stalled NavController** — monitors `cycle_count` across polls. If frozen for 3 polls (6s), stops nav and reports stall. (ARCH-1)
11. **`_run_loop finally` sends stop to Pi** — prevents 10s robot drift when loop exits abnormally. (SM-3)
12. **Escape sequence: rate floor + ESTOP check on turn** — `await asyncio.sleep(0.1)` between backup and turn (G10), and check the turn response for HTTP 409 (ESTOP active) (BUG-9).
13. **Black frame detection** — skip cycles where `frame.mean() < 5.0` without burning search rotations. Camera tilt causes 2-3 black frames. (PM-MISS-2)
14. **WebSocket auth via headers** — NOT query params (token appears in uvicorn logs). (SEC-1)
15. **Panda `/health` reports `streaming_enabled`** — Annie checks before starting streaming nav. Mismatch falls back to legacy with warning. (MAINT-1)
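Items 4 and 7 both concern how NavController decides whether forward motion is safe. The sketch below rests on a few assumptions. Lidar sectors are taken to arrive as a mapping with a `"Forward"` minimum distance in mm. Both stop thresholds are hypothetical. A missing lidar sector fails closed, while a missing sonar reading only disables the sonar gate.

```python
from typing import Mapping, Optional

# Hypothetical thresholds for illustration; the plan defines the real values.
LIDAR_FORWARD_STOP_MM = 250.0
SONAR_STOP_MM = 200.0


def derive_safe_forward(lidar_sectors: Mapping[str, Optional[float]],
                        sonar_mm: Optional[float]) -> bool:
    """Decide whether driving forward is allowed from raw sensor readings."""
    forward_mm = lidar_sectors.get("Forward")
    # Fail closed if the lidar Forward sector is missing (assumption).
    if forward_mm is None or forward_mm < LIDAR_FORWARD_STOP_MM:
        return False
    # `is not None`, never truthiness: 0 mm is a real (and the most
    # dangerous) reading, while None means "no sonar data this tick".
    if sonar_mm is not None and sonar_mm < SONAR_STOP_MM:
        return False
    return True


print(derive_safe_forward({"Forward": 1200.0}, 0))     # False
print(derive_safe_forward({"Forward": 1200.0}, None))  # True
```

With `if sonar_mm and sonar_mm < SONAR_STOP_MM`, the first call would return True even though the obstacle is touching the sensor.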
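Items 11 and 12 fit together in one loop, sketched below. The escape sequence checks every drive response for 409, and the outer loop's `finally` always sends a stop. `RecordingPi`, its `drive(action, seconds)` method returning an HTTP status, and the action names and durations are all hypothetical stand-ins for the real Pi drive API.

```python
import asyncio
from typing import List, Tuple

HTTP_CONFLICT = 409  # status the Pi returns while ESTOP is active (item 12)


class EstopActive(Exception):
    """Raised when the Pi rejects a drive command with HTTP 409."""


class RecordingPi:
    """Hypothetical stand-in for the Pi drive API; records every command."""

    def __init__(self, turn_status: int = 200) -> None:
        self.calls: List[Tuple] = []
        self.turn_status = turn_status

    async def drive(self, action: str, seconds: float) -> int:
        self.calls.append((action, seconds))
        return self.turn_status if action.startswith("turn") else 200

    async def stop(self) -> None:
        self.calls.append(("stop",))


async def escape(pi: RecordingPi) -> None:
    if await pi.drive("backward", 0.3) == HTTP_CONFLICT:
        raise EstopActive("backup rejected")
    await asyncio.sleep(0.1)  # rate floor between backup and turn (G10)
    # BUG-9: the turn response must be checked as well.
    if await pi.drive("turn_left", 0.4) == HTTP_CONFLICT:
        raise EstopActive("turn rejected")


async def run_loop(pi: RecordingPi, cycles: int) -> None:
    try:
        for _ in range(cycles):
            await escape(pi)
    finally:
        # Runs on normal exit, exceptions and cancellation alike, so the robot
        # does not keep executing its last command until the Pi-side timeout.
        await pi.stop()


pi = RecordingPi(turn_status=HTTP_CONFLICT)
try:
    asyncio.run(run_loop(pi, cycles=3))
except EstopActive as exc:
    print("stopped:", exc)  # stopped: turn rejected
print(pi.calls)
```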
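Item 13's black-frame skip is sketched below with NumPy, using the `frame.mean() < 5.0` test from the list. `run_cycle()` is hypothetical and elides the VLM call. It simply assumes that every visible frame is a "goal not found" step that spends one search rotation.

```python
from typing import Tuple

import numpy as np

BLACK_FRAME_MEAN = 5.0  # threshold from item 13


def is_black_frame(frame: np.ndarray, threshold: float = BLACK_FRAME_MEAN) -> bool:
    """True for (near-)black frames, e.g. while the camera is tilting."""
    return float(frame.mean()) < threshold


def run_cycle(frame: np.ndarray, searches_left: int) -> Tuple[str, int]:
    """Return (action, searches_left) for one simplified control cycle."""
    if is_black_frame(frame):
        # Skip without calling the VLM and without spending a search rotation.
        return "skip", searches_left
    # VLM call elided; this stub treats the frame as a "not found" search step.
    return "search", searches_left - 1


black = np.zeros((480, 640, 3), dtype=np.uint8)
normal = np.full((480, 640, 3), 90, dtype=np.uint8)
print(run_cycle(black, 8))   # ('skip', 8)
print(run_cycle(normal, 8))  # ('search', 7)
```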
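Item 14's header-based auth comes down to parsing `Authorization: Bearer <token>` and comparing it in constant time. The helper name `bearer_token_ok()` is hypothetical, and it accepts any mapping of lowercase header names.

```python
import hmac
from typing import Mapping


def bearer_token_ok(headers: Mapping[str, str], expected: str) -> bool:
    """Check `Authorization: Bearer <token>` with a constant-time compare."""
    value = headers.get("authorization")
    if not value:
        return False
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    return hmac.compare_digest(token.encode(), expected.encode())


print(bearer_token_ok({"authorization": "Bearer s3cret"}, "s3cret"))  # True
print(bearer_token_ok({}, "s3cret"))                                  # False
```

In a FastAPI WebSocket handler this check could run before `await websocket.accept()`, with `await websocket.close(code=1008)` on failure. Starlette's `websocket.headers` is case-insensitive, whereas the plain dict used here needs the lowercase key. Because the token never appears in the URL, it stays out of uvicorn's access log.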

## Files to Modify

*In implementation order (dependencies → dependents):*

| # | File | Change |
|---|------|--------|
| 1 | `services/panda_nav/server.py` | **FIX** line 75: `{goal}` → `{safe_goal}` in `build_vlm_prompt()` |
| 2 | `services/turbopi-server/main.py` | **ADD** `GET /stream` (MJPEG with disconnect detection), `WS /sensors` (10Hz JSON, header auth). ~60 lines after line 1007. Import `WebSocket, WebSocketDisconnect, StreamingResponse`. |
| 3 | `services/panda_nav/stream_client.py` | **NEW** — `MjpegClient` (MJPEG consumer, single-slot buffer, 2MB cap, `async with` client) + `SensorClient` (WS consumer, deep-copy, header auth). ~130 lines. |
| 4 | `services/panda_nav/compositor.py` | **NEW** — OpenCV overlay renderer. Lidar radar top-right, sonar bar right-side, heading top-left, status top-left below heading. Bottom of frame clear. Semi-transparent. ~120 lines. |
| 5 | `services/panda_nav/server.py` | **ADD** `NavController` class (lazy Lock, separate VLM client, Pi stop in finally, black-frame detection, heartbeat, VLM latency monitoring), 4 new endpoints (`/v1/nav/start`, `/v1/nav/stop`, `/v1/nav/status`, `/v1/nav/snapshot`), expanded lifespan, health reports `streaming_enabled`. ~300 lines added. Existing `/v1/nav/decide` untouched. |
| 6 | `services/annie-voice/robot_tools.py` | **ADD** `_navigate_streaming()` (commander pattern with stall detection), `_call_panda()` helper. **EXTRACT** existing nav body to `_navigate_legacy()`. Feature flag `NAV_STREAMING`. ~100 lines added, 1 line modified. |
| 7 | `services/panda_nav/requirements.txt` | **ADD** `websockets>=12.0,<14.0`, `numpy>=1.26` |
| 8 | `services/panda_nav/tests/test_compositor.py` | **NEW** — 5 compositor tests |
| 9 | `services/panda_nav/tests/test_stream_client.py` | **NEW** — 5 stream client tests |
| 10 | `services/panda_nav/tests/test_nav_controller.py` | **NEW** — 10 NavController lifecycle tests |
| 11 | `services/annie-voice/tests/test_streaming_nav.py` | **NEW** — 5 commander pattern tests |
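Row 1's one-line fix swaps the raw `{goal}` for `{safe_goal}` in the prompt template. How the plan computes `safe_goal` is not shown here, so the sanitizer below is a hypothetical illustration with an assumed character whitelist and an 80-character cap. The prompt text is also invented. Stripping quotes, braces and newlines keeps the goal from breaking out of its quoted slot, but it does not stop injected instructions phrased as plain words.

```python
import re

MAX_GOAL_CHARS = 80  # hypothetical cap

PROMPT_TEMPLATE = (
    'You are steering a small robot. Target object: "{safe_goal}".\n'
    "Answer with one of: forward, left, right, found."
)


def sanitize_goal(goal: str) -> str:
    # Keep letters, digits, spaces and a little punctuation; drop quotes,
    # braces and newlines that could close the quoted field.
    cleaned = re.sub(r"[^A-Za-z0-9 ,.\-]", " ", goal)
    cleaned = " ".join(cleaned.split())
    return cleaned[:MAX_GOAL_CHARS]


def build_vlm_prompt(goal: str) -> str:
    safe_goal = sanitize_goal(goal)
    return PROMPT_TEMPLATE.format(safe_goal=safe_goal)


print(build_vlm_prompt("red ball"))
```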
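Row 2's MJPEG generator with disconnect detection can be sketched as follows. It assumes only that `request` has an async `is_disconnected()` method, which Starlette's `Request` provides. `grab_jpeg` is a hypothetical callable that returns the latest encoded frame, and `FakeRequest` is a test double.

```python
import asyncio
from typing import AsyncIterator, Callable

BOUNDARY = b"frame"


async def mjpeg_frames(request, grab_jpeg: Callable[[], bytes],
                       fps: float = 10.0) -> AsyncIterator[bytes]:
    """Yield multipart MJPEG parts until the client disconnects."""
    while True:
        # Checked every iteration: otherwise a dropped WiFi client leaves
        # this generator encoding frames forever.
        if await request.is_disconnected():
            break
        jpeg = grab_jpeg()  # wrap in asyncio.to_thread if this blocks
        header = b"--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %d\r\n\r\n" % (
            BOUNDARY, len(jpeg))
        yield header + jpeg + b"\r\n"
        await asyncio.sleep(1.0 / fps)


class FakeRequest:
    """Test double that reports a disconnect after `n` checks."""

    def __init__(self, n: int) -> None:
        self.checks_left = n

    async def is_disconnected(self) -> bool:
        self.checks_left -= 1
        return self.checks_left < 0


async def _collect() -> list:
    gen = mjpeg_frames(FakeRequest(3), lambda: b"\xff\xd8fake\xff\xd9", fps=1000)
    return [part async for part in gen]


parts = asyncio.run(_collect())
print(len(parts))  # 3
```

In FastAPI this would be returned as `StreamingResponse(mjpeg_frames(request, grab), media_type="multipart/x-mixed-replace; boundary=frame")`.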
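Row 3's `MjpegClient` combines three ideas: a single-slot buffer, a 2MB cap, and an `async with` client. The sketch below is one way to build them, not the plan's code. It splits frames on JPEG SOI/EOI markers, which assumes the frames carry no embedded EXIF thumbnail; parsing the `Content-Length` part headers would be an alternative. `consume()` uses HTTPX's streaming API and is not exercised by the demo.

```python
from typing import Optional

SOI = b"\xff\xd8"             # JPEG start-of-image marker
EOI = b"\xff\xd9"             # JPEG end-of-image marker
MAX_BUFFER = 2 * 1024 * 1024  # 2 MB cap from the plan


class MjpegParser:
    """Pull complete JPEGs out of an MJPEG byte stream, keeping only the newest."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self.latest: Optional[bytes] = None  # single slot: newer frames overwrite
        self.overflows = 0

    def feed(self, chunk: bytes) -> None:
        self._buf += chunk
        while True:
            start = self._buf.find(SOI)
            if start < 0:
                del self._buf[:-1]  # keep one byte: a marker may be split
                break
            end = self._buf.find(EOI, start + 2)
            if end < 0:
                del self._buf[:start]  # drop part headers, keep partial frame
                break
            self.latest = bytes(self._buf[start:end + 2])
            del self._buf[:end + 2]
        if len(self._buf) > MAX_BUFFER:
            # Runaway or corrupt frame: reset rather than grow without bound.
            self._buf.clear()
            self.overflows += 1


async def consume(url: str, token: str, parser: MjpegParser) -> None:
    """Read the Pi's /stream into `parser` until the connection drops."""
    import httpx  # third-party; imported here so the parser works without it

    # `async with` closes the client and its sockets on every exit, so a
    # reconnect loop around this function does not leak file descriptors.
    async with httpx.AsyncClient(timeout=5.0) as client:
        headers = {"Authorization": f"Bearer {token}"}
        async with client.stream("GET", url, headers=headers) as resp:
            resp.raise_for_status()
            async for chunk in resp.aiter_bytes():
                parser.feed(chunk)


parser = MjpegParser()
parser.feed(b"--frame\r\nContent-Type: image/jpeg\r\n\r\n\xff")  # SOI split here
parser.feed(b"\xd8AAAA\xff\xd9\r\n")
print(parser.latest)  # b'\xff\xd8AAAA\xff\xd9'
```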
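Row 4's compositor rules are: draw the sonar bar on the right side, keep the bottom of the frame clear, and blend semi-transparently. The sketch below follows those rules using plain NumPy so it runs without OpenCV; `cv2.addWeighted` performs the same blend. The vertical band (35% to 75% of the frame height), the bar width and the 2000 mm range are hypothetical layout values.

```python
from typing import Optional, Tuple

import numpy as np


def blend_rect(frame: np.ndarray, x0: int, y0: int, x1: int, y1: int,
               color: Tuple[int, int, int], alpha: float = 0.5) -> None:
    """Alpha-blend a solid rectangle into frame[y0:y1, x0:x1] in place."""
    roi = frame[y0:y1, x0:x1].astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    mixed = roi * (1.0 - alpha) + tint * alpha
    frame[y0:y1, x0:x1] = np.clip(mixed, 0, 255).astype(np.uint8)


def draw_sonar_bar(frame: np.ndarray, sonar_mm: Optional[float],
                   max_mm: float = 2000.0, width: int = 16,
                   alpha: float = 0.5) -> np.ndarray:
    """Right-edge bar that grows as obstacles get closer; bottom band untouched."""
    h, w = frame.shape[:2]
    # Hypothetical layout: below the top-right lidar radar, above the bottom band.
    top, bottom = int(h * 0.35), int(h * 0.75)
    if sonar_mm is None:
        return frame
    closeness = 1.0 - min(max(sonar_mm, 0.0) / max_mm, 1.0)
    fill_top = bottom - int((bottom - top) * closeness)
    if fill_top < bottom:
        blend_rect(frame, w - width, fill_top, w, bottom, (0, 0, 255), alpha)  # red, BGR
    return frame


frame = np.full((480, 640, 3), 100, dtype=np.uint8)
draw_sonar_bar(frame, 0)  # 0 mm: full-height bar
print(frame[300, 630], frame[470, 630])
```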
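Row 6's commander pattern is sketched below, folding in items 1, 10 and 15. Several details are assumptions. The `call_panda(method, path, json=None)` shape is hypothetical, as is the status field `result`. `navigate_legacy(goal)` is a simplified stand-in for `_navigate_legacy(args, user_message)`. The 2s poll interval is inferred from "3 polls (6s)", and the 120s timeout is invented.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

CallPanda = Callable[..., Awaitable[Dict[str, Any]]]
MAX_FROZEN_POLLS = 3  # 3 polls x 2 s = 6 s, as in item 10


async def navigate_streaming(goal: str, call_panda: CallPanda,
                             navigate_legacy: Callable[[str], Awaitable[str]],
                             streaming_flag: bool, poll_interval_s: float = 2.0,
                             timeout_s: float = 120.0) -> str:
    if not streaming_flag:                   # NAV_STREAMING=0
        return await navigate_legacy(goal)
    health = await call_panda("GET", "/health")
    if not health.get("streaming_enabled"):  # flag mismatch: fall back (item 15)
        return await navigate_legacy(goal)
    await call_panda("POST", "/v1/nav/start", json={"goal": goal})
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    last_count: Optional[int] = None
    frozen = 0
    try:
        while loop.time() < deadline:
            await asyncio.sleep(poll_interval_s)
            status = await call_panda("GET", "/v1/nav/status")
            if not status.get("running"):
                return f"finished: {status.get('result', 'unknown')}"
            count = status.get("cycle_count")
            frozen = frozen + 1 if count == last_count else 0
            last_count = count
            if frozen >= MAX_FROZEN_POLLS:
                return "stalled: cycle_count stopped advancing"
        return "timed out"
    finally:
        # Always tell Panda to stop, however we leave (stop assumed idempotent).
        await call_panda("POST", "/v1/nav/stop")


class FakePanda:
    """Test double: running=True with a scripted cycle_count sequence."""

    def __init__(self, counts):
        self.counts = list(counts)
        self.calls = []

    async def __call__(self, method, path, json=None):
        self.calls.append((method, path))
        if path == "/health":
            return {"streaming_enabled": True}
        if path == "/v1/nav/status":
            return {"running": True, "cycle_count": self.counts.pop(0)}
        return {}


async def legacy(goal: str) -> str:
    return "legacy"


panda = FakePanda([1, 2, 3, 3, 3, 3])
print(asyncio.run(navigate_streaming("red ball", panda, legacy, True, poll_interval_s=0)))
```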

## Start Command

```
cat ~/.claude/plans/partitioned-sniffing-swing.md
```

Then implement the plan. All 30 adversarial findings are already addressed in it — do not revert the fixes. Key fixes are in the "Critical Code Fixes (Updated from Review)" section.

**Agent execution plan:**
```
Agent A (Pi: /stream + /sensors)         ──┐
Agent B (Panda: stream_client +           ─┼──► Agent D (deploy + E2E verify)
         compositor + NavController)       │
Agent C (Annie: commander pattern)       ──┘
```

A, B, and C can run in parallel (independent files). D runs after all three pass their tests. B is the heaviest agent (new `stream_client.py` and `compositor.py`, plus the `server.py` changes).

## Verification

### Gate 1: Pi streaming (after deploying Pi)
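1. `curl -s --max-time 3 -H "Authorization: Bearer $TOKEN" http://192.168.68.61:8080/stream -o /dev/null -w "%{http_code}"` → 200 (the MJPEG stream never ends, so curl exits with code 28 when `--max-time` fires; the printed status code is what matters)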
2. `wscat -H "Authorization: Bearer $TOKEN" -c "ws://192.168.68.61:8080/sensors"` → JSON at 10Hz with lidar, sonar_mm, imu_heading_deg, estop_active, timestamp_ms
3. Existing endpoints still work: `/photo`, `/drive`, `/scan`, `/health`
4. Pi CPU < 60% (`top`) with both `/stream` and `/sensors` clients connected
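Check 2 can be made repeatable instead of judged by eye. One approach is to capture a second or two of `/sensors` messages (e.g. copied from `wscat` output, one JSON object per line) and summarize them with a helper like the hypothetical `check_sensor_stream()` below. The `"lidar": {}` value in the sample is a placeholder, not the real payload format.

```python
import json
from typing import Any, Dict, Iterable, List, Optional

REQUIRED_KEYS = {"lidar", "sonar_mm", "imu_heading_deg", "estop_active", "timestamp_ms"}


def check_sensor_stream(raw_messages: Iterable[str]) -> Dict[str, Any]:
    """Summarize captured /sensors messages: missing keys and observed rate."""
    msgs: List[Dict[str, Any]] = [json.loads(m) for m in raw_messages]
    # Key presence, not truthiness: sonar_mm == 0 and estop_active == False are valid.
    missing = sorted({k for m in msgs for k in REQUIRED_KEYS.difference(m)})
    stamps = [m["timestamp_ms"] for m in msgs if "timestamp_ms" in m]
    rate_hz: Optional[float] = None
    if len(stamps) >= 2 and stamps[-1] > stamps[0]:
        rate_hz = (len(stamps) - 1) * 1000.0 / (stamps[-1] - stamps[0])
    return {"count": len(msgs), "missing_keys": missing, "rate_hz": rate_hz}


sample = [json.dumps({"lidar": {}, "sonar_mm": 0, "imu_heading_deg": 90.0,
                      "estop_active": False, "timestamp_ms": 1000 + 100 * i})
          for i in range(11)]
print(check_sensor_stream(sample))  # {'count': 11, 'missing_keys': [], 'rate_hz': 10.0}
```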

### Gate 2: Panda (after deploying Panda)
1. With `NAV_STREAMING=0`: `POST /v1/nav/decide` still works
2. With `NAV_STREAMING=1`:
   - `GET /health` shows `streaming_enabled: true`
   - `GET /v1/nav/status` → `{"running": false, ...}`
   - `POST /v1/nav/start {"goal": "red ball"}` → 200
   - `GET /v1/nav/status` → `running=true`, incrementing `cycle_count`
   - `GET /v1/nav/snapshot` → JPEG with visible overlays
   - `POST /v1/nav/stop` → 200

### Gate 3: Annie (after deploying Annie)
1. With `NAV_STREAMING=0`: existing nav works
2. With `NAV_STREAMING=1`:
   - "Find the red ball" → streaming nav starts, polls, reports result

### Gate 4: E2E
1. "Find the red ball" — robot navigates, finds it, stops
2. Robot facing curtain at 27cm — escape + search works (original session 77 failure)
3. Kill Annie mid-nav → Panda orphan detection stops nav within 30s
4. Disconnect MJPEG stream → NavController detects stale frames, stops
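Checks 3 and 4 exercise two watchdogs inside NavController. The sketch below assumes that Annie's status polls act as the heartbeat (the plan's heartbeat mechanism may differ) and uses a hypothetical 2s stale-frame threshold. Under those assumptions, both checks reduce to comparing monotonic timestamps.

```python
import time
from dataclasses import dataclass
from typing import Optional

ORPHAN_TIMEOUT_S = 30.0  # Gate 4: stop if Annie has been silent this long
STALE_FRAME_S = 2.0      # hypothetical threshold; the plan sets the real one


@dataclass
class Watchdog:
    last_poll_s: float   # monotonic time of the last /v1/nav/status poll
    last_frame_s: float  # monotonic time the newest MJPEG frame arrived

    def stop_reason(self, now_s: float) -> Optional[str]:
        """Return why the loop should stop, or None to keep going."""
        if now_s - self.last_poll_s > ORPHAN_TIMEOUT_S:
            return "orphaned: no status poll from Annie"
        if now_s - self.last_frame_s > STALE_FRAME_S:
            return "stale: no new MJPEG frame"
        return None


now = time.monotonic()
print(Watchdog(last_poll_s=now - 31.0, last_frame_s=now).stop_reason(now))
```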
