# Next Session: Streaming Vision Nav — Panda VLM as Continuous Perception Loop

## What
Replace the current per-request photo-fetch nav loop with a **continuous streaming vision pipeline** on Panda. The panda-nav VLM (E2B 2B) receives a composite frame — camera image + lidar radar overlay + sonar distance bar — and outputs navigation commands in real time. Annie just sets the goal and monitors progress.

**Why this matters:** Session 77 E2E testing proved the current architecture can't navigate effectively. The robot facing a curtain at 27cm couldn't escape because:
1. Each nav cycle fetches a stale photo via HTTP (Annie→Pi→Titan→Panda round-trip ~1s)
2. The black-box nav loop alternates search direction (left→right), netting zero rotation
3. Annie's LLM never sees what's happening — can't steer or adjust strategy
4. Safety daemon kills turns via `_stop_event` (fixed, but symptom of over-coupled architecture)

**Architecture principle:** Pi = camera + motors + sensors. Panda = eyes + brain (VLM). Annie = commander + monitor.

## Current Architecture (what to replace)

```
Annie (Titan)                    Panda                         Pi
─────────────                    ─────                         ──
navigate_robot(goal) ──────────────────────────────> GET /photo (HTTP)
                                                    <── JPEG ──
                     ──> POST /v1/nav/decide ──>
                         (forwards JPEG + goal)
                     <── {command, reason} ────<
                     ──────────────────────────────> POST /drive/turn (HTTP)
                     ... repeat N times ...
<── "I found it" ──
```

**Problems:** Each cycle is ~1-3s. Photo is stale. Annie is blind. Nav decisions are disconnected from sensor state.

## Target Architecture

```
Pi                              Panda                         Annie (Titan)
──                              ─────                         ─────────────
camera stream ──MJPEG──>  ┌─ frame grabber ─┐
lidar sectors ──WS/UDP──> │  compositor      │
sonar distance ──WS/UDP──>│  (overlay render)│
                          └────────┬─────────┘
                                   │ composite frame
                                   ▼
                          ┌─ VLM loop (2-5 Hz) ─┐
                          │ "Where is [goal]?"   │
                          │ sees: camera + radar  │
                          │       + sonar bar     │
                          └────────┬──────────────┘
                                   │ {command, reason, confidence}
                                   ▼
                          ┌─ command sender ──────────> POST /drive or /drive/turn
                          │ (rate-limited to Pi)       on Pi (HTTP)
                          └──────────────────────────>
                          
                          progress updates ──────────> Annie sees periodic
                          (every 5 cycles or          summaries + key frames
                           on state change)            via tool result
```

## Implementation Plan

### Phase 1: Pi Camera Stream (MJPEG over HTTP)

**File:** `services/turbopi-server/main.py` — add `/stream` endpoint

MJPEG is simplest (no WebRTC signaling needed). Pi's FrameGrabber already captures frames. Just stream them as multipart JPEG:

```python
@app.get("/stream")
async def video_stream():
    """MJPEG stream of camera frames at ~10 FPS."""
    async def generate():
        while True:
            frame = _grabber.get_frame()  # already exists
            if frame is not None:
                _, jpeg = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 70])
                yield (b'--frame\r\n'
                       b'Content-Type: image/jpeg\r\n\r\n' + jpeg.tobytes() + b'\r\n')
            await asyncio.sleep(0.1)  # 10 FPS
    return StreamingResponse(generate(), media_type='multipart/x-mixed-replace; boundary=frame')
```
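On the consuming side (Panda's `stream_client.py`, Phase 2), the multipart stream can be parsed by scanning for the JPEG start/end markers rather than trusting boundary lines. A minimal sketch — `extract_jpegs` is an assumed helper name, and the naive end-of-image scan is fine for camera JPEGs but not bulletproof for arbitrary files:

```python
# Assumed Panda-side parser: feed it raw bytes from the HTTP response,
# get back complete JPEG payloads plus any leftover partial frame.
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker

def extract_jpegs(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete JPEG frames, leftover bytes awaiting more data)."""
    frames = []
    while True:
        start = buffer.find(JPEG_SOI)
        if start == -1:
            return frames, b""
        end = buffer.find(JPEG_EOI, start + 2)
        if end == -1:
            # Incomplete frame: keep it around for the next network chunk
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

The caller accumulates `leftover + next_chunk` and calls again, so frames split across TCP reads reassemble naturally.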

Also add a WebSocket or SSE endpoint for sensor data:
```python
@app.websocket("/sensors")
async def sensor_stream(ws: WebSocket):
    """Stream lidar + sonar at ~10 Hz."""
    # Needs: from fastapi import WebSocket, WebSocketDisconnect
    await ws.accept()
    try:
        while True:
            sectors = _lidar_daemon.get_sectors() if _lidar_daemon else []
            sonar = _read_sonar_sync(_sonar) if _sonar else -1
            await ws.send_json({
                "lidar": [{"id": s.id, "name": s.name, "min_mm": s.min_mm} for s in sectors],
                "sonar_mm": sonar,
                "imu_heading": _imu_reader.get_heading_deg() if _imu_reader and _imu_reader.is_healthy() else None,
            })
            await asyncio.sleep(0.1)
    except WebSocketDisconnect:
        pass  # client went away; nothing to clean up
```

### Phase 2: Panda Compositor (overlay lidar + sonar on camera frame)

**File:** `services/panda_nav/compositor.py` (NEW)

Receives MJPEG frames + sensor WebSocket data. Renders composite image:
- **Camera** (640×480) — main area
- **Lidar radar ring** (top-right corner, ~120×120px) — 12 sectors as colored arcs: green (>1m), yellow (0.3-1m), red (<0.3m). Goal direction highlighted.
- **Sonar bar** (bottom strip) — horizontal bar showing forward distance, colored same scheme. Numeric label.
- **Heading indicator** (top-left) — current IMU heading as compass arrow.

Use OpenCV for rendering — it's already a dependency on Panda.

```python
def composite_frame(camera_frame: np.ndarray, lidar_sectors: list, sonar_mm: float, heading_deg: float | None) -> np.ndarray:
    """Overlay sensor data onto camera frame for VLM consumption."""
    frame = camera_frame.copy()
    _draw_lidar_radar(frame, lidar_sectors, x=520, y=10, radius=55)
    _draw_sonar_bar(frame, sonar_mm, y=460)
    if heading_deg is not None:
        _draw_heading(frame, heading_deg, x=10, y=10)
    return frame
```

### Phase 3: Continuous VLM Loop

**File:** `services/panda_nav/server.py` — refactor from request-response to continuous loop

Current: `POST /v1/nav/decide` — one-shot VLM call per request.
New: Background asyncio task that runs continuously when a goal is active.

```python
class NavController:
    """Continuous navigation controller. Runs VLM in a loop on composite frames."""
    
    def __init__(self):
        self.goal: str | None = None
        self.running = False
        self.last_command: dict | None = None
        self.cycle_count = 0
        self.history: list[dict] = []
        self._task: asyncio.Task | None = None

    async def start(self, goal: str):
        self.goal = goal
        self.running = True
        self.cycle_count = 0
        self.history = []
        # Keep a reference so the task isn't garbage-collected mid-run
        self._task = asyncio.create_task(self._run_loop())

    async def stop(self):
        self.running = False
        self.goal = None
    
    async def _run_loop(self):
        while self.running:
            # 1. Get latest composite frame
            frame = compositor.get_latest_frame()
            if frame is None:
                await asyncio.sleep(0.1)
                continue
            
            # 2. Ask VLM
            command = await self._ask_vlm(frame)
            self.last_command = command
            self.cycle_count += 1
            
            # 3. Execute on Pi
            await self._execute_command(command)
            
            # 4. Check terminal conditions
            if command["command"] == "stop":
                self.running = False
                break
            
            # Rate: ~2-5 Hz (limited by VLM inference, not sleep)
```

API changes:
- `POST /v1/nav/start` — set goal, start continuous loop
- `POST /v1/nav/stop` — stop loop
- `GET /v1/nav/status` — current state (cycle count, last command, running)
- `GET /v1/nav/snapshot` — latest composite frame as JPEG (for Annie/debugging)
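The payload shapes for these endpoints might look like the following — field names are a sketch derived from the NavController state, not a settled contract:

```
POST /v1/nav/start   {"goal": "red ball"}   → {"ok": true}
POST /v1/nav/stop    {}                     → {"ok": true}
GET  /v1/nav/status                         → {"running": true,
                                               "goal": "red ball",
                                               "cycle_count": 14,
                                               "last_command": {"command": "turn_left",
                                                                "reason": "goal left, small",
                                                                "confidence": 0.8}}
GET  /v1/nav/snapshot                       → image/jpeg bytes (latest composite frame)
```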

### Phase 4: Annie Integration

**File:** `services/annie-voice/robot_tools.py` — simplify `handle_navigate_robot`

Annie's role changes from orchestrator to commander:

```python
async def handle_navigate_robot(args: dict, user_message: str) -> str:
    await _ensure_demo_stopped()
    goal = args.get("goal", "explore")
    
    # Start continuous nav on Panda
    await _call_panda("POST", "/v1/nav/start", {"goal": goal})
    
    # Monitor progress — poll every 3s, get snapshot + status
    for check in range(20):  # max ~60s
        await asyncio.sleep(3.0)
        status = await _call_panda("GET", "/v1/nav/status")
        
        if not status.get("running"):
            reason = (status.get("last_command") or {}).get("reason", "unknown")
            if reason == "goal_reached":
                return f"Found the {goal}! Took {status['cycle_count']} nav cycles."
            else:
                return f"Stopped navigating: {reason} after {status['cycle_count']} cycles."
    
    # Timeout — stop nav and report
    await _call_panda("POST", "/v1/nav/stop")
    return f"Navigation timed out after 60s. Completed {status['cycle_count']} cycles."
```

For the **feedback loop** (user's request): Annie can also get a snapshot at each check:
```python
        snapshot = await _call_panda("GET", "/v1/nav/snapshot")
        # Feed snapshot to Annie's LLM for strategic decisions
        # "Should I keep going? Adjust goal? Try different approach?"
```
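Feeding the snapshot to Annie's LLM means wrapping the JPEG in whatever multimodal message format her model API expects. A sketch assuming an OpenAI-style content list (`snapshot_message` is a hypothetical helper; the actual message schema depends on Annie's LLM client):

```python
import base64

def snapshot_message(jpeg_bytes: bytes, goal: str) -> dict:
    """Build an assumed OpenAI-style multimodal message carrying the nav snapshot."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text",
             "text": f"Nav snapshot while searching for '{goal}'. "
                     "Keep going, adjust goal, or try a different approach?"},
            # Data URL keeps the frame inline; no extra file hosting needed
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }
```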

## VLM Prompt Update

The composite frame means the VLM prompt should reference the overlays:

```
You are navigating a small robot car toward: {goal}

The image shows:
- CAMERA VIEW: What the robot sees ahead
- RADAR (top-right): 12-sector lidar map. Green=clear, Yellow=close, Red=blocked
- SONAR BAR (bottom): Forward obstacle distance
- HEADING (top-left): Current compass direction

Reply with: POSITION SIZE
- POSITION: LEFT, CENTER, RIGHT, or NONE (if goal not visible)
- SIZE: SMALL (far), MEDIUM (mid), LARGE (close/arrived)
```
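The `POSITION SIZE` reply maps mechanically onto drive commands. A sketch of the parser — the command names are assumptions about what `_execute_command` sends to the Pi, but the `goal_reached` reason deliberately matches the string Annie checks in `handle_navigate_robot`:

```python
def parse_vlm_reply(reply: str) -> dict:
    """Map a 'POSITION SIZE' VLM reply to a nav command dict (names assumed)."""
    tokens = reply.strip().upper().split()
    position = tokens[0] if tokens else "NONE"
    size = tokens[1] if len(tokens) > 1 else "SMALL"
    if position == "NONE":
        # Goal not visible: fall through to a search behaviour
        return {"command": "search", "reason": "goal not visible"}
    if size == "LARGE":
        # Close enough: terminal condition the Annie-side loop looks for
        return {"command": "stop", "reason": "goal_reached"}
    if position == "LEFT":
        return {"command": "turn_left", "reason": f"goal left, {size.lower()}"}
    if position == "RIGHT":
        return {"command": "turn_right", "reason": f"goal right, {size.lower()}"}
    return {"command": "forward", "reason": f"goal centered, {size.lower()}"}
```

Uppercasing and whitespace-splitting keeps the parser tolerant of the small-model formatting drift a 2B VLM will inevitably produce.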

## Verification

1. **Pi stream:** `curl http://192.168.68.61:8080/stream` returns MJPEG frames
2. **Sensor WS:** `wscat -c ws://192.168.68.61:8080/sensors` shows lidar+sonar JSON at 10Hz
3. **Compositor:** `GET /v1/nav/snapshot` returns composite JPEG with overlays visible
4. **VLM loop:** Start nav, check `/v1/nav/status` shows incrementing cycle_count
5. **E2E:** "find the red ball" — robot facing curtain → continuous turns+backup → finds ball → approaches → stops
6. **E2E 2:** "find the red ball, then find the blue ball" — sequential goals via Annie tool chaining

## Migration Path

Phases 1-2 can ship independently (streaming + compositor are new endpoints that don't break existing nav).
Phases 3-4 replace the existing nav loop — feature-flag with `NAV_STREAMING=1` env var.
Keep the old `POST /v1/nav/decide` endpoint for backward compatibility during transition.
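The flag check itself is trivial; a sketch of how `server.py` might gate the new loop (helper name assumed):

```python
import os

def streaming_nav_enabled() -> bool:
    """Feature flag: NAV_STREAMING=1 switches on the continuous loop;
    anything else keeps the legacy /v1/nav/decide path."""
    return os.environ.get("NAV_STREAMING", "0") == "1"
```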

## Files to Create/Modify

| File | Change |
|------|--------|
| `services/turbopi-server/main.py` | Add `/stream` (MJPEG) and `/sensors` (WebSocket) endpoints |
| `services/panda_nav/compositor.py` | NEW: frame compositor (camera + lidar radar + sonar bar overlay) |
| `services/panda_nav/stream_client.py` | NEW: MJPEG + WebSocket client consuming Pi streams |
| `services/panda_nav/server.py` | Add NavController continuous loop, `/v1/nav/start`, `/v1/nav/stop`, `/v1/nav/status`, `/v1/nav/snapshot` |
| `services/annie-voice/robot_tools.py` | Simplify `handle_navigate_robot` to commander pattern |
| `services/panda_nav/tests/test_compositor.py` | NEW: compositor overlay tests |
| `services/panda_nav/tests/test_nav_controller.py` | NEW: continuous loop tests |

## Key Design Decisions

1. **MJPEG over WebRTC** — simpler (no signaling server), sufficient for LAN at 10 FPS, one-way stream
2. **Sensor data via WebSocket** (not baked into MJPEG) — separate channel allows different rates, easier parsing
3. **Compositor on Panda** (not Pi) — Pi CPU is limited; Panda has GPU for OpenCV overlay rendering
4. **VLM rate ~2-5 Hz** — limited by llama-server inference, not artificial sleep. Each cycle: grab frame → encode → infer → parse → ~200-500ms
5. **Annie polls every 3s** — strategic oversight, not fine-grained control. Can request snapshot for LLM reasoning.
6. **No alternating search** — VLM sees lidar radar, can reason about which direction has space. Remove hardcoded left→right alternation.
