# Next Session: 4-Command Nav VLM on Panda

## What

Replace Annie's 7-action nav VLM prompt (`forward/backward/left/right/goal_reached/give_up/stop`) with a **4-command output format** that includes steering angles. The 2B E2B model on Panda can't do spatial reasoning ("ball is in left half of image → turn left") but it CAN identify objects and estimate rough positions. Move the centering logic into code on the Panda side — the VLM reports what it sees, code computes the steering.

## Why

Session 72 proved:
- **VLM nav works** — 10/10 cycles correctly chose `forward` toward a red ball, sonar confirmed approach (196→53cm)
- **VLM can't steer** — it never outputs `left` or `right` to correct for drift, even with explicit "keep goal centered" prompt rules. All 25 cycles across 2 runs were `forward`
- **VLM can't stop** — never said `goal_reached` even with sonar at 43cm and prompt rule "sonar < 30cm → goal_reached"
- **Root cause:** 2B model is too small for multi-step spatial reasoning. It sees ball + clear forward → "forward", every time

## Design: Panda Nav Endpoint

**New endpoint on Panda (llama-server sidecar):** `POST /v1/nav/decide`

Instead of Annie calling raw `/v1/chat/completions` and parsing action words, Annie calls a structured endpoint on Panda that:
1. Receives the camera image + goal description + sensor data
2. Asks the VLM a structured question (not "choose an action" but "describe what you see")
3. Parses the VLM response into a structured decision
4. Returns one of exactly **4 commands** to Annie

### The 4 Commands

```json
{"command": "left", "angle_deg": 15}     // rotate left by N degrees
{"command": "right", "angle_deg": 15}    // rotate right by N degrees  
{"command": "forward"}                    // drive straight ahead
{"command": "stop", "reason": "goal_reached"}  // stop (with reason)
```

Nothing else. No backward, no give_up, no strafe.

### How Centering Works (code, not VLM)

The VLM answers a simpler question: **"Where is the [goal] in this image?"**

```
Prompt: "You see a robot camera image. Is there a [red ball] visible?
If yes, is it in the LEFT third, CENTER third, or RIGHT third of the image?
How large is it (small/medium/large)?
Reply in format: POSITION SIZE  (e.g. 'LEFT SMALL' or 'CENTER LARGE' or 'NONE')"
```

Then **code** maps the response. Direction note: goal in the LEFT of the frame → robot turns **left** to bring it to center (mirrored for RIGHT):

| VLM says | Command |
|----------|---------|
| `LEFT SMALL` | `{"command": "left", "angle_deg": 20}` |
| `LEFT MEDIUM` | `{"command": "left", "angle_deg": 15}` |
| `LEFT LARGE` | `{"command": "left", "angle_deg": 10}` |
| `CENTER SMALL` | `{"command": "forward"}` |
| `CENTER MEDIUM` | `{"command": "forward"}` |
| `CENTER LARGE` | `{"command": "stop", "reason": "goal_reached"}` |
| `RIGHT SMALL` | `{"command": "right", "angle_deg": 20}` |
| `RIGHT MEDIUM` | `{"command": "right", "angle_deg": 15}` |
| `RIGHT LARGE` | `{"command": "right", "angle_deg": 10}` |
| `NONE` | `{"command": "left", "angle_deg": 30}` (search: rotate to scan) |

**Size as distance proxy:** LARGE = close (stop), MEDIUM = approaching (go forward/correct), SMALL = far (correct more aggressively). No need for sonar in the VLM prompt — it's redundant with visual size.
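The mapping above is a pure table lookup plus one regex. A minimal sketch (function and constant names are illustrative, not final):

```python
import re

# Correction schedule from the table: smaller (farther) goal → larger turn.
ANGLE_BY_SIZE = {"SMALL": 20, "MEDIUM": 15, "LARGE": 10}

def map_vlm_reply(reply: str) -> dict:
    """Map a 'POSITION SIZE' (or 'NONE') VLM reply to one of the 4 commands."""
    text = reply.strip().upper()
    m = re.search(r"(LEFT|CENTER|RIGHT)\s+(SMALL|MEDIUM|LARGE)", text)
    if m is None:
        # NONE, or a garbled reply treated the same way: rotate to search.
        return {"command": "left", "angle_deg": 30}
    position, size = m.groups()
    if position == "CENTER":
        if size == "LARGE":
            return {"command": "stop", "reason": "goal_reached"}
        return {"command": "forward"}
    # Goal on the LEFT of the frame → turn left to center it (mirrored for RIGHT).
    return {"command": position.lower(), "angle_deg": ANGLE_BY_SIZE[size]}
```

Treating an unparseable reply like `NONE` (search) rather than stopping is an assumption here; it keeps the robot moving instead of freezing on one bad generation.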

### Alternative: Split into 2 VLM Calls

If single-call accuracy is too low, split into two fast calls (each ~20ms on Panda):

1. **Detection call:** "Is there a [red ball] in this image? Reply YES or NO."
2. **Position call (if YES):** "The [red ball] is in which third: LEFT, CENTER, or RIGHT?"

Total: ~40ms, still 25 Hz. Each question is dead simple for a 2B model.

## Architecture

```
Annie (Titan) → POST Panda:11436/v1/nav/decide
                  ├─ image (base64)
                  ├─ goal ("red ball")  
                  └─ sensors (sonar_cm, lidar_summary — optional)
                  
Panda nav-server:
  1. Build VLM prompt (position+size question)
  2. POST localhost:11435/v1/chat/completions (llama-server, ~20ms)
  3. Parse "LEFT SMALL" → {"command": "left", "angle_deg": 20}
  4. Override with sonar if provided: sonar < 25cm → force stop
  5. Return command JSON

Annie:
  1. Receives {"command": "left", "angle_deg": 20}
  2. Translates to /drive call: POST Pi:8080/drive {"action": "left", "duration": 0.3, "speed": 40}
     (angle_deg → duration via calibration: ~25°/0.5s at speed 40, so 20° ≈ 0.4s)
  3. Next cycle: take photo, call Panda again
```

## Implementation Plan

### Phase 1: Panda nav-server (new FastAPI sidecar, ~200 lines)

**File:** `services/panda-nav/server.py` (new service on Panda)

- FastAPI app, port 11436 (next to llama-server on 11435)
- `POST /v1/nav/decide` — accepts image + goal + sensors, returns command JSON
- Calls llama-server internally at localhost:11435
- Prompt template with position+size format
- Response parser: regex for `(LEFT|CENTER|RIGHT)\s+(SMALL|MEDIUM|LARGE)`, with `NONE` matched separately (it carries no size)
- Sonar override: if sonar_cm provided and < 25 → force `stop`
- Health endpoint: `GET /health` (checks llama-server reachable)
- No auth needed (internal Panda service, not exposed)

### Phase 2: Annie robot_tools.py changes

- `_ask_nav_combined()` → replace raw VLM call with `POST Panda:11436/v1/nav/decide`
- Response is already structured — no more parsing action words from free text
- Add `NAV_DECIDE_URL` env var (default: `http://192.168.68.57:11436`)
- Keep fallback: if Panda unreachable, fall back to Titan 26B with current prompt (26B CAN do spatial reasoning)
- Translate `angle_deg` to drive duration using calibration constant

### Phase 3: Angle-to-duration calibration

From session 70 IMU data:
- 0.5s right at speed=40 → ~25° physical (from IMU)
- 2.0s right at speed=40 → ~225° physical

Calibration: **~50°/s at speed 40** → `duration = angle_deg / 50.0`. Note the 2.0s sample implies a faster effective rate (~112°/s), so treat this constant as valid for short turns only; all nav turns here are ≤ 0.6s.

| angle_deg | duration_s |
|-----------|-----------|
| 10 | 0.2 |
| 15 | 0.3 |
| 20 | 0.4 |
| 30 | 0.6 |

### Phase 4: Deploy + test

1. Deploy panda-nav service (systemd unit, or just run alongside llama-server)
2. Update start.sh with NAV_DECIDE_URL
3. Restart Annie
4. Place red ball, send "navigate to the red ball"
5. Expect: robot corrects drift with left/right turns, stops near ball

## Files to Create/Modify

| File | Change |
|------|--------|
| `services/panda-nav/server.py` | **NEW** — FastAPI nav decision endpoint |
| `services/panda-nav/requirements.txt` | **NEW** — fastapi, uvicorn, httpx |
| `services/annie-voice/robot_tools.py` | Replace VLM call with `/v1/nav/decide` call. Add angle→duration. |
| `start.sh` | Add `start_panda_nav()`, add `NAV_DECIDE_URL` to `start_annie()` |
| `docs/RESOURCE-REGISTRY.md` | Add panda-nav entry (CPU only, negligible memory) |

## Key Decisions to Make

1. **Single call vs split call?** Start with single ("POSITION SIZE" format). If accuracy < 80%, split into detection + position.
2. **Search behavior when goal not visible?** Rotate 30° left and retry. After 12 rotations (a full 360°), Annie gives up locally; the Panda endpoint itself still returns only the 4 commands.
3. **Forward duration** — currently 1.0s. Should it vary? SMALL goal = 1.0s, MEDIUM = 0.5s? Keeps approach speed proportional to distance.
4. **IMU-assisted turns?** If `/imu` is deployed, use closed-loop rotation: "turn left 20°" → read IMU until heading delta = 20°. Much more accurate than timed open-loop.

## What NOT to Change

- llama-server container (unchanged, still serves raw completions)
- Pi turbopi-server (unchanged, still accepts /drive /photo /scan)
- Titan vLLM (unchanged, still serves as fallback)
- car_demo tools (still available but Annie won't use them per feedback)

## Verification

1. `curl Panda:11436/health` → ok
2. `curl -X POST Panda:11436/v1/nav/decide -d '{"image_b64":"...","goal":"red ball"}'` → `{"command":"forward"}` or similar
3. Live test: red ball 2m away → robot steers toward it with corrections, stops within 30cm
4. Drift test: place ball 30° off-axis → robot should turn to center, then approach
5. No-goal test: no ball visible → robot rotates to search, eventually gives up
