# Next Session: Nav VLM Production Deploy (Corrected)

**Supersedes:** `docs/NEXT-SESSION-NAV-VLM-PRODUCTION-DEPLOY.md` (has wrong target machine in Step E)

---

## What

Switch Annie-voice's navigation VLM from Titan's shared vLLM (26B, localhost:8003) to a dedicated Panda llama-server (2B E2B, 192.168.68.57:11435) with automatic fallback to Titan. This gives 8.5x latency improvement (18ms vs 156ms) and isolates nav from voice/extraction contention. Also fixes a latent `enable_thinking` key bug that would have broken nav on the switch.

## Plan

**Path:** `~/.claude/plans/glowing-snacking-toucan.md`

Read the plan first — it has the full implementation with all adversarial review findings addressed, pre-mortem analysis, agent execution strategy, and verification checklist.

## Key Design Decisions (from adversarial review)

1. **CRITICAL BUG FIX:** `robot_tools.py:320,375` uses `{"thinking": False}` — wrong key. Must be `{"enable_thinking": False}`. Has been working accidentally because Titan vLLM disables thinking at server level. Panda llama-server has no server-level override — will break without this fix. Every other module in annie-voice already uses the correct key.

2. **Change is on TITAN (annie-voice), NOT the Pi.** The original deploy doc's Step E is wrong. `NAV_VLLM_URL` is read by `robot_tools.py` which runs inside Annie on Titan. The Pi's `TITAN_VLLM_URL` (for image descriptions) is unrelated.

3. **Fallback URL pattern:** Try Panda (0.5s connect timeout), fall back to Titan localhost:8003 on `ConnectError`. Panda is not always-on — without fallback, nav silently returns "stop" when Panda is off.

4. **Separate httpx client** for VLM calls (0.5s connect, 10s read) vs robot calls (2s connect, 5s read). Currently shared with mismatched timeouts.

5. **systemd unit:** `Type=oneshot` + `RemainAfterExit=yes` (NOT `Type=forking`). Add `TimeoutStopSec=30`.

6. **start.sh health warning:** After launching Annie, check Panda reachability and print yellow warning if unreachable.

7. **Two separate commits:** Bug fix first (cherry-pickable), then feature commit.

## Files to Modify

| Order | File | Change |
|-------|------|--------|
| 1 | `services/annie-voice/robot_tools.py` | Fix `thinking`→`enable_thinking` (lines 320, 375). Add `NAV_VLLM_FALLBACK_URL` env. Add `_get_vlm_client()` with 0.5s connect. Add fallback loop in `_ask_nav_combined` and `_ask_nav_return`. Add health preflight in `handle_navigate_robot`. |
| 2 | `services/annie-voice/tests/test_robot_tools.py` | Fix assertion at line 1094: `["thinking"]` → `["enable_thinking"]`. Add test for fallback behavior. |
| 3 | `start.sh` | Add `NAV_VLLM_URL`, `NAV_VLLM_FALLBACK_URL`, `NAV_VLLM_MODEL` to `start_annie()` env block (~line 513). Add Panda health check after launch. Guard IndicF5 command. |
| 4 | `docs/RESOURCE-REGISTRY.md` | Mark IndicF5 retired. Update llamacpp to "always loaded". Recalculate VRAM. Add change log entry. |
| 5 | `scripts/benchmark_*.py` (3 files) | Add header docstrings ("Run on Panda only") |

**Not in git (Panda SSH):**
| 6 | `/etc/systemd/system/panda-llamacpp.service` on Panda | Create systemd unit (Type=oneshot) |
| 7 | IndicF5 service removal | `systemctl disable` + delete service file |
| 8 | Disk cleanup | Remove vLLM image + HF cache (~30 GB) |

## Agent Execution Strategy

Use **3 agents** — 2 parallel, then 1 sequential:

| Agent | Tasks | Machine |
|-------|-------|---------|
| **A** (parallel) | Code changes: fix enable_thinking, add fallback logic, update start.sh, update registry, add script headers, commit | Local |
| **B** (parallel) | Panda ops: verify container, retire IndicF5, create systemd unit, cleanup 30GB | Panda SSH |
| **C** (after A+B) | Deploy: push, git pull Titan, restart Annie, verify env vars, E2E test, fallback test | Titan/Pi SSH |

## Start Command

```
cat ~/.claude/plans/glowing-snacking-toucan.md
```

Then implement the plan using `superpowers:subagent-driven-development` or `superpowers:executing-plans`. All adversarial findings are already addressed in the plan.

## Verification

1. `grep "enable_thinking" services/annie-voice/robot_tools.py` — 2 matches (was 0 before)
2. `pytest services/annie-voice/tests/test_robot_tools.py -v -k "thinking or nav"` — all green
3. `ssh panda "systemctl is-active panda-llamacpp"` — active
4. `ssh panda "curl -s localhost:11435/health"` — `{"status":"ok"}`
5. `ssh titan "cat /proc/$(pgrep -f '.venv/bin/python server.py')/environ | tr '\0' '\n' | grep NAV"` — 3 vars set
6. Telegram: "Annie, explore the room for 3 cycles" — robot moves, returns summary
7. `ssh panda "sudo systemctl stop panda-llamacpp"` then send nav command — should fall back to Titan (check logs)
8. `ssh panda "df -h ~/"` — ~30 GB freed from dead-end cleanup
9. `git status` — clean working tree