# Next Session: Hailo-8 L1 Reflex — Validate & Close Gaps (V2)

**Status of predecessor `NEXT-SESSION-HAILO-ACTIVATION.md`:** superseded. That spec assumed greenfield install; this V2 handoff reflects that the L1 Hailo reflex was already implemented in sessions 41-88 and the real work is validation + 11 concrete code changes surfaced by adversarial review.

---

## What

Activate and validate the Hailo-8 L1 safety reflex on the TurboPi Pi 5 (`192.168.68.61`). The `HailoSafetyDaemon` is already wired inside `turbopi-server` (see `services/turbopi-server/safety.py` and its 59-test suite at `services/turbopi-server/test_safety.py`). Pi was unreachable at plan-write time ("No route to host"), so first step is bringing the robot online and verifying HailoRT, `.hef`, and daemon health. Then apply 11 small code changes surfaced by adversarial review, run a distance calibration + WiFi-drop validation, update docs + MEMORY + perspectives, and hand off two explicitly-deferred follow-ups.

---

## Plan

**Full plan:** `~/.claude/plans/hailo-l1-activation.md`

> Read the plan first. It contains:
> - Executive Reframe (why this is validate-and-close-gaps, not greenfield)
> - Stage 0 Known Gotchas (camera WB, HailoRT 4.23 API quirks, FrameGrabber single-owner, `yolov8s_h8.hef` vs `h8l`, etc.)
> - Stage 1 Phases A-E (verification → install/repair → code changes → validation → docs)
> - Stage 1B State Machine (SA1 frame plumbing, SA2 ESTOP latch, SA3 sonar polling)
> - Stage 1C Pre-Mortem (8 failure scenarios with mitigations)
> - Stage 1D Scope Coherence (every sibling surface enumerated and classified)
> - Stage 6 Feedback Response table (23 review findings, 19 Implemented, 4 LOW)
> - Phases A-E (amended)
> - Success Criteria (one checkbox per review finding)

All adversarial findings are already addressed in the plan. Do NOT re-review — execute.

---

## Key Design Decisions (from adversarial review)

1. **Wedged-inference watchdog (C5 / WEDGE-1):** the safety daemon must write `_last_infer_ts` after every `pipeline.infer()` call; a separate `_WatchdogThread` fires ESTOP if no inference in 3s. Without this, a stuck Hailo call is invisible because `_healthy` only flips on stale frames. This is load-bearing for the L1 safety claim.

2. **Auto-clear race guard (C6 / RACE-3):** `_safety_estop_clear_callback` must `_motor_lock.acquire(blocking=False)` before clearing `_hardware_estop`. If the lock is held (a drive is in-flight), defer by one cycle. Prevents motors restarting while a prior drive's `finally` is still zeroing them.

3. **Camera-stale recovery (C7 / STALE-4):** when camera-stale ESTOP fires, switch to a slower 1-per-10s retry (not 1-per-second spam) and allow `_healthy` to re-arm when a fresh frame finally arrives. Add `_camera_stale_estop_latched` to `/status`.

4. **Lidar in auto-clear gate (C8 / LIDAR-6):** auto-clear must check forward-lidar-sector ≥ `SAFETY_DISTANCE_CM * 10` mm (when lidar is healthy) alongside yolo_clear + sonar_safe. Fail-open to yolo+sonar only if lidar unhealthy.

5. **Event-based startup (C9 / START-7):** replace `time.sleep(1.0)` at `safety.py:231` with `self._grabber._running.wait(timeout=5.0)` + first-frame poll; differentiate `STARTUP-ESTOP` from `RUNTIME-ESTOP` in log reason prefixes.

6. **Auto-clear callback gating (C10 / AUTOC-11):** add `self._is_estop_latched` daemon-side; only call `_estop_clear_callback` when ESTOP is actually latched, not every cycle at 30 Hz.

7. **`/obstacles.safe_forward` single source of truth (C11 / BEAR-12):** `safe_forward = not _hardware_estop.is_set() and ...`. Eliminates the current asymmetry where daemon ESTOPs on a 45°-bearing object but `/obstacles` reports `safe_forward=True` because of the 30° cone filter.

8. **WiFi-independence test rewritten (C1 / WIFI-8):** static-analysis + FakeBoard + `socket.socket` block — the original httpx monkey-patch tested nothing because the safety path doesn't use httpx and the test would hit an early-return at `_safety_estop_callback:398`.

9. **Distance calibration MANDATORY (D1a / DIST-2):** before golden-path, measure `/obstacles.distance_cm` at 30/50/80/120 cm against tape-measured truth. Pass if ≤±25% error. `REFERENCE_DISTANCES` in `safety.py:68-76` is handwritten, not calibrated.

10. **HailoRT + `.hef` version pin (PIN-5):** `KNOWN_GOOD_HAILORT="4.23.0"`; `.hef` from pinned `hailo_model_zoo@v2.13.0` asset with `sha256sum -c` verification.

11. **WiFi-drop test uses nav-goal + `iptables -I 1` + `trap`-cleanup script (D2NAV-15 / IPT-10 / TRAP-18):** `scripts/test_wifi_drop.sh` runs a navigation goal (routes through Panda VLM, so blocking Panda actually tests WiFi-Panda-independence) with `iptables -I OUTPUT 1` (insert-at-top; `-A` may land after existing ACCEPT rules and silently fail to block).

---

## Files to Modify (ordered)

| # | File | One-line description |
|---|------|----------------------|
| 1 | `services/turbopi-server/safety.py` | C5 watchdog + C7 stale recovery + C8 lidar gate + C9 event-based start + C10 is-latched flag |
| 2 | `services/turbopi-server/main.py` | C6 auto-clear race guard in `_safety_estop_clear_callback`; C11 `/obstacles.safe_forward` fix |
| 3 | `services/turbopi-server/test_wifi_independence_safety.py` (NEW) | C1 static-analysis + FakeBoard + socket-block test |
| 4 | `services/turbopi-server/test_safety.py` | C1b sonar-stale-blocks-auto-clear + C8 lidar-auto-clear + C5 watchdog + C7 recovery tests |
| 5 | `scripts/calibrate_hailo_distance.py` (NEW, ~40 lines) | D1a distance calibration script |
| 6 | `scripts/test_wifi_drop.sh` (NEW, ~30 lines) | D2/D2b WiFi-drop validation with trap cleanup |
| 7 | `docs/RESOURCE-REGISTRY.md` | E1 Strategic Architecture table row + Change Log + SHA-256 pin |
| 8 | `docs/perspectives-vlm-primary-hybrid-nav-v3.html` | E2 Lens 04 reframe |
| 9 | `docs/day-in-life-annie.html` + `docs/day-in-life-rajesh.html` | E3 WiFi-hiccup scene bidirectional sync |
| 10 | `~/.claude/projects/-home-rajesh-workplace-her-her-os/memory/project_hailo_l1_status.md` (NEW) | E4 MEMORY entry |
| 11 | `docs/PROJECT.md` | E5 open-questions update |
| 12 | `docs/NEXT-SESSION-PI-LOCAL-WANDER.md` (NEW) | E6 deferred wander + PUSH-23 + STOP-Scope follow-ups |
| 13 | `docs/NEXT-SESSION-DASHBOARD-HAILO-PANEL.md` (NEW) | E7 deferred dashboard panel |

---

## Start Command

```bash
cat ~/.claude/plans/hailo-l1-activation.md
```

Then execute Phases A → B (only if A surfaces gaps) → C → D → E **in order**. Do not skip validation (D1a calibration is mandatory — it may reveal that the whole ESTOP system is mis-calibrated). Do not update docs (E) until D passes.

**Safety first at every step:** if anything in Phase A surfaces a hardware fault (PCIe retrain, firmware mismatch, kernel module missing), STOP and diagnose before writing code. See Phase B-ROLLBACK in the plan for recovery from a bad HailoRT install.

---

## Verification

1. **Phase A** — all 7 A-checks pass (see Success Criteria in the plan)
2. **Phase C** — each of C1, C1b, C5-C11 has at least one green unit test; `pytest services/turbopi-server/` full-green
3. **Phase D1a** — distance error ≤ ±25% at 30/50/80/120 cm; otherwise add `REFERENCE_DISTANCES` patch and re-run
4. **Phase D1** — golden-path: backpack at 40 cm → ESTOP in log within ~33 ms of detection
5. **Phase D2** — run `scripts/test_wifi_drop.sh`; script exits 0 and auto-removes iptables rule on EXIT/INT/TERM; robot holds position, no collision
6. **Phase D3** — `ssh panda 'nvidia-smi --query-gpu=memory.used --format=csv,noheader'` delta is near-zero (validates the base-spec "800 MB" claim was wrong)
7. **Phase E** — every doc/MEMORY target in the Files-to-Modify table is committed; bidirectional narrative grep: `grep -l "WiFi hiccup\|Hailo reflex" docs/day-in-life-*.html` returns both files with equivalent scene content

Commit ceremony at the end:
- One commit per phase (A: no commit — verification only; B: `chore(turbopi): pin HailoRT 4.23 + sha256 .hef`; C: one commit per code change; D: `feat(turbopi): calibrate Hailo distance + wifi-drop test script`; E: `docs(hailo-l1): update perspectives + narrative + registry`).
- Push; Pi is deployed via `git pull` on Pi (not rsync; not start.sh — those are for Panda/Titan).
- Verify Pi is at HEAD: `ssh pi 'cd ~/workplace/her/her-os && git log --oneline -1'` matches laptop HEAD.
- Restart: `ssh pi 'sudo systemctl restart turbopi-server'`
- Re-run Phase A checks post-deploy.