# Next Session: Fix Zenoh Version Mismatch + Deploy SLAM

## What

The ROS2 SLAM stack (slam_toolbox + rf2o + EKF in Docker on Pi 5) is fully implemented but blocked by a Zenoh wire-protocol mismatch. The Jazzy apt package (`ros-jazzy-rmw-zenoh-cpp` 0.2.9) ships zenoh 0.x, while the native Python client (`eclipse-zenoh` 1.9.0) speaks wire v1, so the two sides cannot exchange messages. Fix: rebuild rmw_zenoh from source (jazzy branch = zenoh 1.7.1, wire v1). This session also fixes a stale reset-ack bug in `slam_bridge.py` and silent failures in `pose_publisher.py`.

## Plan

`~/.claude/plans/floating-frolicking-kahn.md` — adversarial-reviewed (6 CRITICAL, 5 HIGH, 6 MEDIUM found, all addressed).

Read the plan first — it has the full implementation, all review findings, and design decisions.

## Key Design Decisions (from adversarial review)

1. **Multi-stage Docker build** — Rust toolchain only in build stage, not in runtime image (saves ~2GB)
2. **`CARGO_BUILD_JOBS=2` + `colcon --executor sequential`** — prevents OOM on Pi 5 during Rust link
3. **Pinned commit hash** — `git checkout <SHA>` instead of `git clone -b jazzy` for deterministic builds
4. **Split RUN layers** — rosdep install separate from colcon build, so apt failures don't restart 15-min Rust compile
5. **zenohd healthcheck** — `service_healthy` condition in depends_on, not just container-started
6. **`_on_reset_ack` uses `==` not `>=`** — prevents stale replay acks from zeroing pose (was a bug)
7. **KEEP `zenoh_session_config.json5`** — it's NOT a workaround, it's required for client→router topology
8. **`SLAM_BACKEND=ros2`** must be set in systemd override — default is `hector`, bridge won't start without it
9. **Buildx on Titan is the primary build path** — 5 min vs 20 min, no OOM risk. Pi 5 build is fallback.
10. **Type hash verification post-deploy** — `ros2 topic list -t` inside Docker confirms matching types
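
Decision 6 can be illustrated with a minimal sketch. This is not the code in `slam_bridge.py`: the class `ResetTracker`, its fields, and the idea that each reset request carries a sequence number that restarts when the bridge process restarts are all assumptions made for illustration. Under those assumptions, a replayed ack from an earlier run can carry a larger number than the pending request, so a `>=` check would accept it while an exact match does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResetTracker:
    """Hypothetical stand-in for the reset bookkeeping in slam_bridge.py."""

    pending_seq: Optional[int] = None  # reset request we are still waiting on
    next_seq: int = 1                  # assumed to restart at 1 with the process
    pose_zeroed: int = 0               # how many times the pose was zeroed

    def request_reset(self) -> int:
        """Issue a new reset request and remember its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending_seq = seq
        return seq

    def on_reset_ack(self, ack_seq: int) -> bool:
        """Zero the pose only when the ack matches the pending request exactly.

        With `ack_seq >= self.pending_seq` a replayed ack from a previous run
        (for example seq 57 against a pending seq 1) would also pass.
        """
        if self.pending_seq is None or ack_seq != self.pending_seq:
            return False  # stale, replayed, or unsolicited ack: ignore it
        self.pending_seq = None
        self.pose_zeroed += 1
        return True
```

Clearing `pending_seq` after a match also means a duplicate of the correct ack is ignored the second time.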

## Files to Modify

1. `services/ros2-slam/Dockerfile` — multi-stage build, replace apt rmw_zenoh with source build, Rust toolchain, memory limits, improved entrypoint logging
2. `services/ros2-slam/docker-compose.yml` — source rmw_ws overlay in zenohd, add healthcheck, service_healthy
3. `services/turbopi-server/slam_bridge.py` — fix `_on_reset_ack` condition `>=` → `==`
4. `services/ros2-slam/src/pose_publisher.py` — move client to `__init__`, fix map save callback error handling
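
The shape of the `pose_publisher.py` change (item 4) might look like the sketch below. It is deliberately framework-free so it runs without ROS installed: `MapSaver`, `create_client`, and `call_async` are hypothetical names, and `concurrent.futures.Future` stands in for the future a service call would return. The two points it shows are creating the client once in `__init__` and making the done-callback record failures instead of dropping them.

```python
from concurrent.futures import Future
from typing import Callable, List


class MapSaver:
    """Hypothetical sketch; not the real pose_publisher.py node."""

    def __init__(self, create_client: Callable[[], object]) -> None:
        # Create the service client once, at construction time,
        # instead of building a new one on every save request.
        self._client = create_client()
        self.errors: List[str] = []
        self.saved = 0

    def save_map(self, call_async: Callable[[object], Future]) -> Future:
        """Start an asynchronous save and attach the completion handler."""
        future = call_async(self._client)
        future.add_done_callback(self._on_save_done)
        return future

    def _on_save_done(self, future: Future) -> None:
        # result() re-raises whatever exception the call failed with.
        # Catching it here keeps the failure visible to the caller.
        try:
            future.result()
        except Exception as exc:
            self.errors.append(f"map save failed: {exc!r}")
            return
        self.saved += 1
```

In the real node the `errors` list would more likely be a logger call plus a state flag; the list is used here only so the behavior can be checked.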

## Start Command

```
cat ~/.claude/plans/floating-frolicking-kahn.md
```

Then implement the plan task by task. All adversarial findings are already addressed in it.

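Before starting, pin the rmw_zenoh commit: look up the current head of the `jazzy` branch and use the SHA it prints for `git checkout <SHA>` in the Dockerfile: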
```
git ls-remote --refs https://github.com/ros2/rmw_zenoh.git jazzy
```
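
If the lookup needs to be scripted, a small helper along these lines could extract the SHA from the command's output. The function names are hypothetical; the only assumption about `git ls-remote` is its usual output of one `<sha><TAB><ref>` pair per line.

```python
import re
import subprocess

RMW_ZENOH_URL = "https://github.com/ros2/rmw_zenoh.git"


def parse_branch_sha(ls_remote_output: str, branch: str = "jazzy") -> str:
    """Return the commit SHA for refs/heads/<branch> from ls-remote output."""
    wanted = f"refs/heads/{branch}"
    for line in ls_remote_output.splitlines():
        parts = line.split()
        # Match the branch ref exactly so a tag with the same name is skipped.
        if len(parts) == 2 and parts[1] == wanted:
            sha = parts[0]
            if not re.fullmatch(r"[0-9a-f]{40}", sha):
                raise ValueError(f"unexpected SHA format: {sha!r}")
            return sha
    raise LookupError(f"{wanted} not found in ls-remote output")


def fetch_branch_sha(url: str = RMW_ZENOH_URL, branch: str = "jazzy") -> str:
    """Run git ls-remote (needs network access) and parse the result."""
    result = subprocess.run(
        ["git", "ls-remote", "--refs", url, branch],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_branch_sha(result.stdout, branch)
```

Calling `fetch_branch_sha()` once and pasting the result into the Dockerfile keeps the build deterministic even after the branch moves.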

## Verification

1. Run existing tests: `cd services/turbopi-server && python -m pytest tests/ -x -q` (23 SLAM + 81 nav tests)
2. Build image via buildx on Titan: `docker buildx build --platform linux/arm64 -t ros2-slam:latest --load services/ros2-slam/`
3. Ship to Pi: `docker save ros2-slam:latest | ssh pi "docker load"`
4. Deploy: `ssh pi "cd ~/workplace/her/her-os/services/ros2-slam && docker compose down && docker compose up -d"`
5. Set env: verify/create systemd override with `SLAM_BACKEND=ros2`, restart turbopi-server
6. Verify type hashes: `docker exec ros2-slam-ros2-slam-1 bash -c 'source /opt/ros/jazzy/setup.bash && source /rmw_ws/install/setup.bash && ros2 topic list -t'` (this lists type names per topic; `ros2 topic info -v <topic>` additionally prints the type hash)
7. Verify SLAM: `curl http://192.168.68.61:8080/pose` → `state: running`
8. Drive robot forward → `y_m` changes in repeated `/pose` calls
9. Fetch map: `curl http://192.168.68.61:8080/map -o /tmp/map.png` → valid PNG
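
Steps 7 to 9 can be checked mechanically with a few small helpers. This is a sketch under assumptions: it takes the `/pose` response to be a JSON object with top-level `state` and `y_m` keys, and the 0.05 m movement threshold is an arbitrary illustrative value. The PNG check relies only on the standard 8-byte PNG signature.

```python
import json
from typing import Iterable

# Every PNG file starts with this fixed 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """True if the downloaded map bytes begin with the PNG signature."""
    return data.startswith(PNG_MAGIC)


def slam_running(pose_json: str) -> bool:
    """True if a /pose response reports state == "running" (assumed key)."""
    return json.loads(pose_json).get("state") == "running"


def y_moved(pose_samples: Iterable[str], min_delta_m: float = 0.05) -> bool:
    """True if y_m varies by at least min_delta_m across the /pose samples."""
    ys = [float(json.loads(sample)["y_m"]) for sample in pose_samples]
    return len(ys) >= 2 and max(ys) - min(ys) >= min_delta_m
```

Feed `slam_running` and `y_moved` the bodies returned by repeated `curl` calls to `/pose`, and `is_png` the contents of `/tmp/map.png`.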

## Deferred (from review — not blocking this session)

- IMU loss not surfaced in SLAM state (pre-existing, not caused by Zenoh fix) → address in Phase 2 (Annie integration)
