# Next Session: ROS2 slam_toolbox Implementation (Proposal B, Phase 1)

## What

Replace the existing Python HectorSLAM daemon (`slam.py`) with ROS2 `slam_toolbox` running in Docker on the Pi 5. This fixes two known issues: **7.9x rotation drift** (session 69, no loop closure) and **28% distance undercount** (session 66, no scan-to-scan odometry). The API surface (`/pose`, `/map`, `/slam/reset`) stays identical. Annie's VLM navigation is completely unaffected — SLAM is purely additive.

The existing `slam.py` is NOT renamed or deleted; it stays as the default fallback. A new `slam_bridge.py` implements the same interface but delegates to ROS2 via a msgpack WebSocket bridge. The backend is selected with the `SLAM_BACKEND` environment variable: `SLAM_BACKEND=ros2` enables the bridge, and `SLAM_BACKEND=hector` keeps the original daemon.
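
A minimal sketch of how the backend switch might be resolved, assuming `hector` is the default when the variable is unset and that unrecognized values fall back to the existing daemon. The function name `select_slam_backend` is hypothetical; the actual conditional in `main.py` is defined in the plan.

```python
import os


def select_slam_backend(env=None):
    """Return "ros2" or "hector" based on the SLAM_BACKEND variable.

    Hypothetical helper: the fallback-on-unknown-value policy is an
    assumption, chosen so a typo never prevents the robot from starting.
    """
    env = os.environ if env is None else env
    value = env.get("SLAM_BACKEND", "hector").strip().lower()
    return "ros2" if value == "ros2" else "hector"
```

Passing the environment as a parameter keeps the selection logic testable without touching the process environment.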

## Plan

**Path:** `~/.claude/plans/iterative-baking-cook.md`

Read the plan first — it has the full implementation, all adversarial review findings (35 issues, all addressed), state machine, pre-mortem (13 failure scenarios), and design decisions.

## Key Design Decisions (from adversarial review)

1. **msgpack binary protocol, not JSON** — JSON serialization of 500 lidar points at 10Hz adds 20-50ms latency. msgpack cuts this to ~5ms.
2. **Heading delta computed on native side** — NOT over WebSocket. `consume_heading_delta_deg()` is destructive and atomic. SlamBridge tracks heading delta locally, sends it with sequence numbers for drop detection.
3. **Single asyncio event loop** — The `websockets` library is NOT thread-safe for concurrent send+recv. SlamBridge uses one asyncio event loop with send/recv as tasks and an `asyncio.Queue(maxsize=5)` for non-blocking sends.
4. **Two bridge nodes, not one** — `sensor_bridge.py` (WS→ROS2) and `pose_bridge.py` (ROS2→WS). A bug in map PNG rendering cannot kill scan ingestion.
5. **Safety daemon pinned to core 0** — `os.sched_setaffinity(0, {0})` in safety.py's `run()`. Docker pinned to cores 1-3 via `--cpuset-cpus=1-3`.
6. **CONNECTING state** — Between WS connect and first pose from slam_toolbox (5-15s startup cascade). 30s timeout → DEGRADED.
7. **Reset ack protocol** — `reset_seq` numbers prevent stale pre-reset poses from clobbering the zeroed state.
8. **Auto-save maps every 60s** — pose_bridge calls slam_toolbox serialize service. Auto-loads on startup.
9. **Startup angle self-test** — First 10 scans verify nearest point is within ±45° of forward. If not, refuse to publish (catches convention errors).
10. **No slam.py rename** — Keep git blame intact. slam_bridge.py is a new file. Conditional import in main.py.
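
The reset ack protocol in decision 7 can be illustrated with a small sketch. It assumes each pose message carries the `reset_seq` it was computed under and that any pose with a different `reset_seq` is discarded. The class names and the `x`, `y`, `heading_deg` fields are hypothetical, not taken from the plan.

```python
from dataclasses import dataclass


@dataclass
class PoseState:
    x: float = 0.0
    y: float = 0.0
    heading_deg: float = 0.0
    reset_seq: int = 0


class PoseTracker:
    """Holds the latest pose and rejects poses from before the last reset."""

    def __init__(self):
        self.state = PoseState()

    def reset(self) -> int:
        # Zero the pose and bump reset_seq; the caller would send the new
        # value to the SLAM side so later poses can echo it back.
        self.state = PoseState(reset_seq=self.state.reset_seq + 1)
        return self.state.reset_seq

    def on_pose(self, msg: dict) -> bool:
        # A pose tagged with a different reset_seq was computed before the
        # reset was acknowledged, so it must not overwrite the zeroed state.
        if msg["reset_seq"] != self.state.reset_seq:
            return False
        self.state = PoseState(
            msg["x"], msg["y"], msg["heading_deg"], msg["reset_seq"]
        )
        return True
```

With this shape, a pose that was already in flight when `/slam/reset` was called is dropped instead of briefly restoring the old position.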

## Files to Modify

Ordered by implementation phase:

### Phase 0 (preparation, no behavior change)
1. `services/turbopi-server/imu.py` — add `get_gyro_z_dps()` (3 lines)
2. `services/turbopi-server/lidar.py` — add `get_scan_snapshot()` (8 lines)
3. `services/turbopi-server/safety.py` — add `os.sched_setaffinity(0, {0})` (3 lines)
4. `services/turbopi-server/requirements.txt` — add websockets, msgpack (2 lines)
5. `services/turbopi-server/slam_protocol.py` — NEW: shared message TypedDicts (~60 lines)
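
As a rough idea of what the shared TypedDicts in `slam_protocol.py` could look like, the sketch below defines one scan message and one pose message, plus a helper for the sequence-number drop detection from decision 2. All field names and the `count_dropped` helper are assumptions for illustration; the real message layout is in the plan.

```python
from typing import List, Optional, TypedDict


class ScanMsg(TypedDict):
    """Native side -> sensor_bridge (field names are hypothetical)."""
    type: str                 # e.g. "scan"
    seq: int                  # increments by 1 per message sent
    stamp: float              # seconds, sender clock
    ranges_m: List[float]     # one range per lidar point
    heading_delta_deg: float  # heading change since the previous message


class PoseMsg(TypedDict):
    """pose_bridge -> native side (field names are hypothetical)."""
    type: str                 # e.g. "pose"
    reset_seq: int            # reset generation this pose belongs to
    x: float
    y: float
    heading_deg: float


def count_dropped(prev_seq: Optional[int], seq: int) -> int:
    """Number of messages missing between two consecutively received seqs."""
    if prev_seq is None:
        return 0  # first message: nothing to compare against
    return max(0, seq - prev_seq - 1)
```

Because TypedDicts are plain dicts at runtime, they can be passed to a msgpack encoder without any conversion step.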

### Phase 1 (Docker ROS2 container)
6. `services/ros2-slam/Dockerfile` — NEW (~45 lines)
7. `services/ros2-slam/docker-compose.yml` — NEW (~20 lines, build only)
8. `services/ros2-slam/slam.service` — NEW: systemd unit (~15 lines)
9. `services/ros2-slam/sensor_bridge.py` — NEW: WS server → ROS2 (~200 lines)
10. `services/ros2-slam/pose_bridge.py` — NEW: ROS2 → WS client (~150 lines)
11. `services/ros2-slam/conversions.py` — NEW: pure-Python math (~80 lines)
12. `services/ros2-slam/config/slam_toolbox.yaml` — NEW (~75 lines)
13. `services/ros2-slam/config/ekf.yaml` — NEW (~55 lines)
14. `services/ros2-slam/config/rf2o.yaml` — NEW (~12 lines)
15. `services/ros2-slam/launch/slam.launch.py` — NEW (~85 lines)
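
The kind of pure-Python math that `conversions.py` is likely to need can be sketched as follows. These are the standard yaw/quaternion formulas for rotation about the Z axis; the function names are hypothetical, and which conversions the real file contains is defined by the plan.

```python
import math


def quaternion_from_yaw(yaw_rad):
    """Return (x, y, z, w) for a rotation of yaw_rad about the Z axis."""
    half = yaw_rad / 2.0
    return (0.0, 0.0, math.sin(half), math.cos(half))


def yaw_from_quaternion(x, y, z, w):
    """Extract yaw (rotation about Z) in radians, in the range [-pi, pi]."""
    siny_cosp = 2.0 * (w * z + x * y)
    cosy_cosp = 1.0 - 2.0 * (y * y + z * z)
    return math.atan2(siny_cosp, cosy_cosp)


def normalize_deg(angle_deg):
    """Wrap an angle in degrees into [0, 360)."""
    return angle_deg % 360.0
```

Keeping these functions free of ROS2 imports is what lets `test_conversions.py` run with plain `pytest`, outside the container.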

### Phase 2 (SlamBridge)
16. `services/turbopi-server/slam_bridge.py` — NEW (~300 lines)
17. `services/turbopi-server/main.py` — SLAM_BACKEND conditional (~20 lines)
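
Design decision 3 describes SlamBridge's send path as a bounded `asyncio.Queue(maxsize=5)` drained inside a single event loop. The sketch below shows one way that could work. The drop-oldest policy when the queue is full is an assumption (the plan may choose differently), and `enqueue_latest`, `sender`, and the `send` callable are hypothetical names.

```python
import asyncio


def enqueue_latest(queue: asyncio.Queue, item) -> bool:
    """Enqueue without blocking; if full, discard the oldest entry first.

    Returns True when an old entry was dropped. Must be called from the
    event loop thread, since asyncio.Queue is not thread-safe.
    """
    dropped = False
    if queue.full():
        try:
            queue.get_nowait()
            queue.task_done()  # keep join() accounting balanced
            dropped = True
        except asyncio.QueueEmpty:
            pass
    queue.put_nowait(item)
    return dropped


async def sender(queue: asyncio.Queue, send):
    """Single task that owns all sends; `send` is an async callable."""
    while True:
        item = await queue.get()
        try:
            await send(item)
        finally:
            queue.task_done()
```

Dropping the oldest scan favors fresh data over completeness, which suits a 10Hz lidar stream where a stale scan has little value. The sequence numbers sent with each message let the receiver notice the gap.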

### Phase 3 (tests)
18. `services/turbopi-server/test_slam_bridge.py` — NEW (~400 lines)
19. `services/ros2-slam/test_conversions.py` — NEW (~150 lines)
20. `services/ros2-slam/test_docker.sh` — NEW (~30 lines)

## Start Command

```
Read the plan at ~/.claude/plans/iterative-baking-cook.md.
Then implement Phase 0 first (4 small changes to existing files + 1 new protocol file).
Run existing tests to verify no regressions.
Then Phase 1 (Docker container + ROS2 nodes).
Then Phase 2 (SlamBridge).
Then Phase 3 (tests).
Pi must be powered on for Phase 5 (deployment + E2E verification).
```

## Verification

1. **Phase 0**: `pytest services/turbopi-server/` — all existing tests pass
2. **Phase 1**: `cd services/ros2-slam && docker compose build` succeeds on Pi 5
3. **Phase 2**: `pytest services/turbopi-server/test_slam_bridge.py` — all new tests pass
4. **Phase 3**: `pytest services/ros2-slam/test_conversions.py` — pure-Python tests pass
5. **Phase 5 on Pi**:
   - `docker exec turbopi-ros2-slam ros2 topic hz /scan` → ~10Hz
   - `curl /pose` returns valid JSON with correct keys
   - `curl /map -o map.png` returns valid PNG
   - Safety daemon ESTOP fires under `stress-ng` load on cores 1-3
   - 360° rotation: heading error <10°
   - Rectangular path: return-to-start error <0.5m
   - Rollback: `SLAM_BACKEND=hector` restores original behavior
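
The two motion checks above (heading error after a 360° rotation, return-to-start error after a rectangular path) can be scored with a couple of small helpers. This is only a sketch: the function names are hypothetical, and it assumes headings are in degrees and positions in meters, as read from `/pose`.

```python
import math


def heading_error_deg(measured_deg, expected_deg):
    """Signed smallest difference measured - expected, in [-180, 180)."""
    return (measured_deg - expected_deg + 180.0) % 360.0 - 180.0


def return_to_start_error_m(start_xy, end_xy):
    """Straight-line distance between the start and end positions."""
    return math.hypot(end_xy[0] - start_xy[0], end_xy[1] - start_xy[1])


def rotation_check_passes(start_deg, end_deg, tol_deg=10.0):
    """After a full 360° turn the heading should match the start heading."""
    return abs(heading_error_deg(end_deg, start_deg)) < tol_deg
```

The wrap-around matters here: a reading of 359° after starting at 0° is a 1° error, not 359°.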
