# Next Session: ROS2 SLAM on TurboPi — Research + Hybrid VLM Architecture

## Goal

Explore bringing SLAM to the TurboPi via ROS2, but **research-first** — don't jump into implementation. Specifically, investigate a **hybrid architecture** where traditional SLAM provides the spatial backbone (map, localization, path planning) and Annie's VLM provides semantic understanding (what she sees, goal interpretation, obstacle context).

## Context

Session 84 cloned `vendor/mentorpi/` (Matzefritz/HiWonder_MentorPi ROS2 starter pack) and discovered:
- MentorPi's `odom_raw` is dead-reckoning from `cmd_vel` — no encoder reads in code
- Real odometry comes from `rf2o_laser_odometry` (lidar scan matching at 10Hz)
- Our RPLIDAR C1 (DenseBoost, 10.24m) is better than MentorPi's LDRobot STL-19P
- EKF fuses `odom_raw` + `odom_rf2o` + IMU → single `/odom` topic
- `slam_toolbox` consumes fused odom + lidar scans → occupancy grid map

The naive path is: install ROS2 in Docker → rf2o → slam_toolbox → Nav2 → done. But this ignores Annie's existing VLM capabilities and risks building a "dumb robot with a map" instead of an "intelligent agent that also has a map."

## Research Questions (answer ALL before any implementation)

### R1: ROS2 on Pi 5 — Docker vs Native?

MentorPi uses native ROS2 Jazzy on Ubuntu 24.04. Our Pi runs Raspberry Pi OS (Bookworm). Options:
- **Docker**: Isolated, no OS conflict, but serial device passthrough for lidar/IMU adds complexity
- **Native**: Requires Ubuntu or source-building ROS2 on Bookworm — heavy lift
- **Hybrid**: ROS2 in Docker, our turbopi-server stays native, communicate via shared topics or bridge

Research: What do people actually run on Pi 5 with RPi OS? Is there a well-tested Docker image for ROS2 Jazzy on aarch64? What's the latency overhead of Docker for real-time sensor data?
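If Docker wins, the shape of the run command is worth sketching now, because serial passthrough and DDS discovery are the two known pain points. Untested sketch: the multi-arch official `ros:jazzy` image does exist, but the device paths, volume name, and the need for `--network host` are assumptions to confirm against our hardware.

```shell
# Sketch only — not yet run on our Pi 5. Device paths are guesses;
# confirm the lidar/IMU nodes with:  ls -l /dev/serial/by-id/
# --network host keeps DDS multicast discovery simple (no port mapping).
docker run -it --rm \
  --network host \
  --device /dev/ttyUSB0 \
  --device /dev/ttyACM0 \
  -v ros2_ws:/ws \
  ros:jazzy \
  ros2 topic list
```

If `--network host` turns out to be unacceptable, a DDS discovery server or a configured bridge network is the fallback to research.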

### R2: rf2o_laser_odometry — Does it work with RPLIDAR C1?

rf2o expects LaserScan messages. Our lidar daemon already produces scan data. Questions:
- Does `rplidar_ros2` (official SLAMTEC package) work with C1 model specifically?
- What frame rate does rf2o need? Our lidar runs at ~10Hz — is that sufficient?
- Does the CCW angle convention (our lidar's quirk) need correction before publishing?
- Any gotchas with the DenseBoost scan mode (variable point density)?
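If R2 concludes the angle convention does disagree with what rf2o/REP-103 expects, the correction is small: mirror the sweep before publishing. A sketch, assuming flat arrays rather than real `sensor_msgs/LaserScan` fields (the helper name is invented; whether the flip is needed at all is the open question):

```python
def mirror_scan(angle_min, angle_increment, ranges):
    """Mirror a scan's rotation sense (CW <-> CCW).

    Sample i at angle theta_i maps to -theta_i. Reversing the range
    array keeps angles monotonically increasing with the same
    (positive) increment, which is what LaserScan consumers expect.
    """
    n = len(ranges)
    angle_max = angle_min + angle_increment * (n - 1)
    return -angle_max, angle_increment, list(reversed(ranges))
```

For a 3-beam scan starting at 0° with 1° steps, the mirrored scan starts at -2° with the ranges reversed — each beam now sits at the negated angle of its original.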

### R3: How do existing VLM+SLAM hybrid systems work?

This is the key research question. The naive approach (SLAM for navigation, VLM for perception) misses opportunities for deeper integration. Explore:

- **Semantic SLAM** — systems that fuse object detection with occupancy grids (e.g., OVSLam, ConceptGraphs, NLMap). Can VLM labels be added to map cells?
- **VLM-guided exploration** — instead of random frontier exploration, use VLM to decide "that looks like a kitchen" and prioritize exploration toward goal-relevant areas
- **Natural language goals** — "go to the kitchen table" requires mapping VLM scene understanding to SLAM coordinates. How do systems like SayCan, NLMap, or LM-Nav bridge this?
- **Visual place recognition** — can VLM embeddings replace or augment loop closure in SLAM?
- **Affordance mapping** — beyond "where can I go" (occupancy), "what can I do here" (VLM-inferred affordances on map regions)
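For the place-recognition bullet, the common pattern in the literature is embedding similarity proposing loop-closure *candidates* that geometric scan matching then verifies — the VLM augments rather than replaces the check. A toy sketch (vectors and threshold are invented; real embeddings would come from the VLM's image encoder):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def loop_closure_candidates(current, keyframes, threshold=0.9):
    """Indices of stored keyframe embeddings similar to the current
    view — candidates to hand to slam_toolbox for geometric
    verification, not confirmed closures."""
    return [i for i, emb in enumerate(keyframes)
            if cosine(current, emb) >= threshold]
```

The research question then becomes what threshold and embedding dimensionality are workable at our frame budget.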

### R4: What's the right hybrid architecture for Annie?

Annie currently has:
- **VLM nav** (streaming, Gemma 4 E2B on Panda) — semantic goal-seeking, ~18ms/frame, no persistent map
- **ArUco homing** — fiducial-based, no VLM, no map
- **Lidar safety daemon** — reactive obstacle avoidance, no map
- **IMU** — heading via Pico MPU-6050

Possible hybrid architectures:

**A: SLAM as infrastructure, VLM as commander**
```
Annie (LLM) → "go to the kitchen"
    ↓
VLM identifies current scene → matches to known map region
    ↓
Nav2 plans path on occupancy grid → executes
    ↓
VLM confirms arrival ("I see the kitchen counter")
```

**B: VLM-annotated SLAM map**
```
SLAM builds occupancy grid (walls, obstacles)
VLM periodically labels regions ("bedroom", "hallway", "kitchen")
Annie queries: "where did I last see Rajesh's keys?" → retrieval from annotated map
```
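Option B's semantic layer doesn't have to live inside slam_toolbox; a sidecar store keyed by map coordinates may be enough. A minimal sketch with invented names — region granularity, label format, and the query API are all open design questions:

```python
import time

class AnnotatedMap:
    """Semantic sidecar for an occupancy grid: SLAM owns the cells,
    the VLM periodically contributes (position, caption) annotations.
    All names here are illustrative, not an existing API."""

    def __init__(self):
        self.labels = []  # (timestamp, (map_x, map_y), caption)

    def annotate(self, map_xy, caption):
        self.labels.append((time.time(), map_xy, caption))

    def last_seen(self, phrase):
        """'Where did I last see X?' -> map coords of the most recent
        annotation mentioning the phrase, or None."""
        for _t, xy, caption in reversed(self.labels):
            if phrase in caption:
                return xy
        return None
```

A real version needs substring matching replaced by something semantic (embedding search), but the data shape is the point.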

**C: Dual-loop control**
```
Outer loop: Annie (LLM) sets goals, interprets context
Middle loop: SLAM/Nav2 handles path planning + execution
Inner loop: Lidar safety daemon handles reactive avoidance
VLM runs in parallel: scene understanding, anomaly detection, goal verification
```
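Option C is mostly a question of rate separation. A toy scheduler makes the layering concrete — the rates here are invented (100 Hz safety, 10 Hz Nav2, 0.5 Hz LLM), and a real ROS2 system would use executors and timers rather than a hand-rolled tick:

```python
def dual_loop_tick(t_ms, safety, nav, annie):
    """Illustrative scheduler for architecture C's layering.
    Called every 10 ms; each callback runs at its layer's rate."""
    safety()              # inner loop: reactive avoidance, every tick
    if t_ms % 100 == 0:
        nav()             # middle loop: path following, 10 Hz
    if t_ms % 2000 == 0:
        annie()           # outer loop: goal setting / context, 0.5 Hz
```

The design point: the inner loop must never wait on the outer ones, which is why the lidar safety daemon stays a separate process regardless of which architecture wins.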

**D: Minimal SLAM — just localization, no Nav2**
```
Skip Nav2 path planning entirely. Use SLAM only for "where am I on the map?"
Annie's VLM still drives navigation decisions (turn left, go forward)
But now she knows WHERE she is, can build spatial memory over time
```
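Option D's "spatial memory over time" could be as small as a pose-tagged caption log: SLAM supplies (x, y), the VLM supplies what was seen there, and recall returns a coordinate the existing VLM nav can steer toward. A sketch with invented names:

```python
class SpatialMemory:
    """Architecture D sketch: localization-only SLAM plus a VLM
    caption log. Fields and the centroid heuristic are illustrative."""

    def __init__(self):
        self.observations = []  # (x, y, vlm_caption)

    def remember(self, x, y, caption):
        self.observations.append((x, y, caption))

    def recall(self, phrase):
        """Centroid of all poses whose caption mentioned the phrase,
        or None if the place was never observed."""
        pts = [(x, y) for x, y, c in self.observations if phrase in c]
        if not pts:
            return None
        n = len(pts)
        return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)
```

The centroid is a crude stand-in; clustering observations into rooms is one of the things R3's papers presumably handle better.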

Research which approach fits our hardware constraints (Pi 5 CPU budget, Panda GPU for VLM, Titan for LLM).

### R5: CPU/memory budget on Pi 5

Currently running on Pi 5:
- turbopi-server (FastAPI, camera, lidar daemon, safety daemon, homing)
- Hailo-8 NPU (when safety daemon active)
- Pico IMU bridge (USB serial, minimal CPU)

Adding ROS2 + rf2o + slam_toolbox on top: what's the CPU cost? The memory cost? Does the Pi 5 (16GB) have headroom?
Check: what does MentorPi's Pi 5 actually use in terms of resources when SLAM is running?
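Answering this is mostly a matter of before/after snapshots on the Pi itself. A tiny helper for logging RAM headroom from `/proc/meminfo` (the sample text below is fabricated for illustration; on the Pi, pass the real file contents):

```python
def mem_headroom_mb(meminfo_text):
    """Return MemAvailable in MB from /proc/meminfo-style text.
    On the Pi: mem_headroom_mb(open('/proc/meminfo').read()).
    Log it before and after launching the ROS2 stack."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # kB -> MB
    return None

# Fabricated sample roughly matching a 16GB Pi 5:
sample = "MemTotal:       16384000 kB\nMemAvailable:   12288000 kB\n"
```

CPU is better measured per-process (`pidstat`/`top` against the rf2o and slam_toolbox PIDs) than in Python; the memory delta is the number most likely to decide Docker vs native.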

### R6: Existing TurboPi-specific ROS2 work?

Search for anyone who has:
- Put ROS2 on a TurboPi specifically
- Used RPLIDAR C1 with slam_toolbox
- Combined Hailo NPU with ROS2 perception
- Used rf2o on a mecanum platform (lateral motion is tricky for scan matching)

## Research Approach

1. **Web search first** — semantic SLAM + VLM papers/projects (NLMap, ConceptGraphs, SayCan, LM-Nav, OK-Robot, VoxPoser, HomeRobot)
2. **GitHub search** — rf2o + rplidar, slam_toolbox + Pi 5, VLM + navigation, semantic mapping
3. **MentorPi repo deep-dive** — check if `vendor/mentorpi/` has any VLM/perception integration we missed
4. **Docker feasibility** — find and test a ROS2 Jazzy aarch64 Docker image
5. **Architecture decision** — pick one of A/B/C/D based on research findings

## What to do (and NOT do) this session

- Do NOT install ROS2 on the Pi
- Do NOT write any ROS2 nodes
- Do NOT start implementing any architecture
- DO produce a research doc (`docs/RESEARCH-SLAM-VLM-HYBRID.md`) with findings
- DO produce a concrete architecture recommendation with trade-offs
- DO identify which components already exist vs need building

## Start Command

```
Research hybrid VLM+SLAM architecture for TurboPi. Read docs/NEXT-SESSION-SLAM-ROS2-HYBRID.md.

This is a RESEARCH-ONLY session. Answer R1-R6, produce docs/RESEARCH-SLAM-VLM-HYBRID.md with findings and architecture recommendation. Do NOT install or implement anything.
```
