# Research: rf2o Callback Not Firing via rmw_zenoh_cpp

**Date:** 2026-04-13 (session 90)
**Status:** Root cause identified, fix designed, not yet implemented

## Problem

rf2o_laser_odometry inside Docker (ROS2 Jazzy + rmw_zenoh_cpp) logs "Waiting for laser_scans" indefinitely despite `/scan` data flowing at 9.9 Hz. `ros2 topic echo /scan` works; rf2o's subscription callback never fires.

## Root Cause: `spin_some()` + rmw_zenoh_cpp async delivery

rf2o uses a non-blocking polling loop:

```cpp
while (rclcpp::ok()) { 
    rclcpp::spin_some(myLaserOdomNode);  // Zero-timeout poll (rcl_wait timeout=0)
    myLaserOdomNode->process();           // "Waiting for laser_scans...."
    rate.sleep();                         // Sleep remaining 100ms
}
```

In rmw_zenoh_cpp, data arrives on a Zenoh async callback thread → pushed into queue → condition_variable.notify_one(). But `spin_some()` with zero timeout **never blocks on the condition variable**. It checks once, finds nothing, returns immediately. Data arriving between polls is invisible until the next poll, but the timing window causes consistent misses.

**Why `ros2 topic echo` works:** Uses `rclcpp::spin()` (BLOCKING) which properly waits on the condition variable. Zenoh notification wakes it immediately.

**Why this didn't happen with DDS:** FastDDS/CycloneDDS deliver data more synchronously (shared memory, listener callbacks) — data visible to `rcl_wait` even at zero timeout.

## Secondary Bug: Wrong Environment Variable

`docker-compose.yml` sets `ZENOH_SESSION_CONFIG` but rmw_zenoh_cpp reads `ZENOH_SESSION_CONFIG_URI` (with `_URI` suffix).

**Source:** `vendor/rmw-zenoh/rmw_zenoh_cpp/src/detail/zenoh_config.cpp` line 43:
```cpp
{"ZENOH_SESSION_CONFIG_URI", "DEFAULT_RMW_ZENOH_SESSION_CONFIG.json5"}
```

Result: Custom session config (client mode, no multicast) never loaded. Nodes fall back to DEFAULT: `mode: "peer"`, multicast + gossip enabled. Doesn't directly cause the callback issue but creates unintended topology.

## Contributing: rf2o Dual-Node Bug (MAPIRlab/rf2o_laser_odometry#42)

`CLaserOdometry2D` inherits from `rclcpp::Node` AND `CLaserOdometry2DNode` also inherits from `rclcpp::Node`. Creates two ROS2 nodes per process, both named `/rf2o` via launch remapping. **This is the real source of "2 nodes with exact name /rf2o"** — not stale liveliness tokens from container restarts.

## Ruled Out

| Hypothesis | Why |
|---|---|
| Stale liveliness tokens | Liveliness is discovery-only; data routing uses topic key expressions independently |
| QoS mismatch | Both BEST_EFFORT/VOLATILE, confirmed via `ros2 topic info -v` |
| Key expression mismatch | Both zenoh_ros2_sdk and rmw_zenoh_cpp produce identical key: `0/scan/sensor_msgs::msg::dds_::LaserScan_/RIHS01_...` |
| Zenoh version (1.9.0 vs 1.7.1) | Both wire protocol v1, `ros2 topic echo` proves data crosses the boundary |
| rf2o data validation | Callback does NO validation — copies ranges unconditionally. Callback never fires at all. |
| CDR serialization | Both use CDR (rosbags vs FastCDR), wire-compatible |
| Parameter namespace | YAML key `rf2o_laser_odometry_node:` vs node name `rf2o` — defaults match intended values |

## Fix Plan (3 changes)

### Fix 1: Patch rf2o main loop (PRIMARY)

Replace `spin_some()` polling with `spin()` + timer (~5 line patch):

```cpp
// In CLaserOdometry2DNode constructor, add:
timer_ = this->create_wall_timer(
    std::chrono::milliseconds(static_cast<int>(1000.0 / freq)),
    std::bind(&CLaserOdometry2DNode::process, this));

// In main(), replace while loop with:
rclcpp::spin(myLaserOdomNode);
```

Apply as a patch file in Dockerfile before `colcon build`.

### Fix 2: Correct env var name

```yaml
# docker-compose.yml line 34
- ZENOH_SESSION_CONFIG_URI=/ros2-slam/config/zenoh_session_config.json5
```

### Fix 3: Fix parameter namespace (optional)

Change `rf2o_params.yaml` key from `rf2o_laser_odometry_node:` to `rf2o:` to match launch file `name="rf2o"`.

## Key References

- **rf2o source (ROS2 branch):** https://github.com/MAPIRlab/rf2o_laser_odometry/tree/ros2
- **rf2o dual-node bug:** https://github.com/MAPIRlab/rf2o_laser_odometry/issues/42
- **rf2o Jazzy issue:** https://github.com/MAPIRlab/rf2o_laser_odometry/issues/43
- **rmw_zenoh interop:** https://github.com/ros2/rmw_zenoh/issues/929
- **rmw_zenoh QoS mixing fix:** https://github.com/ros2/rmw_zenoh/issues/854
- **rmw_zenoh key expression design:** https://github.com/ros2/rmw_zenoh/blob/jazzy/docs/design.md
- **Zenoh liveliness:** https://github.com/eclipse-zenoh/roadmap/blob/main/rfcs/ALL/Liveliness.md
- **rmw_zenoh wait_set_lock fix:** https://github.com/ros2/rmw_zenoh/pull/528

## Research Methodology

8 parallel research agents covering: rf2o source code analysis, rmw_zenoh_cpp GitHub issues, Zenoh liveliness deep dive, zenoh_ros2_sdk key expression internals, ROS2 community rf2o+Zenoh reports, timer vs subscription callback scheduling, official docs, and rmw_zenoh_cpp source verification. All findings cross-referenced against local vendored source at `vendor/rmw-zenoh/`.
