LENS 09 — TRADEOFF RADAR

Question: What are you sacrificing, and is that the right sacrifice?

The radar maps seven axes of system quality: Perception Depth, Semantic Richness, Latency, VRAM Efficiency, Robustness, Spatial Accuracy, and Implementation Simplicity. Three polygons are drawn: Annie's VLM-primary approach in amber, traditional SLAM-primary in purple, and a projected "Annie plus Hailo L1" overlay in cyan.

The shape is striking. Annie and SLAM-primary are almost perfect anti-profiles. Where Annie peaks, SLAM troughs. Where SLAM dominates, Annie collapses.

Annie's current scores:

- Perception Depth: 85 out of 100. The VLM describes furniture, room type, goal position, and occlusion in a single 18-millisecond pass.
- Semantic Richness: 90. Room labels, obstacle names, and goal-relative directions in natural language.
- Latency: 80. 58 frames per second via llama-server direct.
- VRAM Efficiency: 45. Gemma 4 E2B occupies 3.5 gigabytes of VRAM on Panda.
- Robustness: 35. One WiFi hiccup, one Zenoh version mismatch, one llama-server restart: any one of them stalls the pipeline.
- Spatial Accuracy: 30. "LEFT MEDIUM" is qualitative direction, not metric position.
- Implementation Simplicity: 40. Adding ask-vlm is simple. Keeping it running across Zenoh, IMU, and lidar is not.

SLAM-primary scores:

- Perception Depth: 30. Geometry only: no objects, no semantics, no language.
- Semantic Richness: 20. Float coordinates, not concepts.
- Latency: 55. Full A-star path planning plus slam_toolbox lifecycle overhead.
- VRAM Efficiency: 80. CPU-bound on the Pi; zero GPU footprint.
- Robustness: 88. All-local, no network, deterministic scan-matching.
- Spatial Accuracy: 92. 10-millimeter localization from lidar.
- Implementation Simplicity: 30. slam_toolbox lifecycle, rf2o lidar odometry, IMU frame IDs, EKF tuning, Zenoh source builds. Session 89 was spent entirely on a single version mismatch.

THE UNACKNOWLEDGED TRADEOFF

Every benchmark in the VLM navigation literature measures inference latency. Nobody benchmarks network reliability. The research assumes the inference node is co-located or always reachable. Annie's architecture has a mandatory WiFi hop between the Pi 5 and Panda: typically 5 to 15 milliseconds under ideal conditions, but potentially 80 to 300 milliseconds under 2.4-gigahertz congestion or during a llama-server restart. At 58 frames per second, a single 100-millisecond WiFi hiccup produces 5 to 6 stale commands issued to the motor controller. The Robustness score of 35 reflects this.

More critically, the latency advantage of 58 Hz inference is partially illusory. The effective update rate under realistic home WiFi, accounting for packet jitter, is closer to 15 to 20 Hz. Lens 04 independently found a WiFi cliff edge at 100 milliseconds, past which the system becomes insensitive to VLM rates above 15 Hz. These findings converge: investing in inference speed above 15 Hz (for example, the move from 29 Hz to 58 Hz via single-query optimization) has near-zero user-facing benefit if the real bottleneck is network jitter, not GPU throughput.
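One cheap receive-side mitigation exists regardless of any inference speedup: a freshness watchdog next to the motor controller. The sketch below is a minimal illustration in plain Python, not code from Annie's repository; the 150-millisecond budget and the command shape are assumptions to be tuned against measured jitter. Because it runs on the Pi 5 and only measures receive-side gaps, it needs no clock shared with Panda.

```python
import time


class CommandWatchdog:
    """Stop the base when no fresh VLM command arrives within the budget."""

    def __init__(self, budget_s: float = 0.150):
        self.budget_s = budget_s          # assumed budget; tune against observed jitter
        self.last_rx = time.monotonic()
        self.last_cmd = {"linear": 0.0, "angular": 0.0}

    def on_command(self, cmd: dict) -> None:
        """Called by the subscriber each time a VLM command arrives over WiFi."""
        self.last_rx = time.monotonic()
        self.last_cmd = cmd

    def current_command(self) -> dict:
        """Polled by the motor loop at its own 1-2 Hz rate."""
        if time.monotonic() - self.last_rx > self.budget_s:
            # A 100 ms hiccup at 58 fps leaves 5-6 stale commands in flight;
            # replaying them steers on old perception. Stop instead.
            return {"linear": 0.0, "angular": 0.0}
        return self.last_cmd
```

This does not raise the Robustness score by itself, but it converts a stale-command failure from "drive on old perception" into "halt," which is the correct degradation for a home robot.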
THE HAILO PROJECTION — SINGLE BIGGEST AXIS-MOVER

The cyan dashed polygon shows the single largest structural move available on this radar: activating the idle Hailo-8 AI HAT+ already on the Pi 5 as an L1 safety layer. 26 tera-ops per second of compute. YOLOv8n running at 430 frames per second. Under 10 milliseconds of local inference. Zero WiFi dependency. The Robustness axis jumps from roughly 35 to roughly 65: the biggest single-axis delta any non-hardware-swap move produces on this chart.

Why 65 and not 88? Because the semantic path still rides the WiFi hop. "Where is the kitchen?" still requires Gemma 4 on Panda, and that request still depends on network reachability. But the compound failure mode, where a single WiFi brownout silences both obstacle avoidance and goal reasoning at the same time, is eliminated. Safety stops no longer share a failure domain with semantic queries; a minimal sketch of the split appears at the end of this lens. The IROS dual-process paper, arXiv 2601.21506, measured this exact split yielding a 66 percent latency reduction and 67.5 percent task success versus 5.83 percent for VLM-only.

The trade is visible on the Implementation Simplicity axis, which edges down from 40 to roughly 32. HailoRT, TAPPAS, and model compilation add real cognitive load. Working Pi 5 examples exist in the Hailo repository; the learning curve is days. This is the cheapest robustness move available on Annie's current hardware, because the hardware is already on the robot, already wired, already idle.

TRADEOFFS MOVABLE BY A DIFFERENT APPROACH

Two gaps in the radar are not truly intrinsic to the architecture.

First: Annie's spatial accuracy score of 30 can be raised without touching the VLM at all. The VLM never needs metric precision; it only needs directional intent. Metric precision is delegated to the lidar ESTOP. This reframes the chart: Annie does not sacrifice spatial accuracy, it delegates it. A token-to-velocity sketch of that delegation appears at the end of this lens.

Second: the VRAM efficiency gap can be addressed by running SigLIP 2 ViT at 800 megabytes instead of the full E2B model for embedding extraction, changing the cost structure substantially.

WHERE GOOD ENOUGH IS DRAMATICALLY CHEAPER THAN OPTIMAL

For spatial accuracy: "chair at 300 millimeters right" is good enough for safety. "Chair at 287 millimeters right" costs ten times as much in SLAM infrastructure. The ESTOP at 200 millimeters makes sub-300-millimeter accuracy irrelevant.

For semantic richness: kitchen, hallway, bedroom covers 90 percent of room-routing decisions. A full ConceptGraphs scene graph is academic overhead for a single-robot home environment.

For place recognition: text2nav achieved 74 percent navigation success using frozen SigLIP embeddings with no fine-tuning. For Annie's home environment of 10 to 15 visually distinct places, a K-nearest cosine search over about 100 stored embeddings is computationally trivial and likely sufficient; the cosine-search sketch at the end of this lens shows how little code that takes.

For multi-query rate: 15 Hz per query across 4 alternating tasks is good enough. The motor command rate of 1 to 2 Hz is the real ceiling. Chasing 58 Hz per query is solving the wrong bottleneck.

WHAT THE USER WOULD CHOOSE DIFFERENTLY

The research literature treats implementation complexity as a one-time engineering cost that amortizes to zero over a robot fleet. For a single-developer project, implementation complexity is a first-class runtime constraint: a system you cannot debug in-field is effectively unavailable. The implicit assumption that deployment effort eventually approaches zero does not apply here. This is why the SLAM-primary approach scores only 30 on Implementation Simplicity despite being theoretically simpler: "simple in theory" and "simple to deploy on ARM64 with rmw_zenoh_cpp from source" are not the same axis.

The lesson: the frontier is not fixed. Sometimes the move that reshapes the tradeoff map is not tuning along an existing axis. It is activating a piece of hardware that was already on the robot.
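Three sketches referenced above follow. First, the failure-domain split made concrete: two loops, with placeholder functions standing in for the real calls. hailo_min_obstacle_mm() and query_vlm() are illustrative stubs, not Annie's APIs or HailoRT's; only the 200-millimeter ESTOP figure and the rate numbers come from this lens.

```python
import threading
import time

ESTOP_MM = 200.0  # ESTOP distance from this lens

estop = threading.Event()

def hailo_min_obstacle_mm() -> float:
    """Placeholder for a local HailoRT YOLOv8n detection pass (<10 ms, no WiFi)."""
    return 1500.0  # stub value; a real implementation reads the Hailo-8

def query_vlm(prompt: str) -> str:
    """Placeholder for the WiFi hop to Gemma on Panda. May stall or fail."""
    return "LEFT MEDIUM"  # stub; a real implementation calls llama-server

def l1_safety_loop() -> None:
    """Runs entirely on the Pi 5; a WiFi brownout cannot silence it."""
    while True:
        if hailo_min_obstacle_mm() < ESTOP_MM:
            estop.set()      # the motor controller honors this immediately
        else:
            estop.clear()
        time.sleep(0.01)     # ~100 Hz, far below the 430 fps the Hailo allows

def l2_semantic_loop() -> None:
    """Rides the WiFi hop; failure here degrades goals, never safety."""
    while True:
        try:
            intent = query_vlm("Where is the kitchen?")
            if not estop.is_set():
                print("motor intent:", intent)  # issued at the 1-2 Hz motor rate
        except OSError:
            pass  # network fault: L1 keeps guarding, nothing stale replays
        time.sleep(0.5)

if __name__ == "__main__":
    threading.Thread(target=l1_safety_loop, daemon=True).start()
    threading.Thread(target=l2_semantic_loop, daemon=True).start()
    time.sleep(2)  # demo run; real code would block on a shutdown signal
```

The point of the structure is visible in the exception handler: a network fault touches only the semantic loop, while the safety loop never leaves the Pi.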
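Second, the delegation reframing fits in a few lines: qualitative VLM tokens map to velocity intents, and the metric question never reaches the VLM at all. The token vocabulary and gains below are invented for illustration; only "LEFT MEDIUM" appears in this lens.

```python
# Qualitative VLM direction tokens -> (linear m/s, angular rad/s).
# Token set and gains are illustrative, not Annie's actual mapping.
INTENT_TABLE = {
    "LEFT MEDIUM":  (0.15, 0.6),
    "LEFT SLIGHT":  (0.20, 0.3),
    "AHEAD":        (0.25, 0.0),
    "RIGHT SLIGHT": (0.20, -0.3),
    "RIGHT MEDIUM": (0.15, -0.6),
}

def intent_to_twist(token: str) -> tuple[float, float]:
    # Unknown tokens degrade to a stop. Metric safety lives in the lidar
    # ESTOP, so the worst a bad token can do is halt, not collide.
    return INTENT_TABLE.get(token, (0.0, 0.0))
```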
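Third, the place-recognition claim is easy to sanity-check in code. A sketch assuming unit-normalized embeddings already extracted by a frozen SigLIP encoder; the 768-dimension figure, the label set, and the random vectors are illustrative stand-ins, not Annie's data.

```python
import numpy as np

# ~100 stored place embeddings from a frozen SigLIP encoder, one label each.
# Dimension 768 is illustrative; use whatever the chosen checkpoint emits.
rng = np.random.default_rng(0)
stored = rng.standard_normal((100, 768)).astype(np.float32)
stored /= np.linalg.norm(stored, axis=1, keepdims=True)
labels = [f"place_{i % 15}" for i in range(100)]  # 10-15 distinct places

def recognize(query_emb: np.ndarray, k: int = 5) -> str:
    """Majority vote over the k nearest stored embeddings by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    sims = stored @ q              # one matrix-vector product does all the work
    top_k = np.argsort(sims)[-k:]
    votes = [labels[i] for i in top_k]
    return max(set(votes), key=votes.count)

# 100 dot products of length 768 is ~77,000 multiply-adds: microseconds even
# on the Pi 5's CPU, which is why "computationally trivial" holds.
print(recognize(rng.standard_normal(768).astype(np.float32)))
```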