LENS 23

Energy Landscape

"What resists change — and what would lower the barrier?"

ADOPTION BARRIERS — ACTIVATION ENERGY CHART (higher bar = harder to cross)
  • Panda+WiFi safety path (power): ~15 W (GPU ~10 W + WiFi radios ~3–5 W, both ends)
  • SLAM setup: 6+ dedicated sessions, 3 running services, Docker
  • WiFi reliability: uncontrollable; cliff edge at 100 ms
  • Hardware cost: $500–800 for the full stack
  • Embedding extraction: llama-server blocker; separate SigLIP needed
  • Trust building: Mom must witness ~20 successful runs
  • Semantic map annotation: requires SLAM first, plus a labeling pipeline
  • Voice query integration: Pipecat already wired; 1–2 new tool calls
  • Hailo-8 safety path (power): ~2 W on-robot, 430 FPS YOLOv8n; one-seventh the power of the WiFi path
  • Multi-query pipeline: one-line dispatch in _run_loop(); 90% P(success)
  • Beast ambient workloads (marginal W): 0 W marginal; always-on idle (~40–60 W) is sunk cost

coral = high barrier (systemic, environmental)  |  amber = medium barrier (effort, cost, dependency)  |  green = low barrier (code-change only)

The dominant feature of this energy landscape is the gap between the lowest bar and the highest bar. Multi-query pipeline — a cycle_count % N dispatch inside NavController._run_loop() — sits at 15% activation energy. SLAM deployment sits at 85%. Both are described in the same research document as "Phase 2a" and "Phase 1" respectively. But they are not remotely comparable undertakings. One is an afternoon. The other consumed six dedicated debugging sessions, three running services (rf2o, EKF, slam_toolbox), a Docker container, a patched Zenoh RMW, and still exhibits residual queue drops due to a hardcoded C++ constant in the slam_toolbox codebase. The research document describes both under the same architectural heading without signaling the 6× difference in activation energy. That asymmetry is the key finding of this lens.
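To make the 15%-activation-energy claim concrete, here is a minimal sketch of what a cycle_count % N dispatch inside the nav loop could look like. NavController and _run_loop() are named in the source; the query list, the vlm callable, and the exact loop shape are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch of the multi-query round-robin dispatch.
# The steering query runs every cycle; one ambient scene query
# piggybacks per cycle. A bad ambient answer is only narrated,
# never steered on, so no new failure mode is introduced.

STEERING_QUERY = "Which way should I turn: LEFT, RIGHT, FORWARD, or BACKWARD?"
AMBIENT_QUERIES = [
    "What room does this look like?",
    "List any obstacles within two meters.",
    "Is the charger visible, and where?",
    "Describe anything on my left.",
    "Describe anything on my right.",
]

class NavController:
    def __init__(self, vlm):
        self.vlm = vlm          # callable: (frame, prompt) -> str
        self.cycle_count = 0

    def _run_loop(self, frame):
        # Existing behavior: one steering decision per cycle.
        command = self.vlm(frame, STEERING_QUERY)
        # The one-line dispatch: rotate through ambient queries.
        ambient = AMBIENT_QUERIES[self.cycle_count % len(AMBIENT_QUERIES)]
        narration = self.vlm(frame, ambient)
        self.cycle_count += 1
        return command, narration
```

The design point is that steering and narration are decoupled: the modulo dispatch adds richness to what Annie says without touching what Annie does, which is why its activation energy stays low.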

The "good enough" competitor is not Roomba. It is the existing VLM-only pipeline that Annie already has. The current system — camera at 54 Hz, Panda E2B, four commands LEFT/RIGHT/FORWARD/BACKWARD — is already deployed, already working, and already exceeds Tesla FSD's perception frame rate. The activation energy question for every Phase 2 capability is not "what does it take to beat Roomba?" but "what does it take to beat what Annie already has?" Roomba costs $300 and avoids obstacles without any intelligence. Annie already navigates to named goals. The incumbent is herself, and she is surprisingly capable.

The switching cost for SLAM is not just technical — it is political capital. Every system that depends on SLAM introduces three new failure modes into the trust relationship with Mom: the robot stops unexpectedly (SLAM lost localization), the robot ignores a goal (map not yet annotated), the robot drives in a confident straight line into a glass door (SLAM occupancy grid has no semantic layer yet). Trust is the asymmetric resource in home robotics — easy to spend, expensive to rebuild. One dramatic failure resets the trust meter regardless of how many successful runs preceded it. SLAM's activation energy is therefore not measured only in engineering hours; it is also measured in how many trust-recovery sessions it might require if the SLAM stack behaves unpredictably during a Mom-witnessed demo.

Who has to say yes for adoption to happen — and what do they care about? There is exactly one decision-maker: Mom. She does not care about SLAM accuracy, embedding dimensionality, or loop closure P/R curves. She cares about one question: does the robot do what I asked, without drama, and stop when I tell it to stop? The activation energy for adoption is therefore dominated by trust, not by technical complexity. The multi-query pipeline lowers the barrier precisely because it produces visible, audible richness — "I can see a chair on my left and this looks like the hallway" — without adding any new failure mode. Annie knows more. Annie explains more. The robot becomes more legible to its human, and legibility is the currency that buys trust.

The catalytic event that lowers all other barriers is multi-query going live. Here is the mechanism: when Annie narrates scene context ("I see a hallway, your charger is ahead to the right, there is a chair cluster on my left") instead of silently driving, Mom begins to model Annie's perception as a competency rather than a mystery. A robot that explains itself is a robot that can be trusted incrementally. That trust accumulation is what lowers the activation energy for Mom to say "yes, you can try the SLAM version" — because she has a mental model of Annie's perception and a track record of Annie being right. The multi-query pipeline is therefore not just Phase 2a on a technical roadmap. It is the trust-building instrument that makes everything else possible. It costs one session. It returns a future where SLAM deployment feels safe because Mom already knows Annie's eyes are good.

The literal energy landscape — watts — reveals a 7× asymmetry that nobody has priced yet. Routing safety-layer obstacle detection through Panda costs ~15 W per inference cycle: RTX 5070 Ti burns ~10 W on active inference, and the WiFi radios on both ends (Pi 5 transmitter + Panda receiver) add another ~3–5 W during the sustained frame stream. The same detection task running on the already-installed, currently-idle Hailo-8 AI HAT+ costs ~2 W — YOLOv8n at 430 FPS, entirely on-robot, zero radio traffic. That is a 7× reduction in continuous power draw for the identical safety output. On a robot whose 44–52 Wh battery pack already limits runtime to 45–90 minutes, 13 W of avoidable inference-plus-radio overhead is not a rounding error — it is measurable minutes of missing autonomy per charge. The inverse case is equally counterintuitive: Beast has been always-on since session 449, burning ~40–60 W idle regardless of workload. Any ambient observation or background reasoning we move onto Beast has a marginal power cost of zero, because those watts are already flowing into the wall socket. Not all "always-on" is equal — always-on-idle is sunk cost, and scheduling work onto sunk cost is free energy.
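The runtime claim can be sanity-checked with back-of-envelope arithmetic. The pack capacities (44–52 Wh) and per-path inference power (~15 W vs ~2 W) come from the text above; the ~30 W baseline draw for motors, Pi 5, and sensors is an assumed figure for illustration only.

```python
# Back-of-envelope: minutes of runtime reclaimed by moving the safety
# layer from Panda+WiFi (~15 W) to the on-robot Hailo-8 (~2 W).

def runtime_min(pack_wh: float, draw_w: float) -> float:
    """Runtime in minutes at a constant average draw."""
    return pack_wh / draw_w * 60

PANDA_WIFI_W = 15.0   # GPU ~10 W + radios ~3-5 W (from the text)
HAILO_W = 2.0         # on-robot YOLOv8n (from the text)
BASELINE_W = 30.0     # ASSUMED: motors + Pi 5 + sensors, excluding inference

for pack_wh in (44, 52):
    t_wifi = runtime_min(pack_wh, BASELINE_W + PANDA_WIFI_W)
    t_hailo = runtime_min(pack_wh, BASELINE_W + HAILO_W)
    print(f"{pack_wh} Wh pack: {t_wifi:.0f} min -> {t_hailo:.0f} min "
          f"(+{t_hailo - t_wifi:.0f} min per charge)")

print(f"inference power ratio: {PANDA_WIFI_W / HAILO_W:.1f}x")
```

Under that assumed baseline the switch reclaims on the order of 20+ minutes per charge, consistent with "measurable minutes of missing autonomy" rather than a rounding error.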

Hardware cost is not the binding constraint — it is a trailing indicator. The $500–800 full-stack cost (Pi 5 + Panda + lidar + camera + enclosure) is presented as a barrier, but the actual adoption sequence does not start with hardware. It starts with: does the software convince a skeptical household member that the robot is worth having? If multi-query makes Annie legible and legibility earns trust, the hardware investment becomes an obvious next step rather than a speculative bet. Conversely, if SLAM is deployed first and produces three dramatic failures, no amount of hardware budget discussion matters — the robot goes in a cupboard. The adoption energy landscape is serial, not parallel: trust first, then complexity, then cost. See also Lens 06 (hardware topology), Lens 15 (WiFi cliff-edge), Lens 19 (Hailo activation), Lens 24 (Beast sunk-cost reasoning).

The 6× activation energy gap between multi-query (15%) and SLAM (85%) is the load-bearing asymmetry. Both appear in the same research document as sequential phases, but they belong to fundamentally different implementation classes: one is a config change, the other is a distributed systems project. Executing multi-query first does not delay SLAM — it builds the trust reservoir that makes SLAM worth attempting.

The "good enough" incumbent is Annie herself, not Roomba. Phase 2 capabilities must justify their activation energy against an already-working VLM pipeline. Multi-query justifies itself immediately (scene richness, zero new failure modes). SLAM must justify itself against six debugging sessions and three new services, and that justification is earned through the trust account that multi-query builds first.

Trust is the rate-limiting reagent. Mom's "yes" lowers every other barrier. Multi-query is the cheapest trust-building instrument available. It narrates Annie's perception aloud, turning a mystery into a competency. Every adoption decision downstream — more hardware, SLAM, semantic maps — becomes easier once the human has a mental model of what Annie can see.

Two literal-energy wins are sitting unclaimed on the table.

  • Robot battery: moving the safety layer from Panda+WiFi (~15 W) to the idle Hailo-8 on Pi 5 (~2 W) is a 7× power reduction for identical obstacle-detection output. On a 44–52 Wh pack, that reclaims meaningful minutes of autonomy per charge and removes the WiFi radio from the safety path entirely.
  • Beast cycles: Beast is already burning ~40–60 W idle, 24/7. Any ambient observation, background reasoning, or overnight analytics we schedule onto Beast has a marginal power cost of zero. Always-on-idle is sunk cost; scheduling work onto sunk cost is free energy and should be treated as a first-class deployment target.

If you could only ship one thing this week to lower the overall adoption energy of the VLM nav system, what would it be — and why does it unlock everything else?


Ship multi-query. One session, cycle_count % 6 dispatch in _run_loop(), Annie narrates scene and obstacle awareness in addition to steering. The direct effect: Annie gets richer perception at zero hardware cost. The indirect effect: Mom hears "I can see a chair on my left, the hallway is clear ahead" instead of silence, and for the first time understands what Annie's camera is doing. That understanding is the substrate on which every downstream adoption decision rests. SLAM, semantic maps, embedding extraction — none of them become safe bets without Mom's trust. Multi-query buys that trust at 15% activation energy. Everything else charges against that account.