LENS 21: STAKEHOLDER KALEIDOSCOPE

"Who sees what, and whose view are we ignoring?"

---

FOUR PERSPECTIVES ON THE SAME SYSTEM

MOM: PRIMARY USER (Underrepresented)

What she sees: A small machine that sometimes moves purposefully and sometimes freezes in the hallway for no reason. She does not see tiers, latencies, or frame rates. She sees behavior and its effect on her home.

What she needs:
- Sub-1-second voice ESTOP: "Ruko!" must stop the robot immediately, not after 5 seconds of pipeline propagation.
- Predictable movement: no sudden direction changes, no speed surges, no approaching her from behind.
- Audible state: she needs to know what Annie is doing right now ("I'm going to the kitchen"), not silence.
- Graceful freezes: if Annie must pause, she should say why, not simply stop.
- No camera surprises: she should know when Annie is looking at her and why.

What the research gives her: One paragraph in the Day-in-Life section. The phrase "Mom's bedroom" appears once. Her needs are never directly stated as system requirements.

What is missing: A Mom-perspective acceptance test. No requirement states "Mom must be able to halt Annie via voice within 1 second." No scenario asks what Mom experiences when the VLM times out. The research was written in engineering language for an engineering audience; Mom's requirements are inferred from architecture, never stated as primary.

Trust-curve shift: the Hailo-8 activation. The 7:30 AM WiFi-brownout freezes documented in Lens 20 (the "Annie, did you stop?" moments) are the single biggest trust-eroding events in Mom's day. Activating the idle Hailo-8 AI HAT+ on the Pi 5 (26 TOPS of NPU, YOLOv8n at 430 frames per second, less than 10 milliseconds of local inference, zero WiFi dependence) gives Annie a WiFi-independent safety layer. Post-Hailo, Annie no longer dies mid-hallway when the semantic pipeline stalls; she keeps moving safely while the vision language model recovers. The cumulative effect on Mom's trust curve is larger than that of any single user-facing feature. The robot becomes something she can count on during network stress, which is precisely when her anxiety peaks.

---

RAJESH: ENGINEER / EXPERIMENTER

What he sees: A 4-tier hierarchical fusion system with clean separation of concerns, 58 Hz throughput, academic validation from Waymo, Tesla, and VLMaps, and a clear 5-phase implementation roadmap. Architecturally satisfying.

What he needs:
- Observable system: dashboard metrics, per-tier latency, VLM confidence scores.
- Testable components: each tier independently runnable, with a simulation mode for integration testing.
- Failure visibility: when something breaks, he needs to know where in the 4-tier stack it broke.
- Iteration speed: the ability to swap the VLM, tune EMA alpha, or change the query cycle without rebuilding the whole stack.

What the research gives him: Everything. The research is written from his perspective. Every architectural decision, every academic citation, every phase roadmap assumes his mental model as the reader.

The tension this creates: Rajesh's experimentalist instinct (Phase 2a this week, 2b next week, 2c after SLAM is stable) is structurally in conflict with Mom's need for consistency. Every experiment that changes Annie's behavior is a new surprise for Mom. A navigation pipeline that is a research platform cannot simultaneously be a trustworthy household companion, unless experimentation is explicitly contained away from Mom's hours of use.

Highest-leverage single change available: the Hailo-8 activation. From the engineer's vantage point, the idle Hailo-8 AI HAT+ on the Pi 5 is the lowest-risk, highest-value move that was not visible before this research. Cost: approximately 1 to 2 engineering sessions of work (a HailoRT install plus a TAPPAS GStreamer pipeline). Hardware cost: zero; the NPU is already bolted to the robot, drawing power, doing nothing for navigation. Architecture impact: purely additive, a new L1 reactive safety layer slotted beneath the existing VLM stack. Academic validation: the IROS dual-process paper shows a 66 percent latency reduction. Rollback: trivial; disable the systemd unit and behavior reverts to today. This is the rare intervention where the engineer's "interesting experiment" box and the user's "make it stop freezing" box get checked at the same time.
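As a concreteness check, here is a minimal sketch of what that additive L1 layer could look like: a local loop that stops for nearby obstacles and keeps crawling when the semantic pipeline's heartbeat goes stale, instead of freezing. Everything here is an assumption for illustration; the `Detection` shape, `read_local_detections()`, the motor interface, and the heartbeat check are hypothetical placeholders, not the project's actual API.

```python
# Minimal sketch of an additive L1 reactive safety layer (dual-process
# pattern). Assumes a local obstacle feed from the Hailo-8 NPU (e.g.
# YOLOv8n via HailoRT/TAPPAS); the detector, motor interface, and
# semantic-bus heartbeat shown here are hypothetical placeholders.

import time
from dataclasses import dataclass

STOP_DISTANCE_M = 0.3       # reflexive local stop threshold
CRAWL_SPEED_MPS = 0.15      # safe crawl when the semantic pipeline is stale
HEARTBEAT_TIMEOUT_S = 0.5   # VLM pipeline considered stalled after this

@dataclass
class Detection:
    label: str
    distance_m: float

def l1_reactive_loop(detector, motors, semantic_bus):
    """Runs entirely on local sensing: no WiFi, no VLM in the loop."""
    while True:
        detections = detector.read_local_detections()  # <10 ms on-NPU inference
        nearest = min((d.distance_m for d in detections), default=float("inf"))

        if nearest < STOP_DISTANCE_M:
            motors.stop()  # L1 may halt the robot regardless of upper tiers
        elif semantic_bus.seconds_since_last_command() > HEARTBEAT_TIMEOUT_S:
            # Semantic pipeline stalled (WiFi brownout, VLM timeout):
            # keep moving safely instead of dying mid-hallway.
            motors.continue_heading(max_speed=CRAWL_SPEED_MPS)
        # Otherwise defer to the semantic pipeline's last command.

        time.sleep(0.02)  # ~50 Hz control rate, well under NPU throughput
```

The design point of the sketch is the rollback story: because the layer only ever tightens behavior (stop, or crawl), disabling its systemd unit really does revert the system to today's behavior.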
---

ANNIE: THE AI AGENT

What she sees: A stream of camera frames, lidar sectors, IMU headings, and natural-language goals. Her job is to reconcile these signals into motor commands. She has no concept of "Mom's comfort" or "Rajesh's experiment," only the signals she receives and the rules she follows.

What she needs:
- A consistent environment: furniture rearranged overnight means her SLAM map is wrong, and she doesn't know it's wrong.
- Honest sensors: a glass door that reads as CLEAR is not lying; it is a systematic blind spot her architecture cannot self-correct.
- Stable goals: a goal interrupted mid-navigation leaves her in an ambiguous recovery state she has no procedure for.
- Latency budget honesty: she is designed for 18-millisecond inference and needs defined behavior when inference takes 90 milliseconds.

What is missing: A failure-mode specification. When the VLM times out, what does Annie do? When the IMU goes to REPL, what does Annie announce? Annie's behavior in degraded states is unspecified, which means it is unpredictable, which means it violates Mom's most basic need: predictability.
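One way to make that specification concrete is a literal table in code: each degraded state maps to a defined motion policy and a spoken announcement, so behavior under failure is enumerated rather than emergent. The states, policy names, and phrases below are illustrative assumptions, not requirements from the research.

```python
# Sketch of the missing failure-mode specification: every degraded state
# gets a defined motion policy and a spoken announcement. States, policy
# names, and phrases are illustrative assumptions, not from the research.

from enum import Enum, auto

class DegradedState(Enum):
    VLM_TIMEOUT = auto()     # semantic pipeline missed its latency budget
    IMU_TO_REPL = auto()     # IMU process dropped back to the REPL
    SLAM_STALE = auto()      # map no longer matches current observations
    WIFI_BROWNOUT = auto()   # network layer degraded

FAILURE_POLICY = {
    DegradedState.VLM_TIMEOUT:   ("crawl_on_local_sensors", "I'm thinking slowly; moving carefully."),
    DegradedState.IMU_TO_REPL:   ("stop_in_place",          "I lost my motion sense; pausing here."),
    DegradedState.SLAM_STALE:    ("return_to_dock",         "My map looks wrong; I'm going home."),
    DegradedState.WIFI_BROWNOUT: ("crawl_on_local_sensors", "The network is weak; I'm staying careful."),
}

def handle_degradation(state: DegradedState, motors, voice) -> None:
    motion_policy, announcement = FAILURE_POLICY[state]
    voice.say(announcement)            # silence is alarming: always announce first
    getattr(motors, motion_policy)()   # dispatch to the defined motion behavior
```

A table like this serves both stakeholders at once: Rajesh gets an enumerable test surface, and Mom gets a robot that never goes silent when it is confused.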
---

VISITOR / FAMILY MEMBER

What they see: A camera-equipped robot moving through a home. They have no context for what it is, who controls it, what it records, or how to stop it. They encounter it without onboarding.

What they need:
- Immediate legibility: what is this thing, is it recording, who can I ask to turn it off?
- A pause gesture or command that works for strangers: "Stop" or a raised hand should halt Annie even from an unknown voice.
- Honest signaling: if Annie's camera is active, a visible indicator should make this unambiguous.
- Privacy opt-out: the ability to be excluded from the semantic map without requiring Rajesh to intervene.

What the research gives them: Nothing. The word "visitor" does not appear in the research document. The privacy concern is noted once, as a concern for Mom, not for third parties.

The underappreciated risk: Phase 2c, semantic map annotation, will record who was in which room at what time. A visitor who sits in the living room for two hours is in the semantic map. They did not consent to this. Local-only storage does not eliminate the privacy issue; it only changes who can access the data.

---

WHERE STAKEHOLDER NEEDS DIRECTLY CONFLICT

Conflict 1: Experimentation vs. predictability. Rajesh wants to deploy Phase 2a this week, tune EMA, and try new queries. Mom needs Annie to behave the same way every day; surprises are frightening. Resolution path: experiments only during Mom's sleep hours; freeze navigation behavior from 7 AM to 10 PM.

Conflict 2: Speed vs. safety margin. Rajesh wants confidence accumulation leading to faster navigation and more impressive demos. Mom needs slower, because she cannot react fast enough to a speeding robot. Resolution path: a speed cap in Mom's presence zones; a voice-triggered slow mode.

Conflict 3: Camera-always-on vs. privacy. Rajesh needs continuous VLM inference at 58 Hz, which requires a constant camera stream. Mom should be able to stop the robot from watching, especially in the bedroom. Resolution path: camera-off room tags on the SLAM map; a "don't enter bedroom" constraint layer.

Conflict 4: Dashboard metrics vs. lived experience. Rajesh sees a 94% navigation success rate over 24 hours and concludes the system is working. Mom experienced three freezes during the 7 to 9 PM window and concludes the system is broken. Resolution path: per-user, per-hour success windows as the primary dashboard metric.

Conflict 5: Silent failure vs. audible failure. Rajesh wants clean logs with no noisy announcements cluttering dev output. Mom needs to know when Annie is confused; silence is not neutral, it is alarming. Resolution path: a production voice layer for all failure states, with a dev-mode flag to suppress it during testing.

---

THE UNDERREPRESENTED PERSPECTIVE: MOM

The research is excellent engineering. It is thorough on Waymo's MotionLM, precise on EMA filter alpha values, careful about VRAM budgets. What it does not contain, anywhere, is a single sentence written from Mom's perspective. Mom is mentioned as the person who wants tea. She is not consulted as a primary stakeholder whose requirements should shape the architecture. This is not an oversight; it is a structural consequence of who writes research documents.

The danger is not that the engineering is wrong. It is that the engineering is optimized for the wrong utility function. The research maximizes VLM throughput and architectural elegance. Mom's utility function is entirely different: does Annie behave consistently? Can I stop it? Does it tell me what it's doing? Will it knock over my tea?

The critical finding from this lens: the voice-to-ESTOP gap is not a safety feature missing from the architecture. It is a Mom requirement that was never written. No section of the research states "Mom must be able to halt Annie via voice within 1 second." The 4-tier architecture has ESTOP in Tier 3 with absolute priority over all tiers, but this is a sensor-triggered ESTOP at 80 millimeters, not a voice-triggered ESTOP. A voice ESTOP requires a separate always-listening path that bypasses the VLM pipeline entirely. This path does not exist in the architecture. It was never designed, because the architect never asked: what does Mom need when she is scared?
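What such a path might look like, structurally: a standalone daemon with a small on-device keyword spotter that triggers the existing Tier 3 ESTOP directly, never routing through the VLM pipeline. The wake phrases, the spotter, and the kill-switch interface below are all assumptions for illustration, not existing components.

```python
# Sketch of the missing always-listening voice ESTOP path. The essential
# property is architectural: a standalone local process that triggers the
# Tier 3 ESTOP directly, bypassing the VLM pipeline entirely. The keyword
# spotter and kill-switch interface are hypothetical placeholders.

import time

ESTOP_PHRASES = {"ruko", "stop", "annie stop"}  # must include Mom's own word
LATENCY_BUDGET_S = 1.0  # Mom's sub-1-second requirement; true threshold unmeasured

def voice_estop_daemon(spotter, estop, voice, log):
    while True:
        # Placeholder: blocks until a phrase is heard, returns the
        # monotonic time of utterance onset. Local inference, no WiFi.
        utterance_t0 = spotter.wait_for_phrase(ESTOP_PHRASES)
        estop.engage()            # direct Tier 3 trigger, no pipeline hops
        voice.say("Stopped.")     # audible confirmation: silence is alarming
        latency = time.monotonic() - utterance_t0
        log.record("voice_estop_latency_s", latency)
        if latency > LATENCY_BUDGET_S:
            log.warn(f"voice ESTOP took {latency:.2f}s, over the 1 s budget")
```

Note that the daemon logs its own end-to-end latency: the sub-1-second number is a requirement to be verified with Mom, not a property to be assumed.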
The conflict between Rajesh and Mom is not a personality conflict; it is a values conflict. Rajesh's values: learn, iterate, improve, tolerate failures as data. Mom's values: consistency, safety, dignity, trust. These are not reconcilable by better code. They require an explicit protocol: the system's external behavior is frozen during experimentation; changes are deployed only when they don't alter Mom's experience; and any change that does alter her experience requires her informed acceptance first. The research has no such protocol. It has a roadmap. Roadmaps serve Rajesh. Protocols serve Mom.

---

WHAT WOULD CHANGE IF WE DESIGNED FOR MOM FIRST

The 4-tier architecture would remain, but its design priorities would invert. The ESTOP gap would be identified as the first engineering problem, not an afterthought. The voice interrupt path would be specified before the multi-query pipeline.

The evaluation framework would look completely different. Instead of Absolute Trajectory Error, VLM obstacle accuracy, and place-recognition precision and recall, it would start with:
- voice ESTOP latency under load;
- number of silent freezes per hour during Mom's usage window;
- number of times Annie announces what she is doing versus acts silently;
- Mom's subjective safety rating after a 2-week deployment.

These metrics are not in the research. They are not even suggested.

The Visitor perspective adds a legal dimension the research ignores: a semantic map that records room occupancy at all times is a data product requiring explicit consent from everyone in the home. The consent architecture is the Visitor's primary requirement. It is absent from the research entirely.

---

THE STAKEHOLDER ASYMMETRY: SAME CHANGE, DIFFERENT VALUE

The Hailo-8 activation surfaces the kaleidoscope's most important property: the same engineering change carries dramatically different perceived value depending on whose face is pressed against the lens.

To Rajesh, Hailo-8 reads as: an interesting optimization, 1 to 2 sessions of work, an additive L1 layer, a 26 TOPS NPU currently idle, YOLOv8n at 430 frames per second, an IROS-validated dual-process pattern, zero hardware cost, rollback-safe. It is a technically elegant cleanup of a wasted resource.

To Mom, the exact same change reads as: the robot stops having the scary freezes in the hallway at 7:30 in the morning during the WiFi brownout. She does not know what a TOPS is. She does not know what YOLO is. She knows that last Tuesday Annie stopped for two seconds in front of her bedroom door, she had to ask "Annie, did you stop?", and nobody answered. After Hailo, that moment stops happening.

To the Visitor, Hailo-8 is invisible. The robot still moves through the house, the camera is still on, the consent architecture is still missing.

To Annie herself, Hailo-8 is the first honest sensor layer: a fast, local, deterministic obstacle detector whose behavior is independent of the WiFi weather.

The stakeholder kaleidoscope's lesson is that the value of a change is not a scalar. It is a vector indexed by perspective, and the vector's components can differ by orders of magnitude. Hailo-8 scores medium-interesting to Rajesh, trust-transforming to Mom, invisible to the Visitor, and grounding to Annie, all from a single patch of software.

---

KEY FINDINGS

The research document contains exactly four stakeholders, all implicit. It was written by an engineer, for an engineer, about a system that will be experienced primarily by a non-engineer.

The voice-to-ESTOP gap is not a missing feature. It is proof that the Mom Requirements Spec was never written.

Hailo-8 activation is the single change most stakeholders would agree on. Mom gains trust: no more WiFi-brownout freezes in the hallway. Rajesh gains his highest leverage-per-hour move available. Annie gains her first honest local sensor. Only the Visitor is unmoved. When a single change serves three of four stakeholders and harms none, it is the intervention the kaleidoscope is telling you to ship first.

THINK ABOUT IT

What is the minimum voice ESTOP latency Mom would experience as responsive? Is it 500 milliseconds? 1 second? 3 seconds? This is empirically measurable and currently unknown; nobody has asked her.

If you had to write a 5-line Mom's Acceptance Test that must pass before any Phase 2 sub-phase ships, what would those 5 lines be?
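One possible shape for that test, purely as a sketch: five assertions, each traceable to a Mom requirement named in this lens. Every hook on the hypothetical `annie` fixture is invented for illustration.

```python
# One possible 5-line Mom's Acceptance Test: illustrative only. Every hook
# on the hypothetical `annie` fixture is invented; each assertion maps to
# a Mom requirement named in this lens.

def test_mom_acceptance(annie):
    assert annie.voice_estop_latency_s() < 1.0                         # "Ruko!" halts within 1 second
    assert annie.silent_freezes(window="07:00-22:00") == 0             # no unexplained stops in her hours
    assert annie.announces_intent_before_moving()                      # audible state, never silent action
    assert annie.max_speed_near_person("Mom") <= annie.SLOW_SPEED_MPS  # speed cap in her presence
    assert annie.camera_indicator_on_when_recording()                  # no camera surprises
```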