LENS 21 — STAKEHOLDER KALEIDOSCOPE: CROSS-LENS CONNECTIONS

==============================================================================
PRIMARY CONNECTIONS
==============================================================================

LENS 06 (Second-Order Effects)

Lens 06 identified the transition where Mom discovers she can ask "Annie,
what's in the kitchen?" — a use case that falls out of VLMaps semantic
annotation without being explicitly designed. Lens 21 reveals the governance
problem this creates: Mom will discover and love this feature before Rajesh
has designed its privacy controls or uncertainty expression. The semantic map
goes from a background infrastructure component to a load-bearing household
feature the moment Mom asks her first spatial question. Lens 06 correctly
named it a "phase transition" in the human-robot relationship. Lens 21
identifies who is responsible for managing that transition: not the
architecture, but the explicit consent and communication protocols that the
architecture never specifies.

Lens 06 also surfaced the ESTOP gap directly: "Mom ESTOP gap worsens as speed
rises — at 1 m/s, 10 Hz semantic obstacle detection is too slow at elevated
speed." Lens 21 makes the mechanism explicit: this is not a tuning problem,
it is a missing-requirement problem. The ESTOP gap exists because nobody
wrote "Mom must be able to halt Annie via voice within 1 second" as a primary
system requirement. Until that sentence appears in a requirements document
with a passing test, the gap is a known risk with no mitigation.

Lens 06's third-order "privacy as surveillance" branch — "the map records who
was in which room at what time" — becomes Lens 21's Visitor card. A Visitor
who sits in the living room for two hours is in the semantic map. The consent
architecture that Lens 06 calls for is the specific gap Lens 21 names as the
Visitor's unmet requirement.
Both lenses arrive at the same conclusion via different paths: Phase 2c
cannot ship without a consent layer, and the consent layer cannot be designed
without first acknowledging the Visitor as a stakeholder.

LENS 10 (Failure Pre-mortem)

Lens 10's August 2026 event — Mom stops using Annie after three freezes
during the 7-9pm window — is the realized version of every conflict this
lens documents. The team doesn't notice for two weeks because the dashboard
shows 94% navigation success (all hours). Lens 21 names this precisely: the
dashboard was built from Rajesh's perspective (system-wide metrics) and is
blind to Mom's perspective (per-user, per-hour windows). The metric
aggregation was not a technical error. It was a stakeholder-representation
error: the metric designer consulted only one stakeholder's utility function.

Lens 10's glass door incident is the collision between Annie's perspective
(both sensors report CLEAR, which is truthful from her signal stream) and
Mom's perspective (the robot just hit my door, trust is gone). The
disconnect is not an engineering failure — it is a stakeholder failure.
Annie had no specification for "what do I do when both sensors agree and
both are wrong?" because the failure mode was never written from Mom's
perspective ("Annie must not hit furniture even when sensors are confused").

Lens 10's pre-mortem ultimately traces every failure to the same root: "we
built the fast path, forgot the slow path." Lens 21 reframes this in
stakeholder terms: the fast path was designed for Rajesh (58 Hz throughput,
architectural elegance); the slow path was the entirety of Mom's usage
experience. The slow path is what happens when Annie freezes, crashes, gets
confused, hits something, or loses WiFi. These are the moments that matter
most to Mom and are specified least in the research.
LENS 20 (Multi-modal Convergence)

Lens 20's analysis of convergence between voice, vision, spatial memory, and
emotional context identifies the moment when all channels compose into a
single coherent experience. From Rajesh's perspective, this convergence is
an architectural achievement. From Mom's perspective, it is the moment Annie
stops feeling like a collection of features and starts feeling like a
presence. The convergence is emotionally legible to Mom before it is
technically documented by Rajesh.

Lens 21 reveals the governance challenge of this convergence: the moment
Annie can proactively say "I saw your glasses on the nightstand at 2pm"
(composing Context Engine + semantic map + voice), she crosses from tool to
agent. Mom's relationship with an agent requires different safety guarantees
than her relationship with a tool. A tool that fails silently is annoying.
An agent that fails silently feels deceptive. The voice-to-ESTOP gap,
audible state announcements, and failure communication are not just safety
features — they are the conditions under which Mom can maintain a healthy
relationship with an agent she will increasingly rely on and trust.

Lens 20's convergence also creates the Visitor problem in its most acute
form: a fully converged system (voice + vision + memory + emotion) is
indistinguishable from a surveillance apparatus to someone who encounters it
without context. The Visitor needs to be able to understand what the system
is doing in under 10 seconds of direct observation. This "legibility
requirement" is unspecified in the research and is the Visitor's primary
unmet need.

==============================================================================
SECONDARY CONNECTIONS
==============================================================================

LENS 01 (Constraint Hierarchy)

Lens 01 identified a 10-layer constraint hierarchy from physical limits up
to social conventions.
Lens 21 reveals that Mom's requirements should sit ABOVE the engineering
constraints in this hierarchy, not below them. The current hierarchy is
implicitly built from the bottom up: physics first, then hardware, then
software architecture, then user experience as an afterthought. A Mom-first
design would build the hierarchy from the top down: Mom's safety
requirements first (voice ESTOP <1 s, no sudden movements, audible state),
then the architecture that satisfies those requirements, then the hardware
that supports that architecture. The current research does the opposite.

LENS 03 (Dependency Graph)

Lens 03 identified the llama-server embedding blocker as the highest-leverage
addressable dependency. Lens 21 adds a dependency that doesn't appear in any
technical dependency graph: Mom's trust. Mom's trust is a prerequisite for
Annie's long-term deployment. Mom's trust depends on consistent behavior,
audible state, and sub-1-second voice ESTOP. These are all unimplemented.

The dependency graph is missing a human node at the top. Every technical
component is ultimately a dependency of Mom's continued willingness to live
with the robot. If that node fails (Lens 10, August 2026: Mom stops asking),
the entire graph is irrelevant.

LENS 04 (Connectivity and Latency) — PRIMARY FOR HAILO FINDING

Lens 04 identified the WiFi cliff edge at 100 ms as session 119's central
finding. From Rajesh's perspective, this is a latency engineering problem.
From Mom's perspective, it is a behavioral consistency problem: sometimes
Annie freezes for no visible reason. The same technical event (WiFi timeout)
has completely different stakeholder interpretations. Rajesh sees a metric;
Mom sees a betrayal of expectation.

Lens 21's Hailo-8 activation is the direct architectural answer to Lens 04's
WiFi cliff. A local Hailo-8 NPU running YOLOv8n at 430 FPS with <10 ms
inference and zero WiFi dependence eliminates the cliff-edge failure for the
safety layer.
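The layered fallback can be sketched as a latency watchdog: give the remote
VLM tier first claim on each frame with a timeout at the cliff edge, and
route to the local NPU when the link stalls. A minimal sketch; the
`local_npu` and `remote_vlm` interfaces are hypothetical, not APIs from the
research:

```python
WIFI_CLIFF_S = 0.100  # Lens 04's cliff edge: 100 ms

class PerceptionRouter:
    """Route frames to the remote VLM tiers (L2-L4) unless the network
    has stalled past the cliff; then fall back to the local Hailo tier
    (L1) so the safety layer never freezes with the WiFi."""

    def __init__(self, local_npu, remote_vlm):
        self.local_npu = local_npu    # hypothetical Hailo-8 wrapper
        self.remote_vlm = remote_vlm  # hypothetical networked VLM wrapper

    def perceive(self, frame):
        try:
            # Remote tier gets the frame, budgeted at the cliff edge.
            return self.remote_vlm.infer(frame, timeout=WIFI_CLIFF_S)
        except TimeoutError:
            # WiFi brownout: degrade semantic richness, keep motion safe.
            return self.local_npu.infer(frame)
```

The design point is that a brownout produces a quieter answer, never a
freeze: the caller always gets a perception result within the budget.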
When WiFi stalls past 100 ms, L1 (Hailo) keeps Annie moving safely; L2-L4
(VLM tiers on Panda/Titan) are allowed to degrade without producing a
visible freeze. The IROS dual-process paper (arXiv 2601.21506) validates a
66% latency reduction from exactly this pattern. The communication layer
Lens 04 suggested ("my eyes are slow, I'll wait a moment") remains
necessary — but becomes exceptional narration rather than routine
explanation, because the freezes themselves become rare. The technical
solution (Hailo) and the communication solution (audible state) are
co-designed in the Mom-first reading of this architecture.

LENS 07 (Market Positioning)

Lens 07 identified Annie as targeting the empty "edge+rich" quadrant in the
home robotics market. From Rajesh's perspective, this is a strategic
position. From Mom's perspective, it is invisible — she does not compare
Annie to commodity robots. Her evaluation is entirely relative to her own
experience over time: is Annie more reliable than last week? Is Annie more
useful than asking Rajesh directly? The "edge+rich" positioning exists in
Rajesh's head. The actual value proposition that Annie must deliver to Mom
is simpler and harder: be consistently useful in her usage window without
requiring her to understand or manage the system.

LENS 08 (Neuroscience Analogies)

Lens 08 introduced the hippocampal replay mechanism — the slow path that
consolidates fast perception into durable memory. The stakeholder analogy is
precise: Mom's trust in Annie is built through hippocampal-equivalent
processes — repeated consistent experiences that consolidate into a stable
mental model of what Annie does and doesn't do. One glass door collision or
three freezes in one evening creates a negative engram that is much harder
to overwrite than a positive one. This is asymmetric trust formation: bad
experiences are weighted more heavily than good ones in the formation of a
lasting behavioral model.
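The asymmetry can be made concrete with a toy trust-update model. This is an
illustration only; the weights are assumptions, not measurements from the
research:

```python
def update_trust(trust, good, w_good=0.02, w_bad=0.15):
    """Move trust toward 1.0 on a good experience and toward 0.0 on a
    bad one; the bad-event weight dominates (asymmetric formation).
    Weights are illustrative assumptions."""
    target, w = (1.0, w_good) if good else (0.0, w_bad)
    return trust + w * (target - trust)

trust = 0.5
for _ in range(20):                  # twenty consistent, boring good runs
    trust = update_trust(trust, True)
peak = trust                         # roughly 0.67
trust = update_trust(trust, False)   # one glass-door collision
```

With these weights, the single collision undoes more than half of what
twenty good runs built, which is the engram asymmetry in miniature.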
The implication for deployment: the first two weeks of Annie's life with Mom
are the most critical. Trust formed in that window determines the baseline
for the entire relationship. The system should be explicitly de-featured and
over-conservative during onboarding, then gradually expand capabilities as
the trust baseline is established.

LENS 11 (Red Team Brief)

Lens 11's competitor analysis identified the $200 robot vacuum with a depth
sensor as the "boring failure" adversarial scenario. From Mom's perspective,
that $200 robot might actually be preferable in one important dimension: it
is legible. It has one job, one behavior pattern, one failure mode (stuck).
Annie has four tiers, five perception capabilities, and undefined behavior
in eight distinct failure modes. Annie is more capable but less legible than
the commodity alternative. Lens 21 reveals that legibility is a stakeholder
requirement that capability does not satisfy. The red team's most effective
attack on Annie is not "it's slower" or "it's less accurate." It is "Mom
can't tell what it's doing and it scares her."

LENS 14 (Academic vs. Reality Gap)

Lens 14 identified that the research describes Waymo's pattern
(lidar-primary) but implements the opposite (VLM-primary). From the
Visitor's perspective, this gap is invisible — they don't know what sensor
Annie is using. But from the privacy perspective, VLM-primary matters
enormously: a camera-primary system creates rich visual data that a
lidar-primary system does not. The decision to use VLM-primary for semantic
richness (the research's explicit goal) is simultaneously a decision to have
a camera continuously observing the home. This tradeoff is never discussed
in the research. Lens 21 makes it explicit: the VLM-primary architecture is
a surveillance architecture that the research treats as a navigation
architecture. Both descriptions are true. Only one is acknowledged.
LENS 15 (Hardware Constraint Relaxation)

Lens 15 argued that the last 40% of accuracy costs 10x the hardware, and
identified three constraints relaxable for under $200. From Mom's
perspective, this framing is irrelevant — she doesn't experience accuracy
percentages. The constraint she cares about is the one that produces freezes
during her evening tea time. From the Visitor's perspective, the constraint
that matters is whether the robot has a visible indicator that its camera is
active. A $5 LED that lights up when the VLM is processing frames is a
better privacy solution than any number of policy documents. Lens 21 reveals
that some hardware constraints are worth relaxing for stakeholder-experience
reasons, not just accuracy reasons.

LENS 25 (Leverage Ranking / Minimum Viable Intervention)

Lens 25's core mechanic — rank candidate changes by
leverage-per-engineering-hour — produces a different ordering depending on
whose utility function you plug in. For Rajesh's utility (learning,
elegance, throughput), Phase 2c (semantic map annotation) scores highly
because it unlocks new capabilities. For Mom's utility (consistency, audible
state, no freezes), Phase 2c scores near zero — it adds complexity without
addressing a single freeze moment.

The Hailo-8 activation inverts this. For Mom's utility it is rank 1 — the
largest trust-curve shift available from any single change. For Rajesh's
utility it is also rank 1 — the highest leverage-per-hour available, because
the NPU is already bolted on: zero hardware cost, ~1-2 sessions of work,
IROS-validated, rollback-safe. This is the unusual case where the
leverage-per-hour ranking agrees across stakeholder utility functions.

Lens 25's output should not be a single leaderboard. It should be a
per-stakeholder leaderboard with an explicit intersection column. Items that
score highly on the intersection column are the interventions that the
kaleidoscope is telling you to ship first, because they resolve conflict
rather than create it.
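That per-stakeholder leaderboard with an intersection rule can be sketched
in a few lines. The intervention names and scores below are illustrative
assumptions, and the intersection rule chosen here is "harms nobody first,
then total leverage":

```python
# Illustrative utility readings in [-1, 1] per stakeholder (assumed
# numbers, not measurements; negative means that stakeholder is harmed).
scores = {
    "hailo8_activation":  {"mom": 0.9, "rajesh": 0.9, "annie": 0.8, "visitor": 0.0},
    "phase_2c_semantics": {"mom": 0.0, "rajesh": 0.8, "annie": 0.3, "visitor": -0.5},
    "audible_state":      {"mom": 0.8, "rajesh": 0.2, "annie": 0.3, "visitor": 0.5},
}

def per_stakeholder_boards(scores):
    """One leaderboard per stakeholder utility function."""
    stakeholders = sorted({s for v in scores.values() for s in v})
    return {s: sorted(scores, key=lambda i: scores[i][s], reverse=True)
            for s in stakeholders}

def ship_order(scores):
    """Intersection column: interventions that harm nobody (min >= 0)
    come first, ordered by total leverage across all utilities."""
    return sorted(scores,
                  key=lambda i: (min(scores[i].values()) < 0,
                                 -sum(scores[i].values())))
```

With these assumed numbers, the harmful intervention sorts last regardless
of how well it scores on any single stakeholder's board.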
Hailo-8 activation is the lens's canonical example of such an intervention.

==============================================================================
THE HAILO-8 ACTIVATION AS STAKEHOLDER-VALUE VECTOR
==============================================================================

The same engineering change — activate the idle Hailo-8 AI HAT+ on Pi 5,
~1-2 engineering sessions, zero hardware cost — produces dramatically
different value readings per stakeholder. This is the central finding of
Lens 21 when composed with Lens 04's WiFi cliff and Lens 20's 7:30 AM event:

MOM     | Trust-transforming. The 7:30 AM WiFi-brownout freezes
        | ("Annie, did you stop?") are her biggest trust-eroding moments.
        | Post-Hailo, those moments stop happening. The cumulative effect
        | on her trust curve is larger than any single user-facing feature.

RAJESH  | Highest-leverage single change available. Lowest risk × highest
        | value. 26 TOPS NPU currently idle, YOLOv8n @ 430 FPS, <10 ms
        | inference, zero hardware cost, purely additive, rollback-safe.
        | IROS paper validates the dual-process pattern (66% latency
        | reduction).

ANNIE   | Grounding. Her first honest local sensor — fast, deterministic,
        | independent of WiFi weather. Closes the "what do I do when
        | inference takes 90 ms" gap that her current architecture leaves
        | unspecified.

VISITOR | Invisible. Hailo does not touch the consent architecture. The
        | camera is still on, the semantic map still records occupancy.
        | This is informative — it reminds us that Hailo solves three of
        | four stakeholder problems, not all four. The Visitor's unmet
        | requirement remains open.

The Lens 21 synthesis: when a single change serves three of four
stakeholders and harms none, ship it first. This is the rare intervention
that the kaleidoscope is telling you to prioritize above every other item on
the roadmap. Value is a vector; Hailo's vector is unusually well-aligned.
==============================================================================
SYNTHESIS: THE MISSING REQUIREMENTS DOCUMENT
==============================================================================

Every cross-lens connection above traces to the same structural deficit: the
research has an architecture document (the 4-tier fusion hierarchy), a
roadmap document (Phases 2a-2e), an evaluation framework (ATE, VLM accuracy,
P/R), and an academic literature review. What it does not have is a
requirements document written from the perspective of the people who will
live with the system.

The Mom Requirements Spec, if it existed, would look roughly like this:

MOM-REQ-01: Voice ESTOP latency <1 second from "Ruko" utterance to wheel
            stop. This is a hard requirement. All other performance can be
            traded.
MOM-REQ-02: Annie must announce intent before every navigation start: "I'm
            going to the kitchen" / "I'm returning home."
MOM-REQ-03: Annie must announce failure states audibly: "My eyes are slow,
            I'll wait" / "I can't find a clear path."
MOM-REQ-04: Navigation behavior in the 7am-10pm window must not change
            between software updates without Mom's explicit acceptance.
MOM-REQ-05: Annie must not enter rooms tagged as private (bedroom, bathroom)
            without explicit request for each entry.

The Visitor Requirements Spec would add:

VISITOR-REQ-01: An obvious visual indicator when the camera is active.
VISITOR-REQ-02: "Stop" or a raised-palm gesture must halt Annie from any
                person in the household, not just registered users.
VISITOR-REQ-03: A privacy opt-out: "please stop recording" must cause Annie
                to leave the room and stop the camera stream.
VISITOR-REQ-04: The semantic map must not store visual data of rooms in
                which a visitor has requested privacy.

None of these requirements appears in the research document. None of them is
derivable from the 4-tier architecture.
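Of these requirements, MOM-REQ-01 is the one with a hard number attached,
which makes it directly testable today. A hypothetical acceptance check; the
event format (utterance timestamp, wheel-stop timestamp) is an assumption of
this sketch, not an interface from the research:

```python
MOM_REQ_01_BUDGET_S = 1.0  # "Ruko" utterance to wheel stop, hard limit

def mom_req_01_passes(estop_events):
    """estop_events: (utterance_ts, wheel_stop_ts) pairs in seconds,
    collected from a test run. A hard requirement admits no averaging:
    every event must meet the budget, and a run that never exercised
    the ESTOP proves nothing."""
    if not estop_events:
        return False
    return all(stop - said < MOM_REQ_01_BUDGET_S
               for said, stop in estop_events)
```

A check like this is what turns the ESTOP gap from a known risk into a
requirement with a passing test.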
They require a different kind of thinking — stakeholder-primary, not
architecture-primary — that the research never applies.

The deepest insight from Lens 21 is not about safety or privacy, though both
are important. It is about epistemology: the research knows everything about
how the system works, and nothing about who the system is for. These are
different kinds of knowledge. The first kind is documented in eight detailed
sections. The second kind is absent. Until the second kind exists, every
architectural decision — however elegant — is solving the wrong problem with
great precision.