LENS 21 — STAKEHOLDER KALEIDOSCOPE: CROSS-LENS CONNECTIONS

==============================================================================
PRIMARY CONNECTIONS
==============================================================================

LENS 06 (Second-Order Effects)

Lens 06 identified the transition where Mom discovers she can ask "Annie,
what's in the kitchen?" — a use case that falls out of VLMaps semantic
annotation without being explicitly designed. Lens 21 reveals the governance
problem this creates: Mom will discover and love this feature before Rajesh
has designed its privacy controls or uncertainty expression. The semantic map
goes from a background infrastructure component to a load-bearing household
feature the moment Mom asks her first spatial question. Lens 06 correctly
named it a "phase transition" in the human-robot relationship. Lens 21
identifies who is responsible for managing that transition: not the
architecture, but the explicit consent and communication protocols that the
architecture never specifies.

Lens 06 also surfaced the ESTOP gap directly: "Mom ESTOP gap worsens as speed
rises — at 1 m/s, 10 Hz semantic obstacle detection is too slow at elevated
speed." Lens 21 makes the mechanism explicit: this is not a tuning problem,
it is a missing-requirement problem. The ESTOP gap exists because nobody
wrote "Mom must be able to halt Annie via voice within 1 second" as a primary
system requirement. Until that sentence appears in a requirements document
with a passing test, the gap is a known risk with no mitigation.

Lens 06's third-order "privacy as surveillance" branch — "the map records who
was in which room at what time" — becomes Lens 21's Visitor card. A Visitor
who sits in the living room for two hours is in the semantic map. The consent
architecture that Lens 06 calls for is the specific gap Lens 21 names as the
Visitor's unmet requirement.
Both lenses arrive at the same conclusion via different paths: Phase 2c
cannot ship without a consent layer, and the consent layer cannot be designed
without first acknowledging the Visitor as a stakeholder.

LENS 10 (Failure Pre-mortem)

Lens 10's August 2026 event — Mom stops using Annie after three freezes
during the 7-9pm window — is the realized version of every conflict this
lens documents. The team doesn't notice for two weeks because the dashboard
shows 94% navigation success (all hours). Lens 21 names this precisely: the
dashboard was built from Rajesh's perspective (system-wide metrics) and is
blind to Mom's perspective (per-user, per-hour windows). The metric
aggregation was not a technical error. It was a stakeholder-representation
error: the metric designer consulted only one stakeholder's utility function.

Lens 10's glass door incident is the collision between Annie's perspective
(both sensors report CLEAR, which is truthful from her signal stream) and
Mom's perspective (the robot just hit my door, trust is gone). The
disconnect is not an engineering failure — it is a stakeholder failure.
Annie had no specification for "what do I do when both sensors agree and
both are wrong?" because the failure mode was never written from Mom's
perspective ("Annie must not hit furniture even when sensors are confused").

Lens 10's pre-mortem ultimately traces every failure to the same root: "we
built the fast path, forgot the slow path." Lens 21 reframes this in
stakeholder terms: the fast path was designed for Rajesh (58 Hz throughput,
architectural elegance); the slow path was the entirety of Mom's usage
experience. The slow path is what happens when Annie freezes, crashes, gets
confused, hits something, or loses WiFi. These are the moments that matter
most to Mom and are specified least in the research.
LENS 20 (Multi-modal Convergence)

Lens 20's analysis of convergence between voice, vision, spatial memory, and
emotional context identifies the moment when all channels compose into a
single coherent experience. From Rajesh's perspective, this convergence is
an architectural achievement. From Mom's perspective, it is the moment Annie
stops feeling like a collection of features and starts feeling like a
presence. The convergence is emotionally legible to Mom before it is
technically documented by Rajesh.

Lens 21 reveals the governance challenge of this convergence: the moment
Annie can proactively say "I saw your glasses on the nightstand at 2pm"
(composing Context Engine + semantic map + voice), she crosses from tool to
agent. Mom's relationship with an agent requires different safety guarantees
than her relationship with a tool. A tool that fails silently is annoying.
An agent that fails silently feels deceptive. The voice-to-ESTOP gap,
audible state announcements, and failure communication are not just safety
features — they are the conditions under which Mom can maintain a healthy
relationship with an agent she will increasingly rely on and trust.

Lens 20's convergence also creates the Visitor problem in its most acute
form: a fully converged system (voice + vision + memory + emotion) is
indistinguishable from a surveillance apparatus to someone who encounters it
without context. The Visitor needs to be able to understand what the system
is doing in under 10 seconds of direct observation. This "legibility
requirement" is unspecified in the research and is the Visitor's primary
unmet need.

==============================================================================
SECONDARY CONNECTIONS
==============================================================================

LENS 01 (Constraint Hierarchy)

Lens 01 identified a 10-layer constraint hierarchy from physical limits up
to social conventions.
Lens 21 reveals that Mom's requirements should sit ABOVE the engineering
constraints in this hierarchy, not below them. The current hierarchy is
implicitly built from the bottom up: physics first, then hardware, then
software architecture, then user experience as an afterthought. A Mom-first
design would build the hierarchy from the top down: Mom's safety
requirements first (voice ESTOP <1 s, no sudden movements, audible state),
then the architecture that satisfies those requirements, then the hardware
that supports that architecture. The current research does the opposite.

LENS 03 (Dependency Graph)

Lens 03 identified the llama-server embedding blocker as the highest-leverage
addressable dependency. Lens 21 adds a dependency that doesn't appear in any
technical dependency graph: Mom's trust. Mom's trust is a prerequisite for
Annie's long-term deployment. Mom's trust depends on consistent behavior,
audible state, and sub-1-second voice ESTOP. These are all unimplemented.

The dependency graph is missing a human node at the top. Every technical
component is ultimately a dependency of Mom's continued willingness to live
with the robot. If that node fails (Lens 10, August 2026: Mom stops asking),
the entire graph is irrelevant.

LENS 04 (Connectivity and Latency) — PRIMARY FOR HAILO FINDING

Lens 04 identified the WiFi cliff edge at 100 ms as session 119's central
finding. From Rajesh's perspective, this is a latency engineering problem.
From Mom's perspective, it is a behavioral consistency problem: sometimes
Annie freezes for no visible reason. The same technical event (WiFi timeout)
has completely different stakeholder interpretations. Rajesh sees a metric;
Mom sees a betrayal of expectation.

Lens 21's Hailo-8 activation is the direct architectural answer to Lens 04's
WiFi cliff. A local Hailo-8 NPU running YOLOv8n at 430 FPS with <10 ms
inference and zero WiFi dependence eliminates the cliff-edge failure for the
safety layer.
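The layered fallback can be sketched as a latency watchdog: give the remote
VLM tier first claim on each frame with a timeout at the cliff edge, and
route to the local NPU when the link stalls. A minimal sketch; the
`local_npu` and `remote_vlm` interfaces are hypothetical, not APIs from the
research:

```python
WIFI_CLIFF_S = 0.100  # Lens 04's cliff edge: 100 ms

class PerceptionRouter:
    """Route frames to the remote VLM tiers (L2-L4) unless the network
    has stalled past the cliff; then fall back to the local Hailo tier
    (L1) so the safety layer never freezes with the WiFi."""

    def __init__(self, local_npu, remote_vlm):
        self.local_npu = local_npu    # hypothetical Hailo-8 wrapper
        self.remote_vlm = remote_vlm  # hypothetical networked VLM wrapper

    def perceive(self, frame):
        try:
            # Remote tier gets the frame, budgeted at the cliff edge.
            return self.remote_vlm.infer(frame, timeout=WIFI_CLIFF_S)
        except TimeoutError:
            # WiFi brownout: degrade semantic richness, keep motion safe.
            return self.local_npu.infer(frame)
```

The design point is that a brownout produces a quieter answer, never a
freeze: the caller always gets a perception result within the budget.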
When WiFi stalls past 100 ms, L1 (Hailo) keeps Annie moving safely; L2-L4
(VLM tiers on Panda/Titan) are allowed to degrade without producing a
visible freeze. The IROS dual-process paper (arXiv 2601.21506) validates a
66% latency reduction from exactly this pattern. The communication layer
Lens 04 suggested ("my eyes are slow, I'll wait a moment") remains
necessary — but becomes exceptional narration rather than routine
explanation, because the freezes themselves become rare. The technical
solution (Hailo) and the communication solution (audible state) are
co-designed in the Mom-first reading of this architecture.

LENS 07 (Market Positioning)

Lens 07 identified Annie as targeting the empty "edge+rich" quadrant in the
home robotics market. From Rajesh's perspective, this is a strategic
position. From Mom's perspective, it is invisible — she does not compare
Annie to commodity robots. Her evaluation is entirely relative to her own
experience over time: is Annie more reliable than last week? Is Annie more
useful than asking Rajesh directly? The "edge+rich" positioning exists in
Rajesh's head. The actual value proposition that Annie must deliver to Mom
is simpler and harder: be consistently useful in her usage window without
requiring her to understand or manage the system.

LENS 08 (Neuroscience Analogies)

Lens 08 introduced the hippocampal replay mechanism — the slow path that
consolidates fast perception into durable memory. The stakeholder analogy is
precise: Mom's trust in Annie is built through hippocampal-equivalent
processes — repeated consistent experiences that consolidate into a stable
mental model of what Annie does and doesn't do. One glass door collision or
three freezes in one evening creates a negative engram that is much harder
to overwrite than a positive one. This is asymmetric trust formation: bad
experiences are weighted more heavily than good ones in the formation of a
lasting behavioral model.
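The asymmetry can be made concrete with a toy trust-update model. This is an
illustration only; the weights are assumptions, not measurements from the
research:

```python
def update_trust(trust, good, w_good=0.02, w_bad=0.15):
    """Move trust toward 1.0 on a good experience and toward 0.0 on a
    bad one; the bad-event weight dominates (asymmetric formation).
    Weights are illustrative assumptions."""
    target, w = (1.0, w_good) if good else (0.0, w_bad)
    return trust + w * (target - trust)

trust = 0.5
for _ in range(20):                  # twenty consistent, boring good runs
    trust = update_trust(trust, True)
peak = trust                         # roughly 0.67
trust = update_trust(trust, False)   # one glass-door collision
```

With these weights, the single collision undoes more than half of what
twenty good runs built, which is the engram asymmetry in miniature.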
The implication for deployment: the first two weeks of Annie's life with Mom
are the most critical. Trust formed in that window determines the baseline
for the entire relationship. The system should be explicitly de-featured and
over-conservative during onboarding, then gradually expand capabilities as
the trust baseline is established.

LENS 11 (Red Team Brief)

Lens 11's competitor analysis identified the $200 robot vacuum with a depth
sensor as the "boring failure" adversarial scenario. From Mom's perspective,
that $200 robot might actually be preferable in one important dimension: it
is legible. It has one job, one behavior pattern, one failure mode (stuck).
Annie has four tiers, five perception capabilities, and undefined behavior
in eight distinct failure modes. Annie is more capable but less legible than
the commodity alternative. Lens 21 reveals that legibility is a stakeholder
requirement that capability does not satisfy. The red team's most effective
attack on Annie is not "it's slower" or "it's less accurate." It is "Mom
can't tell what it's doing and it scares her."

LENS 14 (Academic vs. Reality Gap)

Lens 14 identified that the research describes Waymo's pattern
(lidar-primary) but implements the opposite (VLM-primary). From the
Visitor's perspective, this gap is invisible — they don't know what sensor
Annie is using. But from the privacy perspective, VLM-primary matters
enormously: a camera-primary system creates rich visual data that a
lidar-primary system does not. The decision to use VLM-primary for semantic
richness (the research's explicit goal) is simultaneously a decision to have
a camera continuously observing the home. This tradeoff is never discussed
in the research. Lens 21 makes it explicit: the VLM-primary architecture is
a surveillance architecture that the research treats as a navigation
architecture. Both descriptions are true. Only one is acknowledged.
LENS 15 (Hardware Constraint Relaxation)

Lens 15 argued that the last 40% of accuracy costs 10x the hardware, and
identified three constraints relaxable for under $200. From Mom's
perspective, this framing is irrelevant — she doesn't experience accuracy
percentages. The constraint she cares about is the one that produces freezes
during her evening tea time. From the Visitor's perspective, the constraint
that matters is whether the robot has a visible indicator that its camera is
active. A $5 LED that lights up when the VLM is processing frames is a
better privacy solution than any number of policy documents. Lens 21 reveals
that some hardware constraints are worth relaxing for stakeholder-experience
reasons, not just accuracy reasons.

LENS 25 (Leverage Ranking / Minimum Viable Intervention)

Lens 25's core mechanic — rank candidate changes by
leverage-per-engineering-hour — produces a different ordering depending on
whose utility function you plug in. For Rajesh's utility (learning,
elegance, throughput), Phase 2c (semantic map annotation) scores highly
because it unlocks new capabilities. For Mom's utility (consistency, audible
state, no freezes), Phase 2c scores near zero — it adds complexity without
addressing a single freeze moment.

The Hailo-8 activation inverts this. For Mom's utility it is rank 1 — the
largest trust-curve shift available from any single change. For Rajesh's
utility it is also rank 1 — the highest leverage-per-hour available, because
the NPU is already bolted on: zero hardware cost, ~1-2 sessions of work,
IROS-validated, rollback-safe. This is the unusual case where the
leverage-per-hour ranking agrees across stakeholder utility functions.

Lens 25's output should not be a single leaderboard. It should be a
per-stakeholder leaderboard with an explicit intersection column. Items that
score highly on the intersection column are the interventions that the
kaleidoscope is telling you to ship first, because they resolve conflict
rather than create it.
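That per-stakeholder leaderboard with an intersection rule can be sketched
in a few lines. The intervention names and scores below are illustrative
assumptions, and the intersection rule chosen here is "harms nobody first,
then total leverage":

```python
# Illustrative utility readings in [-1, 1] per stakeholder (assumed
# numbers, not measurements; negative means that stakeholder is harmed).
scores = {
    "hailo8_activation":  {"mom": 0.9, "rajesh": 0.9, "annie": 0.8, "visitor": 0.0},
    "phase_2c_semantics": {"mom": 0.0, "rajesh": 0.8, "annie": 0.3, "visitor": -0.5},
    "audible_state":      {"mom": 0.8, "rajesh": 0.2, "annie": 0.3, "visitor": 0.5},
}

def per_stakeholder_boards(scores):
    """One leaderboard per stakeholder utility function."""
    stakeholders = sorted({s for v in scores.values() for s in v})
    return {s: sorted(scores, key=lambda i: scores[i][s], reverse=True)
            for s in stakeholders}

def ship_order(scores):
    """Intersection column: interventions that harm nobody (min >= 0)
    come first, ordered by total leverage across all utilities."""
    return sorted(scores,
                  key=lambda i: (min(scores[i].values()) < 0,
                                 -sum(scores[i].values())))
```

With these assumed numbers, the harmful intervention sorts last regardless
of how well it scores on any single stakeholder's board.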
Hailo-8 activation is the lens's canonical example of such an intervention.

==============================================================================
THE HAILO-8 ACTIVATION AS STAKEHOLDER-VALUE VECTOR
==============================================================================

The same engineering change — activate the idle Hailo-8 AI HAT+ on Pi 5,
~1-2 engineering sessions, zero hardware cost — produces dramatically
different value readings per stakeholder. This is the central finding of
Lens 21 when composed with Lens 04's WiFi cliff and Lens 20's 7:30 AM event:

MOM     | Trust-transforming. The 7:30 AM WiFi-brownout freezes
        | ("Annie, did you stop?") are her biggest trust-eroding moments.
        | Post-Hailo, those moments stop happening. The cumulative effect
        | on her trust curve is larger than any single user-facing feature.

RAJESH  | Highest-leverage single change available. Lowest risk × highest
        | value. 26 TOPS NPU currently idle, YOLOv8n @ 430 FPS, <10 ms
        | inference, zero hardware cost, purely additive, rollback-safe.
        | IROS paper validates the dual-process pattern (66% latency
        | reduction).

ANNIE   | Grounding. Her first honest local sensor — fast, deterministic,
        | independent of WiFi weather. Closes the "what do I do when
        | inference takes 90 ms" gap that her current architecture leaves
        | unspecified.

VISITOR | Invisible. Hailo does not touch the consent architecture. The
        | camera is still on, the semantic map still records occupancy.
        | This is informative — it reminds us that Hailo solves three of
        | four stakeholder problems, not all four. The Visitor's unmet
        | requirement remains open.

The Lens 21 synthesis: when a single change serves three of four
stakeholders and harms none, ship it first. This is the rare intervention
that the kaleidoscope is telling you to prioritize above every other item on
the roadmap. Value is a vector; Hailo's vector is unusually well-aligned.
==============================================================================
SYNTHESIS: THE MISSING REQUIREMENTS DOCUMENT
==============================================================================

Every cross-lens connection above traces to the same structural deficit: the
research has an architecture document (the 4-tier fusion hierarchy), a
roadmap document (Phases 2a-2e), an evaluation framework (ATE, VLM accuracy,
P/R), and an academic literature review. What it does not have is a
requirements document written from the perspective of the people who will
live with the system.

The Mom Requirements Spec, if it existed, would look roughly like this:

MOM-REQ-01: Voice ESTOP latency <1 second from "Ruko" utterance to wheel
            stop. This is a hard requirement. All other performance can be
            traded.
MOM-REQ-02: Annie must announce intent before every navigation start: "I'm
            going to the kitchen" / "I'm returning home."
MOM-REQ-03: Annie must announce failure states audibly: "My eyes are slow,
            I'll wait" / "I can't find a clear path."
MOM-REQ-04: Navigation behavior in the 7am-10pm window must not change
            between software updates without Mom's explicit acceptance.
MOM-REQ-05: Annie must not enter rooms tagged as private (bedroom, bathroom)
            without explicit request for each entry.

The Visitor Requirements Spec would add:

VISITOR-REQ-01: An obvious visual indicator when the camera is active.
VISITOR-REQ-02: "Stop" or a raised-palm gesture must halt Annie from any
                person in the household, not just registered users.
VISITOR-REQ-03: A privacy opt-out: "please stop recording" must cause Annie
                to leave the room and stop the camera stream.
VISITOR-REQ-04: The semantic map must not store visual data of rooms in
                which a visitor has requested privacy.

None of these requirements appears in the research document. None of them is
derivable from the 4-tier architecture.
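Of these requirements, MOM-REQ-01 is the one with a hard number attached,
which makes it directly testable today. A hypothetical acceptance check; the
event format (utterance timestamp, wheel-stop timestamp) is an assumption of
this sketch, not an interface from the research:

```python
MOM_REQ_01_BUDGET_S = 1.0  # "Ruko" utterance to wheel stop, hard limit

def mom_req_01_passes(estop_events):
    """estop_events: (utterance_ts, wheel_stop_ts) pairs in seconds,
    collected from a test run. A hard requirement admits no averaging:
    every event must meet the budget, and a run that never exercised
    the ESTOP proves nothing."""
    if not estop_events:
        return False
    return all(stop - said < MOM_REQ_01_BUDGET_S
               for said, stop in estop_events)
```

A check like this is what turns the ESTOP gap from a known risk into a
requirement with a passing test.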
They require a different kind of thinking — stakeholder-primary, not
architecture-primary — that the research never applies.

The deepest insight from Lens 21 is not about safety or privacy, though both
are important. It is about epistemology: the research knows everything about
how the system works, and nothing about who the system is for. These are
different kinds of knowledge. The first kind is documented in eight detailed
sections. The second kind is absent. Until the second kind exists, every
architectural decision — however elegant — is solving the wrong problem with
great precision.