LENS 26 — QUESTION HORIZON: CROSS-LENS CONNECTIONS
===================================================

PRIMARY CONNECTIONS
-------------------

LENS 01 (Constraint Hierarchy — Temporal Surplus as Free Signal):

Lens 01 identified that Annie's 58 Hz surplus creates "temporal free signal" — far more frames per second than basic navigation decisions require. It catalogued constraints at multiple levels: physics → convention → dissolved. Lens 26 provides the next-order question that Lens 01 could not ask: now that the surplus exists, what is the optimal allocation? Branch 1 of Lens 26 asks whether alternating-query dispatch at 29 Hz nav + 10 Hz scene + 10 Hz obstacle is the best distribution, or whether there is a discovery-based optimal split that changes with room type (a cluttered living room and an empty hallway require different allocation ratios).

Lens 01's "dissolved constraint" category should now include: "temporal interleaving enables multi-task perception without frame-level parallelism." But Lens 26 adds a caveat: the interleaving may introduce task-lag artifacts (frame 3's obstacle report describes a moment captured between frames 2 and 4's nav queries). That artifact was not visible as a question until the multi-query pipeline was proposed.

Cross-citation: Lens 01's 86 ms EMA window is the same 86 ms that Lens 26 Branch 2 identifies as the almost-answered "EMA vs sensor fusion" question. The temporal surplus and the EMA window are two descriptions of the same design variable.

LENS 05 (Value Mapping — Privacy and Behavioral Signals):

Lens 26 Branch 3 asks whether Annie's semantic map transfers between homes. This question has an immediate privacy consequence that Lens 05 should pick up. A semantic map is not just spatial geometry — it encodes behavioral patterns. "Kitchen frequently visited between 7:00 and 9:00 AM" is a health signal. "Bedroom entered at irregular hours" is a behavioral signal. "Bathroom pattern changed in week 3" is a medical signal.
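As an aside on the Lens 01 cross-citation above: the shared 86 ms EMA window can be made concrete as a frame-rate-aware exponential moving average. This is a minimal sketch, not the team's implementation — the 58 Hz rate and 86 ms window come from the source; the smoothing derivation and the sample signal are illustrative assumptions.

```python
import math

FRAME_HZ = 58      # Annie's frame rate (from the source)
WINDOW_S = 0.086   # 86 ms EMA window (from the source)

# Derive a per-frame smoothing factor so the EMA's effective time
# constant matches the 86 ms window at 58 Hz (one common convention;
# the actual stack may define the window differently).
ALPHA = 1.0 - math.exp(-1.0 / (FRAME_HZ * WINDOW_S))

def ema_update(prev: float, sample: float, alpha: float = ALPHA) -> float:
    """One EMA step: blend the new per-frame sample into the running estimate."""
    return prev + alpha * (sample - prev)

# Usage: smooth a noisy per-frame signal (e.g. an obstacle-distance reading).
est = 1.0
for reading in [1.0, 1.2, 0.9, 1.1]:
    est = ema_update(est, reading)
```

The point of the sketch is that the window is a single design variable: changing WINDOW_S trades temporal surplus (responsiveness) against smoothing, which is exactly the "EMA vs sensor fusion" question Branch 2 leaves open.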
If that map is uploaded as a product SKU and transferred to new users, it carries those behavioral embeddings in latent form. The fraction of the map that is "universally transferable" may be higher than expected precisely because universal semantic anchors are correlated with universal behavioral patterns.

Cross-citation: Lens 05's privacy model needs to explicitly address the "map as product" scenario that Lens 26 Branch 3 introduces. The question "what is transferable in the semantic map?" is also the question "what behavioral signals escape the home?"

LENS 08 (Analogy Bridge — Neuroscience Mechanisms):

Lens 26 Branch 5 (outsider question: "why does the robot need language?") and Lens 08 converge on the same architectural observation from different directions. Lens 08 notes that rat hippocampal place cells encode spatial identity directly as activation patterns, not as verbal descriptions. Lens 26 Branch 4 notes that text2nav achieves 74% navigation success using frozen SigLIP embeddings alone. Both observations point at the same thing: the vision encoder's activation space is a sufficient representation for spatial navigation; the text-decoding step is an unnecessary intermediate. Lens 08's hippocampal replay mechanism (recommended implementation: store vision embeddings keyed by SLAM pose during overnight consolidation) is the architectural implementation of Lens 26's "bypass text" convergence finding.

Cross-citation: Lens 08 explicitly recommended that replay-time processing store vision encoder embeddings (not text descriptions) keyed by SLAM pose. Lens 26 confirms this from three independent question branches. The two lenses are co-specifying the same architecture from different starting points.

LENS 14 (Historical Pattern — Research Describes Waymo, Does the Opposite):

Lens 14 found that the research describes the Waymo pattern (lidar-primary, camera supplementary) and then implements the opposite (VLM-primary, lidar supplementary).
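Lens 08's recommendation above — store vision encoder embeddings keyed by SLAM pose during overnight consolidation — could be sketched as a small replay store. Everything here is hypothetical: the grid cell size, the class names, and the averaging-based consolidation are illustrative assumptions (the embedding width matches SigLIP ViT-SO400M's 1152, but verify against the deployed model).

```python
from collections import defaultdict

EMBED_DIM = 1152  # SigLIP ViT-SO400M embedding width (assumed)

def pose_key(x: float, y: float, theta: float, cell: float = 0.25) -> tuple:
    """Quantize a SLAM pose to a grid cell so nearby frames share one key."""
    return (round(x / cell), round(y / cell), round(theta, 1))

class ReplayStore:
    """Vision embeddings keyed by quantized SLAM pose, consolidated offline."""

    def __init__(self):
        self._store = defaultdict(list)

    def record(self, pose, embedding):
        """Runtime path: log the frame's embedding under its pose cell."""
        self._store[pose_key(*pose)].append(embedding)

    def consolidate(self):
        """Overnight pass: average embeddings per cell into one prototype,
        the replay-time step Lens 08 assigns to Titan."""
        return {
            key: [sum(vals) / len(vals) for vals in zip(*embs)]
            for key, embs in self._store.items()
        }

store = ReplayStore()
store.record((1.02, 3.98, 0.0), [0.1] * EMBED_DIM)
store.record((1.05, 4.01, 0.0), [0.3] * EMBED_DIM)  # same 25 cm cell
prototypes = store.consolidate()  # one averaged prototype per visited cell
```

Note that no text label ever enters the store — the key is geometry and the value is the raw activation pattern, which is the place-cell analogy in code form.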
Lens 26 finds an analogous inversion: the research builds toward language-grounded semantic maps (the VLMaps pattern: CLIP embeddings, a language-queryable map) and simultaneously identifies reasons to remove language from the Tier 2 perception loop. Both cannot be maximally true. The VLMaps architecture requires language at the map query interface ("where is the kitchen?") but not at the frame-processing interface ("what is in this frame?"). Lens 26 makes the distinction explicit: language at Tier 1 (strategic goal interpretation) is load-bearing; language at Tier 2 (tactical frame processing) is a relay station that adds latency and hallucination risk without contributing to navigation accuracy. The architectural resolution: keep text at Tier 1, bypass text at Tier 2. This resolution was not articulable from inside the research because the inversion (Lens 14's finding) was not visible.

Cross-citation: Lens 14's "research describes X then does not-X" pattern is a general signal that the team's explicit commitments (use language-grounded maps) and its implicit architecture (VLM-primary, embedding-fast-path) are in tension. Lens 26 Branch 5 turns that tension into a question: which commitment should win?

SECONDARY CONNECTIONS
---------------------

LENS 02 (Abstraction Leak — Pico RP2040 REPL Crash as Invisible Failure):

Lens 02 identified that the most dangerous failures in Annie's stack are invisible abstraction leaks — the IMU crash that silently degraded navigation without any error message. Lens 26 Branch 5 introduces a new potential invisible failure: if text decoding is removed from Tier 2 and replaced with embedding-to-command linear probes, the failure mode changes from "VLM hallucinated LEFT when obstacle was on the right" (human-readable, debuggable from logs) to "embedding distance 0.73 fell below threshold for sector 2" (numeric, requiring visualization tooling to debug). The abstraction leak risk increases when text is bypassed.
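One way to blunt that leak is to make every embedding-mediated decision emit a human-readable trace alongside the numerics. A minimal sketch of the idea — the sector names, threshold, and log format are all hypothetical, not the project's actual observability plan:

```python
SECTORS = ["left", "center", "right"]  # hypothetical steering sectors
CLEAR_THRESHOLD = 0.75                 # hypothetical cosine-similarity cutoff

def explain_decision(similarities: list[float]) -> str:
    """Translate raw per-sector similarity scores into a log line a human
    can debug by inspection, standing in for the lost 'VLM said LEFT
    MEDIUM' entry."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    verdict = "CLEAR" if similarities[best] >= CLEAR_THRESHOLD else "BLOCKED"
    detail = ", ".join(
        f"{name}={score:.2f}" for name, score in zip(SECTORS, similarities)
    )
    return f"nav: {SECTORS[best].upper()} {verdict} ({detail})"

line = explain_decision([0.81, 0.42, 0.73])
# e.g. "nav: LEFT CLEAR (left=0.81, center=0.42, right=0.73)"
```

The trace costs one string format per decision but restores log-level debuggability, which is the specific property the text bypass would otherwise remove.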
Lens 26's outsider question about explainability cost directly instantiates Lens 02's concern about invisible failures. Any implementation of "bypass text layer" must include an explicit observability plan: what replaces the "VLM said LEFT MEDIUM" log entry?

LENS 03 (Dependency Audit — llama-server Embedding Blocker):

Lens 03 identified the llama-server embedding extraction blocker as the highest-leverage addressable dependency in Annie's stack. The research itself notes that llama-server does not cleanly expose intermediate embeddings for multimodal inputs, and recommends deploying a separate SigLIP 2 ViT-SO400M (~800 MB VRAM) as a dedicated embedding extractor. Lens 26 Branch 1 (task-parallelism questions) and Branch 4 (cross-field, text-free architecture) both depend on resolving this blocker first. The SigLIP 2 deployment is the prerequisite for every branch of the "bypass text" convergence finding.

Cross-citation: Lens 03's blocker analysis gives the sequencing for Lens 26's convergence implementation: (1) deploy SigLIP 2 on Panda; (2) profile text-decode vs embed-only latency; (3) train a linear probe; (4) A/B test. None of steps 2-4 is possible until step 1 unblocks the embedding extraction path.

LENS 04 (Network Topology — WiFi Cliff Edge at 100 ms):

Lens 26 Branch 3 (semantic map transfer) has an important WiFi dependency that Lens 04's findings constrain. Semantic map transfer (uploading/downloading concept graphs between homes) requires network connectivity for the initial transfer, but runtime navigation must remain WiFi-independent. The "map as product" scenario Lens 26 introduces must be designed so that the transferred concept graph is fully cached locally on Annie before deployment, not streamed at runtime. Lens 04's cliff-edge finding (navigation degrades sharply above 100 ms RTT) means any architecture where the transferred map requires Titan or cloud lookups during active navigation is fragile.
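The "fully cached before deployment, never streamed" constraint above could be enforced with a preflight check run at setup or charging time. This is a sketch under assumed conventions: the cache directory layout, manifest format, and `.emb` file naming are all hypothetical.

```python
import json
from pathlib import Path

def preflight_map_cache(cache_dir: str) -> bool:
    """Refuse to start navigation unless every concept embedding named in
    the transferred map's manifest is already present in the local on-Pi
    cache, so runtime never depends on Titan/cloud lookups over WiFi."""
    root = Path(cache_dir)
    manifest = root / "manifest.json"   # hypothetical manifest location
    if not manifest.exists():
        return False
    concepts = json.loads(manifest.read_text())["concepts"]
    return all((root / f"{c}.emb").exists() for c in concepts)

# Usage: gate the nav loop on the preflight result during setup/charging.
# if not preflight_map_cache("/var/annie/map_cache"):   # hypothetical path
#     raise RuntimeError("concept graph not fully cached; refusing to navigate")
```

Failing closed here converts Lens 04's runtime cliff edge (latency spike mid-navigation) into a setup-time error, which is the cheap place to catch it.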
Cross-citation: the map transfer architecture must specify: (a) transfer happens during setup/charging, not during runtime; (b) the on-Pi cache of concept embeddings is the runtime source; (c) Titan enrichment of the map happens offline (Lens 08's hippocampal replay pattern).

LENS 10 (Post-Mortem — "Built the Fast Path, Forgot the Slow Path"):

Lens 10's post-mortem finding ("we built the fast path, forgot the slow path") has a direct parallel in Lens 26's convergence finding. The "fast path" (text decoding at 58 Hz, human-readable nav commands) was built because it was the natural output format for a language model. The "slow path" (embedding extraction, cosine similarity, linear probe training on labeled frames) was not built because it required a separate infrastructure step (SigLIP 2 deployment, a data pipeline for frame logging, a training loop for linear probes). The text layer is the fast path. The embedding layer is the slow path. The research proves the fast path works. The slow path — which the convergence finding suggests may work better — was not forgotten through negligence but for the same structural reason Lens 10 identified: the slow path requires planning and infrastructure that the fast path does not.

Cross-citation: Lens 10's retrospective framing ("what went wrong and when?") explains why the text layer was chosen — it was the fast path to a working system. Lens 26 provides the prospective framing: now that the fast path works, the slow path is the next investment.

NOVEL PREDICTIONS FROM QUESTION HORIZON (not in any other lens)
---------------------------------------------------------------

1. THE SEMANTIC MAP BUSINESS MODEL: No other lens addresses the commercial consequence of semantic map transfer.
If Annie's concept embeddings (not coordinates) are the map, and if 60-70% of those embeddings are universal (home-layout patterns that repeat across all homes), then the transferred concept graph is a form of pre-trained spatial knowledge. A new user's Annie would not start from a blank SLAM occupancy grid — she would start from a graph where "kitchen-ness," "bathroom-ness," and "hallway-ness" are already recognized. The exploration-to-functional ratio (how long before Annie is useful in a new home) drops from weeks to hours. This is not a navigation research finding. It is a product design finding that the navigation research makes askable for the first time.

2. THE EXPLAINABILITY TRADE: No other lens explicitly frames the text-bypass as a trade between navigation accuracy and debugging transparency. Every other lens that mentions bypassing text (Lens 08, Lens 14) treats it as a pure performance improvement. Lens 26 Branch 5 adds the countervailing concern: text-mediated nav is debuggable by inspection. Embedding-mediated nav requires visualization tooling, dimensionality reduction, and human-in-the-loop evaluation of cosine similarity thresholds. The transparency trade is a product design and safety concern that must be decided explicitly, not defaulted. The question "is the text layer retained for debugging convenience or for navigation performance?" is a governance question that no other lens raised.

3. THE TASK-MINIMUM FREQUENCY QUESTION: Branch 1 asks: what is the minimum nav frequency before task performance degrades? No empirical answer exists. The research proposes 29 Hz nav (frames 0, 2, 4 in a 6-frame cycle). But if 15 Hz nav works, the remaining 43 Hz can be allocated to embedding extraction and place recognition, enabling real-time topological map building without a separate SigLIP 2 model.
This would collapse Phase 2d (embedding extraction) into Phase 2a (multi-query pipeline) — a significant simplification that requires only one measurement: what is Annie's minimum viable nav frequency? No other lens identified this as the highest-leverage measurement in Phase 2a.

4. THE LANGUAGE-GEOMETRY MISMATCH: The outsider question from Branch 5 — "why does the robot need to understand language?" — identifies a category error in the architecture's design rationale that no other lens surfaced. Navigation is a geometric problem. Language is a communication protocol. Using a language model for navigation is appropriate if the mapping is (human language) → (robot action). It is a detour if the mapping is (visual embedding) → (text string) → (robot action), where the text string adds no information that the visual embedding did not already contain. Lens 26 is the only lens that makes this category distinction explicit. It suggests that the language model in Annie's stack should be evaluated against two different standards: (a) does it help the robot navigate? (b) does it help the human communicate with the robot? The answer to (a) may be "not as much as we thought" while the answer to (b) remains "yes, significantly." The architecture should reflect this distinction by separating the two functions, not conflating them in a single VLM inference call.

SESSION 119 ADDITIONS — DUAL-PROCESS HORIZON CROSS-LENS
--------------------------------------------------------

LENS 03 (Dependency Audit — llama-server Embedding Blocker) [SESSION 119 UPDATE]:

Lens 03's blocker analysis gains a new entry from session 119: the Hailo-8 AI HAT+ on Pi 5 is a 26 TOPS NPU that sits idle during navigation. It is not a blocker in the "missing capability" sense; it is the opposite — an unclaimed capability.
Lens 03 should register it as a negative-space dependency: every frame that Hailo does not process is a frame the VLM must process, and every VLM frame costs 25-40 ms of WiFi plus inference latency that Hailo would deliver in under 10 ms locally. The dependency audit is now: (1) llama-server embedding path (pre-existing, for Tier 2 text bypass); (2) HailoRT/TAPPAS toolchain on Pi 5 (new, for System 1 activation); (3) open-vocabulary detector compilation to Hailo format (new, unresolved — NanoOWL compatibility unverified).

Cross-citation: Lens 26 Branch 6 asks whether Hailo can run NanoOWL-lite. The answer determines whether Lens 03's dependency list has two or three items.

LENS 08 (Analogy Bridge — Neuroscience Mechanisms) [SESSION 119 UPDATE]:

Lens 08's dual-process observation (fast instinctual detection + slow deliberative reasoning, mapped to rat hippocampus + prefrontal cortex) is directly validated by the IROS arXiv 2601.21506 result that session 119 surfaced: System 1 (30+ Hz SegFormer/YOLO) + System 2 (1-5 Hz VLM) yields a 66% latency reduction and 67.5% success vs 5.83% for VLM-only on indoor robot navigation. The Lens 08 neuroscience analogy is no longer speculative — it is a peer-reviewed architectural pattern with measured performance advantages.

Cross-citation: Lens 08 should upgrade its "fast-slow split" recommendation from "inspired by hippocampus" to "validated by IROS 2601.21506" and add the Hailo-8 as the concrete implementation substrate for System 1.

LENS 24 (Resource Allocation Under Uncertainty) [SESSION 119 CREATES]:

Lens 24 now has an explicit calibration question from session 119: the tuning question ("at what VLM query rate does System 2 gating outperform always-on VLM?") and the layer-ratio question ("what are the optimal relative Hz for L1/L2/L3/L4?") are both resource-allocation questions under uncertainty. IROS gives one answer for their setup; Annie's specific allocation is unmeasured.
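The System 1 / System 2 split described above can be sketched as a gated loop: the fast detector runs on every frame, while the VLM is queried at a configurable rate — the knob a Lens 24 sweep would tune. The 30 Hz / 5 Hz rates echo the source's ranges; the callbacks, the stride-based gating, and the STOP-override policy are illustrative assumptions.

```python
def dual_process_loop(frames, fast_detect, slow_vlm, frame_hz=30, vlm_hz=5):
    """System 1 (fast_detect) runs on every frame; System 2 (slow_vlm) is
    gated to roughly vlm_hz by querying only every (frame_hz // vlm_hz)-th
    frame. Returns per-frame decisions plus how many VLM calls were spent."""
    stride = max(1, frame_hz // vlm_hz)
    decisions, vlm_calls = [], 0
    plan = None
    for i, frame in enumerate(frames):
        if i % stride == 0:          # System 2: slow deliberative update
            plan = slow_vlm(frame)
            vlm_calls += 1
        hazard = fast_detect(frame)  # System 1: every-frame reflex
        decisions.append("STOP" if hazard else plan)
    return decisions, vlm_calls

# Usage with stub callbacks: one second of frames at 30 Hz, VLM gated to 5 Hz.
decisions, calls = dual_process_loop(
    frames=range(30),
    fast_detect=lambda f: f == 7,    # pretend frame 7 shows an obstacle
    slow_vlm=lambda f: "FORWARD",
    frame_hz=30,
    vlm_hz=5,
)
```

Sweeping `vlm_hz` over identical routes and plotting success rate against VLM calls spent is exactly the crossover measurement the Lens 24 calibration question asks for.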
Lens 24 should frame the measurement strategy: sweep L2 rates from 1 Hz to 27 Hz on identical routes while L1 runs at 30+ Hz, measure success rate and p95 decision latency, and fit the crossover point. This is the same kind of sweep Lens 24 uses for GPU memory allocation, applied here to inference-rate allocation across a dual-process stack.

Cross-citation: Lens 26 Branch 6's tuning question is a Lens 24 experimental design problem.

LENS 25 (Meta-Questions / Process vs Design) [SESSION 119 STRENGTHENS]:

Lens 25's "process success vs design success" distinction is exactly the lens session 119 applied. The Hailo-8 activation was a process success (an audit surfaced a pre-existing resource), not a design success (nobody designed Annie around Hailo). Lens 25 should now catalogue process-success patterns and their triggers. The trigger in session 119 was a targeted hardware-audit pass run alongside a literature sweep on dual-process navigation. The meta-question Lens 26 Branch 6 introduces — "what other idle compute is in the household?" — is a Lens 25 process-instrument. It is not a design question; it is a question about the investigation regime.

Cross-citation: Lens 25 and Lens 26 Branch 6 share the same output artifact: an explicit inventory of Panda (active), Titan (active), Beast (idle), Orin NX 16 GB (idle), plus unaudited tiers (phones, laptops, TV SoCs, router NPUs). Lens 25 should maintain this inventory as a durable household-compute registry appendix.

CONVERGENCE SUMMARY
-------------------

The three-branch convergence on "bypass text layer" is the most important finding in Lens 26. It is worth restating precisely:

Branch 1 (task-parallelism): "What if the VLM outputs embeddings instead of text?"
  → Vision encoder alone runs at 71 Hz (14 ms, no 4 ms decode).
  → Enables true task-parallel allocation without interleaving artifacts.

Branch 3 (map transfer): "What if SLAM cells stored embeddings instead of text labels?"
  → Transferable semantic maps (embeddings vs coordinates).
  → Enables the "map as product" business model.

Branch 4 (cross-field): "What if place recognition used raw ViT features?"
  → Text2nav: 74% success with frozen SigLIP embeddings alone (RSS 2025).
  → Connects Annie's architecture to the animal navigation and embodied AI literature.

All three branches make the same architectural recommendation:

- Keep text at Tier 1 (strategic goal interpretation — language IS the interface).
- Bypass text at Tier 2 (tactical frame processing — language is a relay station).

The implementation path (from Lens 03's dependency analysis):

1. Deploy SigLIP 2 ViT-SO400M on Panda (~800 MB VRAM).
2. Profile text-decode latency vs embed-only latency separately in llama-server.
3. Train a 3-layer linear probe on Annie's 6-month labeled frame log.
4. A/B test: embedding path vs text path on identical navigation routes.
5. Decide explicitly on the explainability trade before committing to a text-free Tier 2.

The convergence is not coincidence. It reflects the fact that the text layer was inherited from the model class (Vision-Language Model) rather than designed for the task (geometric navigation). The research created the conditions to ask whether that inheritance is load-bearing or incidental. Lens 26 confirms it is the right question to ask next.

Session 119 widens the convergence. Before committing to a text-free Tier 2, two new prerequisite questions must be answered:

- At what VLM query rate does System 2 gating outperform always-on VLM? (Tuning)
- Can Hailo-8 run open-vocabulary detectors? (Capability)

If the tuning crossover is below 15 Hz, Annie's 54 Hz VLM is over-budget and the dual-process split is the first-order architectural move — ahead of text bypass. If Hailo supports open-vocabulary detection, L1 absorbs part of the goal-tracking load that currently sits in Tier 2, changing what Tier 2 needs to be and therefore what its right representation is.

The durable output of session 119 is the meta-instrument: "what else is idle?"
Apply it on Beast, Orin NX 16 GB, phones, laptops, TV SoCs, and router NPUs. The next invisible resource is waiting for the next targeted audit.

The question horizon is not just about new questions the primary research made askable. It is also about new questions that targeted hardware-inventory passes make askable — questions whose answers depend on resources that were invisible before the audit forced them into view.
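As a closing sketch for step 2 of the implementation path (profiling text-decode vs embed-only latency), the comparison needs only a generic timing harness. The harness is a minimal sketch; the two stand-in workloads are placeholders, not measurements of either inference path.

```python
import time
import statistics

def profile_path(run_inference, n_trials: int = 50) -> dict:
    """Time one inference path over n_trials runs and report median and
    p95 latency in milliseconds — the two numbers the embed-vs-text
    A/B comparison needs."""
    samples = []
    for _ in range(n_trials):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
    }

# Usage: swap the placeholder lambdas for real calls to the embed-only
# extractor and the full text-decode path, run on identical frames.
embed_only = profile_path(lambda: sum(range(1000)))   # placeholder workload
text_decode = profile_path(lambda: sum(range(5000)))  # placeholder workload
```

Reporting p95 rather than only the mean matters here: the WiFi cliff edge (Lens 04) and decode-time variance both live in the tail, which is where navigation latency budgets actually break.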