LENS 02: ABSTRACTION ELEVATOR — CROSS-LENS CONVERGENCE NOTES

=== CONFIRMED CONVERGENCES ===

LENS 01 (Constraint Archaeology) + LENS 02:
- Lens 01 identified the temporal surplus at 58 Hz as "free signal" — Lens 02 provides the serving layer context: that surplus is real at the VLM level but partially consumed by WiFi jitter before reaching actuators. The 10-layer constraint hierarchy from Lens 01 has "WiFi RF environment" as a physics-level constraint; Lens 02 shows it leaking upward through every abstraction tier.
- Synthesis: The temporal surplus (Lens 01) and the WiFi cliff (Lens 02/Lens 04) are the same problem from different altitudes. At 30,000 ft you have spare capacity; at physics level that capacity is consumed by uncontrolled network jitter.

LENS 04 (WiFi / Network) + LENS 02:
- Lens 04's "cliff edge at 100ms" maps exactly to Lens 02's inter-tier communication leak. The abstraction elevator makes this precise: the cliff is NOT in the VLM tier (Panda is local) and NOT in the kinematic tier (IMU is local). The cliff is specifically in the STRATEGIC→TACTICAL communication path (Titan→Panda, ~35ms baseline, spikes to 100ms) and the TACTICAL→REACTIVE inference path for any Titan-originated replanning command.
- Key refinement from Lens 02: the reactive tier (Pi ESTOP) is WiFi-independent — it runs locally on Pi and gates all forward motion regardless of upper-tier state. This means WiFi failure does NOT cause the robot to run into walls. It causes strategic blindness (no Titan replans) and tactical blindness (stale VLM goal context), but the safety floor holds. This is a more nuanced picture than "WiFi is the Achilles heel" — it's "WiFi is the Achilles heel for intelligence, not for safety."

LENS 10 (Post-Mortem / Slow Path) + LENS 02:
- Lens 10 identified "we built the fast path, forgot the slow path." At the abstraction elevator level: the fast path (VLM at 58 Hz, ESTOP at 10 Hz) is built and working.
The slow path (strategic semantic map, Phase 2c) is the missing link between 30,000 ft promise and ground-level reality. "LEFT MEDIUM" is literally the absence of the slow path — a point-in-time reactive signal rather than a spatially persistent semantic representation.

LENS 26 (Bypass Text-Language Layer) + LENS 02:
- Lens 26's recommendation to bypass the text-language layer is visible at byte level in Lens 02: the VLM vision encoder (14ms, 280 tokens) is separate from the text decoder (4ms, 1-2 tokens). The text decoder output "LEFT MEDIUM" is the text-language layer that Lens 26 argues should be bypassed. Lens 02's byte-level view makes the bypass path concrete: expose the 280-token ViT embedding directly and train/map a small lookup from embedding space to motor commands. This is exactly what llama-server currently blocks (the embedding leak described in narrative-02).

=== NEW OBSERVATIONS FROM THIS ALTITUDE ===

OBSERVATION A — The Pico REPL as Tier-4 Failure Mode:
- No other lens has explicitly surfaced the Pico RP2040 REPL crash as a tier-level failure. It is documented in MEMORY.md but not in the research document itself. At the abstraction elevator level: Tier 4 (kinematic, 100 Hz) is implemented by a microcontroller with an interactive debugging console that can be accidentally entered and silently disables the tier. The system's health model (imu_healthy flag) requires callers to check it; no automatic propagation exists to Tier 3 or Tier 2.
- Implication: The 4-tier hierarchy's downward-override model (faster tiers can veto slower) does NOT have an equivalent upward-notification model for tier failures. A failing Tier 4 is invisible to Tier 3 unless Tier 3 explicitly polls.
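
The missing upward-notification path described in Observation A could look roughly like the sketch below. It is illustrative only: `TierHealthBus`, its methods, and the tier name strings are hypothetical and do not come from the existing codebase, which (per the notes above) exposes only a polled `imu_healthy` flag. The idea is that a lower tier pushes a health change once, and every subscribed upper tier hears about it without polling.

```python
from typing import Callable, Dict, List

# Callback signature: (tier_name, healthy) -> None
HealthCallback = Callable[[str, bool], None]


class TierHealthBus:
    """Hypothetical push-based health registry (an assumption, not existing code)."""

    def __init__(self) -> None:
        self._status: Dict[str, bool] = {}
        self._subscribers: List[HealthCallback] = []

    def subscribe(self, callback: HealthCallback) -> None:
        """Register an upper tier that wants to hear about health changes."""
        self._subscribers.append(callback)

    def report(self, tier: str, healthy: bool) -> None:
        """Record a tier's health and notify subscribers only when it changes."""
        previous = self._status.get(tier)
        self._status[tier] = healthy
        if previous != healthy:
            for callback in self._subscribers:
                callback(tier, healthy)

    def is_healthy(self, tier: str) -> bool:
        """Polling fallback; tiers that never reported count as unhealthy."""
        return self._status.get(tier, False)


if __name__ == "__main__":
    bus = TierHealthBus()
    # A Tier 3 handler would react here, e.g. by refusing forward motion.
    bus.subscribe(lambda tier, healthy: print(f"{tier} healthy={healthy}"))
    bus.report("tier4_kinematic", True)
    bus.report("tier4_kinematic", False)  # e.g. the Pico dropped into its REPL
```

Treating an unreported tier as unhealthy is a deliberate choice in this sketch: a tier that silently stops reporting then fails closed rather than open. Detecting silence itself would additionally need a heartbeat timeout, which is not shown.
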

OBSERVATION B — The Six-Altitude Map Reveals a Missing Tier:
- Between 10,000 ft (4-tier architecture) and 3,000 ft (multi-query dispatch), there is a tier that does not appear in the architecture diagram: the FRAME SCHEDULER — the cycle_count modulo N logic that decides which VLM query fires on which frame. This is load-bearing (it determines the effective Hz of each perception capability) but has no tier label, no health monitoring, and no formal specification. It is implemented as a modular arithmetic expression in NavController. If N changes (e.g., add a 7th query type), the per-query Hz changes for ALL queries, not just the new one. This coupling is invisible at 10,000 ft.

OBSERVATION C — llama-server Embedding Blocker as Serving-Layer Abstraction Leak:
- The model capability (ViT embeddings) exists at the physics/weight level. The serving interface (llama-server HTTP API) does not expose it for multimodal inputs. This is a classic abstraction violation: the abstraction is THINNER than the underlying capability. Workaround (separate SigLIP 2 sidecar) adds operational complexity and VRAM pressure (~800MB on Panda), but validates that the capability is physically achievable on the existing hardware. The blocker is 100% in the software layer — a deployment and API surface problem, not a hardware or model limitation.
- Cross-reference Lens 03 (Dependency Telescope): this is the highest-leverage dependency identified in that lens. Lens 02 confirms: it blocks Phase 2d, the phase that enables visual loop closure, which in turn improves SLAM accuracy, which in turn makes Phase 2c semantic labels more accurate. It is an upstream dependency for multiple downstream capabilities.

=== TENSIONS AND UNRESOLVED QUESTIONS ===

TENSION 1: WiFi safety vs WiFi intelligence
- Lens 02 shows the reactive tier is WiFi-independent (safety floor holds). But the INTELLIGENCE of the system (strategic replanning, semantic context from Titan) IS WiFi-dependent.
The robot can survive a WiFi outage safely but cannot navigate intelligently during it. Is this acceptable? The research document does not address degraded-mode behavior explicitly.

TENSION 2: Multi-query frame scheduling vs uniform Hz
- The 6-slot dispatch gives each non-goal query 9.7 Hz. Is 9.7 Hz sufficient for scene classification (Capability 1) and obstacle description (Capability 2)? At 1 m/s the robot travels 10cm between scene-classification frames. In a doorway transition (typically 80-90cm), the robot could misclassify the room for an entire doorway crossing. This is not addressed in the research document's probability-of-success table (Phase 2a at 90%).

TENSION 3: "LEFT MEDIUM" as glass ceiling vs glass ceiling as feature
- The two-token qualitative output is the glass ceiling for metric navigation (Lens 02). But it is also what makes the system run at 18ms/frame with a 2B model on a Jetson. Replacing it with metric coordinate output would require either (a) VLM fine-tuning (expensive, slow) or (b) the Phase 2c fusion layer (architecturally sound, but requires Phase 1 SLAM). The qualitative output is not a bug to fix — it is a pragmatic interface that must be interpreted correctly by the fusion layer. The glass ceiling is in the INTERPRETATION, not the OUTPUT FORMAT.

=== NEW FINDING (session 2026-04-16): 4-TIER → 5-TIER ABSTRACTION LEAK ===

The "4-tier hierarchy" is a post-hoc rationalization of code wiring, not a first-principles derivation of hardware capability. The Session 119 hardware audit surfaced that the Pi 5 on the robot carries a Hailo-8 AI HAT+ with 26 TOPS of NPU throughput that is idle for navigation. YOLOv8n runs on it at 430 FPS, <10ms latency, zero WiFi dependency.
When activated, the architecture becomes a 5-tier hierarchy:

  L5 (Titan 26B, 1 Hz, strategic) — unchanged
  L4 (Panda VLM, 29-58 Hz, tactical) — unchanged
  L3 (Pi lidar ESTOP, 10 Hz, reactive) — unchanged but no longer lowest
  L2 (Pi IMU, 100 Hz, kinematic) — unchanged
  L1 (Pi Hailo YOLO, 30+ Hz, safety reflex) — NEW, on-robot, WiFi-independent

The convention "Pi is sensor-only, Panda is the perception brain" is dissolvable — it describes code layout, not physical reality. Panda itself is on a shelf in another room (corrected session 119), not on the robot. The Orin-NX-native future robot will collapse L1+L2+L3 onto one onboard device and the tier distinction disappears.

=== CROSS-LENS ADDITIONS FROM THIS FINDING ===

LENS 01 (Constraint Archaeology) + new finding:
- Lens 01's "temporal surplus at 58 Hz" story assumed the surplus lived on Panda. The Hailo-8 discovery reveals a second, uncounted temporal surplus: 430 FPS of NPU throughput on Pi, zero WiFi cost. The constraint hierarchy needs a row for "latent on-robot compute" below the current physics layer.

LENS 04 (WiFi cliff) + new finding:
- WiFi cliff analysis assumed safety fell to the lidar ESTOP during outages. With L1 active, the WiFi cliff dissolves for obstacle avoidance entirely — the robot keeps detecting obstacles with pixel-precise YOLO bounding boxes regardless of network state. The cliff becomes a pure intelligence cliff (no strategic/tactical updates) and no longer a perception cliff.

LENS 16 (hardware substrate) + new finding:
- The Hailo-8 is a canonical instance of Lens 16's thesis: hardware capabilities hide inside software labels. The audit should be repeated across every device in the stack — RTX 5070 Ti headroom, DGX Spark Isaac Perceptor idle capacity, Pico RP2040 unused GPIO pins.

LENS 18 (dual-process / System 1 + System 2) + new finding:
- IROS paper (arXiv 2601.21506) validates the exact pattern Annie would get from L1 activation: fast reactive (YOLO on Hailo) + slow semantic (VLM on Panda) = 66% latency reduction, 67.5% success rate vs 5.83% VLM-only. This is no longer a theoretical architecture — it maps onto hardware Annie already owns.

=== SUMMARY FOR ASSEMBLY ===

Lens 02 confirms: WiFi, "LEFT MEDIUM", Pico REPL, and llama-server embedding blocker are all the same class of problem — abstraction boundaries where the software model assumes more than the physical or infrastructure substrate delivers. The 6-altitude view adds precision that the 4-tier diagram conceals: the frame scheduler is a hidden tier, the reactive tier is WiFi-independent (important for safety analysis), and the capability-vs-API gap in llama-server is the highest-leverage single fix available before writing any new model code.

**NEW (session 2026-04-16): the 4-tier diagram is itself an abstraction leak — the Hailo-8 NPU on Pi reveals a 5th tier that was always physically available and conventionally hidden.**

Cross-lens: Lens 01 (temporal surplus, now two-surplus), Lens 04 (WiFi cliff, now perception-safe with L1), Lens 16 (hardware substrate audit generalization), Lens 18 (System 1/System 2 dual-process validated), Lens 26 (bypass text layer) all converge on the same structural gap that Lens 02 makes visible from the altitude gradient.
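
The frame-scheduler coupling summarized above (Observation B) and the 9.7 Hz figure in Tension 2 can be checked with a few lines of arithmetic. This is a minimal sketch under a stated assumption: a uniform round-robin in which every query type occupies exactly one of N slots and the VLM runs at 58 frames per second. The function names are hypothetical; the real NavController expression may weight the goal query differently.

```python
def slot_for_cycle(cycle_count: int, n_slots: int) -> int:
    """Which slot fires on this frame (assumed uniform round-robin)."""
    if n_slots <= 0:
        raise ValueError("n_slots must be positive")
    return cycle_count % n_slots


def per_query_hz(frame_hz: float, n_slots: int) -> float:
    """Effective rate of one query when it owns one slot out of n_slots."""
    if n_slots <= 0:
        raise ValueError("n_slots must be positive")
    return frame_hz / n_slots


def metres_between_frames(speed_mps: float, query_hz: float) -> float:
    """Distance the robot covers between two consecutive frames of one query."""
    if query_hz <= 0:
        raise ValueError("query_hz must be positive")
    return speed_mps / query_hz


if __name__ == "__main__":
    frame_hz = 58.0  # VLM frame rate quoted in the notes
    for n_slots in (6, 7):
        hz = per_query_hz(frame_hz, n_slots)
        gap = metres_between_frames(1.0, hz)
        print(f"N={n_slots}: {hz:.2f} Hz per query, {gap * 100:.1f} cm per frame at 1 m/s")
```

Under these assumptions, going from 6 to 7 slots drops every query from roughly 9.67 Hz to roughly 8.29 Hz, which is the all-queries coupling Observation B describes.
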