LENS 02: ABSTRACTION ELEVATOR — CROSS-LENS CONVERGENCE NOTES

=== CONFIRMED CONVERGENCES ===

LENS 01 (Constraint Archaeology) + LENS 02:
  - Lens 01 identified the temporal surplus at 58 Hz as "free signal" — Lens 02 provides the serving
    layer context: that surplus is real at the VLM level but partially consumed by WiFi jitter before
    reaching actuators. The 10-layer constraint hierarchy from Lens 01 has "WiFi RF environment" as a
    physics-level constraint; Lens 02 shows it leaking upward through every abstraction tier.
  - Synthesis: The temporal surplus (Lens 01) and the WiFi cliff (Lens 02/Lens 04) are the same
    problem from different altitudes. At 30,000 ft you have spare capacity; at physics level that
    capacity is consumed by uncontrolled network jitter.

LENS 04 (WiFi / Network) + LENS 02:
  - Lens 04's "cliff edge at 100ms" maps exactly to Lens 02's inter-tier communication leak. The
    abstraction elevator makes this precise: the cliff is NOT in the VLM tier (Panda is local) and
    NOT in the kinematic tier (IMU is local). The cliff is specifically in the STRATEGIC→TACTICAL
    communication path (Titan→Panda, ~35ms baseline, spikes to 100ms) and the TACTICAL→REACTIVE
    inference path for any Titan-originated replanning command.
  - Key refinement from Lens 02: the reactive tier (Pi ESTOP) is WiFi-independent — it runs locally
    on Pi and gates all forward motion regardless of upper-tier state. This means WiFi failure does
    NOT cause the robot to run into walls. It causes strategic blindness (no Titan replans) and
    tactical blindness (stale VLM goal context), but the safety floor holds. This is a more nuanced
    picture than "WiFi is the Achilles heel" — it's "WiFi is the Achilles heel for intelligence,
    not for safety."
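
The safety/intelligence split above can be sketched as a status record whose safety field is computed from local inputs only. This is an illustrative sketch, not the robot's code: `assess`, `TierStatus`, and both staleness thresholds are hypothetical names and values, and what the robot should do while "blind" is deliberately left open.

```python
from dataclasses import dataclass

# Hypothetical staleness thresholds (seconds); the source gives no values.
STRATEGIC_STALE_S = 2.0
TACTICAL_STALE_S = 0.5


@dataclass(frozen=True)
class TierStatus:
    safety_floor_ok: bool   # local lidar ESTOP reports forward motion is clear
    strategic_blind: bool   # no recent Titan replan
    tactical_blind: bool    # VLM goal context is stale


def assess(estop_clear: bool, titan_age_s: float, vlm_goal_age_s: float) -> TierStatus:
    """Report safety and intelligence state as separate fields.

    safety_floor_ok is derived from the local ESTOP input alone, so no
    network-derived value can change it.
    """
    return TierStatus(
        safety_floor_ok=estop_clear,
        strategic_blind=titan_age_s > STRATEGIC_STALE_S,
        tactical_blind=vlm_goal_age_s > TACTICAL_STALE_S,
    )


# During a WiFi outage both ages grow, but the safety field is unaffected.
outage = assess(estop_clear=True, titan_age_s=30.0, vlm_goal_age_s=30.0)
```

Keeping the three fields independent makes the "Achilles heel for intelligence, not for safety" claim checkable: a test can vary the two age inputs freely and confirm that `safety_floor_ok` never changes.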

LENS 10 (Post-Mortem / Slow Path) + LENS 02:
  - Lens 10 identified "we built the fast path, forgot the slow path." At the abstraction elevator
    level: the fast path (VLM at 58 Hz, ESTOP at 10 Hz) is built and working. The slow path
    (strategic semantic map, Phase 2c) is the missing link between 30,000 ft promise and ground-level
    reality. "LEFT MEDIUM" is literally the absence of the slow path — a point-in-time reactive
    signal rather than a spatially persistent semantic representation.

LENS 26 (Bypass Text-Language Layer) + LENS 02:
  - Lens 26's recommendation to bypass the text-language layer is visible at byte level in Lens 02:
    the VLM vision encoder (14ms, 280 tokens) is separate from the text decoder (4ms, 1-2 tokens).
    The text decoder output "LEFT MEDIUM" is the text-language layer that Lens 26 argues should be
    bypassed. Lens 02's byte-level view makes the bypass path concrete: expose the 280-token ViT
    embedding directly and train/map a small lookup from embedding space to motor commands. This is
    exactly what llama-server currently blocks (the embedding leak described in narrative-02).
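
One way to picture the "small lookup from embedding space to motor commands" is a nearest-prototype match on a pooled embedding. The sketch below is an assumption-laden toy: it uses 2-D vectors instead of real ViT tokens, the prototype vectors and command names are hypothetical, and mean pooling plus cosine similarity is just one possible mapping, not a method stated in the research document.

```python
import math
from typing import Dict, List, Sequence


def mean_pool(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Average a sequence of token vectors into one vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_command(tokens: Sequence[Sequence[float]],
                    prototypes: Dict[str, Sequence[float]]) -> str:
    """Return the command whose prototype is most similar to the pooled embedding."""
    pooled = mean_pool(tokens)
    return max(prototypes, key=lambda name: cosine(pooled, prototypes[name]))


# Toy data: two "tokens" in a 2-D space and two hypothetical command prototypes.
toy_tokens = [[1.0, 0.0], [0.8, 0.2]]
toy_prototypes = {"LEFT": [1.0, 0.0], "RIGHT": [0.0, 1.0]}
command = nearest_command(toy_tokens, toy_prototypes)
```

The point of the sketch is only that no text decoding step appears anywhere in the path from embedding to command.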

=== NEW OBSERVATIONS FROM THIS ALTITUDE ===

OBSERVATION A — The Pico REPL as Tier-4 Failure Mode:
  - No other lens has explicitly surfaced the Pico RP2040 REPL crash as a tier-level failure. It is
    documented in MEMORY.md but not in the research document itself. At the abstraction elevator level:
    Tier 4 (kinematic, 100 Hz) is implemented by a microcontroller with an interactive debugging
    console that can be entered accidentally and, once entered, silently disables the tier. The system's
    health model (imu_healthy flag) requires callers to check it; no automatic propagation exists to Tier 3
    or Tier 2.
  - Implication: The 4-tier hierarchy's downward-override model (faster tiers can veto slower ones) does NOT
    have an equivalent upward-notification model for tier failures. A failing Tier 4 is invisible to
    Tier 3 unless Tier 3 explicitly polls.
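
A missing upward-notification path could look like a heartbeat watchdog that pushes health changes to subscribers instead of waiting to be polled. This is a minimal sketch under stated assumptions: `TierWatchdog`, the 50 ms timeout, and the callback signature are hypothetical and do not describe the existing imu_healthy mechanism.

```python
from typing import Callable, List, Optional, Tuple


class TierWatchdog:
    """Notify listeners when a tier's heartbeat goes missing or returns."""

    def __init__(self, name: str, timeout_s: float) -> None:
        self.name = name
        self.timeout_s = timeout_s
        self._last_beat_s: Optional[float] = None
        self._healthy = True
        self._listeners: List[Callable[[str, bool], None]] = []

    def subscribe(self, listener: Callable[[str, bool], None]) -> None:
        self._listeners.append(listener)

    def beat(self, now_s: float) -> None:
        """Called by the monitored tier each time it produces output."""
        self._last_beat_s = now_s
        self._set_healthy(True)

    def check(self, now_s: float) -> None:
        """Called periodically by the supervisor; flags an overdue heartbeat."""
        overdue = (self._last_beat_s is None
                   or now_s - self._last_beat_s > self.timeout_s)
        if overdue:
            self._set_healthy(False)

    def _set_healthy(self, healthy: bool) -> None:
        # Only notify on a state change, so listeners are not flooded.
        if healthy != self._healthy:
            self._healthy = healthy
            for listener in self._listeners:
                listener(self.name, healthy)


# Usage: an upper tier records events pushed to it by the watchdog.
events: List[Tuple[str, bool]] = []
watchdog = TierWatchdog("tier4_kinematic", timeout_s=0.05)
watchdog.subscribe(lambda name, healthy: events.append((name, healthy)))
watchdog.beat(0.00)
watchdog.check(0.02)   # within timeout, no event
watchdog.check(0.10)   # heartbeat overdue, listeners notified
watchdog.beat(0.11)    # tier recovers, listeners notified again
```

Timestamps are passed in explicitly so the behavior can be tested without real clocks.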

OBSERVATION B — The Six-Altitude Map Reveals a Missing Tier:
  - Between 10,000 ft (4-tier architecture) and 3,000 ft (multi-query dispatch), there is a tier that
    does not appear in the architecture diagram: the FRAME SCHEDULER — the cycle_count modulo N logic
    that decides which VLM query fires on which frame. This is load-bearing (it determines the effective
    Hz of each perception capability) but has no tier label, no health monitoring, and no formal
    specification. It is implemented as a modular arithmetic expression in NavController. If N changes
    (e.g., add a 7th query type), the per-query Hz changes for ALL queries, not just the new one.
    This coupling is invisible at 10,000 ft.
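
The coupling can be made concrete with a small sketch of modulo dispatch. It assumes a uniform round-robin in which every query type owns exactly one slot; the actual NavController expression and the slot names below are not given in the source and are hypothetical.

```python
from typing import Sequence


def query_for_cycle(cycle_count: int, queries: Sequence[str]) -> str:
    """Pick the query that fires on this frame: cycle_count modulo N."""
    return queries[cycle_count % len(queries)]


def per_query_hz(frame_hz: float, n_slots: int) -> float:
    """Effective rate of each query when frames are shared round-robin."""
    return frame_hz / n_slots


# Hypothetical slot names; only the count (6) comes from the notes.
six_slots = ["goal", "scene", "obstacle", "slot_3", "slot_4", "slot_5"]
seven_slots = six_slots + ["slot_6"]

hz_with_six = per_query_hz(58.0, len(six_slots))      # about 9.7 Hz each
hz_with_seven = per_query_hz(58.0, len(seven_slots))  # about 8.3 Hz each
```

Adding the seventh slot lowers the rate of every existing query from about 9.7 Hz to about 8.3 Hz, which is the invisible coupling the observation describes.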

OBSERVATION C — llama-server Embedding Blocker as Serving-Layer Abstraction Leak:
  - The model capability (ViT embeddings) exists at the physics/weight level. The serving interface
    (llama-server HTTP API) does not expose it for multimodal inputs. This is a classic abstraction
    violation: the abstraction is THINNER than the underlying capability. Workaround (separate SigLIP 2
    sidecar) adds operational complexity and VRAM pressure (~800MB on Panda), but validates that the
    capability is physically achievable on the existing hardware. The blocker is 100% in the software
    layer — a deployment and API surface problem, not a hardware or model limitation.
  - Cross-reference Lens 03 (Dependency Telescope): this is the highest-leverage dependency identified
    in that lens. Lens 02 confirms: it blocks Phase 2d, the phase that enables visual loop closure,
    which in turn improves SLAM accuracy, which in turn makes Phase 2c semantic labels more accurate.
    It is an upstream dependency for multiple downstream capabilities.

=== TENSIONS AND UNRESOLVED QUESTIONS ===

TENSION 1: WiFi safety vs WiFi intelligence
  - Lens 02 shows the reactive tier is WiFi-independent (safety floor holds). But the INTELLIGENCE
    of the system (strategic replanning, semantic context from Titan) IS WiFi-dependent. The robot
    can survive a WiFi outage safely but cannot navigate intelligently during it. Is this acceptable?
    The research document does not address degraded-mode behavior explicitly.

TENSION 2: Multi-query frame scheduling vs uniform Hz
  - The 6-slot dispatch gives each non-goal query 9.7 Hz (58 Hz / 6 slots). Is 9.7 Hz sufficient for scene
    classification (Capability 1) and obstacle description (Capability 2)? At 1 m/s the robot travels about
    10cm between scene-classification frames, so a doorway transition (typically 80-90cm) spans only 8-9
    such frames. A short run of wrong or lagging labels could leave the room misclassified for an entire
    doorway crossing. This is not addressed in the research document's probability-of-success table
    (Phase 2a at 90%).
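
The arithmetic behind this tension can be checked with a short sketch. The function names are hypothetical; the inputs (1 m/s, 9.7 Hz, 80-90cm doorway) are the figures quoted above.

```python
def distance_per_frame_cm(speed_m_s: float, query_hz: float) -> float:
    """Distance travelled between two consecutive frames of one query type."""
    return speed_m_s / query_hz * 100.0


def frames_in_span(span_cm: float, speed_m_s: float, query_hz: float) -> float:
    """How many frames of one query type fall inside a span of travel."""
    return span_cm / distance_per_frame_cm(speed_m_s, query_hz)


gap_cm = distance_per_frame_cm(1.0, 9.7)         # roughly 10 cm
frames_narrow = frames_in_span(80.0, 1.0, 9.7)   # frames across an 80 cm span
frames_wide = frames_in_span(90.0, 1.0, 9.7)     # frames across a 90 cm span
```

At these figures a doorway crossing yields fewer than nine scene-classification frames, so any smoothing window over labels has very little data to work with.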

TENSION 3: "LEFT MEDIUM" as glass ceiling vs glass ceiling as feature
  - The two-token qualitative output is the glass ceiling for metric navigation (Lens 02). But it is also
    what makes the system run at 18ms/frame with a 2B model on a Jetson. Replacing it with metric
    coordinate output would require either (a) VLM fine-tuning (expensive, slow) or (b) the Phase 2c
    fusion layer (architecturally sound, but requires Phase 1 SLAM). The qualitative output is not a
    bug to fix — it is a pragmatic interface that must be interpreted correctly by the fusion layer.
    The glass ceiling is in the INTERPRETATION, not the OUTPUT FORMAT.

=== NEW FINDING (session 2026-04-16): 4-TIER → 5-TIER ABSTRACTION LEAK ===

The "4-tier hierarchy" is a post-hoc rationalization of code wiring, not a first-principles
derivation of hardware capability. The Session 119 hardware audit surfaced that the Pi 5 on
the robot carries a Hailo-8 AI HAT+ with 26 TOPS of NPU throughput that is idle for
navigation. YOLOv8n runs on it at 430 FPS, <10ms latency, zero WiFi dependency.

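When activated, the architecture becomes a 5-tier hierarchy. The levels below are numbered from the
bottom up (L1 lowest), the reverse of the Tier 1-4 numbering used above, where Tier 4 is kinematic: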
  L5 (Titan 26B, 1 Hz, strategic)           — unchanged
  L4 (Panda VLM, 29-58 Hz, tactical)        — unchanged
  L3 (Pi lidar ESTOP, 10 Hz, reactive)      — unchanged but no longer lowest
  L2 (Pi IMU, 100 Hz, kinematic)            — unchanged
  L1 (Pi Hailo YOLO, 30+ Hz, safety reflex) — NEW, on-robot, WiFi-independent

The convention "Pi is sensor-only, Panda is the perception brain" is dissolvable: it describes
code layout, not physical reality. Panda itself is on a shelf in another room (corrected in Session
119), not on the robot. The Orin-NX-native future robot will collapse L1+L2+L3 onto one onboard
device, at which point the tier distinction disappears.
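
The 5-tier list can be restated as data to show which tiers survive a network outage. This is a sketch only: the `Tier` record is hypothetical, and the `needs_wifi` flags for L5 and L4 are an assumption inferred from Titan and Panda being off-robot; the notes state WiFi independence explicitly only for the on-Pi tiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    level: int
    host: str
    role: str
    rate_hz: str      # kept as quoted in the notes, e.g. "29-58" or "30+"
    needs_wifi: bool  # assumption for L5/L4: off-robot devices need the network


TIERS: List[Tier] = [
    Tier(5, "Titan 26B", "strategic", "1", True),
    Tier(4, "Panda VLM", "tactical", "29-58", True),
    Tier(3, "Pi lidar ESTOP", "reactive", "10", False),
    Tier(2, "Pi IMU", "kinematic", "100", False),
    Tier(1, "Pi Hailo YOLO", "safety reflex", "30+", False),
]


def levels_available_offline(tiers: List[Tier]) -> List[int]:
    """Levels that keep running when the network is down."""
    return [tier.level for tier in tiers if not tier.needs_wifi]


offline_levels = levels_available_offline(TIERS)
```

Under these assumptions the offline set is exactly the on-Pi tiers, which matches the claim that the Hailo tier adds perception that does not depend on the network.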

=== CROSS-LENS ADDITIONS FROM THIS FINDING ===

LENS 01 (Constraint Archaeology) + new finding:
  - Lens 01's "temporal surplus at 58 Hz" story assumed the surplus lived on Panda. The Hailo-8
    discovery reveals a second, uncounted temporal surplus: 430 FPS of NPU throughput on Pi, zero
    WiFi cost. The constraint hierarchy needs a row for "latent on-robot compute" below the current
    physics layer.

LENS 04 (WiFi cliff) + new finding:
  - WiFi cliff analysis assumed safety fell to the lidar ESTOP during outages. With L1 active, the
    WiFi cliff dissolves for obstacle avoidance entirely — the robot keeps detecting obstacles with
    pixel-precise YOLO bounding boxes regardless of network state. The cliff becomes a pure
    intelligence cliff (no strategic/tactical updates) and no longer a perception cliff.

LENS 16 (hardware substrate) + new finding:
  - The Hailo-8 is a canonical instance of Lens 16's thesis: hardware capabilities hide inside
    software labels. The audit should be repeated across every device in the stack — RTX 5070 Ti
    headroom, DGX Spark Isaac Perceptor idle capacity, Pico RP2040 unused GPIO pins.

LENS 18 (dual-process / System 1 + System 2) + new finding:
  - An IROS paper (arXiv 2601.21506) validates the exact pattern Annie would get from L1 activation:
    fast reactive (YOLO on Hailo) + slow semantic (VLM on Panda) gives a 66% latency reduction and a
    67.5% success rate vs 5.83% for VLM-only. This is no longer a theoretical architecture: it maps onto
    hardware Annie already owns.

=== SUMMARY FOR ASSEMBLY ===
Lens 02 confirms: WiFi, "LEFT MEDIUM", Pico REPL, and llama-server embedding blocker are all the same
class of problem, namely abstraction boundaries where the software model does not match what the physical
or infrastructure substrate delivers, whether it assumes more (WiFi, Pico REPL) or exposes less
(llama-server embeddings). The 6-altitude view adds precision that the 4-tier diagram conceals:
the frame scheduler is a hidden tier, the reactive tier is WiFi-independent (important for safety
analysis), and the capability-vs-API gap in llama-server is the highest-leverage single fix available
before writing any new model code. **NEW (session 2026-04-16): the 4-tier diagram is itself an
abstraction leak — the Hailo-8 NPU on Pi reveals a 5th tier that was always physically available
and conventionally hidden.** Cross-lens: Lens 01 (temporal surplus, now two-surplus), Lens 04 (WiFi
cliff, now perception-safe with L1), Lens 16 (hardware substrate audit generalization), Lens 18
(System 1/System 2 dual-process validated), Lens 26 (bypass text layer) all converge on the same
structural gap that Lens 02 makes visible from the altitude gradient.