Lens 02: The Abstraction Elevator. Core question: What do you see at each altitude?

At 30,000 feet, this is a robot companion that navigates your home by understanding it. It understands rooms, recognizes places, avoids obstacles, and builds a living semantic map. Its VLM runs at 58 Hz, faster than Tesla FSD's perception at 36 Hz.

Drop to 10,000 feet and you see a four-tier hierarchical fusion architecture. Titan, the DGX Spark, handles strategic planning at 1 Hz on a SLAM map. Panda, the Jetson Orin, runs the VLM at 29 to 58 Hz, tracking goals and classifying scenes. The Pi 5 runs the reactive lidar-based ESTOP at 10 Hz. An IMU on the Pi corrects heading drift at 100 Hz. Each tier is faster than the one above it and can override it.

At 3,000 feet you see the multi-query alternating dispatch pattern. The 58-Hz VLM budget is split across 6 slots: frames 0, 2, and 4 do goal tracking at 29 Hz, returning "LEFT MEDIUM" navigation commands. Frame 1 returns a scene label like "hallway" at 9.7 Hz. Frame 3 returns an obstacle token like "chair" at 9.7 Hz. Frame 5 extracts a 280-dimensional vision encoder embedding for place recognition at 9.7 Hz. An exponential moving average with alpha = 0.3 smooths noise across frames.

At ground level you see the actual implementation: a cycle counter modulo N in the NavController run loop. The sonar ESTOP fires at 250 millimeters as an absolute gate over all tiers. SLAM grid cells accumulate scene labels at the current robot pose. The sonar value is a float or None; None disables the safety gate. And this is where WiFi latency enters: it is uncontrolled at this level. (A code sketch of this dispatch loop appears after the leak discussion below.)

At the byte level: 18 milliseconds per frame, a 150-million-parameter vision transformer, a 280-token feature vector, and 1 to 2 tokens of text output. The llama-server wrapper adds about 4 milliseconds for text decoding on top of the 14-millisecond vision encoder pass. The Pico RP2040 microcontroller sends IMU data at 100 Hz over USB serial. Crucially, llama-server cannot expose multimodal intermediate embeddings, which blocks Phase 2d unless a separate SigLIP 2 model runs as a sidecar.

At the physics level: household WiFi RF, motor momentum, lidar beam geometry, and 1.7 centimeters of robot travel between consecutive VLM frames at 1 meter per second. WiFi latency can spike to 100 milliseconds. Motor momentum carries the robot 30 degrees past an IMU heading target at speed 30. Lidar cannot see above-plane obstacles like shelves or hanging objects.

Now: where do the abstractions leak? The first and most load-bearing leak is WiFi. The clean four-tier hierarchy shows Titan, Panda, and Pi connected by arrows. In reality they are connected by household 2.4 GHz WiFi. When WiFi latency spikes above 100 milliseconds (a cliff edge that Lens 04 characterizes), the strategic and tactical tiers stall. The only tier that keeps running is the reactive ESTOP, because it runs locally on the Pi. The four-tier collaboration collapses to single-tier survival mode.

The second leak is semantic. At 30,000 feet the pitch is "navigates to named rooms." At ground level the VLM outputs "LEFT MEDIUM": one word for direction, one for distance. No coordinates, no confidence, no map reference. The entire Phase 2c roadmap, which attaches scene labels to SLAM grid cells to create a queryable semantic map, exists to bridge this single abstraction gap. Until Phase 2c is deployed, "go to the kitchen" only works when the kitchen is currently in the camera frame.
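Before moving to the remaining leaks, it helps to collapse the 3,000-foot and ground-level views into one piece of code. The sketch below is a hedged reconstruction, not the project's source: the identifiers (DispatchLoop, query, add_label, steer) and the word-to-number steering mapping are assumptions, while the six-slot split, the modulo cycle counter, the alpha = 0.3 EMA, and the 250-millimeter sonar gate with its float-or-None contract come directly from the description above.

```python
# Hedged sketch of the multi-query alternating dispatch loop.
# Grounded details: six slots, 250 mm sonar gate, alpha = 0.3 EMA, slot roles.
# Every identifier here is hypothetical; the steering mapping is an assumption.

SONAR_ESTOP_MM = 250   # absolute gate over all tiers
EMA_ALPHA = 0.3        # smoothing factor from the text

SLOT_ROLES = {
    0: "goal", 2: "goal", 4: "goal",  # 29 Hz goal tracking ("LEFT MEDIUM")
    1: "scene",                       # ~9.7 Hz scene label ("hallway")
    3: "obstacle",                    # ~9.7 Hz obstacle token ("chair")
    5: "embedding",                   # ~9.7 Hz 280-d place-recognition vector
}

# Assumed mapping from the two-word command to a signed steering value,
# so the exponential moving average has a number to smooth.
STEER = {"LEFT": -1.0, "CENTER": 0.0, "RIGHT": 1.0}
GAIN = {"SMALL": 0.3, "MEDIUM": 0.6, "LARGE": 1.0}

class DispatchLoop:
    def __init__(self, vlm, drive, slam_map):
        self.vlm, self.drive, self.slam_map = vlm, drive, slam_map
        self.cycle = 0
        self.smoothed = 0.0  # EMA state for the steering signal

    def step(self, frame, sonar_mm, pose):
        # Reactive gate first: the sonar value is a float or None;
        # None disables the safety gate, exactly as at ground level.
        if sonar_mm is not None and sonar_mm < SONAR_ESTOP_MM:
            self.drive.stop()
            return

        role = SLOT_ROLES[self.cycle % 6]  # cycle counter modulo the slot count
        self.cycle += 1
        result = self.vlm.query(frame, role)  # hypothetical VLM wrapper call

        if role == "goal":
            direction, magnitude = result.split()  # e.g. "LEFT MEDIUM"
            raw = STEER.get(direction, 0.0) * GAIN.get(magnitude, 0.5)
            self.smoothed = EMA_ALPHA * raw + (1 - EMA_ALPHA) * self.smoothed
            self.drive.steer(self.smoothed)
        elif role == "scene":
            self.slam_map.add_label(pose, result)          # e.g. "hallway"
        elif role == "obstacle":
            self.drive.note_obstacle(result)               # e.g. "chair"
        else:  # "embedding"
            self.slam_map.add_place_vector(pose, result)   # 280-d embedding
```

The point of the sketch is the shape, not the names: one 58-Hz budget, six interleaved consumers, and a reactive gate that never waits on the VLM.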
The third leak is in the kinematic tier, at the hardware boundary. The Pico RP2040 can drop to its interactive REPL, a crash mode in which it silently stops publishing IMU data. No upper tier detects this automatically; a watchdog sketch at the end of this section shows how small the missing check would be. The kinematic tier goes dark, heading drift accumulates, and the tactical and reactive tiers continue without correction. A hardware reality (a microcontroller with an interactive console) bypasses every software health model.

The fourth and deepest leak is the tier count itself. The four-tier hierarchy is a post-hoc rationalization of how the code happens to be wired, not a first-principles derivation of how the hardware should be partitioned. The Pi 5 carries a Hailo-8 AI HAT+ with 26 TOPS of NPU throughput that currently sits idle for navigation. YOLOv8n runs on it at 430 frames per second with less than 10 milliseconds of latency and zero WiFi dependency. Activating it dissolves the four-tier story into a five-tier hierarchy with a new L1 safety reflex sitting below the current reactive tier: on-robot obstacle detection that pre-empts the lidar ESTOP, survives WiFi drops, and returns pixel-precise bounding boxes instead of qualitative blocked-or-clear tokens. The description "Pi is sensor-only, Panda is the perception brain" is a convention inherited from the current code topology, not a physical constraint. Panda itself sits on a shelf in another room, not on the robot. The future Orin-NX-native robot will collapse L1, L2, and L3 onto a single onboard device, and the tier distinction will disappear entirely.

The key finding: the abstraction hierarchy is real, but the tier numbers themselves are artifacts of current wiring. Moving between altitude levels does not just change the level of detail; it reveals that diagrams tend to describe code layout, not hardware capability. The four-tier diagram has been describing three tiers of software plus a convention all along.
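To close, here is the health check the third leak is missing. Everything in this sketch is an assumption except the grounded facts that the Pico publishes IMU data at 100 Hz over USB serial and that the stream simply goes quiet when it drops to its REPL; the class name, callback names, and the 0.5-second timeout are illustrative, not the project's code.

```python
# Hedged sketch of a kinematic-tier watchdog (hypothetical names and timeout).
# Grounded facts: the Pico streams IMU data at 100 Hz over USB serial, and when
# it drops to its REPL the stream silently stops with no upstream alarm.
import time

TIMEOUT_S = 0.5  # assumed: ~50 missed 100 Hz samples counts as "gone dark"

class ImuWatchdog:
    def __init__(self, on_dark):
        self.on_dark = on_dark            # callback: stop trusting heading, alert
        self.last_seen = time.monotonic()
        self.dark = False

    def on_imu_sample(self, _sample):
        # Called by the serial reader for every 100 Hz IMU message.
        self.last_seen = time.monotonic()
        self.dark = False

    def poll(self):
        # Called from any existing periodic loop; the 10 Hz reactive tier is enough.
        if not self.dark and time.monotonic() - self.last_seen > TIMEOUT_S:
            self.dark = True
            self.on_dark()  # kinematic tier went dark: surface it immediately
```

A check this small, polled from the existing 10 Hz reactive loop, would surface the kinematic tier going dark the moment it happens instead of letting heading drift accumulate silently.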