LENS 13

Constraint Analysis

"What assumptions must hold — and how fragile are they?"

Each constraint below is scored on four axes: Fragility, Removable?, Conflicts with, and Tech relaxation (3yr).

WiFi <100ms P95
- Fragility: HIGH — uncontrollable environment; a microwave or a neighbor's network spikes latency to 300ms silently. Partially RELAXED if the Hailo-8 activates: L1 safety detection runs locally on the Pi at 430 FPS (YOLOv8n), removing WiFi from the safety path.
- Removable? HARD — household RF is not owned; an Ethernet bridge is possible but changes the robot's form factor.
- Conflicts with: the 58Hz VLM loop — stacked spikes exceed one full nav cycle.
- Tech relaxation (3yr): WiFi 7 multi-link reduces household jitter ~60%; a dedicated 6GHz band helps but is not guaranteed.

Single 120° camera
- Fragility: ARTIFICIAL — a $15 rear USB cam and a free Pi USB port are available; the blind spot is an engineering choice, not physics.
- Removable? EASY — 30 minutes to mount and configure; a rear cam eliminates surprise obstacles behind the robot.
- Conflicts with: the llama-server single-image API — multi-cam needs custom prompt routing.
- Tech relaxation (3yr): edge ViT models will do dual-cam fusion in <10ms on 8GB VRAM within 2 years.

8GB VRAM on Panda
- Fragility: MEDIUM — Gemma 4 E2B consumes ~4GB, leaving 4GB headroom; tight but not maxed. Partially RELAXED if the Hailo-8 activates: L1 safety moves off Panda's GPU entirely, freeing ~800MB that unblocks SigLIP Phase 2d without contending with the VLM.
- Removable? PARTIAL — retiring IndicF5 (done, session 67) bought 2.8GB; next, SigLIP 2 needs ~800MB.
- Conflicts with: embedding extraction (Phase 2d) — SigLIP plus the VLM approach the 8GB ceiling.
- Tech relaxation (3yr): 1B models will match today's 2B capability; Panda gains 4GB of new headroom.

llama-server API limits
- Fragility: MEDIUM — a software constraint, patchable; embeddings are not exposed for multimodal inputs.
- Removable? WORKAROUND — deploy SigLIP 2 ViT-SO400M as a separate extractor (~800MB); a 2-day task (Lens 03).
- Conflicts with: little — the workaround is clean architectural separation, not a hack.
- Tech relaxation (3yr): llama.cpp PR #8985 adds multimodal embedding extraction; likely merged within 12 months.

SLAM prerequisite (Phase 1)
- Fragility: MEDIUM — Phases 2c/2d/2e are blocked, but Phases 2a/2b run fine without SLAM.
- Removable? PARTIAL — SLAM is deployed but NOT verified in production as of session 89; the Zenoh fix is pending deploy.
- Conflicts with: semantic map annotation — VLM labels need a SLAM pose to attach to; no pose = floating labels.
- Tech relaxation (3yr): neural odometry (learned from IMU + camera without lidar) may eliminate the SLAM dependency by 2027.

No wheel encoders
- Fragility: HIGH — dead-reckoning drift of 0.65m per room loop observed in session 92; rf2o lidar odometry is the only ground truth.
- Removable? HARD — the TurboPi hardware has no encoder port; requires a motor swap or a hall-effect sensor retrofit (~$40).
- Conflicts with: precise turn calibration — the IMU alone can't distinguish motor slip from legitimate motion.
- Tech relaxation (3yr): visual odometry from a monocular camera is approaching encoder-class accuracy for indoor slow-speed robots.

Glass/transparent surfaces
- Fragility: HIGH — both sensors fail simultaneously: lidar light passes through, the camera sees a reflection rather than the obstacle; dual sensor failure with zero fallback.
- Removable? HARD — requires polarized lidar or an IR depth camera; no $15 fix; fundamental physics.
- Conflicts with: the "VLM proposes, lidar disposes" rule — the VLM may warn "glass door ahead" while lidar says "clear".
- Tech relaxation (3yr): ToF sensors (OAK-D Lite, ~$100) handle glass via IR reflection; likely an affordable edge option within 2 years.

Motor overshoot on small turns
- Fragility: HIGH — 5° commanded → 37° actual at speed 30; the 640% overshoot causes oscillation in homing/trim sequences.
- Removable? FIXABLE — coast prediction or pre-brake in firmware; an estimated 1-session fix; homing already compensates via achieved_deg.
- Conflicts with: ArUco homing precision — the right-turn undershoot being tuned suggests compound error stacking.
- Tech relaxation (3yr): field-oriented control (FOC) drivers for brushed motors solve momentum overshoot; available now at ~$20.

Pico IMU stability
- Fragility: HIGH — crashes to REPL unpredictably; IMU health is binary (healthy / fully absent); no graceful degradation.
- Removable? PARTIAL — a soft-reboot protocol is documented (Ctrl-D); the root cause is unknown: I2C noise, a power glitch, or a firmware bug.
- Conflicts with: heading-corrected turns — an IMU crash forces open-loop fallback, compounding motor-overshoot errors.
- Tech relaxation (3yr): no technology will fix an undiagnosed hardware/firmware bug; this needs root-cause investigation, not time.

Fragility: HIGH = likely to break  |  MEDIUM = conditional  |  ARTIFICIAL = an engineering choice, easily removed

Three constraints form a compounding failure cluster, not three independent risks. WiFi latency, Pico IMU stability, and motor overshoot interact in a way that is worse than their individual impacts suggest. When the Pico drops to REPL, the nav loop falls back to open-loop motor commands — exactly the regime where momentum overshoot is most dangerous, because there is no IMU correction available to detect or recover from the overshoot. If this happens mid-corridor and the WiFi simultaneously spikes (as it does when Panda's Ethernet-to-WiFi bridge is under load), three successive commands arrive late to a robot that is already spinning out of control. Lens 01 identified temporal surplus as this system's primary free resource; the compounding cluster burns that surplus in milliseconds. The individual fragility scores in the matrix understate the joint risk because they were assessed in isolation. The WiFi-IMU-overshoot triple failure is the scenario that matters most for production deployment.
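The gap between isolated and joint risk can be made concrete with a toy probability model. Every rate below is an illustrative assumption, not a measured value:

```python
# Toy model of why per-constraint fragility scores, assessed in isolation,
# understate the joint risk when failures are correlated. Every probability
# below is an illustrative assumption, not a measured rate.

p_wifi_spike = 0.10   # assumed: WiFi P95 blows past budget in a given window
p_imu_crash  = 0.02   # assumed: Pico drops to REPL in that window
p_overshoot  = 0.30   # assumed: small-turn overshoot during a maneuver

# If the three events were independent, the triple failure would be rare.
p_independent = p_wifi_spike * p_imu_crash * p_overshoot

# But they are coupled: an IMU crash forces open-loop commands, making
# overshoot near-certain, and bridge load raises the spike probability.
p_overshoot_given_crash = 0.95   # assumed: no heading correction available
p_wifi_given_load       = 0.40   # assumed: Ethernet-to-WiFi bridge under load

p_correlated = p_wifi_given_load * p_imu_crash * p_overshoot_given_crash

print(f"independent estimate: {p_independent:.4f}")   # 0.0006
print(f"correlated estimate:  {p_correlated:.4f}")    # 0.0076
print(f"understated by ~{p_correlated / p_independent:.0f}x")
```

Even with mild assumed correlations, the joint estimate is an order of magnitude above the independence estimate — which is the precise sense in which the matrix's isolated scores understate the cluster.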

The glass surface problem is the most fundamentally hard constraint in the matrix — and also the one most likely to be ignored until it causes a real incident. Every other constraint has either a workaround, a software fix, or a hardware upgrade path. Glass fails both sensors simultaneously: the lidar's near-infrared beam passes through glass panels with enough transmission that the return is below the noise floor, while the camera shows a reflection of the room behind the robot rather than the obstacle in front. The "VLM proposes, lidar disposes" fusion rule (Lens 04) breaks down specifically here: the VLM may correctly identify "glass door" from visual context clues (frame edges, handle, partial reflection), but lidar says "clear" and the safety daemon vetoes any ESTOP. This is the only scenario where the sensors' complementarity becomes a liability — both channels agree on the wrong answer. Lens 10 named it in the failure pre-mortem and Lens 11's adversarial analysis flagged it as the highest-probability unresolved safety issue. A ToF depth sensor solving glass detection is available today for ~$100; the constraint is artificial in the sense that it reflects a hardware budget decision, not a physics impossibility.
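A minimal sketch of how the fusion rule fails on glass, and one possible patch. The function names and the SLOW_APPROACH degradation are hypothetical illustrations, not the deployed safety daemon:

```python
# Sketch of the "VLM proposes, lidar disposes" rule and its glass failure
# mode. Function names and the SLOW_APPROACH degradation are hypothetical.
from typing import Optional

def fuse(vlm_warning: Optional[str], lidar_clear: bool) -> str:
    """Baseline rule: lidar has the final say on stopping."""
    if vlm_warning and not lidar_clear:
        return "ESTOP"      # both channels agree on an obstacle
    return "PROCEED"        # lidar vetoes any VLM-only warning

def fuse_with_glass_exception(vlm_warning: Optional[str], lidar_clear: bool) -> str:
    """Patched rule: a VLM glass warning survives a lidar 'clear' verdict,
    degrading to a slow approach instead of being vetoed outright."""
    if vlm_warning and not lidar_clear:
        return "ESTOP"
    if vlm_warning and "glass" in vlm_warning and lidar_clear:
        return "SLOW_APPROACH"   # glass is invisible to lidar; do not veto
    return "PROCEED"

# Glass door: the VLM warns, lidar sees through it and reports clear.
print(fuse("glass door ahead", lidar_clear=True))                       # PROCEED (unsafe)
print(fuse_with_glass_exception("glass door ahead", lidar_clear=True))  # SLOW_APPROACH
```

The patch does not make lidar see glass; it only stops the veto from silencing the one channel that can.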

Two constraints are genuinely artificial and could be removed in a single session. Motor overshoot has a documented fix — coast prediction or pre-brake added to the firmware's turn sequence — and the homing system already compensates for it via the achieved_deg prediction hack, which means the problem is fully understood and the path to the fix is clear. The llama-server embedding blocker (Lens 03) has an equally clean workaround: a standalone SigLIP 2 ViT-SO400M consuming ~800MB of the available 4GB headroom on Panda unlocks Phase 2d entirely. Both of these constraints persist not because they are hard but because the sessions that built the current system moved on to the next feature once a workaround was in place. The pattern is consistent with OK-Robot's finding that integration quality, not model capability, determines real-world performance — the workarounds are good enough for demos but create compounding technical debt in production.
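The architectural separation behind the SigLIP workaround can be sketched as an interface contract: llama-server stays untouched while a sidecar owns embeddings. The backend below is a stand-in random projection so the sketch runs without the ~800MB model; the 1152-dim SO400M embedding width and all names are assumptions:

```python
# Sketch of the standalone-extractor separation: llama-server keeps serving
# the VLM while a sidecar process owns embedding extraction. The backend is
# a stand-in (fixed random projection); swap SigLIP 2 ViT-SO400M in behind
# the same contract. EMBED_DIM and all names are assumptions.
import numpy as np

EMBED_DIM = 1152  # assumed SO400M embedding width

class EmbeddingSidecar:
    """Owns embedding extraction so the VLM server never needs patching."""

    def __init__(self, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # Stand-in for the real vision tower: a fixed random projection.
        self._proj = rng.standard_normal((EMBED_DIM, 64))

    def embed(self, features: np.ndarray) -> np.ndarray:
        """Return a unit-norm embedding for one image's feature vector."""
        v = self._proj @ features
        return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)  # both inputs are already unit-norm

sidecar = EmbeddingSidecar()
a = sidecar.embed(np.ones(64))
b = sidecar.embed(np.ones(64) * 2.0)   # same direction, different scale
print(a.shape, round(cosine(a, b), 3))  # (1152,) 1.0
```

The point of the contract is that Phase 2d code depends only on `embed()` returning a unit-norm vector, so the stand-in and the real model are interchangeable.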

Technology will relax the VRAM and model-size constraints first, but not the physical sensor constraints. The 3-year model trajectory is clear: 1B-parameter VLMs will match today's 2B capability (Gemma 4 E2B), freeing roughly 2GB of Panda's 8GB for embedding extraction, AnyLoc, and SigLIP simultaneously. The llama-server API limitation will dissolve when multimodal embedding extraction lands in llama.cpp (PR already in review). The Hailo-8 AI HAT+ on the Pi 5 — 26 TOPS of silicon that currently sits idle — partially RELAXES two matrix constraints at once: activating it as an L1 safety layer moves YOLOv8n obstacle detection off WiFi (430 FPS local, <10 ms, zero jitter exposure on the safety path) and off Panda's GPU (~800 MB freed, which is exactly the SigLIP Phase 2d budget called out in Lens 03). The IROS dual-process paper (arXiv 2601.21506) measured this pattern for indoor navigation — 66% latency reduction and 67.5% success versus 5.83% for VLM-only — validating the System 1 / System 2 split Annie's hardware already supports. WiFi 7 multi-link reduces household jitter but does not eliminate it — the Achilles' heel identified in Lenses 04 and 25 is structural, not generational. Glass surfaces and the absence of wheel encoders will remain exactly as hard in 2028 as they are today: both require physical hardware changes that no software release or model improvement can substitute for. The matrix reveals that the constraints most amenable to technology relaxation are the ones least urgently in need of fixing, while the constraints most urgently dangerous — WiFi jitter, Pico crash, glass — are the ones technology either cannot fix or requires hardware changes to address.

The most fragile constraint is WiFi, and it's uncontrollable by design. Household RF is shared infrastructure — a microwave 3 meters away can spike a 5GHz channel from 15ms to 300ms without any visible indication. Unlike every other constraint in the matrix, WiFi cannot be debugged, patched, or worked around through software. The only structural fix is moving the command channel off WiFi entirely (wired Ethernet bridge) — which the robot's form factor makes awkward but not impossible.
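Detecting the spike is at least cheap in software. A sketch of a rolling P95 monitor, with illustrative numbers (15ms nominal RTT, 300ms microwave-style spikes):

```python
# Rolling-window P95 monitor sketch for the WiFi command channel. Window
# size, budget, and the traffic mix below are illustrative numbers.
from collections import deque

class LatencyMonitor:
    def __init__(self, window: int = 100, p95_budget_ms: float = 100.0) -> None:
        self.samples = deque(maxlen=window)
        self.budget = p95_budget_ms

    def record(self, rtt_ms: float) -> None:
        self.samples.append(rtt_ms)

    def p95(self) -> float:
        ordered = sorted(self.samples)
        idx = max(0, int(0.95 * len(ordered)) - 1)  # nearest-rank estimate
        return ordered[idx]

    def healthy(self) -> bool:
        return bool(self.samples) and self.p95() <= self.budget

mon = LatencyMonitor()
for _ in range(94):
    mon.record(15.0)    # nominal household RTT
for _ in range(6):
    mon.record(300.0)   # microwave-style spikes
print(mon.p95(), mon.healthy())  # 300.0 False
```

Monitoring does not fix the jitter, but it turns a silent 300ms spike into a signal the nav loop can act on.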

The artificially imposed constraint with the highest leverage is motor overshoot. One session of firmware work — adding coast prediction to the turn sequence — converts a 640% overshoot hazard into a controllable 5–15% residual. The homing compensator already proves the model is correct. Removing this constraint unblocks precise ArUco approach, eliminates the IMU-crash-plus-overshoot compounding failure, and makes small corrective turns reliable enough to trust for semantic waypoint navigation in Phase 2c.
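A minimal sketch of the coast-prediction idea, assuming a linear speed-dependent gain fitted to the single observed data point (5° commanded → 37° achieved at speed 30). This is an illustration, not the firmware's actual calibration:

```python
# Coast-prediction sketch: shrink the commanded angle so momentum carries
# the chassis onto the target. The linear gain model is fitted to one
# observed data point and is an illustrative assumption.

def overshoot_gain(speed: int) -> float:
    """Assumed achieved/commanded ratio; equals 7.4 at speed 30,
    reproducing the observed 5 deg -> 37 deg behavior."""
    return 1.0 + speed * (6.4 / 30.0)

def precompensated_command(target_deg: float, speed: int) -> float:
    """Command less than the target so coasting lands on it."""
    return target_deg / overshoot_gain(speed)

def achieved_deg(commanded_deg: float, speed: int) -> float:
    """Plant model under the same assumption (what the motor actually does)."""
    return commanded_deg * overshoot_gain(speed)

cmd = precompensated_command(5.0, speed=30)
print(round(cmd, 2), round(achieved_deg(cmd, 30), 1))  # 0.68 5.0
```

The real firmware fix would calibrate the gain per speed from logged achieved_deg data rather than one point, but the inversion step is the same.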

When the WiFi and IMU failures coincide, the system has no safe state. Open-loop fallback (IMU absent) plus command latency (WiFi spiking) is a scenario where the robot is executing stale commands with no heading correction and no ability to detect overshoot. This is the production failure mode that Lens 10's pre-mortem did not fully articulate. The fix is not a third sensor — it is a hard ESTOP policy: if IMU is absent AND WiFi P95 exceeds 80ms, refuse all forward motion and wait for both constraints to recover.
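The policy is small enough to state as a gate function. The 80ms budget follows the text; the function and parameter names are illustrative:

```python
# The hard ESTOP policy as a gate function. The 80ms budget follows the
# text; names are illustrative.

def motion_allowed(imu_healthy: bool, wifi_p95_ms: float,
                   p95_budget_ms: float = 80.0) -> bool:
    """Refuse all forward motion when the robot would be open-loop (no IMU)
    AND executing stale commands (WiFi P95 over budget); wait for either
    channel to recover."""
    if not imu_healthy and wifi_p95_ms > p95_budget_ms:
        return False   # no safe state exists in this regime
    return True

print(motion_allowed(imu_healthy=True,  wifi_p95_ms=200.0))  # True  (IMU can correct)
print(motion_allowed(imu_healthy=False, wifi_p95_ms=40.0))   # True  (commands are fresh)
print(motion_allowed(imu_healthy=False, wifi_p95_ms=200.0))  # False (hard ESTOP)
```

Either degraded mode alone is survivable because the surviving channel compensates; only the conjunction trips the gate.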

The idle Hailo-8 on the Pi 5 is the highest-leverage unused resource in the system. 26 TOPS of on-board NPU silicon has been on the BOM since day one, untouched for navigation. Activating it as an L1 safety layer partially RELAXES both WiFi latency (safety moves local, YOLOv8n at 430 FPS, <10 ms, zero WiFi) and Panda VRAM (~800 MB freed for SigLIP — see Lens 03). The IROS dual-process result cited earlier (66% latency reduction; 67.5% versus 5.83% nav success) measured exactly this System 1 / System 2 split. The relaxation is not free: it introduces HailoRT and the .hef compilation pipeline as a new subsystem to maintain alongside llama-server. The hybrid architecture (Hailo L1 + VLM L2/L3 + Titan L4) is a trade across runtime ecosystems — worth it for the safety-path and VRAM payoff, but plan the activation carefully.
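The arbitration the split implies can be sketched as: L1 is always authoritative on the safety path, and L2 steering is discarded once stale. Thresholds and names are illustrative:

```python
# Arbitration sketch for the System 1 / System 2 split: the on-board Hailo
# L1 detector is always authoritative on the safety path, while remote VLM
# (L2) steering is discarded once stale. Thresholds are illustrative.

def arbitrate(l1_obstacle: bool, l2_command: str, l2_age_ms: float,
              stale_after_ms: float = 120.0) -> str:
    if l1_obstacle:
        return "STOP"           # L1 runs locally: zero WiFi on the safety path
    if l2_age_ms > stale_after_ms:
        return "HOLD_COURSE"    # L2 travels over WiFi: drop stale steering
    return l2_command

print(arbitrate(True,  "TURN_LEFT", 10.0))   # STOP
print(arbitrate(False, "TURN_LEFT", 300.0))  # HOLD_COURSE
print(arbitrate(False, "TURN_LEFT", 10.0))   # TURN_LEFT
```

The structural point: a WiFi spike can now only degrade steering quality, never the stop decision.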

Which single constraint removal would make Annie's navigation system qualitatively more capable — not just quantitatively faster or more accurate?


The SLAM prerequisite. Every other constraint improvement is incremental: better WiFi reduces incidents, motor fix improves homing accuracy, SigLIP workaround unlocks embeddings. But Phase 1 SLAM deployment — the one constraint that remains "pending deploy" after session 89 — is a phase transition, not an improvement. With SLAM, VLM labels become spatial memories that persist across sessions, Annie can answer "where is the kitchen?" from accumulated observation rather than real-time inference, and Phase 2c-2e become accessible. Without SLAM, Annie is permanently a reactive navigator with no persistent world model, regardless of how well the other constraints are managed. Deploying the Zenoh fix and verifying SLAM in production is not one task among many — it is the prerequisite that transforms the system from a fast local reactor into a system with genuine spatial memory.
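The phase transition is visible even in a toy model: without a pose, a VLM label has nowhere to live. All class and field names below are illustrative, not Annie's actual data model:

```python
# Toy semantic map: a VLM label becomes a persistent spatial memory only
# when a SLAM pose anchors it. All names are illustrative.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Landmark:
    label: str
    x: float
    y: float

class SemanticMap:
    def __init__(self) -> None:
        self.landmarks: List[Landmark] = []

    def annotate(self, label: str, pose: Optional[Tuple[float, float]]) -> bool:
        """Attach a VLM label to the map; with no pose it floats and is lost."""
        if pose is None:
            return False              # no SLAM pose -> floating label
        self.landmarks.append(Landmark(label, *pose))
        return True

    def where_is(self, label: str) -> Optional[Tuple[float, float]]:
        """Answer from accumulated observation, not real-time inference."""
        for lm in self.landmarks:
            if lm.label == label:
                return (lm.x, lm.y)
        return None

m = SemanticMap()
m.annotate("kitchen", pose=(2.5, 1.0))   # SLAM running: label persists
m.annotate("sofa", pose=None)            # SLAM down: label is lost
print(m.where_is("kitchen"), m.where_is("sofa"))  # (2.5, 1.0) None
```

With pose, "where is the kitchen?" is a lookup; without it, every query falls back to real-time inference, which is exactly the reactive-navigator ceiling the answer describes.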