LENS 15

Constraint Relaxation

"What if the rules changed — or what if they were already negotiable?"

CONSTRAINT RELAXATION MAP — INCLUDING ZERO-CAPEX DORMANT-HARDWARE ACTIVATION

CURRENT: WiFi

Constraint: 20–100ms latency, ±80ms variance. Cliff edge at ~100ms destroys temporal surplus at 1 m/s.

Cost of status quo: Random WiFi spikes cause ~4 collisions per hour in a busy channel environment. Every microwave and neighboring network is a production hazard.

METRIC: latency 20–100ms  |  variance ±80ms  |  COST: $0

RELAXED: USB-C Tether

What changes: 5ms guaranteed latency, zero variance. Cliff edge disappears entirely. Nav loop becomes deterministic.

What you give up: Tether limits roaming range to ~2m cable length. Acceptable for kitchen→living room indoor routes via cable reel.

METRIC: latency <5ms  |  variance ±0.5ms  |  COST: $8 USB cable

CURRENT: Monocular Camera

Constraint: No depth signal from camera. VLM must infer "SMALL/MEDIUM/LARGE" as proxy for distance. Fails on textureless surfaces (white walls, glass doors).

Cost of status quo: VLM obstacle accuracy ~60–70% on cluttered scenes. Glass and mirrors cause phantom free-space readings that bypass the lidar ESTOP.

METRIC: depth accuracy ~0%  |  VLM obstacle recall ~65%  |  COST: $0

RELAXED: Intel RealSense D405

What changes: Per-pixel depth at 30 Hz. Obstacle recall climbs to ~90%+. Eliminates glass/mirror false negatives. VLM can focus on semantics, not depth estimation.

What you give up: Extra USB port (Pi 5 has 2 remaining). Weight +~120g. D405 needs 0.07m min distance — chair legs <7cm away are a known blind zone.

METRIC: depth accuracy ~95%  |  obstacle recall ~90%  |  COST: $59 USD

CURRENT: 1 m/s Max Speed

Constraint: At 1 m/s, 100ms WiFi spike = 10cm positional uncertainty per command — half a robot body width. Motor momentum causes 640% turn overshoot at speed 30. Nav loop operates at its physics limit.

Cost of status quo: Homing overshoots require multi-step recovery. Tight corridor navigation requires ESTOP-pause-retry cycles averaging 3× longer than open-floor nav.

METRIC: 1 m/s  |  10cm/100ms slack  |  turn overshoot: +640%  |  COST: $0

RELAXED: 0.3 m/s Cap

What changes: 100ms WiFi spike = 3cm uncertainty (half a lidar resolution cell). Turn overshoot becomes negligible — momentum at 0.3× speed is sub-mm. ArUco homing closes reliably in a single pass.

What you give up: Crossing a 5m room takes 17s instead of 5s. No hardware cost. Speed can be raised to 0.5 m/s for open straight-line corridors and dropped to 0.2 m/s near furniture automatically.

METRIC: 0.3 m/s  |  3cm/100ms slack  |  turn overshoot: ~0%  |  COST: $0
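The slack arithmetic above is just speed × latency, and the "raise to 0.5 in open corridors, drop to 0.2 near furniture" policy is a three-way threshold. A minimal sketch (function names and thresholds are illustrative, not from Annie's codebase):

```python
# Positional uncertainty accrued while a command is in flight:
# slack = speed * latency (reported here in cm).
def positional_slack_cm(speed_m_s: float, latency_ms: float) -> float:
    return speed_m_s * (latency_ms / 1000.0) * 100.0

# Hypothetical adaptive speed cap, as described in the text: faster on
# open straight-line corridors, slower near furniture, 0.3 m/s default.
def speed_cap_m_s(nearest_obstacle_m: float, corridor_is_open: bool) -> float:
    if nearest_obstacle_m < 1.0:
        return 0.2          # near furniture: crawl
    if corridor_is_open:
        return 0.5          # open straightaway: allow more speed
    return 0.3              # default cap

# The two figures quoted in the text (tolerance absorbs float rounding):
assert abs(positional_slack_cm(1.0, 100) - 10.0) < 1e-9   # 10 cm at 1 m/s
assert abs(positional_slack_cm(0.3, 100) - 3.0) < 1e-9    # 3 cm at 0.3 m/s
```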

CURRENT: 90%+ Accuracy Target

Constraint: System complexity (Panda GPU, WiFi, multi-query pipeline, 4-tier fusion) exists to push goal-finding from ~60% to ~90%. Hardware cost: Panda Orange Pi 5 Plus + 8GB VRAM = ~$200 of the nav budget.

Cost of status quo: Panda is a single point of failure. If Panda reboots, Annie has zero nav capability. The "last 40% accuracy" requires 100% of the distributed hardware.

METRIC: ~90% goal-finding  |  4-tier system  |  COST: ~$200 GPU hardware

RELAXED: 60% + Retry Loop

What changes: Pi 5 CPU alone runs a 400M VLM at ~8 Hz. Goal-finding ~60%. But a retry loop ("turn 45°, try again") recovers most misses in 2–3 attempts. End-to-end task success rate ~85% with retries — at zero GPU cost.

What you give up: Each retry adds ~8s (turn + settle + re-query). Time-to-goal grows from ~15s to ~30s average. Acceptable for fetch-my-charger use cases; unacceptable for urgent response.

METRIC: 60% first-try  |  ~85% with retry  |  COST: -$200 (remove Panda)
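The retry loop itself is a few lines. This sketch uses hypothetical stand-ins (`query_vlm_for_goal`, `rotate_deg`) rather than Annie's real nav API, and the closed-form success estimate assumes independent retries — optimistic, since misses on the same scene are correlated, which is why the text's ~85% sits below the independence figure:

```python
# Hypothetical retry loop: turn 45 degrees to change viewpoint, re-query,
# up to max_attempts times. Callbacks are illustrative stand-ins.
def find_goal_with_retry(query_vlm_for_goal, rotate_deg, max_attempts=3):
    for _ in range(max_attempts):
        hit = query_vlm_for_goal()
        if hit is not None:
            return hit
        rotate_deg(45)      # new viewpoint before the next attempt
    return None

# Success after n retries at per-try accuracy p, IF attempts were
# independent: 1 - (1-p)^n. At p=0.6, n=3 this gives ~94%; correlated
# real-world misses pull it down toward the ~85% the text reports.
def success_with_retries(p: float, attempts: int) -> float:
    return 1.0 - (1.0 - p) ** attempts
```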

CURRENT: WiFi-Dependent Safety Layer

Constraint: Obstacle detection rides the VLM-over-WiFi path. When WiFi drops, Annie loses her semantic safety net and falls back to sonar/lidar ESTOP alone. Pi 5 CPU cannot run a meaningful detector at nav speeds.

Cost of status quo: Safety is coupled to a best-effort network. WiFi variance (±80ms) pushes reactive stops past the physical stopping distance at 1 m/s.

METRIC: detection Hz ≈ VLM 54 Hz via WiFi  |  fail-open on WiFi drop  |  COST: $0

RELAXED: Hailo-8 Local L1 Safety (DORMANT-HARDWARE ACTIVATION)

What changes: Pi 5 already carries an idle Hailo-8 AI HAT+ at 26 TOPS. Activating it runs YOLOv8n at 430 FPS, <10ms, zero WiFi. Becomes the always-available reactive safety layer beneath the VLM.

What you give up: HailoRT/TAPPAS integration effort; COCO-class fixed vocabulary at L1. Semantic queries still go to the VLM — but are no longer safety-critical.

METRIC: 430 FPS YOLOv8n local  |  <10ms latency  |  COST: $0 (already owned)
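The arbitration the Hailo-8 tier enables can be sketched as a two-layer gate: the local L1 detector is authoritative and needs no network, while the WiFi VLM may only add a veto. This is an illustrative sketch — the detection callback and class list stand in for a real HailoRT/YOLOv8n pipeline, which is not shown:

```python
# L1 stops on a fixed COCO-style vocabulary (the tradeoff named above);
# semantic judgments stay with the VLM but are no longer safety-critical.
STOP_CLASSES = {"person", "dog", "cat", "chair"}

def motion_allowed(l1_detections, vlm_advice=None, stop_distance_m=0.5):
    """l1_detections: list of (class_name, distance_m) from the local
    detector. L1 is authoritative: a stop-class object inside the stop
    distance halts motion even if the VLM (or the WiFi link) is silent."""
    for cls, distance_m in l1_detections:
        if cls in STOP_CLASSES and distance_m < stop_distance_m:
            return False          # local reflex, zero WiFi dependency
    if vlm_advice == "stop":      # semantic layer may also veto
        return False
    return True                   # WiFi down / VLM absent: L1 alone decides
```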

CURRENT: Gemma 4 E2B Does Everything

Constraint: The same 3.2 GB Gemma 4 E2B VLM on Panda handles goal-finding ("where is the kitchen?"), scene classification, obstacle reasoning, and open-ended Q&A. One model, one VRAM budget, one latency profile for all four tasks.

Cost of status quo: Simple goal-lookups ("find the door") pay full VLM autoregressive cost — 54 Hz ceiling, text-decoding tax per frame. Detection-shaped tasks are overpaying for reasoning capacity they do not use.

METRIC: all tasks via VLM  |  3.2 GB VRAM  |  54 Hz ceiling  |  COST: $0

RELAXED: Open-Vocab Detection + Gemma for Reasoning Only

What changes: Route goal-finding to NanoOWL (102 FPS) or GroundingDINO 1.5 Edge (75 FPS, 36.2 AP zero-shot) via TensorRT on Panda — a fraction of Gemma's VRAM. Gemma stays resident for true semantic reasoning ("is the glass door closed?"). Two tools, right-sized.

What you give up: Pipeline complexity grows by one model; prompt parsing split between two surfaces. Open-vocab detectors can't answer freeform questions — so VLM remains mandatory, just not on the critical path for every frame.

METRIC: 75–102 FPS goal-find  |  VRAM-light  |  Gemma freed for reasoning  |  COST: $0
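The routing decision reduces to classifying the prompt's shape. A toy sketch, with heuristics that are purely illustrative (a real router would be more careful than question marks and word counts):

```python
# Hypothetical query router for the detector/VLM split described above.
VLM_LEADS = ("is", "are", "why", "how", "does", "can", "did", "should")

def route_query(prompt: str) -> str:
    words = prompt.strip().lower().rstrip("?").split()
    # Freeform or relational questions need language reasoning -> VLM.
    if prompt.strip().endswith("?") or (words and words[0] in VLM_LEADS):
        return "vlm"
    # Short noun-phrase goals suit a fast open-vocab detector.
    if len(words) <= 4:
        return "detector"
    return "vlm"                  # long instructions: default to the VLM

assert route_query("find the door") == "detector"
assert route_query("is the glass door closed?") == "vlm"
```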

Legend: CURRENT = current constraint  |  RELAXED = relaxed state  |  rows 5–6 are zero-capex relaxations on hardware/models already owned  |  latency figures at 1 m/s unless noted

The "last 40% accuracy costs 10x the hardware" observation is the load-bearing truth of this architecture. Annie's nav stack at 60% goal-finding accuracy needs: one Pi 5 ($80), one lidar ($35), one USB camera ($25). Total hardware: under $150. Annie's nav stack at 90% goal-finding accuracy needs: all of the above, plus a Panda Orange Pi 5 Plus with 8GB VRAM ($200), a reliable 5GHz WiFi channel (dedicated AP, $40), and a 4-tier software architecture spanning three machines. The marginal 30 percentage points of accuracy cost roughly 2.5× the total hardware budget and all of the distributed-system complexity. That tradeoff is not obviously worth making for a home robot whose worst-case failure mode is "turn around and try again."

There is a relaxation pattern even cheaper than "buy a smaller model" — call it dormant-hardware activation. Before any new purchase, Annie's owner already has three idle compute tiers that the original architecture did not count: (1) the Hailo-8 AI HAT+ on Pi 5 — 26 TOPS, sitting idle for navigation today, capable of YOLOv8n at 430 FPS with sub-10ms latency and zero WiFi dependency; (2) Beast, a second DGX Spark with 128 GB unified memory, always-on but workload-idle since session 449; and (3) an Orin NX 16GB module at 100 TOPS Ampere, already owned and reserved for a future Orin-native robot chassis. This changes the constraint math. The VRAM ceiling that forced Gemma 4 E2B to juggle four jobs, the WiFi cliff-edge that made safety feel fragile, the compute budget that capped multi-model pipelines — all become negotiable without buying anything. This is zero-capex relaxation: unlike spending $250 on an Orin NX or $500 on a bigger GPU, activating hardware you already own costs only engineering time.

Three constraints are relaxable today, for $8 in hardware combined, with immediate effect on reliability. First: speed. Dropping from 1 m/s to 0.3 m/s costs nothing and eliminates the two most documented failure modes in the session logs — turn overshoot (640% at speed 30) and WiFi-induced positional drift (10cm per 100ms spike). The nav physics simply become forgiving at low speed. Second: accuracy target. Accepting 60% first-try accuracy with a retry loop produces ~85% task success — within 5 points of the current 90% target — at zero hardware cost, no Panda required. Third: swapping WiFi for a USB tether. An $8 cable eliminates the cliff edge that Lens 04 identified as the single highest-risk parameter in the entire system, at the cost of a 2m tether that a retractable cable reel can absorb.

The constraint the user does not actually care about is SLAM accuracy. The Phase 1 and Phase 2 research treats SLAM map fidelity as a foundational requirement — accurate localization enables semantic map annotation, loop closure, and goal-relative path planning. But for Annie's actual use cases (fetch charger, return to dock, avoid Mom), the robot does not need to know it is at coordinate (2.3m, 1.1m) in a globally consistent map. It needs to know: is the goal in frame? Is something blocking forward motion? Have I been here before? All three questions are answerable with the VLM alone, without a SLAM map, to 60–70% accuracy. The SLAM investment buys the remaining 20–30 points of spatial consistency at the cost of 3 additional services (rf2o, EKF, slam_toolbox) and a Docker container that has required 5 dedicated debugging sessions to stabilize.
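The three questions above suggest a map-free control step. A minimal sketch, where the three query callbacks are hypothetical stand-ins for VLM calls (no SLAM pose or global map appears anywhere):

```python
# Map-free nav decision from the three VLM-answerable questions:
# goal in frame? path blocked? seen this view before?
def map_free_step(goal_in_frame, path_blocked, seen_before):
    if path_blocked():
        return "stop"                  # safety first, no map needed
    if goal_in_frame():
        return "advance_toward_goal"   # servo on the visible goal
    if seen_before():
        return "turn_45_and_requery"   # revisited view: change heading
    return "explore_forward"           # novel view: keep exploring
```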

Hardware trends will relax the VRAM constraint within 18–24 months — but dormant-hardware activation collapses that timeline to weeks. The binding constraint for running VLM + SigLIP simultaneously is the 8GB VRAM ceiling on Panda's Mali GPU. The Jetson Orin NX 16GB (already owned, reserved for the future robot chassis) doubles that ceiling at $0 incremental cost the day it is activated. Beast's 128 GB unified memory can host any specialist model the pipeline needs without touching Panda's budget at all. And Hailo-8 carries the safety layer off-GPU entirely — no VRAM required. The "VRAM per model" curve is following the same trajectory as CPU megahertz in the 1990s: what requires dedicated hardware today will be a background service tomorrow. But Annie's household doesn't have to wait for 2027 — the dormant compute is already on-site.

The most architecturally disruptive relaxation is right-sizing the model to the task. Every "LEFT MEDIUM" command passes through Gemma 4 E2B's full autoregressive stack — a step that pays for reasoning capacity on a task (detection) that doesn't need it. Open-vocabulary detectors close this gap directly: NanoOWL at 102 FPS handles simple noun goals ("kitchen", "door", "person"); GroundingDINO 1.5 Edge at 75 FPS with 36.2 AP zero-shot handles richer prompts. Both fit TensorRT on Panda in a fraction of Gemma's 3.2 GB. Route goal-finding and scene classification to them; keep Gemma resident for questions that genuinely require language ("is the glass door closed?" "is Mom in the room?"). The VLM stops being the critical path for every frame and becomes the slow deliberative layer — the System 2 of a proper dual-process stack. And with the Hailo-8 added as L1 safety, the architecture finally matches the IROS dual-process result (66% latency reduction, 67.5% vs 5.83% success) without a single new hardware purchase. (Cross-ref Lens 06 on reliability layering, Lens 13 on right-sized models.)

The "last 40% accuracy costs 10x hardware" framing clarifies the build decision. If Annie's task success rate at 60% accuracy + retry is 85%, and the current 90% accuracy costs 2.5× the hardware budget plus all distributed complexity, the question becomes: is that 5-point gap worth $200 and three extra failure modes? For a home robot, probably not. For a production product, it depends on what "failure" costs the user.

Three idle compute tiers make "zero-capex relaxation" a real option. Hailo-8 AI HAT+ (26 TOPS, Pi 5, idle for nav) can host the L1 safety layer at 430 FPS with no WiFi dependency. Beast (2nd DGX Spark, 128 GB, workload-idle since session 449) can host specialist models without touching Panda's VRAM. Orin NX 16GB (100 TOPS Ampere, owned) is a 2x VRAM headroom upgrade whenever the chassis is ready. The VRAM/WiFi/compute constraints that shaped the original research are negotiable today, without spending a rupee — the only cost is engineering time.

Right-size the model to the task. NanoOWL at 102 FPS and GroundingDINO 1.5 Edge at 75 FPS are VRAM-light open-vocab detectors that can absorb goal-finding and free Gemma 4 E2B for real reasoning. Two tools sized to their job beats one tool overpaying for generality on every frame.

Speed is a free constraint to relax. 0.3 m/s eliminates turn overshoot, WiFi drift, and homing undershoot with zero hardware change. The nav physics become forgiving. Time-to-goal doubles — irrelevant for fetch-and-return tasks, slightly annoying for real-time following.

The constraint the user does not care about is SLAM accuracy. Five debugging sessions to stabilize three SLAM services suggests the investment-to-value ratio is inverted. The VLM alone — no map — handles the actual use cases at 60–70% accuracy, recoverable with retry.

If you had to deploy Annie into a new home tomorrow with a $50 budget, which constraints would you relax first?


Spend $0 first: cap speed at 0.3 m/s in config, add a retry loop to the nav tool (turn 45°, re-query, up to 3 attempts), and activate the Hailo-8 AI HAT+ that's already on the Pi 5 as the L1 safety layer — YOLOv8n at 430 FPS, <10ms, no WiFi needed. That alone brings task success from ~60% to ~85%, removes WiFi from the safety path, and costs nothing because every piece of hardware is already owned. Then spend $8 on a USB-C cable through a retractable reel. The remaining $42 buys nothing that matters as much as these four changes. The Panda, the SLAM stack, the 4-tier architecture, the "buy an Orin NX" impulse — those are "last 40% accuracy" purchases. They wait until the 85% baseline is boring.