LENS 03: DEPENDENCY TELESCOPE - CROSS-LENS NOTES
Generated: 2026-04-14, Session 97

=== CONFIRMED CROSS-LENS CONVERGENCE POINTS ===

1. llama-server embedding blocker (Lens 03 primary finding, confirmed)
   - This is the highest-leverage single dependency change in the system.
   - llama-server's architecture exposes a clean text-generation API but treats vision encoder output as internal state, not an accessible endpoint.
   - Fixes: (a) patch llama-server to expose /v1/embeddings for multimodal inputs, (b) replace with a Python inference script using transformers directly, (c) deploy SigLIP 2 as a separate 800 MB VRAM sidecar (current workaround in research).
   - Cross-lens: Lens 01 (constraint hierarchy): this is a TOOL constraint (llama-server API), not a physics constraint. Tool constraints are the most addressable tier.
   - Cross-lens: Lens 24 (Ghost Inventory, 18-gap audit): the embedding gap is likely listed as an unresolved prerequisite. Lens 03 provides the root cause: it's a server API gap, not a model capability gap.

2. WiFi cliff edge at 100 ms (Lenses 03, 04, 10, 13, 25 all converge here)
   - Lens 03 adds: the cascade structure. WiFi saturation doesn't just slow VLM throughput; it silently degrades three downstream phases (2c, 2d, 2e) via reduced scene label quality.
   - Lens 03 adds: the JPEG compression quality config is a hidden tuning knob that determines the WiFi bandwidth budget. It is currently set manually, with no adaptive logic.
   - No lens has yet proposed the adaptive rate control solution: a watchdog that monitors Pi→Panda round-trip latency and dynamically reduces VLM query rate from 54 Hz to 10 Hz when latency exceeds 80 ms. This is a gap across all five converging lenses.
   - Note for assembly: This convergence point should trigger a cross-lens callout box in the final HTML (a "lenses converge here" visual element).

3. Phase 1 SLAM prerequisite chain (Lenses 03, 10, 14)
   - Lens 03 maps the explicit dependency: 2c → 2d → 2e all blocked by Phase 1.
   - Three phases share a single dependency → failure correlation is 1.0, not independent.
   - The published probability table (2c: 65%, 2d: 55%, 2e: 50%) implies independence, but the phases are not independent. The conditional probability of 2d succeeding given that 2c fails is ≈ 0% (2d requires 2c, which requires SLAM).
   - Lens 10 found: "we built the fast path, forgot the slow path." Phase 1 SLAM is the slow path.
   - Lens 14: "research describes Waymo pattern then does the opposite (VLM-primary vs lidar-primary)." Dependency Telescope reframes this: VLM-primary is the RUNTIME path, but the setup path (Phase 1 SLAM) IS lidar-primary. The system is lidar-primary in build order, VLM-primary in operation. This is a clarification, not a contradiction.
   - For assembly: The prerequisite chain visualization (Phase 1 → 2c → 2d → 2e as a sequential dependency waterfall) is a strong visual for the Dependency Telescope tree.

4. Camera-lidar calibration as hidden prerequisite (Lens 24, Ghost Inventory)
   - Lens 24 identified camera-lidar calibration as one of the 18 gaps.
   - Lens 03 upstream trace: semantic map annotation (Phase 2c) requires attaching VLM scene labels to SLAM grid cells at "current pose." The pose comes from SLAM (lidar-derived). The image comes from the camera. If the camera and lidar coordinate frames are not extrinsically calibrated, the pose annotation is wrong by the physical offset between sensors.
   - The research does not mention this calibration requirement. It is an implicit prerequisite of Phase 2c that appears in neither the prerequisites table nor the implementation roadmap.
   - Severity: MEDIUM-HIGH. The SLAM pose locates the robot's lidar frame. The VLM perceives via the camera. If the camera is mounted 15 cm forward of the lidar and rotated 20°, every semantic label gets attached to the wrong grid cell. Room boundaries in the semantic map will be systematically offset. Place recognition (Phase 2d) will have degraded precision.
   - Fix: Document the camera-lidar extrinsic transform (even approximately: measure with a ruler) and apply it when projecting VLM labels onto the grid. This is a 10-line coordinate transform. The fix belongs in Phase 2c pre-work, not later.

=== NEW FINDINGS FROM DEPENDENCY ANALYSIS ===

5. GGUF conversion + llama.cpp compatibility: model upgrade risk
   - Every Gemma generation change requires: (a) new GGUF conversion pipeline, (b) llama.cpp version upgrade, (c) VRAM budget re-validation, (d) 54 Hz throughput re-verification.
   - Google's cadence: Gemma 2 (2024-Q3), Gemma 3 (2025-Q1), Gemma 4 (2026-Q1). Next: ~Q3 2026.
   - The inference pipeline is abstracted correctly (_ask_vlm takes image_b64 + prompt), but the GGUF build and llama-server compatibility step is not automated. This is a manual toil tax at each model generation: a recurring risk, not a blocking one.
   - Recommendation: Document the GGUF conversion recipe in TITAN-SETUP-RECIPES.md now, while the process is fresh from the Gemma 4 swap.

6. Embedding storage: the data engineering dependency that doesn't exist yet
   - If the llama-server embedding blocker is resolved: 280 tokens × 4 bytes × 54 Hz = ~60 KB/s of raw embedding data during navigation. A 2-hour session = ~432 MB of embeddings. (This figure counts one 4-byte value per token; if a full vector is stored per token, multiply by the embedding dimension.)
   - No storage layer for embeddings exists. The research says "store embeddings keyed by (x, y, heading) from SLAM" without addressing: deduplication, cosine index type (FAISS? hnswlib?), query latency at navigation speed, or session-to-session persistence.
   - This is a second-order dependency: unblocking llama-server immediately creates a data engineering requirement that must be satisfied before Phase 2d is useful, not just deployable.
   - Cross-lens: This is likely in Lens 24's ghost inventory. Flag for assembly.

7. Pico RP2040 IMU: silent failure, no automated recovery
   - Known failure: Pico drops to REPL silently. Detection: manual health polling (imu_healthy flag).
   - Dependency cascade when IMU fails: Tier 4 (kinematic correction) stops → heading drift accumulates → Tier 3 (SLAM localization) degrades → Tier 2 (VLM navigation commands) execute on a corrupted pose estimate → Tier 1 (strategic planning) receives wrong location.
   - The failure cascades through all four tiers before triggering any alert.
   - A watchdog that auto-detects imu_healthy=false and triggers a Pico soft-reboot (Ctrl-D via pyserial) would convert this from a cascading silent failure to a self-healing ~5 second outage.

8. Zenoh SLAM bridge: implemented but undeployed (session 89)
   - The SLAM+Zenoh bridge is committed and built but NOT running on Pi 5 as of session 96.
   - The dependency tree has a phantom node: Phase 1 SLAM is listed as a prerequisite for Phase 2c/2d/2e, but Phase 1 SLAM itself has an undeployed prerequisite (Zenoh build).
   - Risk: researchers and planners may assume Phase 1 SLAM is deployed-and-stable. It is implemented but not yet in production. This phantom prerequisite adds ~1 session of deploy+verify time before Phase 2c can begin.
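The self-healing watchdog proposed in Finding 7 could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: `recover_imu`, `SerialLike`, and the `is_healthy` callback are hypothetical names, the 5-second settle time is taken from the "~5 second outage" estimate above, and it assumes that the Pico firmware restarts its IMU loop after a MicroPython soft reset (Ctrl-D, byte 0x04, sent to the REPL).

```python
import time
from typing import Callable, Protocol


class SerialLike(Protocol):
    """Anything with a write(bytes) method, e.g. an open pyserial port (assumption)."""

    def write(self, data: bytes) -> object: ...


CTRL_D = b"\x04"  # Ctrl-D: soft reset when sent to the MicroPython REPL


def recover_imu(
    is_healthy: Callable[[], bool],
    port: SerialLike,
    max_attempts: int = 3,
    settle_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Soft-reboot the Pico until the IMU reports healthy or attempts run out.

    is_healthy is a hypothetical callback standing in for the imu_healthy flag.
    Returns True if the IMU is healthy when the function returns.
    """
    for _ in range(max_attempts):
        if is_healthy():
            return True
        port.write(CTRL_D)  # request a soft reset
        sleep(settle_s)     # give the firmware time to come back up
    # Final check after the last reboot attempt.
    return is_healthy()
```

In deployment the `port` argument would presumably be an open pyserial `serial.Serial` object for the Pico's USB serial device, and a `False` return is the point at which an alert should be raised instead of letting the cascade through Tiers 4 to 1 proceed silently.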
=== DEPENDENCY STABILITY RATINGS ===

Stability scale: STABLE (unlikely to change) / WATCH (may change) / FRAGILE (likely to change)

| Dependency             | Stability | Reason                                      |
|------------------------|-----------|---------------------------------------------|
| Gemma 4 E2B model      | FRAGILE   | Google release every ~6 months              |
| llama-server API       | WATCH     | Active OSS project, embedding support could |
|                        |           | appear or remain absent for years           |
| Panda Jetson 8 GB VRAM | STABLE    | Fixed hardware                              |
| Household WiFi         | FRAGILE   | Uncontrolled, usage-dependent               |
| slam_toolbox           | WATCH     | ROS 2 Jazzy LTS support until 2029          |
| RPLIDAR C1             | STABLE    | Physical hardware; mechanical wear only     |
| Pico RP2040 IMU        | WATCH     | Known REPL crash failure mode               |
| SigLIP 2 model         | WATCH     | Google-controlled; likely superseded        |
| rf2o lidar odometry    | STABLE    | Patched for Zenoh; no active development    |
| rmw_zenoh_cpp          | WATCH     | Pinned to afcd981; upstream moves fast      |

=== NEW FINDINGS FROM SESSION 119 HARDWARE AUDIT ===

9. Hailo-8 AI HAT+ on Pi 5: downstream-dependency demotion (mitigation available)
   - Key facts: 26 TOPS already on-robot, YOLOv8n at 430 FPS, zero WiFi traffic.
   - Dependency-graph effect: The WiFi cascade from Finding 2 changes as follows.
       Current:      "WiFi degrades → 3 Phase 2 phases degrade"
       With Hailo-8: "WiFi degrades → semantic features degrade, safety stays local"
   - This is the single highest-leverage dependency restructuring available WITHOUT any new hardware purchase. Hardware cost: zero (already installed). Engineering cost: HailoRT/TAPPAS integration, YOLOv8n ONNX compilation, safety-layer wiring.
   - Dual-process pattern validation: IROS paper arXiv 2601.21506 reports 66% latency reduction and a 67.5% vs 5.83% success rate for fast-reactive + slow-semantic layering.

=== CROSS-LENS CONNECTIONS (expanded) ===

→ LENS 10 (Contradiction Detector / fast vs slow path):
  Lens 10's "we built the fast path, forgot the slow path" reframes elegantly with the Hailo-8 opportunity.
  Annie DOES have a "fast path" available (the idle Hailo-8), just unwired. The contradiction isn't missing infrastructure; it's under-utilized infrastructure. The dual-process pattern (fast reactive + slow semantic) resolves the contradiction by running BOTH paths in parallel at different frequencies.

→ LENS 13 (Opportunity Hunter):
  The Hailo-8 activation is THE highest-value opportunity surfaced in this research cycle. Hardware cost: zero. Dependency-graph impact: converts WiFi from safety-critical to semantic-only. IROS-paper-validated pattern. Track as Lens 13 primary opportunity.

=== QUESTIONS FOR ASSEMBLY PHASE ===

1. Should the probability table in the research (2c: 65%, 2d: 55%, 2e: 50%) be annotated with the dependency correlation note? These are not independent probabilities.
2. The camera-lidar calibration gap (Finding 4) is a concrete actionable: "measure and document the camera-to-lidar extrinsic transform before starting Phase 2c." Should this appear as a TODO in TODO-OPENCLAW-ADOPTION.md?
3. The Zenoh SLAM bridge undeployed status (Finding 8) may be stale. Verify in the next session before treating Phase 1 SLAM as a known-stable prerequisite.
4. Voice-queryable spatial memory is the most valuable accidental downstream consumer. Should it be scoped as an explicit feature in Phase 2c planning, or left emergent? The consent question ("who was in my bedroom?") needs a policy decision before the capability is deployed.
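To make Question 2 concrete, the coordinate transform described in Finding 4 might be sketched as below. This is a planar (2D) approximation under stated assumptions, not the project's code: `Pose2D`, `camera_pose_in_map`, and `label_cell` are hypothetical names, the extrinsic offset is the ruler-measured camera position and yaw expressed in the lidar frame, and `view_range_m` (how far ahead of the camera a label is placed) is an illustrative parameter that defaults to the camera's own cell.

```python
import math
from typing import NamedTuple, Tuple


class Pose2D(NamedTuple):
    x: float      # metres
    y: float      # metres
    theta: float  # radians, counter-clockwise


def camera_pose_in_map(lidar_pose: Pose2D, cam_in_lidar: Pose2D) -> Pose2D:
    """Compose the SLAM (lidar-frame) pose with the camera's extrinsic offset."""
    c, s = math.cos(lidar_pose.theta), math.sin(lidar_pose.theta)
    x = lidar_pose.x + c * cam_in_lidar.x - s * cam_in_lidar.y
    y = lidar_pose.y + s * cam_in_lidar.x + c * cam_in_lidar.y
    # Wrap the combined heading into [-pi, pi).
    theta = (lidar_pose.theta + cam_in_lidar.theta + math.pi) % (2 * math.pi) - math.pi
    return Pose2D(x, y, theta)


def label_cell(
    lidar_pose: Pose2D,
    cam_in_lidar: Pose2D,
    view_range_m: float = 0.0,
    resolution_m: float = 0.05,
    origin: Tuple[float, float] = (0.0, 0.0),
) -> Tuple[int, int]:
    """Grid cell that a VLM label should be attached to.

    The label is placed view_range_m ahead of the camera along its heading;
    resolution_m and origin describe the occupancy grid (assumed values).
    """
    cam = camera_pose_in_map(lidar_pose, cam_in_lidar)
    px = cam.x + view_range_m * math.cos(cam.theta)
    py = cam.y + view_range_m * math.sin(cam.theta)
    col = math.floor((px - origin[0]) / resolution_m)
    row = math.floor((py - origin[1]) / resolution_m)
    return col, row
```

With the example offset from Finding 4 (camera 15 cm forward, rotated 20°), `cam_in_lidar` would be `Pose2D(0.15, 0.0, math.radians(20))`. Comparing the result against `Pose2D(0.0, 0.0, 0.0)` shows how many cells each label shifts when the calibration is skipped.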