"Where does this sit among all the alternatives?"
The two axes that genuinely separate these 12 systems are not the obvious ones. "Number of sensors" is a proxy — what it really measures is information throughput: how many independent signals arrive at the decision layer per second. And "autonomy level" is a proxy for where the decision boundary lives: does classical geometry make the motion decision (reactive), does a learned module make it (partial), or does an end-to-end network own the entire chain from pixels to motor command (fully learned)? Once you reframe the axes this way, the landscape becomes legible. Waymo is maximum information throughput (lidar + camera + radar + HD map + fleet telemetry) combined with a decision boundary that lives entirely inside learned modules. Tesla FSD v12 is the surprise: eight cameras are richer than one but far below Waymo's multi-modal suite — yet it sits at the highest autonomy level because the end-to-end neural planner removed every classical decision point. Tesla is not at the top-right corner; it is at the top-center, and that is its distinctive claim: more autonomy with fewer sensors than anyone thought possible.
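The reframing can be made concrete as a data structure: each system becomes a point (signals per second at the decision layer, location of the decision boundary). A minimal sketch; the throughput values here are illustrative placeholders, not measurements, and only their relative ordering matters:

```python
from dataclasses import dataclass
from enum import Enum

class Boundary(Enum):
    """Where the motion decision lives (the reframed y-axis)."""
    CLASSICAL = 0       # classical geometry decides (reactive)
    LEARNED_MODULE = 1  # a learned module decides (partial)
    END_TO_END = 2      # one network owns pixels -> motor command

@dataclass
class MapPoint:
    name: str
    signals_per_sec: float  # reframed x-axis: independent signals/s
    boundary: Boundary      # reframed y-axis

# Placeholder magnitudes; the map's claim is about relative position.
landscape = [
    MapPoint("Waymo",         1e4, Boundary.LEARNED_MODULE),
    MapPoint("Tesla FSD v12", 3e2, Boundary.END_TO_END),
]
```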
Annie, at roughly x=28%, y=60%, is not a compromise position — she is the only system on the entire map that deliberately occupies the "low sensor richness + high edge-compute exploitation" quadrant. Consider what the map shows: all the academic systems (VLMaps, OK-Robot, Active Neural SLAM, SayCan, NaVid, AnyLoc) cluster along the left edge, with sensor richness constrained by lab budgets and autonomy levels in the 30–70% band. All the industry systems (Tesla, Waymo, GR00T N1) move right and up together — more sensors and more learned autonomy are correlated at scale because both require capital. Annie breaks this correlation. She has strictly limited sensors (one camera, one lidar, one IMU — cheaper than any lab system) but deploys a 2B-parameter VLM at 54–58 Hz on edge hardware, enabling multi-query tactical perception that no academic monocular system achieves. The 4-tier hierarchy (Titan at 1–2 Hz, Panda VLM at 10–54 Hz, Pi lidar at 10 Hz, Pi IMU at 100 Hz) is what pushes the autonomy level above the academic cluster without adding sensors; the sketch below makes the throughput arithmetic explicit. This is the position the map reveals: edge compute density, not sensor count, is the real axis Annie is maximizing.
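If each tier's output counts as one independent signal per cycle (an assumption for illustration; a tier may well emit more than one signal per cycle), Annie's decision-layer throughput is just the sum of the tier rates:

```python
# Tier rates in Hz, as listed above; ranges are (min, max).
tiers = {
    "titan":     (1, 2),
    "panda_vlm": (10, 54),
    "pi_lidar":  (10, 10),
    "pi_imu":    (100, 100),
}

low  = sum(lo for lo, _ in tiers.values())   # 121 signals/s, worst case
high = sum(hi for _, hi in tiers.values())   # 166 signals/s, best case
print(f"decision-layer throughput: {low}-{high} signals/s")
```

Activating the Hailo path (next paragraph) raises this sum without adding a single sensor, which is exactly the rightward shift the amber bubble encodes.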
The dashed amber bubble shows where Annie lands once the idle Hailo-8 AI HAT+ on the Pi 5 (26 TOPS) is activated: she shifts rightward and slightly up on the reframed axes even though no new sensor is added. The same camera stream gets consumed twice — once by the on-Pi Hailo NPU running YOLOv8n at 430 FPS for reactive L1 obstacle safety with sub-10 ms latency and zero WiFi dependency, and once by the Panda VLM at 54 Hz for semantic grounding. This is the dual-process pattern from the IROS indoor-nav paper (System 1 + System 2, 66% latency reduction) instantiated on hardware Annie already owns; the fan-out is sketched after this paragraph. The shift is not cosmetic: it quantifies "how much inference work is extracted per pixel per second," which is exactly what the x-axis measures once reframed. The cyan cluster at mid-x (NanoOWL at 102 FPS, GroundingDINO 1.5 Edge at 75 FPS with 36.2 AP zero-shot, YOLO-World-S at 38 FPS) is the second new feature of the landscape — a band of open-vocabulary detectors that sits structurally between fixed-class YOLO and full VLMs, understanding text prompts like "kitchen" or "door" without running a full language model.
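A minimal sketch of that dual-consumption pattern, assuming hypothetical `detect`, `stop_motors`, and `vlm_query` hooks (none of these are real APIs): one-slot queues let each path always see the freshest frame, and neither path can stall the camera or the other path.

```python
import queue

# One-slot queues: a slow consumer drops stale frames instead of
# blocking the producer, so System 1 latency never depends on System 2.
fast_q: queue.Queue = queue.Queue(maxsize=1)  # System 1: reactive safety
slow_q: queue.Queue = queue.Queue(maxsize=1)  # System 2: semantic grounding

def publish(frame) -> None:
    """Fan one camera frame out to both consumers without blocking."""
    for q in (fast_q, slow_q):
        try:
            q.put_nowait(frame)
        except queue.Full:
            try:
                q.get_nowait()    # discard the stale frame
            except queue.Empty:
                pass              # a consumer drained it first
            q.put_nowait(frame)

def reactive_step(detect, stop_motors) -> None:
    """System 1 (e.g. YOLOv8n on the Hailo NPU): every frame, sub-10 ms
    budget, no network dependency. Hooks are hypothetical."""
    if detect(fast_q.get()):      # True means obstacle imminent
        stop_motors()

def semantic_step(vlm_query) -> None:
    """System 2 (Panda VLM): runs at its own rate; skipping frames is
    acceptable on this path."""
    vlm_query(slow_q.get(), "is the doorway ahead open?")
```

The design choice doing the work is `maxsize=1`: backpressure is converted into frame dropping, which is the right failure mode for perception even though it would be the wrong one for control commands.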
The empty quadrant is the crown jewel of this map: top-left as conventionally drawn, but on the reframed axes it is "single-camera + full semantic autonomy." The dashed coral bubble at x=28%, y=88% marks where Annie would be after Phase 2d/2e: same sensor richness, dramatically higher autonomy through embedding-based semantic memory, AnyLoc visual loop closure, and topological place graphs built without offline training. No system lives in this quadrant today. NaVid (video-based VLM, no map) has the right sensor profile but deliberately discards spatial memory — it is reactive by design. VLMaps has the right autonomy architecture but requires offline exploration sweeps and dense GPU infrastructure. The empty quadrant demands a specific combination: a persistent semantic map built incrementally from a single camera, using foundation model embeddings rather than custom training, running on edge hardware; a sketch of that loop follows this paragraph. That is precisely Annie's Phase 2c–2e roadmap. The gap is not accidental — it exists because academic systems are optimized for controllable benchmarks (which favor known environments and pre-exploration) and industry systems are optimized for scale (which justifies sensor investment). An always-on personal home robot has neither constraint. It must learn one environment over months of natural use, from one sensor, on hardware that costs less than a high-end smartphone.
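A minimal sketch of that incremental place graph, under stated assumptions: place embeddings come from a frozen foundation model (AnyLoc-style, but no real AnyLoc API is invoked here), and the 0.85 cosine-similarity threshold is illustrative, not a tuned value.

```python
import numpy as np

class PlaceGraph:
    """Topological map built incrementally from single-camera keyframes.
    Nodes are L2-normalized embeddings; edges record traversal order."""

    def __init__(self, sim_threshold: float = 0.85):  # illustrative value
        self.nodes: list[np.ndarray] = []
        self.edges: set[tuple[int, int]] = set()
        self.sim_threshold = sim_threshold
        self.current: int | None = None   # node the robot is "at"

    def observe(self, emb: np.ndarray) -> int:
        """Ingest one keyframe embedding; return the matched or new node.
        Matching an old node is a loop closure -- no offline sweep needed."""
        emb = emb / np.linalg.norm(emb)
        if self.nodes:
            sims = np.array([n @ emb for n in self.nodes])
            best = int(sims.argmax())
            if sims[best] >= self.sim_threshold:
                self._link(best)          # revisited a known place
                return best
        self.nodes.append(emb)            # genuinely new place
        self._link(len(self.nodes) - 1)
        return len(self.nodes) - 1

    def _link(self, node: int) -> None:
        if self.current is not None and self.current != node:
            self.edges.add((min(self.current, node), max(self.current, node)))
        self.current = node
```

Called once per keyframe, `observe` grows the map during natural use: the edge set is the navigable topology, and no offline training pass or exploration sweep is required.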
From a strategic positioning standpoint, Lens 05 (evolution timeline) established that the field's bottleneck has shifted from spatial memory to semantic grounding to deployment integration to the text-motor gap. The landscape map shows the same transition from a spatial perspective. One over-crowded zone is the mid-left cluster of academic monocular systems — diminishing returns territory, because every incremental semantic improvement in that cluster still requires offline setup. The other over-crowded zone, on the right, is the sensor-rich industry tier — unreachable without fleet capital. The unpopulated space between them, where Annie sits, is not a no-man's-land of compromise. It is the only zone where the constraint set of personal robotics can be satisfied: one home, one robot, always on, no pre-training, no sensor budget, but full use of the latest foundation models on edge hardware. As Lens 14 (research contradiction) notes, the research paper itself describes the Waymo pattern and then does the opposite — which turns out to be correct for the actual deployment context. The landscape map makes that inversion visible as a deliberate edge bet, not a shortcut.