LENS 07 — LANDSCAPE MAP

"Where does this sit among all the alternatives?"

---

PARAGRAPH 1

The two axes that genuinely separate these 12 systems are not the obvious ones. "Number of sensors" is a proxy — what it really measures is information throughput per inference cycle: how many independent signals arrive at the decision layer per second. And "autonomy level" is a proxy for where the decision boundary lives: does classical geometry make the motion decision, does a learned module make it, or does an end-to-end network own the entire chain from pixels to motor command? Once you reframe the axes this way, the landscape becomes legible. Waymo is maximum information throughput — lidar plus camera plus radar plus HD map plus fleet telemetry — combined with a decision boundary that lives entirely inside learned modules. Tesla FSD version 12 is the surprise: eight cameras are richer than a single camera but far below Waymo's multi-modal suite — yet it sits at the highest autonomy level because the end-to-end neural planner removed every classical decision point. Tesla is not at the top-right corner; it is at the top-center, which is its distinctive claim: more autonomy with fewer sensors than anyone thought possible.

---

PARAGRAPH 2

Annie's position is not a compromise — it is the only system in the entire map that deliberately occupies the "low sensor richness plus high edge-compute exploitation" quadrant. Consider what the map shows: all the academic systems — VLMaps, OK-Robot, Active Neural SLAM, SayCan, NaVid, AnyLoc — cluster along the left edge, with sensor richness constrained by lab budgets and autonomy levels in the 30 to 70 percent band. All the industry systems — Tesla, Waymo, GR00T N1 — move right and up together: more sensors and more learned autonomy are correlated at scale because both require capital. Annie breaks this correlation. It has strictly limited sensors — one camera, one lidar, one IMU — cheaper than any lab system. But it deploys a 2-billion-parameter VLM at 54 to 58 frames per second on edge hardware, enabling multi-query tactical perception that no academic monocular system achieves. The 4-tier hierarchy — Titan at 1 to 2 Hz, Panda VLM at 10 to 54 Hz, Pi lidar at 10 Hz, Pi IMU at 100 Hz — pushes the autonomy level above the academic cluster without adding sensors. Edge compute density, not sensor count, is the real axis Annie is maximizing (the scheduling sketch after this paragraph makes the rate structure concrete). A dashed projection shows where Annie lands once the idle Hailo-8 AI HAT+ on the Pi is activated: 26 TOPS, YOLOv8-nano at 430 frames per second, sub-10-millisecond latency, zero WiFi dependency. Same sensors — the same camera stream gets consumed twice, once locally on the Hailo NPU for reactive L1 safety, once on the Panda VLM for semantic grounding. Annie shifts rightward and slightly up on the reframed axes without adding any hardware, because the axis is really about compute-per-pixel, not sensor count. A new cluster has also formed between fixed-class detectors and full vision-language models: open-vocabulary detectors. NanoOWL at 102 frames per second. GroundingDINO 1.5 Edge at 75 frames per second with 36.2 AP zero-shot on complex prompts. YOLO-World-S at 38 frames per second with the strongest language capability. These understand text prompts — "kitchen", "door" — without running a full language model.
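To make that open-vocabulary band concrete, here is a minimal sketch using the Hugging Face transformers OWL-ViT checkpoint as a stand-in (NanoOWL is an edge-optimized engine for this model family). The checkpoint name, prompts, image path, and threshold are illustrative assumptions, and nothing about this sketch reproduces the quoted edge frame rates.

```python
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

# "frame.jpg" is a placeholder for one camera frame.
image = Image.open("frame.jpg").convert("RGB")
prompts = [["a door", "a kitchen counter"]]  # free-text classes, no retraining

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

inputs = processor(text=prompts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes to thresholded detections in pixel coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
result = processor.post_process_object_detection(
    outputs, threshold=0.2, target_sizes=target_sizes
)[0]

for box, score, label in zip(result["boxes"], result["scores"], result["labels"]):
    print(f"{prompts[0][int(label)]}: {score:.2f} at {[round(v) for v in box.tolist()]}")
```

The prompt list is the whole interface: swapping "a door" for any other noun phrase changes what the detector looks for, with no new weights, which is exactly what separates this band from fixed-class detectors.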
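And here is the scheduling sketch of the 4-tier hierarchy promised above: a minimal asyncio arrangement, assuming hypothetical tier callbacks (none of these names come from Annie's codebase). The point it illustrates is that the reactive Hailo-style consumer and the semantic VLM-style consumer sample the same single-slot camera buffer at different rates.

```python
import asyncio
import time

class LatestFrame:
    """Single-slot camera buffer: the producer overwrites, consumers sample.
    Slow consumers never backpressure the camera; they read the freshest frame."""
    def __init__(self):
        self.frame = None

async def run_at(hz: float, step) -> None:
    """Call an async step() at a fixed rate, absorbing step overruns."""
    period = 1.0 / hz
    while True:
        start = time.monotonic()
        await step()
        await asyncio.sleep(max(0.0, period - (time.monotonic() - start)))

async def main() -> None:
    cam = LatestFrame()

    # Hypothetical tier callbacks, one per level of the hierarchy.
    async def camera():         cam.frame = time.monotonic()  # stand-in for a capture call
    async def imu_tier():       pass                          # 100 Hz: balance / odometry
    async def lidar_tier():     pass                          # 10 Hz: obstacle ranging
    async def reactive_tier():  _ = cam.frame                 # Hailo-NPU-style L1 safety
    async def semantic_tier():  _ = cam.frame                 # Panda-VLM-style grounding
    async def strategic_tier(): pass                          # 1-2 Hz: Titan-level planning

    await asyncio.gather(
        run_at(30, camera),
        run_at(100, imu_tier),
        run_at(10, lidar_tier),
        run_at(30, reactive_tier),   # same stream, high rate
        run_at(10, semantic_tier),   # same stream, low rate
        run_at(2, strategic_tier),
    )

if __name__ == "__main__":
    asyncio.run(main())  # daemon-style loop; interrupt to stop
```

The single-slot buffer is the load-bearing choice: a 10 Hz VLM consumer and a 30 Hz safety consumer both read one stream without queueing, which is the "one camera, consumed twice" pattern described above.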
---

PARAGRAPH 3

The empty quadrant is the crown jewel of this map. In the reframed axes it is "single-camera plus full semantic autonomy." The dashed marker at x = 28%, y = 88% on the scatter plot shows where Annie would land after Phases 2d and 2e: same sensor richness, dramatically higher autonomy through embedding-based semantic memory, AnyLoc visual loop closure, and topological place graphs built without offline training. No system lives in this quadrant today. NaVid has the right sensor profile but deliberately discards spatial memory — it is reactive by design. VLMaps has the right autonomy architecture but requires offline exploration sweeps and dense GPU infrastructure. The empty quadrant demands a specific combination: a persistent semantic map built incrementally from a single camera, using foundation model embeddings rather than custom training, running on edge hardware (a sketch of such an incremental place graph follows this paragraph). That is precisely Annie's Phase 2c through 2e roadmap. The gap is not accidental. It exists because academic systems are optimized for controllable benchmarks — which favor known environments and pre-exploration — and industry systems are optimized for scale — which justifies sensor investment. An always-on personal home robot has neither constraint. It must learn one environment over months of natural use, from one sensor, on hardware that costs less than a high-end smartphone.
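What that combination could look like in code: a minimal sketch of an incrementally built topological place graph, assuming some frozen foundation-model encoder (AnyLoc-style descriptors, for instance) supplies one embedding per camera frame. The class, the 0.85 similarity threshold, and the update rule are illustrative assumptions, not Annie's implementation.

```python
import numpy as np

class PlaceGraph:
    """Topological place graph grown online from per-frame embeddings.
    Nodes hold unit-norm descriptors from a frozen encoder (encoder not shown);
    edges record traversals. No offline sweep, no training."""

    def __init__(self, sim_threshold: float = 0.85):
        self.embeddings = []        # one unit-norm vector per place node
        self.edges = set()          # undirected (i, j) traversal links
        self.current = None         # node the robot is presumed to occupy
        self.sim_threshold = sim_threshold

    def observe(self, emb: np.ndarray) -> int:
        """Fold one camera-frame embedding into the graph; return the node id."""
        emb = emb / np.linalg.norm(emb)
        if self.embeddings:
            sims = np.array(self.embeddings) @ emb   # cosine similarity to all nodes
            best = int(np.argmax(sims))
            if sims[best] >= self.sim_threshold:
                # Revisit: loop closure onto an existing place, no new node.
                self._link(self.current, best)
                self.current = best
                return best
        # Novel view: mint a new place node, connected to where we came from.
        self.embeddings.append(emb)
        new = len(self.embeddings) - 1
        self._link(self.current, new)
        self.current = new
        return new

    def _link(self, a, b):
        if a is not None and a != b:
            self.edges.add((min(a, b), max(a, b)))

# Smoke test with random stand-in embeddings (random vectors rarely match,
# so each frame becomes a new place; real encoder outputs would revisit nodes).
g = PlaceGraph()
rng = np.random.default_rng(0)
for _ in range(100):
    g.observe(rng.normal(size=512))
print(len(g.embeddings), "places,", len(g.edges), "edges")
```

The design choice that matters is that nodes are embeddings, not coordinates: loop closure becomes a nearest-neighbor test in embedding space, so the map accrues during months of natural use with no exploration sweep and no custom training.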
---

PARAGRAPH 4

From a strategic standpoint, the landscape map confirms the evolution timeline finding: there are two over-crowded zones. The first is the mid-left cluster of academic monocular systems — diminishing-returns territory, because every incremental semantic improvement still requires offline setup. The second, on the right, is the sensor-rich industry tier — unreachable without fleet capital. The unpopulated space between them, where Annie sits, is the only zone where the constraint set of personal robotics can be satisfied. As the research-contradiction lens notes, the project's own research writeup describes the Waymo pattern and the design then does the opposite — which turns out to be correct for the actual deployment context. The landscape map makes that inversion visible as a deliberate edge bet, not a shortcut. Annie is not a miniaturized Waymo. It is the only system whose position on the map is determined by the constraints of personal robotics rather than by the funding structure of labs or industry.

---

NOVA

The overcrowded zones tell you where the returns are diminishing. Everyone is piling into academic monocular-reactive on the left and industry sensor-rich-learned on the top-right. The gap between them — edge hardware, single camera, high semantic autonomy — has exactly one system in it: Annie. That gap exists because the two dominant funding structures each make an assumption that excludes it. Academic labs assume controllable pre-exploration. Industry assumes sensor budgets. A personal home robot violates both assumptions simultaneously, which is why the gap is real and not just unmapped — it is structurally excluded from where the field directs its attention. Two Nova bullets:

- Activating the idle Hailo-8 moves Annie further into her unique quadrant: 26 TOPS on the Pi 5, YOLOv8-nano at 430 frames per second, sub-10-millisecond latency, zero WiFi dependency. Same sensors, higher edge-compute density — the axis that actually matters gets exploited harder without any hardware purchase.
- A new cluster has formed between fixed-class detectors and full VLMs. Open-vocabulary detectors — NanoOWL at 102 frames per second, GroundingDINO 1.5 Edge at 75 frames per second with 36.2 AP zero-shot, YOLO-World-S at 38 frames per second — understand text prompts without running a language model. This band did not exist on the original landscape, and it changes what "middle of the map" means for any future personal-robotics entrant.

---

THINK

The reframing of the axes reveals something uncomfortable. If sensor richness is really information throughput per inference cycle, and autonomy level is really where the decision boundary lives, then the most interesting axis is the one the map does not show: time. Waymo's decision boundary has been moving back down toward classical — more safety overrides reintroduced as autonomy failures accumulated. Tesla's has been moving up — more of the stack replaced by neural. Annie's is moving up and to the right simultaneously — more effective sensor throughput via better VLM utilization, more autonomy via semantic memory. The static snapshot hides the trajectories. On a map of trajectories, Annie is the only system whose direction of motion points toward the empty quadrant from below, while industry systems spiral around the top-right corner and academic systems cluster in place. Which trajectory reaches the empty quadrant first?