"Where does this sit among all the alternatives?"
The two axes that genuinely separate these 12 systems are not the obvious ones. "Number of sensors" is a proxy — what it really measures is information throughput: how many independent signals arrive at the decision layer per second. And "autonomy level" is a proxy for where the decision boundary lives: does classical geometry make the motion decision (reactive), does a learned module make it (partial), or does an end-to-end network own the entire chain from pixels to motor command (fully learned)? Once you reframe the axes this way, the landscape becomes legible. Waymo is maximum information throughput (lidar + camera + radar + HD map + fleet telemetry) combined with a decision boundary that lives entirely inside learned modules. Tesla FSD v12 is the surprise: eight cameras are richer than one but far below Waymo's multi-modal suite — yet it sits at the highest autonomy level because the end-to-end neural planner removed every classical decision point. Tesla is not at the top-right corner; it is at the top-center, and that is its distinctive claim: more autonomy with fewer sensors than anyone thought possible.
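The reframing can be made concrete as a data structure: each system becomes a point (signals per second at the decision layer, location of the decision boundary). A minimal sketch; the throughput values here are illustrative placeholders, not measurements, and only their relative ordering matters:

```python
from dataclasses import dataclass
from enum import Enum

class Boundary(Enum):
    """Where the motion decision lives (the reframed y-axis)."""
    CLASSICAL = 0       # classical geometry decides (reactive)
    LEARNED_MODULE = 1  # a learned module decides (partial)
    END_TO_END = 2      # one network owns pixels -> motor command

@dataclass
class MapPoint:
    name: str
    signals_per_sec: float  # reframed x-axis: independent signals/s
    boundary: Boundary      # reframed y-axis

# Placeholder magnitudes; the map's claim is about relative position.
landscape = [
    MapPoint("Waymo",         1e4, Boundary.LEARNED_MODULE),
    MapPoint("Tesla FSD v12", 3e2, Boundary.END_TO_END),
]
```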
Annie, at roughly x=28%, y=60%, is not a compromise position — she is the only system on the entire map that deliberately occupies the "low sensor richness + high edge-compute exploitation" quadrant. Consider what the map shows: all the academic systems (VLMaps, OK-Robot, Active Neural SLAM, SayCan, NaVid, AnyLoc) cluster along the left edge, with sensor richness constrained by lab budgets and autonomy levels in the 30–70% band. All the industry systems (Tesla, Waymo, GR00T N1) move right and up together — more sensors and more learned autonomy are correlated at scale because both require capital. Annie breaks this correlation. She has strictly limited sensors (one camera, one lidar, one IMU — cheaper than any lab system) but deploys a 2B-parameter VLM at 54–58 Hz on edge hardware, enabling multi-query tactical perception that no academic monocular system achieves. The 4-tier hierarchy (Titan at 1–2 Hz, Panda VLM at 10–54 Hz, Pi lidar at 10 Hz, Pi IMU at 100 Hz) is what pushes the autonomy level above the academic cluster without adding sensors; the sketch below makes the throughput arithmetic explicit. This is the position the map reveals: edge compute density, not sensor count, is the real axis Annie is maximizing.
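If each tier's output counts as one independent signal per cycle (an assumption for illustration; a tier may well emit more than one signal per cycle), Annie's decision-layer throughput is just the sum of the tier rates:

```python
# Tier rates in Hz, as listed above; ranges are (min, max).
tiers = {
    "titan":     (1, 2),
    "panda_vlm": (10, 54),
    "pi_lidar":  (10, 10),
    "pi_imu":    (100, 100),
}

low  = sum(lo for lo, _ in tiers.values())   # 121 signals/s, worst case
high = sum(hi for _, hi in tiers.values())   # 166 signals/s, best case
print(f"decision-layer throughput: {low}-{high} signals/s")
```

Activating the Hailo path (next paragraph) raises this sum without adding a single sensor, which is exactly the rightward shift the amber bubble encodes.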
The dashed amber bubble shows where Annie lands once the idle Hailo-8 AI HAT+ on the Pi 5 (26 TOPS) is activated: she shifts rightward and slightly up on the reframed axes even though no new sensor is added. The same camera stream gets consumed twice — once by the on-Pi Hailo NPU running YOLOv8n at 430 FPS for reactive L1 obstacle safety with sub-10 ms latency and zero WiFi dependency, and once by the Panda VLM at 54 Hz for semantic grounding. This is the dual-process pattern from the IROS indoor-nav paper (System 1 + System 2, 66% latency reduction) instantiated on hardware Annie already owns; the fan-out is sketched after this paragraph. The shift is not cosmetic: it quantifies "how much inference work is extracted per pixel per second," which is exactly what the x-axis measures once reframed. The cyan cluster at mid-x (NanoOWL at 102 FPS, GroundingDINO 1.5 Edge at 75 FPS with 36.2 AP zero-shot, YOLO-World-S at 38 FPS) is the second new feature of the landscape — a band of open-vocabulary detectors that sits structurally between fixed-class YOLO and full VLMs, understanding text prompts like "kitchen" or "door" without running a full language model.
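A minimal sketch of that dual-consumption pattern, assuming hypothetical `detect`, `stop_motors`, and `vlm_query` hooks (none of these are real APIs): one-slot queues let each path always see the freshest frame, and neither path can stall the camera or the other path.

```python
import queue

# One-slot queues: a slow consumer drops stale frames instead of
# blocking the producer, so System 1 latency never depends on System 2.
fast_q: queue.Queue = queue.Queue(maxsize=1)  # System 1: reactive safety
slow_q: queue.Queue = queue.Queue(maxsize=1)  # System 2: semantic grounding

def publish(frame) -> None:
    """Fan one camera frame out to both consumers without blocking."""
    for q in (fast_q, slow_q):
        try:
            q.put_nowait(frame)
        except queue.Full:
            try:
                q.get_nowait()    # discard the stale frame
            except queue.Empty:
                pass              # a consumer drained it first
            q.put_nowait(frame)

def reactive_step(detect, stop_motors) -> None:
    """System 1 (e.g. YOLOv8n on the Hailo NPU): every frame, sub-10 ms
    budget, no network dependency. Hooks are hypothetical."""
    if detect(fast_q.get()):      # True means obstacle imminent
        stop_motors()

def semantic_step(vlm_query) -> None:
    """System 2 (Panda VLM): runs at its own rate; skipping frames is
    acceptable on this path."""
    vlm_query(slow_q.get(), "is the doorway ahead open?")
```

The design choice doing the work is `maxsize=1`: backpressure is converted into frame dropping, which is the right failure mode for perception even though it would be the wrong one for control commands.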
The empty quadrant is the crown jewel of this map: top-left as conventionally drawn, but on the reframed axes it is "single-camera + full semantic autonomy." The dashed coral bubble at x=28%, y=88% marks where Annie would be after Phase 2d/2e: same sensor richness, dramatically higher autonomy through embedding-based semantic memory, AnyLoc visual loop closure, and topological place graphs built without offline training. No system lives in this quadrant today. NaVid (video-based VLM, no map) has the right sensor profile but deliberately discards spatial memory — it is reactive by design. VLMaps has the right autonomy architecture but requires offline exploration sweeps and dense GPU infrastructure. The empty quadrant demands a specific combination: a persistent semantic map built incrementally from a single camera, using foundation model embeddings rather than custom training, running on edge hardware; a sketch of that loop follows this paragraph. That is precisely Annie's Phase 2c–2e roadmap. The gap is not accidental — it exists because academic systems are optimized for controllable benchmarks (which favor known environments and pre-exploration) and industry systems are optimized for scale (which justifies sensor investment). An always-on personal home robot has neither constraint. It must learn one environment over months of natural use, from one sensor, on hardware that costs less than a high-end smartphone.
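A minimal sketch of that incremental place graph, under stated assumptions: place embeddings come from a frozen foundation model (AnyLoc-style, but no real AnyLoc API is invoked here), and the 0.85 cosine-similarity threshold is illustrative, not a tuned value.

```python
import numpy as np

class PlaceGraph:
    """Topological map built incrementally from single-camera keyframes.
    Nodes are L2-normalized embeddings; edges record traversal order."""

    def __init__(self, sim_threshold: float = 0.85):  # illustrative value
        self.nodes: list[np.ndarray] = []
        self.edges: set[tuple[int, int]] = set()
        self.sim_threshold = sim_threshold
        self.current: int | None = None   # node the robot is "at"

    def observe(self, emb: np.ndarray) -> int:
        """Ingest one keyframe embedding; return the matched or new node.
        Matching an old node is a loop closure -- no offline sweep needed."""
        emb = emb / np.linalg.norm(emb)
        if self.nodes:
            sims = np.array([n @ emb for n in self.nodes])
            best = int(sims.argmax())
            if sims[best] >= self.sim_threshold:
                self._link(best)          # revisited a known place
                return best
        self.nodes.append(emb)            # genuinely new place
        self._link(len(self.nodes) - 1)
        return len(self.nodes) - 1

    def _link(self, node: int) -> None:
        if self.current is not None and self.current != node:
            self.edges.add((min(self.current, node), max(self.current, node)))
        self.current = node
```

Called once per keyframe, `observe` grows the map during natural use: the edge set is the navigable topology, and no offline training pass or exploration sweep is required.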
From a strategic positioning standpoint, Lens 05 (evolution timeline) established that the field's bottleneck has shifted from spatial memory to semantic grounding to deployment integration to the text-motor gap. The landscape map shows the same transition from a spatial perspective. One over-crowded zone is the mid-left cluster of academic monocular systems — diminishing returns territory, because every incremental semantic improvement in that cluster still requires offline setup. The other over-crowded zone, on the right, is the sensor-rich industry tier — unreachable without fleet capital. The unpopulated space between them, where Annie sits, is not a no-man's-land of compromise. It is the only zone where the constraint set of personal robotics can be satisfied: one home, one robot, always on, no pre-training, no sensor budget, but full use of the latest foundation models on edge hardware. As Lens 14 (research contradiction) notes, the research paper itself describes the Waymo pattern and then does the opposite — which turns out to be correct for the actual deployment context. The landscape map makes that inversion visible as a deliberate edge bet, not a shortcut.