LENS 19

Scale Microscope

"What changes at 10x? 100x? 1000x?"

SCALING BEHAVIOR — IMPACT AT 10x ACROSS 9 DIMENSIONS

WiFi latency — semantic (post-Hailo)      92%   ⚠ CLIFF PERSISTS for VLM queries only
WiFi latency — safety (post-Hailo)        15%   DEMOTED: Hailo-8 runs obstacle detection locally
VRAM pressure (SigLIP add, post-Hailo)    72%   step function softened (~800 MB freed on Panda)
Hailo-8 power draw (~2 W continuous)      40%   strictly linear with inference load, no step functions
Map area (SLAM file size)                 60%   linear, manageable
Embedding storage (60KB/session)          55%   linear, predictable
Scene label vocabulary                    35%   sublinear (rooms plateau fast)
VLM accuracy vs frame rate                22%   sublinear above 15 Hz
User trust accumulation                   18%   logarithmic plateau

⚠ = discontinuous cliff  |  coral = superlinear (dangerous)  |  amber = linear  |  green = sublinear (favorable)

The scaling picture splits into three categories, but the dangerous-dimensions count drops from one to one-half once the Hailo-8 AI HAT+ on Pi 5 is activated as the L1 safety layer. Pre-Hailo, WiFi channel contention was a single undifferentiated cliff: at 8+ devices on the same 2.4 GHz channel, 802.11 CSMA/CA's exponential backoff drove P95 latency from 80ms to 200ms+ in a single-device increment, and that spike fell on both the obstacle-detection path and the semantic-query path simultaneously.

Post-Hailo, the cliff bifurcates. The 26 TOPS Hailo-8 NPU runs YOLOv8n locally on Pi 5 at 430 FPS with <10ms latency and zero WiFi dependency, so reactive obstacle avoidance — the path where a 200ms spike could send the robot 20cm past a decision point — now terminates inside the chassis. The superlinear cliff persists only for semantic queries ("where is the kitchen?", "is the path blocked by a glass door?"), which still require the Gemma 4 E2B VLM on Panda over WiFi. Lens 04 identified WiFi as the most sensitive single parameter in the current system. Lens 19 now splits that hazard into two bars: safety is demoted to the favorable green zone (linear, local, ~2 W continuous on the NPU), while semantic stays in the coral zone at the scale where household-level transmitter density crosses channel saturation.

The Hailo-8 also scales linearly in its own right: power consumption rises smoothly with inference load, no step functions, no discontinuities — a textbook well-behaved scaling curve that replaces a discontinuous one.
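The superlinear shape of the backoff cliff can be sketched with a toy model. This is a sketch, not a full 802.11 simulation: the contention-window bounds are the standard DCF values, but the slot time is illustrative and treating consecutive collisions as the sole driver of delay is a simplification.

```python
SLOT_US = 9                 # 802.11 OFDM slot time in microseconds (illustrative)
CW_MIN, CW_MAX = 15, 1023   # standard DCF contention-window bounds

def mean_backoff_us(collisions: int) -> float:
    """Mean random-backoff wait after a run of consecutive collisions.

    The contention window doubles per collision (binary exponential
    backoff), so the expected deferral grows superlinearly until
    CW_MAX caps it.
    """
    cw = min((CW_MIN + 1) * 2**collisions - 1, CW_MAX)
    return cw / 2 * SLOT_US

# More stations on one channel means more collisions per transmit attempt,
# and each extra collision roughly doubles the expected wait:
for c in range(6):
    print(c, mean_backoff_us(c))
```

Each added collision roughly doubles the wait, which is why one extra device near saturation can move P95 latency from 80ms to 200ms+ while ten devices below saturation cost almost nothing.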

VRAM pressure remains a step function, but Hailo-8 activation partially mitigates the ceiling on Panda. The current Panda configuration runs the Gemma 4 E2B VLM (2B parameters) for nav inference with roughly 4–5 GB VRAM consumed against a 16 GB practical ceiling. Adding SigLIP 2 ViT-SO400M for embedding extraction (Phase 2d) adds ~800MB in a single step, and Phase 2e (AnyLoc / DINOv2 ViT-L) adds another ~1.2 GB. Pre-Hailo, two models stacked alongside E2B already crowded the ceiling. Post-Hailo, because obstacle detection moves off the Panda GPU entirely and onto the Hailo-8 NPU (separate silicon, separate memory, not a VRAM line-item), roughly 800 MB of Panda VRAM is freed from the nav pipeline — enough headroom to absorb the SigLIP step without qualitative pressure. The DINOv2 step is still binary, but now has breathing room. This does not eliminate the step-function character; each new model addition remains a fits-or-crashes decision with no graceful half-load. Session 270 documented exactly this class of failure on Titan when the 35B MoE and 27B silently accumulated. The Phase 2 roadmap must still treat each SigLIP → DINOv2 addition as a budget audit event, but with Hailo-8 absorbing the safety-detection VRAM cost, one rung of the ladder is now wider.
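The budget-audit discipline described above fits in a few lines. The model names and sizes below are the rough estimates quoted in this section; the 2 GB headroom margin is an assumed safety buffer, not a measured figure.

```python
VRAM_CEILING_GB = 16.0  # Panda's practical ceiling, per the text

# Line-items from the Phase 2 roadmap; sizes are the estimates above.
nav_stack = {
    "gemma_e2b_vlm": 4.5,       # currently loaded, ~4-5 GB
    "siglip2_so400m": 0.8,      # Phase 2d addition, ~800 MB step
    "anyloc_dinov2_vitl": 1.2,  # Phase 2e addition, ~1.2 GB step
}

def fits(models: dict[str, float], headroom_gb: float = 2.0) -> bool:
    """Fits-or-crashes audit: model loading is binary, so the whole
    budget must be checked before any new addition, not after."""
    return sum(models.values()) + headroom_gb <= VRAM_CEILING_GB

print(fits(nav_stack))  # the post-Hailo stack fits with room to spare
```

Running this check before every model load is the cheap insurance against the Session 270 class of silent accumulation.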

Map area, embedding storage, and scene label vocabulary are all in the favorable linear or sublinear zone — and the reasons reveal important design properties. Map file size scales linearly with floor area: a 10m² room yields a ~560-byte PNG; a 100m² apartment yields ~5–6 KB; a 1000m² building yields ~50–60 KB. These are trivially small even on Pi 5 storage. The interesting case is scene label vocabulary. A single-room deployment learns roughly 5 stable labels (kitchen, hallway, bedroom, bathroom, living room). A whole-house deployment adds a few more (office, laundry, garage) but then plateaus — most homes have 6–12 semantically distinct spaces, and the VLM's one-word scene classifier achieves this vocabulary ceiling within the first week of operation. Scaling to 100x more floor area does not produce 100x more label diversity; it produces the same labels applied to more grid cells. This sublinear growth in vocabulary means the SLAM semantic overlay architecture scales favorably: the query "where is the kitchen?" works equally well at 10m² and 1000m² because the label set is already stable. Embedding storage at 60KB per session is strictly linear — 1 session/day × 365 days × 60KB = 21.9MB per year. Even a decade of daily use fits in under 250MB.
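The linear arithmetic in this paragraph is worth making explicit, since it is what keeps the storage story boring. Figures are taken from the text; the bytes-per-m² constant is back-derived from the quoted map sizes.

```python
def embeddings_mb(days: int, session_kb: int = 60) -> float:
    """Strictly linear embedding storage: one 60KB session per day."""
    return days * session_kb / 1000

def map_png_bytes(area_m2: float) -> float:
    """Linear map-size model implied by the quoted figures (~56 B per m2)."""
    return 56 * area_m2

print(embeddings_mb(365))    # one year, ~21.9 MB
print(embeddings_mb(3650))   # one decade, comfortably under 250 MB
print(map_png_bytes(10))     # single room, ~560 B
print(map_png_bytes(1000))   # large building, ~56 KB
```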

The confluence point — where WiFi, map size, and room count inflection curves all meet simultaneously — is at the whole-house scale, roughly 100m² with 3 or more floors and 5+ regular occupants. Below this scale (single room, single user, single floor), all nine dimensions are individually manageable: WiFi is below saturation, VRAM fits comfortably, map files are trivially small, vocabulary is small, trust is building rapidly. Above whole-house scale (multi-building campus, fleet of robots) the architecture becomes wrong: shared GPU inference is required, map files must be tiled and streamed, WiFi must be replaced with dedicated mesh networking, and trust must be federated across multiple user profiles. Annie's architecture is explicitly artisanal — 4-tier hierarchical fusion designed for one home, one robot, one family. The whole-house inflection point is the design horizon. Below it, scale costs nothing. Above it, scale costs everything. The practical implication: before deploying Phase 2 in a large multi-story home, install a dedicated 5 GHz AP for the robot's command channel and verify Panda's VRAM budget after every model addition. These are the only two scaling risks that cause qualitative failure rather than graceful degradation.

Hailo-8 activation neutralizes the superlinear WiFi cliff for the safety path. YOLOv8n runs locally on the 26 TOPS NPU at 430 FPS, <10ms, zero WiFi dependency, ~2 W continuous. Reactive obstacle avoidance no longer traverses the shared-medium channel. The 802.11 CSMA/CA cliff persists only for semantic queries (VLM on Panda), not for safety-critical control. This is the single highest-leverage scaling improvement available to Annie, and it requires zero software rewrite — the NPU is already on the robot and currently idle.

Hailo-8 scales as a clean linear curve, not a step function. Power consumption rises smoothly with inference load (target ~2 W continuous), VRAM is not a line-item (separate NPU silicon). No discontinuities, no cliffs. The new L1 safety layer adds capability without adding any of the dangerous scaling patterns present elsewhere in the stack.

VRAM step function is partially mitigated by Hailo offload. Moving obstacle detection to the Hailo NPU frees ~800 MB on Panda — roughly one SigLIP-sized addition of headroom against the 16 GB ceiling. Each new model on Panda (SigLIP → DINOv2) remains a fits-or-crashes decision, but one rung of the ladder is now wider. Session 270 silent-overflow discipline still applies; Hailo buys runway, not immunity.

Scene labels plateau sublinearly — this is a design win. Most homes have 6–12 semantically distinct spaces. The VLM vocabulary ceiling is reached early; scaling map area does not grow the query complexity. The semantic overlay architecture works at any house size.

The whole-house inflection point is the design horizon — and Hailo-8 moves it outward. With the safety layer decoupled from WiFi, the previous brick wall at 8+ devices on 2.4 GHz becomes a soft degradation of semantic response time rather than a safety failure mode. Annie's architecture gains real headroom at whole-house scale. Above multi-building campus scale the architecture still requires structural change (shared inference, mesh networking, federated trust), but the sub-whole-house regime just got substantially more robust.

If Annie is deployed in a 3-story house with 6 family members and 40 smart-home devices on the WiFi, which scaling dimension breaks first — and what is the cheapest fix?


WiFi breaks first, and it breaks hardest. With 40 IoT devices plus 6 users' phones and laptops, the 2.4 GHz channel will be saturated almost continuously during waking hours. The nav command channel — Panda to Pi, 18ms latency budget — will see P95 spikes above 200ms, which is long enough for the robot to travel 20cm past a decision point at 1 m/s before receiving the corrective command. The sonar ESTOP is the only safety net left at that latency. The cheapest fix is a $35 router with VLAN isolation: put the robot's Pi and Panda on a dedicated 5 GHz SSID with QoS priority, separate from all household IoT traffic. This drops variance from ±80ms to ±5ms with zero software changes. The second cheapest fix — a wired Ethernet bridge from Panda to a Pi Zero acting as a WiFi repeater near the robot's docking station — costs $12 and eliminates the channel contention entirely for the command path. Neither fix requires touching the VLM stack or the SLAM pipeline. The scaling fix for the most dangerous dimension is a network configuration change, not a software change.
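The 20cm figure is plain latency-times-speed arithmetic, which makes the stakes easy to check. Speed and latencies are the ones quoted in the text.

```python
def overshoot_cm(latency_s: float, speed_mps: float = 1.0) -> float:
    """Distance the robot covers before a delayed corrective command lands."""
    return latency_s * speed_mps * 100  # metres to centimetres

print(overshoot_cm(0.200))  # saturated 2.4 GHz P95 spike: ~20 cm past the decision point
print(overshoot_cm(0.010))  # local Hailo-8 path (<10 ms): ~1 cm
```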