# Building a Two-Layer Robot Navigation System: Hailo Reflexes + GPU Brain

*How we gave Annie — a personal AI assistant — the ability to autonomously navigate a robot car through a home, using subsumption architecture with a 30 FPS safety layer and a 3.5-second strategic planning loop.*

## The Problem

Annie is a personal ambient intelligence that controls a TurboPi robot car — a Pi 5 with mecanum wheels, a camera gimbal, and a Hailo AI HAT+ neural processing unit. She already had four manual tools: drive, photo, look, and status. But "drive forward for 2 seconds" is not navigation. We needed Annie to autonomously explore rooms, find things, and avoid obstacles — safely.

The challenge: real-time obstacle avoidance requires 30+ FPS reaction time. Strategic navigation requires understanding scenes and making plans. These operate on fundamentally different timescales. No single system does both well.

## The Architecture: Subsumption (Brooks, 1986)

Rodney Brooks' subsumption architecture solves this by layering behaviors. Lower layers handle reflexes (fast, simple, always-on). Higher layers handle planning (slow, complex, intermittent). The key insight: **lower layers can override higher layers at any time.**

```
Layer 1 (Brain — Titan GPU, 3.5s cycles):
  See scene → Understand goal → Decide action → Drive

Layer 0 (Reflexes — Hailo NPU, 30 FPS):
  Detect obstacle → Too close? → EMERGENCY STOP
  ↑ overrides Layer 1 instantly
```

**Why this fits hobby robotics:** ROS navigation stacks (costmaps, SLAM, path planners) are powerful but overkill for a home robot car that needs to explore rooms. Subsumption gives us safety guarantees with minimal complexity. The safety layer is ~200 lines of Python. The navigation loop is ~150 lines.
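The override rule is small enough to show directly. A minimal, hypothetical arbiter (not the project's code): layers are polled highest-priority first, and the first layer that asserts an action wins.

```python
def arbitrate(layers, default="stop"):
    """Minimal subsumption arbiter. `layers` is an ordered list of
    callables, highest priority (reflexes) first; each returns an action
    string, or None when it has no opinion. Illustrative sketch only."""
    for layer in layers:
        action = layer()
        if action is not None:
            return action  # higher-priority layers subsume everything below
    return default

# Layer 0 reflex: assert "estop" when an obstacle is too close, else defer.
def make_reflex(too_close):
    return lambda: "estop" if too_close() else None

# Layer 1 planner: always has a (slow, strategic) opinion.
planner = lambda: "forward"
```

The planner never needs to know the reflex layer exists; safety is enforced purely by priority order.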

## The Hardware Stack

| Component | Role | Performance |
|-----------|------|-------------|
| **Hailo AI HAT+ 26T** (on Pi 5) | YOLO obstacle detection | 310 FPS (6.66ms/frame) |
| **Pi 5 Camera** (via OpenCV) | Visual input | 30 FPS, 640x480 |
| **Gemma 4 26B** (on Titan DGX Spark, GPU) | Scene understanding + navigation decisions | 50 tok/s, 1.4s vision |
| **Pi 5** (16 GB) | Motor control, sensor reading | FastAPI server |
| **Mecanum wheels** (4, via STM32) | Omnidirectional movement | UART control |

## The Camera Sharing Problem

Both the safety daemon and the photo endpoint need camera frames. But Linux V4L2 won't let two processes open `/dev/video0` simultaneously with most USB cameras.

**Solution: FrameGrabber pattern** — a single thread owns the camera and publishes the latest frame to shared memory.

```
FrameGrabber Thread (single camera reader)
    │
    ├──▶ Safety Daemon: reads latest frame → YOLO inference → ESTOP if needed
    │
    └──▶ /photo endpoint: reads latest frame → JPEG → base64
    
    Protection: threading.Lock on read/write
    Stale detection: monotonic frame_id counter
```

The key detail: `get_frame()` returns a **copy** plus a monotonic `frame_id`. The safety daemon tracks the frame_id to detect camera disconnection (USB wobble on a moving car). If the same frame_id persists for 30 reads (~1 second), the daemon triggers an emergency stop — because a blind safety system is worse than no safety system.
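A minimal sketch of the pattern (names are illustrative; the real grabber wraps `cv2.VideoCapture` in its reader thread, elided here so the sketch stays camera-free):

```python
import threading

class FrameGrabber:
    """Single owner of the camera. In the real system a dedicated thread
    loops on the capture device and feeds _publish(); every other
    consumer only ever calls get_frame()."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._frame_id = 0  # monotonic counter for stale-frame detection

    def _publish(self, frame):
        # Called only by the single camera-reader thread.
        with self._lock:
            self._frame = frame
            self._frame_id += 1

    def get_frame(self):
        # Return (copy, frame_id): a copy so consumers never see a
        # half-written buffer, an id so they can detect a stalled camera.
        with self._lock:
            if self._frame is None:
                return None, 0
            return bytes(self._frame), self._frame_id

def stale(last_id, new_id, repeats, limit=30):
    """Consumer-side stall check: ~30 unchanged ids is ~1 s at 30 FPS.
    Returns (should_estop, updated_repeat_count)."""
    repeats = repeats + 1 if new_id == last_id else 0
    return repeats >= limit, repeats
```

The safety daemon would call `stale()` once per read and trigger its emergency stop when it fires.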

## The UART Safety Dance

The most dangerous bug we caught in adversarial review: **two threads writing to the UART serial port simultaneously produces garbled motor commands.** During an emergency stop, the safety daemon could send a "stop all motors" command while the asyncio event loop's executor is mid-drive. The garbled serial could make the car accelerate into the very obstacle it's trying to avoid.

**Fix: `_uart_lock` (threading.Lock)** wraps every `board.set_motor_duty()` call, whether from the asyncio executor or the safety callback.

```python
import threading

_uart_lock = threading.Lock()

def _stop_motors_sync(board):
    with _uart_lock:
        board.set_motor_duty([[1, 0], [2, 0], [3, 0], [4, 0]])

def _drive_sync(board, action, speed):
    with _uart_lock:
        # mecanum kinematics → set_motor_duty
        ...
```

A related bug: `asyncio.Event.set()` is NOT thread-safe when called from a background thread. The safety daemon uses `loop.call_soon_threadsafe(_stop_event.set)` to properly signal the asyncio event loop. Without this, the event loop can miss the stop signal entirely — intermittently, which is the worst failure mode for a safety system.
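The pattern in miniature, as a self-contained demo rather than the project's daemon: the background thread never touches the `Event` directly; it schedules the `set()` onto the loop.

```python
import asyncio
import threading
import time

async def wait_for_estop(stop_event: asyncio.Event, timeout: float = 2.0) -> bool:
    """Returns True if the stop signal arrived within the timeout."""
    try:
        await asyncio.wait_for(stop_event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False

def demo() -> bool:
    async def main() -> bool:
        stop_event = asyncio.Event()
        loop = asyncio.get_running_loop()

        def safety_daemon():
            # Background thread. Calling stop_event.set() directly here
            # would race with the loop; call_soon_threadsafe hands the
            # call to the event loop's own thread.
            time.sleep(0.05)
            loop.call_soon_threadsafe(stop_event.set)

        threading.Thread(target=safety_daemon, daemon=True).start()
        return await wait_for_estop(stop_event)

    return asyncio.run(main())
```

Swapping `call_soon_threadsafe` for a bare `stop_event.set()` makes this demo fail intermittently, which is exactly the failure mode described above.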

## The Navigation Loop

Each cycle (~3.5 seconds):

```
1. SENSE (parallel, Pi-only, ~100ms):
   GET /photo     → raw JPEG base64
   GET /obstacles  → Hailo detection list + safe_forward flag

2. THINK (single multimodal GPU call, ~2s):
   Send image + obstacles + goal + history to Gemma 4 on Titan
   → Returns one word: forward / backward / left / right / goal_reached / give_up

3. ACT (Pi, ~1s):
   POST /drive → execute movement for 1 second at cautious speed

4. CHECK:
   ESTOP triggered? → abort gracefully
   Max cycles (10)? → stop
```

**Why one combined call:** An earlier design used two GPU calls per cycle: one for the scene description, another for the navigation decision. That meant a multi-hop Titan↔Pi roundtrip: the Pi captures a photo, Titan describes it, the description returns to the Pi, the Pi sends it back to Titan for the decision, and the decision comes back to the Pi (plus one more Titan call for Annie). The combined multimodal call eliminates this: send the raw image directly to Titan with the goal context and get the action word back. One call, ~2 seconds, half the GPU load.
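The loop above in sketch form, with the three endpoints replaced by injected callables (hypothetical names, not the project's API):

```python
def navigate(sense, think, act, goal, max_cycles=10):
    """One GPU call per cycle. `sense()` stands in for the /photo and
    /obstacles reads and returns (image, obstacles, estop_flag);
    `think(...)` is the single multimodal call returning one action
    word; `act(action)` stands in for the /drive POST."""
    history = []
    for _ in range(max_cycles):
        image, obstacles, estop = sense()
        if estop:
            return "estop"                # safety layer fired: abort gracefully
        action = think(image, obstacles, goal, history)
        if action in ("goal_reached", "give_up"):
            return action
        act(action)                       # drive ~1 s at cautious speed
        history.append(action)
    return "max_cycles"
```

Because `think` sees the history, the model can notice it has been told "forward" three times without progress and choose `give_up` instead of looping.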

## Distance Estimation: The Hardest Unsolved Problem

Estimating distance from a 2D bounding box is unreliable. Our heuristic:

```
distance ≈ (reference_bbox_ratio × reference_distance × gimbal_correction) / observed_bbox_ratio
```

Where `reference_bbox_ratio` is "how big does a person appear at 100cm?" (~40% of frame height). The gimbal correction accounts for camera tilt — pointing down makes objects appear larger.

This is wrong more often than it's right. That's why:
- **Sonar** provides accurate frontal distance (ultrasonic, hardware)
- The safety threshold is conservative (30cm)
- Distance estimation errs toward "closer" with conservative reference values
- Phase 5 includes calibration at known distances
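The heuristic above in code, with the article's reference values as defaults (the function name and signature are illustrative):

```python
def estimate_distance_cm(observed_bbox_ratio,
                         reference_bbox_ratio=0.40,
                         reference_distance_cm=100.0,
                         gimbal_correction=1.0):
    """Pinhole-style heuristic: apparent size scales inversely with
    distance. A person filling ~40% of frame height is assumed to be
    ~100 cm away; gimbal_correction (a multiplier, assumed form)
    compensates for the camera tilting down, which inflates bbox size."""
    if observed_bbox_ratio <= 0:
        return float("inf")  # nothing detected, no estimate possible
    return (reference_bbox_ratio * reference_distance_cm
            * gimbal_correction) / observed_bbox_ratio
```

A person filling 80% of the frame thus reads as ~50 cm, which is already inside the planning horizon even if the true distance is off by a large factor.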

## The Safety Stack (6 Layers)

| Layer | Mechanism | Response Time |
|-------|-----------|---------------|
| 0 | **Hailo YOLO auto-ESTOP** | <100ms (30 FPS) |
| 1 | **Emergency stop button** (/estop) | <100ms |
| 2 | **Dead-man's switch** (10s timeout) | 10s |
| 3 | **Duration clamp** (max 5s per drive) | Immediate |
| 4 | **Rate limit** (30 cmds/min) | Immediate |
| 5 | **Motor-zero on restart** (systemd) | Boot time |

The Hailo layer (new) is the only one that detects obstacles proactively. The others are reactive (respond to commands) or temporal (respond to silence). Together, they form defense in depth: even if the Hailo daemon crashes, the dead-man's switch stops the car within 10 seconds.
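Layer 2 fits in a few lines. A sketch under the assumption that a watchdog thread polls `check()` and that `stop_fn` is wired to the UART-locked motor-zero call (names hypothetical):

```python
import threading
import time

class DeadMansSwitch:
    """Stops the motors if no command arrives for `timeout` seconds."""

    def __init__(self, stop_fn, timeout=10.0):
        self._stop_fn = stop_fn
        self._timeout = timeout
        self._last_cmd = time.monotonic()
        self._lock = threading.Lock()

    def feed(self):
        # Call on every accepted drive command.
        with self._lock:
            self._last_cmd = time.monotonic()

    def check(self):
        # Poll from the watchdog thread; returns True if the switch fired.
        with self._lock:
            expired = time.monotonic() - self._last_cmd > self._timeout
        if expired:
            self._stop_fn()
            return True
        return False
```

`time.monotonic()` matters here: wall-clock time can jump (NTP sync), and a backwards jump would silently disarm the switch.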

## Performance Numbers

| Metric | Value |
|--------|-------|
| Hailo YOLO inference | 6.66ms per frame (310 FPS capable) |
| FrameGrabber overhead | <1ms per frame copy |
| Safety response time | <100ms (detection → motor stop) |
| Navigation cycle time | ~3.5s (sense 100ms + think 2s + act 1s + overhead) |
| Max navigation duration | ~35s (10 cycles × 3.5s) |
| Pi 5 memory usage | ~375 MB (of 16 GB) |
| Titan GPU per cycle | 1 multimodal call, ~2s |

## Lessons Learned

1. **Thread safety is the #1 concern in robotics software.** Two bugs found in adversarial review would have made the safety system cause the very crashes it was designed to prevent.

2. **Camera sharing is a real problem.** V4L2 device contention is not hypothetical. The FrameGrabber pattern (single reader, shared buffer) is the standard solution but easy to forget when designing a system with multiple consumers.

3. **Subsumption works for hobby robots.** We didn't need ROS, SLAM, or costmaps. A 200-line safety daemon + a 150-line navigation loop gives us autonomous room exploration with obstacle avoidance.

4. **Combined multimodal calls are 2x more efficient.** Separating "describe what you see" from "decide what to do" wastes a GPU call. Modern VLMs can do both in one pass.

5. **Rate limiters become obstacles when the threat model changes.** The rate limiter was designed to prevent LLM tool-call loops from creating a runaway car. Once we added a proper safety daemon (Hailo), the rate limiter was killing our own navigation loop at cycle 10. Threat model evolution requires rate limit evolution.

6. **Power supply is the silent killer.** Pi 5 throttle warnings (0x50000) under heavy load (motors + camera + NPU) can slow the safety daemon's inference, widening the reaction-time window. Monitor and abort navigation if throttled.
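Lesson 6 can be acted on with a small check. `vcgencmd get_throttled` is the standard Pi firmware query; the abort policy below (abort on any nonzero mask) is our assumption, erring conservative.

```python
import subprocess

# Bit positions in the get_throttled bitmask (Raspberry Pi firmware docs):
UNDER_VOLTAGE_NOW  = 1 << 0
THROTTLED_NOW      = 1 << 2
UNDER_VOLTAGE_SEEN = 1 << 16  # occurred since boot
THROTTLED_SEEN     = 1 << 18  # occurred since boot; both seen = 0x50000

def parse_throttled(output: str) -> int:
    # vcgencmd prints e.g. "throttled=0x50000"
    return int(output.strip().split("=")[1], 16)

def should_abort(flags: int) -> bool:
    # Conservative policy (our assumption): any throttle history means
    # the safety daemon's inference may already have slowed, so abort.
    return flags != 0

def check_pi_throttle() -> bool:
    """Runs on the Pi; True means navigation should abort."""
    out = subprocess.run(["vcgencmd", "get_throttled"],
                         capture_output=True, text=True).stdout
    return should_abort(parse_throttled(out))
```

The navigation loop would call `check_pi_throttle()` once per cycle, a negligible cost next to the 2-second GPU call.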
