# Dashboard Event Pipeline Audit — COMPLETED

**Session 358 (2026-03-23). Status: ALL PHASES COMPLETE.**

## Results Summary

The dashboard went from "0 events" to **2,700+ events visible** with live creature activation. **42/42 creatures instrumented** (was 41/42 before this session). 6 commits, all tests pass.

### Root Causes of "0 Events" (3 independent issues found)

| # | Root Cause | Layer | Fix |
|---|-----------|-------|-----|
| 1 | No auth token in browser localStorage | Dashboard frontend | Set via `?token=` URL param (persists in localStorage) |
| 2 | `crypto.randomUUID()` unavailable on HTTP | Dashboard frontend | Polyfill in `synthetic.ts` |
| 3 | Annie Voice running old code on Titan | Backend deployment | Git pull + restart |

**Plot twist:** The backend pipeline was NEVER broken. PostgreSQL had 15,047 events on March 22 (107,616 for the week). The "0 events" was entirely a frontend configuration issue.

### Audit Checklist — ALL COMPLETE

#### Phase 1: Event Pipeline Health ✅
- [x] SSE event source connected — works when localStorage token is set
- [x] Events recorded in database — 15,047 events on March 22 alone
- [x] `record_event()` calls mapped — 211+ calls across 4 services
- [x] All 42 creatures mapped to backend calls — 42/42 coverage

#### Phase 2: Trace the Coffee Order ✅
- [x] March 22 events queried — 15,047 events, 18 creature types active
- [x] Context Engine pipeline fully visible (kraken, unicorn, dragon, oracle, etc.)
- [x] Annie Voice only emitted sphinx events (old code) — griffin/selkie/gargoyle dark
- [x] Entity extraction captured: `coffee_delivery` (promise, 0.95), `coffee delivery` (event, 0.9)

#### Phase 3: Gap Analysis ✅
- [x] `browser_agent_tools.py` had ZERO `emit_event` calls — 14 added
- [x] `fairy` (contradiction detection) had 0 events — instrumented in `main.py`
- [x] Gap was missing instrumentation, NOT broken pipeline

#### Phase 4: Fix the Gaps ✅
- [x] 14 `emit_event()` calls added to `schedule_coffee_delivery`
- [x] SSE streaming verified end-to-end (curl → emit → ring buffer → SSE → creature glow)
- [x] Selkie creature lit up in real-time on dashboard
- [x] Time Machine has historical events (138K+ events, Mar 9-22)

## Fixes Applied (26 issues from adversarial review + 4 bonus discoveries)

### Backend Pipeline Reliability
| File | Fixes | Key Changes |
|------|-------|------------|
| `observability.py` | 5 | Token check before drain, rejection logging, queue in correct event loop, drain-before-sleep, docstring |
| `chronicler.py` | 3 | Persist semaphore (cap 20), SSE eviction tolerance (drop oldest), cross-registry validation |
| `browser_agent_tools.py` | 7 | 14 emit_events, JS injection validation, timezone-aware dates, stale page ref, dead code removal, smart waits, credential guard |
| `main.py` (context-engine) | 1 | fairy creature events on contradiction detection |

### Frontend + Security
| File | Fixes | Key Changes |
|------|-------|------------|
| `synthetic.ts` | 1 | crypto.randomUUID polyfill for HTTP contexts |
| `start.sh` | 1 | Token moved to `~/.her-os-token` (out of git) |
| `.gitignore` | 1 | Token file excluded |
| `docker-compose.yml` | 1 | HF_HUB_OFFLINE=1 (skip model validation on restart) |

### Tests Updated
| File | Changes |
|------|---------|
| `test_observability.py` | Queue lifecycle tests adapted to new start_flush_loop pattern |
| `test_chronicler.py` | 42 creatures, SSE tolerance behavior, FUTURE_CREATURES cleared |
| `test_chronicler_llm.py` | Creature count + unicorn process name updated |

## Commits

| Hash | Description |
|------|-------------|
| `269f781` | Event pipeline reliability + browser agent observability (24 issues) |
| `e75cc63` | crypto.randomUUID polyfill for HTTP dashboard |
| `dc3b9fc` | Chronicler tests: 42 creatures, SSE tolerance |
| `05265af` | fairy creature lights up on contradiction detection |
| `c9de934` | HF_HUB_OFFLINE to skip model validation on restart |

## Key Insights & Anti-Patterns

### Anti-Patterns Found
1. **Silent failure cascade**: Queue drains events, THEN checks token → permanent data loss. Fix: check preconditions before destructive operations.
2. **HTTP 200 hiding rejections**: Emit endpoint returns 200 even when all events rejected. Fix: log response body, not just status code.
3. **Module-level lazy initialization**: `_get_queue()` created asyncio.Queue in wrong event loop on reconnect. Fix: create in `start_flush_loop()` where event loop is guaranteed.
4. **Aggressive SSE eviction**: Full subscriber queue → permanent eviction. Backgrounded browser tabs lose connection forever. Fix: drop oldest event, keep subscriber.
5. **Token in git**: Auth token hardcoded in `start.sh`. Fix: `~/.her-os-token` file with mode 600.
6. **DOM values in JS eval f-strings**: `installment_no` from DOM interpolated into `page.evaluate()` without validation. Fix: `.isdigit()` check.
7. **`crypto.randomUUID` on HTTP**: Only works in secure contexts. Dashboard served over plain HTTP. Fix: polyfill.

### Patterns That Worked
1. **Creature registry as single source of truth**: 42 creatures in `chronicler.py`, validated by cross-service tests. Prevents drift.
2. **Fire-and-forget observability**: `emit_event()` never blocks the voice pipeline. Queue + batch POST every 2s.
3. **Ring buffer + PostgreSQL**: Real-time from memory, historical from BTREE index. No contention.
4. **SSE with auto-reconnect**: Browser EventSource handles reconnection automatically. Catch-up from ring buffer on connect.
5. **`data` field for step differentiation**: One creature (selkie) handles all browser steps via `data={"step": "login"}` etc. Keeps registry clean.

### Coverage Model
- **42/42 creatures instrumented** (100%)
- **211+ emit_event/record_event calls** across 4 services
- **3 event ingestion paths**: filesystem JSONL (audio), HTTP POST (annie-voice), in-process (context-engine)
- **2 consumption paths**: SSE real-time, REST historical query
- **Dashboard token**: Must be set in browser localStorage via `http://titan:5174/?token=YOUR_TOKEN`
