# Research: Mindscape Dashboard — Purpose, Architecture, and Value Proposition

**Date:** 2026-03-23
**Status:** Active document (living reference, updated as dashboard evolves)
**Supersedes:** Portions of `RESEARCH-OBSERVABILITY.md` (which covers the broader observability-first architecture philosophy)
**Context:** The Mindscape/Aquarium dashboard is the visual observability layer for her-os. Annie runs 40+ concurrent background processes 24/7 with no user trigger. This document explains why the dashboard exists, what it does, how it works, and what makes it fundamentally different from any existing AI observability tool.

---

## Table of Contents

1. [Why This Dashboard Exists](#1-why-this-dashboard-exists)
2. [The Three Core Functions](#2-the-three-core-functions)
3. [What Makes This Unique](#3-what-makes-this-unique-vs-other-agentic-frameworks)
4. [Architecture Overview](#4-architecture-overview)
5. [Current Gaps and Enhancement Priorities](#5-current-gaps-and-enhancement-priorities)
6. [The Fundamental Question](#6-the-fundamental-question-this-dashboard-answers)
7. [References](#7-references)

---

## 1. Why This Dashboard Exists

### The Invisible Machine Problem

Annie is not a chatbot. She is a continuously running ambient intelligence system. At any given moment, she is simultaneously:

- Listening to a transcript stream from the Omi wearable
- Running speech emotion recognition on audio segments
- Extracting entities from conversations (people, places, promises, decisions)
- Gating extracted entities by confidence before committing them to the knowledge graph
- Consolidating memories across time (recent observations promote to long-term facts)
- Generating embeddings for vector search
- Indexing into Neo4j for graph traversal
- Monitoring for proactive nudge opportunities
- Serving voice conversations through a real-time Pipecat pipeline
- Handling text chat sessions through a separate LLM path
- Compacting context windows when conversations get long
- Syncing with external services (SearXNG, PostgreSQL, Qdrant)

There are 40+ such processes. None of them require a user trigger. They run around the clock.

### The Silent Failure Mode

Without the dashboard, you **assume** the pipeline is working. But things silently break.

**Real example (March 2026):** An `.env` drift caused the entity extraction service to route requests to a dead model endpoint. Simultaneously, embedding generation was failing with 503 errors, and Graphiti had silently queued 315 items that would never be processed. From the outside, the system looked "online" — health checks passed, the voice agent responded to questions, the web UI loaded. But half the pipeline was broken. Conversations were being heard but not understood. No entities were being extracted. No memories were forming. Annie was listening but not learning.

This went undetected for days because there was no way to verify that the full chain executed correctly without SSH-ing into the server and tailing logs across multiple services.

### The Dashboard as the Single Pane of Truth

No SSH. No logs. No kubectl. The Mindscape dashboard is the **only** interface that answers whether the full processing chain — from raw audio to stored memory — executed correctly for every conversation.

It exists because:

1. **40+ processes** cannot be monitored by tailing log files
2. **Silent failures** are the norm, not the exception, in distributed systems
3. **Health checks lie** — a service can be "healthy" while producing garbage
4. **Temporal context matters** — you need to know not just "is it working now" but "was it working yesterday when that important conversation happened"
5. **Trust requires transparency** — a personal AI that listens 24/7 must be auditable by the person it serves

---

## 2. The Three Core Functions

### 2A. Pipeline Integrity Verification (Primary Function)

> "Did the thing that was supposed to happen actually happen, in the order it was supposed to happen?"

This is the dashboard's primary purpose. Everything else is secondary.

#### The Pipeline

A single conversation flows through a multi-stage processing chain:

```
Omi wearable stream
    -> VAD (voice activity detection)
    -> Whisper STT (speech-to-text)
    -> Speaker diarization
    -> Speech emotion recognition
    -> Entity extraction (Claude/Nemotron)
    -> Confidence gating (threshold filtering)
    -> Knowledge graph sync (Neo4j)
    -> Memory consolidation (tier promotion)
    -> Embedding generation (vector indexing)
    -> Qdrant upsert
```

Each step depends on the previous one completing correctly. If STT fails, no entities are extracted. If extraction routes to a dead model, no facts enter the knowledge graph. If embeddings fail, retrieval returns stale results.
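This fail-fast dependency structure can be sketched as a short Python chain. This is an illustration only: the real stages are separate services, not in-process functions, and the stage names here simply mirror the diagram above.

```python
from typing import Callable

def run_pipeline(audio: bytes, stages: list[tuple[str, Callable]]) -> dict:
    """Run stages in order; a failure short-circuits everything downstream."""
    payload, completed = audio, []
    for name, stage in stages:
        try:
            payload = stage(payload)
            completed.append(name)
        except Exception as exc:
            # Downstream stages never run: no entities, no graph sync, no embeddings.
            return {"ok": False, "failed_at": name,
                    "completed": completed, "error": str(exc)}
    return {"ok": True, "completed": completed, "result": payload}
```

A stage that routes to a dead model endpoint surfaces here as `failed_at: "extract"` with everything after it absent from `completed` — exactly the gap the aquarium makes visible.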

#### How Verification Works

Each backend process maps 1:1 to a creature in the aquarium. When a creature lights up (glow, tendril extension, breathing radius), its corresponding process is active. When creatures light up **in sequence**, the pipeline is working. A gap — a creature that should have fired but didn't — means something broke.

Connection types between creatures reveal causality:

- **Data connections:** raw data flowing between processes (audio bytes, extracted entities, embedding vectors)
- **Control connections:** orchestration signals (start processing, gate passed, compaction triggered)
- **LLM connections:** inference calls to language models (extraction prompts, summarization, tool use)

Seven portals mark system boundaries where data crosses into external services:

- Claude API (cloud LLM for extraction)
- Ollama (local LLM for lightweight tasks)
- PostgreSQL (relational storage)
- SearXNG (web search)
- Qdrant (vector database)
- Neo4j (knowledge graph)
- vLLM (model serving)

Portal color-coding: green = local/self-hosted, amber = external cloud dependency.

### 2B. Temporal Auditability (Memory Formation Over Time)

Every reasoning decision, entity extraction, and memory consolidation is timestamped and replayable. This is not just logging — it is a temporal audit trail of how Annie forms beliefs about the person she serves.

#### Time Machine

The dashboard operates in two modes:

- **Live mode:** SSE (Server-Sent Events) streams events in real-time. Creatures animate as processes fire. You watch the pipeline work.
- **Review mode:** Date picker and timeline navigator with progressive drill-down: home view (all days) -> day view -> hour view -> minute view. Playback controls include play/pause and speed adjustment (0.5x through 4x).

#### Memory Zone (press M)

The memory zone visualizes entity lifecycle across time:

- **8 type swim-lanes:** Person, Place, Topic, Promise, Event, Emotion, Decision, Habit
- **3 tier bands:** L0 (recent observations), L1 (consolidated facts), L2 (deep/permanent memory)
- **Entity dots** encode information visually: radial glow (salience score), color (type + emotional valence), promotion trails (showing tier transitions), evergreen halos (for permanent facts that never decay)

When a new fact arrives from extraction, its corresponding dot animates — you can see learning happen.
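The tier transitions behind those animations can be sketched as a promotion rule. The field names and thresholds below are illustrative assumptions, not the real consolidation logic:

```python
# Hypothetical promotion rule: an entity climbs a tier once it crosses
# an (assumed) mention-count or salience threshold.

def promote_tier(tier: str, mention_count: int, salience: float) -> str:
    """Return the tier this entity should occupy after consolidation."""
    if tier == "L0" and (mention_count >= 3 or salience >= 0.8):
        return "L1"   # recent observation -> consolidated fact
    if tier == "L1" and mention_count >= 10 and salience >= 0.9:
        return "L2"   # consolidated fact -> deep/permanent memory
    return tier       # otherwise the entity stays where it is
```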

#### Why Temporal Audit Matters

You can trace exactly how Annie came to believe something about you. Every conversation where a fact was mentioned, every extraction that captured it, every consolidation step that promoted it from L0 to L1 to L2 — all visible, all replayable.

This is **explainable AI at the personal level, across time**.

It is essential for trust. A personal AI that listens to your conversations 24/7 **must** be auditable. Not "auditable in theory if you dig through database tables" — auditable visually, intuitively, by the person whose life it is learning about.

### 2C. Behavioral Regression Detection

Git tracks code changes. The dashboard tracks **behavioral** changes.

#### The Behavioral Flight Recorder

Going back in time through dashboard events reveals **when** the system's behavior changed. This captures things that git cannot:

- **Model swaps:** Same code, different model weights. The extraction quality shifts but no diff appears in version control.
- **Config drift:** An environment variable changes. Health checks still pass. But extraction now takes 8 minutes instead of 2 seconds because the LLM fell back to CPU.
- **Pipeline bugs:** A race condition causes every 10th embedding to fail. The system works 90% of the time. Without temporal event data, this is invisible.
- **Gradual degradation:** Memory consolidation quality slowly declines as the knowledge graph grows. No single event marks the change.

#### Comparison Across Time

You can compare Annie's behavior on different dates: same code, same configuration, but different performance characteristics. The temporal event trail makes this possible because every event carries its timing, its reasoning, and its outcome.

**Real example:** If dashboard events had been flowing on March 20, the Ollama CPU fallback would have been immediately visible — extraction events taking 3-8 minutes instead of the expected 2 seconds. Instead, it took manual investigation to discover the regression.

---

## 3. What Makes This Unique (vs Other Agentic Frameworks)

### Comparison Matrix

| Feature | LangSmith / LangFuse | Mindscape Dashboard |
|---------|---------------------|---------------------|
| **Scope** | Per-request traces | Continuous 24/7 system monitoring |
| **Memory tracking** | Not tracked | Entity salience, tier promotion, consolidation over time |
| **Reasoning visibility** | Token counts, latency metrics | Plain-English orchestrator reasoning per event |
| **Temporal depth** | Traces expire (retention limits) | Full temporal replay across system lifetime |
| **Topology model** | Linear DAG (request -> steps -> response) | 3-zone ecosystem (Listening / Thinking / Acting) |
| **Identity** | Generic nodes ("llm", "retriever", "tool") | 40+ named creatures with visual morphology |
| **Connection semantics** | Call graph (A called B) | Typed flows (data / control / llm) with semantic labels |
| **Primary purpose** | Debug and optimize LLM applications | Trust, transparency, pipeline integrity verification |
| **User model** | Developer debugging a product | Person auditing the AI that learns about their life |

### Why Existing Tools Don't Fit

**LangSmith/LangFuse** are request-scoped tracers. They answer: "What happened during this one API call?" They do not answer: "What happened across all of Annie's background processes between 2pm and 6pm yesterday?" They have no concept of memory formation, entity salience, or tier-based consolidation over weeks and months.

**Prometheus/Grafana** track metrics (latency, error rates, throughput). They answer: "Is the system healthy right now?" They do not answer: "Did extraction actually run after that conversation, and did the extracted entities make it into the knowledge graph?" Metrics without semantic meaning cannot verify pipeline correctness.

**Jaeger/Zipkin** trace distributed requests. They answer: "How did this request flow through microservices?" They do not model always-on background processes that fire without any user request. Annie's processing is not request-driven — it is event-driven and time-driven.

**The gap:** No existing tool combines continuous process monitoring, semantic pipeline verification, temporal replay, and memory formation visualization into a single interface. The Mindscape dashboard fills this gap because it was designed for a use case that did not previously exist — a personal AI system that runs 24/7 and forms long-term memories about a specific person.

---

## 4. Architecture Overview

### 4.1 Creatures (40+)

Each creature maps 1:1 to a backend process. This is not a metaphor — it is a direct correspondence. If a creature exists in the aquarium, there is a running process it represents. If a process runs without a creature, it is invisible and therefore unaccountable.

**Properties:**

- **Zone assignment:** Fixed to one of three zones — Listening, Thinking, or Acting. Zone encodes the creature's role in the pipeline.
- **Position:** Fixed coordinates within its zone, encoding pipeline order and relationships.
- **Visual morphology:** Changes with activity state:
  - Radius breathing (idle pulsing vs active expansion)
  - Glow intensity (proportional to event throughput)
  - Tendril extension (reaches toward connected creatures during data flow)
- **Per-creature tendril configuration:** Different creatures have different numbers of tendrils, reflecting their connectivity. The Phoenix has 5, the Jellyfish 6, the Spider 8, and the Kraken 8.
- **Badge counters:** Show queued/pending events per creature.

### 4.2 Connections (22 Instrumented)

Connections are the edges between creatures. They are not just visual lines — they carry semantic meaning.

**Three types:**

| Type | Meaning | Visual |
|------|---------|--------|
| `data` | Raw data flowing between processes (audio, text, entities, embeddings) | Solid line with directional particles |
| `control` | Orchestration signals (start, stop, gate, escalate) | Dashed line with slower particles |
| `llm` | Inference calls to language models | Glowing line with fast particles |

**Behavior:**

- Connections animate with sway and particles when the corresponding data flow is active
- Show which edges are hot in real-time
- Reveal data flow causality as execution progresses through the pipeline

### 4.3 Portals (7)

Portals represent external service boundaries — the points where Annie's processing crosses into services she does not own.

| Portal | Service | Color | Locality |
|--------|---------|-------|----------|
| Claude | Claude API (Anthropic) | Amber | External cloud |
| Ollama | Ollama local inference | Green | Local |
| PostgreSQL | Relational database | Green | Local |
| SearXNG | Web search proxy | Green | Local |
| Qdrant | Vector database | Green | Local |
| Neo4j | Knowledge graph | Green | Local |
| vLLM | Model serving engine | Green | Local |

**Visual behavior:** Tentacles animate from creature to portal with phase-shifted particles, showing the direction and volume of cross-boundary data flow.

### 4.4 Event System

The event system is the data backbone that powers everything — live animation, temporal replay, and audit trails.

#### ObservabilityEvent Schema

```
{
  event_id:    string    // Unique identifier
  timestamp:   datetime  // When the event occurred
  service:     string    // Backend service name
  process:     string    // Specific process within the service
  creature:    string    // Mapped creature identifier
  zone:        string    // Listening | Thinking | Acting
  event_type:  string    // See event types below
  session_id:  string    // Conversation/session correlation
  data:        object    // Event-specific payload
  reasoning:   string    // Plain-English orchestrator decision rationale
}
```
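Rendered as a Python dataclass, the schema looks roughly like this. The real services may use a different class (e.g. a Pydantic model); this is a sketch that mirrors the fields above:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass
class ObservabilityEvent:
    event_id: str            # unique identifier
    timestamp: datetime      # when the event occurred
    service: str             # backend service name
    process: str             # specific process within the service
    creature: str            # mapped creature identifier
    zone: str                # "Listening" | "Thinking" | "Acting"
    event_type: str          # "start", "complete", "error", ...
    session_id: str          # conversation/session correlation
    data: dict[str, Any] = field(default_factory=dict)  # event-specific payload
    reasoning: str = ""      # plain-English decision rationale
```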

#### Event Types

| Type | Meaning |
|------|---------|
| `start` | Process began execution |
| `complete` | Process finished successfully |
| `error` | Process failed |
| `metric` | Performance measurement (latency, throughput) |
| `skip` | Process decided not to act (with reasoning) |
| `candidates` | Entity extraction produced candidate list |
| `fulfilled` | A pending action was completed |
| `nudge` | Proactive suggestion generated |
| `wonder` | Daily Wonder content generated |
| `comic` | Daily Comic content generated |
| `validate` | Validation step executed |
| `asr_correct` | ASR transcript correction applied |

#### Delivery Modes

- **Live mode:** SSE streaming. Events arrive in real-time and drive creature animations immediately.
- **Review mode:** Database replay. Events are loaded from persistent storage for a selected time range and replayed at configurable speed.
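On the wire, live mode is standard `text/event-stream` framing: each event arrives as one or more `data:` lines terminated by a blank line. A minimal parser for the JSON payloads (endpoint and auth details are not shown here and are deployment-specific) could look like:

```python
import json

def parse_sse_chunk(chunk: str) -> list[dict]:
    """Extract JSON payloads from the `data:` lines of an SSE stream chunk."""
    events = []
    for line in chunk.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                events.append(json.loads(payload))
    return events
```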

#### The `reasoning` Field

The `reasoning` field is what separates this from generic structured logging. It captures plain-English orchestrator decision rationale: not "extraction completed in 2.3s" but "Extracted 3 entities from segment. Dropped 'the restaurant' (confidence 0.31, below threshold 0.40). Promoted 'Priya' (confidence 0.92, existing entity, updated last-seen timestamp). Created new entity 'dal tadka recipe' (confidence 0.67, novel topic, no prior references)."

This reasoning is visible on hover in the waterfall view, making every decision auditable without touching a database.

### 4.5 Memory Zone (Press M)

The memory zone is a dedicated visualization layer for entity lifecycle.

**Layout:**

- **8 type swim-lanes** arranged vertically: Person, Place, Topic, Promise, Event, Emotion, Decision, Habit
- **3 tier bands** arranged horizontally: L0 (recent), L1 (consolidated), L2 (deep/permanent)

**Entity dot encoding:**

| Visual Property | Maps To |
|----------------|---------|
| Radial glow size | Salience score (how important/relevant) |
| Color hue | Entity type (Person = blue, Place = green, etc.) |
| Color saturation | Emotional valence (stronger emotion = more saturated) |
| Promotion trail | Animated line showing tier transition (L0 -> L1 -> L2) |
| Evergreen halo | Permanent fact (never decays, always retrievable) |

When new facts arrive from extraction, their corresponding dots animate — a visible representation of learning happening in real-time.
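The encoding table can be read as a pure mapping from entity attributes to visual channels. The scale factors and hue values below are assumptions for illustration, not the dashboard's real constants:

```python
# Illustrative hue table (partial) and scaling; not the real palette.
TYPE_HUE = {"Person": 210, "Place": 120, "Topic": 45}

def encode_dot(salience: float, valence: float, entity_type: str) -> dict:
    """Map an entity's attributes to the visual channels in the table above."""
    return {
        "glow_radius": 4 + 12 * max(0.0, min(1.0, salience)),  # bigger = more salient
        "hue": TYPE_HUE.get(entity_type, 0),                   # color hue = type
        "saturation": round(abs(valence), 2),                  # stronger emotion = more saturated
    }
```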

### 4.6 Time Machine

The Time Machine is the interface for temporal navigation.

**Controls:**

- **Mode toggle:** Live (real-time SSE) vs Review (historical replay)
- **Date picker:** Select any day in the system's history
- **Timeline navigator:** Progressive drill-down hierarchy: Home (all days) -> Day (24 hours) -> Hour (60 minutes) -> Minute (60 seconds)
- **Playback:** Play/pause button, speed control (0.5x, 1x, 2x, 4x)

**Waterfall view:** When you click on a creature's activity in Review mode, the waterfall shows all events within +/- 10 seconds of that activity. Each event displays:

- Timing bar (duration and position relative to neighbors)
- Entity tags (which entities were involved)
- Reasoning on hover (the plain-English decision rationale)

This is how you answer: "Why did Annie think I was talking about Priya on Tuesday at 3pm?" — scrub to Tuesday 3pm, find the extraction creature's activity, read the reasoning.
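The +/- 10 second window is a simple time filter over the stored events. A sketch, assuming events carry a `timestamp` field as in the schema:

```python
from datetime import datetime, timedelta

def waterfall_window(events: list[dict], anchor: datetime,
                     seconds: float = 10.0) -> list[dict]:
    """Return events within +/- `seconds` of the clicked activity, time-ordered."""
    lo = anchor - timedelta(seconds=seconds)
    hi = anchor + timedelta(seconds=seconds)
    return sorted((e for e in events if lo <= e["timestamp"] <= hi),
                  key=lambda e: e["timestamp"])
```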

---

## 5. Current Gaps and Enhancement Priorities

### 5.1 ~~Critical: Events Not Flowing ("0 Events")~~ — RESOLVED (Session 358)

**Status: FIXED. Dashboard shows 2,700+ events. 42/42 creatures instrumented.**

The "0 events" had three independent root causes, none of which were in the event pipeline itself:

1. **No auth token in browser localStorage** — SSE was never attempted, so the dashboard fell back to synthetic demo mode. Fix: set token via `?token=` URL param.
2. **`crypto.randomUUID()` crashes on HTTP** — synthetic demo mode used a secure-context-only API. Fix: polyfill with fallback.
3. **Old code deployed on Titan** — Annie Voice was running pre-instrumentation code. Fix: git pull + restart.

The backend pipeline was working the entire time — 15,047 events on March 22, 107,616 for the week. The event pipeline architecture (ring buffer + SSE + PostgreSQL) is sound.

**Additional reliability fixes applied** (from adversarial review):
- Events no longer permanently lost when token is missing (re-queued instead of drained-and-dropped)
- HTTP 200 with rejected events now logged as warnings
- SSE subscribers survive backgrounded browser tabs (drop oldest, don't evict)
- Persist tasks capped at 20 concurrent (prevents memory growth during DB slowdowns)
- `browser_agent_tools.py` instrumented with 14 `emit_event` calls (was 0)
- Fairy creature (contradiction detection) now emits events

### 5.2 Operational Health (Enhance, Don't Replace)

Health information should be surfaced **through** the creature model, not alongside it in a separate panel. The creatures ARE the processes — their visual state should encode process health.

**Planned enhancements:**

- **Creature badges/color for degraded states:** GPU vs CPU fallback (amber outline), queue depth exceeding threshold (badge count turns red), error rate above threshold (creature dims), latency exceeding SLA (creature pulses orange)
- **Portal status:** Is Ollama running on GPU? Is vLLM responding within latency budget? Is PostgreSQL accepting connections? Portal glow/dim reflects health.
- **Connection health:** Are embeddings succeeding? Are extractions timing out? Connection color shifts from healthy (default) to degraded (amber) to broken (red) based on success rate.
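The creature-state mapping described above amounts to a thresholded function from process metrics to visual cues. The thresholds here are illustrative assumptions, not decided values:

```python
def creature_health(queue_depth: int, error_rate: float,
                    p95_latency_s: float, on_gpu: bool = True) -> list[str]:
    """Return the visual cues a creature should display for degraded states."""
    cues = []
    if error_rate > 0.05:
        cues.append("dim")              # error rate above threshold
    if p95_latency_s > 5.0:
        cues.append("pulse_orange")     # latency exceeding SLA
    if queue_depth > 100:
        cues.append("red_badge")        # queue depth exceeding threshold
    if not on_gpu:
        cues.append("amber_outline")    # GPU -> CPU fallback
    return cues or ["healthy"]
```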

### 5.3 Workspace Files

Workspace files (SOUL.md, RULES.md, USER.md, etc.) are now accessible via Settings -> Workspace button, in addition to the Librarian creature. This needs:

- Auth token configuration to access protected files
- Display of file contents within the dashboard UI
- Edit capability for files that the user should be able to modify (USER.md, preferences)

### 5.4 Pipeline Sequence Validation

Automated detection of pipeline breaks without requiring a human to watch the aquarium.

**Planned capability:**

- **Expected sequence templates:** Define the expected creature activation order per event type (e.g., "after Kraken fires, Oracle must follow within 30 seconds")
- **Missing-step detection:** "Kraken fired but Oracle never followed" = extraction completed but validation didn't run
- **Visual alerts:** Missing creatures are surfaced as dimmed outlines with red borders
- **Notification:** Optional alerts when pipeline integrity checks fail
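Missing-step detection against an expected-sequence template could be sketched as follows. The Kraken-then-Oracle pairing and 30-second window come from the example above; the event shape (a `creature` name plus a `timestamp`) is an assumed subset of the full schema:

```python
from datetime import datetime, timedelta

# Template: after the key creature fires, its follower must fire within the window.
EXPECTED_FOLLOW = {"kraken": ("oracle", timedelta(seconds=30))}

def find_missing_steps(events: list[dict]) -> list[str]:
    """Flag creatures that fired without their required follower firing in time."""
    alerts = []
    for ev in events:
        rule = EXPECTED_FOLLOW.get(ev["creature"])
        if rule is None:
            continue
        follower, window = rule
        deadline = ev["timestamp"] + window
        followed = any(e["creature"] == follower
                       and ev["timestamp"] <= e["timestamp"] <= deadline
                       for e in events)
        if not followed:
            alerts.append(f"{ev['creature']} fired at {ev['timestamp'].isoformat()} "
                          f"but {follower} never followed within {window.seconds}s")
    return alerts
```

Each alert maps directly to the planned visual treatment: the missing follower is rendered as a dimmed outline with a red border.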

---

## 6. The Fundamental Question This Dashboard Answers

For a personal AI system with 40+ concurrent processes running 24/7:

> **"Did the things that were supposed to happen actually happen, in the sequence they were supposed to happen — and can I verify this at any point in time, past or present?"**

No other tool answers this question.

- **SSH and logs** are reactive and ephemeral. You must know what to look for, where to look, and when to look. By the time you check, the relevant logs may have rotated.
- **Monitoring tools** (Prometheus, Datadog) show metrics without semantic meaning. "P99 latency is 200ms" tells you nothing about whether extraction produced correct entities.
- **Trace viewers** (Jaeger, LangSmith) show single requests without temporal continuity. They cannot show that Tuesday's extraction quality was worse than Monday's across all conversations.
- **Health checks** report binary status. A service can be "healthy" while routing to a dead model, producing empty extractions, or falling back to CPU inference.

The Mindscape dashboard is the only interface that makes the invisible visible — turning Annie's silent, continuous processing into a watchable, replayable, auditable story of an AI learning about a person's life.

---

## 7. References

- `docs/RESEARCH-OBSERVABILITY.md` — The broader observability-first architecture philosophy and original design exploration (metaphors, API design, event taxonomy)
- `docs/RESEARCH-AGENT-ORCHESTRATION.md` — Agent orchestration patterns that generate the events the dashboard displays
- `docs/RESEARCH-CONTEXT-ENGINEERING.md` — Context engineering patterns visible through the Memory Zone
- `docs/RESOURCE-REGISTRY.md` — GPU VRAM budget for the models powering the creatures' backend processes
- `services/context-engine/` — Primary event source for extraction, gating, and knowledge graph sync
- `services/annie-voice/` — Voice pipeline events (STT, LLM, TTS, tool use)
- `services/audio-pipeline/` — Audio processing events (VAD, diarization, emotion recognition)
