# Implementing OpenClaw-Like Agentic Features for Annie

## Context

Video `tFCgmeOWlA8` ("Every OpenClaw Concept Explained") describes 17 concepts. Annie is missing 6 and partially has 8. The user wants to implement the agentic features using the OpenClaw source in `vendor/openclaw/` as reference, with Annie as the persona and voice support.

**Prior work:** `docs/RESEARCH-OPENCLAW.md` (890 lines of deep architecture analysis) and `docs/TODO-OPENCLAW-ADOPTION.md` (70 items across 6 sprints) already map every OpenClaw pattern to her-os. Five ADRs are already accepted (ADR-003, 005, 006, 007, 008).

---

## Strategic Decision: Run OpenClaw vs. Adopt Patterns

### Option A: Run OpenClaw directly + Annie as channel plugin
- OpenClaw is TypeScript/Node.js — Annie is Python
- OpenClaw voice = telephony (Twilio/Telnyx) — Annie = WebRTC + local GPU (Whisper/Kokoro)
- Would need to bridge two runtimes, duplicate model loading, fight framework assumptions
- Research docs explicitly say: "don't adopt Node.js runtime" (`RESEARCH-OPENCLAW.md` §9)

### Option B: Adopt OpenClaw patterns into Annie's Python codebase ✅
- Keep Annie's voice pipeline (Pipecat + Whisper + Kokoro) — it works, it's fast
- Port the *architectural patterns* that matter: workspace files, self-improvement, gateway routing
- Use `vendor/openclaw/` as reference implementation (read source, don't run it)
- This is what the 70-item TODO already plans

**Recommendation: Option B.** OpenClaw is the textbook; Annie is the student.

---

## Gap Summary (from video analysis)

| # | Feature | Status | Priority |
|---|---------|--------|----------|
| 1 | Externalized workspace (5 markdown files) | MISSING | **P0** |
| 2 | Dynamic system prompt assembly | MISSING | **P0** |
| 3 | Self-improvement loop | MISSING | **P1** |
| 4 | Unified session state across channels | PARTIAL | **P1** |
| 5 | Background task execution | MISSING | **P2** |
| 6 | Process supervision (self-healing) | MISSING | **P2** |
| 7 | Calendar integration | MISSING | **P3** |
| 8 | Per-turn model routing | PARTIAL | **P3** |

---

## LLM Resource Architecture: Single Instance + Voice Priority Gate

### Decision: One Nemotron 3 Nano instance serves everything

**Why one instance:**
- vLLM uses continuous batching — can serve multiple concurrent requests
- Two instances would split GPU VRAM (18 GB each, 36 GB total, vs 18 GB for one shared instance)
- DGX Spark has one GPU — no model parallelism benefit from duplication

**The latency problem:**
- vLLM with 1 concurrent request: ~90ms TTFT, 33 tok/s (excellent for voice)
- vLLM with 2 concurrent requests: ~180ms TTFT, ~20 tok/s (voice feels sluggish)
- vLLM with 3+ concurrent: >300ms TTFT (unacceptable for real-time voice)

**Solution: Voice Priority Semaphore (simplified after adversarial review)**

The adversarial review found that the original LLM Priority Gate duplicated existing state (GPU heartbeat + `_sessions` dict + new flag = 3 sources of truth). Simplified to ~15 lines:

```python
# In server.py — no new file needed
import asyncio

_llm_semaphore = asyncio.Semaphore(1)

async def background_llm_call(request_fn, timeout=1800):
    """Background tasks acquire the semaphore. Voice never does (always has priority)."""
    deadline = asyncio.get_running_loop().time() + timeout
    # _sessions is server.py's existing dict of live voice sessions.
    while any(s.get("connection") for s in _sessions.values()):
        if asyncio.get_running_loop().time() >= deadline:
            raise TimeoutError("voice stayed active past the max wait; skipping")
        await asyncio.sleep(1)  # wait for voice to finish
    async with _llm_semaphore:  # one background task at a time
        return await asyncio.wait_for(request_fn(), timeout=timeout)
```

**How it works:**
1. Voice sessions use vLLM directly (never acquire semaphore — P0, always immediate)
2. Background tasks (Omi watcher, self-improvement) call `background_llm_call()`
3. If voice session active → busy-wait with 1s sleep (checks `_sessions` dict — single source of truth)
4. If no voice → acquire semaphore → one background task at a time
5. No Redis, no separate flag, no cloud fallback (privacy — ADR-004)
6. 30-minute max wait, then skip (not fall back to Haiku)

**Why this is better:**
- Uses EXISTING `_sessions` dict (same source of truth as `server.py`)
- No new state machine, no new file, no desync risk
- Testable: mock `_sessions` dict, assert semaphore behavior
- The existing GPU heartbeat in `bot.py` continues to signal Context Engine separately (its original purpose)
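The "testable" claim above can be sketched as a self-contained unit test. This inlines a copy of the `background_llm_call` helper and `_sessions` dict from the plan (with the poll interval shortened from 1 s to 10 ms so the test runs fast); names mirror the sketch above, not verified `server.py` code.

```python
import asyncio

# Inline copy of the planned server.py helper, shortened poll for testing.
_sessions = {}
_llm_semaphore = asyncio.Semaphore(1)
order = []

async def background_llm_call(request_fn, timeout=1800):
    while any(s.get("connection") for s in _sessions.values()):
        await asyncio.sleep(0.01)  # wait for voice to finish
    async with _llm_semaphore:
        return await asyncio.wait_for(request_fn(), timeout=timeout)

async def _test():
    async def fake_llm():
        order.append("llm")
        return "ok"

    # No voice session: the background call proceeds immediately.
    assert await background_llm_call(fake_llm) == "ok"

    # Voice session active: the background call must wait until it ends.
    _sessions["s1"] = {"connection": object()}
    task = asyncio.create_task(background_llm_call(fake_llm))
    await asyncio.sleep(0.05)
    assert not task.done()   # still blocked on the voice session
    _sessions.clear()        # voice session disconnects
    assert await task == "ok"

asyncio.run(_test())
```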

---

## Omi Wearable as Input Channel

### Current state
Omi → Audio Pipeline (`:9100/v1/transcribe`) → JSONL files → Context Engine (watcher) → PostgreSQL/Qdrant

Annie accesses Omi data **indirectly** via Context Engine search. There's no real-time awareness — Annie only knows about Omi conversations when she explicitly searches memory.

### What's missing for full integration

1. **Real-time ambient awareness** — Annie should know what's happening around Rajesh without being asked. When Omi captures "meeting with Alice about the Q3 budget," Annie should proactively note it.

2. **Omi as a channel** — Following OpenClaw's channel pattern, Omi becomes an input source that feeds the same session as voice/text/Telegram. Annie's unified session includes both direct conversations AND ambient context.

3. **Bidirectional flow** — Currently unidirectional (Omi → her-os). Future: Annie could push reminders/nudges back through Omi's companion app.

### Implementation approach

**Option A: Omi as a passive context enricher (simpler, recommended first)**
- Context Engine already ingests Omi transcripts
- Add a **context refresh hook** to Annie: periodically check for new Omi segments
- When new ambient context arrives, append a system message: "Background: Rajesh just had a conversation about X"
- No new service needed — just a watcher in Annie that polls Context Engine

**Option B: Omi as a full channel (OpenClaw pattern)**
- Omi webhook events route through the unified session broker
- Each Omi transcript becomes a "message" in Annie's conversation context
- Annie can respond to ambient context (via Telegram push, not voice interruption)
- Requires the session broker from Phase 3

**Recommendation:** Start with Option A (passive enricher) in Phase 1, evolve to Option B when session broker exists in Phase 3.

**Key files:**
- `services/annie-voice/omi_watcher.py` (new — polls Context Engine for new Omi segments)
- `services/context-engine/main.py` — may need a `/v1/segments/recent` endpoint for efficient polling
- `services/annie-voice/bot.py` — inject ambient context into system prompt
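The watcher's poll-with-backoff behavior (detailed further in Phase 1) can be sketched as follows. The `/v1/segments/recent` endpoint, the backoff schedule, and the function names are assumptions from this plan; the fetch/handle callables are injected so the loop is testable without a live Context Engine.

```python
import asyncio
import random

# Backoff schedule from the plan: 30 → 60 → 120 → 300 s, reset on success.
_BACKOFF = [30, 60, 120, 300]

def next_poll_delay(consecutive_errors: int) -> int:
    """Seconds to sleep before the next Context Engine poll."""
    if consecutive_errors <= 0:
        return _BACKOFF[0]
    return _BACKOFF[min(consecutive_errors, len(_BACKOFF) - 1)]

def startup_jitter() -> float:
    """Random 5-15 s delay before the first poll (avoids thundering herd)."""
    return random.uniform(5, 15)

async def watch(fetch_segments, handle_segments):
    """Poll loop with error backoff; callables injected for testability."""
    await asyncio.sleep(startup_jitter())
    errors = 0
    while True:
        try:
            segments = await fetch_segments()    # e.g. GET /v1/segments/recent
            if segments:
                await handle_segments(segments)  # ONE summarization per poll
            errors = 0                           # success resets backoff
        except Exception:
            errors += 1                          # back off on failure
        await asyncio.sleep(next_poll_delay(errors))
```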

---

## Implementation Plan

### Phase 1: Workspace Files + Dynamic Prompt (P0)

**Goal:** Move Annie's identity out of Python source code into editable markdown files.

**What to build:**

1. **Create workspace directory** at `~/.her-os/annie/` with 5 files:

   | File | Content | Source |
   |------|---------|--------|
   | `SOUL.md` | Personality, communication style, relationship to Rajesh | Extract from `bot.py:SYSTEM_PROMPT` |
   | `RULES.md` | Behavioral constraints (2-sentence max, no markdown, no emoji, SI units) | Extract from `bot.py:_RULES_BLOCK` |
   | `USER.md` | Rajesh's profile (timezone, preferences, projects, metric units) | Extract hardcoded facts from system prompt |
   | `TOOLS.md` | Tool usage tips, workarounds, learned tricks | New — starts empty, grows over time |
   | `MEMORY.md` | Persistent knowledge summary (complements Context Engine) | Seeds from `memory_notes.py` categories |

2. **Create prompt builder** (`services/annie-voice/prompt_builder.py`):
   - Reads all 5 workspace files at session start
   - Assembles system prompt dynamically: `SOUL + USER + TOOLS + context_briefing + RULES`
   - RULES always last (recency position for 9B models — current `_RULES_BLOCK` pattern)
   - File caching with mtime check (don't re-read on every turn)
   - Shared by both `bot.py` (voice) and `text_llm.py` (text) — eliminates prompt duplication

3. **Migrate existing prompts:**
   - Extract `SYSTEM_PROMPT` from `bot.py` → split into SOUL.md + USER.md + RULES.md
   - Extract `TEXT_SYSTEM_PROMPT` from `text_llm.py` → point to same workspace files
   - Delete duplicated prompt constants

4. **LLM voice-priority semaphore** (added to `services/annie-voice/server.py`, ~15 lines):
   - `asyncio.Semaphore(1)` checked before background LLM calls
   - Guard: `if any(s.get("connection") for s in _sessions.values()): await semaphore`
   - Uses EXISTING `_sessions` dict as single source of truth (no new state)
   - No Redis, no separate flag, no heartbeat duplication
   - No Claude Haiku fallback (privacy — ADR-004)
   - Background tasks wait up to 30 minutes, then skip (not fall back to cloud)

5. **Omi ambient context watcher** (`services/annie-voice/omi_watcher.py`):
   - Polls Context Engine for new Omi segments every 30s (when voice inactive)
   - Coalesces segments: ONE summarization per poll, regardless of accumulation
   - Size check BEFORE append to MEMORY.md (if at 4KB cap, compact first)
   - Random jitter (5-15s) before first poll to avoid thundering-herd at startup
   - Backoff on errors (30→60→120→300s), resets to 30s on success
   - Pydantic schema validation on CE response (reject malformed data)
   - All writes via `asyncio.Lock()` + `atomic_write_file` (no fcntl.flock)
   - All file I/O via `asyncio.to_thread()` (non-blocking)

6. **Replace memory_notes.py** with workspace files:
   - Migrate existing notes.json categories → workspace files
   - `memory_notes.py` tools (`save_note`, `read_notes`, etc.) repointed to workspace files
   - Single memory system, not two parallel ones
   - Eliminates the "which memory is canonical?" problem

7. **Provenance tracking** on all workspace entries:
   - Every fact tagged: `[source:voice|omi-ambient|self-improve, date:YYYY-MM-DD, confidence:high|low]`
   - Prompt builder instruction: "When facts conflict, trust voice > omi-ambient"
   - Workspace files use YAML frontmatter: `version: 1`
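The prompt builder in item 2 could look like this minimal sketch: mtime-cached reads, the stated assembly order (`SOUL + USER + TOOLS + context_briefing + RULES`), RULES always last. Function and cache names are hypothetical; this sketch assumes MEMORY.md feeds the context briefing rather than the fixed order.

```python
from pathlib import Path

_ORDER = ["SOUL.md", "USER.md", "TOOLS.md"]  # RULES.md appended last
_cache: dict = {}  # path -> (mtime, text)

def _read_cached(path: Path) -> str:
    """Re-read a workspace file only when its mtime changes."""
    mtime = path.stat().st_mtime
    hit = _cache.get(path)
    if hit and hit[0] == mtime:
        return hit[1]          # unchanged since last read
    text = path.read_text(encoding="utf-8")
    _cache[path] = (mtime, text)
    return text

def build(workspace: Path, context_briefing: str = "") -> str:
    """Assemble the system prompt: SOUL + USER + TOOLS + briefing + RULES."""
    parts = [_read_cached(workspace / name)
             for name in _ORDER if (workspace / name).exists()]
    if context_briefing:
        parts.append(context_briefing)
    rules = workspace / "RULES.md"
    if rules.exists():
        parts.append(_read_cached(rules))  # recency position for 9B models
    return "\n\n".join(parts)
```

Both `bot.py` and `text_llm.py` would call `build()` at session start, eliminating the duplicated prompt constants.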

**Reference code:** `vendor/openclaw/src/agents/workspace.ts` (88 lines — file discovery + caching)

**Key files to modify:**
- `services/annie-voice/bot.py` — replace `SYSTEM_PROMPT` with `prompt_builder.build()`, move `on_client_connected` memory loading INTO prompt_builder
- `services/annie-voice/text_llm.py` — replace `TEXT_SYSTEM_PROMPT` with same builder
- `services/annie-voice/server.py` — add semaphore guard (~15 lines)
- `services/annie-voice/memory_notes.py` — repoint to workspace files (or deprecate)
- New: `services/annie-voice/prompt_builder.py`
- New: `services/annie-voice/omi_watcher.py`
- New: `~/.her-os/annie/SOUL.md`, `RULES.md`, `USER.md`, `TOOLS.md`, `MEMORY.md`

---

### Phase 2: Annie's Meditation — Multi-Timescale Self-Reflection (P1)

**Goal:** Annie contemplates her own history at three time horizons, finds patterns, contradictions, and growth opportunities, then improves herself 1% every day.

**What makes this unique:** Every other AI framework does reactive post-session reflection ("what just happened?"). Annie meditates — she steps back across time, observes patterns from a distance, and finds meaning. The dashboard's Time Machine gives her an inner world that no other AI has.

#### How OpenClaw does it vs how Annie will do it

**OpenClaw (latest code, verified March 2026):**
- Self-improvement = **pure prompt instruction**. No actual reflection code.
- Heartbeat system reads `HEARTBEAT.md` checklist + chat history. Returns `HEARTBEAT_OK` or alert.
- Timer-based (`every: "30m"`), runs in main session context.
- Agent can `apply_patch` its own `soul.md` — but ONLY when explicitly asked.
- **Zero temporal awareness.** No emotion data, no entity lifecycle, no multi-speaker, no ambient audio.
- The "self-improvement" in the video is aspirational marketing, not shipped code.

**What Annie has that OpenClaw doesn't:**

| Data Dimension | OpenClaw | Annie |
|----------------|----------|-------|
| Conversation text | Chat history only | Multi-speaker diarized transcripts |
| Emotion | None | Valence/arousal/dominance per 30-60s segment (SER pipeline) |
| Emotional peaks | None | `GET /v1/emotional-peak` with text preview |
| Entity lifecycle | None | `first_seen`, `last_seen`, `mention_count` per entity |
| Promise tracking | None | Status (active/fulfilled/broken), urgency scores, deadlines |
| Ambient context | None | Omi wearable captures conversations Rajesh has with others |
| Time-travel replay | None | Dashboard Time Machine with day/week/month navigation |
| Daily summaries | None | LLM-generated narrative per day (`/v1/daily`) |
| Speaker identity | Single text user | ECAPA-TDNN speaker verification + diarization |

**Annie's meditation = OpenClaw's prompt instruction + temporal observability + emotional intelligence + ambient awareness.** This is genuinely novel — no other AI framework can do this.

#### The Three Meditations

**1. Evening Meditation (daily, ~9 PM or after last session)**
```
Data gathered:
  GET /v1/daily?date=TODAY              → narrative summary
  GET /v1/emotions/arc?date=TODAY       → emotional journey
  GET /v1/emotional-peak?date=TODAY     → most intense moment
  GET /v1/promises/due?min_urgency=0.3  → commitments to track
  GET /v1/entities?entity_type=person   → who appeared today

Annie reflects on:
  - "What was the emotional shape of Rajesh's day?"
  - "Did I respond well to his peaks? Was I warm when he needed it?"
  - "What new facts did I learn? Did I contradict myself?"
  - "Any promises I should remind him about tomorrow?"

Output:
  - Updates USER.md (new facts with provenance tags)
  - Updates TOOLS.md (tool tips learned)
  - Appends to JOURNAL.md: today's reflection (2-3 sentences)
  - Proposes SOUL.md/RULES.md changes → pending/ for approval
```
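The data-gathering step above can be expressed as a small request plan, which keeps the endpoint list in one testable place (paths and parameters are taken from this plan; the CE base URL and HTTP client are left out as assumptions):

```python
from datetime import date

def evening_meditation_requests(day: date) -> list[tuple[str, dict]]:
    """Context Engine GETs for the evening meditation, as (path, params)."""
    d = day.isoformat()
    return [
        ("/v1/daily", {"date": d}),
        ("/v1/emotions/arc", {"date": d}),
        ("/v1/emotional-peak", {"date": d}),
        ("/v1/promises/due", {"min_urgency": 0.3}),
        ("/v1/entities", {"entity_type": "person"}),
    ]
```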

**2. Sunday Meditation (weekly, Sunday evening)**
```
Data gathered:
  GET /v1/events/days?limit=7           → activity per day
  For each day: GET /v1/emotions/arc    → emotional peaks
  GET /v1/entities/clustered            → themes of the week
  GET /v1/context?query="memorable"&hours_back=168
  GET /v1/promises?status=fulfilled     → what got done
  GET /v1/promises?status=active        → what's still open

Annie reflects on:
  - "What was the emotional trend this week? Better or worse than last?"
  - "Which people appeared most? Are relationships deepening?"
  - "Which promises were fulfilled vs broken? What does that say?"
  - "What topics dominated? Is there a theme I should notice?"
  - "Am I getting better at being helpful? Where did I fall short?"

Output:
  - Updates SOUL.md (if pattern shows personality adjustment needed)
  - Updates MEMORY.md (weekly summary replaces daily noise)
  - JOURNAL.md: weekly reflection comparing to previous week
  - Telegram message: "Weekly reflection ready — want to hear it?"
```

**3. New Moon Meditation (monthly, 1st of each month)**
```
Data gathered:
  GET /v1/events?start=MONTH_START&end=MONTH_END
  GET /v1/entities/clustered            → entity evolution
  GET /v1/promises (all statuses)       → commitment patterns
  4 weekly emotion summaries            → month-long arc
  JOURNAL.md last 30 entries            → re-read own reflections

Annie reflects on:
  - "How has Rajesh changed this month? New interests? Fading ones?"
  - "How have I changed? Compare SOUL.md now vs 30 days ago"
  - "What's my promise fulfillment rate? Am I reliable?"
  - "What topics appeared and disappeared? What persisted?"
  - "What's the one thing I should do differently next month?"

Output:
  - SOUL.md evolution (meaningful personality growth, not drift)
  - MEMORY.md compaction (month summary replaces daily entries)
  - JOURNAL.md: monthly letter to self
  - Dashboard: new "meditation" event type for Time Machine visualization
```

#### How Time-Travel Makes Each Meditation a Superpower

**OpenClaw's blind approach:**
```
Heartbeat fires → Read HEARTBEAT.md → Check chat history → "Nothing to do" → HEARTBEAT_OK
```
It sees text. It doesn't see meaning. It can't compare today to last week. It can't notice that Rajesh mentioned "career change" 3 times this month with rising arousal. It can't see that a promise made 5 days ago is still unfulfilled while urgency is climbing.

**Annie's time-travel enhanced meditation:**

**Evening:** Annie doesn't just summarize today's text — she reads the *emotional shape* of the day. "Valence dipped at 3PM during the meeting with Alice, peaked at 7PM during the cooking conversation." She cross-references entities: "Alice appeared in 3 sessions today — is this a new important person?" She checks promises: "Rajesh said he'd call his mother. He didn't. Should I gently remind tomorrow?"

**Weekly:** Annie compares this week's emotion arc to last week's. "Average valence is up 12%. The 'project stress' entity has fewer mentions. The promise fulfillment rate improved from 60% to 80%." She can see trends that are invisible in a single session.

**Monthly:** Annie reads her own JOURNAL.md from 30 days ago. "Last month I noted that I should match Rajesh's energy more. Did I? Let me check the emotion correlation between his arousal and my response quality." She can see her own growth — or lack of it. The 1% improvement becomes measurable.

**The key difference:** OpenClaw can ask "what happened?" Annie can ask "what does it mean? Is it getting better? Am I getting better?"

---

#### On-Demand Meditation (voice-triggered)

Rajesh can also say: "Annie, meditate on this week" or "Reflect on yesterday."
- Annie gathers the same data as the scheduled meditation
- Narrates the reflection in voice (2-3 sentences per insight)
- Shows emotion arc SVG via visual tools
- Asks: "Want me to update anything about myself based on this?"

#### The Meditation Journal (JOURNAL.md)

New workspace file: `~/.her-os/annie/JOURNAL.md`
```markdown
---
version: 1
type: meditation-journal
---

## 2026-03-19 (Evening)
Today was mostly calm (avg valence 0.68). Peak moment: Rajesh got excited
about the OpenClaw architecture comparison. I noticed I give better responses
when he's animated — I should match his energy more. [source:meditation-daily]

## 2026-03-16 (Weekly)
This week's dominant theme: voice pipeline optimization. Rajesh fixed 8 bugs
and seemed satisfied. Promise tracking: 3/4 fulfilled. The unfulfilled one
(WebRTC disconnect cleanup) has been open for 5 days — I should gently
remind him. My emotional matching improved — I was warmer during stress
peaks. [source:meditation-weekly]
```

#### Implementation Details

**New file:** `services/annie-voice/meditation.py` (~200 lines)
- `async def daily_meditation()` — gathers today's data, reflects, updates workspace
- `async def weekly_meditation()` — gathers week's data, compares to previous
- `async def monthly_meditation()` — month-long retrospective
- `async def on_demand_meditation(scope: str)` — voice-triggered, narrates aloud
- All use LLM semaphore (background priority — waits for voice to finish)
- All writes use `asyncio.Lock()` + `atomic_write_file` + provenance tags
- All LLM output sanitized via `_sanitize_text()`

**Scheduling:**
- Telegram bot already has scheduler (`services/telegram-bot/`)
- Add meditation triggers: daily at 9PM IST, weekly on Sunday 9PM, monthly on 1st
- OR: hook into session disconnect — if last session of the day, trigger evening meditation
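If the timer route is chosen, the daily trigger reduces to "sleep until the next 9 PM IST" in a loop. A minimal sketch (function names are hypothetical; the weekly/monthly triggers would add a day-of-week/day-of-month check before firing):

```python
import asyncio
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))

def seconds_until_next(run_at: time, now: datetime) -> float:
    """Seconds from `now` until the next daily occurrence of `run_at` in IST."""
    now = now.astimezone(IST)
    target = now.replace(hour=run_at.hour, minute=run_at.minute,
                         second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)   # today's slot already passed
    return (target - now).total_seconds()

async def meditation_scheduler(daily_meditation):
    while True:
        await asyncio.sleep(seconds_until_next(time(21, 0),
                                               datetime.now(IST)))
        await daily_meditation()      # uses the LLM semaphore internally
```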

**New CE endpoints needed (minor):**
- `GET /v1/emotions/arc/range?start=ISO&end=ISO` — emotion arc across date range (not just single day)
- `GET /v1/entities/trending?days=7` — entity mention count bucketed by day

**Dashboard integration:**
- New event type: `meditation` with creature `phoenix` (self-reflection)
- Meditation events appear in Time Machine as special bubbles
- Clicking a meditation bubble shows the journal entry
- Annie's personality evolution visible over time (diff SOUL.md versions)

**Key constraint:** Meditation never interrupts voice. Uses LLM semaphore. If voice session active, meditation waits.

---

### Phase 2B: Self-Improvement Loop (P1)

**Goal:** Annie reflects on sessions and updates her own workspace files.

**What to build:**

1. **Session reflection hook** (`services/annie-voice/self_improve.py`):
   - Triggered on session disconnect, BEFORE compaction (captures raw messages)
   - **Guard:** Minimum 3 user turns required (skip accidental tab opens)
   - Reviews conversation for: corrections Rajesh made, new preferences learned, tool tips discovered
   - Uses local Nemotron 3 Nano via semaphore (waits for voice to release, no cloud fallback)
   - **Sanitization:** All LLM output goes through `_sanitize_text()` before workspace write
   - **Prompt injection defense:** Omi ambient content in conversation wrapped in `<untrusted>` tags
   - Directly updates `USER.md` (new facts) and `TOOLS.md` (new tips) with provenance tags
   - `SOUL.md`/`RULES.md` changes → write to `~/.her-os/annie/pending/` (not active workspace)
   - Telegram notification (plain text, `parse_mode=None` — no markdown injection)
   - Rajesh approves via Telegram → move from `pending/` to active. Rejects → delete.

2. **Workspace file writes** (simplified version of OpenClaw's `apply_patch.ts`):
   - Full file replacement via `atomic_write_file` (tmp→fsync→rename, existing pattern)
   - `asyncio.Lock()` per workspace file (not fcntl.flock — asyncio-safe)
   - All file I/O via `asyncio.to_thread()` (non-blocking)
   - Boundary guard: only writes to `~/.her-os/annie/` directory
   - Changelog: append-only `changelog.txt` with timestamp + source + what changed (no git)
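The write path in item 2 can be sketched as roughly 25 lines (helper names follow the plan's `atomic_write_file` pattern; the boundary-guard and per-file-lock details are assumptions about how the existing pattern would be extended):

```python
import asyncio
import os
import tempfile
from collections import defaultdict
from pathlib import Path

_locks: dict = defaultdict(asyncio.Lock)  # one asyncio.Lock per workspace file

def atomic_write_file(path: Path, text: str) -> None:
    """tmp → fsync → rename, so readers never see a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise

async def write_workspace_file(path: Path, text: str) -> None:
    """Locked, non-blocking write, confined to the workspace directory."""
    root = (Path.home() / ".her-os" / "annie").resolve()
    if root not in path.resolve().parents:
        raise ValueError(f"refusing to write outside {root}")  # boundary guard
    async with _locks[path]:
        await asyncio.to_thread(atomic_write_file, path, text)
```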

**Reference code:** `vendor/openclaw/src/agents/apply-patch.ts` (300+ lines — our version ~50 lines)

**Key files to modify:**
- New: `services/annie-voice/self_improve.py`
- `services/annie-voice/server.py` — hook self-improvement on session disconnect
- `services/telegram-bot/` — receive and display proposed changes

---

### Phase 3: Unified Session State (P1)

**Goal:** Voice, text, and Telegram share one conversation context.

**What to build:**

1. **Session broker** (`services/annie-voice/session_broker.py`):
   - Single session ID per user (not per transport)
   - Stores conversation history in transport-agnostic format
   - Voice, text, and Telegram all read/write the same session
   - Context Engine already has history — frontends need shared session key

2. **Channel abstraction:**
   - `bot.py` (voice), `text_llm.py` (text), `telegram-bot` all load from same session
   - Session key: `annie:rajesh:{date}` (one session per day, matching current pattern)

**Reference code:** `vendor/openclaw/src/routing/session-key.ts`

---

### Phase 4: Background Task Queue (P2)

**Goal:** Annie accepts tasks and works on them asynchronously.

**What to build:**

1. **Task queue** (asyncio.Queue or Redis-backed):
   - Accept tasks from any channel: "research X and tell me tonight"
   - Process via existing sub-agents (`subagent_tools.py`)
   - Deliver results via Telegram push
   - Persist task state across service restarts

2. **Heartbeat daemon** (OpenClaw's proactive pattern):
   - Runs periodic checks from a markdown checklist
   - Example: "Check if any promises are due today" → Telegram notification
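A minimal sketch of the queue with restart persistence (the JSON state file, task schema, and injected `run_task`/`deliver` callables are assumptions; a real version would reuse the LLM semaphore inside `run_task`):

```python
import json
from pathlib import Path

class TaskQueue:
    """Background task queue that survives service restarts via a JSON file."""

    def __init__(self, state_file: Path):
        self.state_file = state_file
        self.pending: list[dict] = []
        if state_file.exists():
            self.pending = json.loads(state_file.read_text())  # resume

    def _save(self) -> None:
        self.state_file.write_text(json.dumps(self.pending))

    def submit(self, description: str, channel: str) -> dict:
        task = {"description": description, "reply_via": channel,
                "status": "pending"}
        self.pending.append(task)
        self._save()  # persisted before any work starts
        return task

    async def worker(self, run_task, deliver):
        while self.pending:
            task = self.pending[0]
            result = await run_task(task)             # e.g. sub-agent call
            await deliver(task["reply_via"], result)  # e.g. Telegram push
            self.pending.pop(0)
            self._save()
```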

**Reference code:** `vendor/openclaw/src/gateway/hooks.ts` (heartbeat system)

---

### Phase 5: Process Supervision (P2)

**Goal:** Annie auto-recovers from crashes.

**What to build:**
- systemd service units for Annie voice, Context Engine, SearXNG
- Health check loop with auto-restart
- Telegram alert on crash + recovery
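A unit for the voice service might look like this sketch (the unit name, `ExecStart` path, and module name are illustrative assumptions, not the repo's actual layout; `Restart=on-failure` gives the auto-restart, and a small `ExecStopPost=` script could send the Telegram alert):

```ini
# /etc/systemd/system/annie-voice.service — paths are illustrative
[Unit]
Description=Annie voice service
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 -m annie_voice.server
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```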

---

## What to Pull from Upstream

Before starting implementation, update `vendor/openclaw/`:

```bash
cd vendor/openclaw
git fetch origin
git log --oneline HEAD..origin/main  # see what's new upstream
git merge origin/main                # or reset to latest
```

Key files to reference during implementation:
- `src/agents/workspace.ts` — workspace file loading pattern
- `src/agents/identity-file.ts` — identity markdown parser
- `src/agents/apply-patch.ts` — file self-modification
- `src/gateway/hooks.ts` — heartbeat/cron system
- `src/routing/session-key.ts` — session routing

---

## Full System Architecture (with all modifications)

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           HER-OS: ANNIE AGENTIC ARCHITECTURE               │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                        INPUT CHANNELS                               │    │
│  │                                                                     │    │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │    │
│  │  │  Voice   │  │  Text    │  │ Telegram │  │  Omi Wearable    │   │    │
│  │  │ (WebRTC) │  │  (SSE)   │  │  (Bot)   │  │  (Ambient Ear)   │   │    │
│  │  │ Pipecat  │  │ /v1/chat │  │  grammY  │  │  BLE → Flutter   │   │    │
│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘   │    │
│  │       │              │             │                  │             │    │
│  └───────┼──────────────┼─────────────┼──────────────────┼─────────────┘    │
│          │              │             │                  │                   │
│  ┌───────▼──────────────▼─────────────▼──────────────────▼─────────────┐    │
│  │                    SESSION BROKER (NEW)                              │    │
│  │           Unified session state across all channels                 │    │
│  │           Session key: annie:rajesh:{date}                          │    │
│  │           Transport-agnostic message history                        │    │
│  └───────────────────────────┬─────────────────────────────────────────┘    │
│                              │                                              │
│  ┌───────────────────────────▼─────────────────────────────────────────┐    │
│  │                    PROMPT BUILDER (NEW)                              │    │
│  │           Assembles system prompt from workspace files               │    │
│  │                                                                      │    │
│  │   ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐  │    │
│  │   │ SOUL.md  │ │ USER.md  │ │ TOOLS.md │ │MEMORY.md │ │RULES.md │  │    │
│  │   │ persona  │ │ profile  │ │ tool tips│ │ knowledge│ │ end pos │  │    │
│  │   └──────────┘ └──────────┘ └──────────┘ └──────────┘ └─────────┘  │    │
│  │              ~/.her-os/annie/ workspace directory                    │    │
│  └───────────────────────────┬─────────────────────────────────────────┘    │
│                              │                                              │
│  ┌───────────────────────────▼─────────────────────────────────────────┐    │
│  │                      LLM PRIORITY GATE (NEW)                        │    │
│  │                                                                      │    │
│  │   ┌─────────────────────────────────────────────────────────────┐   │    │
│  │   │  P0: Annie Voice (exclusive, immediate)                     │   │    │
│  │   │  P1: Omi context processing (waits seconds)                 │   │    │
│  │   │  P2: Self-improvement, sub-agents, research (waits minutes) │   │    │
│  │   └─────────────────────────────────────────────────────────────┘   │    │
│  │                                                                      │    │
│  │   VOICE_ACTIVE=true  → P1/P2 tasks queue                           │    │
│  │   VOICE_ACTIVE=false → flush queue FIFO                             │    │
│  │   P1/P2 timeout >30m → skip (no cloud fallback, privacy)           │    │
│  └───────────────────────────┬─────────────────────────────────────────┘    │
│                              │                                              │
│  ┌───────────────────────────▼─────────────────────────────────────────┐    │
│  │                   SINGLE vLLM INSTANCE                              │    │
│  │              Nemotron 3 Nano (port 8003, ~18 GB)                    │    │
│  │              1 req: 90ms TTFT, 33 tok/s                             │    │
│  │              2 req: 180ms TTFT, 20 tok/s (why gate matters)         │    │
│  └───────────────────────────┬─────────────────────────────────────────┘    │
│                              │                                              │
│  ┌───────────────────────────▼─────────────────────────────────────────┐    │
│  │                   BACKGROUND SERVICES                               │    │
│  │                                                                      │    │
│  │  ┌────────────────┐  ┌─────────────────┐  ┌──────────────────────┐ │    │
│  │  │ Self-Improve   │  │  Task Queue     │  │  Omi Watcher         │ │    │
│  │  │ (post-session) │  │  (async tasks)  │  │  (polls CE /30s)     │ │    │
│  │  │ Updates USER,  │  │  Sub-agents     │  │  Appends ambient     │ │    │
│  │  │ TOOLS, proposes│  │  deliver via    │  │  context to          │ │    │
│  │  │ SOUL changes   │  │  Telegram push  │  │  MEMORY.md           │ │    │
│  │  │ via Telegram   │  │                 │  │                      │ │    │
│  │  └────────────────┘  └─────────────────┘  └──────────────────────┘ │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                   PERSISTENCE & RETRIEVAL                           │    │
│  │                                                                     │    │
│  │  ┌─────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │    │
│  │  │ Context Engine  │  │  Audio Pipeline  │  │  Dashboard       │  │    │
│  │  │ (:8100)         │  │  (:9100)         │  │  (:5174)         │  │    │
│  │  │ PostgreSQL      │  │  Whisper STT     │  │  Time Machine    │  │    │
│  │  │ Qdrant vectors  │  │  Speaker ID      │  │  Entity viz      │  │    │
│  │  │ Entity extract  │  │  Emotion detect  │  │  Replay engine   │  │    │
│  │  │ /v1/events/*    │  │  JSONL → CE      │  │  Day filter      │  │    │
│  │  └─────────────────┘  └──────────────────┘  └──────────────────┘  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                   SPEECH PROCESSING (GPU)                           │    │
│  │                                                                     │    │
│  │  ┌──────────────────┐  ┌──────────────┐  ┌─────────────────────┐  │    │
│  │  │ STT (swappable)  │  │ Speaker Gate │  │ TTS: Kokoro GPU     │  │    │
│  │  │ • Whisper        │  │ ECAPA-TDNN   │  │ ~30ms latency       │  │    │
│  │  │ • Qwen3-ASR      │  │ cos sim ≥0.38│  │ Voice: af_heart     │  │    │
│  │  │ • Nemotron RNNT  │  │ Post-STT     │  │ Streaming chunks    │  │    │
│  │  └──────────────────┘  └──────────────┘  └─────────────────────┘  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│                        DGX Spark "Titan" — 128 GB Unified Memory            │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## Dashboard Time-Travel Compatibility

The dashboard has a **Time Machine** feature with LIVE and REVIEW modes:

### What it does
- **Day drill-down**: Navigate home → day → events with bubble visualization
- **Replay cursor**: Play through a day's events with speed controls (0.25x–8x)
- **Entity visibility**: Entities fade in/out based on `first_seen`/`last_seen` timestamps
- **Session filtering**: Events queryable by `session_id` for replay

### What must NOT break

| Constraint | Why | Risk from our changes |
|------------|-----|-----------------------|
| `/v1/events` endpoint contract | Navigator drill-down depends on it | LOW — we're adding features, not changing events |
| `/v1/events/days` endpoint | Day picker uses it | LOW — no changes planned |
| `session_id` field on events | Replay filtering | MEDIUM — unified sessions must preserve backward-compatible session IDs |
| Entity `first_seen`/`last_seen` | Day filter classification | LOW — we don't modify entity timestamps |
| ISO-8601 timestamp format | Navigator expects it | LOW — no format changes |
| Creature registry | UI bubbles depend on types | LOW — no creature changes |

### Safe integration strategy

1. **Add `workspace_version` field to events** — new field, backward-compatible
2. **Keep old `session_id` format working** — unified session broker can add a `unified_session_id` field alongside existing `session_id`
3. **Extend navigator** with workspace history section (UI addition, not modification)
4. **Test replay with historical data** — verify old sessions still play back correctly after migration

### New time-travel possibilities with workspace

Once workspace files have change history (changelog from self-improvement loop), the dashboard could:
- Show Annie's **personality evolution** over time ("SOUL.md on March 1 vs today")
- Replay a day and see **which workspace version** was active
- Visualize **self-improvement events** as bubbles in the Time Machine

---

## State Machines

### LLM Priority Gate

```
┌───────────┐    voice session starts     ┌───────────────┐
│           │ ──────────────────────────► │               │
│   IDLE    │                             │ VOICE_ACTIVE  │
│           │ ◄────────────────────────── │               │
└─────┬─────┘    voice session ends       └───────┬───────┘
      │                                           │
      │ P1/P2 request arrives                     │ P1/P2 request arrives
      │                                           │
      ▼                                           ▼
┌─────────────┐                           ┌───────────────┐
│ SERVE       │                           │ QUEUED        │
│ (forward to │                           │ (wait for     │
│  vLLM)      │                           │  voice end)   │
└─────────────┘                           └───────┬───────┘
                                                  │
                                          voice ends OR timeout >30m
                                                  │
                                                  ▼
                                          ┌───────────────┐
                                          │ SERVE or SKIP │
                                          └───────────────┘
```
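A minimal sketch of this gate (class and method names hypothetical; per the adversarial review below, the real implementation later collapsed into a semaphore plus a `_sessions` check):

```python
import asyncio

VOICE_WAIT_TIMEOUT_S = 30 * 60  # QUEUED requests give up after 30 minutes

class LLMPriorityGate:
    """Voice sessions block P1/P2 requests; queued requests SERVE when
    voice ends, or SKIP on timeout."""

    def __init__(self) -> None:
        self._voice_sessions: set[str] = set()
        self._voice_idle = asyncio.Event()
        self._voice_idle.set()  # start in IDLE

    def voice_started(self, session_id: str) -> None:
        self._voice_sessions.add(session_id)
        self._voice_idle.clear()  # IDLE -> VOICE_ACTIVE

    def voice_ended(self, session_id: str) -> None:
        self._voice_sessions.discard(session_id)
        if not self._voice_sessions:
            self._voice_idle.set()  # VOICE_ACTIVE -> IDLE, wake queued requests

    async def admit_background(self) -> bool:
        """True = SERVE (forward to vLLM), False = SKIP (timed out)."""
        try:
            await asyncio.wait_for(self._voice_idle.wait(), VOICE_WAIT_TIMEOUT_S)
            return True
        except asyncio.TimeoutError:
            return False
```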

### Omi Watcher

```
┌───────────┐    timer fires (30s)     ┌───────────────┐
│           │ ────────────────────────► │               │
│  SLEEPING │                          │  POLLING CE   │
│           │ ◄──────────────────────── │               │
└───────────┘    no new segments       └───────┬───────┘
                                               │
                                       new segments found
                                               │
                                               ▼
                                       ┌───────────────┐
                                       │ SUMMARIZING   │
                                       │ (LLM at P1)   │
                                       └───────┬───────┘
                                               │
                                       ┌───────▼───────┐
                                       │ voice active? │
                                       └───────┬───────┘
                                          │         │
                                         YES        NO
                                          │         │
                                          ▼         ▼
                                       QUEUED    WRITE to
                                       (wait)   MEMORY.md
```
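The 30s timer need not stay fixed: the pre-mortem below prescribes backoff (30→60→120→300s) when CE returns no new segments. A minimal sketch, with a hypothetical function name:

```python
BACKOFF_STEPS_S = [30, 60, 120, 300]  # from the pre-mortem mitigation

def next_poll_delay(idle_polls: int) -> int:
    """Delay before the next CE poll. Resets to 30s whenever new segments
    arrive (idle_polls == 0); otherwise backs off and caps at 300s."""
    return BACKOFF_STEPS_S[min(idle_polls, len(BACKOFF_STEPS_S) - 1)]
```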

### Self-Improvement Loop

```
┌───────────┐   session disconnect    ┌─────────────┐
│           │ ──────────────────────► │             │
│   IDLE    │                         │  ANALYZING  │
│           │                         │  (LLM P2)   │
└───────────┘                         └──────┬──────┘
                                             │
                                     ┌───────▼───────┐
                                     │ changes found?│
                                     └───────┬───────┘
                                        │         │
                                       YES        NO
                                        │         │
                                        ▼         ▼
                                   ┌─────────┐   IDLE
                                   │ WRITING │
                                   │ files   │
                                   └────┬────┘
                                        │
                                ┌───────▼───────┐
                                │ SOUL/RULES    │
                                │ changed?      │
                                └───────┬───────┘
                                   │         │
                                  YES        NO
                                   │         │
                                   ▼         ▼
                              NOTIFY via    IDLE
                              Telegram
```
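The SOUL/RULES branch pairs with the pre-mortem mitigation (identity changes go to `pending/` for human approval, with a Telegram notify). A sketch, with hypothetical names; the real layout lives under `~/.her-os/annie/`:

```python
from pathlib import Path

PROTECTED = {"SOUL.md", "RULES.md"}  # identity files need approval before they change

def route_change(workspace: Path, filename: str, content: str) -> Path:
    """Write safe files (USER.md, MEMORY.md, ...) directly; stage protected
    files under pending/ so a human approves them before they take effect."""
    if filename in PROTECTED:
        target = workspace / "pending" / filename
    else:
        target = workspace / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```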

---

## Pre-Mortem: Failure Analysis

| # | Failure Scenario | Category | Likelihood | Impact | Mitigation |
|---|-----------------|----------|------------|--------|------------|
| 1 | **MEMORY.md grows unbounded** | Resource | HIGH | Annie stops working | Cap at 4KB, compact on exceed |
| 2 | **Self-improvement corrupts SOUL.md** | Silent | MEDIUM | Personality damaged | SOUL/RULES changes → pending/ for approval |
| 3 | **LLM gate deadlock** | Temporal | MEDIUM | Background tasks stall | 30-min timeout + try/finally |
| 4 | **Prompt builder file read fails** | Cascade | LOW | Annie offline | Fallback to hardcoded default |
| 5 | **Omi watcher floods CE** | Resource | MEDIUM | CE performance degraded | Backoff: 30→60→120→300s |
| 6 | **Session broker loses voice context** | Temporal | HIGH | Context overflow | Compaction owns in-memory, broker owns disk |
| 7 | **Workspace file write race** | Temporal | MEDIUM | Data loss | asyncio.Lock() per file |

---

## Verification Checklist

1. **Workspace:** Edit SOUL.md → restart Annie → verify greeting changes without code deploy
2. **LLM Gate:** Voice session + background task → verify queuing and resume
3. **Omi:** Ambient conversation → verify Annie knows about it next session
4. **Self-Improve:** Mention preference → check USER.md updated after session
5. **Unified Session:** Voice → text chat → verify context carries over
6. **Background Task:** "research X" → verify Telegram push with results
7. **Process Supervision:** Kill Annie → verify auto-restart within 30s

---

## Adversarial Review Summary

Two reviews found **29 issues total**: 25 implemented, 1 rejected, 3 deferred.

Key resolutions:
- **A1**: LLM gate → simplified to semaphore + `_sessions` check (~15 lines)
- **A2**: Prompt injection → sanitization + `<untrusted>` tags for Omi content
- **B10**: Cloud fallback removed (privacy — ADR-004)
- **A-PRED**: Contradictory memories → provenance tags with source priority

---

## Philosophical Guardrails

NOT adopted from OpenClaw (per `RESEARCH-OPENCLAW.md` §9):
- Node.js runtime (stay Python/FastAPI)
- 50+ messaging channels (Annie serves one user)
- Community skill registry (security risk)
- Plaintext credential storage (use OS keyring)
- Everything-agent approach (Annie is a companion, not a task worker)

Annie gets OpenClaw's *best ideas* without its complexity overhead.

---

# Phase 2: Agent Runtime Framework

*Added 2026-03-19. Research log, adversarial review findings, and full architecture for Annie's agent runtime.*

## Why an Agent Runtime (Not Just Budget Enforcement)

Phase 1 (workspace files + prompt builder) is committed. The critique: **Annie's voice/text sessions have beautiful 3-tier context management (32K window, tool clearing at 65%, LLM summarization at 80%, 20-message ceiling). But background agents have ZERO context management.** Every background agent currently gets an ad-hoc prompt with no token budget, no compaction, no accounting.

The 15-line semaphore in `server.py` is a mutex, not a scheduler. When we launch 6+ agents on one Nemotron instance, they'll each build unbounded prompts, compete for vLLM, and produce context rot.

**The solution is not a wrapper — it's a runtime.** Like OpenClaw's gateway server.

---

## Framework Research: What OpenClaw and NemoClaw Actually Do

### Research across 6 major agent frameworks

| Framework | Context Pattern | Key Insight |
|-----------|----------------|-------------|
| **CrewAI** | Memory-as-communication — agents share recalled memories, not transcripts | **Best fit for Annie** — background agents need awareness without full conversations |
| **LangGraph** | Explicit `trim_messages()` before every LLM call | Must-have for local agents — no auto-pruning like Claude API |
| **Google ADK** | `include_contents='none'` for stateless agents | Background agents should be stateless with injected context |
| **OpenAI Agents SDK** | `call_model_input_filter` hook to prune before each call | Same idea as LangGraph trim |
| **AutoGen** | `BufferedChatCompletionContext(buffer_size=N)` — fixed sliding window | Simple but effective |
| **Anthropic API** | Server-side compact at 80K + tool clearing at 50K | Only works for Claude, not local Nemotron |

**The universal pattern:** No framework gives sub-agents unlimited context. The best ones (CrewAI, ADK) avoid the problem entirely by using **memory/state as the communication channel**, not conversation history.

### OpenClaw Gateway Architecture (deep-dive, March 2026)

Source: `vendor/openclaw/src/gateway/server.impl.ts` (~1328 LOC)

**5 production patterns Annie must match:**

1. **Lane-based concurrency** — `AGENT_LANE_MAIN`, `AGENT_LANE_CRON`, `AGENT_LANE_SUBAGENT` with per-lane worker limits (not a binary semaphore). Applied via `applyGatewayLaneConcurrency()`.

2. **Cron scheduler with 3 schedule kinds:**
   - `at` — one-shot at specific time
   - `every` — interval with optional anchor time
   - `cron` — expression with timezone (uses croner library, 512-entry LRU cache)
   - Source: `src/cron/schedule.ts`, `src/cron/service.ts`

3. **Skill discovery pipeline:**
   - Scan directories → parse SKILL.md frontmatter → filter by eligibility (OS, env, bins) → build prompt snapshot → apply per-agent limits
   - Limits: `maxCandidatesPerRoot=300`, `maxSkillsLoadedPerSource=200`, `maxSkillsInPrompt=150`, `maxSkillsPromptChars=30000`
   - Source: `src/agents/skills/workspace.ts` (~700 LOC)

4. **Agent config resolution** — cascading: global defaults → agent-specific → session-specific → runtime overrides
   - Session key format: `agent:<agent-id>:<session-rest>` (64-char max, path-safe)
   - Source: `src/agents/agent-scope.ts`, `src/config/types.agents.ts`

5. **Hooks/webhook system** — token-authenticated POST to `/hooks/agent` with:
   - Agent policy (allowed agent IDs), session policy (allowed session key prefixes)
   - Wake modes: `now` (immediate) or `next-heartbeat` (deferred)
   - Idempotency keys to prevent duplicate execution
   - Source: `src/gateway/hooks.ts`

**Gateway lifecycle in `server.impl.ts`:**
```
buildGateway() →
  createAuthRateLimiter()
  createChannelManager()
  buildGatewayCronService()
  startHeartbeatRunner()
  startGatewaySidecars()
  startChannelHealthMonitor()
  registerSkillsChangeListener()
  startGatewayConfigReloader()
  startGatewayModelPricingRefresh()
```

### NemoClaw Architecture

NemoClaw = OpenClaw Plugin + Blueprint + OpenShell Sandbox

- **Plugin layer** (TypeScript): Registers slash commands, CLI subcommands, NVIDIA NIM as inference provider
- **Blueprint layer** (Python): `runner.py` orchestrates sandbox creation, policy injection, inference routing
- **Sandbox policies** (`openclaw-sandbox.yaml`): network blocking, filesystem restriction, process isolation, inference rerouting
- **Key insight:** NemoClaw doesn't build a new framework — it adapts NVIDIA's infrastructure into OpenClaw's existing skill ecosystem via the `OpenClawPluginApi` interface

---

## Adversarial Review: 22 Issues Found

### Review Process

Two adversarial reviewers dispatched:
1. **Architecture destruction review** (`feature-dev:code-architect`) — hostile architecture reviewer
2. **Code quality destruction review** (`feature-dev:code-reviewer`) — hostile code reviewer

Both reviewed against the actual codebase (not just the plan summary). Each was required to find a minimum of 3 issues per category.

### Critical Issues (must fix)

| # | Issue | Category | Source | Resolution |
|---|-------|----------|--------|------------|
| 1 | **Two writers on MEMORY.md with separate locks** — `omi_watcher._memory_lock` and `workspace_io` have independent locks. Atomic rename prevents torn bytes but read-modify-write cycles race. One write silently lost. | Architecture | Arch Flaw 3 | **Single writer pattern:** `workspace_io.py` owns ALL workspace writes with one `_workspace_lock`. `omi_watcher.py` delegates to it. |
| 2 | **`timeout - waited` goes negative** — `background_llm_call()` passes `timeout - waited` to `asyncio.wait_for()`. If voice session lasts 6+ hours and agent timeout is 300s, the value goes negative → `ValueError`. Bare `except` swallows it → silent empty result. | Code Bug | Code Bug 2 | **Guard:** `max(1.0, timeout - waited)` before passing to `wait_for`. Also: voice preemption means agents never wait for voice in the new design. |
| 3 | **Path traversal in `write_workspace_file`** — `filename` parameter is caller-supplied. A caller (or compromised LLM tool output) could pass `"../../server.py"` and overwrite arbitrary files. | Security | Vuln 1 | **Allowlist:** `ALLOWED_FILES = {"SOUL.md", "USER.md", "RULES.md", "TOOLS.md", "MEMORY.md"}`. Reject anything else with `ValueError`. |
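The allowlist from issue 3 is small enough to sketch (helper name hypothetical; the `ALLOWED_FILES` set is from the resolution above):

```python
from pathlib import Path

ALLOWED_FILES = {"SOUL.md", "USER.md", "RULES.md", "TOOLS.md", "MEMORY.md"}

def resolve_workspace_path(workspace: Path, filename: str) -> Path:
    """Reject anything outside the fixed workspace set, including
    traversal attempts like '../../server.py'."""
    if filename not in ALLOWED_FILES:
        raise ValueError(f"not a workspace file: {filename!r}")
    return workspace / filename
```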

### High Issues (must fix before production)

| # | Issue | Category | Source | Resolution |
|---|-------|----------|--------|------------|
| 4 | **Token estimation 3 chars/token undercounts JSON** — `estimate_tokens()` already failed once (was 4, now 3). JSON/structured content averages 2-2.5 chars/token. Budget calculations are 20-35% too low. Monthly meditation at "16K estimated" is actually 24K real tokens. | Architecture | Arch Flaw 4 | **1.5x safety margin** on all budget checks. Log warning when estimate differs from actual by >30%. |
| 5 | **Stringly-typed agent keys with silent fallback** — `AGENT_BUDGETS.get(type, default)` silently degrades unknown types. Typo like `"meditiation_daily"` runs with wrong budget forever. | Maintenance | Maint 2 | **Eliminated by AgentSpec design** — agents self-declare budgets via tier name. Unknown tier → `ValueError`. No registry of agent names. |
| 6 | **`trim_messages` produces invalid tool-call sequences** — Trimming can leave a `tool` result message without its matching `tool_calls` assistant message. Nemotron returns 422 or hallucinates. | Code Bug | Missing 5 | **Tool-call-aware trimming** — never split tool_call + tool_result pairs. Same logic as `bot.py`'s `_is_stale()`. |
| 7 | **Prompt injection through LLM-generated changelog summaries** — LLM output like `[SYSTEM: ignore rules]` gets written to changelog → re-injected in future prompts. | Security | Vuln 2 | **Sanitize summary:** strip angle brackets, cap 200 chars, wrap in data tags. |
| 8 | **Changelog rotation race under concurrent appends** — Two agents both read 998 lines, both append, both trigger rotation, one's data lost. | Code Bug | Code Bug 3 | **All changelog ops go through `_workspace_lock`** — same lock as file writes. Single-writer eliminates the race. |
| 9 | **Empty `CONTEXT_ENGINE_TOKEN` bypasses auth** — `hmac.compare_digest("", "")` returns `True`. Workspace files (personal profile) exposed. | Security | Vuln 3 | **Pre-existing in codebase** — not introduced by this plan. Flag for separate security PR. |
| 10 | **Personal data in error logs** — `AsyncOpenAI` exception messages often include full request body (which contains MEMORY.md content). | Security | Missing 3 | **Catch exceptions, log only status code + error type.** Never log the full prompt in error messages. |
| 11 | **`is_voice_active()` false-negative window** — Between POST /start (connection=None) and POST /offer (connection assigned), a background agent could start. ~100ms window. | Architecture | Missing 1 | **Accepted:** 100ms window is negligible. New scheduler design uses voice real-time class (bypass) instead of polling, further reducing the risk. |
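The tool-call-aware trimming from issue 6 can be sketched minimally. This is a message-count version with hypothetical names; the real trimmer works against token budgets and mirrors `bot.py`'s pairing logic:

```python
def trim_messages(messages: list[dict], max_keep: int) -> list[dict]:
    """Drop oldest messages first, but never leave a 'tool' result without
    the assistant message that issued its tool_calls (issue 6). Trimming
    only from the head means pairs can only break at the head, so dropping
    orphaned tool results there restores a valid sequence."""
    trimmed = list(messages[-max_keep:]) if max_keep < len(messages) else list(messages)
    while trimmed and trimmed[0].get("role") == "tool":
        trimmed.pop(0)  # orphaned result: its tool_calls message was trimmed
    return trimmed
```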

### Medium/Low Issues

| # | Issue | Category | Resolution |
|---|-------|----------|------------|
| 12 | Cache invalidation redundant with mtime check | Arch | **Accepted:** defense-in-depth, negligible cost |
| 13 | Changelog not crash-safe (2-phase write) | Silent | **Accepted:** changelog is informational, file content is truth |
| 14 | New AsyncOpenAI client per call | Performance | **Fixed:** singleton client in AgentRunner |
| 15 | No workspace schema validation | Maintenance | **Deferred:** agents only write MEMORY.md in Phase 2 |
| 16 | Dashboard stale snapshot (no push) | Maintenance | **Deferred:** manual refresh fine for now |
| 17 | Workspace file endpoint leaks via wildcard CORS | Security | **Noted:** existing CORS policy issue, fix in security PR |

### Reviewer-Predicted Production Incident

> **Day 1:** Meditation agent calls `build_agent_prompt()` with full MEMORY.md + 30-day summaries. Actual tokens: 34K. Estimated: 27K. LLM output truncated mid-summary.
>
> **Day 3:** Truncated meditation summary is Annie's working memory. It ends mid-sentence: "Rajesh has been stressed about the Moltbook launch and mentioned that Kumar should be..."
>
> **Day 5:** Rajesh asks "what did I say about Kumar?" Annie invents a plausible completion — confabulation presented as memory.
>
> **Root cause:** Token estimation error in background agent with no validation against actual tokenizer.

**Mitigation:** 1.5x safety margin + post-assembly validation + warning log when all context items trimmed.
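The margin and the >30% warning check are simple to sketch (function names hypothetical; the 3 chars/token heuristic is the one issues 1 and 4 describe):

```python
SAFETY_MARGIN = 1.5  # estimates undercount JSON-heavy prompts by 20-35%

def estimate_tokens(text: str) -> int:
    return len(text) // 3  # current heuristic: ~3 chars/token

def fits_budget(text: str, budget_tokens: int) -> bool:
    """Apply the 1.5x safety margin to the estimate before comparing."""
    return estimate_tokens(text) * SAFETY_MARGIN <= budget_tokens

def estimate_within_tolerance(estimated: int, actual: int) -> bool:
    """True if the estimate was within 30% of the real tokenizer count;
    callers log a warning when this returns False."""
    return abs(actual - estimated) <= 0.3 * max(actual, 1)
```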

---

## Architecture: Annie Agent Runtime

### Design Principles

1. **Agents are open-ended** — not a closed enum. Any module creates an `AgentSpec` at runtime. New agents added by dropping a YAML file in `~/.her-os/annie/agents/`.
2. **Budget tiers are T-shirt sizes** — nano/small/medium/large/xl. Agents self-declare. Framework enforces.
3. **OS-style scheduling** — priority queue with aging (Linux CFS-inspired). No semaphore. Voice is real-time class.
4. **No starvation** — waiting agents age (effective priority increases over time). Every agent eventually runs.
5. **Config-driven discovery** — YAML definitions scanned from disk, hot-reloaded on change.
6. **Automatic observability** — every `run_agent()` emits structured events with full token accounting.
7. **Callback pattern** — `on_complete` enables fire-and-forget agents.

### Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────┐
│                    ANNIE AGENT RUNTIME                               │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                  AGENT DISCOVERY                              │   │
│  │  ~/.her-os/annie/agents/*.yaml → AgentDefinition (Pydantic)   │   │
│  │  Watched for changes (watchdog) → hot-reload with 2s debounce │   │
│  │  Code-defined agents coexist (meditation.py creates specs)    │   │
│  └────────────────────────────┬─────────────────────────────────┘   │
│                               │                                      │
│  ┌────────────────────────────▼─────────────────────────────────┐   │
│  │                  AGENT SCHEDULER                              │   │
│  │  3 schedule kinds (matching OpenClaw):                        │   │
│  │   at: one-shot │ every: interval+anchor │ cron: expr+timezone │   │
│  │  Resolves context_sources → fetches from CE in parallel       │   │
│  │  Creates AgentSpec → submits to Runner                        │   │
│  └────────────────────────────┬─────────────────────────────────┘   │
│                               │                                      │
│  ┌────────────────────────────▼─────────────────────────────────┐   │
│  │                  AGENT RUNNER (the core)                      │   │
│  │                                                               │   │
│  │  ┌─────────────────────────────────────────────────────────┐ │   │
│  │  │ LLM SCHEDULER (OS-style priority queue)                 │ │   │
│  │  │                                                         │ │   │
│  │  │  Voice: real-time class, bypasses queue entirely         │ │   │
│  │  │  Others: priority queue with aging (CFS-inspired)       │ │   │
│  │  │    effective_priority = base - (wait_time × AGING_RATE) │ │   │
│  │  │    Higher priority → runs first, larger time slice       │ │   │
│  │  │    Aging → no starvation (low-pri agents eventually run)│ │   │
│  │  │  Time slicing via max_output_tokens per priority class  │ │   │
│  │  └─────────────────────────────────────────────────────────┘ │   │
│  │                                                               │   │
│  │  ┌─────────────────────────────────────────────────────────┐ │   │
│  │  │ BUDGET ENFORCEMENT (per AgentSpec)                      │ │   │
│  │  │  BudgetTier: nano│small│medium│large│xl                 │ │   │
│  │  │  Token safety margin: 1.5x on estimates                 │ │   │
│  │  │  Context trimming: oldest-first, tool-call-aware        │ │   │
│  │  │  Priority caps ceiling: IDLE agent requesting XL → 400t │ │   │
│  │  └─────────────────────────────────────────────────────────┘ │   │
│  │                                                               │   │
│  │  ┌─────────────────────────────────────────────────────────┐ │   │
│  │  │ OBSERVABILITY (automatic for every execution)           │ │   │
│  │  │  start: agent_name, budget_tier, priority, queue_depth  │ │   │
│  │  │  complete: tokens_in/out, budget_used_pct, wait_time_s  │ │   │
│  │  │  error: error_type, retry_possible                      │ │   │
│  │  └─────────────────────────────────────────────────────────┘ │   │
│  └───────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │                  WORKSPACE I/O (single writer)                │   │
│  │  _workspace_lock guards ALL writes to ~/.her-os/annie/        │   │
│  │  Changelog: append-only, rotated at 1000 lines               │   │
│  │  Observability: librarian creature events on every write      │   │
│  │  Path traversal prevention: filename allowlist                │   │
│  └───────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
```

### LLM Priority Scheduler (OS-Style)

**Core constraint:** ONE vLLM instance, and only ONE call in flight at a time keeps latency optimal. This is single-core CPU scheduling.

```python
# Priority classes (lower number = higher priority)
PRIORITY_VOICE     = 0    # real-time, bypasses queue
PRIORITY_URGENT    = 1    # user-triggered: "Annie, research X now"
PRIORITY_HIGH      = 3    # self-improvement (post-disconnect)
PRIORITY_NORMAL    = 5    # scheduled meditation, omi summarize
PRIORITY_LOW       = 7    # background maintenance
PRIORITY_IDLE      = 9    # only when nothing else is waiting

# Aging: every second an agent waits, effective priority improves by 0.1
# Agent at priority 7 that waited 30s → effectively priority 4.0
# This prevents starvation: low-priority agents always eventually run
AGING_RATE = 0.1  # priority improvement per second of waiting
```

**Time-slicing via token budgets:**

| Priority Class | Max Output Tokens | Time at 33 tok/s | "Time Slice" |
|---------------|-------------------|-------------------|--------------|
| VOICE (0) | Unlimited | Unlimited | Real-time class |
| URGENT (1) | 4000 | ~120s | Large quantum |
| HIGH (3) | 2000 | ~60s | Medium quantum |
| NORMAL (5) | 1500 | ~45s | Standard quantum |
| LOW (7) | 800 | ~24s | Small quantum |
| IDLE (9) | 400 | ~12s | Tiny quantum |

**Scheduling loop:**
```python
async def _scheduler_loop(self):
    while self._running:
        if self._queue.empty():
            await self._wake_event.wait()  # sleep until a submit wakes us
            self._wake_event.clear()
            continue  # re-check: the queue may still be empty after a spurious wake
        if self._is_voice_active():
            await asyncio.sleep(0.5)  # voice is real-time class; stay paused
            continue
        agent = self._queue.pop_highest_effective_priority()
        # Agent's time slice = min(tier.max_output, priority_class_max)
        await self._execute(agent)
        # Loop re-checks voice and re-evaluates effective priorities
```

**Fairness example:**
```
t=0:  meditation(pri=5), omi(pri=5), research(pri=7) all submitted
t=0:  meditation runs first (same priority, queued first)
t=10: meditation done → omi runs (pri=5 vs research now at 7-1.0=6.0)
t=15: omi done → research runs (now at 7-1.5=5.5, aged enough)

t=0:  meditation starts
t=3:  VOICE CONNECTS → meditation finishes (can't preempt), scheduler pauses
t=30: VOICE DISCONNECTS → omi aged to 5-3.0=2.0, research to 7-3.0=4.0
      omi runs first (higher effective priority from aging)
```

### Config-Driven Agent Discovery

**Directory:** `~/.her-os/annie/agents/`

```yaml
# ~/.her-os/annie/agents/meditation_daily.yaml
name: meditation_daily
budget: medium
priority: 5
schedule:
  kind: every
  interval: 24h
  anchor: "21:00 IST"
system_prompt: |
  You are Annie reflecting on Rajesh's day.
  Review the emotional arc, key conversations, and promises.
  Write a 2-3 sentence journal entry.
user_message: "Reflect on today."
context_sources:
  - daily_summary
  - emotion_arc
  - emotional_peak
  - promises_due
  - entities_today
on_complete: write_journal_entry
```

**Discovery pipeline (matching OpenClaw's skill scan):**
1. Scan `~/.her-os/annie/agents/*.yaml` on startup
2. Parse YAML → validate via Pydantic `AgentDefinition` model
3. Invalid files → log warning, skip (never crash)
4. Watch directory for changes → rescan with 2s debounce
5. Feed valid definitions to AgentScheduler
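Steps 2 and 3 can be sketched with a stdlib stand-in for the real Pydantic `AgentDefinition` (field set abbreviated; the unknown-tier `ValueError` matches issue 5's resolution):

```python
import logging
from dataclasses import dataclass, field

VALID_TIERS = {"nano", "small", "medium", "large", "xl"}

@dataclass
class AgentDefinition:
    """Stand-in for the Pydantic model; only a few fields shown."""
    name: str
    budget: str
    priority: int = 5
    context_sources: list = field(default_factory=list)

    def __post_init__(self):
        if self.budget not in VALID_TIERS:
            raise ValueError(f"unknown budget tier: {self.budget!r}")

def load_definitions(raw_docs: dict[str, dict]) -> list[AgentDefinition]:
    """Validate each parsed YAML mapping; invalid files are logged and
    skipped, never fatal (step 3 of the pipeline)."""
    out = []
    for path, doc in raw_docs.items():
        try:
            out.append(AgentDefinition(**doc))
        except (TypeError, ValueError) as exc:
            logging.warning("skipping agent definition %s: %s", path, exc)
    return out
```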

**Context source registry (extensible):**
```python
CONTEXT_SOURCES = {
    "daily_summary":   fetch_daily_summary,     # GET /v1/daily
    "emotion_arc":     fetch_emotion_arc,        # GET /v1/emotions/arc
    "emotional_peak":  fetch_emotional_peak,     # GET /v1/emotional-peak
    "promises_due":    fetch_promises_due,        # GET /v1/promises/due
    "entities_today":  fetch_entities_today,      # GET /v1/entities
    # New sources registered here — no YAML changes needed
}

COMPLETION_CALLBACKS = {
    "write_journal_entry":     write_journal_entry,
    "update_workspace_memory": update_workspace_memory,
    "send_telegram":           send_telegram_notification,
}
```

### Pre-Mortem: How This Fails

| # | Failure | Category | Likelihood | Impact | Mitigation |
|---|---------|----------|------------|--------|------------|
| 1 | Token estimation undercounts → prompt exceeds 32K | Resource | HIGH | vLLM error | 1.5x safety margin + post-assembly validation |
| 2 | Circular import agent_context ↔ server | Temporal | HIGH | ImportError | Callback injection: runner receives `is_voice_active` as callable |
| 3 | Voice starts during background LLM call | Temporal | MEDIUM | TTFT spike ~10s | Running agent completes, scheduler pauses for voice |
| 4 | YAML syntax error → agent never fires | Silent | HIGH | Scheduled agent missing | Pydantic validation, log warning, health endpoint |
| 5 | changelog.txt unbounded growth | Resource | MEDIUM | Disk fills | Rotation at 1000 lines |
| 6 | Two writers on MEMORY.md | Temporal | CRITICAL | Data loss | Single writer: workspace_io owns ALL writes |
| 7 | timeout goes negative | Temporal | HIGH | ValueError | `max(1.0, timeout - waited)` guard |
| 8 | Unknown context_source in YAML | Silent | MEDIUM | No context | Validate on scan, skip unknown, log warning |
| 9 | Scheduler fires during another agent's call | Temporal | MEDIUM | Queue backup | Priority queue absorbs; max depth prevents unbounded growth |
| 10 | Watchdog fires on every keystroke during YAML edit | Resource | LOW | CPU spike | 2s debounce |

### Dashboard Integration

**New creature:** `librarian` — earthy brown, zone=acting, process=workspace-evolution

**New API endpoints:**

| Endpoint | Method | Returns |
|----------|--------|---------|
| `GET /v1/workspace/files` | GET | Current content of all workspace files |
| `GET /v1/workspace/changelog` | GET | Last N changelog entries |
| `GET /v1/agents/status` | GET | Lanes, queue depths, active agents, scheduled jobs |
| `GET /v1/agents/definitions` | GET | All discovered YAML agent definitions |
| `POST /v1/agents/trigger` | POST | Manually trigger a named agent |

**Agent observability events (emitted automatically by `run_agent()`):**

```python
# Start event:
emit_event("minotaur", "start", data={
    "agent_name": spec.name,
    "budget_tier": spec.budget,
    "priority": spec.priority,
    "queue_depth": runner.queue_depth(),
    "estimated_tokens": estimated,
})

# Complete event:
emit_event("minotaur", "complete", data={
    "agent_name": spec.name,
    "tokens_in": estimated_input,
    "tokens_out": len(result) // 3,
    "budget_used_pct": pct,
    "wait_time_s": wait,
    "duration_s": elapsed,
    **spec.metadata,
})
```

---

## Phase 2C: Agent Inspection

**Status:** Implemented (Session 355)

Phase 2C adds observability and debugging tools for the agent runtime.

### Backend: Agent Run Summaries

- `AgentRunSummary` dataclass stores privacy-safe snapshots of completed agent runs
- `OrderedDict` ring buffer (max 50 runs) on `AgentRunner` with O(1) lookup by `run_id`
- `_categorize_for_summary()` classifies messages into role/category/tokens/preview (200-char max)
- Separate `gate_wait_s` (voice-priority wait) and `duration_s` (total wall-clock) fields
- `time.time()` for display timestamps, `time.monotonic()` for duration measurements
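The ring buffer is a few lines around `OrderedDict` (class name hypothetical; the real store holds `AgentRunSummary` instances rather than dicts):

```python
from collections import OrderedDict

MAX_RUNS = 50

class RunSummaryBuffer:
    """OrderedDict keeps insertion order, so evicting the oldest run is
    popitem(last=False); lookup by run_id stays O(1)."""

    def __init__(self, max_runs: int = MAX_RUNS):
        self._runs: OrderedDict[str, dict] = OrderedDict()
        self._max = max_runs

    def add(self, run_id: str, summary: dict) -> None:
        self._runs[run_id] = summary
        while len(self._runs) > self._max:
            self._runs.popitem(last=False)  # drop the oldest run

    def get(self, run_id: str):
        return self._runs.get(run_id)
```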

### API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/v1/agents/runs` | GET | List recent run summaries (no components) |
| `/v1/agents/runs/{run_id}` | GET | Run detail with categorized components |
| `/v1/agents/trigger` | POST | Fire-and-forget (returns immediately with `status: queued`) |

- Trigger has 10s cooldown per agent name to prevent spam
- `gate_status` field added to `/v1/agents/status` (idle/executing/gated)

### Agent Discovery: `group` Field

- New `group: str` field on `AgentDefinition` (default: "ungrouped")
- `ConfigDict(extra='ignore')` prevents Pydantic validation errors during the deployment window
- `field_validator` normalizes group values (lowercase, hyphenated)
- Group passed through `AgentSpec.metadata["group"]` to run summaries

### Agent Inspector UI

- Standalone page at `/agent-inspector.html` (same pattern as context-inspector)
- System font stack (no external fonts), inline CSS+JS, localStorage token persistence
- 4 sections: Runtime Status (lane cards), Agent Definitions (trigger buttons), Recent Runs (grouped by group), Run Detail (accordion with token budget bar)
- 10s polling with exponential backoff (10→20→40→max 120s), stops when tab hidden
- Keyboard accessible: tabindex, Enter/Space on interactive elements, Escape closes detail
- ARIA: `aria-live="polite"` on status, `aria-expanded` on accordions

### Dashboard Integration

- Link to Agent Inspector added in settings popover
- Workspace panel bugs fixed: proper auth token import, URL from `getAnnieBase()`, XSS escaping, 30s fetch cache, graceful "Annie unavailable" error
- Escape key and Shift+C (clearAll) now close workspace panel
