# Next Session: Annie Unified Capability Registry — DONE (Session 410)

## What

Implement a unified capability registry with two components:
1. **Capability Manifest** (`/v1/capabilities` endpoint) — read-only aggregation of ALL 11 capability groups
2. **Dispatch Dict** (`tool_dispatch.py`) — thin O(1) dispatch replacing the 3 if-chains in text_llm.py

## Why

Rajesh asked (session 403): "We need a registry where we can register all these kinds of skills and tools." Annie has 27+ tools across 4 files with 3 dispatch if-chains, plus 8 background/scheduled capabilities invisible to any introspection. Two rounds of adversarial review (38 issues total) rejected the full ToolRegistry approach as over-engineered. This simpler design was the result.

## Full Plan

See `~/.claude/plans/dazzling-knitting-haven.md` for the complete plan with:
- Architecture (capability_manifest.py + tool_dispatch.py)
- All 11 capability groups mapped
- Pre-mortem failure analysis
- Full adversarial review trail (Round 1: 16 issues, Round 2: 22 issues)

## Annie's Full Capability Surface (11 groups)

| # | Group | Type | Count | Where |
|---|-------|------|-------|-------|
| 1 | **LLM Tools (text)** | tool_use | 15-28 | text_llm.py CLAUDE_TOOLS |
| 2 | **LLM Tools (phone)** | tool_use | 5 | phone_tools.py |
| 3 | **Browser Automation** | tool_use (gated) | 10 | browser_agent_tools.py |
| 4 | **Sub-agents** | tool_use → LLM call | 3 | subagent_tools.py |
| 5 | **Router Monitoring** | background + tool | alerts, anomaly, reports | router_monitor.py, network_anomaly.py |
| 6 | **Proactive Pulse** | scheduled (15min) | 2-stage triage | proactive_pulse.py |
| 7 | **Meditation** | scheduled (daily/weekly/monthly) | 3 meditations | meditation.py |
| 8 | **Context Engine** | always-on service | extraction, nudges, reflections | services/context-engine/ |
| 9 | **Emotion Recognition** | pipeline sidecar | SER + compound emotions | services/ser-pipeline/ |
| 10 | **Telegram Bot** | event-driven | briefings, approvals, buttons | services/telegram-bot/ |
| 11 | **Omi Ambient Watcher** | polling (30s) | transcript → memory | omi_watcher.py |

## Implementation (what to build)

### 1. `services/annie-voice/capability_manifest.py` (~120 lines)

Read-only manifest builder. Aggregates from existing sources:
- `CLAUDE_TOOLS` list → tool names, descriptions, groups, channels
- `PHONE_OPENAI_TOOLS` → phone tool subset
- `SkillLoader.get_catalog()` → skills (if any)
- `AgentDiscovery.scan()` → scheduled agents
- `_BACKGROUND_SERVICES` hardcoded list → 13 background/pipeline services

Enrichment tables (`_TOOL_GROUPS`, `_TOOL_CHANNELS`, `_BACKGROUND_SERVICES`) provide metadata without changing any existing code.

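A minimal sketch of how the enrichment tables could be joined onto the existing tool schemas without touching them. Only `CLAUDE_TOOLS`, `_TOOL_GROUPS`, `_TOOL_CHANNELS`, and `_BACKGROUND_SERVICES` are names from the plan; the `build_manifest` name, the table contents, and the output shape are assumptions made for illustration.

```python
from typing import Any, Optional

# Stand-in for the real CLAUDE_TOOLS list in text_llm.py (hypothetical entries).
CLAUDE_TOOLS: list[dict[str, Any]] = [
    {"name": "search_memory", "description": "Search stored memories."},
    {"name": "web_search", "description": "Search the web."},
]

# Enrichment tables: metadata keyed by tool name, kept outside the schemas.
_TOOL_GROUPS = {"search_memory": "llm_tools_text", "web_search": "llm_tools_text"}
_TOOL_CHANNELS = {"search_memory": ["text", "voice"], "web_search": ["text"]}
_BACKGROUND_SERVICES = [
    {"name": "proactive_pulse", "group": "proactive_pulse", "type": "scheduled"},
]


def build_manifest(group: Optional[str] = None) -> dict[str, Any]:
    """Return a JSON-serializable manifest, optionally filtered to one group."""
    entries = [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "group": _TOOL_GROUPS.get(tool["name"], "ungrouped"),
            "channels": _TOOL_CHANNELS.get(tool["name"], []),
            "type": "tool_use",
        }
        for tool in CLAUDE_TOOLS
    ]
    # Background services are plain metadata dicts; copy them so callers
    # cannot mutate the module-level table through the manifest.
    entries += [dict(service) for service in _BACKGROUND_SERVICES]
    if group is not None:
        entries = [entry for entry in entries if entry["group"] == group]
    return {"count": len(entries), "capabilities": entries}
```

Because every value is a string, list, or dict, the result can be passed to `json.dumps` directly, which is the property the review asked for under Missing 4 and SEC-1.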
### 2. `services/annie-voice/tool_dispatch.py` (~80 lines)

Thin dispatch dict replacing 3 if-chains:
```python
from typing import Callable

_HANDLERS: dict[str, Callable] = {}  # tool name -> handler (sync or async)
def register(name: str, handler: Callable) -> None: ...
async def dispatch(name: str, args: dict, user_message: str = "") -> str: ...
```
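The signatures above could be filled in roughly as follows. This is a sketch under stated assumptions, not the final module: the `(args, user_message)` handler calling convention, the `str()` coercion of results, and the two demo handlers are hypothetical. The `ValueError` for unknown tools and the use of `inspect.isawaitable()` follow the plan.

```python
import asyncio
import inspect
from typing import Any, Callable

_HANDLERS: dict[str, Callable[..., Any]] = {}


def register(name: str, handler: Callable[..., Any]) -> None:
    """Map a tool name to its handler (sync or async)."""
    _HANDLERS[name] = handler


async def dispatch(name: str, args: dict, user_message: str = "") -> str:
    """Look up the handler in O(1) and run it, awaiting the result if needed."""
    try:
        handler = _HANDLERS[name]
    except KeyError:
        raise ValueError(f"unknown tool: {name}") from None
    # Assumed calling convention: every handler takes (args, user_message).
    result = handler(args, user_message)
    if inspect.isawaitable(result):
        result = await result
    return str(result)


# Hypothetical demo handlers: one sync, one async.
async def _slow_echo(args: dict, user_message: str) -> str:
    await asyncio.sleep(0)
    return args["text"]


register("echo", lambda args, user_message: args["text"])
register("slow_echo", _slow_echo)
```

Calling `asyncio.run(dispatch("echo", {"text": "hi"}))` goes through the same path for both handlers, so callers never need to know whether a tool is sync or async.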

### 3. Modify `services/annie-voice/text_llm.py` (~70 lines removed, ~30 added)

Replace the 23-branch if-chain in `_dispatch_tool()` with `register()` calls plus a one-line `dispatch()` call. The supervisor stack (`_execute_tool_native`, `_execute_tool_typed`, hooks, adapters, loop detector) stays EXACTLY as-is.

**CRITICAL**: the `execute_python` special case STAYS in `_execute_tool_native()`. It returns a dict, uses a ThreadPoolExecutor, and needs `ExecutePythonAdapter.classify_code()`. Do NOT move it to the dispatch dict.

### 4. Modify `services/annie-voice/server.py` (+20 lines)

Add a `GET /v1/capabilities` endpoint with auth and an optional group filter.

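The endpoint logic can be sketched independently of the web framework, which this document does not name. Everything below is an assumption for illustration: the bearer-token scheme, the `API_TOKEN` constant, the `handle_capabilities` name, the `(status, body)` return shape, and the sample capability entries. The real handler would call the manifest builder and reuse whatever auth `server.py` already applies.

```python
import hmac
from typing import Any, Optional

API_TOKEN = "change-me"  # hypothetical shared secret, for the sketch only

# Hypothetical manifest entries; the real ones come from capability_manifest.py.
_CAPABILITIES: list[dict[str, Any]] = [
    {"name": "search_memory", "group": "llm_tools_text"},
    {"name": "proactive_pulse", "group": "proactive_pulse"},
]


def handle_capabilities(
    authorization: Optional[str], group: Optional[str] = None
) -> tuple[int, dict[str, Any]]:
    """Return (status, body) for GET /v1/capabilities[?group=...]."""
    expected = f"Bearer {API_TOKEN}".encode()
    supplied = (authorization or "").encode()
    # Constant-time comparison avoids leaking how much of the token matched.
    if not hmac.compare_digest(supplied, expected):
        return 401, {"error": "unauthorized"}
    capabilities = _CAPABILITIES
    if group is not None:
        capabilities = [c for c in capabilities if c["group"] == group]
    return 200, {"count": len(capabilities), "capabilities": capabilities}
```

Keeping the logic in a plain function like this makes the group filter and the auth check testable without starting the server.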
### 5. Tests (~180 lines)

- `tests/test_capability_manifest.py` — manifest structure, group filtering, tool count matches CLAUDE_TOOLS
- `tests/test_tool_dispatch.py`: register/dispatch, unknown tool raises `ValueError`, handler key sync with CLAUDE_TOOLS
- **Sync test**: `assert set(tool_dispatch._HANDLERS) >= {t["name"] for t in CLAUDE_TOOLS} - {"execute_python", "paper_to_notebook"}` (both excluded tools stay in `_execute_tool_native()`, per M3 and M4 below)

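A bare superset assertion fails without saying which tool is missing. One way to get a readable failure is a small helper that returns the gap, sketched below. The `missing_handlers` helper, the `_SUPERVISOR_ONLY` name, and the sample data are hypothetical; in the real test the inputs would be `CLAUDE_TOOLS` and the dispatch module's `_HANDLERS`.

```python
from typing import Any, Iterable


def missing_handlers(
    tools: Iterable[dict[str, Any]],
    handlers: Iterable[str],
    excluded: Iterable[str] = (),
) -> list[str]:
    """Names declared in the tool schemas that have no dispatch handler."""
    declared = {tool["name"] for tool in tools}
    return sorted(declared - set(excluded) - set(handlers))


# Hypothetical stand-ins for the real module-level objects.
CLAUDE_TOOLS = [
    {"name": "search_memory"},
    {"name": "web_search"},
    {"name": "execute_python"},
]
_HANDLERS = {"search_memory": object(), "web_search": object()}
_SUPERVISOR_ONLY = {"execute_python"}  # handled in _execute_tool_native()


def test_every_tool_has_a_handler() -> None:
    missing = missing_handlers(CLAUDE_TOOLS, _HANDLERS, _SUPERVISOR_ONLY)
    assert not missing, f"tools without a dispatch handler: {missing}"
```

The check is one-directional on purpose, matching the `>=` in the plan: extra handlers are allowed, missing ones are not.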
## What Stays UNCHANGED (explicit list)

- `CLAUDE_TOOLS` list in text_llm.py (schemas)
- `OPENAI_TOOLS` list (format conversion)
- `_execute_tool_native()` (supervisor: hooks, adapters, loop detector)
- `_execute_tool_typed()` (error routing, ToolResult classification)
- `execute_python` special case in `_execute_tool_native()`
- `phone_tools.py` (self-contained, isolated)
- `browser_agent_tools.py` and `execute_browser_tool()` (called from dispatch handler)
- `tool_adapters.py` (centralized classification)
- `TOOLS.md` (hand-written prompt engineering)
- `_TOOL_INTENT` regex patterns
- All sensitivity gating logic (stays in callers)
- `prompt_builder.py`

## Adversarial Review Findings — ALL MUST BE ADDRESSED (NO DEFERRALS)

### Round 1 (16 issues — all resolved in plan design)

| # | Severity | Issue | Resolution |
|---|----------|-------|------------|
| C1 | CRITICAL | Handler signatures incompatible | Inline lambdas in register() calls, not ToolEntry.unpack |
| C2 | CRITICAL | annie-core can't be imported by annie-voice | Everything in annie-voice/ |
| C3 | CRITICAL | Sensitivity gating dropped | Sensitivity stays in callers (_execute_tool_native), not dispatch |
| H1 | HIGH | Phase 1 breaks tests | Only new files; no existing file touched until dispatch migration |
| H2 | HIGH | TOOLS.md auto-generation downgrade | TOOLS.md stays hand-written |
| H3 | HIGH | Eager imports break feature gates | No eager imports; dispatch handlers use lambdas with lazy imports |
| H4 | HIGH | Phone isolation broken | phone_tools.py unchanged |
| M1 | MEDIUM | Thread safety | dispatch dict populated at module load (import lock); no Lock needed |
| M2 | MEDIUM | Adapter audit harder | tool_adapters.py unchanged, stays centralized |
| M3 | MEDIUM | execute_python returns dict | execute_python stays in _execute_tool_native, NOT in dispatch dict |
| M4 | MEDIUM | paper_to_notebook orchestration | paper_to_notebook stays in _execute_tool_native, NOT in dispatch dict |
| M5 | MEDIUM | Format differences are caller concerns | Correct — callers keep _TOOL_INTENT, context_management, etc. |
| M6 | MEDIUM | Skills incompatible with ToolEntry | No ToolEntry — manifest reads SkillLoader directly |
| L1 | LOW | Default channels too broad | Channel metadata is in manifest only, not in dispatch |
| L2 | LOW | Over-engineering | Dramatically simplified: 2 files instead of full registry |
| L3 | LOW | Creature field coupling | No creature field — removed |

### Round 2 Architecture (9 issues — all addressed by simplification)

| # | Issue | Resolution |
|---|-------|------------|
| Flaw 1 | search_memory sensitivity_max not injected by dispatch | Sensitivity stays in caller; dispatch lambda reads _current_channel |
| Flaw 2 | execute_python can't fit unpack pattern | execute_python excluded from dispatch dict |
| Flaw 3 | tool_definitions.py god module | No tool_definitions.py — register() calls inline in text_llm.py |
| Flaw 4 | frozen dataclass vs runtime gating | No dataclass — plain dict |
| Missing 1 | _TOOL_INTENT not synced | Intent regexes unchanged; manifest includes tool names for cross-check |
| Missing 2 | Background capabilities not representable | Manifest includes ALL 11 groups including background |
| Missing 3 | asyncio.iscoroutine fragile | Use `inspect.isawaitable()` instead |
| Missing 4 | manifest() crashes on Callable serialization | No Callable in manifest — only metadata strings |
| Missing 5 | Secret scrubbing gap | Scrubbing unchanged — stays where it is |

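To illustrate why Missing 3 prefers `inspect.isawaitable()`: an object can be awaitable without being a coroutine, and `asyncio.iscoroutine()` returns `False` for it. The `Deferred` class and the `call` helper below are hypothetical and exist only to demonstrate the difference.

```python
import asyncio
import inspect
from typing import Any, Callable


class Deferred:
    """An awaitable that is not a coroutine: it only defines __await__."""

    def __init__(self, value: Any) -> None:
        self._value = value

    async def _resolve(self) -> Any:
        return self._value

    def __await__(self):
        # Delegate to the iterator of a real coroutine.
        return self._resolve().__await__()


async def call(handler: Callable[..., Any], *args: Any) -> Any:
    """Run a handler and await its result only if it is awaitable."""
    result = handler(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


deferred = Deferred("x")
# asyncio.iscoroutine misses this object; inspect.isawaitable catches it.
checks = (asyncio.iscoroutine(deferred), inspect.isawaitable(deferred))
```

A dispatcher that tested with `asyncio.iscoroutine()` would hand the un-awaited `Deferred` object back to the caller instead of its value.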
### Round 2 Code Quality (15 issues — all addressed by simplification)

| # | Issue | Resolution |
|---|-------|------------|
| BUG-1 | execute_python unpack impossible | Excluded from dispatch dict |
| BUG-2 | _current_channel race survives | Lambdas read _current_channel at call time; race was pre-existing |
| BUG-3 | search_memory/get_entity_details sensitivity_max dropped | Dispatch lambdas explicitly pass `sensitivity_max=_sensitivity_for_channel(_current_channel)` |
| BUG-4 | sync/async handler detection fragile | Use `inspect.isawaitable()` not `asyncio.iscoroutine()` |
| SEC-1 | manifest exposes handler function names | Manifest has no handler references — only strings |
| SEC-2 | Runtime env var changes ignored | Same as current behavior; restart required |
| SEC-3 | unpack lambdas capture mutable refs | Lambdas read module globals at call time (late binding) |
| SEC-4 | channel defaults to "text" | dispatch() has no channel param — sensitivity stays in caller |
| MAINT-1 | 6 sources of truth remain | Manifest aggregates all sources into one endpoint; dispatch dict is ONE of the sources |
| MAINT-2 | tool_definitions.py untestable | No separate file — register() calls inline in text_llm.py |
| MAINT-3 | Channel enum N-way sync | No Channel enum — channel metadata only in manifest |
| MAINT-4 | Registry dispatch bypasses supervisor | Dispatch dict is called BY _dispatch_tool(), which is called BY _execute_tool_native() — supervisor intact |
| PERF-1 | tools_for_channel iterates + reads env per request | No per-request filtering — CLAUDE_TOOLS list unchanged, built once at import |
| PERF-2 | Import-time singleton breaks test fixtures | dispatch dict populated at module load; tests can patch _HANDLERS directly |
| PERF-3 | to_claude_format rebuilds list per call | CLAUDE_TOOLS list unchanged — zero allocation |

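For PERF-2, patching `_HANDLERS` in a test can be done with `unittest.mock.patch.dict`, which restores the original entries when the block exits. The sketch below uses a reduced stand-in for `dispatch()` that only handles sync handlers, and a hypothetical `web_search` handler.

```python
import asyncio
from unittest.mock import patch

# Stand-in for the module-level dict populated at import time.
_HANDLERS = {"web_search": lambda args, user_message: "real result"}


async def dispatch(name: str, args: dict, user_message: str = "") -> str:
    # Reduced stand-in for tool_dispatch.dispatch (sync handlers only).
    return str(_HANDLERS[name](args, user_message))


def test_dispatch_with_patched_handler() -> None:
    def fake(args: dict, user_message: str) -> str:
        return "fake result"

    # patch.dict swaps the entry for the duration of the block only.
    with patch.dict(_HANDLERS, {"web_search": fake}):
        assert asyncio.run(dispatch("web_search", {})) == "fake result"
    # On exit the original handler is back, so tests do not leak into each other.
    assert asyncio.run(dispatch("web_search", {})) == "real result"
```

This works because `dispatch()` looks the handler up in the dict on every call rather than caching it.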
## Start Command

```
Implement the Annie Unified Capability Registry. Read docs/NEXT-SESSION-CAPABILITY-REGISTRY.md for full context, plan, and adversarial review findings. Also read ~/.claude/plans/dazzling-knitting-haven.md for architecture details.

IMPORTANT: ALL 38 adversarial review findings are listed in the doc. Every single one MUST be addressed in implementation — zero deferrals. The doc explicitly shows the resolution for each.

Implementation order:
1. tool_dispatch.py (thin dispatch dict, ~80 lines)
2. Modify text_llm.py (replace _dispatch_tool if-chain with register + dispatch)
3. capability_manifest.py (read-only manifest, ~120 lines)
4. Modify server.py (add /v1/capabilities endpoint)
5. Tests (dispatch sync test + manifest tests)
6. Deploy and verify: GET /v1/capabilities returns all 11 groups
```
