# COMPLETE: Annie's Registry vs open-multi-agent ToolRegistry

**Status:** IMPLEMENTED (Session 412, 2026-04-04)
**Implementation:** ToolSpec migration complete. All 31 tools migrated atomically.

## Context

Session 410 implemented Annie's capability registry (dispatch dict + manifest). Rajesh asked: "How would Claude Code do it? Would it have a registry?" and pointed to `vendor/open-multi-agent/`.

Session 411 completed the full comparison: 3 exploration agents mapped both architectures, a Plan agent designed recommendations, and 2 adversarial reviewers stress-tested the proposed ToolSpec implementation (27 findings, all resolved).

## Architecture Summary

### Annie (Python) — 3 independent tool paths
- **Voice** (`bot.py`): Pipecat `FunctionSchema` → `llm.register_function()` → direct async handlers
- **Text** (`text_llm.py`): `CLAUDE_TOOLS` list (hand-written JSON Schema) → `tool_dispatch.register()` → O(1) dict dispatch → `tool_adapters.classify_tool_result()`
- **Manifest** (`capability_manifest.py`): Read-only JSON with enrichment tables → `GET /v1/capabilities`

### open-multi-agent (TypeScript) — 1 path
- `defineTool()` with Zod → `ToolRegistry` (Map) → `ToolExecutor` (Zod validation + semaphore) → `AgentRunner` loop with `allowedTools` filter

**Key structural difference:** Annie is a single-agent system with 3 output formats (Claude API, OpenAI API, Pipecat). open-multi-agent is a multi-agent orchestration framework where each agent gets its own fresh registry.

## Comparison Table

| Aspect | Annie (Python) | open-multi-agent (TypeScript) | Verdict |
|--------|---------------|-------------------------------|---------|
| Tool definition | `CLAUDE_TOOLS` list (raw dicts) | `defineTool()` with Zod schema | **ADAPT** → ToolSpec |
| Registry | Flat dispatch dict (`_HANDLERS`) | `ToolRegistry` class (Map) | **SKIP** — module-level dict works |
| Dispatch | `tool_dispatch.dispatch()` | `ToolExecutor.execute()` | **SKIP** — Annie's is simpler, sufficient |
| Input validation | None (trusts LLM, `.get()` defaults) | Zod `safeParse` before execute | Future: Pydantic `model_validate()` |
| Concurrency | Sequential per round (asyncio) | Semaphore-bounded parallel (4) | **SKIP** — Annie uses asyncio.Semaphore where needed |
| Dynamic add/remove | Not supported | `addTool()` / `removeTool()` | **SKIP** — single agent, no use case |
| Duplicate handling | Silent overwrite (idempotent) | Throws Error (strict) | **SKIP** — hot-reload safety |
| Error handling | 5-state ToolResult + per-tool adapters | Binary `{data, isError}` | **Annie wins** |
| Schema format | Hand-written JSON Schema | Auto-generated from Zod | **ADAPT** → Pydantic `model_json_schema()` |
| Per-agent filtering | Channel-based sensitivity | `allowedTools` whitelist | Different needs, both valid |
| Loop detection | LoopDetector (repeat, no-progress, freq) | `maxTurns` blunt cap only | **Annie wins** |
| Error recovery | ErrorRouter (fallback chains per type) | Returns error to LLM with no recovery guidance | **Annie wins** |
| Hook system | HookRegistry (before/after, blocking) | `onToolCall` observational only | **Annie wins** |
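
The Concurrency row contrasts sequential execution with semaphore-bounded parallelism. The sketch below shows what a bound of 4 looks like with `asyncio.Semaphore`; `run_bounded` and `fake_tool` are hypothetical names used only for illustration and are not taken from either codebase.

```python
import asyncio


async def run_bounded(calls, limit=4):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(call):
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await call()

    # gather() preserves the order of `calls` in its result list.
    return await asyncio.gather(*(guarded(c) for c in calls))


async def _demo():
    in_flight = 0
    peak = 0

    async def fake_tool():
        # Hypothetical stand-in for a tool call; tracks peak concurrency.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "ok"

    results = await run_bounded([fake_tool] * 10, limit=4)
    return results, peak


RESULTS, PEAK = asyncio.run(_demo())
```

Because the bound lives in one small helper, it can be applied only to the call sites that need it, which matches the "where needed" verdict in the table.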

## Verdicts

### Q1: Should Annie adopt `defineTool()`? — ADAPT

**Create a frozen `ToolSpec` dataclass** that co-locates name, description, handler, Pydantic input model, group, and channels. This eliminates the 4-place fragmentation (`CLAUDE_TOOLS` + `_register_tool()` + `_TOOL_GROUPS`/`_TOOL_CHANNELS` + `ADAPTERS`). Use Pydantic `model_json_schema()` with `Field(description="...")` to auto-generate schemas.

**Critical constraints** (from adversarial review):
- MUST use `Field(description="...")` for every property — Pydantic drops descriptions otherwise (BUG-2)
- MUST strip `title` from properties — Pydantic adds it, LLMs don't need it (FLAW-C1)
- MUST unwrap `anyOf` for optional types — Pydantic emits `anyOf: [{type: T}, {type: null}]` instead of `{type: T}` (BUG-1)
- MUST migrate ALL 28 tools atomically — partial migration (8/28) creates permanent dual-pattern worse than current state (FLAW-D3)
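
The first three constraints amount to a post-processing pass over the generated schema. The following is a minimal sketch of such a pass, not the code in `tool_spec.py`: `clean_schema`, `missing_descriptions`, and the `RAW` dict are hypothetical, and `RAW` is hand-written to mimic the shape Pydantic v2 typically emits for an optional field so that the example runs without Pydantic installed.

```python
from copy import deepcopy
from typing import Any


def clean_schema(raw: dict[str, Any]) -> dict[str, Any]:
    """Normalize a Pydantic-style JSON Schema for use as LLM tool input."""
    schema = deepcopy(raw)  # never mutate the caller's schema
    schema.pop("title", None)  # model-level title
    for prop in schema.get("properties", {}).values():
        prop.pop("title", None)  # per-property title (FLAW-C1)
        variants = prop.get("anyOf")
        if variants:
            non_null = [v for v in variants if v.get("type") != "null"]
            # Unwrap only the simple Optional[T] case (BUG-1):
            # exactly one non-null variant plus a null variant.
            if len(non_null) == 1 and len(non_null) < len(variants):
                prop.pop("anyOf")
                for key, value in non_null[0].items():
                    prop.setdefault(key, value)
    return schema


def missing_descriptions(schema: dict[str, Any]) -> list[str]:
    """Names of properties that lack a description (guards against BUG-2)."""
    props = schema.get("properties", {})
    return sorted(name for name, prop in props.items() if not prop.get("description"))


# Hand-written stand-in for generated output; field names are illustrative.
RAW = {
    "title": "SearchInput",
    "type": "object",
    "properties": {
        "query": {"title": "Query", "type": "string", "description": "Search terms"},
        "limit": {
            "title": "Limit",
            "anyOf": [{"type": "integer"}, {"type": "null"}],
            "default": None,
            "description": "Maximum number of results",
        },
    },
    "required": ["query"],
}

cleaned = clean_schema(RAW)
```

In this sketch `cleaned["properties"]["limit"]` carries a plain `"type": "integer"` with no `anyOf` or `title`, and `missing_descriptions(cleaned)` returns an empty list. A check like `missing_descriptions` could run in a unit test over every tool so that a forgotten `Field(description=...)` fails fast.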

### Q2: Should Annie adopt runtime add/remove? — SKIP

Single-agent system. Feature gating at import time (`BROWSER_AGENT_ENABLED`, `TWF_ENABLED`) is simpler and avoids thread-safety concerns during streaming responses. Agents are already lazy-loaded (session 410).
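
A minimal sketch of import-time gating follows. It assumes the flags are read from environment variables, which the source does not state; the tool names and the `flag_enabled` and `build_tool_names` helpers are hypothetical.

```python
import os
from typing import Mapping


def flag_enabled(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Treat "1", "true", "yes", "on" (case-insensitive) as enabled."""
    return environ.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical tool names, for illustration only.
CORE_TOOLS = ["web_search", "save_note"]
GATED_TOOLS = {
    "BROWSER_AGENT_ENABLED": ["browse_page"],
    "TWF_ENABLED": ["twf_lookup"],
}


def build_tool_names(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Core tools plus the tools of every gate that is switched on."""
    names = list(CORE_TOOLS)
    for flag, tools in GATED_TOOLS.items():
        if flag_enabled(flag, environ):
            names.extend(tools)
    return names


# Evaluated once at import; nothing mutates the tool set afterwards,
# so a streaming response never observes a half-updated registry.
TOOL_NAMES = tuple(build_tool_names())
```

Passing the environment as a parameter keeps the gating logic testable without patching `os.environ`.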

### Q3: Should Annie adopt strict duplicate-throws? — SKIP

Silent overwrite was deliberately chosen for uvicorn `--reload` hot-reload safety. The sync tests (`test_all_claude_tools_registered` + `test_no_extra_handlers`) already catch the bugs that strict registration would catch.

### Q4: Should dispatch dict become a class? — SKIP

Originally planned as ADAPT, **revised to SKIP** after adversarial review:
- MAINT-4: A class wrapping a 55-line module that immediately creates a `_default` singleton adds indirection without real value.
- FLAW-B4: `importlib.reload(text_llm)` in TWF tests won't clear singleton state — ghost registrations persist.
- FLAW-B3: Tests import `_HANDLERS` directly — class refactor breaks these imports.

### Q5: Is the ToolExecutor separation valuable? — SKIP

Annie already has a RICHER separation than open-multi-agent's ToolExecutor:
- `tool_dispatch.py` = what exists + routing (like ToolRegistry)
- `_execute_tool_native()` = execution + error classification (like ToolExecutor + more)
- `_execute_tool_typed()` = supervisor orchestration (LoopDetector, ErrorRouter, HookRegistry) — **not in open-multi-agent at all**
- `tool_adapters.py` = per-tool error semantics with 5-state ToolResult — **not in open-multi-agent at all**

## Patterns Annie Has That open-multi-agent Lacks

| Pattern | Annie | open-multi-agent |
|---------|-------|-------------------|
| **ToolResult richness** | 5 states: SUCCESS, PARTIAL, ERROR_TRANSIENT, ERROR_PERMANENT, ERROR_BLOCKED + error_type + alternatives | 2 states: `{data, isError}` |
| **Per-tool error classification** | 8 adapter classes in `tool_adapters.py` — each tool's error semantics encoded separately | All errors treated identically |
| **LoopDetector** | 3 detection patterns: repeat, no-progress, frequency | Only `maxTurns` blunt cap |
| **ErrorRouter** | Static fallback chains per error type (e.g., http_403 → try_alt_url → use_snippet → report) | Error returned to LLM with no guidance |
| **HookRegistry with blocking** | Before-tool hooks can reject tool calls (SSRF blocking, audit) | `onToolCall` is observational only, cannot block |
| **Channel sensitivity gating** | Note tools check channel sensitivity before executing | No multi-channel concept |

## Adversarial Review Summary

27 findings from 2 adversarial reviewers. Key categories:
- **7 showstoppers** that changed the plan (BUG-1, BUG-2, BUG-3, FLAW-A4, FLAW-B2, FLAW-B3, FLAW-D3)
- **11 accepted** and incorporated into implementation plan
- **7 rejected** with reasoning (decorator alternative, TypedDict, ToolDispatch class)
- **2 informational** (pre-existing issues, not regressions)

Full adversarial review with dispositions: `~/.claude/plans/typed-juggling-whale.md`

## Tool Inventory (41 tools)

| Category | Count | Registration |
|----------|-------|-------------|
| Text-path tools (`TOOL_SPECS` → `CLAUDE_TOOLS`) | 31 | ToolSpec + auto-derived `CLAUDE_TOOLS` |
| Voice-only (Pipecat) | 9 | `llm.register_function()` |
| Visual (`render_*`) | 3 | `llm.register_function()` after `PipelineTask` |
| MCP browser | 2 | Conditional on `BROWSER_MCP_ENABLED` |
| **Feature-gated** | 15 | 5 gates: `BROWSER_AGENT`, `TWF`, `CREMEITALIA`, `ROUTER_MONITOR`, `BROWSER_MCP` |

**Metadata gaps found:** 10 tools missing adapters, 16 missing groups, 15 missing channels. Fixed by ToolSpec migration (all metadata becomes required fields).
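
The sketch below shows how required fields close these gaps, under stated assumptions. It is not the contents of `tool_spec.py`: the real ToolSpec holds a Pydantic input model, whereas this version takes a plain `input_schema` dict so it runs with the standard library only, and the `get_weather` tool, its group, and its channel values are hypothetical. The two output shapes follow the public Claude and OpenAI tool-definition formats.

```python
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class ToolSpec:
    # No field has a default, so omitting any piece of metadata
    # raises TypeError at construction time.
    name: str
    description: str
    handler: Callable[[dict[str, Any]], Awaitable[Any]]
    input_schema: dict[str, Any]  # assumption: stands in for the Pydantic model
    group: str
    channels: tuple[str, ...]

    def to_claude_schema(self) -> dict[str, Any]:
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.input_schema,
        }

    def to_openai_schema(self) -> dict[str, Any]:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.input_schema,
            },
        }


async def get_weather(args: dict[str, Any]) -> str:
    # Hypothetical handler used only for this example.
    return f"Weather for {args.get('city', 'unknown')}"


TOOL_SPECS = [
    ToolSpec(
        name="get_weather",
        description="Look up the current weather for a city.",
        handler=get_weather,
        input_schema={
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
        group="information",
        channels=("voice", "text"),
    ),
]

# Every downstream table is derived from the specs, never hand-maintained.
CLAUDE_TOOLS = [spec.to_claude_schema() for spec in TOOL_SPECS]
GROUP_BY_TOOL = {spec.name: spec.group for spec in TOOL_SPECS}
```

Because `group` and `channels` have no defaults, a tool with a missing group or channel can no longer be declared at all, which is the property the migration relies on.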

## What Was Read

### open-multi-agent (all fully read)
- `src/tool/framework.ts` — ToolRegistry + defineTool + zodToJsonSchema
- `src/tool/executor.ts` — ToolExecutor with Zod validation + semaphore
- `src/tool/built-in/index.ts` — Built-in tool registration pattern
- `src/agent/agent.ts` — Agent with addTool/removeTool
- `src/agent/runner.ts` — Core dispatch loop with allowedTools filtering
- `src/agent/pool.ts` — AgentPool with semaphore-controlled concurrency
- `src/team/team.ts` — Team coordination (no registry ownership)
- `src/orchestrator/orchestrator.ts` — Multi-agent orchestration + pool building
- `src/types.ts` — Shared type definitions

### Annie (all fully read)
- `tool_dispatch.py` — Dispatch dict (55 lines)
- `text_llm.py` — CLAUDE_TOOLS + registration + dispatch
- `capability_manifest.py` — Enrichment tables + manifest builder
- `tool_adapters.py` — Per-tool error classification
- `bot.py` — Voice tool schemas + Pipecat registration
- `tools.py` — Tool implementations
- `visual_tools.py` — Visual output tools with Pydantic models
- `server.py` — FastAPI server + /v1/capabilities endpoint
- `agent_scheduler.py` — Lazy-load sources + dynamic registration

## Implementation Status

**COMPLETE** (Session 412). All 31 text-path tools migrated to ToolSpec.

Files created:
- `services/annie-voice/tool_spec.py` — Frozen dataclass + `to_claude_schema()`/`to_openai_schema()`
- `services/annie-voice/tool_schemas.py` — 31 Pydantic input models with `Field(description=...)`
- `services/annie-voice/tests/test_schema_golden.py` — Golden schema tests (regression safety net)
- `services/annie-voice/tests/test_tool_spec.py` — ToolSpec unit tests

Files modified:
- `services/annie-voice/text_llm.py` — `TOOL_SPECS` list replaces hand-written `CLAUDE_TOOLS` dicts + registration calls
- `services/annie-voice/capability_manifest.py` — `_TOOL_GROUPS`/`_TOOL_CHANNELS` dicts deleted, replaced by ToolSpec lookup
- `services/annie-voice/tool_adapters.py` — 4 missing adapters added (order_from_cremeitalia, network_status, task_status, cancel_task)
- `services/annie-voice/tests/test_tool_dispatch.py` — `known_names` derived from `TOOL_SPECS`
- `services/annie-voice/tests/test_capability_manifest.py` — `_TOOL_GROUPS`/`_TOOL_CHANNELS` refs replaced