# IMPLEMENTED: Full Atomic ToolSpec Migration

## Context

Sessions 410-411 compared Annie's tool registry with open-multi-agent's `ToolRegistry` (`vendor/open-multi-agent/`) and found 4-place fragmentation: adding a tool requires edits to `CLAUDE_TOOLS`, `_register_tool()`, `_TOOL_GROUPS`/`_TOOL_CHANNELS`, and `ADAPTERS`. The inventory found 41 tools with metadata gaps (10 missing adapters, 16 missing groups, 15 missing channels).

Two adversarial reviewers found **27 issues** with the proposed ToolSpec design. All are resolved in this plan. Rajesh chose **Option B: Full Atomic Migration** — migrate ALL 28 text-path tools to ToolSpec in one change.

## Comparison Verdicts (decided)

| # | Question | Verdict |
|---|----------|---------|
| 1 | Adopt `defineTool()` pattern? | **ADAPT** → ToolSpec frozen dataclass |
| 2 | Runtime add/remove? | **SKIP** — single-agent, no use case |
| 3 | Strict duplicate-throws? | **SKIP** — hot-reload safety |
| 4 | Dict → class? | **SKIP** — adds no value (MAINT-4), breaks reload tests (FLAW-B4) |
| 5 | ToolExecutor separation? | **SKIP** — Annie's separation is richer |

## Critical Adversarial Findings (must address)

| ID | Finding | Resolution |
|----|---------|------------|
| **BUG-2** | `model_json_schema()` drops per-property `description` without `Field(description=...)` | MUST use `Field(description="...")` for EVERY property in EVERY Pydantic model |
| **BUG-1** | `Optional[str]` produces `anyOf` instead of `{type: string}` | `to_claude_schema()` strips `anyOf` — unwraps to first non-null type |
| **BUG-3** | `$defs` silently dropped for nested models | `to_claude_schema()` includes `$defs` if present |
| **FLAW-D3** | Partial migration (8/28) creates permanent dual-pattern | Migrate ALL 28 atomically |
| **FLAW-A4** | `to_claude_schema()` raises `ValueError` when `input_model` is missing | All ToolSpecs MUST have `input_model` (required field, not Optional) |
| **FLAW-C1** | `model_json_schema()` leaks `title` field metadata | `to_claude_schema()` strips `title` from all properties |
| **FLAW-B1** | Feature-gated tools: handler registered, schema conditional | ToolSpec gets `gated: bool` field |
| **FLAW-B2** | `phone_tools.py` has 5 duplicate schemas | Document as known debt (separate PR) |
| **GAP-1** | No schema comparison test | Write golden tests BEFORE migration |

Full adversarial review (27 findings with dispositions): `~/.claude/plans/typed-juggling-whale.md`

## Implementation Phases

### Phase 1: Golden Schema Tests FIRST (~30 min)

Write `services/annie-voice/tests/test_schema_golden.py` — one test per tool capturing the EXACT current schema. These tests MUST pass before AND after migration.

```python
from text_llm import CLAUDE_TOOLS

def test_web_search_schema():
    tool = next(t for t in CLAUDE_TOOLS if t["name"] == "web_search")
    assert tool["input_schema"]["properties"]["query"] == {"type": "string", "description": "The search query"}
    assert tool["input_schema"]["required"] == ["query"]
```

28 tests total. Run them, verify all pass against current hand-written schemas.
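Per-tool asserts can also be backed by one whole-set check, so a tool that silently appears or disappears is caught too. A minimal sketch, assuming the golden schemas are captured into a dict keyed by tool name before migration; the helper name `schema_drift` is hypothetical, not existing Annie code:

```python
from __future__ import annotations

import json


def schema_drift(golden: dict[str, dict], current: list[dict]) -> list[str]:
    """Return names of tools whose input_schema no longer matches its golden copy.

    `golden` maps tool name -> input_schema captured before migration;
    `current` is a CLAUDE_TOOLS-style list of {"name", "description", "input_schema"}.
    """
    current_by_name = {tool["name"]: tool["input_schema"] for tool in current}
    drifted = []
    for name, expected in golden.items():
        actual = current_by_name.get(name)
        # Canonical JSON ignores key order but is stricter than dict ==:
        # it tells 1 from 1.0 and True from 1.
        if actual is None or json.dumps(actual, sort_keys=True) != json.dumps(expected, sort_keys=True):
            drifted.append(name)
    # A tool that exists now but was never captured counts as drift as well.
    drifted.extend(sorted(set(current_by_name) - set(golden)))
    return drifted
```

In the golden test file this would be used as `assert schema_drift(GOLDEN, CLAUDE_TOOLS) == []`, where `GOLDEN` is the hypothetical captured dict.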

### Phase 2: Create ToolSpec + Pydantic Models (~1.5 hours)

**NEW `services/annie-voice/tool_spec.py`** (~80 lines):
- Frozen dataclass: `name`, `description`, `handler`, `input_model` (REQUIRED), `group`, `channels` (tuple), `gated` (bool)
- `to_claude_schema()`: calls `model_json_schema()`, strips `title`, unwraps `anyOf`, preserves `$defs`
- `to_openai_schema()`: wraps Claude schema in OpenAI format
- `tool_spec.py` imports ONLY from Pydantic and stdlib — no annie-voice imports (prevents import cycles)
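The `tool_spec.py` bullets can be sketched as below. This is a minimal sketch, not the final module: it uses stdlib only and duck-types `input_model` as anything with a `model_json_schema()` classmethod (every Pydantic v2 `BaseModel` subclass has one). The helper `_clean_schema` and the defaults for `channels` and `gated` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

# Keys whose values map names -> sub-schemas; the names themselves are kept verbatim.
_NAME_MAPS = ("properties", "$defs")
# Keys whose values are data, not schemas; copied untouched.
_LITERAL_KEYS = ("default", "enum", "const", "examples")


def _clean_schema(node: Any) -> Any:
    """Strip Pydantic `title` metadata (FLAW-C1) and unwrap Optional anyOf (BUG-1)."""
    if isinstance(node, list):
        return [_clean_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    if "anyOf" in node:
        non_null = [branch for branch in node["anyOf"] if branch.get("type") != "null"]
        if non_null:
            # Keep the first non-null branch; outer keys (description, default) win.
            outer = {k: v for k, v in node.items() if k != "anyOf"}
            node = {**non_null[0], **outer}
    cleaned: dict[str, Any] = {}
    for key, value in node.items():
        if key == "title":
            continue
        if key in _NAME_MAPS and isinstance(value, dict):
            cleaned[key] = {name: _clean_schema(sub) for name, sub in value.items()}
        elif key in _LITERAL_KEYS:
            cleaned[key] = value
        else:
            cleaned[key] = _clean_schema(value)
    return cleaned


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    handler: Callable[..., Any]
    input_model: Any  # a Pydantic v2 BaseModel subclass; required, no default (FLAW-A4)
    group: str
    channels: tuple[str, ...] = ()
    gated: bool = False

    def to_claude_schema(self) -> dict[str, Any]:
        # $defs (BUG-3) survives because _clean_schema recurses into it instead of dropping it.
        schema = _clean_schema(self.input_model.model_json_schema())
        return {"name": self.name, "description": self.description, "input_schema": schema}

    def to_openai_schema(self) -> dict[str, Any]:
        claude = self.to_claude_schema()
        return {"type": "function", "function": {
            "name": claude["name"],
            "description": claude["description"],
            "parameters": claude["input_schema"],
        }}
```

Stripping `title` by key name alone would also delete a property that happens to be named `title`; treating `properties` and `$defs` as name maps avoids that.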

**NEW `services/annie-voice/tool_schemas.py`** (~350 lines):
- ALL 28 Pydantic BaseModel subclasses with `Field(description="...")` for EVERY property
- Copy descriptions EXACTLY from hand-written CLAUDE_TOOLS (character-for-character match)
- Add `@field_validator("<field>", mode="before")` coercers (the decorator takes the field name(s) as positional arguments) wherever current handlers do type coercion (e.g., `hours_back` in search_memory)
- Do NOT use `Optional[T]` — use `T` with `Field(default=...)` instead

Key models:
```python
from pydantic import BaseModel, Field

class WebSearchInput(BaseModel):
    query: str = Field(description="The search query")

class SearchMemoryInput(BaseModel):
    query: str = Field(description="What to search for in past conversations")
    hours_back: int = Field(default=168, description="How many hours back to search (default: 168 = 7 days)")

class SaveNoteInput(BaseModel):
    category: str = Field(description="Category for the note")
    content: str = Field(description="The note content to save")

class ThinkInput(BaseModel):
    thought: str = Field(description="Your internal reasoning")

# ... 24 more — read CLAUDE_TOOLS in text_llm.py lines 209-560 for ALL schemas
```
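A sketch of the `mode="before"` coercer from the `tool_schemas.py` bullets, applied to `SearchMemoryInput` and assuming Pydantic v2. The coercion rule itself (blank or `None` falls back to 168, numeric strings such as `"24.0"` become ints) is a hypothetical placeholder; the real rule must mirror whatever the current `search_memory` handler does.

```python
from pydantic import BaseModel, Field, field_validator


class SearchMemoryInput(BaseModel):
    query: str = Field(description="What to search for in past conversations")
    hours_back: int = Field(default=168, description="How many hours back to search (default: 168 = 7 days)")

    @field_validator("hours_back", mode="before")
    @classmethod
    def _coerce_hours_back(cls, value: object) -> object:
        # Runs before Pydantic's own int validation, so it sees raw LLM output.
        if value is None or (isinstance(value, str) and not value.strip()):
            return 168  # hypothetical: treat a missing/blank value as the default
        if isinstance(value, str):
            return int(float(value))  # hypothetical: accept "24" and "24.0"
        return value
```

A `mode="before"` validator does not change what `model_json_schema()` emits for the field, so `hours_back` still appears as an integer with its default and description, and the golden tests are unaffected.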

### Phase 3: Migrate text_llm.py (~30 min)

Replace the 300-line CLAUDE_TOOLS dict list + 25 `_register_tool()` calls with:

```python
from tool_spec import ToolSpec
from tool_schemas import WebSearchInput  # ... plus the other 27 input models

TOOL_SPECS: list[ToolSpec] = [
    ToolSpec(name="web_search", description="...", handler=lambda args, msg: search_web(args.get("query", "")),
             input_model=WebSearchInput, group="web", channels=("text", "telegram", "phone", "voice")),
    # ... all 28 tools
    ToolSpec(name="order_flour", description="...", handler=..., input_model=..., group=..., gated=True),  # Only in CLAUDE_TOOLS when TWF_ENABLED
]

# Derive CLAUDE_TOOLS (backward compatible): ungated tools always, gated tools only behind their flag
CLAUDE_TOOLS = [s.to_claude_schema() for s in TOOL_SPECS if not s.gated]
if BROWSER_AGENT_ENABLED:
    # Also filter on s.gated, so an ungated spec in the "browser" group is never added twice
    CLAUDE_TOOLS.extend(s.to_claude_schema() for s in TOOL_SPECS if s.gated and s.group == "browser")
if TWF_ENABLED:
    CLAUDE_TOOLS.extend(s.to_claude_schema() for s in TOOL_SPECS if s.name == "order_flour")
if CREMEITALIA_ENABLED:
    CLAUDE_TOOLS.extend(s.to_claude_schema() for s in TOOL_SPECS if s.name == "order_from_cremeitalia")
if ROUTER_MONITOR_ENABLED:
    CLAUDE_TOOLS.extend(s.to_claude_schema() for s in TOOL_SPECS if s.name == "network_status")

# Register ALL handlers (gated or not)
for _spec in TOOL_SPECS:
    _register_tool(_spec.name, _spec.handler)

# OPENAI_TOOLS derived as before (unchanged pattern)
OPENAI_TOOLS = [
    {"type": "function", "function": {"name": t["name"], "description": t["description"], "parameters": t["input_schema"]}}
    for t in CLAUDE_TOOLS
]
```
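The flag-gated branches in the Phase 3 code can also be expressed as one predicate, which turns the "gated behavior" test from Phase 5 into a plain unit test. A minimal sketch: `GATE_FLAGS`, `is_exposed`, the `Spec` stand-in, and the `group:` key convention are hypothetical; only the flag names come from the code above. Filtering in place keeps `TOOL_SPECS` order instead of appending gated tools at the end, so check whether tool order matters downstream before adopting it.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping: which feature flag unlocks a gated spec, keyed by tool name
# or by "group:<group>" for whole groups.
GATE_FLAGS = {
    "group:browser": "BROWSER_AGENT_ENABLED",
    "order_flour": "TWF_ENABLED",
    "order_from_cremeitalia": "CREMEITALIA_ENABLED",
    "network_status": "ROUTER_MONITOR_ENABLED",
}


@dataclass(frozen=True)
class Spec:
    """Stand-in for ToolSpec carrying only the fields gating looks at."""
    name: str
    group: str
    gated: bool = False


def is_exposed(spec: Spec, flags: dict[str, bool]) -> bool:
    """True if the spec's schema belongs in CLAUDE_TOOLS (handlers are registered regardless)."""
    if not spec.gated:
        return True
    flag = GATE_FLAGS.get(spec.name) or GATE_FLAGS.get(f"group:{spec.group}")
    # A gated spec with no mapped flag stays hidden (fail closed).
    return bool(flag) and flags.get(flag, False)


specs = [
    Spec("web_search", "web"),
    Spec("browse_page", "browser", gated=True),   # hypothetical browser tool name
    Spec("order_flour", "shopping", gated=True),  # "shopping" group is hypothetical
    Spec("mystery_tool", "misc", gated=True),     # no flag mapped
]
flags = {"TWF_ENABLED": True, "BROWSER_AGENT_ENABLED": False}
exposed = [s.name for s in specs if is_exposed(s, flags)]
```

With this shape, the derivation would collapse to a single comprehension over `TOOL_SPECS` filtered by `is_exposed`.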

**Keep** `_dispatch_*` handler functions (they don't change). **Keep** `tool_dispatch.py` as-is (55 lines, no class refactor).

### Phase 4: Update capability_manifest.py (~15 min)

Replace `_TOOL_GROUPS` and `_TOOL_CHANNELS` enrichment dicts with ToolSpec lookup:

```python
def _get_tool_manifest() -> list[dict]:
    from text_llm import CLAUDE_TOOLS, TOOL_SPECS
    spec_lookup = {s.name: s for s in TOOL_SPECS}
    # ... derive group/channels from spec_lookup
```
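One way to fill in the elided body of `_get_tool_manifest()`, sketched as a standalone function so it runs without `text_llm`. The output keys (`name`, `description`, `group`, `channels`) are an assumption about the manifest shape, `tool_manifest` is a hypothetical name, and `SimpleNamespace` stands in for `ToolSpec`.

```python
from __future__ import annotations

from types import SimpleNamespace


def tool_manifest(claude_tools: list[dict], tool_specs: list) -> list[dict]:
    """Join each exposed schema with its spec's group/channels metadata."""
    spec_lookup = {s.name: s for s in tool_specs}
    manifest = []
    # Iterate CLAUDE_TOOLS, not TOOL_SPECS, so disabled gated tools are not advertised.
    for tool in claude_tools:
        spec = spec_lookup[tool["name"]]  # KeyError means a schema has no backing spec
        manifest.append({
            "name": tool["name"],
            "description": tool["description"],
            "group": spec.group,
            "channels": list(spec.channels),
        })
    return manifest


specs = [
    SimpleNamespace(name="web_search", group="web", channels=("text", "telegram", "phone", "voice")),
    SimpleNamespace(name="order_flour", group="shopping", channels=("text",)),  # hypothetical metadata
]
tools = [{"name": "web_search", "description": "Search the web", "input_schema": {}}]  # flag for order_flour off
manifest = tool_manifest(tools, specs)
```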

**DELETE**: `_TOOL_GROUPS` (lines 29-72), `_TOOL_CHANNELS` (lines 74-109). Both replaced by ToolSpec fields.

### Phase 5: Fix Metadata Gaps + Update Tests (~30 min)

- **tool_adapters.py**: Add base `ToolAdapter()` for `order_from_cremeitalia`, `network_status`, `task_status`, `cancel_task`
- **test_tool_dispatch.py**: Derive `known_names` from `TOOL_SPECS` instead of hardcoded sets
- **test_capability_manifest.py**: Verify all specs have valid groups/channels
- **NEW test_tool_spec.py**: Construction, frozen, `to_claude_schema()`, `to_openai_schema()`, gated behavior, no duplicate names
- **Verify**: Golden schema tests (Phase 1) still pass
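The spec-level checks in the list above (no duplicate names, every spec has a group and valid channels) can share one helper that reports every problem at once rather than failing on the first. A sketch: `spec_problems` and `VALID_CHANNELS` are hypothetical, and the channel vocabulary is taken only from the `web_search` example in Phase 3, so the real set may be larger.

```python
from __future__ import annotations

from collections import Counter
from types import SimpleNamespace

# Assumption: channel names seen in the Phase 3 web_search example.
VALID_CHANNELS = frozenset({"text", "telegram", "phone", "voice"})


def spec_problems(specs: list) -> list[str]:
    """Return human-readable problems; an empty list means the registry is consistent."""
    problems = []
    counts = Counter(s.name for s in specs)
    problems += [f"duplicate name: {name}" for name, n in counts.items() if n > 1]
    for s in specs:
        if not s.group:
            problems.append(f"{s.name}: missing group")
        if not s.channels:
            problems.append(f"{s.name}: missing channels")
        unknown = sorted(set(s.channels) - VALID_CHANNELS)
        if unknown:
            problems.append(f"{s.name}: unknown channels {unknown}")
    return problems


specs = [
    SimpleNamespace(name="web_search", group="web", channels=("text", "voice")),
    SimpleNamespace(name="web_search", group="web", channels=("text",)),
    SimpleNamespace(name="think", group="", channels=("fax",)),
]
problems = spec_problems(specs)
```

In `test_tool_spec.py` the assertion would then be `assert spec_problems(TOOL_SPECS) == []`, and a failure message lists every gap at once.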

### Phase 6: Write Comparison Document (~10 min)

Update `docs/NEXT-SESSION-REGISTRY-COMPARISON.md` with:
- Verdict table (Q1-Q5) with reasoning
- Patterns Annie has that open-multi-agent lacks
- Adversarial findings summary
- Mark as COMPLETE

### Phase 7: Final Verification

```bash
cd services/annie-voice
python -m pytest tests/ -q                    # All 2543+ tests pass
python -m pytest tests/test_schema_golden.py -v  # Golden schemas match exactly
python -c "from text_llm import CLAUDE_TOOLS; print(len(CLAUDE_TOOLS))"  # Same count as before migration
grep -c "_TOOL_GROUPS\|_TOOL_CHANNELS" capability_manifest.py  # 0 (fully removed)
```

## Key Files

| File | Action | Lines |
|------|--------|-------|
| `services/annie-voice/tool_spec.py` | CREATE | ~80 |
| `services/annie-voice/tool_schemas.py` | CREATE | ~350 |
| `services/annie-voice/tests/test_schema_golden.py` | CREATE | ~120 |
| `services/annie-voice/tests/test_tool_spec.py` | CREATE | ~80 |
| `services/annie-voice/text_llm.py` | MODIFY (replace 300-line CLAUDE_TOOLS + registrations) | ~-200 net |
| `services/annie-voice/capability_manifest.py` | MODIFY (delete enrichment dicts) | ~-80 net |
| `services/annie-voice/tool_adapters.py` | MODIFY (add 4 missing adapters) | ~+8 |
| `services/annie-voice/tests/test_tool_dispatch.py` | MODIFY (derive known_names from specs) | ~-10 net |
| `services/annie-voice/tests/test_capability_manifest.py` | MODIFY | ~+10 |
| `docs/NEXT-SESSION-REGISTRY-COMPARISON.md` | MODIFY (add verdicts, mark complete) | rewrite |

## Anti-Pattern Guards

- Do NOT use `Optional[T]` in Pydantic models — produces `anyOf` that breaks schemas
- Do NOT skip `Field(description="...")` — LLM loses parameter guidance
- Do NOT migrate partially (8/28) — either all or none
- Do NOT refactor `tool_dispatch.py` to a class — breaks reload-based tests
- Do NOT include `title` from Pydantic in output schemas — strip it
- Do NOT add validation at dispatch time yet — follow-up after migration verified
- Do NOT touch voice-only tools (enroll_voice, meditate, schedule_task) — separate Pipecat path
- Do NOT touch phone_tools.py duplicate schemas — separate PR (documented known debt)

## Start Command

```
Read docs/NEXT-SESSION-TOOLSPEC-MIGRATION.md for full context. This is an implementation session — execute all 7 phases in order. Start with Phase 1 (golden schema tests). The adversarial review plan is at ~/.claude/plans/typed-juggling-whale.md for reference on the 27 findings. Key constraint: every golden schema test must pass before AND after migration — zero tolerance for schema drift.
```