# Next Session: ToolSpec Migration — Titan Verification + Adversarial Fixes

## Context

Session 412 completed the ToolSpec migration locally: 31 tools migrated from hand-written CLAUDE_TOOLS dicts to frozen ToolSpec dataclass + Pydantic schemas. **2617 tests pass locally, 0 failures.** 88 tests skip on laptop (Titan-only deps). Two adversarial reviewers found 21 issues — all must be fixed (zero deferrals).

### Session 413 Update (no code changes)
- All 8 target files READ and verified to match plan expectations
- `services/annie-voice/conftest.py` does NOT exist — Fix 0f must CREATE it (not edit tests/conftest.py)
- Fix 0a: Rajesh wants a better approach — see updated fix below. **ASK before implementing.**
- Pipecat version not pinned in requirements.txt — check Titan `.venv/bin/pip show pipecat-ai` for Fix 0h

The ToolSpec migration code is DONE. This session is about:
1. Fixing 9 adversarial review findings (security, maintenance, code quality)
2. Deploying to Titan
3. Confirming 100% pass (0 skips, 0 failures)

## What's Already Done (DON'T redo)

- `tool_spec.py` — Frozen dataclass, `to_claude_schema()`/`to_openai_schema()`, 5 schema cleaning rules
- `tool_schemas.py` — 31 Pydantic models with `Field(description=...)`, `Literal` for enums
- `text_llm.py` — TOOL_SPECS list replaces 350-line CLAUDE_TOOLS + 25 `register()` calls
- `capability_manifest.py` — `_TOOL_GROUPS`/`_TOOL_CHANNELS` dicts deleted, ToolSpec lookup
- `tool_adapters.py` — 4 missing adapters added + reverse stale adapter test
- `tests/test_schema_golden.py` — 50 tests (0 skips), includes gated tool schemas via `to_claude_schema()`
- `tests/test_tool_spec.py` — 22 tests for ToolSpec dataclass
- All 2617 tests pass locally

## Phase 0: Fix Adversarial Review Findings (9 code fixes)

### Fix 0a: Remove hardcoded token (SEC-1) — CRITICAL SECURITY

**Rajesh's requirement:** Don't lose the token — keep prompting for it, find a better solution.

**Recommended approach:** Remove the hardcoded value but keep the `os.environ.setdefault("CONTEXT_ENGINE_TOKEN", "")` call so a token already exported in the environment takes precedence. The token lives in `.env` on Titan already, and the scripts fail fast if it is empty.

```
Files: services/annie-voice/test_obs_pipeline.py:11
       services/annie-voice/test_observability_live.py:15

Change: os.environ.setdefault("CONTEXT_ENGINE_TOKEN", "bUFUGVI9n_vxO-mnWBF4lamWg9PjmE3Qs4I8NgmCDj0")
    To: os.environ.setdefault("CONTEXT_ENGINE_TOKEN", "")
    Add: Early check: if not os.environ.get("CONTEXT_ENGINE_TOKEN"):
             sys.exit("CONTEXT_ENGINE_TOKEN not set — run with: CONTEXT_ENGINE_TOKEN=... python <script>")

Alternative: Read from ~/.her-os-token file (same as dashboard uses). ASK RAJESH.
```
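If Rajesh approves the file-based alternative, the combined lookup could be sketched as follows. The `load_context_engine_token` helper and its signature are illustrative, not existing code; `~/.her-os-token` is the dashboard's token file per the note above:

```python
import os
import sys
from pathlib import Path


def load_context_engine_token(token_file: Path = Path.home() / ".her-os-token") -> str:
    """Return the token from the environment, falling back to the token file.

    Exits with a clear message when neither source provides a value, so the
    diagnostic scripts fail fast instead of making unauthenticated calls.
    """
    token = os.environ.get("CONTEXT_ENGINE_TOKEN", "")
    if not token and token_file.exists():
        token = token_file.read_text().strip()
    if not token:
        sys.exit("CONTEXT_ENGINE_TOKEN not set — run with: CONTEXT_ENGINE_TOKEN=... python <script>")
    return token
```

Either way, the hardcoded value disappears from the repo and the env var still wins when set.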

### Fix 0b: ROUTER_PASSWORD injection (SEC-2)

```
File: services/annie-voice/tests/test_router_monitor.py:23

Change: os.environ.setdefault("ROUTER_PASSWORD", "test_password")  (module-level)
    To: Use @pytest.fixture with mock.patch.dict("os.environ", {"ROUTER_PASSWORD": "test_password"})
```
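A minimal sketch of the fixture-based replacement (the fixture name is illustrative; the point is that the fake value is scoped to each test instead of leaking into the process environment via a module-level `setdefault()`):

```python
import os
from unittest import mock

import pytest


@pytest.fixture
def router_password():
    # mock.patch.dict restores the previous environment state on exit,
    # so the fake password never outlives the test that requested it.
    with mock.patch.dict(os.environ, {"ROUTER_PASSWORD": "test_password"}):
        yield os.environ["ROUTER_PASSWORD"]
```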

### Fix 0c: SSH server validation (SEC-3)

```
File: services/annie-voice/router_monitor.py (~line 300)

Add: Validate server string matches hostname/IP pattern (reject strings starting with -)
     before passing to subprocess.create_subprocess_exec
```
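One way the validation could look (the helper name and allow-list regex are assumptions, not existing code in `router_monitor.py`):

```python
import re

# Conservative allow-list: hostname labels or dotted IPv4 only, so nothing
# that ssh could mistake for an option ever reaches the subprocess argv.
_SERVER_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.\-]*$")


def is_safe_server(server: str) -> bool:
    """Reject empty strings, leading dashes, and shell metacharacters before
    the value is passed to subprocess.create_subprocess_exec."""
    # The startswith check is redundant with the regex (first char must be
    # alphanumeric) but kept as explicit defense-in-depth.
    return bool(server) and not server.startswith("-") and bool(_SERVER_RE.fullmatch(server))
```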

### Fix 0d: Hard-coded count 31 → dynamic (MAINT-1)

```
File: services/annie-voice/tests/test_tool_spec.py (TestToolSpecsIntegrity.test_spec_count)

Change: assert len(TOOL_SPECS) == 31
    To: assert len(TOOL_SPECS) >= 28, f"Expected at least 28 specs, got {len(TOOL_SPECS)}"
```

### Fix 0e: Docker check guard behind aarch64 (BUG-2)

```
File: services/annie-voice/tests/test_code_tools.py:~line 44

Change: _has_sandbox = _check_docker_sandbox()
    To: import platform
        _has_sandbox = platform.machine() == "aarch64" and _check_docker_sandbox()
```

### Fix 0f: Add collect_ignore for diagnostic scripts (BUG-1)

```
File: services/annie-voice/conftest.py (root-level, NOT tests/conftest.py)

Add: collect_ignore = ["test_obs_pipeline.py", "test_observability_live.py"]
```

### Fix 0g: Env vars test — verify presence on Titan (MAINT-4)

```
File: services/annie-voice/tests/test_deployment_health.py

Update test_required_env_vars_documented: When running on Titan (aarch64),
also verify each required env var is actually set (not just list length check).
```
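One way to structure the Titan-only presence check (helper names are illustrative, not the current test's API):

```python
import os
import platform


def missing_required_env_vars(required, env=None):
    """Return the required names that are absent or empty in env.

    On Titan the test can assert this list is empty; on the laptop it can
    keep the weaker documentation-only check.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def should_enforce_presence():
    # Only enforce the stricter check on the aarch64 deployment host.
    return platform.machine() == "aarch64"
```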

### Fix 0h: Conftest pipecat mock — add version comment (MAINT-3)

```
File: services/annie-voice/tests/conftest.py:36-74

Add: Comment documenting which pipecat version the mock list was built against.
```

### Fix 0i: BUG-4 reverse adapter check (ALREADY DONE)

Already committed in session 412 — `TestAdapterCoverage.test_no_stale_adapters` in `test_schema_golden.py`.

## Phase 1: Verify Skip Count Baseline

Before deploying, re-run locally to capture structured skip data:

```bash
cd services/annie-voice
python3 -m pytest tests/ -q --tb=no -rs 2>&1 | grep "SKIPPED" > /tmp/laptop_skips.txt
wc -l /tmp/laptop_skips.txt
```

This creates the baseline for Phase 5 (diff against Titan).

## Phase 2: Deploy to Titan

```bash
# Commit ALL fixes + ToolSpec migration
git add services/annie-voice/tool_spec.py services/annie-voice/tool_schemas.py \
      services/annie-voice/text_llm.py services/annie-voice/capability_manifest.py \
      services/annie-voice/tool_adapters.py services/annie-voice/router_monitor.py \
      services/annie-voice/test_obs_pipeline.py services/annie-voice/test_observability_live.py \
      services/annie-voice/tests/ services/annie-voice/conftest.py \
      docs/NEXT-SESSION-TOOLSPEC-MIGRATION.md docs/NEXT-SESSION-REGISTRY-COMPARISON.md
git commit -m "feat: atomic ToolSpec migration + adversarial review fixes (zero deferrals)"
git push

# Pull on Titan + clear pycache (CRITICAL: session 365 stale bytecode-cache lesson)
ssh titan "cd ~/workplace/her/her-os && git pull && \
  find services/annie-voice -name '__pycache__' -exec rm -rf {} + 2>/dev/null; \
  echo 'pycache cleared'"
```

## Phase 3: Pre-flight Dependency Check (MUST use .venv/bin/python)

```bash
ssh titan "cd ~/workplace/her/her-os/services/annie-voice && .venv/bin/python << 'PREFLIGHT'
import sys, platform, shutil, subprocess, os
checks = []
arch = platform.machine()
checks.append(('aarch64', arch == 'aarch64', arch))
for lib in ('openai', 'uvicorn', 'matplotlib', 'pandas', 'seaborn'):
    try:
        mod = __import__(lib)
        checks.append((lib, True, getattr(mod, '__version__', 'ok')))
    except ImportError:
        checks.append((lib, False, 'MISSING'))
try:
    import readability; checks.append(('readability-lxml', True, 'ok'))
except ImportError:
    checks.append(('readability-lxml', False, 'MISSING'))
has_docker = bool(shutil.which('docker'))
has_sandbox = False
if has_docker:
    r = subprocess.run(['docker', 'image', 'inspect', 'python-sandbox:latest'], capture_output=True, timeout=5)
    has_sandbox = r.returncode == 0
checks.append(('docker+sandbox', has_docker and has_sandbox, 'ok' if has_sandbox else 'NEEDS BUILD'))
from pathlib import Path
soul = Path.home() / '.her-os' / 'annie' / 'SOUL.md'
checks.append(('workspace/SOUL.md', soul.exists(), 'ok' if soul.exists() else 'MISSING'))
for flag in ('TWF_ENABLED', 'CREMEITALIA_ENABLED', 'ROUTER_MONITOR_ENABLED', 'BROWSER_AGENT_ENABLED'):
    checks.append((f'env:{flag}', True, os.environ.get(flag, 'unset')))
all_ok = all(ok for _, ok, _ in checks)
for name, ok, detail in checks:
    print(f"  {'✓' if ok else '✗'}  {name}: {detail}")
sys.exit(0 if all_ok else 1)
PREFLIGHT"
```

## Phase 4: Run Full Suite on Titan

```bash
ssh titan "cd ~/workplace/her/her-os/services/annie-voice && \
  RUN_SERVER_TESTS=1 .venv/bin/python -m pytest tests/ -q --tb=short -rs 2>&1"
```

**Success:** `XXXX passed, 0 skipped, XX deselected`

## Phase 5: Diff Skip Reasons (laptop vs Titan)

```bash
ssh titan "cd ~/workplace/her/her-os/services/annie-voice && \
  .venv/bin/python -m pytest tests/ -q --tb=no -rs 2>&1 | grep SKIPPED" > /tmp/titan_skips.txt
diff /tmp/laptop_skips.txt /tmp/titan_skips.txt
# Expected: /tmp/titan_skips.txt is empty (no tests skip on Titan)
```

## Key Risk: Feature Flags on Titan

On Titan, `.env` may have `TWF_ENABLED=true`, `BROWSER_AGENT_ENABLED=true`, etc. This means CLAUDE_TOOLS will include more tools than on laptop. The golden schema tests handle this — gated tools are tested via `to_claude_schema()` directly, not via CLAUDE_TOOLS lookup. But be aware the total CLAUDE_TOOLS count will differ.

## Adversarial Review Plan

Full plan with all 21 findings: `~/.claude/plans/shimmering-painting-cerf.md`

## Start Command

```
Read docs/NEXT-SESSION-TOOLSPEC-TITAN-VERIFY.md for full context. This is an implementation + verification session. Execute Phase 0 fixes first (9 code changes from adversarial review), then deploy to Titan and run full test suite. The adversarial review plan is at ~/.claude/plans/shimmering-painting-cerf.md. Key constraint: ZERO deferrals — fix everything, 0 skips on Titan.
```
