# Next Session: Samantha Voice Personality — Phase B

## What

Make Annie *sound* like Samantha from "Her" at the voice level. Phase A (prompt personality) is deployed. Phase B adds: (1) Kokoro voice blending for warmer tone, (2) Samantha reference audio extraction from the movie, (3) IndicF5 voice cloning integration via Panda HTTP bridge. The plan has been through two rounds of adversarial review (26 findings, all CRITICAL/HIGH/MEDIUM addressed).

## Plan

`~/.claude/plans/eager-wibbling-engelbart.md`

**Read the plan first** — it has the full implementation, all 26 adversarial review findings, state machine diagram, pre-mortem failure analysis, and deployment checklists.

## Key Design Decisions (from adversarial review)

1. **SSH probe is a hard gate** — Step 1 (voice blending) CANNOT proceed until `KPipeline.load_voice()` or `.pt` files are verified on Titan. If neither exists, skip blending entirely and go straight to IndicF5 cloning.

2. **asyncio.Semaphore(1), NOT threading.Lock** — The IndicF5 server on Panda serializes GPU access via `asyncio.Semaphore(1)` in the async layer. A `threading.Lock` would instead park each waiting request in a worker thread and exhaust the thread pool under load.

3. **Streaming PCM, NOT base64 JSON** — The IndicF5 server has a `/v1/tts/stream` endpoint that streams raw PCM chunks via HTTP chunked transfer. The Titan-side `IndicF5TTSService` yields frames as they arrive (same pattern as Kokoro's async queue bridge). This preserves the time-to-first-byte (TTFB) benefit of streaming.

4. **Barge-in cancels HTTP stream** — When the user interrupts, `_handle_interruption()` sets `cancel_event`, which breaks the `async for chunk in resp.aiter_bytes()` loop on the Titan side. The HTTP stream close propagates to Panda, releasing the GPU semaphore promptly.

5. **Auth required on IndicF5 endpoint** — `X-Internal-Token` header (same pattern as Context Engine). Prevents LAN abuse of GPU.

6. **Fail-fast startup validation** — IndicF5 server validates voice-references directory, metadata.json, and all WAV files at startup. Missing refs = `RuntimeError` (not silent 500s during calls).

7. **Auto-fallback at bot startup** — If `TTS_BACKEND=indicf5` but the Panda server is unreachable, bot.py automatically falls back to Kokoro and logs an ERROR. No manual intervention is needed for cold-start failures.

8. **Strict TTS_BACKEND validation** — Typos like `indicF5` (wrong case) raise `ValueError` at startup, not silent fallthrough.

9. **No SCP to Panda** — Voice reference files deployed via SSH pipe (`ssh panda "cat > ..." < file`), not SCP (session 395 incident).

10. **BF16 enforced at code level** — `torch.autocast("cuda", dtype=torch.bfloat16)` in indicf5_server.py, with a warning comment. FP16 breaks the Vocos vocoder.

11. **phone_audio.py blackwell_patch bug fixed** — Pre-existing bug where `phone_audio.py:433` imports KPipeline without blackwell_patch. Fixed as prerequisite Step 0.

12. **RESOURCE-REGISTRY.md corrected** — Titan has 3 KPipeline instances (1.5 GB, not 0.5 GB). Panda section added with IndicF5 + all voice models.
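
Decisions 2 and 4 depend on each other: the single GPU permit has to be released as soon as the reader stops consuming the stream. The sketch below illustrates that interaction with plain `asyncio` only. `fake_gpu_chunks`, `synthesize_chunks`, `consume`, and `demo` are hypothetical names for illustration, not code from the plan, and the real server would see an HTTP disconnect rather than a local generator close.

```python
import asyncio
from typing import AsyncIterator


async def fake_gpu_chunks(n: int) -> AsyncIterator[bytes]:
    """Stand-in for model inference (hypothetical); yields n small PCM chunks."""
    for _ in range(n):
        await asyncio.sleep(0.01)
        yield b"\x00\x00" * 240


async def synthesize_chunks(sem: asyncio.Semaphore, n: int) -> AsyncIterator[bytes]:
    """Stream chunks while holding the single GPU permit."""
    async with sem:
        async for chunk in fake_gpu_chunks(n):
            yield chunk


async def consume(sem: asyncio.Semaphore, n: int, cancel_event: asyncio.Event) -> int:
    """Reader loop: stop as soon as a barge-in has set cancel_event."""
    received = 0
    stream = synthesize_chunks(sem, n)
    try:
        async for _chunk in stream:
            received += 1
            if cancel_event.is_set():
                break
    finally:
        # Closing the generator unwinds `async with sem`, which frees the permit.
        await stream.aclose()
    return received


async def demo() -> tuple:
    sem = asyncio.Semaphore(1)
    cancel_event = asyncio.Event()
    cancel_event.set()  # simulate an interruption before the first chunk is handled
    received = await consume(sem, 100, cancel_event)
    return received, sem.locked()
```

Running `asyncio.run(demo())` should show that only one chunk was read and that the semaphore is no longer held, which is the property verification step 6 checks by ear.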

## Files to Modify

1. `services/annie-voice/phone_audio.py` — Add `import blackwell_patch` before KPipeline import (Step 0, bug fix)
2. `services/annie-voice/kokoro_tts.py` — Add `_load_blended_voice()` method + `voice_blend` init param (~30 lines) (Step 1)
3. `services/annie-voice/bot.py` — Add `KOKORO_VOICE_BLEND`, `TTS_BACKEND`, `INDICF5_*` env vars + backend switch (~25 lines) (Steps 1+3)
4. `scripts/extract_voice_references.sh` — New file, ffmpeg extraction script (Step 2)
5. `services/annie-voice/indicf5_server.py` — New file, FastAPI TTS server for Panda (Step 3)
6. `services/annie-voice/indicf5_tts.py` — New file, Pipecat TTSService HTTP bridge (Step 3)
7. `start.sh` — Add `start_indicf5_server()` / `stop_indicf5_server()` to Panda section (Step 3)
8. `docs/RESOURCE-REGISTRY.md` — Add Panda section + correct Titan Kokoro count (Step 3)
9. `tests/test_voice_blend.py` — New file, 12 blend tests (Step 4)
10. `tests/test_indicf5_server.py` — New file, 12 server tests (Step 4)
11. `tests/test_indicf5_tts.py` — New file, 6 TTSService tests (Step 4)
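
The backend switch that item 3 adds to `bot.py` combines strict validation (decision 8) with auto-fallback (decision 7). A minimal sketch of that logic follows, assuming the valid values are exactly `kokoro` and `indicf5`. `resolve_tts_backend`, `VALID_TTS_BACKENDS`, and the `indicf5_reachable` callable are hypothetical names; the plan may structure this differently.

```python
import logging
from typing import Callable

logger = logging.getLogger("bot")

# Assumed set of accepted values; matching is exact and case-sensitive.
VALID_TTS_BACKENDS = ("kokoro", "indicf5")


def resolve_tts_backend(raw: str, indicf5_reachable: Callable[[], bool]) -> str:
    """Return the backend to use; raise on unknown names, fall back on outage."""
    if raw not in VALID_TTS_BACKENDS:
        # A typo such as "indicF5" fails loudly instead of falling through.
        raise ValueError(
            f"TTS_BACKEND={raw!r} is invalid; expected one of {VALID_TTS_BACKENDS}"
        )
    if raw == "indicf5" and not indicf5_reachable():
        logger.error("IndicF5 server unreachable; falling back to Kokoro")
        return "kokoro"
    return raw
```

At startup this might be called as `resolve_tts_backend(os.environ.get("TTS_BACKEND", "kokoro"), probe)`, where `probe` performs a short health check against Panda. The `kokoro` default is an assumption.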

## Start Command

```
cat ~/.claude/plans/eager-wibbling-engelbart.md
```

Then implement the plan. Begin with Step 0 (blackwell_patch bug fix), then the SSH probe (Step 1.0) — the probe result determines whether blending is feasible.
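
The probe's decision can be expressed as a small function that runs on Titan. This is a sketch under assumptions: `probe_blending_support`, its return strings, and the `voices_dir` argument are hypothetical, and the plan's Step 1.0 may perform the check differently (for example, as a one-liner over SSH).

```python
from pathlib import Path
from typing import Optional


def probe_blending_support(pipeline_cls: type, voices_dir: Optional[Path]) -> str:
    """Return 'load_voice', 'pt_files', or 'none' (meaning: skip blending)."""
    # Preferred path: the pipeline class exposes a callable load_voice attribute.
    if callable(getattr(pipeline_cls, "load_voice", None)):
        return "load_voice"
    # Fallback path: raw voice tensors are present on disk as .pt files.
    if voices_dir is not None and voices_dir.is_dir() and any(voices_dir.glob("*.pt")):
        return "pt_files"
    # Neither exists: go straight to IndicF5 cloning.
    return "none"
```

On Titan the first argument would be the real `KPipeline` class; a result of `'none'` corresponds to skipping Step 1 entirely.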

**All 26 adversarial findings are already addressed in the plan.**

## Verification

1. `cd services/annie-voice && python -m pytest tests/ -x -q` — all tests pass
2. SSH probe: verify KPipeline API on Titan
3. Voice blend: call Annie, compare warmth to baseline af_heart
4. Reference audio: listen to extracted clips for quality (no background music)
5. IndicF5: switch TTS_BACKEND=indicf5, call Annie, verify Samantha-like voice
6. Barge-in: interrupt Annie mid-sentence in IndicF5 mode — verify clean cancel
7. Security: unauthenticated POST to Panda:8771 → 401
8. Path traversal: POST with `../../etc/passwd` → 400
9. Startup validation: remove metadata.json, restart server → RuntimeError
10. Fallback: stop Panda server, restart bot → auto-fallback to Kokoro
11. RESOURCE-REGISTRY.md matches actual `nvidia-smi` on both machines
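
Checks 8 and 9 exercise two small pieces of server logic that can also be unit tested without a GPU. The sketch below assumes `metadata.json` is a JSON object keyed by WAV filename; that schema, and the names `validate_voice_references` and `resolve_reference`, are assumptions for illustration rather than the plan's actual code.

```python
import json
from pathlib import Path


def validate_voice_references(ref_dir: Path) -> dict:
    """Fail fast at startup if the reference set is incomplete."""
    if not ref_dir.is_dir():
        raise RuntimeError(f"voice-references directory missing: {ref_dir}")
    meta_path = ref_dir / "metadata.json"
    if not meta_path.is_file():
        raise RuntimeError(f"metadata.json missing in {ref_dir}")
    metadata = json.loads(meta_path.read_text(encoding="utf-8"))
    # Assumed schema: {"<name>.wav": {...}, ...}
    for wav_name in metadata:
        if not (ref_dir / wav_name).is_file():
            raise RuntimeError(f"reference WAV missing: {wav_name}")
    return metadata


def resolve_reference(ref_dir: Path, requested: str) -> Path:
    """Reject names that escape ref_dir; the caller maps ValueError to HTTP 400."""
    base = ref_dir.resolve()
    candidate = (base / requested).resolve()
    # The resolved path must sit strictly inside the reference directory.
    if base not in candidate.parents:
        raise ValueError(f"invalid voice reference: {requested!r}")
    return candidate
```

Resolving both paths before comparing them means `../../etc/passwd` and absolute paths are rejected by the same test, without pattern-matching on the request string.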