# Next Session 118 — WebRTC TTS swap (Kokoro → Chatterbox) + Speaker Gate completion

**Supersedes:** earlier A/B/C decision tree (session 117 executed Option A — browser WebRTC verification — and produced partial live data).

## User decision (session 117, 2026-04-16)

> "I think we need to switch to Chatterbox for WebRTC also, this should be our next session."

**Why:** Annie currently has two different voices on two surfaces — Kokoro blend (`af_heart:0.6 + af_bella:0.4`) on browser WebRTC vs Chatterbox-Samantha clone on phone. Voice-identity inconsistency is a UX regression against the product's "ambient Annie" vision.

## Scope

### Primary — ship Chatterbox into Annie WebRTC

**File to modify:** `services/annie-voice/bot.py:1225` (ChatterboxTTSService already imported; just not selected).

**Architectural branch — decide at start of session:**

1. **Route browser TTS to Panda's Chatterbox (`:8772`)**: single source of truth; the voice bank lives in one place. Cost: a ~78 ms WiFi RTT hop per synthesis request (measured session 111). The hop is paid per sentence, not per token, so it fits the latency budget.
2. **Run Chatterbox on Titan (in-process or container)**: no LAN hop. However, the session 111 A/B produced a "BAD" user verdict on a real phone call through the Titan-Chatterbox shim (resemblyzer cosine was 0.92 offline, yet the real call was bad; root cause unidentified). A retry needs a different approach: either debug the session-111 failure, or try Chatterbox Turbo (ResembleAI/chatterbox-turbo: 1-step decoder, ~2.85 GB VRAM, purpose-built for low-latency voice agents; NOT validated).
3. **Hybrid with Kokoro fallback**: keep Kokoro as the `TTS_BACKEND=auto` fallback when Chatterbox is unhealthy (mirroring the `PHONE_TTS_BACKEND=auto` pattern in `phone_loop.py`). Protects against the `muted-not-crashed` failure mode (session 103 MEMORY block).
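
A minimal sketch of the option 3 selection logic, assuming a hypothetical `/health` endpoint on Panda's Chatterbox server (the IP and port come from this doc, the path is a guess) and the `TTS_BACKEND` values `chatterbox`, `kokoro`, and `auto`. The function names are illustrative, not the existing `bot.py` or `phone_loop.py` code.

```python
import http.client
import urllib.request

# Hypothetical health endpoint: IP and port from this doc, the /health path is assumed.
CHATTERBOX_HEALTH_URL = "http://192.168.68.57:8772/health"


def chatterbox_healthy(url: str = CHATTERBOX_HEALTH_URL, timeout_s: float = 1.0) -> bool:
    """Return True if the Chatterbox server answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections all subclass OSError;
        # HTTPException covers a server that replies with a malformed response.
        return False


def select_tts_backend(mode: str, chatterbox_ok: bool) -> str:
    """Map a TTS_BACKEND value plus a health result to a concrete backend name."""
    mode = mode.strip().lower()
    if mode in ("chatterbox", "kokoro"):
        return mode  # explicit choice: no fallback
    if mode == "auto":
        return "chatterbox" if chatterbox_ok else "kokoro"
    raise ValueError(f"unknown TTS_BACKEND: {mode!r}")


# Usage: backend = select_tts_backend(os.environ.get("TTS_BACKEND", "auto"), chatterbox_healthy())
```

A startup-time check like this only covers a server that is already down when the bot starts. It does not by itself catch the `muted-not-crashed` case, where Chatterbox stays up but stops producing audio mid-call; that would likely need a per-request timeout or silence check on the synth path as well.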

**Recommendation (not decided):** Option 1 (route to Panda Chatterbox). Fastest to ship, consistent voice identity, and session 111 already showed Panda Chatterbox is stable. Session 118 should focus on the `bot.py` wiring and testing, not on re-litigating the Titan-Chatterbox stability question.
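
Back-of-envelope arithmetic for option 1's cost, using the 78 ms RTT from session 111. The reply size is illustrative, and the pipelined case *assumes* the next sentence's synth request is already in flight while the current one plays; whether the WebRTC pipeline actually does this has not been checked.

```python
RTT_MS = 78.0  # Titan -> Panda WiFi round trip, measured session 111


def hop_overhead_ms(n_requests: int, rtt_ms: float = RTT_MS, pipelined: bool = False) -> float:
    """Network time the Titan->Panda hop adds to one spoken reply.

    Serialized: each synth request waits a full round trip before its audio exists.
    Pipelined (assumption): later requests overlap with playback, so only the
    first round trip delays the first audio frame.
    """
    if n_requests <= 0:
        return 0.0
    return rtt_ms if pipelined else n_requests * rtt_ms


# Illustrative reply: 3 sentences, roughly 40 tokens.
per_sentence = hop_overhead_ms(3)                            # 234.0 ms
per_token = hop_overhead_ms(40)                              # 3120.0 ms
per_sentence_pipelined = hop_overhead_ms(3, pipelined=True)  # 78.0 ms
```

In this example per-token synthesis would pay the hop about 13× more often than per-sentence synthesis, which is why sentence-level chunking is what keeps the hop inside budget.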

### Secondary — finish Speaker Gate verification (deferred from 117)

**Test 2 (intruder-reject) never ran.** Plan §7 protocol: user plays podcast/TV audio ~60s while silent; gate should reject with `sim<0.38`. Required for §8.1 retune pre-condition + §12 session-close verdict.
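
Once Test 2 produces intruder scores, the §8.1 retune reduces to picking an operating point. A minimal sketch, assuming the gate accepts when `sim >= threshold` (consistent with "reject with `sim<0.38`") and that genuine and intruder scores are available as plain lists; the 0.30 floor mirrors the Banned actions list. Function names are hypothetical.

```python
from typing import Optional

THRESHOLD_FLOOR = 0.30  # never retune below this (see Banned actions)


def error_rates(threshold: float, genuine: list[float], intruder: list[float]) -> tuple[float, float]:
    """Return (FRR, FAR): share of genuine scores rejected, share of intruder scores accepted."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in intruder) / len(intruder)
    return frr, far


def pick_threshold(genuine: list[float], intruder: list[float], max_far: float = 0.0,
                   lo: float = THRESHOLD_FLOOR, hi: float = 0.60,
                   step: float = 0.01) -> Optional[float]:
    """Lowest threshold in [lo, hi] whose FAR <= max_far, or None if none qualifies.

    FRR never decreases as the threshold rises, so the lowest threshold that meets
    the FAR cap also has the lowest FRR among the qualifying thresholds.
    """
    if lo < THRESHOLD_FLOOR:
        raise ValueError(f"lo={lo} is below the {THRESHOLD_FLOOR} floor")
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        t = round(lo + i * step, 4)  # integer stepping avoids accumulated float drift
        _, far = error_rates(t, genuine, intruder)
        if far <= max_far:
            return t
    return None
```

With `max_far=0.0` this asks for the lowest threshold that rejects every Test 2 intruder segment; if that lands above 0.38, the current FRR will get worse, not better.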

**Test 1 data analysis (from session 117):**
- 12 gate decisions captured in `/tmp/annie-voice.log` since LOG_OFFSET_START=2118624.
- 10 accepts: 1 with sim=0.000 (first utterance "Hey, how are you?" — too_short short-circuit, not enrollment-loss), 9 with sim>0.38 (median 0.542, p25≈0.425, p75≈0.60, min 0.405, max 0.698).
- 2 rejects: sim=0.305 ("By the audio.") and sim=0.329 ("I understand"). **Classification (confirmed end of session 117):** "By the audio." was Rajesh's real utterance, a **CONFIRMED FALSE-REJECT**. "I understand" is still probable echo (Annie TTS bleeding into the mic) but not confirmed. That is one confirmed false-reject among ~10 real utterances (~10% FRR), and it fell on a short (3-word) utterance. It is attributed to ECAPA-TDNN embedding variance on brief audio rather than to threshold calibration alone.
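
A sketch for re-deriving these numbers after Test 2, assuming `LOG_OFFSET_START` is a *byte* offset into `/tmp/annie-voice.log` and that gate lines look like `speaker_gate decision=accept sim=0.542`. That line format is a placeholder; adjust `GATE_RE` to the real log format. The size check refuses to run when the log is shorter than the baseline (the signature of a restart truncating the file), but it cannot detect a restart once the new log has grown past the old offset.

```python
import os
import re
from statistics import median

# Placeholder line format: adjust to the real speaker-gate log line.
GATE_RE = re.compile(r"speaker_gate\s+decision=(accept|reject)\s+sim=(\d*\.?\d+)")


def read_gate_decisions(path: str, offset: int) -> list[tuple[str, float]]:
    """Return (decision, sim) pairs logged after byte `offset` of `path`."""
    size = os.path.getsize(path)
    if size < offset:
        # A restart re-creates the log with `>`, so the old baseline no longer applies.
        raise RuntimeError(f"{path} is {size} bytes, below baseline {offset}: re-baseline LOG_OFFSET_START")
    decisions = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            m = GATE_RE.search(raw.decode("utf-8", errors="replace"))
            if m:
                decisions.append((m.group(1), float(m.group(2))))
    return decisions


def summarize(decisions: list[tuple[str, float]]) -> dict:
    """Counts plus min/median/max of accepts that were actually scored (sim > 0)."""
    scored = sorted(s for d, s in decisions if d == "accept" and s > 0.0)
    return {
        "total": len(decisions),
        "accept": sum(d == "accept" for d, _ in decisions),
        "reject": sum(d == "reject" for d, _ in decisions),
        # sim == 0.0 accepts are treated as too_short short-circuits, as in the session 117 data
        "short_circuit": sum(d == "accept" and s == 0.0 for d, s in decisions),
        "scored_min": scored[0] if scored else None,
        "scored_median": median(scored) if scored else None,
        "scored_max": scored[-1] if scored else None,
    }


# Usage: summarize(read_gate_decisions("/tmp/annie-voice.log", 2118624))
```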

**Verdict so far (tentative pending Test 2):** At threshold 0.38 the gate works for *typical* WebRTC utterances (all 9 scored real accepts cleared the threshold, median 0.542). However, **short utterances produce unreliable embeddings**, which accounts for the ~10% FRR observed at 0.38. Lowering the threshold won't cleanly fix this without risking intruder false-accepts; Test 2 data is needed to know the genuine/intruder separation. The margin is also thin: the lowest accept (0.405) is only 0.025 above threshold.

Options for session 118:

- (a) Run Test 2, then retune based on the full distribution.
- (b) Accept the FRR and add a minimum-utterance-length gate upstream (e.g., require ≥3 s of audio before computing an embedding; below that, fail open rather than trust an unreliable embedding).
- (c) Swap the embedding model. The 28-day-old `project_speaker_gate_tuning.md` noted WeSpeaker variance; ECAPA-TDNN (current) may have similar short-utterance instability. Research TitaNet or pyannote-embedding as alternatives.
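
Option (b) as a minimal sketch: the length check runs *before* the embedding, so short clips never reach the embedding model. `embed` is a hypothetical stand-in for whatever produces the utterance embedding, and the 3 s cut-off is the proposal above, not a tuned value.

```python
import math
from typing import Callable, Optional, Sequence

THRESHOLD = 0.38    # current WebRTC gate threshold
MIN_AUDIO_S = 3.0   # option (b) proposal, untuned


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def gate(duration_s: float, embed: Callable[[], Sequence[float]], enrolled: Sequence[float],
         threshold: float = THRESHOLD,
         min_audio_s: float = MIN_AUDIO_S) -> tuple[str, Optional[float]]:
    """Speaker-gate decision with a minimum-length pre-check.

    Below min_audio_s the embedding is never computed and the utterance fails open.
    """
    if duration_s < min_audio_s:
        return "accept_fail_open", None
    sim = cosine(embed(), enrolled)
    return ("accept" if sim >= threshold else "reject"), sim
```

The trade-off is explicit: any utterance under the cut-off passes unchecked, including a short intruder phrase, so Test 2 should also report how many intruder segments fall below it.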

### Tertiary — phone gate integration (original B option)

Still deferred. 9-day-old `project_phone_speaker_verification_gap.md` (son social-engineering incident) still open. Not blocked by session 118, just not prioritized above voice-identity consistency.

## Load-bearing findings from session 117 (don't re-discover)

1. **HTTPS required for WebRTC UI.** `http://192.168.68.52:7860` triggers `WebRTC not supported or suppressed` (Chrome blocks `getUserMedia` on non-localhost HTTP). Use `https://voice.her-os.app` (Titan systemd cloudflared, `/etc/cloudflared/config.yml`).
2. **Nemotron Nano label purged session 117** (commit `4aa1a74`). UI now shows "Gemma 4 26B". No stale references in user-visible surfaces.
3. **Baseline preserved for Test 2:** `LOG_OFFSET_START=2118624`. Do NOT restart annie-voice: a restart truncates `/tmp/annie-voice.log` via the `>` redirect at start.sh:537. If a restart is unavoidable, update the baseline value in this doc.
4. **Gemma 4 31B is a no-go for voice.** 6.9 tok/s, 7.3× slower than the current Gemma 4 26B NVFP4 at 50.4 tok/s. Evidence: `docs/RESEARCH-GEMMA4-BENCHMARK.md:308-349`. Beast-vs-Titan is a non-factor (identical DGX Spark GB10 hardware). If voice latency ever bites, the move is *smaller* (Gemma 4 E4B), not bigger.
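
Finding 1 follows from the browser's secure-context rule: `getUserMedia` is only exposed on HTTPS origins or loopback hosts, so a LAN IP over plain HTTP fails. A simplified sketch of that rule (it covers the http(s), ws(s), and file cases, not every detail of the W3C Secure Contexts spec), handy for sanity-checking a URL before a test session.

```python
import ipaddress
from urllib.parse import urlsplit


def is_potentially_trustworthy(url: str) -> bool:
    """Rough check of whether a browser treats this URL's origin as a secure context."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme in ("https", "wss", "file"):
        return True
    if scheme not in ("http", "ws"):
        return False
    host = (parts.hostname or "").lower()
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # 127.0.0.0/8 and ::1 count as loopback; LAN addresses such as 192.168.x.x do not.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # ordinary DNS name over plain HTTP
```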

## Banned actions

- `git pull --rebase`, `git reset --hard`, `git push --force`
- SSH to hostname `panda` — use `192.168.68.57`
- Family-member intruder source for §7 testing
- Threshold below 0.30
- Running `./start.sh` / `./stop.sh` from inside an SSH session (laptop-local only)
- Restarting annie-voice without re-baselining LOG_OFFSET_START
- Voice upgrades via `--load-format fastsafetensors` (not in the official image — session 117 research)

## Related

- `project_speaker_gate_tuning.md` (topic memory, session 117 update)
- `project_phone_speaker_verification_gap.md` (9d old, still accurate)
- Session 111 MEMORY block — Titan-Chatterbox A/B lessons (BAD user verdict on real call)
- Session 103 MEMORY — Chatterbox `muted-not-crashed` failure mode
- `services/annie-voice/chatterbox_tts.py:159-161` — Chatterbox HTTP client failure handling
- `services/annie-voice/bot.py:1225` — ChatterboxTTSService import (already present, just unused in WebRTC path)