# Next Session — Speaker Gate v2 live verification (session 117) — V2 post-adversarial review

**Supersedes:** original V1 handoff drafted pre-review (this file previously held V1).
**Plan:** `/home/rajesh/.claude/plans/reactive-munching-backus.md` — read end-to-end; every command below is tied to a plan §.

## What
Confirm Speaker Gate v2 actually accepts Rajesh and rejects everyone else on live phone calls on Titan. PR #7 fixed the 400-Empty-audio bug that had been making the gate fail open and masking its real behavior; this session is the first live verification with working guards.

## Why this session's plan is stronger than V1
The V1 handoff doc had three latent bugs, all found by adversarial review. Any one of them would have produced a wrong verdict (a false failure or a false "gate works"):
1. **Log format mismatch:** V1 used `[SPEAKER-GATE]` in grep patterns, but the real loguru log emits `Speaker gate:` (no brackets). Every grep would have returned empty, and the operator would have declared the test failed.
2. **`sim=0.000` ambiguity:** three distinct paths produce `sim=0.000` in the log (enrolled=false, too_short short-circuit, genuine zero similarity), each with a different meaning. V1 collapsed them into one.
3. **Schema guessing:** V1 asked for field `bank_size` / `num_embeddings`. The real field is `total_styles`.

All fixed in the plan file.
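
The first two bugs can be illustrated with a minimal parsing sketch. It assumes only what this document states about the log: decision lines contain `Speaker gate: accepted` or `Speaker gate: rejected`, and similarity appears as `sim=<number>`. The sample lines and the helper names `parse_gate_lines` and `split_zero_sim` are hypothetical, and the real loguru prefix and field order may differ.

```python
import re

# Patterns taken from this document; the surrounding log layout is an assumption.
DECISION_RE = re.compile(r"Speaker gate: (accepted|rejected)")
SIM_RE = re.compile(r"sim=([0-9]+(?:\.[0-9]+)?)")


def parse_gate_lines(lines):
    """Return (decision, sim) tuples for lines that carry both fields."""
    events = []
    for line in lines:
        decision = DECISION_RE.search(line)
        sim = SIM_RE.search(line)
        if decision and sim:
            events.append((decision.group(1), float(sim.group(1))))
    return events


def split_zero_sim(events):
    """Separate usable samples from sim=0.000 lines, which are ambiguous."""
    nonzero = [e for e in events if e[1] > 0.0]
    zero = [e for e in events if e[1] == 0.0]
    return nonzero, zero


# Hypothetical sample lines, for illustration only.
sample = [
    "12:00:01 | INFO | Speaker gate: accepted sim=0.512",
    "12:00:05 | INFO | Speaker gate: rejected sim=0.214",
    "12:00:09 | INFO | Speaker gate: rejected sim=0.000",
    "12:00:12 | INFO | stale transcription dropped",
]

events = parse_gate_lines(sample)
nonzero, zero = split_zero_sim(events)

# The V1 bracketed pattern matches nothing in this format.
v1_matches = [line for line in sample if "[SPEAKER-GATE]" in line]
```

Keeping the zero-similarity lines in their own bucket avoids counting them as either evidence of rejection or evidence of a healthy gate until the path that produced them is known.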

## Key Design Decisions (from adversarial review — do NOT revert these)

1. **Pre-flight gate 5.2 checks `warmup_done=true` directly**, not `status in (ok, healthy)`. Real status values are `ok`, `warming_up`, `loading`; "healthy" doesn't exist.
2. **Pre-flight gate 5.3 asserts no `samantha*` style in bank.** Samantha WAVs at `services/audio-pipeline/voice-references/` would silently mask a broken enrollment as a passing check.
3. **Every test window snapshots the container's `State.StartedAt`** (from `docker inspect her-os-audio`) before AND after. A mid-test container restart makes the test INVALID, not a pass.
4. **10 utterances per test** (not 5). Retune §8 needs ≥10 nonzero samples; collecting them once avoids re-running on fail.
5. **Percentile computation uses p25/p75, not p10/p90.** At N=10, p10/p90 degenerate to min/max and are not real percentiles.
6. **Threshold retune clamp `[0.30, 0.50]`** is enforced by plan-author discipline (no CI guard yet). Anything <0.30 effectively disables the gate.
7. **§8.4 env-propagation verify uses `ps/awk` with `comm=python3` filter**, NOT `pgrep -fn "server.py"` — MEMORY session-111 gotcha applies (bash wrapper PID ≠ python3 child PID).
8. **§7.4 cross-references TWO log sources** (annie-voice.log for gate decisions + docker logs for endpoint requests). Zero decisions with zero requests = phone didn't hear intruder (INVALID). Zero decisions with >0 requests = logger broken.
9. **Enrollment poisoning guards** (§5.4): quiet room mandatory, RMS sanity on WAV, dry-run inspection before overwriting style=phone.
10. **Alternative 1 adopted** as supplementary offline sweep (§6.0) before live call. Not a replacement — BT HFP codec variance only testable live.
11. **Alternative 2 deferred** — JSONL gate-event instrumentation is a concrete Session 118 follow-up item, not this session.
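
Decision 5 can be checked numerically. The sketch below uses linear interpolation, which is one common percentile convention and not necessarily the one the plan uses; the `percentile` helper and the sample similarities are hypothetical. With ten samples, p10 and p90 are interpolated between the two most extreme values at each end, so a single outlier moves them, while p25 and p75 sit further inside the distribution.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile for p in [0, 100] over pre-sorted values."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    rank = (p / 100) * (len(sorted_vals) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = rank - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


# Hypothetical similarity samples (N=10) with one low outlier.
sims = sorted([0.05, 0.41, 0.43, 0.44, 0.46, 0.47, 0.48, 0.50, 0.52, 0.55])

p10 = percentile(sims, 10)  # falls between the two smallest samples
p25 = percentile(sims, 25)  # falls between the 3rd and 4th samples
p75 = percentile(sims, 75)  # falls between the 7th and 8th samples
p90 = percentile(sims, 90)  # falls between the two largest samples

print(f"p10={p10:.3f} p25={p25:.3f} p75={p75:.3f} p90={p90:.3f}")
```

In this made-up data the outlier at 0.05 drags p10 well below the bulk of the samples, whereas p25 is unaffected by it.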

## Files that might change

| File | Role | Change (only if retune triggers) |
|------|------|----------------------------------|
| `start.sh` line 519 | `SPEAKER_GATE_THRESHOLD='0.38'` | Scalar update to new value within [0.30, 0.50] |

Everything else is READ-ONLY for this session.
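
Because the `[0.30, 0.50]` clamp has no CI guard yet, a small pre-commit check could be sketched as follows. The helper names `check_threshold` and `format_env_line` are hypothetical and not part of the repository; the bounds and the `SPEAKER_GATE_THRESHOLD='0.38'` line format come from this document.

```python
RETUNE_MIN = 0.30  # below this the gate is effectively disabled (per this document)
RETUNE_MAX = 0.50


def check_threshold(value):
    """Return the value unchanged if it is inside the clamp, else raise."""
    if not (RETUNE_MIN <= value <= RETUNE_MAX):
        raise ValueError(
            f"threshold {value:.2f} outside [{RETUNE_MIN:.2f}, {RETUNE_MAX:.2f}]"
        )
    return value


def format_env_line(value):
    """Render the scalar in the same shape as the existing start.sh assignment."""
    return f"SPEAKER_GATE_THRESHOLD='{check_threshold(value):.2f}'"


print(format_env_line(0.38))
```

The sketch rejects out-of-range values instead of silently clamping them, so a bad retune proposal is surfaced to the operator and not quietly rewritten.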

## Start Command

```
Read /home/rajesh/.claude/plans/reactive-munching-backus.md end-to-end.
Run §5 pre-flight hard gates (5.1 → 5.7) in order; STOP on first failure.
Run §6.0 offline sweep (optional but recommended).
Run §6 Test 1 (self-accept, 10 utterances). Verify §6.3 criteria (a)–(g).
Run §7 Test 2 (intruder-reject, 60 s podcast audio). Verify §7.3 criteria (a)–(f).
If either test fails AND ≥10 nonzero samples exist on both sides: branch to §8 retune.
If retune: open PR via §8.4 ship path, deploy via laptop-local stop/start, re-run §6+§7.
On pass: update MEMORY session-117 block (§11); annotate project_speaker_gate_tuning.md with phone-BT-HFP verdict.
Opt-in §10 secondary cleanup if time permits.
```
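
The branch logic in the start command (pass, retune, or neither) can be sketched as a small decision function. The function name and the `"insufficient-samples"` label are hypothetical; this document only specifies the pass path and the retune precondition of at least 10 nonzero samples on both sides, so what to do otherwise is left to the plan.

```python
MIN_NONZERO = 10  # retune precondition stated in this document


def next_step(test1_pass, test2_pass, accept_sims, reject_sims):
    """Decide the next action after Test 1 and Test 2."""
    if test1_pass and test2_pass:
        return "pass"
    n_accept = sum(1 for s in accept_sims if s > 0.0)
    n_reject = sum(1 for s in reject_sims if s > 0.0)
    if n_accept >= MIN_NONZERO and n_reject >= MIN_NONZERO:
        return "retune"
    # Not covered by this document; consult the plan before proceeding.
    return "insufficient-samples"


print(next_step(False, True, [0.40] * 10, [0.20] * 10))
```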

## Verification (end-of-session checklist)
- [ ] All §5 pre-flight gates passed with timestamps recorded
- [ ] Test 1: ≥10 `Speaker gate: accepted` lines, all with `sim > 0.000`, median ≥ 0.45 (or retune fired)
- [ ] Test 2: ≥10 `Speaker gate: rejected` lines with `sim > 0.000`, median ≤ 0.30 (or retune fired)
- [ ] No `sim=0.000` lines in accepted windows (not a genuine match; most likely enrollment loss, but check which of the three `sim=0.000` paths produced it)
- [ ] Container didn't restart during either test (StartedAt unchanged)
- [ ] Zero LLM-response events during Test 2 intruder window
- [ ] If retune shipped: PR merged + `./stop.sh annie && ./start.sh annie` from laptop + env verified via ps/awk selector + Test 1/2 re-run
- [ ] MEMORY session-117 block prepended above 116 with verdict + distribution stats
- [ ] `project_speaker_gate_tuning.md` updated (session-117 addendum supersedes 27-day-old DISABLED note)
- [ ] Follow-up session 118 file created if deferred work exists
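
The numeric checklist items can be expressed as a short sketch. The thresholds (10 lines, median 0.45 and 0.30) come from the checklist above; the function names are hypothetical, and taking the Test 2 median over nonzero samples only is an assumption, since the checklist does not say which samples the median covers. `StartedAt` values are compared as opaque strings.

```python
from statistics import median


def self_accept_ok(accept_sims):
    """Test 1: at least 10 accepts, none at 0.000, median at or above 0.45."""
    return (
        len(accept_sims) >= 10
        and all(s > 0.0 for s in accept_sims)
        and median(accept_sims) >= 0.45
    )


def intruder_reject_ok(reject_sims):
    """Test 2: at least 10 nonzero rejects with median at or below 0.30."""
    nonzero = [s for s in reject_sims if s > 0.0]  # assumption: median over nonzero only
    return len(nonzero) >= 10 and median(nonzero) <= 0.30


def window_valid(started_before, started_after):
    """A test window counts only if the container did not restart."""
    return started_before == started_after


# Hypothetical values, for illustration only.
accepts = [0.46, 0.48, 0.51, 0.44, 0.47, 0.52, 0.49, 0.45, 0.50, 0.53]
rejects = [0.12, 0.18, 0.22, 0.25, 0.15, 0.28, 0.20, 0.19, 0.24, 0.21, 0.0]

print(self_accept_ok(accepts), intruder_reject_ok(rejects))
```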

## Banned actions (unchanged from V1)
- `git pull --rebase`, `git reset --hard`, `git push --force`
- SSH to hostname `panda` — use `192.168.68.57`
- Family-member intruder source for Test 2
- Threshold below 0.30
- Running `./start.sh` / `./stop.sh` FROM inside a Titan SSH session (laptop-local only)

## Diagnostic command cheatsheet (corrected grep patterns)

```bash
# Live tail
ssh titan 'tail -f /tmp/annie-voice.log | grep --line-buffered -iE "(Speaker gate:|stale transcription|verify-speaker)"'

# Count accepts vs rejects over the last 30 decisions
ssh titan 'grep -oE "Speaker gate: (accepted|rejected)" /tmp/annie-voice.log | tail -30 | sort | uniq -c'

# Enrollment state
ssh titan 'curl -sf http://localhost:9100/v1/enrollment'

# Similarity histogram
ssh titan 'grep -oE "sim=[0-9.]+" /tmp/annie-voice.log | tail -50 | sort | uniq -c'

# Endpoint health + warmup status
ssh titan 'curl -sf http://localhost:9100/health'
```
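
The health output can be checked the way pre-flight gate 5.2 requires, by looking at `warmup_done` and not at `status`. This is a minimal sketch that assumes `/health` returns a JSON object with a top-level boolean `warmup_done`; the exact response shape is not shown in this document, and the `is_ready` helper and sample bodies are hypothetical.

```python
import json


def is_ready(health_body):
    """True only when the /health JSON carries warmup_done == true."""
    try:
        payload = json.loads(health_body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("warmup_done") is True


# Hypothetical response bodies, for illustration only.
print(is_ready('{"status": "ok", "warmup_done": true}'))
print(is_ready('{"status": "warming_up", "warmup_done": false}'))
print(is_ready('{"status": "ok"}'))
```

Treating a missing or non-boolean `warmup_done` as not ready keeps the gate strict: a `status` of `ok` alone is never enough to pass.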

## Related
- **Source plan:** `/home/rajesh/.claude/plans/reactive-munching-backus.md` (adversarial-reviewed, 9 HIGH + 6 MEDIUM findings implemented)
- **Session 116 merged:** PRs #4–#8 (`main@78184c4`)
- **PR #7** (`eb08215`): speaker-gate 400-fix — the enabler for this session's testing
- **Speaker gate source:** `services/annie-voice/speaker_gate.py:SpeakerGateProcessor`
- **Endpoint source:** `services/audio-pipeline/main.py:verify_speaker` (lines ~340-414)
- **Enrollment tool:** `services/annie-voice/bot.py:974-1020` (handle_enroll_voice)
- **Test coverage:** `services/annie-voice/tests/test_speaker_gate.py` (49 tests passing, all mocked — no live integration yet)