# Next Session 121: Speaker-Aware Barge-In (Defense-in-Depth)

## What

Intruder voices (podcast, TV, other people) interrupt Annie even though the speaker gate later rejects the utterance, so she goes silent for nothing. Fix: a two-layer defense so that only the enrolled speaker can barge in.

## Problem

Current flow (barge-in is speaker-blind):
```
Mic audio → VAD detects speech → BARGE-IN fires (Annie stops talking)
         → STT transcribes → Speaker Gate verifies → reject → Annie is already silent
```

Desired flow:
```
Mic audio → VAD detects speech → Speaker verify → only barge-in if enrolled speaker
         → If barge-in slips through → Gate rejects → Annie RESUMES speaking
```

## Design (agreed in session 120b)

**Two layers, defense-in-depth:**

### Layer 2 — "Resume on Reject" (build FIRST)
- When the speaker gate rejects a transcription, signal the pipeline to resume interrupted TTS
- Requires: a TTS frame buffer ahead of transport output, plus an "undo interruption" mechanism
- Zero latency cost to happy path (enrolled speaker barge-in is unaffected)
- Standalone value: Annie recovers from ANY false barge-in, not just intruders
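
A minimal sketch of the "park and replay" idea behind Layer 2, written as plain Python with no Pipecat imports. `ResumeBuffer` and all of its method names are hypothetical; chunks are modelled as `bytes` rather than real `TTSAudioRawFrame` objects. It assumes the unplayed audio is still available at interruption time, which depends on the answer to the un-cancel vs. re-synthesize question under Start Command.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class ResumeBuffer:
    """Hypothetical buffer sitting between TTS and transport output."""

    pending: Deque[bytes] = field(default_factory=deque)  # queued, not yet played
    parked: List[bytes] = field(default_factory=list)     # set aside by a barge-in

    def push(self, chunk: bytes) -> None:
        """Queue a synthesized audio chunk for playback."""
        self.pending.append(chunk)

    def pop_for_output(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self.pending.popleft() if self.pending else None

    def on_interruption(self) -> None:
        """Barge-in fired: stop playback but keep the unplayed audio."""
        self.parked.extend(self.pending)
        self.pending.clear()

    def on_gate_accept(self) -> None:
        """Enrolled speaker confirmed: the interruption stands."""
        self.parked.clear()

    def on_gate_reject(self) -> None:
        """Gate rejected the utterance: put parked audio back, in order."""
        self.pending.extendleft(reversed(self.parked))
        self.parked.clear()
```

If the TTS service discards its stream on interruption, the same shape could park the remaining text instead of audio and re-synthesize on reject.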

### Layer 1 — "Gate Before Barge-In" (build SECOND)
- Intercept `InterruptionFrame` in the Pipecat pipeline
- Hold it for ~0.5-1s while collecting audio and running speaker verification
- If verified → release the interrupt (barge-in proceeds)
- If not verified → drop the InterruptionFrame silently (Annie keeps talking)
- Latency concern: adds ~0.5-1s to enrolled speaker's barge-in response time
- Prevents most intruder interruptions before they reach Annie
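
The hold-then-decide step can be sketched with plain `asyncio`. This is an illustration under assumptions, not the Pipecat integration: `gate_interruption`, `verify`, and `release` are hypothetical names, `verify` stands in for collecting audio and calling the speaker gate, and `release` stands in for forwarding the held `InterruptionFrame`. Treating a timeout as "not verified" is also an assumption, not something agreed in the design.

```python
import asyncio
from typing import Awaitable, Callable


async def gate_interruption(
    verify: Callable[[], Awaitable[bool]],
    release: Callable[[], None],
    hold_seconds: float = 1.0,
) -> bool:
    """Hold a pending barge-in until the speaker is verified or the hold expires.

    Returns True if the interruption was released, False if it was dropped.
    """
    try:
        # Bound the wait so a slow verification cannot stall the pipeline.
        verified = await asyncio.wait_for(verify(), timeout=hold_seconds)
    except asyncio.TimeoutError:
        # Assumption: fail closed, i.e. an unverified barge-in is dropped.
        verified = False

    if verified:
        release()  # barge-in proceeds
    return verified  # on False, the caller drops the frame and Annie keeps talking
```

Failing closed favors uninterrupted speech over responsiveness; failing open would instead let the barge-in through on timeout and rely on Layer 2 to resume if the gate later rejects.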

### Combined Effect
- Layer 1 prevents ~95% of intruder barge-ins (fast pre-check)
- Layer 2 catches the remaining ~5% (gate rejects → Annie resumes)
- Enrolled speaker experiences ~0.5-1s additional barge-in latency from the Layer 1 hold (acceptable)

## Context

- Speaker gate VERIFIED (session 120b): threshold 0.38, 33/33 intruder rejects, 2/2 self-accepts, 0.353 margin
- Gate endpoint: `/v1/verify-speaker` on audio pipeline (:9100)
- Barge-in architecture: Pipecat transport sends `InterruptionFrame` on VAD trigger
- Existing gate: `services/annie-voice/speaker_gate.py` — post-STT FrameProcessor
- Pipeline: transport.input → STT → speaker_gate → LLM context → LLM → TTS → transport.output

## Key Files

- `services/annie-voice/speaker_gate.py` — existing gate (post-STT verification)
- `services/annie-voice/bot.py` — pipeline construction, ~line 1370-1400 (gate placement)
- `services/annie-voice/kokoro_tts.py` — TTS service (need to understand frame buffering)
- Pipecat internals: `InterruptionFrame`, `TTSAudioRawFrame`, transport interruption handling

## Start Command

Plan Layer 2 first using `planning-with-review`. Key questions to investigate:
1. How does Pipecat propagate `InterruptionFrame` through the pipeline?
2. Where in the pipeline can we buffer TTS frames for replay?
3. How does `_handle_interruption()` work in the TTS service and transport?
4. Can we "un-cancel" a TTS stream, or must we re-synthesize?

## Verification

1. Layer 2: Play podcast → barge-in fires and Annie stops → gate rejects → Annie resumes speaking
2. Layer 1: Play podcast → no barge-in fires; Annie keeps talking
3. Enrolled speaker: barge-in still works (with acceptable latency)
4. All existing tests pass (2750+ annie-voice)
