# Next Session 120: Fix WebRTC Text Display Thinking Token Leak

## What

Annie's WebRTC text display shows Gemma 4 thinking tokens that she never speaks aloud. The fix adds a two-layer defense at the RTVI transport boundary: (1) block raw streaming LLM tokens from reaching the client, and (2) apply `clean_for_tts()` to all aggregated output before it reaches the browser. This matches the phone pipeline's defense-in-depth approach.
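The two layers wire into the observer that feeds the browser. A sketch only — `RTVIObserver`, `RTVIObserverParams`, and `bot_output_transforms` are the names used in the plan; verify the exact constructor signature against the installed pipecat version:

```python
# Sketch, not verified against the installed pipecat version.
observer = RTVIObserver(
    rtvi,
    params=RTVIObserverParams(
        bot_llm_enabled=False,  # layer 1: no raw streaming LLM tokens to the client
        bot_output_transforms=[_strip_thinking_for_display],  # layer 2: scrub aggregated text
    ),
)
```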

## Plan

Read the plan at `~/.claude/plans/purring-squishing-acorn.md`.
It contains the full implementation, the root-cause analysis, all adversarial review findings, and the design decisions.

## Key Design Decisions (from adversarial review)

1. **Display is driven by `BotOutput` events (not `BotLlmText`)** — voice-ui-kit subscribes ONLY to `RTVIEvent.BotOutput`. The `bot-llm-text` raw streaming channel is NOT used by the display UI. So `bot_llm_enabled=False` alone is insufficient — we need `bot_output_transforms` too.

2. **`bot_output_transforms` is the right seam** — rtvi.py:1460-1462 applies transforms to text BEFORE creating `RTVIBotOutputMessage`. Registering `clean_for_tts()` here catches anything ThinkBlockFilter misses (tag-stripped thinking content, chat template tokens, etc.).

3. **`bot_llm_enabled=False` is defense-in-depth** — blocks raw `bot-llm-text` AND legacy `bot-transcription` messages. Also blocks `bot-llm-started`/`bot-llm-stopped` lifecycle events, but voice-ui-kit has ZERO handlers for these (confirmed via grep).

4. **Client patch `!data.spoken` guard is safe** — TTS pushes `AggregatedTextFrame` (not `TTSTextFrame`) at tts_service.py:693, so `isTTS=False` in `_send_aggregated_llm_text`, so `spoken=False`, so the guard passes and assistant bubbles are created normally.

5. **Client dist is NOT stale** — rebuilt Apr 16 11:43. No client rebuild needed.

6. **Existing `SpeechTextFilter` explains why TTS is clean** — TTS service runs `clean_for_tts()` on the text variable for synthesis (tts_service.py:675-678) but pushes the ORIGINAL frame to the observer. The display-side gap is that no equivalent filter existed for `BotOutput` text — until this fix.

## Files to Modify

1. `services/annie-voice/bot.py` — Add `RTVIObserverParams` import (top-level), add `_strip_thinking_for_display` async helper, pass params to `PipelineTask` at line 1437
2. `services/annie-voice/tests/test_think_filter.py` — Add integration test feeding `LLMTextFrame` through `process_frame()`, add config regression test
3. `services/annie-voice/tests/test_voice_pipeline_bugs.py` — Add test for `_strip_thinking_for_display` transform function
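A hypothetical shape for the config regression test in item 2, shown self-contained with a stand-in params object — the real test would import and assert against the actual `RTVIObserverParams` constructed in `bot.py`:

```python
from dataclasses import dataclass, field

@dataclass
class FakeObserverParams:
    """Stand-in for RTVIObserverParams; field names taken from the plan."""
    bot_llm_enabled: bool = True
    bot_output_transforms: list = field(default_factory=list)

def build_display_safe_params() -> FakeObserverParams:
    # Mirrors what bot.py is expected to configure.
    async def strip(text: str) -> str:  # stand-in display transform
        return text
    return FakeObserverParams(bot_llm_enabled=False, bot_output_transforms=[strip])

def test_display_params_block_raw_llm_text():
    params = build_display_safe_params()
    assert params.bot_llm_enabled is False, "raw bot-llm-text must stay blocked"
    assert params.bot_output_transforms, "display transform must be registered"
```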

## Start Command

```bash
cat ~/.claude/plans/purring-squishing-acorn.md
```

Then implement the plan. All adversarial findings are already addressed in it.

## Verification

1. Run existing tests: `cd services/annie-voice && python -m pytest tests/ -v` (all 2750+ must pass)
2. Run new transform + config tests
3. Deploy: commit + push → Titan `git pull` → `./stop.sh annie && ./start.sh annie`
4. Open `https://voice.her-os.app`, connect, and have a conversation
5. Verify text display shows ONLY what Annie speaks (no thinking tokens)
6. Verify barge-in works (interrupt mid-sentence)
7. Check browser DevTools console: `[RTVI:BotOutput]` logs should show clean text, no `[RTVI:BotLlmText]` logs
8. (Optional) Run Speaker Gate Test 2: play a podcast near the mic for 60s, verify that all entries are rejected
