# Next Session 119 — Fix WebRTC text display (thinking leak) + Speaker Gate Test 2

## What

Annie's WebRTC text display shows Gemma 4 thinking tokens ("Wait, Samantha from Her? Oh, wow. Oh, that Wait, really?") that she doesn't speak. The TTS path correctly filters these — Annie only speaks the clean version. The text display leaks raw LLM output including `<|channel>thought...<channel|>` content.

**User request:** "Study how the phone pipeline handles all this and do the same for Annie Voice."

**Constraint:** Keep Gemma 4 thinking ENABLED. It makes Annie smarter. Phone works great with thinking ON because there's no text display. The fix is in the WebRTC display path, not the model config.

## Plan

Read the plan at `~/.claude/plans/misty-stargazing-dragonfly.md` for the session 118 context (dual Chatterbox architecture, adversarial review findings).

## Key Context (from session 118)

### What's deployed and working
- **Titan Chatterbox on :8773** — WebRTC TTS, Samantha voice, bench venv (`~/workplace/her/her-os-chatterbox-bench/.venv-chatterbox-bench/`)
- **Panda Chatterbox on :8772** — phone TTS (unchanged)
- **Barge-in cancel** — `chatterbox_tts.py` sends `session_id` in POST body + `/cancel/<sid>` in `_handle_interruption`
- **ThinkBlockFilter** — strips `<think>...</think>` (Qwen3) and `<|channel>thought...<channel|>` (Gemma 4) from pipeline TextFrames. 21 tests pass.
- **Speaker Gate** — enabled at 0.38, rejects echo (sim<0.38), accepts real speech
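The tag-stripping core of a filter like ThinkBlockFilter can be sketched as a pair of regexes, one per tag format named above. This is a minimal, hypothetical sketch: the real `think_filter.py` also has to buffer text across streamed frame boundaries (a tag split over two TextFrames), which this omits.

```python
import re

# Hypothetical sketch of ThinkBlockFilter's stripping logic. Covers the
# two formats noted above: Qwen3-style <think>...</think> and
# Gemma 4-style <|channel>thought...<channel|>. DOTALL lets a thinking
# block span multiple lines; non-greedy .*? stops at the first close tag.
THINK_PATTERNS = [
    re.compile(r"<think>.*?</think>", re.DOTALL),
    re.compile(r"<\|channel>thought.*?<channel\|>", re.DOTALL),
]

def strip_think_blocks(text: str) -> str:
    """Remove every recognized thinking block from text."""
    for pattern in THINK_PATTERNS:
        text = pattern.sub("", text)
    return text
```

The key limitation for this bug: the patterns only fire when the tags are present in the text the filter sees.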

### What failed in session 118
- **AlwaysUserMuteStrategy** — mutes user audio while bot speaks. Silently drops legitimate barge-in transcriptions → Annie goes silent. **REVERTED** (commit `03d4dbc`). Lesson: Pipecat `user_mute_strategies` are for non-conversational bots. For voice agents with barge-in, the speaker gate is the right echo filter.

### The text display leak
- ThinkBlockFilter is in the pipeline BEFORE TTS and transport.output() (bot.py lines 1399-1400)
- TTS correctly receives only clean text (confirmed: Annie speaks clean version)
- But the WebRTC text display shows raw thinking content WITHOUT tags
- Possible causes:
  1. vLLM's `--tool-call-parser gemma4` strips the `<|channel>thought...<channel|>` tags server-side, leaking thinking content into the `content` field without markers
  2. Pipecat transport sends TextFrames to data channel before/parallel to the pipeline filter
  3. The LLM adapter emits text to a separate path that bypasses the filter
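Cause 1 is worth demonstrating, because it would make the existing filter structurally blind to the leak: once a server-side parser removes the markers, a tag-based filter has nothing to match. A self-contained illustration (the pattern mirrors the Gemma 4 format quoted above; the actual `think_filter.py` may differ):

```python
import re

# Pattern for the Gemma 4 thinking-block format described above.
GEMMA_THOUGHT = re.compile(r"<\|channel>thought.*?<channel\|>", re.DOTALL)

tagged = "<|channel>thought Wait, really?<channel|>Hello!"
untagged = "Wait, really? Hello!"  # same thought text, markers already stripped upstream

print(GEMMA_THOUGHT.sub("", tagged))    # thinking removed: the filter works
print(GEMMA_THOUGHT.sub("", untagged))  # unchanged: the leak is invisible to a tag-based filter
```

If cause 1 is confirmed, no amount of downstream tag filtering fixes the display; the fix has to happen where the tagged/untagged distinction still exists.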

### How phone handles it (study this)
- `phone_loop.py` calls LLM, receives response, passes to TTS — NO text display channel
- Phone's `_POST_ECHO_FACTOR = 0.3` drains echo after TTS playback
- Phone uses `LlamaCppToolsService` (same adapter as WebRTC) but responses go to `tts_backends.py` not Pipecat pipeline
- The phone path's "solution" is architectural: no text display = no leak

### vLLM config (DO NOT change)
- Gemma 4 26B-A4B NVFP4 on Titan:8003
- `--tool-call-parser gemma4` (MANDATORY for tools)
- NO `--reasoning-parser` set
- NO `--chat-template-kwargs` set → thinking ON by default
- **Keep thinking ON** — user decision: "we don't want to disable thinking if it makes Annie weak"

## Files to Investigate

| File | Why |
|------|-----|
| `services/annie-voice/bot.py:1389-1412` | Pipeline construction — ThinkBlockFilter at step 8, transport.output() at step 11 |
| `services/annie-voice/think_filter.py` | Current filter — handles both tag formats, 21 tests |
| `services/annie-voice/llamacpp_llm.py` | LLM adapter — how streaming text is emitted, any parallel text paths |
| `services/annie-voice/.venv/.../pipecat/transports/` | How WebRTC transport sends TextFrames to data channel |
| `services/annie-voice/phone_loop.py:1340-1545` | Phone pipeline's echo drain + LLM response handling |
| `services/annie-voice/client/src/index.tsx` | WebRTC client — how text is received and displayed |

## Approach

1. **Trace the text path:** Add debug logging to ThinkBlockFilter (log input vs output text). Deploy, run one WebRTC conversation, check if the filter sees and strips the thinking text. If it does → the leak is AFTER the filter (transport parallel path). If it doesn't → the leak is BEFORE the filter (vLLM strips tags).

2. **Check Pipecat transport internals:** Does `SmallWebRTCTransport.output()` send TextFrames to the data channel as it receives them? Or does it accumulate and send at specific points? The filter must process frames BEFORE the transport sends text to the client.

3. **Fix:** Either (a) ensure the transport only sends post-filter text, (b) add a second filter at the transport level, or (c) change the client to not display thinking text.
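Step 1 above only needs the filter to report what it sees versus what it emits. A sketch of that tracing, with a stand-in strip function (wire the logging into the real `think_filter.py` rather than using this standalone version):

```python
import logging
import re

# Hypothetical step-1 instrumentation: log filter input vs output so one
# WebRTC conversation shows which side of ThinkBlockFilter the leak is on.
log = logging.getLogger("think-trace")
THINK = re.compile(r"<think>.*?</think>|<\|channel>thought.*?<channel\|>", re.DOTALL)

def traced_strip(text: str) -> str:
    cleaned = THINK.sub("", text)
    if cleaned != text:
        log.debug("filter stripped thinking: in=%r out=%r", text, cleaned)
    else:
        log.debug("filter pass-through: %r", text)
    return cleaned
```

Reading the trace: if "stripped" lines appear but the client still displays thinking text, the leak is downstream of the filter (cause 2); if pass-through lines already contain thinking text with no tags, it is upstream (cause 1).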

## Secondary: Speaker Gate Test 2

After the display fix, run the intruder rejection test:
1. Open `https://voice.her-os.app`, start WebRTC session
2. Play podcast/TV audio near mic for ~60s while staying silent
3. Check log: `grep 'Speaker gate:' /tmp/annie-voice.log` — all entries should show `rejected` with `sim < 0.38`
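The log check in step 3 can be automated. The log line format assumed here ("Speaker gate: rejected sim=0.21") is a guess; adjust the regex to whatever annie-voice actually writes to `/tmp/annie-voice.log`.

```python
import re

# Hypothetical pass/fail check for Speaker Gate Test 2. Assumes gate log
# lines look like "Speaker gate: rejected sim=0.21"; adapt as needed.
GATE = re.compile(r"Speaker gate: (\w+).*?sim=([0-9.]+)")

def gate_test_passed(log_text: str, threshold: float = 0.38) -> bool:
    hits = GATE.findall(log_text)
    # Fail if the gate never fired, or if any intruder audio was accepted.
    return bool(hits) and all(
        verdict == "rejected" and float(sim) < threshold for verdict, sim in hits
    )
```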

## Verification

- [ ] WebRTC text display shows ONLY what Annie speaks (no thinking tokens)
- [ ] Annie still sounds natural with Samantha voice
- [ ] Barge-in still works
- [ ] Speaker Gate Test 2: all intruder audio rejected

## Start Command

```bash
cat ~/workplace/her/her-os/docs/NEXT-SESSION-119-WEBRTC-TEXT-DISPLAY-FIX.md
```

Then trace the text path with debug logging as described in the Approach section.
