# Next Session — open PR for streaming-stt-bargein-20260415

**Session 115 ended 2026-04-15 with all code shipped & deployed.** This doc bootstraps the PR-opening session so it can start cold.

## Start prompt

```
Read MEMORY.md's session-115 block + docs/NEXT-SESSION-STREAMING-BARGEIN-PR.md.
Branch streaming-stt-bargein-20260415 is 6 commits ahead of origin/main and
is live on Panda. Open PR #5 against main. DO NOT land it yet — just get
the PR open with a full description, let CI run, and hand me the URL.

Do not rework Phase 3 in this session. That's a separate session with its
own scope (see MEMORY session-115 deferrals). Your only job here is: make
the PR.
```

## Pre-flight (required; cheap)

1. `git fetch origin && git log --oneline origin/main..streaming-stt-bargein-20260415` — confirm 6 commits present on branch (f59cd1b, 3f174b7, 28af00e, 8a1ae23, d118bb7, d4bf169).
2. `git diff origin/main...streaming-stt-bargein-20260415 --stat` — sanity-check the per-file diff size (expect ~10 files, ~1,200 insertions).
3. `ssh panda 'curl -s http://localhost:8772/health' && ssh panda 'tail -1 /tmp/phone-auto.log'` — confirm Panda services still healthy on the branch code (regression test: don't PR if deployment is broken).
4. On Panda: `cd ~/workplace/her/her-os/services/annie-voice && ~/workplace/her/her-os/.venv/bin/python -m pytest tests/test_tts_backends.py tests/test_chatterbox_server.py -q` — expect 40/40 pass.
5. On Panda: `cd ~/workplace/her/her-os && ~/workplace/her/her-os/.venv/bin/python -m pytest scripts/tests/test_nemotron_stt_server.py -q` — expect 18/18 pass.

If any pre-flight fails, STOP. Do not open the PR against main with red tests.
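
The stop-on-first-failure rule above can be sketched as a small wrapper. This is a minimal sketch, not something that exists in the repo: the function names `preflight_step`, `commits_ahead_is`, and `run_preflight` are hypothetical, and the health check uses `curl -sf` (an assumption, stricter than step 3) so that an HTTP error status also counts as a failure. The two pytest steps on Panda could be added to the chain in the same way.

```shell
#!/usr/bin/env bash
# Sketch only: all function names here are hypothetical helpers.

BRANCH="streaming-stt-bargein-20260415"

# Run one labelled check; print PASS/FAIL and return non-zero on failure.
preflight_step() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label (STOP, do not open the PR)"
    return 1
  fi
}

# Succeeds only when the branch is exactly $1 commits ahead of origin/main.
commits_ahead_is() {
  local n
  n="$(git rev-list --count "origin/main..$BRANCH")" || return 1
  [ "$n" -eq "$1" ]
}

# Chain with && so no later check runs after a failure.
run_preflight() {
  preflight_step "fetch origin" git fetch origin &&
  preflight_step "6 commits ahead of origin/main" commits_ahead_is 6 &&
  preflight_step "Panda health endpoint" ssh panda 'curl -sf http://localhost:8772/health'
}
```

Calling `run_preflight` from the repo root prints one line per check and returns non-zero at the first failure, which is the signal to stop.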

## Parent PR status

Session 113's PR #4 (`parakeet-stt-bench-20260415` → main) is still open. Session 115's branch is stacked on top of session 113's branch. Options:

- **(A) Wait for PR #4 to merge first, then rebase session-115 onto fresh main, then open PR #5.** Cleanest history, but blocks on PR #4.
- **(B) Open PR #5 now with `parakeet-stt-bench-20260415` as the base branch, label it WIP, and let it wait for #4.** GitHub supports this (the base branch can be the parent branch instead of main), and merge order is preserved.
- **(C) Open PR #5 targeting main directly, accept that the diff includes session-113 commits.** Ugly diff but doesn't block on #4.

**Recommended: (A) if PR #4 is already approved; (B) otherwise.** Ask the user which.
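
As a rough illustration of the recommendation above, the choice can be derived from PR #4's state and review decision. This is a sketch under assumptions: `recommend_option` is a hypothetical helper, the state and decision strings are the values the GitHub CLI reports for the `state` and `reviewDecision` JSON fields, and the output is only a suggestion to put in front of the user, not a decision.

```shell
# Map PR #4's state and review decision to the option recommended above.
# $1 = state (OPEN | MERGED | CLOSED), $2 = reviewDecision (e.g. APPROVED, may be empty).
recommend_option() {
  case "$1:$2" in
    MERGED:*)      echo "A: PR #4 merged; rebase onto fresh main, then open PR #5" ;;
    OPEN:APPROVED) echo "A: wait for PR #4 to merge, then rebase and open PR #5" ;;
    OPEN:*)        echo "B: open PR #5 against parakeet-stt-bench-20260415, labelled WIP" ;;
    *)             echo "ask: PR #4 is in an unexpected state" ;;
  esac
}

# Usage (needs an installed, authenticated GitHub CLI):
#   state="$(gh pr view 4 --json state --jq .state)"
#   decision="$(gh pr view 4 --json reviewDecision --jq .reviewDecision)"
#   recommend_option "$state" "$decision"
```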

## PR body template

```markdown
## Summary
- Streaming STT sidecar (nemotron-speech-streaming-en-0.6b) on :11439
- Chatterbox `/cancel/<session_id>` endpoint with Option B semantics
- Barge-in → cancel_guard_set + /cancel POST in _playback_worker

## What's in scope
- Phase 1 full (sidecar + client lib + 18 tests)
- Phase 2 full server-side (cancel endpoint + session tracking + tombstone sweeper + 7 tests)
- Phase 2 adapted client-side for phone path (tts_backends.ChatterboxBackend + SynthCancelled + 7 tests)
- Phase 3 MINIMAL (barge-in wiring only; set cancel_guard + dispatch /cancel)
- Follow-up docs: TODO-CHATTERBOX-OPTION-A.md, TODO-CANCEL-GUARD-GAP.md (closed by d4bf169)

## What's NOT in scope (deferred to next-session Phase 3 full rework)
- FullDuplexOrchestrator / HalfDuplexOrchestrator Strategy pattern
- streaming STT path wiring into phone_audio.PhoneSTT.stream_ws
- TurnState enum / CancellableTurn / cancel_chain / _await_cancel_observable
- ENABLE_BARGE_IN + STREAMING_STT env flags (cancel is currently always-on)
- Systemd units for either sidecar (Phase 6)
- Phase 4 bench + Phase 5 formal live validation + Gates 0.6, 0.9

## Verification
- scripts/tests/test_nemotron_stt_server.py: 18 passed (GPU soak @pytest.mark.slow skipped)
- services/annie-voice/tests/test_chatterbox_server.py: 17 passed (including TestCancel class)
- services/annie-voice/tests/test_tts_backends.py: 23 passed (including TestChatterboxCancel + TestCancelGuardLifecycle)
- Gate 0.4: nemotron streaming API works via conformer_stream_step + CacheAwareStreamingAudioBuffer (TTF 493ms on 4s synth WAV; att_context_size=[70,13])
- Gate 0.8: llama-server cancel-to-slot-release floor = 1.192s conservative / ~40ms when slot tracked
- Live call (4 barge-ins) on d4bf169: kill_playback felt correct within ~50ms; cancel_guard_set fires on bargein_fired branch

## Known gaps
- Chatterbox `/cancel` POST for the *currently-playing* WAV is still a no-op because `_current_session_id` is cleared when generate_audio's POST returns. Next-sentence synth IS now aborted via cancel_guard_set (the fix in d4bf169), but the active sentence's HTTP response is not proactively cancelled at the server. This is acceptable — the synth is already done by playback time, so there's nothing to save.
- Deploy still uses nohup + ps-argv env var for CHATTERBOX_TOKEN (session-113 leak unaddressed). Phase 6 systemd unit will fix this.

## Rollback
If anything breaks in production, a single `git revert d4bf169 28af00e` on Panda followed by `./stop.sh phone chatterbox && ./start.sh phone chatterbox` reverts to the session-113 baseline. The Chatterbox `/cancel` endpoint is additive (no existing behavior changed), so leaving the server code in place is safe.

## Related
- Plan: `~/.claude/plans/reflective-wobbling-blossom.md`
- Predecessor: PR #4 (parakeet-stt-bench-20260415)
- Follow-up: next session = Phase 3 full rework + streaming STT wiring
```
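
One way to open the PR from the template above is with the GitHub CLI, assuming the filled-in body has been saved to a file. This is a sketch: `open_pr` is a hypothetical wrapper, the PR title is a placeholder, and the function defaults to a dry run that only prints the command so it can be reviewed before anything is created on GitHub.

```shell
# Print (dry run, the default) or execute the gh command that opens the PR.
# $1 = base branch (main, or the parent branch for option B), $2 = body file path.
# Set DRY_RUN=0 to actually create the PR.
open_pr() {
  local base="$1" body="$2"
  # Placeholder title; replace before running for real.
  set -- gh pr create \
    --base "$base" \
    --head streaming-stt-bargein-20260415 \
    --title "Streaming STT sidecar + barge-in cancel" \
    --body-file "$body"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `open_pr main /tmp/pr-body.md` prints the command, and `DRY_RUN=0 open_pr main /tmp/pr-body.md` runs it; `gh pr create` prints the new PR's URL on success.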

## What the session must NOT do

- Do not start Phase 3 full rework. That's its own session.
- Do not write the systemd units (Phase 6). That's its own session.
- Do not run the 500-session VRAM soak test. That's Phase 4 territory.
- Do not change the Chatterbox startup path from ps-argv to systemd. Out of scope.
- Do not rebase the branch onto main if PR #4 is still open — that loses the stack relationship.

## Done-looks-like

- PR URL in hand.
- CI either green, or red with a clear action item for the follow-up session.
- MEMORY.md session-116 block updated with the PR URL + any CI action items.
- Nothing deployed or changed on Panda by this session — it's docs + GitHub only.
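
The CI item in the checklist above could be checked with `gh pr checks`, whose manual documents exit status 8 for pending checks. The sketch below assumes that behaviour; `ci_outcome` is a hypothetical helper and `$PR_URL` stands for whatever URL `gh pr create` returned.

```shell
# Translate the exit status of `gh pr checks` into a checklist outcome.
# $1 = exit status (0 = all checks passed, 8 = checks pending, anything else = failure).
ci_outcome() {
  case "$1" in
    0) echo "green" ;;
    8) echo "pending: wait for CI to finish" ;;
    *) echo "red: record an action item for the follow-up session" ;;
  esac
}

# Usage (needs an authenticated GitHub CLI and a real PR URL):
#   status=0
#   gh pr checks "$PR_URL" >/dev/null || status=$?
#   ci_outcome "$status"
```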