# TODO — Close the cancel-session-id lifetime gap

**Filed:** session 115, 2026-04-15 post-test-call
**Severity:** Low (kill_playback still works; this is the "stop wasted GPU" gap)

## What's broken

Session 115 shipped `_playback_worker` → `tts.cancel_current()` → `backend.cancel(session_id)` on barge-in. Logs from the session-115 test call (4 real barge-ins on turns 6/10/11/13) show the Chatterbox server received ZERO `/cancel/<session_id>` POSTs.

## Why

`ChatterboxBackend._current_session_id` lifetime is **scoped to a single `/v1/tts` HTTP POST**. In the phone flow:

1. `_tts_worker` calls `tts.generate(text, wav)`  → `backend.generate_audio()` POSTs → Chatterbox returns full WAV ~1-3 s later → `finally` block clears `_current_session_id = None` → WAV queued to `play_queue`.
2. `_playback_worker` dequeues the WAV and plays it.
3. User barges in during playback → `_playback_worker` calls `tts.cancel_current()` → reads `current_session_id` → **it's already None** (step 1 cleared it).

So `cancel_current()` runs, sees `None`, and returns `False`. The server never sees a POST.
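The three steps above can be reproduced without a server. This is a minimal sketch, not the real `ChatterboxBackend`: `ToyBackend`, its `cancel_posts` list, and the fake WAV bytes are hypothetical stand-ins, and no HTTP is issued. It only shows that an id cleared in `finally` is gone by the time playback starts.

```python
import uuid


class ToyBackend:
    """Hypothetical stand-in for ChatterboxBackend; no real HTTP is issued."""

    def __init__(self):
        self._current_session_id = None
        # Session ids that would have been POSTed to /cancel/<session_id>.
        self.cancel_posts = []

    def generate_audio(self, text):
        self._current_session_id = uuid.uuid4().hex
        try:
            # Stand-in for the blocking /v1/tts POST that returns a full WAV.
            return b"RIFF" + text.encode()
        finally:
            # Same lifetime as described above: cleared once the POST returns.
            self._current_session_id = None

    def cancel(self):
        session_id = self._current_session_id
        if session_id is None:
            return False  # nothing in flight, so no POST is sent
        self.cancel_posts.append(session_id)
        return True


backend = ToyBackend()
wav = backend.generate_audio("hello")  # step 1: synth completes, id cleared
# step 2: playback of `wav` would start here
cancelled = backend.cancel()           # step 3: barge-in during playback
print(cancelled, backend.cancel_posts)  # False []
```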

## Why kill_playback still made the call "feel right"

`kill_playback` kills the ffplay/aplay subprocess, so audible output stops in ~50 ms. That's all Mom notices. The cancel plumbing underneath is a silent no-op.

## Who actually benefits from fixing this

The cancel POST becomes meaningful only when:
- Chatterbox is BUSY synthesizing the NEXT sentence (pipelined by `_tts_worker`) at the moment barge-in fires. Then `current_session_id` IS populated and `/cancel/<id>` lands while that synth is still running → the 204-discard path actually saves 1-3 s of GPU waste.
- With `_cancel_any_pending` pre-set: the NEXT sentence's `generate_audio` sees the guard at entry and raises `SynthCancelled` without issuing its HTTP POST at all.

## Fix (small, ~5 lines)

In `services/annie-voice/phone_loop.py` `_playback_worker`, change the barge-in branch:

```python
if bargein_task in done and not bargein_task.cancelled():
    exc = bargein_task.exception()
    if exc is None and bargein_task.result():
        bargein_fired = True
        logger.info("[PHONE-STREAM] Barge-in during sentence {}", sentence_num)
        # Pre-set the module-level guard BEFORE kill_playback so any
        # sentence still queued at `_tts_worker` aborts without even
        # issuing its POST.
        import tts_backends
        tts_backends.cancel_guard_set()
        try:
            await phone_audio.kill_playback(proc)
            if tts is not None:
                try:
                    await asyncio.get_running_loop().run_in_executor(
                        None, tts.cancel_current,
                    )
                except Exception as e:
                    logger.warning("[PHONE-STREAM] Chatterbox cancel raised: {}", e)
            cancel_pipeline.set()
            break
        finally:
            # Brief tail (2 s) then release the guard so the NEXT turn's
            # first synth can run normally. Matches the tombstone window
            # on the server.
            asyncio.get_running_loop().call_later(2.0, tts_backends.cancel_guard_clear)
```

Plus a test (or a `conftest` fixture) asserting that `cancel_guard_set()` is called when a barge-in is simulated.
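One possible shape for that test, as a sketch only: `bargein_branch` is a hypothetical extract of the barge-in branch so it can run without the rest of `_playback_worker`, and every collaborator is a `unittest.mock` fake. The tail is shortened from 2 s so the test finishes quickly.

```python
import asyncio
from unittest import mock


async def bargein_branch(guard, phone_audio, tts, proc, tail_s=2.0):
    """Hypothetical extract of the barge-in branch, testable in isolation."""
    loop = asyncio.get_running_loop()
    guard.cancel_guard_set()  # pre-set the guard before killing playback
    try:
        await phone_audio.kill_playback(proc)
        await loop.run_in_executor(None, tts.cancel_current)
    finally:
        # Release the guard after a short tail so the next turn can synth.
        loop.call_later(tail_s, guard.cancel_guard_clear)


async def run_simulated_bargein():
    guard = mock.Mock()
    phone_audio = mock.Mock()
    phone_audio.kill_playback = mock.AsyncMock()
    tts = mock.Mock()
    tts.cancel_current.return_value = False
    await bargein_branch(guard, phone_audio, tts, proc=object(), tail_s=0.01)
    # Right after the branch: guard is set, tail has not fired yet.
    set_before_clear = (
        guard.cancel_guard_set.called and not guard.cancel_guard_clear.called
    )
    await asyncio.sleep(0.05)  # let the call_later tail fire
    return guard, phone_audio, tts, set_before_clear


guard, phone_audio, tts, set_before_clear = asyncio.run(run_simulated_bargein())
guard.cancel_guard_set.assert_called_once()
guard.cancel_guard_clear.assert_called_once()
```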

## Additional fix (optional)

Alternatively or additionally: keep `current_session_id` populated for as long as the WAV is queued or playing, not just for the duration of the POST. Set it in `_tts_worker` before queuing and clear it when `_playback_worker` finishes (or abandons) playback of that WAV, rather than at dequeue, since barge-in fires during playback. That way `cancel_current()` still hits the right id even post-POST.

## When to close this

When the next session's Phase 3 full rework (the plan's Strategy-pattern split) re-implements `cancel_chain()` properly — at that point this is subsumed by the full cancel_chain contract.
