# TODO — Pipecat Migration for phone_loop.py (post session-114 follow-up)

**Created:** session 114, 2026-04-15
**Source:** `/planning-with-review` Stage 2 alternative A2, rejected for session 114, filed here per skill rule ("deferred LOW must have concrete follow-up").
**Target:** multi-session migration, starts after session-114's streaming+barge-in is validated in production for ≥ 2 weeks.
**Parent plan:** `~/.claude/plans/reflective-wobbling-blossom.md`
**Related research:** `docs/RESEARCH-PIPECAT-VOICE-AGENT.md`

---

## The proposal

Migrate `services/annie-voice/phone_loop.py` from its hand-rolled turn state machine to **Pipecat's pipeline framework**, reusing the existing `bot.py` pipeline infrastructure. Barge-in becomes `allow_interruptions=True` in Pipecat's `PipelineParams`. The SM-1 state machine designed in session 114 becomes a legacy artifact, superseded by Pipecat's `InterruptionHandler` + `STTMuteFilter` + `LLMUserContextAggregator` + `TTS.interrupt()`.

## Why this was proposed

The adversarial architecture reviewer noted that session 114's plan re-implements what Pipecat already provides. `services/annie-voice/bot.py` already runs a Pipecat pipeline, and the project already has `docs/RESEARCH-PIPECAT-VOICE-AGENT.md`. The correct architectural move is migration, not hand-rolling a parallel framework inside `phone_loop.py`.

Concrete benefits:
- Barge-in logic collapses from ~400 lines of new asyncio code to a boolean config
- LLM cancel uses Pipecat's `LLMResponseAggregator.interrupt()` — battle-tested against OpenAI-compatible endpoints
- Telemetry and observability (frame-level tracing) come for free
- Future transports (LiveKit for web calls, Daily.co for video, Twilio for PSTN if we ever revisit voice-call routing) become plug-and-play
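To make the first benefit concrete, here is a toy model of what the hand-rolled path amounts to: TTS playback runs as a cancellable asyncio task, and a VAD callback cancels it when the caller starts talking. `allow_interruptions=True` subsumes this whole class of plumbing. All names below are illustrative, not taken from `phone_loop.py`.

```python
import asyncio

async def play_chunks(chunks, played, gate):
    # Stand-in for streaming TTS audio out chunk by chunk.
    for chunk in chunks:
        await gate.wait()    # stand-in for per-chunk audio I/O
        gate.clear()
        played.append(chunk)

async def demo_barge_in():
    played, gate = [], asyncio.Event()
    task = asyncio.create_task(play_chunks(["a", "b", "c"], played, gate))
    gate.set()
    await asyncio.sleep(0)   # let the first chunk "play"
    task.cancel()            # VAD fired: the caller barged in
    try:
        await task
    except asyncio.CancelledError:
        pass                 # playback stopped mid-utterance, as intended
    return played            # chunks spoken before the cancel
```

The real version also has to coordinate echo draining, LLM cancellation, and state transitions around this cancel, which is where the ~400 lines come from.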

## Why rejected for session 114

`phone_loop.py` has accumulated **~1500 lines of domain-specific logic over 113 sessions**; each of the behaviors below would need to become a `FrameProcessor` subclass:

- Tool calling (Ollama streaming tool_calls bug workaround from session 67)
- Context compaction (OpenClaw-pattern sliding window)
- Echo draining (lines 1322-1325, 250 ms post-playback silence)
- Contact book integration (LLM-driven `callee_name` extraction, session 108)
- SEARXNG web search tool (Docker container at 192.168.68.52)
- Thinking cues (Gemma 4 chat template quirks)
- `/v1/phone/debug/transcribe` endpoint (session 113 phone-API integration)
- Greeting logic + tool-call filtering + memory tools

Porting each is a multi-session effort. Doing it inside session 114 would push scope from "add streaming + barge-in" to "rewrite the phone daemon," violating the reviewer's own principle of bounded scope.

## Proposed migration plan (4 phases, multi-session)

### Phase M1 — Inventory + parity spec
Produce `docs/PIPECAT-MIGRATION-PARITY.md` enumerating every behavior `phone_loop.py` exhibits, each mapped to its Pipecat equivalent or "needs custom FrameProcessor." Estimated effort per line item. Acceptance: 100% of session-114 `test_phone_loop_bargein.py` + `test_phone_loop_halfduplex_regression.py` tests have a Pipecat-equivalent test specified.

### Phase M2 — Dual-pipeline with feature-flag
Add `USE_PIPECAT_PIPELINE=0` env flag to `phone_loop.py` (mirrors `ENABLE_BARGE_IN` pattern from session 114). When flag=1, route the call through a Pipecat `Pipeline` instead of the hand-rolled Strategy. Keep both paths live. A/B test via flag flip in `start.sh`.
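A minimal sketch of the flag routing, mirroring the `ENABLE_BARGE_IN` pattern; the helper name and `env` parameter are illustrative:

```python
import os

def select_pipeline(env=os.environ) -> str:
    """Route a call to the Pipecat path or the legacy hand-rolled path.

    USE_PIPECAT_PIPELINE defaults to "0" so production behavior is
    unchanged until the flag is flipped in start.sh.
    """
    if env.get("USE_PIPECAT_PIPELINE", "0") == "1":
        return "pipecat"   # build a Pipecat Pipeline and run the call through it
    return "legacy"        # existing hand-rolled Strategy path
```

Keeping the default at `"0"` means a bad Pipecat release can be backed out with a flag flip rather than a revert.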

### Phase M3 — Port FrameProcessors one domain at a time
Ordered simplest-first:
1. VAD (Pipecat `VADAnalyzer` already does this)
2. STT (existing nemotron sidecar → `FrameProcessor` that wraps the WebSocket client)
3. TTS (Chatterbox → `FrameProcessor`)
4. LLM (Gemma 4 via `OpenAILLMService` with llama-server endpoint)
5. Tool calling (custom `FrameProcessor` with Ollama workaround)
6. Contact book + memory tools (custom)
7. Compaction + echo draining + greeting (custom)

Each step adds a regression test that passes under both flag states.
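To illustrate the shape of a port, here is a stand-in for item 7's echo draining as a frame-dropping processor. This deliberately does not import Pipecat: the real `FrameProcessor` base class has a different surface, and the class and field names below are assumptions, not the eventual implementation.

```python
from dataclasses import dataclass

@dataclass
class TranscriptFrame:
    # Stand-in for an STT output frame.
    text: str

class EchoDrainProcessor:
    """Drop transcript frames that arrive inside the post-playback
    drain window (250 ms in phone_loop.py), so the STT's echo of our
    own TTS never reaches the LLM."""

    def __init__(self, drain_s: float = 0.25):
        self.drain_s = drain_s
        self._drain_until = 0.0

    def note_playback_end(self, now: float) -> None:
        # Called when TTS playback finishes; opens the drain window.
        self._drain_until = now + self.drain_s

    def process(self, frame: TranscriptFrame, now: float):
        if now < self._drain_until:
            return None   # swallow echo of our own speech
        return frame      # pass real user speech downstream
```

The parity test for each port is the same idea under both flag states: same input frames in, same frames out.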

### Phase M4 — Remove hand-rolled path
After 30 days of production parity with flag=1 default, delete `HalfDuplexOrchestrator` + `FullDuplexOrchestrator` + `CancellableTurn` + `cancel_chain` from `phone_loop.py`. Commit becomes "refactor: phone_loop.py is now a Pipecat pipeline." Session-114's SM-1 / SM-2 / SM-3 state machines get archived in `docs/ARCHITECTURE-ARCHIVE-SM1-SM3.md` for historical reference.

## Risks

- Pipecat version churn — the framework is young; API changes frequently. Pin the version in `requirements.txt`, track the upstream changelog in `docs/PIPECAT-VERSION-NOTES.md`.
- `bot.py` currently targets a different use case (Annie as standalone assistant); migrating phone_loop may require splitting shared config vs. phone-specific config.
- Pipecat's `InterruptionHandler` interrupt semantics may not match the session-114 SM-1 state transitions exactly — edge cases (barge-in during LLM_STREAMING before any TTS started) need careful reproduction.
- Pipecat pipelines are async-first; any blocking call inside a `FrameProcessor` starves the pipeline. Existing Ollama/Chatterbox sync clients need async wrappers.
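One workable pattern for the last risk, assuming the sync clients stay as-is: push each blocking call onto a worker thread with `asyncio.to_thread`, so the pipeline's event loop keeps servicing frames. The wrapper and `generate` method names are illustrative.

```python
import asyncio

class AsyncClientWrapper:
    """Adapt a blocking client (e.g. the existing Ollama/Chatterbox
    sync clients) for use inside an async frame processor."""

    def __init__(self, sync_client):
        self._client = sync_client

    async def generate(self, prompt: str) -> str:
        # Runs the blocking call on a worker thread; the event loop
        # stays free to process VAD/STT frames (and barge-in) meanwhile.
        return await asyncio.to_thread(self._client.generate, prompt)
```

This is a stopgap: worker threads cap concurrency and hide cancellation, so truly hot paths (LLM streaming) should eventually get native async clients.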

## Non-goals

- Do NOT migrate `bot.py` at the same time. The two pipelines have different runtime profiles (phone = single call, bot = persistent Annie). Separate migrations.
- Do NOT adopt Pipecat's bundled STT/TTS providers. Nemotron + Chatterbox are locally-hosted and we're not introducing cloud dependencies mid-migration.

## Owner / trigger

- **Owner:** whoever takes up the "phone daemon maintainability" thread after session 114
- **Trigger:** session 114's streaming+barge-in stable for ≥ 2 weeks in production (no cancel-chain races, no barge-in false-positive storms, no OOM)
- **Deadline:** none — pure tech-debt reduction; only act when new features require it OR when maintenance burden on the hand-rolled path grows too large
