# Research: Phone Pipeline AEC (Acoustic Echo Cancellation)

**Date:** 2026-04-08
**Status:** Research complete, implementation pending
**Related:** `docs/NEXT-SESSION-PHONE-AEC.md`, Session 28 logs

---

## Problem Statement

When Annie speaks via BT HFP, her TTS audio plays through the Pixel's speaker, gets picked up by the Pixel's mic, travels back over BT SCO to Panda, and is heard by the caller as echo. Nobody in the chain does AEC — the telecom network normally handles this for regular calls, but Panda is a raw BT Audio Gateway with no echo cancellation.

### Echo Chain

```
USER SPEAKS → Pixel mic → BT SCO → Panda pw-record → [main_queue, bargein_queue]
                                                              ↓
                                         STT → LLM → TTS → pw-play → BT SCO → Pixel speaker
                                                                                     ↓
                                                          Pixel speaker → Pixel mic (acoustic coupling)
                                                                                     ↓
                                                          Echo returns to Panda → user hears themselves
```

### Evidence (Session 28 Logs)

| Turn | Thinking Cue | Echo Drain | Duration |
|------|-------------|------------|----------|
| 1 | No | 494 frames | 14,820ms |
| 2 | Yes | 360 frames | 10,800ms |
| 3 | Yes | 147 frames | 4,410ms |

Barge-in false positives occurred on turns 2 and 3 (both on sentence 2), which suggests the 300ms per-sentence echo skip is too short.
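The durations in the table follow directly from the 30ms frame size listed under Hardware Context. A quick check, where `frames_to_ms` is a hypothetical helper and not a function from the codebase:

```python
FRAME_MS = 30  # one frame of 16kHz mono PCM, per the Hardware Context table


def frames_to_ms(frames: int, frame_ms: int = FRAME_MS) -> int:
    """Convert a count of drained frames to milliseconds of audio."""
    return frames * frame_ms


# Drained frame counts from the Session 28 table, keyed by turn.
drained = {1: 494, 2: 360, 3: 147}
durations_ms = {turn: frames_to_ms(n) for turn, n in drained.items()}
print(durations_ms)  # {1: 14820, 2: 10800, 3: 4410}
```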

---

## Current Echo Mitigation (Band-Aid, NOT AEC)

The phone pipeline has 5 layers of echo mitigation, all local to Panda. **None of them prevent the caller from hearing echo in their earpiece** — they only protect VAD/STT from ghost turns.

### 1. Post-Echo Factor Drain (`phone_loop.py:72`)

```python
_POST_ECHO_FACTOR = 0.3
```

After Annie finishes speaking, sleep for `0.3 * (tts_duration + cue_duration)` (capped at 2.0s), then drain all queued frames. This waits for BT echo to settle before starting the next listen cycle.

- **Greeting drain** (`phone_loop.py:1315-1331`): `min(greeting_duration * 0.3, 2.0)` seconds
- **Response drain** (`phone_loop.py:1495-1512`): `min((tts_duration + cue_duration) * 0.3, 2.0)` seconds
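The drain rule above can be sketched as a single function. The names `post_echo_sleep_s` and `MAX_DRAIN_SLEEP_S` are illustrative assumptions; only the 0.3 factor and the 2.0s cap come from the code referenced above.

```python
POST_ECHO_FACTOR = 0.3   # mirrors _POST_ECHO_FACTOR in phone_loop.py
MAX_DRAIN_SLEEP_S = 2.0  # cap described above


def post_echo_sleep_s(tts_duration: float, cue_duration: float = 0.0) -> float:
    """Seconds to wait after playback ends before draining queued frames."""
    return min((tts_duration + cue_duration) * POST_ECHO_FACTOR, MAX_DRAIN_SLEEP_S)
```

With these values the cap takes effect once the combined TTS and cue audio exceeds roughly 6.7s, so long responses all get the same 2.0s wait.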

### 2. Barge-In Echo Skip Guard (`phone_audio.py:352-398`)

```python
skip_initial_s = 1.5  # Ignore first 1.5s of BT echo
threshold_frames = 10  # 300ms sustained speech to trigger
```

During SPEAKING state, the first 1.5s of mic input is ignored to avoid triggering on Annie's own echo.
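A minimal sketch of how the two parameters interact, assuming the guard counts consecutive speech frames from a per-frame VAD decision. The function name and the exact counting rule are assumptions for illustration; the real `detect_bargein()` may differ.

```python
from typing import Iterable, Optional

FRAME_MS = 30


def bargein_frame_index(
    is_speech: Iterable[bool],
    skip_initial_s: float = 1.5,
    threshold_frames: int = 10,
) -> Optional[int]:
    """Return the index of the frame that triggers barge-in, or None."""
    skip_frames = int(skip_initial_s * 1000 / FRAME_MS)  # 1.5s -> 50 frames
    consecutive = 0
    for i, speech in enumerate(is_speech):
        if i < skip_frames:
            continue  # still inside the echo guard window
        consecutive = consecutive + 1 if speech else 0
        if consecutive >= threshold_frames:
            return i
    return None
```

Under these assumptions the earliest possible trigger is frame 59, about 1.8s after playback starts, and any echo that outlasts the 1.5s window is indistinguishable from caller speech.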

### 3. Per-Sentence Echo Skip (`phone_loop.py:776-777`)

```python
skip_s = 1.0 if sentence_num == 1 else 0.3
```

The playback worker uses different skip durations: 1.0s for the first sentence (longer BT transient) and 0.3s for subsequent sentences.

### 4. Ghost Turn Protection (`phone_loop.py`)

After 3 consecutive empty STT results, drain queue and sleep 1s.
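The counting logic can be sketched as a small helper. `GhostTurnGuard` is a hypothetical name, and resetting the streak after it fires is an assumption about the intended behaviour.

```python
class GhostTurnGuard:
    """Tracks consecutive empty STT results (hypothetical helper)."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.empty_streak = 0

    def record(self, transcript: str) -> bool:
        """Return True when the caller should drain the queue and sleep 1s."""
        if transcript.strip():
            self.empty_streak = 0  # real speech ends the streak
            return False
        self.empty_streak += 1
        if self.empty_streak >= self.limit:
            self.empty_streak = 0  # start counting afresh after the drain
            return True
        return False
```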

### 5. Frame Broadcaster Queue Eviction (`phone_audio.py:76-117`)

When a subscriber's queue is full, oldest frames are evicted instead of blocking the reader thread.
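A drop-oldest enqueue can be sketched as follows. This uses the standard-library `queue.Queue` for simplicity, whereas the actual broadcaster feeds asyncio queues, and `put_drop_oldest` is a hypothetical name.

```python
import queue


def put_drop_oldest(q: "queue.Queue[bytes]", frame: bytes) -> bool:
    """Enqueue frame without blocking; return True if an old frame was evicted."""
    evicted = False
    while True:
        try:
            q.put_nowait(frame)
            return evicted
        except queue.Full:
            try:
                q.get_nowait()  # discard the oldest frame to make room
                evicted = True
            except queue.Empty:
                pass  # a consumer emptied the queue in between; retry the put
```

Evicting the oldest frame keeps the reader thread responsive at the cost of losing the start of whatever a slow subscriber was behind on.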

### Why These Are Insufficient

All 5 mitigations operate **after** the echo has been captured. They clear Panda's input buffers but do nothing about the audio already sent back to the caller via BT SCO. The caller hears their own voice (and Annie's voice) echoed back with ~20-50ms delay.

---

## Hardware Context

| Component | Detail |
|-----------|--------|
| Panda | RTX 5070 Ti, Ubuntu 24.04, PipeWire 1.0.5 |
| Pixel 9a | BT HFP connected, MAC `FC:41:16:C5:AC:61` |
| BT profile | HFP (Hands-Free), SCO codec (CVSD or mSBC) |
| Audio format | 16kHz, 16-bit signed, mono PCM |
| Frame size | 30ms = 960 bytes |
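
The frame size in the table can be derived from the audio format. The sketch below also shows one way to cut a raw PCM buffer into whole frames; `split_frames` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # 16-bit signed PCM
FRAME_MS = 30

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 480 samples
FRAME_BYTES = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE      # 960 bytes (mono)


def split_frames(pcm: bytes) -> list:
    """Split raw PCM into whole 30ms frames, dropping any trailing partial frame."""
    usable = len(pcm) - len(pcm) % FRAME_BYTES
    return [pcm[i:i + FRAME_BYTES] for i in range(0, usable, FRAME_BYTES)]
```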

---

## Approach Options

### Option A: PipeWire Echo-Cancel Module (Recommended)

PipeWire ships a native echo-cancel module (`libpipewire-module-echo-cancel`, separate from the `filter-chain` module) that uses the WebRTC Audio Processing library. It runs at the PipeWire daemon level, so the Python audio path needs no changes beyond which nodes it targets.

**How it works:**
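- Creates a virtual source (`echo_cancel_source`) that outputs echo-cancelled audio, and a virtual sink (`echo_cancel_sink`) that receives the reference signal
- Takes two inputs: the raw mic capture (BT input) and the reference signal (what Annie is playing to BT output)
- The reference is whatever is played into `echo_cancel_sink`, which the module then forwards to the BT output. `pw-play` must therefore target the virtual sink rather than the raw `bluez_output` node, otherwise the canceller has nothing to subtract
- The WebRTC AEC algorithm adaptively models the echo path and subtracts the predicted echo from the mic signal
- Our `pw-record` targets the virtual source instead of the raw BT input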
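
The "adaptively models the echo path" step can be illustrated with a toy normalized LMS (NLMS) filter. This is not the WebRTC algorithm, which is considerably more elaborate; it is a minimal sketch of the same idea, with a simulated echo path and illustrative parameters (`taps`, `mu`) chosen only for the demo.

```python
import random


def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Return the residual after subtracting the adaptively predicted echo."""
    w = [0.0] * taps          # estimated echo path (FIR coefficients)
    history = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        predicted = sum(wi * xi for wi, xi in zip(w, history))
        err = m - predicted   # what remains after echo subtraction
        norm = sum(xi * xi for xi in history) + eps
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, history)]
        out.append(err)
    return out


rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Simulated echo path: the reference delayed by 3 samples at half amplitude.
mic = [0.0] * 3 + [0.5 * x for x in ref[:-3]]
residual = nlms_cancel(mic, ref)
tail_echo = sum(x * x for x in mic[-200:])
tail_residual = sum(x * x for x in residual[-200:])
```

Once the filter has converged, `tail_residual` is a tiny fraction of `tail_echo`. The demo only works because `ref` is available to the filter, which is why the reference routing matters in the real setup.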

**Config file:** `~/.config/pipewire/pipewire.conf.d/echo-cancel.conf` (on Panda)

```conf
context.modules = [
    {   name = libpipewire-module-echo-cancel
        args = {
            capture.props = {
                node.name = "echo_cancel_capture"
                target.object = "bluez_input.FC_41_16_C5_AC_61.0"
            }
            playback.props = {
                node.name = "echo_cancel_playback"
                target.object = "bluez_output.FC_41_16_C5_AC_61.1"
            }
            source.props = {
                node.name = "echo_cancel_source"
                node.description = "Echo-Cancelled BT Input"
            }
            sink.props = {
                node.name = "echo_cancel_sink"
                node.description = "Echo-Cancel Reference"
            }
            library.name = "aec/libspa-aec-webrtc"
            aec.args = {
                webrtc.gain_control = true
                webrtc.extended_filter = true
            }
        }
    }
]
```

**Code change** (`phone_audio.py:30-31`):

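```python
import os  # needed for os.getenv; skip if phone_audio.py already imports it

# Before:
BT_INPUT = f"bluez_input.{_BT_MAC}.0"
# After: the capture node can be overridden, defaulting to the raw BT input
BT_INPUT = os.getenv("BT_INPUT_NODE", f"bluez_input.{_BT_MAC}.0")
```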

**Env var** (`start.sh`): `BT_INPUT_NODE=echo_cancel_source`
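
Because the echo-cancel module takes its reference from audio played into `echo_cancel_sink`, the playback node probably needs a matching override. The sketch below resolves both nodes; `BT_OUTPUT_NODE` and `resolve_bt_nodes` are hypothetical names, and the `.1` suffix on the default output node is taken from the config above, not from `phone_audio.py`.

```python
import os
from typing import Mapping, Tuple


def resolve_bt_nodes(mac: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (capture_node, playback_node), honouring env overrides."""
    capture = env.get("BT_INPUT_NODE", f"bluez_input.{mac}.0")
    # BT_OUTPUT_NODE is a hypothetical variable, not part of the current plan.
    playback = env.get("BT_OUTPUT_NODE", f"bluez_output.{mac}.1")
    return capture, playback
```

With `BT_INPUT_NODE=echo_cancel_source` and `BT_OUTPUT_NODE=echo_cancel_sink` set, both `pw-record` and `pw-play` would go through the module; with neither set, behaviour stays as it is today.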

**Pros:**
- Zero Python audio path changes
- WebRTC AEC is battle-tested (used by Chrome, Firefox, Teams, etc.)
- Runs in PipeWire's real-time thread — minimal latency (~10-30ms)
- No additional dependencies if `libspa-0.2-modules` is installed

**Cons:**
- PipeWire module may not be installed on Panda
- BT reconnects may break node references (needs testing)
- Config references specific BT MAC — may need dynamic generation
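
If dynamic generation turns out to be necessary, the config could be rendered from the MAC address. This is a sketch using a trimmed version of the config above; `render_echo_cancel_conf` is a hypothetical helper, and writing the file and restarting PipeWire are left out.

```python
from string import Template

# Trimmed version of the static config; only the MAC-derived node names vary.
ECHO_CANCEL_TEMPLATE = Template("""\
context.modules = [
    {   name = libpipewire-module-echo-cancel
        args = {
            capture.props = {
                node.name = "echo_cancel_capture"
                target.object = "bluez_input.${mac}.0"
            }
            playback.props = {
                node.name = "echo_cancel_playback"
                target.object = "bluez_output.${mac}.1"
            }
            source.props = { node.name = "echo_cancel_source" }
            sink.props = { node.name = "echo_cancel_sink" }
            library.name = "aec/libspa-aec-webrtc"
        }
    }
]
""")


def render_echo_cancel_conf(mac_address: str) -> str:
    """Render the config for a MAC given in colon form, e.g. 'FC:41:16:C5:AC:61'."""
    node_mac = mac_address.upper().replace(":", "_")
    return ECHO_CANCEL_TEMPLATE.substitute(mac=node_mac)
```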

### Option B: Python-Level AEC (Fallback)

Use `speexdsp` or `webrtc-audio-processing` Python bindings to process audio frames in `phone_audio.py`.

**How it works:**
- Capture the reference signal (Annie's TTS output) before sending to pw-play
- Feed both mic input and reference signal to AEC processor
- Output clean audio to the pipeline consumers

**Implementation sketch:**

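```python
# In phone_audio.py, new AEC processor class (sketch, untested):
from speexdsp import EchoCanceller  # speexdsp-python bindings


class SoftwareAEC:
    def __init__(self, sample_rate: int = 16000, frame_ms: int = 30):
        frame_size = frame_ms * sample_rate // 1000  # 480 samples per 30ms frame
        filter_length = sample_rate // 2             # 500ms echo tail
        # create(frame_size, filter_length, sample_rate) follows the
        # speexdsp-python README; verify against the installed version.
        self.aec = EchoCanceller.create(frame_size, filter_length, sample_rate)

    def process(self, mic_frame: bytes, ref_frame: bytes) -> bytes:
        # Both frames must be 16-bit mono PCM of exactly frame_size samples.
        return self.aec.process(mic_frame, ref_frame)
```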

**Pros:**
- Works regardless of PipeWire module availability
- Full control over AEC parameters
- Can log echo residual for debugging

**Cons:**
- Complex: need to time-align mic and reference signals
- Need to pipe reference signal from TTS through the AEC
- Adds Python processing overhead to the audio path
- More code to maintain
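
The time-alignment problem in the first con can be approached by estimating the delay between the reference and the mic signal with a cross-correlation search. The sketch below is a brute-force version on synthetic data; `estimate_delay` is a hypothetical helper, and a real implementation would need to search over a much larger lag range for the BT round trip.

```python
import random
from typing import Sequence


def estimate_delay(mic: Sequence[float], ref: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which ref best matches mic."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference against the mic signal shifted back by `lag`.
        score = sum(m * r for m, r in zip(mic[lag:], ref))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(1)
ref = [rng.uniform(-1.0, 1.0) for _ in range(500)]
mic = [0.0] * 7 + [0.5 * x for x in ref]  # echo arrives 7 samples late
delay = estimate_delay(mic, ref, max_lag=20)
```

The estimated delay would then be used to buffer the reference frames so that each mic frame is paired with the reference audio that produced its echo.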

### Option C: PulseAudio Compatibility Module

```bash
pactl load-module module-echo-cancel
```

This loads PulseAudio's `module-echo-cancel` through PipeWire's PulseAudio compatibility layer (`pipewire-pulse`).

**Pros:**
- Well-documented, widely used
- May already be available

**Cons:**
- Legacy approach — PipeWire native is preferred
- Less control over parameters
- May not correctly route BT HFP nodes

---

## Research Questions (Pre-Implementation)

### Q1: Is `libpipewire-module-echo-cancel` available on Panda?

**Check commands:**
```bash
find / -name "*echo*cancel*" 2>/dev/null
dpkg -l | grep -i pipewire
apt list --installed | grep -i "spa\|webrtc"
```

**Expected packages:** `pipewire-audio`, `libspa-0.2-modules`, or `libspa-plugins`

### Q2: Is `libspa-aec-webrtc` installed?

**Check commands:**
```bash
find / -name "libspa-aec-webrtc*" 2>/dev/null
find /usr/lib -name "aec" -type d 2>/dev/null
```

**If missing:** `sudo apt install pipewire-audio` or `libspa-0.2-modules`

### Q3: Does it work with BT HFP SCO (16kHz mono)?

BT HFP SCO carries either CVSD (8kHz narrowband) or mSBC (16kHz wideband). The pipeline records at 16kHz, so PipeWire resamples when CVSD is negotiated. The WebRTC AEC supports 8/16/32/48kHz and should handle this, but that needs verification on Panda.

### Q4: Does the AEC module survive BT reconnects?

When the Pixel reconnects BT, the bluez nodes are recreated. The AEC config references specific `target.object` names. Options if it breaks:
- Use WirePlumber rules to auto-link on BT connect
- Restart PipeWire when BT reconnects
- Use node description matching instead of exact names
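
Whichever option is chosen, it helps to detect which BT nodes currently exist after a reconnect. One way is to parse the JSON that `pw-dump` prints; the sketch below takes that JSON as a string, and `find_nodes` is a hypothetical helper.

```python
import json
from typing import List


def find_nodes(pw_dump_json: str, prefix: str) -> List[str]:
    """Return node.name values from `pw-dump` output that start with prefix."""
    names = []
    for obj in json.loads(pw_dump_json):
        if obj.get("type") != "PipeWire:Interface:Node":
            continue  # skip devices, links, ports, and other object types
        props = (obj.get("info") or {}).get("props") or {}
        name = props.get("node.name", "")
        if name.startswith(prefix):
            names.append(name)
    return names
```

In practice the input would come from something like `subprocess.run(["pw-dump"], capture_output=True, text=True).stdout`, polled after a BT connect event.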

### Q5: What latency does AEC add?

WebRTC AEC typically adds 10-30ms. For voice calls, <50ms is the target. The BT SCO link itself adds ~20ms, so total would be ~30-50ms — acceptable.

---

## Key Code Locations

| Component | File | Lines | Purpose |
|-----------|------|-------|---------|
| BT node names | `services/annie-voice/phone_audio.py` | 30-32 | BT_INPUT / BT_OUTPUT constants |
| Recording startup | `services/annie-voice/phone_audio.py` | 160-193 | `start_recording()` spawns pw-record |
| Reader thread | `services/annie-voice/phone_audio.py` | 121-158 | `_reader_thread()` feeds asyncio.Queue |
| Frame broadcaster | `services/annie-voice/phone_audio.py` | 76-117 | Fan-out to multiple subscribers |
| VAD utterance collection | `services/annie-voice/phone_audio.py` | 221-336 | `collect_utterance()` with webrtcvad |
| Barge-in detection | `services/annie-voice/phone_audio.py` | 352-398 | `detect_bargein()` with 1.5s echo skip |
| Playback | `services/annie-voice/phone_audio.py` | 402-427 | `play_audio()` + `kill_playback()` |
| TTS resampling | `services/annie-voice/phone_audio.py` | 452-486 | ffmpeg 24kHz to 16kHz |
| POST_ECHO_FACTOR | `services/annie-voice/phone_loop.py` | 72 | 0.3x duration echo drain multiplier |
| Greeting drain | `services/annie-voice/phone_loop.py` | 1315-1331 | Post-greeting queue drain |
| Response drain | `services/annie-voice/phone_loop.py` | 1495-1512 | Post-response drain with metrics |
| Playback worker | `services/annie-voice/phone_loop.py` | 730-816 | Sequential playback with barge-in |
| Phone auto-answer startup | `start.sh` | 720-741 | Env setup + process spawn |

---

## Constraints

- AEC must NOT add >50ms latency (voice calls are latency-sensitive)
- AEC must work at 16kHz mono (BT HFP format)
- AEC must survive BT reconnects (Pixel disconnects/reconnects during day)
- AEC should be a PipeWire config change if possible (no Python audio path changes)
- Fallback: if PipeWire AEC module unavailable, use Python-level AEC

---

## Implementation Plan

### Phase 1: Research on Panda (SSH)
1. Check PipeWire version and installed modules
2. Check for `libspa-aec-webrtc` library
3. Install missing packages if needed

### Phase 2: Configure PipeWire AEC
1. Create `~/.config/pipewire/pipewire.conf.d/echo-cancel.conf` on Panda
2. Restart PipeWire: `systemctl --user restart pipewire`
3. Verify virtual source appears: `pw-cli list-objects | grep echo_cancel`

### Phase 3: Code Changes (Minimal)
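1. Make `BT_INPUT` configurable via env var in `phone_audio.py`
2. Add `BT_INPUT_NODE=echo_cancel_source` to `start.sh`
3. Route playback through `echo_cancel_sink` (make `BT_OUTPUT` configurable the same way) so the module receives Annie's audio as its reference
4. Optionally reduce `_POST_ECHO_FACTOR` (AEC handles echo, drain can be lighter)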

### Phase 4: Verification
1. Call Annie on phone
2. Speak — should NOT hear own voice echoed
3. Annie responds — should NOT hear Annie's voice echoed back
4. Barge-in still works
5. STT quality unchanged
6. Run existing tests: `cd services/annie-voice && python3 -m pytest tests/ -q`
7. Check latency: `grep LATENCY /tmp/phone-auto.log`

---

## Post-AEC Cleanup Opportunities

Once AEC is confirmed working, these existing mitigations could be simplified:

1. **Reduce `_POST_ECHO_FACTOR`** from 0.3 to 0.1 or remove entirely — AEC handles echo at source
2. **Reduce barge-in `skip_initial_s`** from 1.5s to 0.5s — less echo to skip
3. **Reduce per-sentence skip** from 1.0s/0.3s — echo is already cancelled
4. **Remove ghost turn workaround** if echo-driven false positives disappear

These are stretch goals — keep current values until AEC is proven stable over multiple days.

---

## References

- [PipeWire Echo Cancel Module](https://docs.pipewire.org/page_module_echo_cancel.html)
- [WebRTC Audio Processing](https://webrtc.googlesource.com/src/+/refs/heads/main/modules/audio_processing/)
- [speexdsp Python bindings](https://github.com/xiongyihui/speexdsp-python)
- Session 28 investigation logs (echo drain metrics)
