# Next Session: Build Annie's Voice Conversation Loop

## What's Done (Session 380)

### BT HFP Audio Bridge — PROVEN
- Panda paired with iPhone via Bluetooth HFP
- Bidirectional call audio: `pw-play`/`pw-record` with bluez targets
- gTTS Kannada played into active call → Rajesh heard it
- Rajesh's voice captured → Whisper STT = perfect transcription
- Zero extra hardware. See `memory/project_bt_hfp_validated.md`

### IndicF5 TTS — FIXED + OPTIMIZED
- model.py patched: `torch.compile` removed + safetensors key remapping (`ema_model._orig_mod.` stripped) + `strict=True`
- **EPSS7 + BF16 = 285ms** for 3.47s Kannada audio (8x faster than baseline)
- FP16 is BROKEN (vocoder ComplexHalf noise). Use BF16 only.
- Quality confirmed by Rajesh for NFE32, EPSS7 FP32, and EPSS7 BF16
- See `memory/project_indicf5_production_config.md` and `memory/project_indicf5_loading_gotchas.md`

### Benchmark Summary
| Engine | Latency | Quality | Type |
|--------|---------|---------|------|
| IndicF5 EPSS7+BF16 | **285ms** | Good | Local GPU |
| gTTS | 519ms | Nice | Cloud |
| Sarvam Bulbul v3 | 1112ms | Not bad | Cloud |

## What's Next

### 1. Build Conversation Loop on Panda (~150 lines Python)
```
[BT audio in] → VAD (Silero) → STT (Whisper) → LLM (Nemotron on Titan) → TTS (IndicF5 EPSS7+BF16) → [BT audio out]
```

Components needed:
- **VAD**: Silero VAD for speech endpoint detection
- **STT**: Whisper medium on GPU (already installed)
- **LLM**: HTTP call to Nemotron Nano on Titan (vLLM port 8003)
- **TTS**: IndicF5 EPSS7+BF16 (285ms, see production config)
- **Audio I/O**: `pw-record`/`pw-play` with BT HFP targets

### 2. Test Story Conversation
- Caller asks for a story in Kannada
- LLM generates story chunks
- TTS speaks each chunk
- Test with iPhone BT connection

### 3. Buy Pixel 9a → Pair with Panda BT

### 4. Record Annie's 3s Kannada Voice Reference
- IndicF5 voice cloning needs a 3-second reference audio
- Currently using Punjabi reference (works but not Annie's voice)

## Key Files
- `scripts/benchmark_tts.py` — TTS benchmark comparing IndicF5/gTTS/Sarvam
- `docs/RESEARCH-AUDIO-BRIDGE-PIXEL-PANDA.md` — 6 audio bridge approaches
- `docs/RESEARCH-INDICF5-TTS-OPTIMIZATION.md` — 10 optimization approaches
- `docs/RESEARCH-PIXEL-ON-DEVICE-SPEECH.md` — Pixel STT rejected

## Environment on Panda
- Python 3.12, PyTorch 2.11.0+cu130, transformers 4.57.6
- RTX 5070 Ti, 16 GB VRAM
- BT adapter: hci0, BlueZ 5.72, PipeWire 1.0.5, WirePlumber 0.4.17
- IndicF5 model.py patched at `~/.cache/huggingface/modules/transformers_modules/ai4bharat/IndicF5/.../model.py`
- Backup at `model.py.bak`, BT config backup at `~/backups/bt-config-20260331/`