# Next Session: Annie Speaking on Google Pixel 9a

## Context

Read these docs first:
- `docs/RESEARCH-INDIAN-LANGUAGE-SPEECH.md` — Full research + benchmarks for Indian language speech pipeline
- `docs/RESEARCH-ANNIE-PHONE.md` — Pixel 9a decision, ADB automation
- `memory/hardware-infra.md` — Panda (RTX 5070 Ti, phone hub) and Titan specs

## What's Done

**On Panda (RTX 5070 Ti, x86_64, 192.168.68.57):**
- IndicConformerASR 600M — STT for pure Kannada, 145ms/3s audio, 303 MB VRAM, GPU
- IndicF5 TTS — Kannada speech generation, 2112ms/2.61s audio (RTF=0.808, only marginally faster than real-time), 1.7 GB VRAM
- Whisper medium — for mixed Kannada-English, GPU
- Sarvam Saaras v3 API — integrated for code-mixed STT (567ms, API key in `.env.eval`)
- Live demo at `scripts/live_asr_demo.py` (port 8765, 3 models)
- PyTorch 2.11.0+cu130, transformers 4.57.6, onnxruntime-gpu 1.24.4, torchcodec
- HF token set at `~/.cache/huggingface/token`
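
The latency figures above can be sanity-checked with a few lines of Python. This is a minimal sketch assuming the usual definition of real-time factor (processing time divided by audio duration, so values below 1.0 are faster than real time). The function name `real_time_factor` is hypothetical. Because the listed durations are rounded, the computed TTS value lands near 0.81 rather than exactly on the reported 0.808.

```python
def real_time_factor(processing_ms: float, audio_s: float) -> float:
    """Return processing time divided by audio duration.

    Values below 1.0 mean the model runs faster than real time.
    """
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return (processing_ms / 1000.0) / audio_s


# Figures taken from the benchmark list above (rounded as listed).
asr_rtf = real_time_factor(145, 3.0)    # IndicConformerASR 600M
tts_rtf = real_time_factor(2112, 2.61)  # IndicF5 TTS

print(f"ASR RTF: {asr_rtf:.3f}")
print(f"TTS RTF: {tts_rtf:.3f}")
```

Without streaming synthesis, an RTF near 0.8 means the listener waits roughly the full synthesis time before hearing anything, which is why the TTS number matters far more than the ASR number here.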

**E2E validated:** IndicConformerASR → Nemotron Nano on Titan → responds in Kannada. LLM understands pure Kannada.

**Key insight:** ASR transcript is intermediate — no human reads it. Optimize for LLM understanding, not transcription accuracy.

## What's NOT Done — The Actual Question

**How does audio physically flow between the Pixel 9a and Panda in real-time?**

The models are ready on Panda. But we haven't built the **audio bridge** — the mechanism that:
1. Captures audio from Pixel 9a's microphone (during a phone call or voice interaction)
2. Streams it in real-time to Panda for STT processing
3. Takes TTS audio from Panda and plays it back on the Pixel's speaker/call
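
Steps 1 and 2 imply cutting the microphone stream into small fixed-size frames before sending. The sketch below shows one way to do that on the receiving or sending side. It assumes 16 kHz, 16-bit mono PCM and 20 ms frames; none of these values are settled in these notes, and `PcmFramer` is a hypothetical name.

```python
from typing import Iterator

# Assumed audio format (not fixed anywhere in these notes).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono
FRAME_MS = 20

# Bytes in one frame: 16000 samples/s * 2 bytes * 0.020 s = 640 bytes.
FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000


class PcmFramer:
    """Re-chunks arbitrary-sized PCM byte chunks into fixed-size frames."""

    def __init__(self, frame_bytes: int = FRAME_BYTES) -> None:
        self._frame_bytes = frame_bytes
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> Iterator[bytes]:
        """Buffer a chunk and yield every complete frame now available."""
        self._buf.extend(chunk)
        while len(self._buf) >= self._frame_bytes:
            frame = bytes(self._buf[: self._frame_bytes])
            del self._buf[: self._frame_bytes]
            yield frame

    @property
    def pending(self) -> int:
        """Number of buffered bytes not yet emitted as a full frame."""
        return len(self._buf)


framer = PcmFramer()
frames = list(framer.feed(b"\x00" * 1000))  # one full frame, remainder buffered
```

At this assumed format the raw stream is 32,000 bytes per second in each direction, which is small enough that the transport choice is driven by latency rather than bandwidth.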

### Research DONE — see `docs/RESEARCH-AUDIO-BRIDGE-PIXEL-PANDA.md`

**Verdict:**
- **Use Case 1 (voice assistant):** Custom Android app (AudioRecord + AudioTrack + WebSocket tunneled over USB via `adb reverse`). ~3 days.
- **Use Case 2 (phone calls from mom):** VoIP/SIP with FreeSWITCH + sip-to-ai bridge. GoIP-1 GSM gateway for mom to keep calling same Jio number. ~3-5 days.
- **DO NOT root the Pixel** (breaks UPI). **DO NOT try to inject audio into cellular calls** (Android blocks this by design).

### Implementation needed:
- P0: Custom Android app (AudioRecord + AudioTrack + WebSocket). ~200-300 lines Kotlin.
- P0: Python WebSocket audio bridge server on Panda (receives PCM, runs STT/LLM/TTS, sends PCM back)
- P1: Wake word detection on Pixel (OpenWakeWord or Porcupine)
- P1: VAD on Pixel (WebRTC VAD or Silero)
- P2: FreeSWITCH + sip-to-ai on Panda for mom's calls
- P3: GoIP-1 GSM gateway hardware
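
The server-side P0 item (receive PCM, run STT/LLM/TTS, send PCM back) can be sketched independently of the transport. The code below is an illustrative skeleton, not the planned implementation: `run_turn` and `TurnResult` are hypothetical names, the three stages are passed in as plain callables, and the stubs in the usage lines stand in for IndicConformerASR, Nemotron Nano, and IndicF5.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    transcript: str   # intermediate ASR text, consumed only by the LLM
    reply_text: str   # LLM response
    reply_pcm: bytes  # synthesized audio to send back to the phone


def run_turn(
    utterance_pcm: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> TurnResult:
    """Run one conversational turn: PCM in, PCM out."""
    transcript = stt(utterance_pcm)
    if not transcript.strip():
        # Nothing recognizable was said, so skip the LLM and TTS stages.
        return TurnResult("", "", b"")
    reply_text = llm(transcript)
    reply_pcm = tts(reply_text)
    return TurnResult(transcript, reply_text, reply_pcm)


# Usage with stand-in stages (real models would replace these stubs).
result = run_turn(
    b"\x00\x01" * 160,
    stt=lambda pcm: "namaskara",
    llm=lambda text: f"reply to: {text}",
    tts=lambda text: b"\x00" * (2 * len(text)),
)
```

Keeping the stages as injected callables lets the WebSocket handler stay thin and makes it possible to swap Whisper or the Sarvam API in for code-mixed speech without touching the bridge logic.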

### Also pending:
- IndicF5 TTS latency optimization (RTF 0.808 is only marginally faster than real-time; need < 0.5)
- Record Annie's Kannada voice identity (3-sec reference audio for IndicF5 voice cloning)
- No Kannada reference prompt shipped with IndicF5 (only Punjabi + Marathi)
- transformers version conflict (4.57.6 for IndicF5, may need 5.x for other models)
- Pixel 9a not yet purchased (Croma ₹34,999 with HDFC)
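
For the IndicF5 latency item, the gap between the current and target RTF translates into a concrete speedup and time budget. A small worked calculation, with hypothetical helper names and the 2.61 s benchmark clip used as the example duration:

```python
def required_speedup(current_rtf: float, target_rtf: float) -> float:
    """Factor by which synthesis must get faster to reach the target RTF."""
    if current_rtf <= 0 or target_rtf <= 0:
        raise ValueError("RTF values must be positive")
    return current_rtf / target_rtf


def max_synthesis_ms(audio_s: float, target_rtf: float) -> float:
    """Longest allowed synthesis time for a clip at the target RTF."""
    return audio_s * target_rtf * 1000.0


speedup = required_speedup(0.808, 0.5)  # current vs. target from the list above
budget_ms = max_synthesis_ms(2.61, 0.5)  # same clip length as the benchmark

print(f"Speedup needed: more than {speedup:.2f}x")
print(f"Synthesis budget for 2.61 s of audio: under {budget_ms:.0f} ms")
```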