# Research — Chatterbox 500M on Titan DGX Spark (Blackwell SM_121) for manual failover

**Date:** 2026-04-15
**Status:** synthesis-parity validated; HTTP failover path NOT yet dry-run
**Plan:** `/home/rajesh/.claude/plans/replicated-cuddling-duckling.md`
**Verdict:** `titan_chatterbox_synthesis_parity_with_panda`
**Cross-refs:** `docs/RESEARCH-CHATTERBOX-CPU-BENCHMARK.md`, `docs/RESEARCH-TTS-GPU-DGX-SPARK.md`

## Why this exists

Panda's Chatterbox TTS at `:8772` is the only path from Annie's phone daemon to synthesized audio. A Panda outage silently mutes phone calls — the daemon stays up, answers, and delivers zero-filled PCM (MEMORY "MUTE-NOT-CRASH"). This bench is the first step toward a **manual failover target on Titan**: verify that Chatterbox 500M can run natively on Blackwell SM_121 and produce audio indistinguishable from Panda output.

**What this bench PROVES:** Chatterbox weights load on Blackwell, synthesis is numerically correct, and the voice identity is a ~0.92 cosine match against Panda's production output.

**What this bench does NOT PROVE:** the actual failover HTTP path (auth header, int16 PCM framing, phone daemon compatibility). That is Phase 6 of the plan — opt-in, not run this session.

## Result headline

| Metric | Value |
|--------|-------|
| Install path | `chatterbox-tts==0.1.7` + forced `torch==2.11.0+cu128` + `torchaudio==2.11.0+cu128` |
| Blackwell patches required | 2 (see [Blackwell patches](#blackwell-patches) below) |
| Synthesis p50 (10-utterance batch) | **2055 ms** |
| Synthesis p95 | 3507 ms |
| RTF (real-time factor) p50 | 0.55 (~1.8× real-time) |
| VRAM peak (torch API) | 3.33 GB |
| VRAM peak (per-process nvidia-smi) | 3.66 GB |
| Unified-memory gap | ~342 MiB |
| A/B cosine similarity vs Panda (resemblyzer mean) | **0.9199** (min 0.8652, spread 0.0848) |
| Gemma post-flight drift | 1.005× baseline (no degradation) |

Full numbers in `docs/BENCHMARK-CHATTERBOX-TITAN-20260415.json`.
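The percentile and RTF definitions behind the table reduce to a few lines; a minimal sketch with illustrative numbers (NOT the recorded per-utterance data, which lives in the JSON above):

```python
import statistics

def summarize(wall_ms: list[float], audio_ms: list[float]):
    """p50/p95 of synthesis wall time plus median real-time factor.

    RTF = synthesis time / audio duration; an RTF of 0.55 means the GPU
    produces audio ~1.8x faster than it plays back.
    """
    cuts = statistics.quantiles(wall_ms, n=20)   # 19 cut points
    p50, p95 = cuts[9], cuts[18]                 # 50th and 95th percentile
    rtf_p50 = statistics.median(w / a for w, a in zip(wall_ms, audio_ms))
    return p50, p95, rtf_p50

# Illustrative per-utterance numbers, not the benchmark's actual samples
p50, p95, rtf = summarize(
    wall_ms=[1800, 1900, 2000, 2055, 2100, 2200, 2400, 2800, 3100, 3500],
    audio_ms=[3300, 3500, 3600, 3700, 3800, 4000, 4400, 5100, 5600, 6400],
)
```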

## Install recipe (reproducible)

```bash
# 1. Pristine worktree off origin/main
cd ~/workplace/her/her-os
git fetch origin
git worktree add ../her-os-chatterbox-bench -b chatterbox-titan-bench-$(date +%Y%m%d) origin/main
cd ../her-os-chatterbox-bench

# 2. Make sure the bench venv is ignored (once per worktree)
grep -q "^\.venv-chatterbox-bench/" .gitignore || \
  printf '\n# Chatterbox bench venv\n.venv-chatterbox-bench/\n.venv-*/\n' >> .gitignore

# 3. Create and populate venv
python3 -m venv .venv-chatterbox-bench
source .venv-chatterbox-bench/bin/activate
pip install --upgrade pip
pip install chatterbox-tts==0.1.7

# 4. Torch stack alignment — pin to 2.11.0+cu128 (Blackwell SM_121 support).
# Must use --extra-index-url (NOT --index-url, which replaces PyPI).
pip install --force-reinstall torch==2.11.0+cu128 \
  --extra-index-url https://download.pytorch.org/whl/cu128
pip install --force-reinstall torchaudio==2.11.0+cu128 \
  --extra-index-url https://download.pytorch.org/whl/cu128

# 5. Sanity probes
python - <<'PY'
import torch
assert torch.__version__.startswith("2.11.0"), torch.__version__
assert torch.cuda.is_available()
assert torch.cuda.get_device_capability() == (12, 1), torch.cuda.get_device_capability()
print("torch stack OK")
PY
pip check  # Expect benign chatterbox-tts==0.1.7 → torch==2.6.0 pin warning; not a blocker
```

**`pip check` output is expected to warn**:

```
chatterbox-tts 0.1.7 has requirement torch==2.6.0; python_version < "3.14", but you have torch 2.11.0+cu128.
torchaudio 2.11.0+cu128 has requirement ... (similar)
nvidia-cusparselt-cu12 0.7.1 is not supported on this platform
```

These are **resolver-level version-pin warnings, not functional conflicts** — Chatterbox works with torch 2.11 once the Blackwell patches below are applied. The `nvidia-cusparselt` warning concerns an optional dependency our workload never loads.
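Because `pip check` exits non-zero on these benign pins, any automated wrapper around the recipe needs an allowlist rather than a bare exit-code check. A sketch (the regex patterns are assumptions derived from the exact warning text above):

```python
import re

# Warnings triaged as benign above; anything else in `pip check` output
# should still fail the bench setup.
BENIGN = [
    re.compile(r"chatterbox-tts 0\.1\.7 has requirement torch==2\.6\.0"),
    re.compile(r"torchaudio 2\.11\.0\+cu128 has requirement"),
    re.compile(r"nvidia-cusparselt-cu12 .* is not supported on this platform"),
]

def unexpected_conflicts(pip_check_output: str) -> list[str]:
    """Lines of `pip check` output not covered by the benign allowlist."""
    lines = [ln.strip() for ln in pip_check_output.splitlines() if ln.strip()]
    return [ln for ln in lines if not any(p.match(ln) for p in BENIGN)]
```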

**HF weights** auto-download on first `ChatterboxTTS.from_pretrained(device="cuda")` call. Pinned sha256s (for drift detection):

| File | Size | sha256 |
|------|------|--------|
| `conds.pt` | 107 KB | `6552d70568833628ba019c6b03459e77fe71ca197d5c560cef9411bee9d87f4e` |
| `s3gen.safetensors` | 1.06 GB | `2b78103c654207393955e4900aac14a12de8ef25f4b09424f1ef91941f161d4e` |
| `t3_cfg.safetensors` | 2.13 GB | `914cb1696f47527fe8852ca8f1fe1fa63cb34f76f9c715e84e067b744dd0da81` |
| `tokenizer.json` | 25 KB | `d71e3a44eabb1784df9a68e9f95b251ecbf1a7af6a9f50835856b2ca9d8c14a5` |
| `ve.safetensors` | 5.7 MB | `f0921cab452fa278bc25cd23ffd59d36f816d7dc5181dd1bef9751a7fb61f63c` |
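The pinned digests above can be checked against the downloaded weights before each bench run; a drift-detection sketch (the directory layout is an assumption — point it at wherever the HF cache resolved the snapshot):

```python
import hashlib
from pathlib import Path

# Pinned digests from the table above.
PINNED = {
    "conds.pt": "6552d70568833628ba019c6b03459e77fe71ca197d5c560cef9411bee9d87f4e",
    "s3gen.safetensors": "2b78103c654207393955e4900aac14a12de8ef25f4b09424f1ef91941f161d4e",
    "t3_cfg.safetensors": "914cb1696f47527fe8852ca8f1fe1fa63cb34f76f9c715e84e067b744dd0da81",
    "tokenizer.json": "d71e3a44eabb1784df9a68e9f95b251ecbf1a7af6a9f50835856b2ca9d8c14a5",
    "ve.safetensors": "f0921cab452fa278bc25cd23ffd59d36f816d7dc5181dd1bef9751a7fb61f63c",
}

def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def drifted(weights_dir: Path) -> list[str]:
    """Names whose on-disk digest differs from the pin (missing counts as drift)."""
    return [
        name for name, want in PINNED.items()
        if not (weights_dir / name).is_file()
        or sha256_file(weights_dir / name) != want
    ]
```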

**`pip freeze` hash:** `eb299ede3811be089ddd92a294e8e2a02b7e54d778d2291c185b09e96983330c`
Full freeze in `docs/chatterbox-titan-bench-samples-20260415-053202/pip_freeze.txt`.

Chatterbox is **MIT licensed (code + weights)**.

## Blackwell patches

Two call sites hit the Jiterator / NVRTC failure path during voice cloning on SM_121 with torch 2.11. Both are fixed in `services/annie-voice/blackwell_patch.py`:

| # | Call site | Root cause | Patch helper |
|---|-----------|-----------|--------------|
| 1 | `chatterbox.models.s3gen.xvector.extract_feature` (one-shot per ref, ~5 s audio) | `torchaudio.compliance.kaldi.fbank` → internal `torch.stft(return_complex=True)` → `.abs()` on complex tensor → Jiterator → NVRTC `invalid --gpu-architecture sm_121` | `patch_chatterbox_xvector_cpu_fbank()` routes `extract_feature` through CPU; round-trip is negligible since fbank is only called during voice-clone setup. |
| 2 | `chatterbox.models.s3tokenizer.s3tokenizer.S3Tokenizer.log_mel_spectrogram` (once per reference, during `prepare_conditionals`) | `torch.stft(return_complex=True)` followed by `stft[..., :-1].abs()**2` on complex tensor — same Jiterator path | `patch_chatterbox_s3tokenizer_log_mel()` replaces the method with `return_complex=False` + `real**2 + imag**2` computed via pre-compiled ops. |

Other chatterbox STFT call sites are already Blackwell-safe:
- `chatterbox.models.s3gen.hifigan.HiFiGAN._stft` uses `torch.view_as_real(...)` + tensor indexing — no Jiterator dispatch.
- `chatterbox.models.s3gen.utils.mel` uses `torch.view_as_real(torch.stft(...))` + `.pow(2).sum(-1)` — no Jiterator dispatch.
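The identity both patches rely on is plain arithmetic: the power spectrum `|STFT|²` equals `real² + imag²`, so the complex-tensor `.abs()` (which dispatches to the Jiterator in torch) can be replaced by pre-compiled elementwise ops. A torch-free stdlib sketch of the equivalence, not the patch itself:

```python
import cmath, math, random

random.seed(0)
frame = [random.gauss(0, 1) for _ in range(64)]

def dft(x):
    """Naive half-spectrum DFT, enough to demonstrate the identity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

spec = dft(frame)
# The Jiterator-triggering form in torch: complex_tensor.abs() ** 2
power_via_abs = [abs(c) ** 2 for c in spec]
# The Blackwell-safe form used by the patches: real**2 + imag**2
power_via_parts = [c.real ** 2 + c.imag ** 2 for c in spec]
assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
           for a, b in zip(power_via_abs, power_via_parts))
```

`torch.view_as_real(...).pow(2).sum(-1)`, as used by the already-safe `hifigan` and `utils.mel` call sites, computes the same quantity.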

Apply both patches before `ChatterboxTTS.from_pretrained` on Titan:

```python
import sys
sys.path.insert(0, "services/annie-voice")
from blackwell_patch import patch_chatterbox_xvector_cpu_fbank, patch_chatterbox_s3tokenizer_log_mel
patch_chatterbox_xvector_cpu_fbank()
patch_chatterbox_s3tokenizer_log_mel()

from chatterbox.tts import ChatterboxTTS
model = ChatterboxTTS.from_pretrained(device="cuda")
# ... synthesis works including voice-clone path
```

## Voice reference — plan deviation

Plan assumed `samantha_movie_primary.wav` (34.7 s, in Titan's `services/audio-pipeline/voice-references/`). **Panda's production allowlist** (`~/.her-os/annie/voice-references/metadata.json`) actually only contains the short refs `samantha_hello/name/evolving.wav` (5–11 s); production defaults to `samantha_evolving.wav` (5 s). For production-faithful failover parity this bench uses the **production ref on both sides** — `samantha_evolving.wav` scp'd from Panda to Titan:`/tmp/samantha_evolving.wav`.

The plan's 30–40 s length probe on the Samantha ref and the ffmpeg-trimmed fallback (CODE-11 / PM-3) were therefore not exercised; the 5 s ref is well below any Chatterbox voice-embedder cap.
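A pre-flight guard keeps the length assumption explicit if the production ref ever changes; a sketch using the stdlib `wave` module (the 40 s cap mirrors the plan's probe range and is an assumption, not a documented Chatterbox limit):

```python
import wave

MAX_REF_SECONDS = 40.0  # assumed cap from the plan's probe range; not a documented limit

def ref_duration_s(path: str) -> float:
    """Duration of a PCM WAV voice reference in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def check_ref(path: str) -> float:
    """Raise if the ref exceeds the cap; otherwise return its duration."""
    d = ref_duration_s(path)
    if d > MAX_REF_SECONDS:
        raise ValueError(f"voice ref {path} is {d:.1f}s; trim below {MAX_REF_SECONDS}s")
    return d
```

`samantha_evolving.wav` at ~5 s passes trivially; `samantha_movie_primary.wav` at 34.7 s would also fit under this cap.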

## Failover runbook — EXPLICIT STATE

**State today:** synthesis parity validated only. The actual HTTP failover path (phone daemon `httpx` → Titan HTTP endpoint with `X-Internal-Token` auth, raw int16 PCM response, identical headers) is **NOT exercised by this bench.** Do not advertise this as "redundancy validated."

**To become a real failover target, the following must happen (Phase 6 of the plan, opt-in):**

1. Stand up `services/annie-voice/chatterbox_titan_shim.py` — FastAPI wrapper binding Titan `:8773`. API must be byte-for-byte identical to Panda's `chatterbox_server.py` (same headers, same raw int16 PCM response, same `X-Audio-Duration`/`Sample-Rate`/`Channels`/`Format` response headers).
2. Smoke the shim from Panda's phone-daemon codepath — `httpx.post` to `http://<titan-ip>:8773/v1/tts` with real `CHATTERBOX_TOKEN`, verify raw PCM body and headers.
3. Scheduled 30-minute supervised window: repoint ONE phone line's `CHATTERBOX_URL` to Titan:8773. User initiates 2–3 test calls. Observe call quality, subjective audio identity, dropped-frame rate.
4. Rollback. Document the additional verdict `titan_chatterbox_failover_dry_run_clean`.

**Until that runs:** manual failover would require (a) standing up the shim under urgency, (b) updating Panda's `CHATTERBOX_URL`, and (c) hoping the untested HTTP path doesn't have off-by-one PCM framing issues. Treat this as a **~1-hour-to-activate** fallback, not a hot standby.
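Once the shim exists, step 2's smoke test reduces to validating the raw body against the response headers. A sketch of the framing checks the dry run should automate (header names follow the contract described above; the accepted `Format` strings are assumptions to be matched against the real server):

```python
def validate_pcm(body: bytes, headers: dict[str, str], tol_s: float = 0.05) -> float:
    """Check raw int16 PCM framing against the shim's response headers.

    Returns the duration implied by the byte count. Header names follow
    the Panda contract above; accepted Format values are assumptions.
    """
    sr = int(headers["Sample-Rate"])
    ch = int(headers["Channels"])
    if headers["Format"].lower() not in {"s16le", "pcm_s16le", "int16"}:
        raise ValueError(f"unexpected format: {headers['Format']}")
    frame_bytes = 2 * ch                    # int16 -> 2 bytes per sample per channel
    if len(body) % frame_bytes:
        raise ValueError("body is not a whole number of int16 frames")
    dur = len(body) / (sr * frame_bytes)
    want = float(headers["X-Audio-Duration"])
    if abs(dur - want) > tol_s:
        raise ValueError(f"byte-count duration {dur:.3f}s != header {want:.3f}s")
    return dur
```

An off-by-one in framing (the failure mode feared above) shows up here as a trailing odd byte or a duration mismatch, before it ever reaches a phone call.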

## Revisit triggers

- Chatterbox version changes (anything newer than 0.1.7)
- torch bumps past 2.11.x (especially 2.12+ — NVRTC behavior may change)
- Titan firmware / CUDA driver upgrade
- Panda's production voice ref changes from `samantha_evolving.wav`
- Quarterly canary: **2026-07-15** — re-run this bench unchanged to detect silent drift

## Supporting artifacts (all committed on this branch)

- Samples: `docs/chatterbox-titan-bench-samples-20260415-053202/`
  - `titan_native/titan_native_{00..09}.wav` — 10 Titan synth outputs
  - `panda_baseline/chatterbox_{00..09}.wav` — 10 Panda HTTP baseline outputs
  - `panda_baseline/chatterbox_{00..09}.pcm` — same as .wav, raw server bytes for byte-level inspection
  - `titan_native_profile.json.gz` — `torch.profiler` chrome trace
  - `BENCHMARK-CHATTERBOX-TITAN-PHASE2.json` — per-utterance timing + VRAM
  - `BENCHMARK-CHATTERBOX-TITAN-IDENTITY.json` — per-pair cosine scores
  - `pip_freeze.txt` — exact venv state
- Top-level: `docs/BENCHMARK-CHATTERBOX-TITAN-20260415.json` — merged verdict
- Code:
  - `services/annie-voice/blackwell_patch.py` — both patch helpers
  - `services/annie-voice/tests/test_blackwell_patch.py` — unit tests for `patch_stft`
  - `scripts/benchmark_chatterbox_titan.py` — Phase 2 bench
  - `scripts/generate_chatterbox_baseline.sh` — Phase 3 baseline (fixed `/v1/tts` + auth)
  - `scripts/tts_identity_score.py` — Phase 4 A/B scorer (embedding + human modes)
