# Next Session — Chatterbox Titan shim build + live A/B traffic switching

**Predecessor:** Session 110 (2026-04-15) landed `titan_chatterbox_synthesis_parity_with_panda` on PR #3 (`chatterbox-titan-bench-20260415`). Synthesis parity is validated; HTTP failover path is NOT.
**Motivation (NEW this session):** Panda needs VRAM headroom to host **Gemma-4 E4B** (~5 GB) in production. Chatterbox on Panda holds ~3.7 GB. Moving Chatterbox to Titan unlocks the E4B swap.
**User stance (important, read carefully):** **No decommission on Panda.** We run Chatterbox on BOTH Panda AND Titan in parallel for the whole session, and the assistant switches the phone daemon's `CHATTERBOX_URL` between the two on user command after each test call. The user judges audio quality by ear over several back-and-forth rounds.

---

## What to do

### Phase A — Planning (detailed, at session start, ~20–30 min)

Do NOT start coding. First produce a written plan covering:

1. **Shim architecture.** How `services/annie-voice/chatterbox_titan_shim.py` will mirror `chatterbox_server.py` byte-for-byte. Cover the request schema (`TTSRequest` fields), auth (`X-Internal-Token`), response (raw int16 PCM + `X-Audio-Duration`/`-Sample-Rate`/`-Channels`/`-Format` headers), concurrency (`asyncio.Semaphore(1)`, like Panda), and startup (eager model load, and the order in which the Blackwell patches are applied). Identify every edge where Titan and Panda could diverge.
2. **Voice-references deployment.** Panda keeps its `~/.her-os/annie/voice-references/` untouched. Titan needs the same files + `metadata.json`. Decide: rsync from Panda once, or commit a small subset (samantha_evolving.wav is the production default — 240 KB) behind a `.gitignore` exception? Document the choice.
3. **Startup / supervisor.** How the Titan shim runs. A backgrounded `nohup` process is fine for the session; no systemd yet. Keep it simple: `nohup python -m uvicorn chatterbox_titan_shim:app --host 0.0.0.0 --port 8773 > /tmp/chatterbox-titan.log 2>&1 &`. Document the PID capture + kill command.
4. **Latency plan.** What we measure when switching (round-trip p50/p95 from Panda to Titan vs Panda local). Keep lightweight: `curl -w %{time_total}` against each endpoint pre-switch, log to `/tmp/tts-switch-*.log`.
5. **Switch mechanics.** How traffic flips between Panda `:8772` (local) and Titan `:8773` (over LAN). Options:
   - (a) restart phone daemon with changed `CHATTERBOX_URL` env (reliable, ~5 s downtime between calls)
   - (b) patch the env file + SIGHUP (if phone daemon supports reload)
   - (c) reverse proxy on Panda with a hot-swap upstream (overkill for this session)
   Pick the simplest approach that doesn't disrupt the user's workflow between calls.
6. **Rollback confidence.** Every switch must have a one-line "go back" command ready to paste if anything breaks. Document them for both directions.
7. **Ordering of the session.** How the phases below interleave with user-in-the-loop test calls.
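
The p50/p95 comparison in item 4 can be sketched as follows. This is a minimal sketch, assuming the `time_total` values printed by `curl -w` have already been collected as floats in seconds. The helper names `percentile` and `summarize` and the nearest-rank method are illustrative choices, not something the existing tooling prescribes.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(label: str, samples: List[float]) -> str:
    """Build one log line per endpoint, e.g. for appending to /tmp/tts-switch-*.log."""
    p50 = percentile(samples, 50)
    p95 = percentile(samples, 95)
    return f"{label} n={len(samples)} p50={p50:.3f}s p95={p95:.3f}s"
```

With only a handful of samples per endpoint, p95 is effectively the slowest request, so it is worth logging the sample count alongside the percentiles.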

Present the plan to the user. Adjust per feedback. Only then start Phase B.

### Phase B — Build the Titan shim (~45 min)

1. Create `services/annie-voice/chatterbox_titan_shim.py` on laptop (fork of `chatterbox_server.py`, preserving API byte-for-byte). In its `@app.on_event("startup")` handler, apply **both** Blackwell patches before `_get_model()`:
   ```python
   from blackwell_patch import patch_chatterbox_xvector_cpu_fbank, patch_chatterbox_s3tokenizer_log_mel
   patch_chatterbox_xvector_cpu_fbank()
   patch_chatterbox_s3tokenizer_log_mel()
   ```
2. Commit on branch `chatterbox-titan-bench-20260415` (continuation of PR #3). Push.
3. On Titan worktree (`~/workplace/her/her-os-chatterbox-bench`): `git pull`, ensure `~/.her-os/annie/voice-references/` exists with `metadata.json` + `samantha_evolving.wav` (scp from Panda if missing), launch the shim on `:8773` via nohup + venv python (`.venv-chatterbox-bench/bin/python`). Capture PID.
4. Verify from Panda with a `curl` request that mirrors the `httpx` call the phone daemon makes:
   ```bash
   ssh panda 'set -a && . ~/.her-os/.env && set +a && \
     curl -s -o /tmp/titan_smoke.pcm -w "HTTP=%{http_code} BYTES=%{size_download} TIME=%{time_total}s\n" \
     -X POST http://<titan-ip>:8773/v1/tts \
     -H "X-Internal-Token: $CHATTERBOX_TOKEN" \
     -H "Content-Type: application/json" \
     -d "{\"text\":\"Testing Titan failover path.\",\"reference_audio\":\"samantha_evolving.wav\"}"'
   ```
   Expect HTTP 200, bytes ≥ 24 000, time_total < 3 s.
5. Verify response headers match Panda's (curl `-I` won't work for POST; use `-D /tmp/hdrs` and diff against a Panda capture).
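
The header comparison in step 5 could be scripted roughly as below. This is a sketch under stated assumptions: the two inputs are the raw text of `curl -D` dumps (one from Panda, one from Titan), and the full header names are expanded from the `X-Audio-*` shorthand used earlier in this document. `X-Audio-Duration` is left out of the default comparison on the assumption that two separate syntheses need not have identical length.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumed expansions of the X-Audio-* shorthand; confirm against chatterbox_server.py.
STATIC_HEADERS = ("x-audio-sample-rate", "x-audio-channels", "x-audio-format")


def parse_header_dump(text: str) -> Dict[str, str]:
    """Parse a `curl -D` dump into {lowercased-name: value}; lines without a colon are skipped."""
    headers: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def diff_headers(
    panda: Dict[str, str],
    titan: Dict[str, str],
    names: Sequence[str] = STATIC_HEADERS,
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (panda_value, titan_value)} for every header that differs or is missing."""
    return {
        name: (panda.get(name), titan.get(name))
        for name in names
        if panda.get(name) != titan.get(name)
    }
```

An empty result means the compared headers match; any entry in the result is a byte-level divergence to resolve before Phase C.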

### Phase C — Live A/B traffic switching (user-in-the-loop, flexible duration)

**The user is the oracle.** User says "call Annie"; after the call:
- "Sounded good — try the other one": assistant switches `CHATTERBOX_URL` to the other endpoint, waits for phone-daemon restart, tells user "ready — try again."
- "That one had artifacts — switch back": assistant reverts.
- "Keep it here for now": assistant leaves it in place and waits.

Assistant's job during Phase C is **traffic-direction + state reporting**, not judgment:

- Keep a visible running log of the current endpoint (last switch time, current URL) so the user never has to ask.
- Before each switch: note the endpoint + timestamp. After each switch: confirm phone daemon is healthy (`curl -sf <panda>:8770/health || systemctl status phone-daemon`) and the target TTS endpoint is up (`curl -sf <target>/health`).
- Collect per-call latency samples if possible (phone-daemon log has per-call timing). Save to `/tmp/tts-switch-<ts>.log`.
- Do NOT make quality judgments. Only the user does. If user says "switch" — switch. If "keep" — keep.
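
The running log of the current endpoint can be as small as the sketch below. `SwitchLog` is a hypothetical helper, not an existing module in the repo; it only tracks state and leaves the actual switching to the commands that follow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class SwitchLog:
    """In-memory record of which TTS endpoint the phone daemon currently targets."""

    entries: List[Tuple[datetime, str]] = field(default_factory=list)

    def record(self, url: str, when: Optional[datetime] = None) -> None:
        """Note a switch; the timestamp defaults to the current UTC time."""
        self.entries.append((when or datetime.now(timezone.utc), url))

    def status(self) -> str:
        """One-line summary to show the user after every switch."""
        if not self.entries:
            return "no endpoint recorded yet"
        when, url = self.entries[-1]
        return f"current={url} since={when.isoformat()} entries={len(self.entries)}"
```

Calling `record()` once at session start with the Panda URL gives the log a baseline, so `status()` always has something to report.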

**Switch commands (to be refined in Phase A plan):**
- **Point to Titan:** on Panda, update `CHATTERBOX_URL=http://<titan-ip>:8773` in phone daemon's env file, `kill -HUP` or restart phone-daemon. Log time + confirm `/health`.
- **Point to Panda:** inverse — `CHATTERBOX_URL=http://localhost:8772`, restart.
- **Rollback one-liner:** `ssh panda 'echo CHATTERBOX_URL=http://localhost:8772 > /tmp/url && ... && kill -HUP $(pgrep -f phone_call.py)'` (fill in exact commands during Phase A planning).
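
The env-file half of each switch could look like the sketch below. It assumes the phone daemon's env file uses plain `KEY=value` lines (no `export` prefix, no quoting); the function names are hypothetical, and the restart or `kill -HUP` step is deliberately left out because it is still to be decided in Phase A.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def patch_env_text(text: str, key: str, value: str) -> Tuple[str, Optional[str]]:
    """Return (new_text, previous_value) with every `key=` line set to the new value.

    If the key is absent it is appended and previous_value is None.
    """
    previous: Optional[str] = None
    out: List[str] = []
    for line in text.splitlines():
        if line.startswith(f"{key}="):
            previous = line.split("=", 1)[1]
            line = f"{key}={value}"
        out.append(line)
    if previous is None:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n", previous


def switch_endpoint(env_path: Path, url: str) -> Optional[str]:
    """Point CHATTERBOX_URL at `url` and return the old URL for a one-call rollback."""
    new_text, previous = patch_env_text(env_path.read_text(), "CHATTERBOX_URL", url)
    env_path.write_text(new_text)
    return previous
```

Returning the previous value keeps rollback symmetric: pass it straight back to `switch_endpoint` to undo the last switch.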

### Phase D — Wrap-up (~20 min, at session end)

Regardless of whether the user decides to keep Chatterbox on Titan or revert to Panda permanently:

1. **Restore Panda as the phone daemon's default** before ending the session. `CHATTERBOX_URL=http://localhost:8772`. Verify with one final test call if user is willing.
2. Leave the Titan shim running on `:8773` (no harm — it's ~3.3 GB in Titan's 128 GB pool). Or shut it down cleanly if user prefers. Document the PID + stop command either way.
3. **Emit an additional verdict** based on the user's A/B feedback:
   - `titan_chatterbox_failover_dry_run_clean`: if the user reported no artifacts across ≥2 Titan calls
   - `titan_chatterbox_failover_dry_run_partial`: if the user detected artifacts; document what they heard
   - In either case, update `docs/RESEARCH-CHATTERBOX-TITAN-REDUNDANCY.md` "Failover runbook — EXPLICIT STATE" section to reflect the new state.
4. **Update `docs/BENCHMARK-CHATTERBOX-TITAN-20260415.json`** with the new verdict in the `verdicts` array.
5. **Update Registry Change Log** with the A/B result + measured Panda→Titan round-trip latency.
6. **Commit + push.** PR #3 accumulates these commits. If PR #3 was already merged, branch off `main` with `chatterbox-titan-ab-<date>`.
7. **MEMORY.md** — new "Last Session" entry with A/B outcome + measured latency + whether E4B Panda swap is now unblocked or still gated.
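
The verdict rule in step 3 and the JSON update in step 4 can be sketched as below. Two things are assumptions here: that the `verdicts` array holds plain strings, and that a session with fewer than two artifact-free Titan calls yields no verdict at all, since that case is not defined above.

```python
import json
from typing import List, Optional

CLEAN = "titan_chatterbox_failover_dry_run_clean"
PARTIAL = "titan_chatterbox_failover_dry_run_partial"


def pick_verdict(titan_calls_had_artifacts: List[bool]) -> Optional[str]:
    """Map the user's per-call reports (True = artifacts heard) to a verdict name."""
    if any(titan_calls_had_artifacts):
        return PARTIAL
    if len(titan_calls_had_artifacts) >= 2:
        return CLEAN
    return None  # fewer than two Titan calls: no verdict is defined for this case


def append_verdict(benchmark_json: str, verdict: str) -> str:
    """Append a verdict to the `verdicts` array (assumed to be a list of strings)."""
    data = json.loads(benchmark_json)
    data.setdefault("verdicts", []).append(verdict)
    return json.dumps(data, indent=2)
```

The inputs come only from what the user reported after each call; the helper does not inspect audio.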

---

## Explicit user decisions (do NOT re-ask)

| Question | User's answer |
|----------|---------------|
| Decommission Chatterbox on Panda at the end of the session? | **No.** Panda stays running. Switch between Titan and Panda repeatedly; user compares by ear. |
| Who judges audio quality? | **User.** Assistant does not compare clips or make "it sounds fine" judgments. |
| How many A/B rounds? | User decides — assistant keeps switching on command until user says "done." |
| Detailed planning upfront? | **Yes.** Phase A produces a written plan; user approves before Phase B starts coding. |
| E4B swap on Panda this session? | **No.** E4B swap is the *next* session, gated on this session's A/B outcome. |
| Phase 6 `--staged-failover-dry-run` flag / `scripts/run-chatterbox-titan-bench.sh` runner? | Not required. The session-110 NEXT-SESSION-V2 handoff referenced these but they were never built; the manual ssh+curl+env-patch path is faster for a single session. |

---

## Pre-flight checks (run FIRST before Phase A planning)

```bash
# 1. Titan shim prerequisites
ssh titan 'ls ~/workplace/her/her-os-chatterbox-bench/.venv-chatterbox-bench/bin/python && ls /tmp/samantha_evolving.wav'

# 2. Panda phone daemon health
ssh panda 'set -a && . ~/.her-os/.env && set +a && \
  curl -sf http://localhost:8772/health && echo "panda_chatterbox_up"; \
  curl -sf http://localhost:8770/health 2>/dev/null && echo "phone_daemon_up" || echo "phone_daemon_status_check_needed"'

# 3. Panda → Titan LAN reachability
ssh panda 'ping -c 3 -W 2 <titan-ip> 2>&1 | tail -3'

# 4. Chatterbox still running on Panda (not accidentally killed between sessions)
ssh panda 'pgrep -af chatterbox_server'

# 5. Gemma on Titan healthy (no drift gate needed — not touching it this session)
ssh titan 'curl -sf http://localhost:8003/v1/models | head -c 200'

# 6. Prior-session branch state
cd ~/workplace/her/her-os && git fetch origin && \
  git log --oneline origin/chatterbox-titan-bench-20260415 -5
```

If any of checks (1) through (5) fails, stop and escalate to the user. Do NOT begin Phase A planning until all five are green.
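
If the gate is scripted rather than eyeballed, a runner along these lines would do. It is a sketch with a hypothetical `run_checks` helper, and it assumes each check has been reduced to a command whose exit code reflects success. Several of the checks as written end in a pipe or `|| echo`, so their exit status alone does not signal failure.

```python
import subprocess
from typing import Dict, List


def run_checks(checks: Dict[str, List[str]], timeout_s: float = 30.0) -> List[str]:
    """Run each named argv and return the names of the checks that failed.

    A check fails on a non-zero exit code, a timeout, or a command that cannot be started.
    """
    failed: List[str] = []
    for name, argv in checks.items():
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout_s)
            ok = result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

For example, check (4) would be passed as `["ssh", "panda", "pgrep -af chatterbox_server"]`; a non-empty return value means stop and escalate.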

---

## Files that may be touched this session

| Path | Action |
|------|--------|
| `services/annie-voice/chatterbox_titan_shim.py` | **Create** (new, Phase B) |
| `services/annie-voice/chatterbox_server.py` | Read-only reference |
| `services/annie-voice/blackwell_patch.py` | Read-only (use existing helpers) |
| `~/.her-os/annie/voice-references/` on Titan | **Populate** (scp from Panda or add to git) |
| Phone daemon env file on Panda (`CHATTERBOX_URL`) | **Modify repeatedly** during Phase C |
| `docs/BENCHMARK-CHATTERBOX-TITAN-20260415.json` | **Update** the `verdicts` array with the failover dry-run verdict (Phase D) |
| `docs/RESEARCH-CHATTERBOX-TITAN-REDUNDANCY.md` | **Update** runbook state (Phase D) |
| `docs/RESOURCE-REGISTRY.md` | **Append** Change Log row with A/B outcome + latency (Phase D) |
| `MEMORY.md` | **Update** "Last Session" block (Phase D) |

Nothing in `scripts/` needs adding this session unless latency sampling calls for it.

---

## Anti-goals (don't do these)

- Don't build a systemd unit for the Titan shim (not yet — nohup is fine for this session).
- Don't add a reverse proxy, load balancer, or automated failover. User is explicitly the traffic director.
- Don't decommission Panda Chatterbox.
- Don't swap Panda's E2B for E4B in production this session.
- Don't try to measure audio identity again — session 110 already did that (cosine 0.92). This session is purely about HTTP-path correctness + subjective audio quality through the phone.

---

## Start command

```bash
cat /home/rajesh/workplace/her/her-os/docs/NEXT-SESSION-CHATTERBOX-TITAN-AB-DEPLOY.md
# Then run the 6 Pre-flight checks above
# Then present the Phase A plan to the user — do NOT start Phase B until approved
```