# Next Session — Parakeet/Whisper reconcile + fleet cleanup (session 117)

**Supersedes:** nothing — session 116 closed clean, no stale handoffs. Start here.

## Context from session 116 (2026-04-15)

Session 116 merged 4 PRs and deployed clean:

| PR | Purpose | Merge commit |
|----|---------|--------------|
| #4 | Parakeet STT benchmark + infra | `fd14d30` |
| #5 | Streaming STT sidecar + Chatterbox `/cancel` + barge-in | `abefbb5` |
| #6 | panda-llamacpp healthcheck fix + systemd unit under VC | `b8231e3` |
| #7 | speaker-gate 400 fix (match audio-pipeline empty-body contract) | `eb08215` |

`main` is at `eb08215`. Laptop + Panda + Titan all on main. Live phone call PASSED. Inventory audit completed for all three machines (Panda, Beast, Titan). Five audit findings surfaced: three fixed in-session (panda-llamacpp unhealthy, Titan chatterbox shim retired, speaker-gate 400s), two deferred to this session.

**No code changes required before starting.** Everything below is triage of accumulated drift + optional hygiene.

## Priority 1 — Parakeet vs Whisper wiring reconciliation (load-bearing)

**Why this is P1:** MEMORY's session-113 decision `stay-whisper (strict)` is authoritative for future STT planning. But Panda's phone daemon env has `PARAKEET_URL=http://localhost:11438` and a Parakeet server is actively running (PID 1737111, `scripts/parakeet_stt_server.py`). If future-you plans an STT change against the "we run Whisper" baseline, the plan is wrong.

### Diagnostic steps

```bash
# 1. Is phone actually using Parakeet, or is it dead code?
ssh 192.168.68.57 'grep -iE "(parakeet|whisper)" ~/workplace/her/her-os/scripts/phone_call.py | head -20'

# 2. Which STT backend is selected at runtime?
ssh 192.168.68.57 'grep -A10 "def _init_stt\|def get_stt_backend\|STT_BACKEND" ~/workplace/her/her-os/scripts/phone_call.py | head -40'

# 3. Live-fire: during a call, does phone POST to :11438?
ssh 192.168.68.57 'sudo timeout 60 tcpdump -i any -n -s0 -w /tmp/stt.pcap port 11438; sudo tcpdump -n -r /tmp/stt.pcap 2>/dev/null | wc -l'
# (run this while placing a real call; timeout ends the capture after 60 s so the file is flushed before it is read)

# 4. Parakeet server log, most recent 500 lines (not time-bounded): any production calls?
ssh 192.168.68.57 'tail -500 /tmp/parakeet-stt.log | grep -iE "(transcribe|POST)" | tail -20'
```

### Decision matrix

- **If Parakeet is live in the STT path:** update MEMORY's session-113 verdict line to reflect the rollout. Document WHY the verdict flipped (ops pressure? WER benefit?). The verdict block lives at `~/.claude/projects/-home-rajesh-workplace-her-her-os/memory/infra_parakeet_*` and/or the session-113 "Last Session" block.
- **If Parakeet is NOT called:** kill it (`pkill -f parakeet_stt_server`), remove `PARAKEET_URL` from phone daemon env in `start.sh`, redeploy. The verdict is still correct; only the live state was drifting.
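
The two branches above can be reduced to a small helper that turns the diagnostic evidence into a verdict. This is only a sketch: `stt_verdict` is a hypothetical name, and treating any captured packet or any logged request as "live" is an assumption about what counts as production use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map diagnostic evidence to one of the two branches above.
#   $1 = packets captured on port 11438 during a real call (step 3)
#   $2 = matching request lines in the Parakeet log (step 4)
stt_verdict() {
  local packets="${1:-0}" log_hits="${2:-0}"
  if [ "$packets" -gt 0 ] || [ "$log_hits" -gt 0 ]; then
    echo "parakeet-live"    # update MEMORY's session-113 verdict
  else
    echo "parakeet-idle"    # kill server, remove PARAKEET_URL, redeploy
  fi
}

# Example (not run here):
#   hits=$(ssh 192.168.68.57 'grep -ciE "(transcribe|POST)" /tmp/parakeet-stt.log')
#   stt_verdict 0 "$hits"
```

Log hits alone are weaker evidence than the packet capture, since a benchmark run or a health probe could also write request lines. If only the log count is non-zero, check the timestamps against known call times before flipping the verdict.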

### Success criteria

- MEMORY and live wiring agree on Parakeet's status.
- Exactly ONE STT backend is active on Panda — whichever is chosen, deliberately.

## Priority 2 — Operational cleanup (low-risk, ~30 min total)

### 2A — Retire stale orphan processes

```bash
# Panda: streaming_conversation_test.py — running since Apr 10, no caller
ssh 192.168.68.57 'kill 986328 986327'  # also kill parent bash wrapper

# Titan: http.server on :9876 — running since Mar 26, probably docs preview leftover
ssh titan 'kill 605388'
# Verify both ports freed:
ssh 192.168.68.57 'ss -tlnp 2>/dev/null | grep -E ":8766" || echo "freed"'
ssh titan 'ss -tlnp 2>/dev/null | grep -E ":9876" || echo "freed"'
```
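
The PIDs above were recorded during the session-116 audit, so they may have exited or been recycled since. A minimal guard, assuming the remote shell is bash and using the hypothetical helper names `cmdline_matches` and `kill_if_matches`, checks the command line before sending the signal:

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if a command line contains the expected text.
#   $1 = command line as reported by `ps -p PID -o args=`
#   $2 = substring that must be present
cmdline_matches() {
  case "$1" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Kill a PID only if it still belongs to the expected process.
#   $1 = PID, $2 = expected substring of its command line
kill_if_matches() {
  local pid="$1" expected="$2" args
  args=$(ps -p "$pid" -o args= 2>/dev/null) || return 1   # PID no longer exists
  if cmdline_matches "$args" "$expected"; then
    kill "$pid"
  else
    echo "PID $pid is not '$expected'; skipping" >&2
    return 1
  fi
}

# Example (not run here): ship the functions to the remote shell, then call them.
#   ssh 192.168.68.57 "$(declare -f cmdline_matches kill_if_matches); \
#     kill_if_matches 986328 streaming_conversation_test.py"
```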

### 2B — Scope the Titan Ollama decision

MEMORY's session-67 `Ollama is superseded by llama-server` is currently written as a global decision but only Panda enforces it. Titan's Ollama container is Up 11 days and **actively spawning runners** (PID 111314 at 18:40 today).

```bash
# Who's calling Titan:11434 in the last hour?
ssh titan 'ss -tnp 2>/dev/null | grep ":11434" | head -10'
ssh titan 'docker logs ollama --since 1h 2>&1 | grep -iE "(POST|GET).*(generate|chat)" | tail -10'
```

- **If Titan Ollama has callers:** rewrite the MEMORY decision: `Ollama retired on Panda; Titan retains for [role]`.
- **If no callers:** `ssh titan 'docker stop ollama && docker rm ollama'` and leave systemd service disabled.
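
To make the "has callers" branch concrete, the `ss` output can be reduced to a list of distinct client addresses. The sketch below assumes the usual `ss -tn` column order (State, Recv-Q, Send-Q, Local, Peer); `peers_on_port` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list unique peer IPs connected to a local port.
# Reads `ss -tn` output on stdin; $1 = local port (e.g. 11434).
# Assumes the standard column order: State Recv-Q Send-Q Local Peer.
peers_on_port() {
  awk -v port="$1" '
    $1 == "ESTAB" && $4 ~ (":" port "$") {
      peer = $5
      sub(/:[0-9]+$/, "", peer)   # strip the peer port, keep the address
      print peer
    }' | sort -u
}

# Example (not run here):
#   ssh titan 'ss -tn 2>/dev/null' | peers_on_port 11434
```

Because Ollama runs in a container here, connections forwarded by NAT rules may not appear in the host's `ss` output at all. An empty result is therefore weaker evidence than the container logs, and both checks should agree before the container is removed.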

### 2C — Audit her-os-neo4j on Titan

MEMORY says Context Engine uses Postgres + Qdrant for BM25/vector. Neo4j is Up 3 weeks, serving `:17474/:17687`, unreferenced in current architecture.

```bash
ssh titan 'docker logs her-os-neo4j --since 3d 2>&1 | tail -40'
ssh titan 'docker exec her-os-neo4j cypher-shell -u neo4j -p <pass> "MATCH (n) RETURN count(n) LIMIT 1"'
```

- **If empty / no writes:** retire, free ~300 MB RAM + disk I/O.
- **If populated:** document intent. Add to the "active infrastructure" list in MEMORY's Current Phase block.

### 2D — Delete merged feature branches

```bash
# Laptop
git branch -d streaming-stt-bargein-20260415 parakeet-stt-bench-20260415 fix/panda-llamacpp-healthcheck fix/speaker-gate-empty-buffer-guard
# Remote
gh api -X DELETE repos/myidentity/her-os/git/refs/heads/streaming-stt-bargein-20260415
gh api -X DELETE repos/myidentity/her-os/git/refs/heads/parakeet-stt-bench-20260415
gh api -X DELETE repos/myidentity/her-os/git/refs/heads/fix/panda-llamacpp-healthcheck
gh api -X DELETE repos/myidentity/her-os/git/refs/heads/fix/speaker-gate-empty-buffer-guard
```

### 2E — Decide on 2 untracked benchmark files on Panda

`scripts/benchmark_gemma4_e4b_nav_panda.py` + `scripts/benchmark_indic_asr_results.json` have been untracked for days. Either commit them (if useful reference benchmarks) or add to `.gitignore` (if throwaway).

## Priority 3 — Quality-of-life (opt-in)

### 3A — Enable branch-CI

PRs #5/#6/#7 all merged with `no checks reported`. Create `.github/workflows/pytest.yml` that runs `services/annie-voice/tests/` + `services/audio-pipeline/tests/` + `scripts/tests/` on every PR.
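
One possible starting point for that workflow, written as a shell helper so it can be generated from the laptop. Everything inside the file is an assumption to adjust: the Python version, the runner image, and above all the dependency step, since the services almost certainly need more than `pytest` itself. `write_pytest_workflow` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a minimal pytest workflow under $1 (repo root).
# Assumptions: Python 3.12, and pytest alone is enough to import the tests;
# real services will likely need their own dependency-install step.
write_pytest_workflow() {
  local root="${1:-.}"
  mkdir -p "$root/.github/workflows"
  cat > "$root/.github/workflows/pytest.yml" <<'EOF'
name: pytest
on:
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: python -m pip install pytest
      # One invocation per suite so same-named test modules cannot collide.
      - run: python -m pytest services/annie-voice/tests/
      - run: python -m pytest services/audio-pipeline/tests/
      - run: python -m pytest scripts/tests/
EOF
}

# Example (not run here): write_pytest_workflow ~/workplace/her/her-os
```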

### 3B — Barge-in-specific live call

Session 115's live call validated STT/TTS/LLM. It did NOT exercise mid-sentence interrupts, so the `d4bf169` fix (`cancel_guard_set()` called before `kill_playback`) hasn't been validated under real audio. Plan: call, speak, let Annie start answering, then interrupt her mid-sentence. Confirm via `tail -n 200 /tmp/phone-auto.log | grep -iE '(bargein|cancel_guard)'` that both log lines fire and that the cancel HTTP request is dispatched to Chatterbox.
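
The grep in the plan matches if either marker appears, so it cannot prove on its own that both lines fired. A small check that requires both, assuming the markers appear literally as `bargein` and `cancel_guard` in the log (the log format itself is not confirmed here, and `bargein_fired` is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if BOTH markers appear in the log on stdin.
# The marker strings come from the grep above; the log format is an assumption.
bargein_fired() {
  local log
  log=$(cat)
  printf '%s\n' "$log" | grep -qi 'bargein' &&
    printf '%s\n' "$log" | grep -qi 'cancel_guard'
}

# Example (not run here):
#   tail -n 200 /tmp/phone-auto.log | bargein_fired && echo "both markers present"
```

This confirms only the two log lines. Whether the cancel request reached Chatterbox still has to be checked separately.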

### 3C — Speaker-gate live verification

Session 116's PR #7 stopped the 400 noise but didn't confirm the gate actually *rejects* non-enrolled speakers. Plan: have someone other than Rajesh speak during a call. Expected: transcription is dropped silently, Annie does not respond. If the gate fails open for an unknown speaker, compare the similarity scores in the `audio-pipeline` logs against the `0.38` threshold.

## Priority 4 — Strategic (only if triggered by real product need)

- **Stage nemotron-streaming STT sidecar on Panda** — model downloaded, Phase 1 code shipped in PR #5 but systemd wiring + `PhoneSTT.stream_ws()` integration unstaged. Requires ~3.5–4 GB VRAM; Panda has 5.5 GB free. Worth it if STT latency is a pain point; skip otherwise.
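- **Swap E2B → E4B nav VLM on Panda** — only if E2B quality issues surface in real use. Pre-computed capacity math is in MEMORY's session-116 block: retiring E2B frees 3.24 GB and E4B Q4_K_M needs 4.76 GB, a net cost of 1.52 GB that leaves roughly 4 GB of headroom post-swap (5.5 + 3.24 − 4.76 ≈ 3.98 GB, assuming the 5.5 GB free figure above is the baseline). Don't swap preemptively.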
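
The swap arithmetic can be re-run whenever the figures change. The sketch below assumes the 5.5 GB free figure from the first bullet is the baseline before E2B is retired; `headroom_gb` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: VRAM headroom (GB) after swapping one model for another.
#   $1 = currently free, $2 = freed by retiring the old model, $3 = needed by the new one
headroom_gb() {
  awk -v free="$1" -v freed="$2" -v need="$3" \
    'BEGIN { printf "%.2f\n", free + freed - need }'
}

headroom_gb 5.5 3.24 4.76   # figures quoted above; prints 3.98
```

Under the same assumption, the two items compete for memory: the streaming sidecar's ~3.5–4 GB would consume nearly all of the post-swap headroom, so they are better evaluated together than one at a time.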

## Files touched by this session (expected)

| File | Action |
|------|--------|
| `MEMORY.md` | EDIT — strike through completed items, update Parakeet verdict after reconciliation |
| `start.sh` | POSSIBLY EDIT — remove `PARAKEET_URL` if Parakeet retired; no change otherwise |
| `.github/workflows/pytest.yml` | POSSIBLY WRITE — if enabling branch-CI |
| `.gitignore` | POSSIBLY EDIT — if Panda benchmark files are deemed throwaway |

## Start command

```
Read /home/rajesh/workplace/her/her-os/docs/NEXT-SESSION-117-PARAKEET-RECONCILE-AND-CLEANUP.md end-to-end.
Work Priority 1 (Parakeet/Whisper reconcile) first — it's load-bearing.
Then Priority 2 items in order — each is independent, 5-10 min each.
Priorities 3 and 4 are opt-in; only do them if the user asks.
Every deploy uses laptop-local ./stop.sh && ./start.sh — never manual SSH with reconstructed env vars.
```

## Banned actions

- `git pull --rebase`, `git reset --hard`, `git push --force` on any branch
- Any SSH to hostname `panda` — use literal IP `192.168.68.57`
- Preemptive swaps (E2B→E4B, Whisper→Parakeet, Chatterbox moves) without observed pain
- Starting a new PR without merging the priority-1 decision first (keeps MEMORY aligned)

## Verification (end-of-session)

1. **MEMORY/reality parity check** — `grep -iE "parakeet|whisper" MEMORY.md | head -5` agrees with `ssh 192.168.68.57 'ps -ef | grep -iE "parakeet|whisper" | grep -v grep'`.
2. **No listening-port orphans** on Panda (`:8766` free) or Titan (`:9876` free).
3. **Ollama scope** — MEMORY's decision line matches live state on each machine individually.
4. **Fleet health** — `./start.sh check` shows green for every service.
5. **MEMORY session-117 block** prepended above session-116 with what changed + any new follow-ups.

## Related

- **Session 116 handoff (original):** `docs/NEXT-SESSION-PR5-OPEN.md` (open-PR task; superseded by session 115.5 plan, executed in session 116)
- **Session 116 merged PRs:** #4, #5, #6, #7 (all on `main`)
- **Session 116 MEMORY block:** authoritative for current fleet state