# Next Session 124: Deployment Safety + Narrate Research Progress

## Session 123 Recap (what's deployed, what's working)

Five commits shipped and E2E-verified on Titan:
- `34b37a8` — vLLM `--reasoning-parser gemma4` + bare `<|channel>` fragment strip in `tts_text_clean.py` (6 new tests)
- `8bc5e67` — Titan Chatterbox retired (`chatterbox_titan_shim.py` deleted, `start_titan_chatterbox`/`stop_titan_chatterbox` removed). Panda Chatterbox (:8772) preserved. Freed 3.2 GB VRAM + 6 GB system RAM.
- `d632394` — sub-agent tool timeout 10s → 120s on `invoke_researcher`, `invoke_memory_dive`, `invoke_draft_writer`
- `0c6e954` — resume-on-reject StartFrame (INCOMPLETE — superseded by e8efeae)
- `e8efeae` — resume-on-reject uses `TTSSpeakFrame` (the real fix for the audio-context teardown race)

Plus one workspace edit (NOT in git — per-user data):
- `~/.her-os/annie/RULES.md` v1 → v2 on Titan: removed strict "MAXIMUM 2 sentences" rule. Backup at `RULES.md.bak`.

**E2E confirmation from session 123 end:** YouTube intruder voice leaked at 04:23:17 → Speaker gate rejected sim=0.109 → ResponseTracker resumed 117 chars → Kokoro delivered audio (TTFB 247ms) → zero `unable to append audio to context` warnings → Annie finished cleanly. Barge-in + intruder-reject + resume is working end-to-end.

## Context for next session

Read before starting:
- `MEMORY.md` session 123 section — has the full causal chain across the session (4 bugs all caught by E2E, each informing the next fix)
- `feedback_verify_titan_at_head.md` — the deployment gotcha that cost ~1 hour today (the motivation for Issue 1 below)
- `infra_vllm_gemma4_reasoning_parser.md` — vLLM reasoning-parser gotcha
- Current HEAD on Titan: `e8efeae` — verify with `ssh titan "cd ~/workplace/her/her-os && git log --oneline -1"`

## Issue 1: Deployment Safety in `start.sh` (PRIMARY — 20-30 min, high leverage)

### Problem

Session 123 lost ~1 hour to stale-code restarts. I committed and pushed 5 fix commits and restarted Annie 3 times, and the user retested 3 times and heard the same bugs each time, because `./start.sh annie` blindly restarts the Python service using whatever is on Titan's disk, regardless of whether it matches `origin/main`. The user sees "fix didn't work" when the truth is "fix not deployed."

`./start.sh annie` runs from laptop but SSHes into Titan to spawn the Python service from Titan's local filesystem. So `restart ≠ redeploy`.

### Design (sketch — validate during implementation)

Add a pre-start check to `start_annie` (and probably `start_all_titan`) that:

1. Runs `git fetch` on the laptop so its `origin/main` is current, then runs `ssh titan "cd $TITAN_PROJECT && git fetch && git rev-parse HEAD"` and compares the result against `git rev-parse origin/main` from the laptop.
2. If Titan HEAD != origin/main:
   - **Option A (auto-pull, safer default):** `ssh titan "cd $TITAN_PROJECT && git pull --ff-only"` then proceed. Fail loudly if the pull cannot fast-forward or hits conflicts.
   - **Option B (warn-and-abort):** print `"⚠ Titan is N commits behind origin/main. Run: ssh titan 'cd $TITAN_PROJECT && git pull'"` and `exit 1`.
   - **Option C (auto-pull with opt-out):** default to auto-pull but let the user override with `--no-pull`.
3. Same check for Panda if any Panda-side service needs it (phone stack). Lower priority since today's pain was Titan-specific.

**Recommended:** Option A by default with a visible log line (`→ Titan was 5 commits behind origin/main, pulled to HEAD e8efeae`). Fail loudly on conflicts — no silent magic.
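
A minimal sketch of the recommended flow, to be validated against the real `start.sh`. The function names `classify_titan_state` and `ensure_titan_at_head` and the `TITAN_HOST` variable are hypothetical. `TITAN_PROJECT` is assumed to be already defined by `start.sh`. Instead of comparing hashes against the laptop's `origin/main`, this variant lets Titan fetch and count ahead/behind commits itself with `git rev-list --left-right --count`, which also distinguishes "behind" from "ahead" and "diverged".

```bash
# Sketch only. TITAN_HOST is a hypothetical variable; TITAN_PROJECT is
# assumed to be the checkout path that start.sh already defines.

# Map ahead/behind commit counts (relative to origin/main) to a state name.
classify_titan_state() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then echo in_sync
    elif [ "$1" -eq 0 ]; then echo behind
    elif [ "$2" -eq 0 ]; then echo ahead
    else echo diverged
    fi
}

ensure_titan_at_head() {
    # On Titan: refresh origin, then count commits unique to each side.
    # rev-list prints "<ahead><TAB><behind>" for HEAD...origin/main.
    eth_counts=$(ssh "${TITAN_HOST:-titan}" \
        "cd $TITAN_PROJECT && git fetch --quiet origin && git rev-list --left-right --count HEAD...origin/main") || {
        echo "✗ Could not read git state on Titan" >&2
        return 1
    }
    set -- $eth_counts    # $1 = ahead, $2 = behind
    case "$(classify_titan_state "$1" "$2")" in
        in_sync)
            echo "→ Titan is at origin/main" ;;
        behind)
            # The fetch already ran, so a fast-forward merge completes the pull.
            eth_head=$(ssh "${TITAN_HOST:-titan}" \
                "cd $TITAN_PROJECT && git merge --ff-only --quiet origin/main && git rev-parse --short HEAD") || {
                echo "✗ Fast-forward failed on Titan; not starting" >&2
                return 1
            }
            echo "→ Titan was $2 commits behind origin/main, pulled to HEAD $eth_head" ;;
        *)
            echo "⚠ Titan is ahead of or diverged from origin/main; aborting" >&2
            return 1 ;;
    esac
}
```

One plausible call site is the top of `start_annie`, as `ensure_titan_at_head || exit 1`, so a failed check stops the restart instead of launching stale code.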

### Edge cases to handle

- SSH timeout / Titan unreachable → surface the error clearly, don't silently fall through
- Dirty working tree on Titan (uncommitted changes) → pull will fail; report clearly, don't force
- Fast-forward vs merge → prefer `git pull --ff-only` to avoid accidental merges
- What if Titan is AHEAD of origin/main? (i.e. someone committed directly on Titan) → warn and abort, don't trash their work
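
The first two edge cases could be covered by a preflight along these lines. This is a sketch under assumptions: `titan_preflight`, `tree_state_from_porcelain` and `TITAN_HOST` are hypothetical names, and the 5-second timeout is an arbitrary choice. It only inspects tracked files (`--untracked-files=no`), on the assumption that untracked files on Titan do not block a fast-forward.

```bash
# Sketch only; function names and TITAN_HOST are hypothetical.

# Classify `git status --porcelain` output read from stdin:
# any non-empty output means there are uncommitted changes.
tree_state_from_porcelain() {
    if [ -n "$(cat)" ]; then echo dirty; else echo clean; fi
}

titan_preflight() {
    # BatchMode avoids hanging on a password prompt;
    # ConnectTimeout bounds the wait when Titan is unreachable.
    tp_status=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "${TITAN_HOST:-titan}" \
        "cd $TITAN_PROJECT && git status --porcelain --untracked-files=no") || {
        echo "✗ Titan unreachable or git status failed; not starting" >&2
        return 1
    }
    if [ "$(printf '%s' "$tp_status" | tree_state_from_porcelain)" = "dirty" ]; then
        echo "⚠ Titan has uncommitted changes; refusing to pull:" >&2
        printf '%s\n' "$tp_status" >&2
        return 1
    fi
    echo "→ Titan reachable, working tree clean"
}
```

Running this before any pull keeps the failure message specific: the user sees the list of modified files on Titan rather than a generic git error.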

### Files to modify

- `start.sh` — add a helper function `ensure_titan_at_head()` or similar. Call it from `start_annie` and `start_all_titan`. Keep other `start_*` functions unchanged initially (can extend later).
- Consider a similar check in `start_llm`, though it is lower priority: `start.sh:~396-415` runs on the laptop, so vLLM flags are applied correctly even without a Titan pull. The stale-risk path is the Python code run from the Annie Voice venv on Titan.

### Verification

1. Simulate stale state: `ssh titan "cd ~/workplace/her/her-os && git reset --hard HEAD~1"`, then `./start.sh annie`. Expect: warning + auto-pull back to HEAD, or clean abort with instructions.
2. Simulate Titan-ahead state (make a commit on Titan only): expect warning + abort, no silent overwrite.
3. Simulate SSH-down: `./start.sh annie` when Titan unreachable → existing `check_titan` already handles this, but verify the new code doesn't mask the error.
4. Normal case (HEAD matches): expect silent skip or one-line confirmation.

## Issue 2: Narrate Research Progress (STRETCH — 1 hour, UX win)

### Problem

Current slow-tool UX:
```
User: "Give me a detailed history of calculators."
Annie: "Starting some research on that."  ← 04:17:29
  [14 seconds of silence — feels frozen]
Annie: [research result]                   ← 04:17:43
```

The researcher sub-agent DOES emit progress every ~5s via `progress_files` (see `/tmp/annie-voice.log` for `Progress saved: research-<id> status=in_progress` entries). But nothing reaches Annie's voice.

### Design (sketch)

Wire the progress-file events to trigger `TTSSpeakFrame`s with brief milestone narrations. Ideas:
- "Still looking at mechanical calculators..." (after 5s)
- "Now checking the transistor era..." (after 10s)
- "Almost done, just pulling in modern architectures..." (after 15s, if still running)

The narrations should:
- Be short (1 sentence)
- Vary (not repeat the same phrase)
- Be generated by the researcher itself, not hardcoded (otherwise they'll be generic and unrelated to the actual query). The researcher's `progress_files` payload could include a `speak_me: str | None` field that the voice pipeline pulls out and speaks via TTS.

### Files to investigate

- `progress_files.py` — confirm its schema and where it's called from
- `handle_invoke_researcher` (where? probably in `agent_runner.py` or `agent_scheduler.py`) — where progress payloads are constructed
- `bot.py:1506` `on_tool_start` handler — current site of interim speech for slow tools. A parallel `on_tool_progress` handler could listen to progress files and inject `TTSSpeakFrame`s.

### Edge cases

- Rate-limit narrations: at most 1 per ~5s so Annie doesn't chatter over herself
- What if the tool finishes while a narration is playing? Skip the narration, deliver the result immediately
- Interruption: if the user speaks during a narration, the existing speaker-gate + resume machinery should handle it gracefully (the resume would pick up from wherever the narration was cut off — OR skip to the real result). Worth testing explicitly.

## Issue 3: Commit RESOURCE-REGISTRY.md (HOUSEKEEPING — 5 min)

Session 123 added a Change Log entry for the Titan Chatterbox retirement. The file is currently modified-but-uncommitted alongside pre-existing session-119 Beast/Orin NX hardware additions that were also never committed. Both batches are legitimate additions per the CLAUDE.md rule ("any GPU model change MUST update RESOURCE-REGISTRY.md").

**Action:** Review the diff first to make sure nothing unexpected is included, then run `git add docs/RESOURCE-REGISTRY.md && git commit -m "docs(registry): Titan Chatterbox retirement + Beast/Orin NX hardware sections"` and push.

## Open questions for the next session

- Should deployment-safety be auto-pull (default) or warn-and-abort? Ask user after showing the scaffolded code.
- Are there OTHER start.sh sub-services that run Titan-side Python (context-engine, dashboard)? If so, extend the check to them too — or keep it Annie-only initially and iterate.
- For narrate-research-progress: should narrations count against Annie's "don't talk over herself" rule? I.e., pause LLM response generation while a narration plays? Or let them overlap?

## Verification (end-of-session check)

```bash
# From laptop
cd ~/workplace/her/her-os

# 1. Full annie-voice test suite on Titan
ssh titan "cd ~/workplace/her/her-os/services/annie-voice && python -m pytest tests/ -x -q"
# Expect: ~2800 pass (2750 baseline + today's new tests)

# 2. Deployment-safety smoke test
ssh titan "cd ~/workplace/her/her-os && git reset --hard HEAD~1"
./start.sh annie
# Expect: warning OR auto-pull back to HEAD. NOT silent stale-code start.

# 3. E2E voice test at https://voice.her-os.app
# Try a research prompt: "Give me a detailed history of calculators"
# - If narrate-research-progress was implemented: expect 1-3 audio milestones during the 14s research window
# - If only deployment-safety: expect the existing flow (starting → silence → result) — confirm it still works
```

## Caveat

If session 124 hits its own class of unexpected bugs (as session 123 did), prioritize root-causing them over completing the stretch goal. Session 123's pattern — E2E test catches what unit tests can't — is worth repeating here. Implement Issue 1, restart Annie, test end-to-end by actually using her for 10 minutes, THEN start Issue 2.
