# Next Session: Coffee Fragility Fixes (6 fixes, adversarially reviewed)

## What
Execute `~/.claude/plans/coffee-fragility-fixes.md`, which fixes six fragility bugs exposed by the 2026-04-18 coffee-order failure: product-keyed cheese routing, past-date validation, layered browser-login diagnostics, promise-regex widening + repeating-failure alerts, a priority-classed observability queue (sync-safe), and a tool-dispatch re-entry guard.

## Plan
`~/.claude/plans/coffee-fragility-fixes.md` — **read it first**. Contains the full implementation, adversarial review findings (3 CRITICAL + 12 HIGH + 11 MEDIUM + 3 LOW), and the response table showing how every CRITICAL/HIGH/MEDIUM is addressed.

## Key Design Decisions (from adversarial review — do NOT revert these)

1. **`emit_event` STAYS synchronous.** Do not add `await` inside it. CRITICAL overflow writes to `~/.her-os/annie-critical-overflow.jsonl` synchronously, then a drain loop reads it back. The v1 plan had an `await` inside a plain `def`, which Python rejects as a `SyntaxError`; that was the review's top CRITICAL finding.

2. **Cheese pattern uses a mutable holder, never async-at-import.** Static allowlist (`_CHEESE_STATIC_TERMS`) is always live and compiled at module load. Optional `refresh_cheese_pattern_from_catalog()` runs at startup and every 30 min, mutating `_CHEESE_PATTERN_HOLDER[0]`. Do NOT call any `async def` from module-level list construction.

3. **No `priority=` kwarg on `emit_event`.** Priority is derived inside the function from `(creature, event_type)` via `_PRIORITY_OVERRIDES` table. All existing call sites work unchanged.

4. **Re-entry guard (Fix 6) is a first-class fix, not a state-diagram decoration.** Code lives in `services/annie-voice/tool_dispatch.py` with the `_DispatchGuard` class. 5-second window, SHA-1 hash of user message, `force=True` bypass.

5. **Cheese product extraction is curated + filtered, not catalog-firstword.** Static allowlist is the floor; catalog extension requires `len(token) >= 6`, `isalpha()`, NOT in `GENERIC_BLOCKLIST`. Negative lookahead blocks `"cheese pizza/bread/burger/…"` false positives.

6. **Repeat-failure alerts use a file, not cross-service Telegram.** `context-engine` and `telegram-bot` both write to `~/.her-os/alerts.jsonl`. Only `telegram-bot` (which owns the bot token) polls the file and delivers. No credentials cross service boundaries.

7. **Phone + WhatsApp are DEFERRED with named follow-up docs.** Verified by grep: neither imports `_detect_tool_choice` or the ordering tools. See `docs/NEXT-SESSION-PHONE-ORDERING-PARITY.md`, `docs/NEXT-SESSION-PHONE-OBSERVABILITY-PARITY.md`, `docs/NEXT-SESSION-WHATSAPP-ORDERING.md`.

8. **`VALIDATION_ERROR:` prefix is load-bearing.** The system prompt instructs the LLM to surface such strings verbatim. A golden test asserts middleware does not strip the prefix. If you "clean up" error messages, you will regress to today's state.

9. **Credential scrub is in code, not notes.** `_scrub_secrets(text, creds)` applied at every point where page content is captured into `diag`.

## Files to Modify (ordered per execution plan)

**Step 1 — Fix 4A (regex widening, unblocks 6 days of 422s):**
- `services/context-engine/main.py:2753` — widen PathParam regex.
- `services/context-engine/main.py:2801` — same widening on the `/notify` endpoint.

**Step 2 — Fix 2 (past-date):**
- `services/annie-voice/browser_agent_tools.py` — add `IST`, `MAX_COFFEE_FUTURE_DAYS`, `_parse_and_validate_date`, wire into `schedule_coffee_delivery`.
- `services/annie-voice/text_llm.py` — system prompt addition about `VALIDATION_ERROR:` passthrough.
- `services/annie-voice/tests/test_date_validation.py` — 6 tests.

**Step 3 — Fix 6 (re-entry guard, MUST land before Fixes 1/3):**
- `services/annie-voice/tool_dispatch.py` — add `_DispatchGuard` + `dispatch_tool_with_guard`.
- Wire `dispatch_tool_with_guard` in at the current dispatch call site in `text_llm.py`.
- `services/annie-voice/tests/test_dispatch_guard.py` — 5 tests.

**Step 4 — Fix 1 (cheese routing):**
- `services/annie-voice/text_llm.py` — `_CHEESE_STATIC_TERMS`, `_compile_cheese_pattern`, `_CHEESE_PATTERN_HOLDER`, `_cheese_pattern`, `refresh_cheese_pattern_from_catalog`, modify `_detect_tool_choice`.
- `services/annie-voice/server.py` — startup hook for refresh task + 30-min loop.
- `services/annie-voice/tests/test_intent_routing.py` — 12 tests.

**Step 5 — Fix 5 (priority observability, feature-flagged):**
- `services/annie-voice/observability.py` — replace module with 3-queue design. Flag: `OBS_PRIORITY_CLASSES=true`.
- `services/annie-voice/tests/test_observability_priority.py` — 4 tests.
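
For orientation, a sketch of how a synchronous `emit_event` can derive priority from `_PRIORITY_OVERRIDES` and spill CRITICAL events to the overflow file. The three class names, the table rows, the queue bound, the `SETTINGS` dict, the return strings, and the `drain` helper are all assumptions; only the file path, the flag name, and the "no `await`, no `priority=` kwarg" rules come from this document.

```python
import json
import os
from collections import deque
from pathlib import Path

CRITICAL, NORMAL, LOW = "critical", "normal", "low"  # assumed class names
_PRIORITY_OVERRIDES = {  # hypothetical rows
    ("annie", "tool_failure"): CRITICAL,
    ("annie", "heartbeat"): LOW,
}
SETTINGS = {  # hypothetical knobs, kept in a dict so tests can override them
    "max_per_queue": 500,
    "overflow_path": Path.home() / ".her-os" / "annie-critical-overflow.jsonl",
}
_QUEUES = {p: deque() for p in (CRITICAL, NORMAL, LOW)}


def priority_classes_enabled(env=None) -> bool:
    env = os.environ if env is None else env
    return env.get("OBS_PRIORITY_CLASSES", "false").strip().lower() == "true"


def emit_event(creature, event_type, data=None):
    """Plain def on purpose: no await anywhere, and no priority= kwarg."""
    priority = _PRIORITY_OVERRIDES.get((creature, event_type), NORMAL)
    record = {"creature": creature, "event_type": event_type, "data": data or {}}
    queue = _QUEUES[priority]
    if len(queue) < SETTINGS["max_per_queue"]:
        queue.append(record)
        return "queued"
    if priority == CRITICAL:
        # Blocking append; a separate drain loop reads this file back later.
        path = Path(SETTINGS["overflow_path"])
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return "overflowed"
    return "dropped"  # non-critical events are shed when their queue is full


def drain(limit):
    """Pop up to `limit` events, CRITICAL first, then NORMAL, then LOW."""
    out = []
    for p in (CRITICAL, NORMAL, LOW):
        while _QUEUES[p] and len(out) < limit:
            out.append(_QUEUES[p].popleft())
    return out
```

Because the priority lookup happens inside the function, existing `emit_event(creature, event_type, ...)` call sites need no changes.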

**Step 6 — Fix 3 (login diagnosis, feature-flagged):**
- `services/annie-voice/browser_agent_tools.py` — `_scrub_secrets`, `_diagnose_login`, replace the inline auto-login block. Flag: `COFFEE_LAYERED_LOGIN=true`.
- `services/annie-voice/tests/test_login_diagnosis.py` — 7 tests with local aiohttp fixture server.

**Step 7 — Fix 4 B/C/D (tracker + poller + instrumentation):**
- `services/context-engine/repeat_failure_tracker.py` — new file.
- `services/telegram-bot/alert_poller.py` — new file.
- `services/telegram-bot/bot.py` — wire `alert_poll_loop` into startup.
- `services/telegram-bot/promise_scheduler.py` — wrap 422 handler.
- `services/telegram-bot/scheduler.py` — wrap Wonder/Comic failures.
- `services/context-engine/main.py` — wrap `emotional-peak` 500 handler.
- 7 tests across both services.
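
The file-based handoff can be sketched as a writer that appends to `alerts.jsonl` and a poller that reads from a byte offset. The threshold, the window, the record fields (`ts`, `source`, `key`, `count`, `detail`), and both names below are assumptions, not the contents of `repeat_failure_tracker.py` or `alert_poller.py`.

```python
import json
import time
from collections import defaultdict, deque
from pathlib import Path


class RepeatFailureTracker:
    """Append one alert line once the same failure repeats `threshold` times in a window."""

    def __init__(self, alerts_path, threshold=3, window_s=3600.0, clock=time.time):
        self._alerts_path = Path(alerts_path)
        self._threshold = threshold
        self._window_s = window_s
        self._clock = clock
        self._hits = defaultdict(deque)

    def record_failure(self, source, key, detail=""):
        now = self._clock()
        hits = self._hits[(source, key)]
        hits.append(now)
        while now - hits[0] > self._window_s:  # forget hits older than the window
            hits.popleft()
        if len(hits) < self._threshold:
            return False
        alert = {"ts": now, "source": source, "key": key, "count": len(hits), "detail": detail}
        self._alerts_path.parent.mkdir(parents=True, exist_ok=True)
        with self._alerts_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(alert) + "\n")
        hits.clear()  # start counting again so one incident is not re-alerted every hit
        return True


def read_new_alerts(alerts_path, offset):
    """Poller side: return (alerts, new_offset), consuming only complete lines."""
    path = Path(alerts_path)
    if not path.exists():
        return [], offset
    with path.open("rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    end = chunk.rfind(b"\n") + 1  # ignore a trailing line that is still being written
    alerts = [json.loads(line) for line in chunk[:end].splitlines() if line.strip()]
    return alerts, offset + end
```

Only the poller would turn these records into Telegram messages, so the writer side never needs the bot token.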

**Step 8 — Deferral handoff docs (already created as part of this plan, verify present):**
- `docs/NEXT-SESSION-PHONE-ORDERING-PARITY.md`
- `docs/NEXT-SESSION-PHONE-OBSERVABILITY-PARITY.md`
- `docs/NEXT-SESSION-WHATSAPP-ORDERING.md`

## Start Command

```bash
cat ~/.claude/plans/coffee-fragility-fixes.md
```

Then implement step 1. All adversarial findings are already addressed in the plan — implement from the plan, do not re-brainstorm design decisions.

## Verification

Per-step verification gates are in the plan (§ Verification). Before closing any step:

1. `ssh titan "git -C ~/workplace/her/her-os log --oneline -1"` matches laptop HEAD (per `feedback_verify_titan_at_head.md`).
2. `./stop.sh annie && ./start.sh annie`.
3. `ssh titan "tail -30 /tmp/annie-voice.log"` → assert expected startup lines.
4. Step-specific curl / Telegram interaction from plan § Verification #4–10.
5. All new tests pass: `cd services/annie-voice && .venv/bin/pytest tests/test_date_validation.py tests/test_dispatch_guard.py tests/test_intent_routing.py tests/test_observability_priority.py tests/test_login_diagnosis.py -v`.

## Circuit Breakers (if something goes wrong during exec)

- If a step breaks an unrelated test: STOP, do not proceed to next step.
- If Fix 5 causes any voice-latency regression after flag flip: set `OBS_PRIORITY_CLASSES=false` + restart.
- If Fix 3 starts returning `crash_note` for all login attempts: set `COFFEE_LAYERED_LOGIN=false` + restart + investigate Blue Tokai state.
- If the alert poller delivers 50+ alerts in the first hour (likely a backfill of the 6 days of accumulated failures): DON'T forward the flood to Rajesh. Stop the poller, truncate `~/.her-os/alerts.jsonl` to the last 24h of entries, then re-enable it.
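
One way to do the 24h truncation from the last circuit breaker, assuming each line of `alerts.jsonl` is a JSON object with an epoch-seconds `ts` field (an assumption; adjust to the real record shape). Run it only while the poller is stopped.

```python
import json
import os
import time
from pathlib import Path


def truncate_alerts(path, max_age_s=24 * 3600, now=None):
    """Rewrite the file keeping only entries newer than `max_age_s`; return the count kept."""
    path = Path(path)
    now = time.time() if now is None else now
    kept = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            entry = json.loads(line)
            age = now - float(entry["ts"])  # assumed field name
        except (ValueError, KeyError, TypeError):
            continue  # unparseable or untimestamped lines are dropped
        if age <= max_age_s:
            kept.append(line)
    # Write to a sibling temp file, then swap it in with an atomic replace.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    os.replace(tmp, path)
    return len(kept)
```

Usage would be `truncate_alerts(Path.home() / ".her-os" / "alerts.jsonl")`. Any poller offset saved before the rewrite is stale afterwards and should be reset.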

## Acknowledged Scope Limits (do NOT expand)

- No rewrite of Blue Tokai flow as direct `fetch()` calls — separate plan.
- No Wonder/Comic `Image_process_failed` root-cause fix: Fix 4 only surfaces the failure; the root-cause fix is a separate ticket.
- No `emotional-peak` 500 fix — same as above.
- No phone/WhatsApp parity in this session — see the 3 deferral docs.
