# Next Session: Coffee Fragility Fixes — Post-Deploy Followups

## Context (read this first)

The 6 coffee fragility fixes from `~/.claude/plans/coffee-fragility-fixes.md` were **shipped + deployed** in session 128 (2026-04-19) on commits `e6502ad` (coffee fixes) and `ba8cb16` (IndicF5 flag-gate). Annie / context-engine / telegram-bot all restarted; Titan HEAD matches `ba8cb16`.

**Session 129 update (2026-04-19 tail):** Priority 0 was executed: the smoke-test mutation was reverted via a stdin-piped `UPDATE`. A second live coffee-order attempt at 16:19 IST then revealed a **new root cause (#4 in `project_annie_fragility_root_causes.md`)**: the coffee tool was lying to Annie about outcomes. The Blue Tokai API returned `"Updated successfully"`, but a post-API `Page.goto` timed out and the outer try/except reported it as a generic failure. Two duplicate orders landed (installments 15 & 16 both set to 20/04/2026). **Fixed + deployed in commit `8b2a879`** on top of `ba8cb16`; Titan HEAD is now `8b2a879`. `./start.sh annie` refused to auto-pull because Titan has 8 untracked files (build artifacts), so the restart used `HER_OS_NO_SYNC=1` after a manual `git pull --ff-only`. That friction remains for the next session (see Priority 4 below and its gitignore note).

**MANDATORY READ before doing anything:** the topic memory file
`~/.claude/projects/-home-rajesh-workplace-her-her-os/memory/project_coffee_fragility_fixes_impl.md`
— it lists the 9 load-bearing design decisions you MUST NOT revert,
the deploy state, and the smoke-test side-effect that produced the
top item below.

Verification steps §4–10 from the plan are **partially complete**:
- §4 regex apostrophe — verified live on context-engine ✅
- §5 past-date — needs Telegram interaction
- §6 duplicate dispatch — needs Telegram interaction
- §7 cheese routing — pattern refresh fired (42 terms), needs
  Telegram interaction to prove dispatch
- §8 layered login diagnosis — flag still OFF
- §9 alert poller end-to-end — poller running, no synthetic alert
  injected yet
- §10 CRITICAL drop grep — flag still OFF, can't measure

---

## Priority 0 — revert the smoke-test mutation (production data)

**The bug I created during deploy verification.** My bare-PATCH curl
to test the widened regex included `{"status":"fulfilled"}` in the
body. Before the regex widening it would have 422'd; after, it
succeeded, and left `promise:annie's order placement` stuck at
`status=fulfilled, fulfilled_at=2026-04-19T01:11:03+00:00`. The
2026-04-18 coffee order was never actually placed, so that state is
**semantically wrong**. Session 129 has already reverted it (see
Context); the options below are kept as the record of how and why.

User has three choices (ask if unsure — but **prefer revert**):

1. **Revert (preferred).** Restores pre-mutation non-terminal state:
   ```bash
   # The apostrophe in the ID is escaped SQL-style by doubling it (''); each ' is sent through ssh as '\''.
   ssh titan 'docker exec context-engine-postgres-1 psql -U postgres -d her_os -c \
     "UPDATE entities SET properties = properties - '\''status'\'' - '\''fulfilled_at'\'' \
      WHERE entity_id = '\''promise:annie'\'''\''s order placement'\'' \
      RETURNING entity_id, properties->>'\''status'\'' AS status;"'
   ```
   This will hit the safety hook (UPDATE on shared DB). Either
   approve interactively or set the requested permission rule.

2. **Mark broken** via the public API — most honest outcome, no
   sudo / hook escalation:
   ```bash
   ssh titan 'set -a && source ~/workplace/her/her-os/services/context-engine/.env && set +a && \
     curl -s -w "\nHTTP=%{http_code}\n" -X PATCH \
       -H "X-Internal-Token: $CONTEXT_ENGINE_TOKEN" \
       -H "Content-Type: application/json" \
       -d "{\"status\":\"broken\",\"fulfilled_evidence\":\"Coffee order failed 2026-04-18; promise marked broken on 2026-04-19 to clean up after smoke-test mutation in session 128\"}" \
       "http://localhost:8100/v1/promises/promise:annie%27s%20order%20placement"'
   ```
   Note the `Content-Type` header (the patch body is JSON).

3. **Leave fulfilled** — only if user confirms the underlying order
   actually completed at some point.

After whichever option, re-query to confirm:
```bash
ssh titan 'docker exec context-engine-postgres-1 psql -U postgres -d her_os \
  -c "SELECT entity_id, properties->>'\''status'\'' AS status, properties->>'\''fulfilled_at'\'' AS fulfilled_at FROM entities WHERE entity_id LIKE '\''promise:annie%'\'' ORDER BY last_seen DESC LIMIT 3;"'
```

**Do NOT use the bare PATCH endpoint for any future regex smoke
tests.** Use a guaranteed-nonexistent ID like
`promise:smoke-test-$(uuidgen)` and assert HTTP 404 (regex passed,
entity not found). 200 = you mutated something.
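A minimal sketch of that safer smoke test, assuming the same endpoint, token header, and `.env` path used in option 2 above. The function names `smoke_verdict` and `run_smoke_test` are hypothetical helpers, not existing scripts, and the 422 branch only reflects the pre-widening behaviour described earlier.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; endpoint and header are copied
# from the PATCH example in option 2.

# Map the HTTP status of a smoke-test PATCH to a verdict.
smoke_verdict() {
  case "$1" in
    404) echo "pass" ;;      # regex accepted the ID, entity not found
    200) echo "mutated" ;;   # a real entity was changed: stop and revert
    422) echo "rejected" ;;  # the ID (or body) failed validation
    *)   echo "unknown" ;;
  esac
}

# PATCH a guaranteed-nonexistent ID on Titan and report the verdict.
# Defined but not invoked here; call it explicitly when needed.
run_smoke_test() {
  local id code
  id="promise:smoke-test-$(uuidgen)"
  code=$(ssh titan "set -a && source ~/workplace/her/her-os/services/context-engine/.env && set +a && \
    curl -s -o /dev/null -w '%{http_code}' -X PATCH \
      -H \"X-Internal-Token: \$CONTEXT_ENGINE_TOKEN\" \
      -H 'Content-Type: application/json' \
      -d '{\"status\":\"fulfilled\"}' \
      'http://localhost:8100/v1/promises/${id}'")
  echo "id=${id} http=${code} verdict=$(smoke_verdict "$code")"
}
```

Only a `pass` verdict counts as a clean run; `mutated` means the ID was not as nonexistent as assumed.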

---

## Priority 1 — Telegram-loop verifications (§5/6/7 from plan)

These need Rajesh in the loop or scripting via the Telegram API. The
shortest path is for Rajesh to send the test messages below from his
phone while you read the responses back from `/tmp/telegram-bot.log`
or the chat history. **Capture the exact response strings** for the
topic memory update at the end of the session.

1. **Fix 2 past-date** — tell Annie via Telegram:
   `order me coffee for 01/01/2000`
   Expected response **must contain verbatim**:
   `VALIDATION_ERROR: date '01/01/2000' is in the past`
   …and refuse to schedule. If Annie paraphrases ("That date already
   passed!"), the system-prompt passthrough rule isn't being
   surfaced — investigate `_VALIDATION_ERROR_PASSTHROUGH` in
   `services/annie-voice/text_llm.py`.

2. **Fix 6 re-entry guard** — send `order coffee` twice within ~2
   seconds. The second response should contain:
   `VALIDATION_ERROR: Duplicate dispatch suppressed (5s window).
   If this was intentional, add 'try that again' to your request.`
   If the second call also runs the full Blue Tokai flow, the guard
   isn't wired — verify `_dispatch_tool` in `text_llm.py` routes
   through `_guarded_dispatch`.

3. **Fix 1 cheese routing**. Three messages, distinct expected
   routes:
   - `order me some bocconcini` → routes to `order_from_cremeitalia`
   - `order me cheese pizza` → routes to LLM (no forced tool)
   - `order coffee from cremeitalia` → routes to
     `schedule_coffee_delivery` (coffee precedence test)
   Check `/tmp/annie-voice.log` for the `Dispatch:` line that names
   the tool.

If any of these fail, do NOT change the design — investigate the
specific call site. The 9 load-bearing decisions in the topic memory
file explain why each piece is shaped the way it is.
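For the routing checks in item 3, a small helper can confirm which tool the latest `Dispatch:` line names. This is a sketch that assumes the tool name appears on the same log line as the literal text `Dispatch:`; the exact line format is not confirmed here, and `last_dispatch` / `dispatched_to` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the tool name is on the same line as "Dispatch:".

# Print the most recent line in LOGFILE that contains "Dispatch:".
last_dispatch() {
  grep -F "Dispatch:" "$1" | tail -n 1
}

# Succeed if the latest Dispatch line in LOGFILE mentions TOOL.
dispatched_to() {
  local line
  line=$(last_dispatch "$1" || true)
  case "$line" in
    *"$2"*) echo "ok: $line" ;;
    *)      echo "unexpected: ${line:-<no Dispatch line>}"; return 1 ;;
  esac
}
```

Run it where the log lives, for example `dispatched_to /tmp/annie-voice.log order_from_cremeitalia` right after the bocconcini message. The "routes to LLM" case has no tool name to match, so read that one by eye.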

---

## Priority 2 — Synthetic alert E2E (§9 from plan)

The repeat-failure tracker and `alert_poller` are wired (the
telegram-bot log shows `Alert poller started (every 30s)`), but no
real failure has happened yet, so the path is unproven. Inject a
synthetic alert and watch it land:

```bash
ssh titan 'mkdir -p ~/.her-os && cat >> ~/.her-os/alerts.jsonl <<EOF
{"ts": '$(date +%s)', "signature": "synthetic|/test|500", "count": 3, "first_seen": '$(date +%s)', "last_seen": '$(date +%s)', "detail": "synthetic test from session 129 followup"}
EOF'
```

Within 30s, Rajesh should receive a Telegram message starting with
`⚠️ Repeat failure: synthetic|/test|500`. Then verify the
checkpoint advanced:
```bash
ssh titan 'cat ~/.her-os/alerts.checkpoint'   # should be a byte offset matching the file size
```

If the alert doesn't arrive, check `/tmp/telegram-bot.log` for
`Alert delivery failed` entries (chat-id misconfig, send_message
exception). If it arrives but checkpoint doesn't advance, a
`Checkpoint write failed` line will be in the log.
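The checkpoint comparison can be made mechanical rather than eyeballed. The sketch below assumes the checkpoint file holds nothing but a decimal byte offset (the on-disk format is an assumption), and `checkpoint_matches` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the checkpoint file contains a plain decimal offset.

# Compare the offset stored in CHECKPOINT_FILE with the size of ALERTS_FILE.
checkpoint_matches() {
  local alerts="$1" ckpt="$2" size offset
  size=$(wc -c < "$alerts" | tr -d '[:space:]')
  offset=$(tr -d '[:space:]' < "$ckpt")
  if [ "$offset" = "$size" ]; then
    echo "ok: checkpoint=$offset size=$size"
  else
    echo "mismatch: checkpoint=$offset size=$size"
    return 1
  fi
}
```

On Titan this would be called as `checkpoint_matches ~/.her-os/alerts.jsonl ~/.her-os/alerts.checkpoint`, once the poller has had at least one 30s cycle after the injection.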

---

## Priority 3 — flag flip after burn-in

The plan mandates burn-in before flipping the two feature flags.

**24h after deploy (i.e. after 2026-04-20T06:36 IST)**:

Flip `OBS_PRIORITY_CLASSES=true`:
1. Check the alert file isn't already drowning Rajesh:
   `ssh titan 'wc -l ~/.her-os/alerts.jsonl 2>/dev/null'` — if it's
   over ~30 lines in 24h, **truncate** to last 24h before flipping
   anything else (the plan's circuit breaker — repeat-failure backfill
   is a known risk now that Fix 4A unblocked the 422s).
2. Add `OBS_PRIORITY_CLASSES=true` to `~/workplace/her/her-os/.env`
   on Titan (or wherever the Annie env file lives — check `start.sh`
   `start_annie()` body for the source).
3. `./stop.sh annie && ./start.sh annie` from laptop.
4. Confirm new log line:
   `Observability loops started — 3 priority queues + disk overflow`
5. After 1h, grep for drops:
   `ssh titan 'grep -E "dropping low|dropping high|CRITICAL OBSERVABILITY DROP" /tmp/annie-voice.log | tail -20'`
   - Low drops acceptable (high traffic / metric flood).
   - High drops: investigate which event types are saturating.
   - CRITICAL drops: should be ZERO. If non-zero, immediately set
     `OBS_PRIORITY_CLASSES=false`, restart, file a bug.
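The step-5 triage can be reduced to three counts and a pass/fail on the CRITICAL class. This sketch reuses the three log strings from the grep above; `count_drops` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: counts the three drop classes named in the grep in step 5.

# Print "low=N high=N critical=N" for LOGFILE; fail if any CRITICAL drop exists.
count_drops() {
  local log="$1" low high critical
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  low=$(grep -c "dropping low" "$log" || true)
  high=$(grep -c "dropping high" "$log" || true)
  critical=$(grep -c "CRITICAL OBSERVABILITY DROP" "$log" || true)
  echo "low=$low high=$high critical=$critical"
  [ "$critical" -eq 0 ]
}
```

A non-zero exit is the signal to set `OBS_PRIORITY_CLASSES=false` and restart; the `high` count still needs a human look at which event types are saturating.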

**24h after the OBS flip is clean**:

Flip `COFFEE_LAYERED_LOGIN=true` the same way. There's no log line
to assert directly. Instead, the first real Blue Tokai login attempt
with an expired session will produce a `selkie/diagnosis` event with
the 6-flag dict in the data field, so watch for the next coffee-order
attempt.

If `_diagnose_login` returns `crash_note` for the first real attempt,
flip the flag back off and investigate the Blue Tokai DOM (it changes
sometimes; selectors may have rotted). The memory note in
`feedback_browser_agent_lessons.md` recommends a fetch()-based
rewrite as the long-term answer.

---

## Priority 4 — clean up Titan settings.yml runtime contamination

`services/annie-voice/searxng/settings.yml` on Titan has a
runtime-injected real Brave API key (the docker entrypoint substitutes
`ENV_BRAVE_API_KEY` → real value at first run, and the result is
written back to disk because the file is bind-mounted). It's owned
by uid 977 (SearXNG container user), can't be rewritten by Rajesh
without sudo, and is currently masked from git via
`git update-index --assume-unchanged` (set in session 128).

**Why this matters:** any future `git stash` / `git checkout` on
that file fails, and the key is in plaintext on a tracked file path
(only safe because git ignores it now).

Two cleaner alternatives — pick one:

A. **Move substitution to read-time.** Change
   `services/annie-voice/searxng/settings.yml` to keep
   `api_key: ENV_BRAVE_API_KEY` as the literal placeholder. Patch
   the docker-compose entrypoint to write a copy to `/tmp/settings.yml`
   inside the container with the real key, and start SearXNG from
   that copy. The bind-mounted source stays clean.

B. **Mount the file read-only with a separate writable overlay.**
   Bind-mount the placeholder version r/o, and let entrypoint write
   the substituted version to a container-local volume. Same effect.

Either is a small change. Do this BEFORE the next time `start.sh`
auto-pull refuses on Titan for the same reason.
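Option A could look roughly like the entrypoint fragment below. It is a sketch under stated assumptions: the real key is taken from an environment variable called `BRAVE_API_KEY` (the actual variable name is not given here), the in-container mount path is illustrative, and `render_settings` is a hypothetical helper. Pointing SearXNG at the rendered copy (for example through its `SEARXNG_SETTINGS_PATH` environment variable) should be confirmed against the image in use.

```bash
#!/usr/bin/env bash
# Sketch only: BRAVE_API_KEY and the paths in the usage comment are assumptions.

# Copy SRC to DST, replacing the literal placeholder ENV_BRAVE_API_KEY with
# the real key. SRC is never written to, so a bind-mounted source stays clean.
render_settings() {
  local src="$1" dst="$2" key escaped
  key="${BRAVE_API_KEY:?BRAVE_API_KEY must be set}"
  # Escape characters that are special on the replacement side of s|||.
  escaped=$(printf '%s' "$key" | sed -e 's/[\\&|]/\\&/g')
  sed "s|ENV_BRAVE_API_KEY|${escaped}|g" "$src" > "$dst"
}

# Illustrative use inside the container entrypoint:
#   render_settings /etc/searxng/settings.yml /tmp/settings.yml
#   export SEARXNG_SETTINGS_PATH=/tmp/settings.yml
```

Because the substituted file lives only under `/tmp` in the container, the tracked `settings.yml` keeps its placeholder and the `--assume-unchanged` mask is no longer needed.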

While you're there, look at the 8 untracked files on Titan
(`scripts/benchmark_*`, `services/annie-voice/client/dist/assets/*`,
`services/annie-voice/searxng.bak/`, `services/context-engine/.env.backup-20260321`,
`unsloth_compiled_cache/`). Most are build artifacts or backups —
add the obvious ones to `.gitignore` (especially
`services/annie-voice/searxng.bak/` and the dist assets).

---

## Out of scope — DO NOT pull these in

These are real follow-up tickets, but each one wants its own
adversarial-review session, not a quick tack-on to this followup.

- **Wonder/Comic `Image_process_failed` root cause.** Fix 4 surfaces
  it as a repeat-failure alert; the actual rendering failure is a
  separate investigation. Don't try to fix it here — you'll know it
  exists when the alert arrives.
- **`emotional-peak` 500 root cause.** Same shape. Fix 4D
  instruments it; root cause is a separate ticket.
- **Blue Tokai direct `fetch()` rewrite.** Fix 3 hardens the
  Playwright path with diagnostics; replacing the path entirely
  needs a separate plan + adversarial review (memory note
  `feedback_browser_agent_lessons.md` flags it as the long-term
  right answer).
- **Phone + WhatsApp parity.** Three named handoff docs already
  exist:
  - `docs/NEXT-SESSION-PHONE-ORDERING-PARITY.md`
  - `docs/NEXT-SESSION-PHONE-OBSERVABILITY-PARITY.md`
  - `docs/NEXT-SESSION-WHATSAPP-ORDERING.md`
  Each one is its own session.
- **`promise:annie's order placement` provenance.** We still don't
  know what created that promise originally. Investigation is
  separate from the cleanup in Priority 0.
- **Pre-existing 141 annie-voice test failures** (indicf5 imports,
  blackwell_patch signature, code_tools sandbox, go_home adapter,
  pytz/playwright missing). All pre-date session 128. Each one is
  its own micro-fix.

---

## Verification checkpoint before closing the session

Mirror the discipline of `feedback_verify_titan_at_head.md`:

```bash
ssh titan 'cd ~/workplace/her/her-os && git log --oneline -1'
# Should match laptop HEAD. If you commit anything new in this session, push + verify match again.
```
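To turn that eyeball comparison into a pass/fail check, a sketch along these lines may help. It compares full commit hashes rather than the abbreviated ones, since abbreviation length can differ between clones; `heads_match` and `verify_titan_at_head` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch only: run from the laptop checkout of her-os.

# Succeed only if both hashes are non-empty and identical.
heads_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Compare laptop HEAD with Titan HEAD. Defined but not invoked here.
verify_titan_at_head() {
  local laptop titan
  laptop=$(git rev-parse HEAD)
  titan=$(ssh titan 'cd ~/workplace/her/her-os && git rev-parse HEAD')
  if heads_match "$laptop" "$titan"; then
    echo "ok: both at $laptop"
  else
    echo "MISMATCH: laptop=$laptop titan=$titan"
    return 1
  fi
}
```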

Update the topic memory file
`~/.claude/projects/-home-rajesh-workplace-her-her-os/memory/project_coffee_fragility_fixes_impl.md`
with the verification results (which §s passed, which failed, what
flag flips landed, mutation cleanup outcome).

---

## Tone reminder

Per `user_overwhelm_from_perspectives.md` and
`user_motivation_learning_and_autonomy.md`: be honest about what
worked, name what didn't, and frame open follow-ups as "interesting
to try", not "blocking the roadmap." If a verification fails, name
the specific surface you'd inspect next rather than piling on
alternatives.

Per `feedback_no_multi_option_menus.md`: if Rajesh has expressed a
clear outcome (e.g. "revert the mutation"), execute — don't ask at
each step.
