# Next Session: Router Alerts — Implement Plan from Session 387

## What Was Done (Session 387)

### Adversarial Review + Plan Finalized
- **2 hostile reviewers** (architecture + code quality) found **16 issues** across the original plan
- **10 incorporated** into the revised plan; **4 deferred** (pre-existing bugs), **1 noted**, **1 rejected**
- Plan finalized at `~/.claude/plans/playful-wandering-puddle.md`

### Critical Discovery: `model_tier` Default Bug
- `AgentDefinition` (`agent_discovery.py:72`) and scheduler (`agent_context.py:924`) both default `model_tier` to `"nano"`
- Session 386's removal of `model_tier: nano` from `network_anomaly.yaml` and `proactive-triage.yaml` was a **NO-OP** — those agents are still running on Nano
- **Rajesh confirmed: change default to "super"** — all agents use Beast, with automatic Nano fallback via `_get_client_for_tier()`

### Titan Health (verified)
- Collector healthy, snapshots every 5 min, CPU 77°C, 0 active alerts
- All 3 critical devices (Titan/Beast/Panda) online
- 12 agents running, 8 scheduled jobs

## The Plan (3 Items)

### Item 1: Change default model_tier "nano" → "super"

**Changes:**
1. `services/annie-voice/agent_discovery.py:72` — `model_tier: str = "nano"` → `"super"`
2. `services/annie-voice/agent_context.py:924` — `spec.metadata.get("model_tier", "nano")` → `"super"`
3. Remove `model_tier: nano` from repo test fixtures:
   - `services/annie-voice/test_agents/orchestration_heartbeat.yaml` (line 6) — also update description
   - `services/annie-voice/test_agents/orchestration_summarize.yaml` (line 6) — also remove "(Nano)" from description
4. SSH to Titan — remove `model_tier: nano` from 3 deployed YAMLs:
   - `~/.her-os/annie/agents/orchestration_heartbeat.yaml`
   - `~/.her-os/annie/agents/orchestration_summarize.yaml`
   - `~/.her-os/annie/agents/update-claude-code.yaml`

**Note:** `budget: nano` on heartbeat is fine — `budget` controls token limits, NOT model selection.

**Impact:** All 12 agents on Titan get Super on Beast. The `super → nano` fallback chain protects against Beast being down. No agents need a Nano pin (all are background/cron jobs).

### Item 2: Telegram Network Query Speedup (~7s savings)

**Problem:** A regex already identifies the `network_status` intent, but the LLM is still called (~7s) just to generate trivial args. Total: 15-17s per query.

**Solution (4 steps):**

**Step 2a:** Add `classify_network_query(user_message) -> dict | None` in `router_monitor.py` (~40 lines)
- Keyword → query-enum table (devices/speed/bandwidth/server_bandwidth/isp_report/alerts/overview)
- Days extraction with `min(days, 90)` cap
- Device-reference detection (`\bfor\s+\w+\b` or possessive) → returns `None` → LLM handles it
- Co-located with the tool (review finding: avoids parallel regex lists in text_llm.py)

**Step 2b:** Bypass in `text_llm.py` stream_chat round 0 (~20 lines)
- Before `client.chat.completions.create()`, call `classify_network_query(user_message)`
- If non-None, construct synthetic `SimpleNamespace` response (same pattern as streaming path lines 1337-1351)
- Skip LLM call #1 — tool executes, result goes to LLM call #2 for natural language formatting
- Direct `if tool_name == "network_status"` check (no module-level dispatch dict — avoids import-time poisoning)
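A sketch of the synthetic response construction. The attribute names are an assumption mirroring the OpenAI SDK's chat-completion shape; the real bypass must match whatever `stream_chat`'s tool-dispatch code actually reads (per the plan, the same pattern as the streaming path at lines 1337-1351):

```python
import json
from types import SimpleNamespace

def build_synthetic_tool_response(tool_args: dict) -> SimpleNamespace:
    """Fabricate a response object shaped like an OpenAI chat completion
    whose only content is a network_status tool call, so the existing
    tool-dispatch code can consume it unchanged (shape is an assumption)."""
    tool_call = SimpleNamespace(
        id="synthetic-network-status",
        type="function",
        function=SimpleNamespace(
            name="network_status",
            arguments=json.dumps(tool_args),
        ),
    )
    message = SimpleNamespace(role="assistant", content=None, tool_calls=[tool_call])
    return SimpleNamespace(
        choices=[SimpleNamespace(message=message, finish_reason="tool_calls")]
    )
```

In round 0 the bypass would then swap this in for LLM call #1 whenever `classify_network_query()` returns non-None; the tool result still flows to LLM call #2 for formatting.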

**Step 2c:** Tests in `tests/test_text_llm.py` (~50 lines)
- `TestNetworkArgDerivation`: 11+ cases including device-reference fallback and days cap

**Step 2d:** Integration test in `tests/test_router_e2e.py` (~30 lines)

### Item 3: End-to-End Alert Test

SSH to Titan with atomic rollback:
1. Set `ALERT_CPU_TEMP_C=70` (router runs at ~77°C)
2. Wait for next collection cycle (~5 min)
3. Check `alert_state.json` for cpu_temp alert
4. Verify Telegram delivery
5. Revert to `ALERT_CPU_TEMP_C=85` (in try/finally)
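The try/finally rollback could look like the sketch below. The env-file path and `sed` edits are assumptions; adjust to wherever `ALERT_CPU_TEMP_C` is actually configured on Titan:

```python
import subprocess

def _ssh(cmd: str) -> str:
    """Run a command on Titan over SSH (host alias 'titan' assumed)."""
    return subprocess.run(["ssh", "titan", cmd], check=True,
                          capture_output=True, text=True).stdout

def e2e_alert_test(run=_ssh,
                   env_file: str = "~/.her-os/annie/router/collector.env") -> str:
    """Lower the CPU-temp threshold, wait one cycle, read alert state,
    and ALWAYS restore the threshold (the plan's atomic rollback)."""
    run(f"sed -i 's/ALERT_CPU_TEMP_C=85/ALERT_CPU_TEMP_C=70/' {env_file}")
    try:
        # One collection cycle is ~5 min; 330 s gives a small margin.
        return run("sleep 330; cat ~/.her-os/annie/router/alert_state.json")
    finally:
        # Rollback runs even if the check above raises.
        run(f"sed -i 's/ALERT_CPU_TEMP_C=70/ALERT_CPU_TEMP_C=85/' {env_file}")
```

The injectable `run` parameter keeps the rollback logic testable without a live SSH connection.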

## Key Files

| File | What |
|------|------|
| `services/annie-voice/agent_discovery.py:72` | `model_tier` default |
| `services/annie-voice/agent_context.py:924` | `model_tier` scheduler default |
| `services/annie-voice/router_monitor.py` | Add `classify_network_query()` |
| `services/annie-voice/text_llm.py:1250-1300` | Bypass logic in stream_chat |
| `services/annie-voice/test_agents/*.yaml` | Test fixture YAMLs |
| `services/annie-voice/tests/test_text_llm.py` | Arg derivation tests |
| `services/annie-voice/tests/test_router_e2e.py` | Bypass integration test |
| `~/.claude/plans/playful-wandering-puddle.md` | Full plan with review summary |

## Review Findings Already Baked Into Plan

These are addressed — don't re-solve them:
- Two parallel regex lists → classifier in router_monitor.py (not text_llm.py)
- `device_name` dropped → device-reference detection returns None
- `days` unbounded → capped at 90
- Import-time `_ARG_DERIVERS` poisoning → direct if-check instead
- YAML descriptions stale → updated when editing
- E2E test no rollback → try/finally pattern
- Claude path not wired → deferred, add comment only

## First Commands

```bash
# Read the full plan
cat ~/.claude/plans/playful-wandering-puddle.md

# Verify Titan is healthy
ssh titan "pgrep -af router_collector && tail -3 ~/.her-os/annie/router/collector.log"
```

## Prompt

Continue router alerts implementation from `docs/NEXT-SESSION-ROUTER-ALERTS-2.md`. The plan at `~/.claude/plans/playful-wandering-puddle.md` was adversarially reviewed (16 findings, 10 implemented) and approved by Rajesh. Implement all 3 items, run tests, deploy to Titan, and verify. Start with Item 1 (model_tier default change) since it's simplest and independent.
