# Next Session: Zomato Token Self-Recovery — CONTINUE (Layers 2+3)

## Status
**Layer 1 DONE** (session 27). Layers 2+3 remain. The token still returns 401 (revoked server-side).

## Step 0 Findings (from session 27)
- Token endpoint: `https://mcp-server.zomato.com/token` (from `.well-known/oauth-authorization-server`)
- Auth endpoint: `https://mcp-server.zomato.com/authorize`
- Register endpoint: `https://mcp-server.zomato.com/register` (always returns same client_id)
- Client ID: `fd37dd28-254b-42b7-a55a-c85369d625c8` (fixed/shared)
- Client Secret: `Z-MCP`
- PKCE: S256 required
- Grants: `authorization_code` + `refresh_token`
- **Refresh currently returns "Invalid code format"**: the existing token was issued with `scope: "offline openid"` rather than `mcp:tools`. A proper PKCE flow with `scope=mcp:tools` may produce a refreshable token.
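
For the re-auth, a minimal sketch of generating the S256 PKCE pair and the authorize URL, assuming the server follows standard OAuth 2.0 query parameters. `make_pkce_pair`, `build_authorize_url`, the redirect URI, and the `state` value are illustrative names, not taken from the plan.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

AUTH_ENDPOINT = "https://mcp-server.zomato.com/authorize"
CLIENT_ID = "fd37dd28-254b-42b7-a55a-c85369d625c8"


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method (RFC 7636)."""
    # 64 random bytes encode to 86 URL-safe chars, inside the 43-128 limit.
    verifier = secrets.token_urlsafe(64)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # The challenge is base64url(SHA-256(verifier)) with padding stripped.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorize_url(challenge: str, redirect_uri: str, state: str) -> str:
    """Assemble the authorize URL; parameter names are the standard OAuth ones."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": redirect_uri,  # hypothetical, must match registration
        "scope": "mcp:tools",
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"
```

The verifier has to be kept until the code exchange, since the token endpoint checks it against the challenge sent here.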

## What
Implement Layers 2+3 of token recovery. A fresh token is also needed, obtained via the PKCE auth flow.

## Plan
`~/.claude/plans/recursive-meandering-kitten.md` — **Read this first.** It has the full implementation, all adversarial review findings, and design decisions. The plan is v2 (post-review).

## Key Design Decisions (from adversarial review)

1. **`_refresh_lock` on Layer 1** — Concurrent 401s from parallel Telegram callbacks would race on refresh without an asyncio.Lock. Double-check pattern: if `_client_cache is not None` after acquiring lock, someone else refreshed already.

2. **JSON-RPC auth error detection** — Zomato MCP may return HTTP 200 with `{"error": {"code": -32001, "message": "Unauthorized"}}` instead of HTTP 401. `_is_auth_error()` checks the JSON-RPC response, not just HTTP status. Without this, token expiry causes "No restaurants found" instead of triggering recovery.

3. **`ZomatoAuthError` propagation path** — Three functions (`_show_restaurants` line 605, `_show_reorder` line 635, `send_zomato_menu` line 597) catch `except Exception` which swallows `ZomatoAuthError`. Fix: `except ZomatoAuthError: raise` before `except Exception`. `send_zomato_menu` needs its own catch since it's called from `pending_handler`, not `handle_zomato_callback`.

4. **Layer 2 in telegram-bot with own Playwright** — NOT the annie-voice Playwright singleton. telegram-bot and annie-voice are SEPARATE OS processes. Layer 2 uses `browser.new_context()` (temp profile) to avoid profile lock conflicts. Secondary cleanup timeout: `asyncio.wait_for(browser.close(), 10)`.

5. **OTP handler separate from credential handler** — `handle_otp_text()` in `bot.py`, wired BEFORE `handle_credential_text()`. NOT inside `handle_credential_text()`. Avoids false-positive OTP consumption of regular numeric messages.

6. **Layer 3 = refresh only, no OTP relay** — Layer 3 (annie-voice cron) does NOT trigger the OTP flow. It tries refresh, and if that fails, sends a proactive pulse notification. This avoids hostile UX (5-min OTP timeout at 09:00). Layer 3 also makes a lightweight MCP call to detect server-side revocation (mtime check alone can't catch this).

7. **Cross-process communication** — Layer 3 (annie-voice) → telegram-bot via task_results JSON files (existing pattern). NOT via asyncio.Lock (doesn't work cross-process).

8. **Endpoint discovery is Step 0** — Try `GET .well-known/oauth-authorization-server` on the MCP server, then `zomato_client.json` for `token_url`, then standard `accounts.zomato.com/oauth/token`. If no endpoint exists, Layer 1 returns False always (graceful no-op).
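
Decision 2 can be illustrated with a small sketch. It assumes the check receives the HTTP status and the already-parsed JSON body; the real `_is_auth_error()` signature is in the plan and may differ, so `is_auth_error` here is a hypothetical stand-in.

```python
from typing import Any

JSONRPC_UNAUTHORIZED = -32001  # error code quoted in decision 2


def is_auth_error(http_status: int, body: Any) -> bool:
    """True if the response signals an auth failure at HTTP or JSON-RPC level."""
    if http_status == 401:
        return True
    # An HTTP 200 can still carry a JSON-RPC error object.
    if not isinstance(body, dict):
        return False
    error = body.get("error")
    if not isinstance(error, dict):
        return False
    return error.get("code") == JSONRPC_UNAUTHORIZED
```

Matching on the numeric code rather than the message text keeps the check stable if the server rewords "Unauthorized".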

## Files to Modify (in order)

### Step 0: Discover endpoint + re-auth (on Titan)
- Read `~/.her-os/annie/mcp-tokens/zomato_client.json` for config
- `curl https://mcp-server.zomato.com/.well-known/oauth-authorization-server`
- Try refresh_token POST manually
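
To reproduce the refresh attempt from Python instead of `curl`, a sketch that only builds the request, assuming the token endpoint accepts form-encoded client credentials in the body (it may expect HTTP Basic auth instead). `build_refresh_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://mcp-server.zomato.com/token"
CLIENT_ID = "fd37dd28-254b-42b7-a55a-c85369d625c8"


def build_refresh_request(refresh_token: str, client_secret: str) -> Request:
    """Build (but do not send) a refresh_token grant request."""
    form = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": CLIENT_ID,
        # Assumption: the secret travels in the form body.
        "client_secret": client_secret,
    }
    return Request(
        TOKEN_ENDPOINT,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; an HTTP error status surfaces as `urllib.error.HTTPError`, whose body can be read to inspect the server's error message.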

### Step 1: Layer 1 — Refresh Token
- `services/telegram-bot/zomato_handler.py` — `ZomatoAuthError`, `_refresh_lock`, `_is_auth_error()`, `_try_refresh_token()`, `_call_with_retry()`, fix `except Exception` in 3 functions, add reauth triggers in `send_zomato_menu` + `handle_zomato_callback`
- `services/telegram-bot/tests/test_zomato_handler.py` — ~18 new tests
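
The `_refresh_lock` double-check can be sketched as below. This is an illustration under assumptions, not the code in `zomato_handler.py`: the `TokenRecovery` class, `on_auth_error`, and the stubbed `_try_refresh_token` are hypothetical, and the real handler may keep the lock and cache at module level.

```python
import asyncio
from typing import Optional


class TokenRecovery:
    """Hypothetical holder for the refresh lock and the client cache."""

    def __init__(self) -> None:
        self._refresh_lock = asyncio.Lock()
        self._client_cache: Optional[object] = None
        self.refresh_calls = 0

    async def _try_refresh_token(self) -> bool:
        # Stand-in for the real POST to the token endpoint.
        self.refresh_calls += 1
        await asyncio.sleep(0.01)
        return True

    async def on_auth_error(self) -> bool:
        # Drop the stale client before queueing on the lock.
        self._client_cache = None
        async with self._refresh_lock:
            # Double-check: another callback may have refreshed already.
            if self._client_cache is not None:
                return True
            if not await self._try_refresh_token():
                return False
            self._client_cache = object()  # placeholder for a rebuilt client
            return True


async def demo() -> int:
    """Fire five concurrent 401 handlers and count actual refreshes."""
    recovery = TokenRecovery()
    results = await asyncio.gather(*(recovery.on_auth_error() for _ in range(5)))
    assert all(results)
    return recovery.refresh_calls
```

Running `asyncio.run(demo())` performs a single refresh even though five callbacks hit the auth error at once; the four waiters find the cache repopulated and return early.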

### Step 2: Layer 2 — Telegram OTP Relay
- `services/telegram-bot/zomato_auth.py` (NEW) — `LocalhostOAuthCallback` (port from `scripts/prototypes/swiggy_prototype.py:147-220`), `start_reauth()`, `request_otp()`, `receive_otp()`
- `services/telegram-bot/bot.py` — `handle_otp_text()`, wire into `handle_text()`
- `services/telegram-bot/zomato_handler.py` — `_start_background_reauth()`
- `services/telegram-bot/tests/test_zomato_auth.py` (NEW) — ~12 new tests
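
One way the `request_otp()` / `receive_otp()` handshake could work is an `asyncio.Future` that exists only while a reauth is waiting. The sketch below assumes a single pending request at a time and a 4 to 8 digit OTP; the `OtpRelay` class, the length bounds, and the default timeout are illustrative, not taken from the plan.

```python
import asyncio
from typing import Optional


class OtpRelay:
    """Hypothetical bridge between the reauth task and the Telegram handler."""

    def __init__(self) -> None:
        self._pending: Optional[asyncio.Future] = None

    async def request_otp(self, timeout: float = 300.0) -> Optional[str]:
        """Wait for the user to send an OTP; return None on timeout."""
        self._pending = asyncio.get_running_loop().create_future()
        try:
            return await asyncio.wait_for(self._pending, timeout)
        except asyncio.TimeoutError:
            return None
        finally:
            self._pending = None

    def receive_otp(self, text: str) -> bool:
        """Consume text as an OTP only while a request is pending."""
        if self._pending is None or self._pending.done():
            return False  # ordinary numeric messages fall through
        candidate = text.strip()
        if not (candidate.isascii() and candidate.isdigit()):
            return False
        if not 4 <= len(candidate) <= 8:  # assumed OTP length range
            return False
        self._pending.set_result(candidate)
        return True


async def demo() -> tuple:
    relay = OtpRelay()
    ignored = relay.receive_otp("123456")  # nothing pending yet
    waiter = asyncio.create_task(relay.request_otp(timeout=1.0))
    await asyncio.sleep(0)  # let request_otp install its future
    consumed = relay.receive_otp("123456")
    return ignored, consumed, await waiter
```

Because `receive_otp()` returns `False` when nothing is pending, `handle_otp_text()` can run before `handle_credential_text()` and let unrelated numeric messages pass through.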

### Step 3: Layer 3 — Proactive Monitoring
- `services/annie-voice/zomato_token_monitor.py` (NEW) — context source + health check
- `services/annie-voice/agents/zomato_token_check.yaml` (NEW) — cron 09:00 IST
- `services/annie-voice/agent_scheduler.py` — add `zomato_token_monitor` to `_KNOWN_REGISTER_MODULES`
- `services/annie-voice/tests/test_zomato_token_monitor.py` (NEW) — ~10 tests
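
A rough sketch of the Layer 3 decision and the cross-process handoff follows. The probe-then-refresh order, the payload fields, and the file naming are assumptions; the existing task_results schema is not described in this note, so `check_token` and `write_result` are hypothetical and would need adapting to it.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Callable


def check_token(probe: Callable[[], bool], refresh: Callable[[], bool]) -> str:
    """Probe the server, fall back to refresh, never start the OTP flow."""
    if probe():  # lightweight MCP call, catches server-side revocation
        return "ok"
    if refresh():
        return "refreshed"
    return "needs_reauth"  # caller sends the proactive notification


def write_result(results_dir: Path, status: str) -> Path:
    """Atomically drop a JSON file for the other process to pick up."""
    results_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical payload; the real task_results schema may differ.
    payload = {"source": "zomato_token_monitor", "status": status, "ts": time.time()}
    fd, tmp_name = tempfile.mkstemp(dir=results_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    final_path = results_dir / f"zomato_token_{time.time_ns()}.json"
    # os.replace is atomic on the same filesystem, so readers never see
    # a half-written file.
    os.replace(tmp_name, final_path)
    return final_path
```

Writing to a temporary name and renaming matters here because the reader is a separate OS process that may scan the directory at any moment.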

### Step 4: Deploy
- `git commit` + `git push`
- From laptop: `./start.sh telegram` then `./start.sh annie`
- `ssh titan 'find ~/workplace/her/her-os/services -name __pycache__ -exec rm -rf {} +'`

## Start Command
```
cat ~/.claude/plans/recursive-meandering-kitten.md
```
Then implement the plan. All adversarial findings are already addressed in it.

## Verification
1. **Layer 1**: Run `os.utime(token_path, (0, 0))` to simulate expiry → order food → logs show "Token refreshed"
2. **Layer 1 JSON-RPC**: Verify `_is_auth_error()` detects `{"error": {"code": -32001}}`
3. **Layer 1 propagation**: Verify `ZomatoAuthError` from `_search_restaurants` → `_show_restaurants` → `send_zomato_menu` triggers reauth (not swallowed)
4. **Layer 2**: Mock-test OTP sync: `request_otp → receive_otp → value returned`
5. **Layer 3**: Check AgentScheduler logs for `zomato-token-check` registered
6. **E2E**: "order food from McDonald's" → full Telegram ordering flow succeeds
7. Run all tests: `pytest services/telegram-bot/tests/test_zomato_*.py services/annie-voice/tests/test_zomato_*.py -v`
