# Next Session: WhatsApp Image Search & Send

## What

Enable Annie to search for and send images via WhatsApp Web. Currently Annie can only send text: when asked for pictures, the LLM hallucinates ("I'll find some nice photos and send them") but has no image capability. This adds a pipeline: regex-based image intent detection → SearXNG image search → download + validate + convert → WhatsApp Web file attachment via Playwright → send with LLM-generated caption. If any step fails, Annie falls through to the normal text response.
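
The control flow of that pipeline can be sketched roughly as follows. This is a minimal sketch, not the code from the plan: `handle_image_request` and the injected `search`, `download`, `caption`, and `send_image` callables are hypothetical stand-ins. Only the step ordering, the fall-through on failure, the short lock scope, and the file cleanup are taken from this note.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, List, Optional


async def handle_image_request(
    query: str,
    page_lock: asyncio.Lock,
    search: Callable[[str], Awaitable[List[str]]],
    download: Callable[[str], Awaitable[Optional[Path]]],
    caption: Callable[[str], Awaitable[str]],
    send_image: Callable[[Path, str], Awaitable[bool]],
) -> bool:
    """Return True if an image was sent; False means: answer with text instead."""
    image_path: Optional[Path] = None
    try:
        urls = await search(query)                 # lock-free
        for url in urls:
            image_path = await download(url)       # lock-free; None = rejected
            if image_path is not None:
                break
        if image_path is None:
            return False                           # no usable image: text fallback
        text = await caption(query)                # lock-free
        async with page_lock:                      # held for the DOM step only
            return await send_image(image_path, text)
    except Exception:
        # Assumption: any failure in any step falls through to the text path.
        return False
    finally:
        # Clean up the temp file on every path, success or failure.
        if image_path is not None:
            image_path.unlink(missing_ok=True)
```

The caller treats a `False` return as "continue with the normal text response", so a failed search, a rejected download, and a failed attach all look the same from the outside.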

## Plan

`~/.claude/plans/magical-knitting-sutherland.md`

Read the plan first — it has the full implementation, all 25 adversarial review findings with resolutions, state machine diagram, pre-mortem analysis, and code for every step.

## Key Design Decisions (from adversarial review)

1. **Heuristic intent, NOT tool calling** — Gemma 4 26B's tool calling is unreliable at 200 max tokens. Regex detects image requests, dedicated pipeline handles the rest.

2. **page_lock held only during DOM ops (~3-5s)** — NOT during search/download/caption. Reviewers caught that the original plan held lock for 10-30s, starving the poll loop. Search, download, and caption generation are all lock-free.

3. **SSRF: follow_redirects=False + validate each hop** — Reviewers caught that `follow_redirects=True` defeats the initial URL check. Manually handle 3xx responses, validate each `Location` header against the full private IP blocklist (including CGNAT, link-local, IPv6).

4. **Single `_navigate_to_chat_unlocked()` helper** — Reviewers caught that `send_image` would create a 4th copy of navigation logic (+ reentrant deadlock on page_lock). Extract one helper, all callers use it.

5. **No fake `input[type=file]` fallback** — Reviewers confirmed modern WhatsApp Web dynamically injects and removes file inputs. The fallback would always fail silently. If `expect_file_chooser` fails, return False immediately.

6. **Selectors scoped to preview container** — `media_caption_box` and `media_send_button` MUST be scoped to `[data-testid="media-editor"]` or equivalent. Unscoped selectors match the main compose box, sending caption as plain text.

7. **`_dismiss_attach_menu(page)` in finally block** — After any failure in `send_image`, press Escape to dismiss open menus/dialogs, verify compose box is focused. Without this, the page is stuck in a modal state and all subsequent sends fail.

8. **Pillow validation + JPEG conversion** — Reject images < 200x200px (thumbnails), < 10KB (error pages), skip stock photo domains. Convert all images to JPEG before sending (strips alpha, resize if >1920px).

9. **Prompt injection prevention** — Caption prompt uses f-string with pre-sanitized query (braces stripped), NOT `.format()`. Reviewers caught that `{guard}` in user queries would crash or expand the safety guard.

10. **try/finally cleanup on ALL paths** — `image_path.unlink(missing_ok=True)` in `finally` block of `_handle_image_request`. Startup sweep deletes files >1h old. Reviewers elevated /tmp leak from LOW to MEDIUM risk.

11. **Startup SearXNG probe** — Validate image search returns results on agent startup. Set `_image_search_available` flag. Without this, zero results is indistinguishable from "no matching images" (silent misconfiguration).

12. **CSS injection fix** — `_escape_css_string()` helper for `span[title="{chat_name}"]`. Fixes existing open TODO from MEMORY.md.
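
The per-hop check from decision 3 could look something like the sketch below, using only the standard library. The names `is_blocked_ip`, `is_safe_url`, `next_hop`, and `resolve_host` are hypothetical and not taken from the plan. The resolver is injected so the check can be tested without DNS. The sketch covers only the validation step; the download loop that issues requests with redirects disabled and caps the hop count is left out.

```python
import ipaddress
import socket
from typing import Callable, List, Optional
from urllib.parse import urljoin, urlsplit

# CGNAT (RFC 6598) is listed explicitly rather than relying on is_private.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def is_blocked_ip(raw: str) -> bool:
    """True if the address must never be fetched (or cannot be parsed)."""
    try:
        ip = ipaddress.ip_address(raw)
    except ValueError:
        return True  # fail closed on anything that is not a valid IP
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
        or (ip.version == 4 and ip in CGNAT)
    )


def resolve_host(host: str) -> List[str]:
    """Default resolver: every address the host name maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_url(url: str, resolve: Callable[[str], List[str]] = resolve_host) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addrs = resolve(parts.hostname)
    except OSError:
        return False
    # Every resolved address must be public, not just the first one.
    return bool(addrs) and not any(is_blocked_ip(a) for a in addrs)


def next_hop(current_url: str, location: str,
             resolve: Callable[[str], List[str]] = resolve_host) -> Optional[str]:
    """Resolve a Location header against the current URL; None if unsafe."""
    target = urljoin(current_url, location)
    return target if is_safe_url(target, resolve) else None
```

One caveat with this shape: the check resolves the host separately from the HTTP client, so a real implementation would likely also need to make sure the connection goes to an address that was validated.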

## Files to Modify

1. `services/annie-voice/searxng/settings.yml` — Add `brave.images` engine for image category
2. `services/whatsapp-agent/image_search.py` — **CREATE**: SearXNG image search + download + SSRF + Pillow validation (~150 lines)
3. `services/whatsapp-agent/wa_web.py` — 5 new selectors, `_navigate_to_chat_unlocked()`, `_escape_css_string()`
4. `services/whatsapp-agent/wa_sender.py` — `send_image()`, `_dismiss_attach_menu()`, refactor `send_message()` to use shared nav
5. `services/whatsapp-agent/trigger.py` — `detect_image_intent()` with false-positive exclusion
6. `services/whatsapp-agent/responder.py` — `generate_image_caption()` (max_tokens=50), system prompt update
7. `services/whatsapp-agent/agent.py` — `_handle_image_request()`, startup validation/cleanup, test endpoint
8. `services/whatsapp-agent/config.py` — Image constants with env overrides
9. `services/whatsapp-agent/wa_observability.py` — Image search/send events
10. `services/whatsapp-agent/tests/test_image_search.py` — **CREATE**: SSRF, validation, retry tests (~150 lines)
11. `services/whatsapp-agent/tests/test_image_intent.py` — **CREATE**: Intent detection tests (~100 lines)
12. `services/whatsapp-agent/tests/test_wa_sender.py` — Image attachment flow tests (+60 lines)

## Start Command

```bash
# Read the reviewed plan first
cat ~/.claude/plans/magical-knitting-sutherland.md

# Then implement in order:
# Phase 1: SearXNG settings.yml (brave.images engine)
# Phase 2: image_search.py + tests
# Phase 3: wa_web.py selectors + wa_sender.py send_image + tests
#   IMPORTANT: Run /v1/debug/attach-selectors on Panda FIRST to discover live selectors
# Phase 4: trigger.py + responder.py + agent.py wiring + tests
# Phase 5: observability
```

## Verification

1. `cd services/whatsapp-agent && python3 -m pytest tests/ -v` — all existing + new tests pass
2. SearXNG: `curl "http://192.168.68.52:8888/search?q=sunset&categories=images&format=json"` — non-empty `img_src` URLs
3. Deploy to Panda, restart agent
4. Send from Pixel: "Annie send me a picture of Skandagiri trek" → image appears with caption
5. Send "the picture frame is broken" → normal text response, NO image (false positive test)
6. Send "Annie send me a picture of xyznonexistent12345" → text fallback (no results)
7. `curl panda:8780/v1/health` → stats updated
8. Verify `/tmp/annie-images/` is empty after sends
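
Checks 4 to 6 exercise the intent detector, which could be sketched as below. The patterns are illustrative assumptions, not the regexes from the plan: a request needs a verb, an image noun, and a preposition before the subject, and a hypothetical negation pattern stands in for the false-positive exclusion.

```python
import re
from typing import Optional

# Assumed shape: verb + optional "me" + optional article + image noun + "of ...".
_IMAGE_REQUEST = re.compile(
    r"\b(?:send|show|find|get|share)\b(?:\s+me)?\s+(?:an?\s+|some\s+|the\s+)?"
    r"(?:pictures?|photos?|images?|pics?)\s+(?:of|from|for)\s+(?P<query>.+)",
    re.IGNORECASE,
)

# Hypothetical exclusion: negated requests such as "don't send me a picture of ...".
_NEGATED = re.compile(
    r"\b(?:don['’]?t|do\s+not|never)\s+(?:send|show|find|get|share)\b",
    re.IGNORECASE,
)


def detect_image_intent(text: str) -> Optional[str]:
    """Return the search query if the text asks for an image, else None."""
    if _NEGATED.search(text):
        return None
    match = _IMAGE_REQUEST.search(text)
    if match is None:
        return None
    query = match.group("query").strip(" \t\n?.!")
    return query or None
```

Under these patterns, the message from check 4 yields the query `Skandagiri trek`, and the one from check 5 yields `None` because it contains no request verb. The message from check 6 still yields a query; the text fallback there comes from the empty search result, not from the detector.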
