# Next Session: WhatsApp Silent Observer Agent + Gemma 4 128K

**Date:** 2026-04-06
**Status:** NEEDS REVALIDATION: the plan is designed, but the codebase may have changed since session 440

---

## What Happened

1. **WhatsApp installed and signed in** on Annie's Pixel 9a (session 440). A message was sent and delivered via u2.
2. **SDK decision**: ADB-only, no business API (`docs/RESEARCH-WHATSAPP-SDK.md`).
3. **Silent Observer Agent fully researched** (`docs/RESEARCH-WHATSAPP-AGENT.md`) — 3 corrections from Rajesh's review applied to research doc.
4. **Gemma 4 128K context** requested by Rajesh (currently limited to 32K in start.sh + compaction.py).
5. **Plan written** at `~/.claude/plans/flickering-churning-moth.md` but may be stale — other sessions have made changes since.

## Three Design Corrections (Rajesh's Review)

These are DECIDED, not open questions:

1. **No `memory.json`** — use existing Context Engine. CE is already channel-agnostic (JSONL ingestion + entity extraction + hybrid retrieval). Zero CE changes needed. WhatsApp agent writes segments with `device_id="annie-whatsapp"`.

2. **Tasker = PRIMARY ingestion**, not a "later optimization". It eliminates UI scraping (the most fragile layer), removes the need for the phone mutex when reading, is event-driven (instant vs. 10s polling), and relies on the notification API, which is stable. UI scraping = fallback only.

3. **128K context window** — Gemma 4 supports it natively. vLLM `--max-model-len` in start.sh needs bump from 32768 to 131072. Compaction presets need updating. Context budget: 2K system + 4K CE briefing + 5K weekly digests + 20K raw messages + 3K hourly digests = 34K used, ~94K headroom.
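
The context budget in correction 3 can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that assumes "K" means 1024 tokens, so 128K equals the planned `--max-model-len` of 131072. The dictionary keys are illustrative names, not identifiers from the codebase.

```python
# Planned token budget for the 128K window (figures from this doc).
# Assumption: "K" = 1024 tokens, so 128K == 131072.
K = 1024
CONTEXT_WINDOW = 128 * K

# Key names are illustrative only; they do not come from compaction.py.
BUDGET_TOKENS = {
    "system_prompt": 2 * K,
    "ce_briefing": 4 * K,
    "weekly_digests": 5 * K,
    "raw_messages": 20 * K,
    "hourly_digests": 3 * K,
}


def headroom(window: int, budget: dict) -> int:
    """Return the tokens left after all planned allocations."""
    used = sum(budget.values())
    if used > window:
        raise ValueError(f"budget {used} exceeds window {window}")
    return window - used


used = sum(BUDGET_TOKENS.values())
print(f"used={used} headroom={headroom(CONTEXT_WINDOW, BUDGET_TOKENS)}")
# prints: used=34816 headroom=96256  (34K used, 94K headroom)
```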

## What Needs Revalidation

Before implementing, verify these against CURRENT codebase state:

1. **start.sh** — has the Gemma 4 vLLM config changed? Check `--max-model-len` value and `--gpu-memory-utilization`.
2. **compaction.py** — has the PRESETS dict changed? Any new models added?
3. **Tests** — have test_compaction.py and test_context_inspect.py changed?
4. **Context Engine** — any new ingestion patterns or API changes since session 440?
5. **phone_loop.py** — any changes to the phone daemon that affect mutex design?
6. **phone_ui.py** — any changes to u2 helpers?
7. **Tasker** — is it installed on the Pixel yet? (It wasn't as of session 440)
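
Check 1 can be scripted rather than eyeballed. The sketch below pulls the two flags out of a shell script's text. The `sample` string is a made-up stand-in for `start.sh` (the `0.85` value and the model name are placeholders, not the real config), so it is only a starting point for revalidation.

```python
import re

FLAGS_TO_CHECK = ("--max-model-len", "--gpu-memory-utilization")


def read_vllm_flags(script_text: str) -> dict:
    """Extract the numeric values of the flags this doc asks to re-check."""
    flags = {}
    for name in FLAGS_TO_CHECK:
        # Accept both "--flag value" and "--flag=value".
        match = re.search(rf"{re.escape(name)}[=\s]+([0-9.]+)", script_text)
        if match:
            flags[name] = match.group(1)
    return flags


# Hypothetical stand-in for start.sh; values other than 32768 are placeholders.
sample = """\
vllm serve some-model \\
  --max-model-len 32768 \\
  --gpu-memory-utilization 0.85
"""

flags = read_vllm_flags(sample)
print(flags)
```

Against the real file, the same function can be called as `read_vllm_flags(Path("start.sh").read_text())`. A missing key in the result means the flag is no longer set in the script and the plan needs updating.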

## Two-Part Plan

### Part A: Gemma 4 128K Context Upgrade

Files to update (REVALIDATE line numbers first):
- `start.sh` — `--max-model-len 32768` → `131072`
- `services/annie-voice/compaction.py` — gemma-4-26b preset `ctx_size=32768` → `131072`
- `services/annie-voice/tests/test_compaction.py` — assertions
- `services/annie-voice/tests/test_context_inspect.py` — assertions
- `docs/RESOURCE-REGISTRY.md` — VRAM budget (MANDATORY per CLAUDE.md)

### Part B: WhatsApp Silent Observer Agent

Architecture: standalone daemon on Panda with own 128K context, Tasker ingestion, 3-tier compaction, CE integration.
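
As a rough illustration of the CE integration, the agent could tag each segment with `device_id="annie-whatsapp"` and append it to a JSONL file. Only the `device_id` value comes from this doc. The other field names (`timestamp`, `speaker`, `text`) are hypothetical and must be checked against the schema in the existing audio-pipeline writer before use.

```python
import json
import time
from pathlib import Path
from typing import Optional

DEVICE_ID = "annie-whatsapp"  # value decided in the design corrections


def build_segment(sender: str, text: str, ts: Optional[float] = None) -> dict:
    """Build one CE segment. Field names besides device_id are assumptions."""
    return {
        "device_id": DEVICE_ID,
        "timestamp": time.time() if ts is None else ts,
        "speaker": sender,
        "text": text,
    }


def append_segment(path: Path, segment: dict) -> None:
    """Append the segment as a single line of JSON (JSONL format)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(segment, ensure_ascii=False) + "\n")
```

Appending one line per message keeps the writer compatible with any ingestion path that consumes JSONL incrementally, which is the property the "zero CE changes" decision relies on.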

New files (under `services/whatsapp-agent/`):
- `agent.py` — main daemon
- `tasker_receiver.py` — Tasker event handler
- `scraper.py` — u2 fallback
- `compaction.py` — 3-tier WhatsApp compaction
- `trigger.py` — regex + LLM gate
- `responder.py` — u2 reply delivery
- `context_client.py` — CE HTTP client (copy from telegram-bot)
- `jsonl_writer.py` — CE ingestion (copy from audio-pipeline)
- `config.py`, `mutex.py`, `tests/`
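
The "regex + LLM gate" in `trigger.py` could be read as a two-stage filter: a cheap regex prefilter, then an LLM check only on regex hits. That reading is an assumption, as are the pattern and the `llm_gate` callable below. The research doc may define the trigger differently.

```python
import re
from typing import Callable

# Hypothetical prefilter: the agent's name as a whole word, any case.
MENTION_RE = re.compile(r"\bannie\b", re.IGNORECASE)


def should_respond(message: str, llm_gate: Callable[[str], bool]) -> bool:
    """Stage 1: regex prefilter. Stage 2: LLM confirmation on regex hits only."""
    if not MENTION_RE.search(message):
        return False  # stay silent without spending an LLM call
    return llm_gate(message)


# Stub gate for illustration; a real gate would call the model.
def stub_gate(message: str) -> bool:
    return "?" in message


print(should_respond("lunch at 1?", stub_gate))                # False
print(should_respond("Annie, can you remind me?", stub_gate))  # True
print(should_respond("I told annie already", stub_gate))       # False
```

Ordering the stages this way keeps the observer silent by default and limits model calls to messages that plausibly address it.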

Implementation phases: (0) Gemma 4 128K → (1) Tasker + receiver → (2) Compaction + CE → (3) Trigger + response → (4) Integration → (5) Hardening
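
Phase 1 (Tasker + receiver) could start from a small payload validator like the one below. It assumes a Tasker profile posts one JSON object per notification with the keys `chat`, `text`, and `posted_at`. These keys are hypothetical, since Tasker was not yet installed as of session 440 and no payload format has been fixed.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("chat", "text", "posted_at")  # hypothetical payload keys


@dataclass(frozen=True)
class IncomingMessage:
    chat: str
    text: str
    posted_at: float


def parse_tasker_event(body: bytes) -> IncomingMessage:
    """Validate one JSON payload and convert it to an IncomingMessage."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [key for key in REQUIRED_FIELDS if key not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return IncomingMessage(
        chat=str(payload["chat"]),
        text=str(payload["text"]),
        posted_at=float(payload["posted_at"]),
    )


msg = parse_tasker_event(b'{"chat": "Family", "text": "hi", "posted_at": 1.5}')
print(msg)
```

Keeping the parser a pure function means it can be tested without a phone attached. The HTTP or socket layer in `tasker_receiver.py` would only need to pass it the request body.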

## Research Documents

- `docs/RESEARCH-WHATSAPP-SDK.md` — SDK options evaluated, ADB-only decided
- `docs/RESEARCH-WHATSAPP-AGENT.md` — Full architecture (REVISED with 3 corrections)
- `~/.claude/plans/flickering-churning-moth.md` — Detailed plan (may be stale)

## Start Command

```bash
# Revalidate plan:
# 1. Read this doc
# 2. Check git log for changes since session 440
# 3. Verify start.sh, compaction.py, phone_loop.py current state
# 4. Update plan, then implement Part A first (quick win), then Part B
```