# Research: Context Management for Annie Voice

**Date:** 2026-03-04
**Status:** Research complete, not yet implemented

## Problem

Annie Voice conversations are ephemeral (not persisted) and her pre-session context is thin (2,000 chars, 10 snippets). Long conversations exhaust the context window. Cross-session memory relies on Context Engine search, but current conversations are never saved.

## Anthropic's Context Management Stack

### 1. Server-Side Compaction (Beta: `compact-2026-01-12`)

Auto-summarizes long conversations when input tokens exceed a configured threshold. The API generates a structured `<summary>` block and replaces the older message history with it.

**API usage:**
```python
betas=["compact-2026-01-12"],
context_management={
    "edits": [{
        "type": "compact_20260112",
        "trigger": {"type": "input_tokens", "value": 80000},
        "instructions": "Summarize: topics discussed, promises made, emotional state, open questions."
    }]
}
```

**Reported results (Anthropic):** up to 84% token reduction; 39% performance improvement when combined with the memory tool.

**Docs:** https://platform.claude.com/docs/en/build-with-claude/compaction

### 2. Context Editing (Beta: `context-management-2025-06-27`)

Surgical pruning of old tool results and thinking blocks.

- `clear_tool_uses_20250919` — clears oldest tool call/result pairs, keeps N most recent
- `clear_thinking_20251015` — clears extended thinking blocks

**Usage:**
```python
betas=["context-management-2025-06-27"],
context_management={
    "edits": [{
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": 50000},
        "keep": {"type": "tool_uses", "value": 5},
    }]
}
```

**Docs:** https://platform.claude.com/docs/en/build-with-claude/context-editing

### 3. Memory Tool (SDK: `memory_20250818`)

Built-in API tool type. Claude manages its own persistent file-based memory. Reads/writes `/memories/` directory across sessions. Claude decides what's worth remembering.

- Python SDK examples: https://github.com/anthropics/anthropic-sdk-python/blob/main/examples/memory/basic.py
- Subclass `BetaAbstractMemoryTool` to hook into custom storage
- License: MIT (SDK)

**Docs:** https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool
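A minimal client-side handler for memory tool commands might look like the sketch below. The command names (`view`, `create`, etc.) follow the memory tool docs; the `FileMemory` class and its method names are illustrative, not the SDK's API (the SDK's `BetaAbstractMemoryTool` is the supported hook):

```python
from pathlib import Path

class FileMemory:
    """Illustrative handler for memory tool commands backed by local files.
    Only `view` and `create` are shown; str_replace, insert, delete, and
    rename are omitted for brevity."""

    def __init__(self, root: str = "./memories"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, path: str) -> Path:
        # Claude addresses files as /memories/...; confine access to our root.
        rel = path.removeprefix("/memories").lstrip("/")
        p = (self.root / rel).resolve()
        if self.root.resolve() not in p.parents and p != self.root.resolve():
            raise ValueError(f"path escapes memory root: {path}")
        return p

    def execute(self, command: dict) -> str:
        name = command["command"]
        if name == "view":
            target = self._resolve(command["path"])
            if target.is_dir():
                return "\n".join(sorted(c.name for c in target.iterdir()))
            return target.read_text()
        if name == "create":
            target = self._resolve(command["path"])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(command["file_text"])
            return f"created {command['path']}"
        raise NotImplementedError(name)
```

The path-confinement check matters: Claude composes the paths itself, so the handler must refuse anything that resolves outside the memory root.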

### 4. MCP Knowledge Graph Memory Server

Open source (MIT) MCP server for entity/relation/observation storage in local JSON.

- **Repo:** https://github.com/modelcontextprotocol/servers/tree/main/src/memory
- Tools: `create_entities`, `create_relations`, `add_observations`, `search_nodes`

### 5. Contextual Retrieval (September 2024)

RAG technique: enrich document chunks with LLM-generated context before indexing. Reduces retrieval failures by 49% (67% with reranking).

- **Blog:** https://www.anthropic.com/news/contextual-retrieval
- **Cookbook:** https://github.com/anthropics/anthropic-cookbook
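The core move is simple: before embedding, ask an LLM to write a short sentence situating each chunk in the whole document, then prepend it. A sketch (the prompt paraphrases the blog post; `generate` is a placeholder for any LLM call):

```python
def contextualize_chunk(document: str, chunk: str, generate) -> str:
    """Prepend LLM-written situating context to a chunk before indexing.

    `generate` is any callable taking a prompt string and returning text;
    in practice it would call Claude with prompt caching so the full
    document is only paid for once per document, not once per chunk.
    """
    prompt = (
        "<document>\n" + document + "\n</document>\n"
        "Here is the chunk we want to situate within the whole document:\n"
        "<chunk>\n" + chunk + "\n</chunk>\n"
        "Give a short, succinct context to situate this chunk within the "
        "overall document for the purposes of improving search retrieval."
    )
    context = generate(prompt)
    # The contextualized text (not the raw chunk) is what gets embedded/indexed.
    return context + "\n\n" + chunk
```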

## Application to Annie Voice

| Annie's Gap | Solution | Sprint |
|------------|---------|--------|
| Conversations not saved | Transcript persistence (periodic context.messages snapshot) | Sprint 8 |
| Long conversations exhaust context | Server-side compaction | Sprint 8 (Layer 2) |
| Old tool results pile up | Context editing (clear_tool_uses) | Sprint 8 (Layer 2) |
| Thin pre-session briefing (2,000 chars) | Richer session summaries + Memory Tool | Sprint 9+ |
| Cross-session "what did we talk about" | Transcript persistence + compaction summaries | Sprint 8 |

## Architecture Recommendation

### Layer 1: Transcript Persistence (Sprint 8, planned)
- Periodic snapshot from `context.messages` → JSONL → Context Engine watcher → PostgreSQL
- No pipeline modification, zero latency impact
- Adversarial-reviewed plan at `.claude/plans/reactive-honking-quill.md`
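The snapshot step could be as small as the sketch below. The JSONL layout, field names, and `snapshot_transcript` helper are assumptions for illustration; the real plan lives in the file referenced above:

```python
import json
import time
from pathlib import Path

def snapshot_transcript(messages: list[dict], session_id: str,
                        out_dir: str = "./transcripts") -> Path:
    """Write the full in-memory conversation to one JSONL file per session.

    Each periodic snapshot rewrites the file with the complete history so
    far, so the Context Engine watcher (assumed) always ingests a
    consistent, self-contained transcript.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{session_id}.jsonl"
    with path.open("w") as f:
        for msg in messages:
            f.write(json.dumps({
                "ts": time.time(),
                "role": msg["role"],
                "content": msg["content"],
            }) + "\n")
    return path
```

Rewriting the whole file each time trades a few extra bytes of I/O for idempotence: a crashed or duplicated snapshot never leaves a partial append behind.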

### Layer 2: Compaction + Context Editing (Sprint 8, ~20 lines)
- Add `context_management` parameter to Annie's Claude API calls
- Compaction trigger at 80K tokens with Annie-specific instructions
- Tool result clearing at 50K tokens, keep 5 most recent
- Only applies when `LLM_BACKEND == "claude"` (Ollama doesn't support this)
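The bullets above roughly combine into one helper. Thresholds and beta names come from the earlier sections of this doc; reading `LLM_BACKEND` from the environment and the `annie_context_kwargs` name are assumptions about Annie's config:

```python
import os

COMPACTION_TRIGGER = 80_000   # tokens, per the plan above
TOOL_CLEAR_TRIGGER = 50_000

def annie_context_kwargs() -> dict:
    """Extra kwargs to splat into Annie's Claude API calls.

    Returns an empty dict for non-Claude backends (Ollama has no
    context_management support), so callers can always do
    `client.beta.messages.create(..., **annie_context_kwargs())`.
    """
    if os.environ.get("LLM_BACKEND", "claude") != "claude":
        return {}
    return {
        "betas": ["compact-2026-01-12", "context-management-2025-06-27"],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {"type": "input_tokens", "value": COMPACTION_TRIGGER},
                    "instructions": "Summarize: topics discussed, promises made, "
                                    "emotional state, open questions.",
                },
                {
                    "type": "clear_tool_uses_20250919",
                    "trigger": {"type": "input_tokens", "value": TOOL_CLEAR_TRIGGER},
                    "keep": {"type": "tool_uses", "value": 5},
                },
            ]
        },
    }
```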

### Layer 3: Persistent Memory Notes (Sprint 9+)
- Use Memory Tool pattern for Annie to maintain curated notes about Rajesh
- Not raw transcripts, but knowledge ("Rajesh prefers morning meetings")
- Could use MCP Memory Server or custom implementation backed by Context Engine

## Key Insight

The compaction feature is what Claude Code itself uses for long sessions. Applying it to Annie means she can have hour-long conversations without degrading. The custom `instructions` parameter preserves Annie-specific context (emotional state, promises, open questions).
