# Annie's Three-Tier Memory System

**How a personal AI builds lasting knowledge from fleeting conversations.**

---

## Why Three Tiers?

Human memory is not a single bucket. Cognitive science distinguishes at least three
kinds of remembering:

- **Episodic memory** -- vivid, timestamped recall of specific events ("I had coffee
  with Priya at the new place on MG Road, Tuesday at 3 PM").
- **Semantic memory** -- consolidated facts stripped of episodic context ("Rajesh
  likes coffee. Priya is his wife").
- **Procedural/pattern memory** -- deep knowledge about identity, personality, and
  behavioral patterns ("Rajesh is more creative in mornings. He values privacy
  deeply").

Most AI assistants treat memory as a flat key-value store or a single vector
database. Annie does not. Her Context Engine implements a three-tier system --
L0, L1, L2 -- that mirrors the episodic-to-semantic-to-pattern progression of
human memory. Entities are born in L0 when first mentioned, promoted to L1 once
validated across multiple conversations, and elevated to L2 when they become
deep, stable aspects of the user's identity.

This is not academic theory. It solves real problems:

1. **Recency bias.** Without tiers, a question like "What's Rajesh's favorite
   shoe brand?" returns twelve conversations sorted by date. With tiers, L1
   holds the consolidated answer ("Hoka"), and L0 holds the raw evidence.

2. **Contradiction handling.** People change their minds. L1 uses Graphiti's
   bi-temporal edges (`t_valid` / `t_invalid`) to track evolving truths without
   losing history.

3. **Noise vs. signal.** Not every conversation fragment deserves permanent
   storage. The nightly decay job lets ephemeral mentions fade naturally, while
   evergreen facts (people, places, relationships) persist with a guaranteed
   minimum salience.

The design draws from the Zep/Graphiti paper ("A Temporal Knowledge Graph
Architecture for Agent Memory," Rasmussen et al., 2025), the PulseHQ "living
garden" architecture, and open-source projects like Memorose (Rust, three-layer
L0/L1/L2) and CortexGraph (Ebbinghaus forgetting curve).

---

## Architecture Overview

```
                         Conversation via OMI Wearable
                                    |
                                    v
                      +-----------------------------+
                      |       Audio Pipeline        |
                      |  WhisperX + pyannote + SER  |
                      +-----------------------------+
                                    |
                              JSONL segments
                                    |
                                    v
                      +-----------------------------+
                      |       Context Engine        |
                      |  ingest.py -> extract.py    |
                      |   -> graphiti_client.py     |
                      +-----------------------------+
                                    |
                       Entity enters as L0 (episodic)
                                    |
        +---------------------------+---------------------------+
        |                           |                           |
        v                           v                           v
  +-----------+             +-----------+              +-----------+
  |    L0     |  promote    |    L1     |   promote    |    L2     |
  |  Episodic | =========>  |  Semantic | ==========>  |  Pattern  |
  |  <7 days  |  7d + 3ep   | 7-90 days |  90d + type  |  >90 days |
  +-----------+             +-----------+              +-----------+
                                  ^                         |
                                  |       demote            |
                                  +-------------------------+
                                   (contradicted by new
                                    evidence: t_invalid)
```

---

## L0: Episodic Memory (Recent, <7 days)

L0 is the raw, vivid layer. Every entity extracted from a conversation enters
L0 at full salience (1.0). It is the "what just happened" buffer.

### What it stores

- Full-detail entities with timestamps, speaker attribution, and evidence quotes
- Every entity type: person, topic, promise, emotion, event, place, decision,
  question, relationship
- The conversation context that produced each entity (evidence field)
- Sensitivity classification: open, private, or sensitive

### How entities enter

When the audio pipeline writes JSONL segments, the Context Engine's filesystem
watcher (`watcher.py`) detects changes. The ingest pipeline reads new segments,
and `extract.py` sends them to the extraction LLM (Qwen3.5-35B-A3B via Ollama)
with this prompt structure:

```
Extract structured entities from this conversation transcript.

For each entity, provide:
- type: one of [person, topic, promise, emotion, event, place, decision,
        question, relationship]
- name: short canonical name
- confidence: 0.0-1.0
- properties: relevant key-value pairs
- evidence: the exact quote that supports this entity
- sensitivity: "open", "private", or "sensitive"
```

Extracted entities pass through two filters before storage:

1. **Per-type confidence gating** -- promises and decisions require 0.7+
   confidence; topics and emotions only 0.5. From `config.py`:

   ```python
   CONFIDENCE_THRESHOLDS = {
       "person": 0.6,
       "topic": 0.5,
       "promise": 0.7,     # false promises are harmful
       "emotion": 0.5,
       "event": 0.6,
       "place": 0.6,
       "decision": 0.7,    # high threshold
       "question": 0.5,
       "relationship": 0.6,
   }
   ```

2. **Deduplication** -- keyed on `(entity_type, name)`, case-insensitive. When
   duplicates exist, the highest-confidence instance wins. Critically,
   `Topic:Google != Person:Google` -- different types are never merged.
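Together, the two filters can be sketched as a single pass. This is a simplified illustration, not the actual `extract.py` code: the entity dicts mirror the prompt fields above, and the thresholds copy `config.py`.

```python
# Sketch of the two post-extraction filters: per-type confidence gating,
# then (type, name) dedup keeping the highest-confidence instance.
CONFIDENCE_THRESHOLDS = {"person": 0.6, "topic": 0.5, "promise": 0.7,
                         "emotion": 0.5, "event": 0.6, "place": 0.6,
                         "decision": 0.7, "question": 0.5, "relationship": 0.6}
DEFAULT_CONFIDENCE_THRESHOLD = 0.5

def filter_entities(raw: list[dict]) -> list[dict]:
    # 1. Confidence gating: drop entities below their per-type threshold.
    gated = [e for e in raw
             if e["confidence"] >= CONFIDENCE_THRESHOLDS.get(
                 e["type"], DEFAULT_CONFIDENCE_THRESHOLD)]
    # 2. Dedup keyed on (type, lowercased name); highest confidence wins.
    #    Topic:Google and Person:Google produce different keys, so they
    #    are never merged.
    best: dict[tuple[str, str], dict] = {}
    for e in gated:
        key = (e["type"], e["name"].lower())
        if key not in best or e["confidence"] > best[key]["confidence"]:
            best[key] = e
    return list(best.values())
```

For example, a 0.6-confidence promise is gated out (threshold 0.7), while "Coffee" at 0.7 and "coffee" at 0.55 collapse to the single higher-confidence instance.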

### Retention rules

- Entities remain in L0 for up to 7 days
- Subject to nightly temporal decay (see Decay section below)
- Non-evergreen entities with very low salience (<0.1) are flagged as
  `low_salience` in dashboard stats
- If an entity accumulates 3+ episode mentions within 7 days, it becomes
  eligible for early promotion consideration at the next nightly job

### What L0 feels like

Annie at L0 can answer: "What did you talk about with Priya yesterday?" She
has the raw material -- timestamps, speakers, emotional context from SER
(speech emotion recognition). But she has not yet decided what is *fact* versus
what was a passing mention.

---

## L1: Semantic Memory (Consolidated, 7-90 days)

L1 is the fact layer. Entities that survive here have been mentioned across
multiple conversations and cross-validated. This is where Annie builds her
understanding of who you are and what you care about.

### Promotion criteria (L0 to L1)

An entity moves from L0 to L1 when ALL of the following are true:

1. **Age >= 7 days** -- the entity has existed long enough to prove it is not
   ephemeral noise
2. **Episode count >= 3** -- mentioned in at least three separate conversation
   sessions
3. **Not contradicted** -- the entity's `contradicted` flag is `False`

From `memory_tiers.py`:

```python
L0_TO_L1_AGE_DAYS = 7
L0_TO_L1_MIN_EPISODES = 3

def classify_tier(entity: MemoryEntity) -> MemoryTier:
    now = time.time()
    age_days = (now - entity.first_seen) / 86400

    # Contradicted entities: demote L2->L1, block any promotion
    if entity.contradicted:
        if entity.tier == MemoryTier.L2:
            return MemoryTier.L1
        return entity.tier

    # L1 promotion: old enough + mentioned in multiple episodes
    if (age_days >= L0_TO_L1_AGE_DAYS
            and entity.episode_count >= L0_TO_L1_MIN_EPISODES):
        return MemoryTier.L1

    return MemoryTier.L0
```

### What changes at L1

- Entities gain **validated** status -- they are no longer raw observations but
  cross-referenced facts
- **Contradiction detection** activates: Graphiti's bi-temporal edges track
  when facts change. If Rajesh says "I love Nike shoes" in week 1 and "these
  Hokas are incredible" in week 3, Graphiti sets `t_invalid` on the Nike edge
  and creates a new Hoka edge with its own `t_valid`. Both facts remain in the
  graph, but queries resolve to the current truth.
- L1 entities participate in **hybrid retrieval**: cuVS vector similarity (GPU,
  0.17ms) + BM25 keyword search + Graphiti graph traversal, fused via
  Reciprocal Rank Fusion (RRF, k=60)
- Salience continues to decay nightly, but L1 entities with enough episodes tend
  to get "refreshed" by new mentions
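As an illustration of the fusion step, a minimal RRF implementation over best-first ranked ID lists might look like the sketch below. The document IDs are made up; the real rankers return scored results from cuVS, BM25, and Graphiti.

```python
# Minimal Reciprocal Rank Fusion (k=60). Each input list is an ordering of
# document IDs, best first; RRF rewards documents that rank well in
# multiple lists without needing comparable raw scores.
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# rrf_fuse([["hoka", "nike"], ["nike", "hoka"], ["hoka"]])  # -> ["hoka", "nike"]
```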

### What L1 feels like

Annie at L1 can answer: "What's Rajesh's favorite shoe brand?" with confidence.
She knows it is Hoka (was Nike, invalidated three weeks ago). She can track
promises: "You said you'd call your mom this weekend -- did you?" She has facts,
not just fragments.

### All entity types are eligible for L1

Every type (person, topic, promise, emotion, event, place, decision, question,
relationship) can reach L1 if it meets the age and episode thresholds.

---

## L2: Community/Pattern Memory (Deep Knowledge, >90 days)

L2 is the personality layer. These are not individual facts but deep patterns
about who the user is as a person. They emerge from cuGraph community detection
running over months of accumulated L1 entities.

### Promotion criteria (L1 to L2)

An entity reaches L2 when ALL of the following are true:

1. **Age >= 90 days** -- the entity has persisted for at least three months
2. **Entity type is L2-eligible** -- only certain types represent deep identity:
   ```python
   L2_ELIGIBLE_TYPES = {"person", "place", "relationship", "habit", "emotion"}
   ```
3. **Episode count >= 3** -- still requires cross-validation
4. **Not contradicted** -- no unresolved contradictions
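Folding these checks into the `classify_tier()` logic shown earlier gives the following sketch. `MemoryEntity` is reduced here to just the fields the function reads; the real class carries more.

```python
import time
from dataclasses import dataclass
from enum import Enum

class MemoryTier(Enum):
    L0 = "L0"
    L1 = "L1"
    L2 = "L2"

@dataclass
class MemoryEntity:
    """Reduced to the fields classify_tier() reads."""
    entity_type: str
    first_seen: float          # unix timestamp of first mention
    episode_count: int
    tier: MemoryTier = MemoryTier.L0
    contradicted: bool = False

L0_TO_L1_AGE_DAYS = 7
L0_TO_L1_MIN_EPISODES = 3
L1_TO_L2_AGE_DAYS = 90
L2_ELIGIBLE_TYPES = {"person", "place", "relationship", "habit", "emotion"}

def classify_tier(entity: MemoryEntity) -> MemoryTier:
    age_days = (time.time() - entity.first_seen) / 86400

    # Contradicted: demote L2 -> L1, otherwise freeze at the current tier.
    if entity.contradicted:
        return MemoryTier.L1 if entity.tier == MemoryTier.L2 else entity.tier

    # L2: old enough, cross-validated, and an identity-bearing type.
    if (age_days >= L1_TO_L2_AGE_DAYS
            and entity.episode_count >= L0_TO_L1_MIN_EPISODES
            and entity.entity_type in L2_ELIGIBLE_TYPES):
        return MemoryTier.L2

    # L1: old enough and mentioned across several episodes.
    if (age_days >= L0_TO_L1_AGE_DAYS
            and entity.episode_count >= L0_TO_L1_MIN_EPISODES):
        return MemoryTier.L1

    return MemoryTier.L0
```

With this sketch, a 91-day-old `place` with 4 episodes classifies as L2, while a `topic` of the same age and episode count stays at L1.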

### Why only these types?

L2 represents *who the user is*, not *what the user discussed*. Topics, events,
decisions, questions, and promises are important but situational. They belong in
L1 as searchable facts. The types eligible for L2 reflect enduring aspects of
personhood:

| Type | Why L2-eligible | Example |
|------|----------------|---------|
| person | Lasting relationships define identity | "Priya (wife), Amma (mother)" |
| place | Home, workplace, meaningful locations | "Lives in Bangalore" |
| relationship | Social fabric | "Close to his sister, mentors junior devs" |
| habit | Behavioral patterns | "Codes best in mornings, procrastinates on admin" |
| emotion | Emotional tendencies | "Gets energized by new tech, anxious about deadlines" |

### What L2 looks like

L2 entities represent deep truths surfaced by community detection (cuGraph
PageRank + clustering):

> "Rajesh is more creative in mornings. He values privacy deeply. He tends to
> procrastinate on admin tasks. His relationship with his parents is important
> to him."

These are not things Rajesh said in a single conversation. They are patterns
Annie inferred across hundreds of conversation segments over months.

---

## The Demotion Path: L2 to L1

Memory is not monotonic. People change. A deeply held belief may be contradicted
by new evidence, and the system must handle this gracefully.

### How demotion works

When Graphiti detects a contradiction (via its dedicated contradiction detection
LLM -- Qwen3 32B, configured as a specialist in `config.py`), it:

1. Sets `t_invalid` on the old edge (event timeline)
2. Sets `t'_expired` on the old edge (transactional timeline -- when the system
   learned the fact was no longer true)
3. Creates a new edge with `t_valid` set to the current time
4. Marks the old entity's `contradicted` flag as `True`

The next nightly consolidation job calls `classify_tier()` on this entity. The
logic is explicit:

```python
# Contradicted entities: demote L2->L1, block any promotion
if entity.contradicted:
    if entity.tier == MemoryTier.L2:
        return MemoryTier.L1
    return entity.tier  # stay at current tier until contradiction resolved
```

Key behaviors:
- **L2 + contradicted = demote to L1** for re-evaluation
- **L1 + contradicted = stay at L1** (no further demotion -- the fact needs
  resolution, not erasure)
- **L0 + contradicted = stay at L0** (too young to have been promoted; the
  contradiction is noted but the entity keeps its current tier)
- **Contradicted entities cannot be promoted** regardless of age or episode
  count. The flag must be cleared first.

### The four timestamps (Zep/Graphiti bi-temporal model)

Every edge in the knowledge graph carries four timestamps:

| Timestamp | Timeline | Meaning |
|-----------|----------|---------|
| `t_valid` | Event (T) | When the fact became true in the real world |
| `t_invalid` | Event (T) | When the fact stopped being true in the real world |
| `t'_created` | Transaction (T') | When the system first learned this fact |
| `t'_expired` | Transaction (T') | When the system learned the fact was invalidated |

This bi-temporal model means Annie never loses history. She can answer "What did
Rajesh think about Nike?" with full temporal context, even after the preference
changed to Hoka.
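The resolution rule can be illustrated with a small sketch. The edge dicts below are a deliberate simplification, not Graphiti's actual schema: a fact holds at time `t` when `t_valid <= t` and `t_invalid` is unset or still in the future.

```python
# Point-in-time resolution over bi-temporal edges (event timeline only).
def facts_at(edges: list[dict], t: float) -> list[dict]:
    return [e for e in edges
            if e["t_valid"] <= t
            and (e["t_invalid"] is None or t < e["t_invalid"])]

edges = [
    {"fact": "Rajesh likes Nike", "t_valid": 0,  "t_invalid": 45},
    {"fact": "Rajesh likes Hoka", "t_valid": 45, "t_invalid": None},
]
# facts_at(edges, 10)  -> the Nike edge only
# facts_at(edges, 100) -> the Hoka edge only
```

Both edges stay in the graph forever; only the query's reference time decides which one is "the current truth."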

---

## Temporal Decay

Not all memories are worth keeping at full strength. The nightly decay job
reduces salience on entities that have not been mentioned recently, letting
unimportant observations fade naturally while protecting permanent facts.

### The formula

```
decay = 2^(-age_days / half_life)
```

Where:
- `age_days` = days since the entity was last mentioned (`last_seen`)
- `half_life` = 30 days (`TEMPORAL_DECAY_HALF_LIFE_DAYS` in `config.py`)

This means:
| Days since last mention | Decay factor | Salience |
|------------------------|--------------|----------|
| 0 | 1.000 | 100% |
| 7 | 0.851 | 85% |
| 15 | 0.707 | 71% |
| 30 | 0.500 | 50% |
| 60 | 0.250 | 25% |
| 90 | 0.125 | 12.5% |
| 120 | 0.063 | 6.3% |
| 365 | 0.0002 | ~0% |
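The table can be reproduced directly from the formula, assuming the 30-day half-life from `config.py`:

```python
import math

HALF_LIFE_DAYS = 30  # TEMPORAL_DECAY_HALF_LIFE_DAYS

def decay(age_days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    # decay = 2^(-age_days / half_life)
    return math.pow(2, -age_days / half_life)

for d in (0, 7, 15, 30, 60, 90, 120, 365):
    print(f"{d:>3} days -> {decay(d):.4f}")
```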

### Evergreen exceptions

Some entity types should never fully disappear. You should always be able to
find "Priya is Rajesh's wife" regardless of when it was last mentioned. These
are the **evergreen types**:

```python
EVERGREEN_TYPES = {"person", "place", "relationship"}
EVERGREEN_DECAY_FLOOR = 0.3
```

For evergreen entities, the decay formula becomes:

```python
def compute_decay(entity: MemoryEntity) -> float:
    now = time.time()
    age_seconds = max(0, now - entity.last_seen)
    half_life_seconds = TEMPORAL_DECAY_HALF_LIFE_DAYS * 86400

    raw_decay = math.pow(2, -age_seconds / half_life_seconds)

    if entity.entity_type in EVERGREEN_TYPES:
        return max(EVERGREEN_DECAY_FLOOR, raw_decay)

    return raw_decay
```

The `max(0.3, ...)` floor means a person entity one year old still has 30%
salience -- dimmed but always findable. A topic entity from the same period
would be at roughly 0.02% -- effectively invisible unless explicitly searched for.
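A quick comparison makes the floor concrete. This sketch simplifies `compute_decay()` to take age in days directly rather than a `MemoryEntity`:

```python
import math

EVERGREEN_TYPES = {"person", "place", "relationship"}
EVERGREEN_DECAY_FLOOR = 0.3
HALF_LIFE_DAYS = 30

def decay_for(entity_type: str, age_days: float) -> float:
    raw = math.pow(2, -age_days / HALF_LIFE_DAYS)
    # Evergreen types never drop below the 0.3 floor.
    if entity_type in EVERGREEN_TYPES:
        return max(EVERGREEN_DECAY_FLOOR, raw)
    return raw

decay_for("person", 365)  # -> 0.3 (floor holds)
decay_for("topic", 365)   # -> ~0.0002 (effectively invisible)
```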

### Why these types are evergreen

- **person** -- "Who is Priya?" should always return a result
- **place** -- "Where does Rajesh live?" is permanently relevant
- **relationship** -- "Who is Rajesh's wife?" is a core identity fact

Topics, emotions, events, promises, decisions, and questions are all
situational. A promise from six months ago that was never fulfilled is
legitimately low-priority. An emotion from a year-old conversation is
historical, not current.

---

## The Nightly Consolidation Job (3 AM)

Every night at 3 AM, the Context Engine runs a batch job that processes every
entity in the system. The job does three things in sequence:

1. **Compute decay** -- calculate fresh salience for each entity based on
   time since `last_seen`
2. **Re-classify tiers** -- check if any entity should be promoted or demoted
   based on its current age, episode count, type, and contradiction status
3. **Log tier changes** -- record promotions and demotions for observability

From `memory_tiers.py`:

```python
def apply_nightly_decay(entities: list[MemoryEntity]) -> list[MemoryEntity]:
    updated = []
    promotions = 0
    demotions = 0

    for entity in entities:
        old_tier = entity.tier

        # Compute decay
        decay_factor = compute_decay(entity)
        entity.salience = decay_factor

        # Re-classify tier
        new_tier = classify_tier(entity)
        if new_tier != old_tier:
            entity.tier = new_tier
            if _tier_rank(new_tier) > _tier_rank(old_tier):
                promotions += 1
            else:
                demotions += 1

        updated.append(entity)

    if promotions or demotions:
        logger.info(
            "Nightly decay: %d entities, %d promoted, %d demoted",
            len(entities), promotions, demotions,
        )

    return updated
```

### Dashboard stats

The `get_tier_stats()` function provides real-time counts for the Aquarium
dashboard's memory zone:

```python
def get_tier_stats(entities: list[MemoryEntity]) -> dict:
    stats = {"L0": 0, "L1": 0, "L2": 0, "low_salience": 0}
    for e in entities:
        stats[e.tier.value] = stats.get(e.tier.value, 0) + 1
        if e.salience < 0.1:
            stats["low_salience"] += 1
    return stats
```

Example output after 120 days of use:
```json
{"L0": 47, "L1": 186, "L2": 34, "low_salience": 14}
```

---

## Real-World Example: Tracking "Coffee" from Day 0 to Day 120

Here is how a single entity -- the topic "coffee" -- moves through the three
tiers over four months.

### Day 0: First mention

Rajesh tells a colleague: "I found this great new coffee shop near Koramangala."

- **Extraction:** LLM extracts `{type: "topic", name: "coffee", confidence: 0.7,
  evidence: "great new coffee shop near Koramangala"}`
- **Also extracted:** `{type: "place", name: "Koramangala", confidence: 0.8}`
- **Tier:** L0 (brand new, episode_count = 1)
- **Salience:** 1.0

### Day 3: Second mention

Rajesh mentions to Priya: "Should we try that coffee place this weekend?"

- **Episode count:** now 2
- **Tier:** still L0 (needs 3 episodes, only has 2)
- **Salience:** 0.93 (3 days of decay: `2^(-3/30) = 0.933`)

### Day 8: Third mention + L1 promotion

Rajesh on a call: "The pour-over at that Koramangala cafe is seriously good."

- **Episode count:** now 3
- **Age:** 8 days (> 7-day threshold)
- **Nightly job runs:** `classify_tier()` returns L1
- **Tier: PROMOTED to L1**
- **Salience:** 0.83 (8 days: `2^(-8/30) = 0.831`)
- **Graph edge created:** `Rajesh --[likes]--> coffee` with `t_valid = Day 0`

### Day 35: Routine mention

Rajesh: "Grabbed coffee on the way to work."

- **Episode count:** now 7
- **Tier:** L1 (stable)
- **Salience refreshed:** resets decay clock (`last_seen` updated to Day 35)
- **New salience:** 1.0 (just mentioned)

### Day 62: No mentions for a month

Coffee has not come up since Day 35.

- **Age since last mention:** 27 days
- **Salience:** 0.54 (`2^(-27/30) = 0.536`)
- **Tier:** L1 (still valid -- age and episodes qualify)

### Day 91: L2 consideration

Coffee is now 91 days old. But `topic` is NOT in `L2_ELIGIBLE_TYPES`.

- **Tier:** stays at L1
- **Salience:** 0.27 (`2^(-56/30) = 0.274`, 56 days since Day 35)
- The topic "coffee" will never reach L2. It is a *topic*, not a *person*,
  *place*, *relationship*, *habit*, or *emotion*.

Meanwhile, "Koramangala" (type: `place`, also extracted on Day 0) IS L2-eligible
and HAS been mentioned in 3+ episodes:

- **"Koramangala" tier: PROMOTED to L2** (age 91 days, type `place`, 4 episodes)
- **"Koramangala" salience:** 0.3 (evergreen floor -- place type never drops
  below 0.3)

### Day 95: Contradiction

Rajesh: "Actually, I've been switching to tea. Coffee was giving me headaches."

- **Graphiti contradiction detection:** `Rajesh --[likes]--> coffee` gets
  `t_invalid = Day 95`
- **New edge:** `Rajesh --[avoids]--> coffee` with `t_valid = Day 95`
- **New entity:** `{type: "topic", name: "tea", confidence: 0.8}` enters L0
- **Coffee's `contradicted` flag:** set to `True`
- **Coffee's tier:** stays at L1 (contradiction blocks further promotion, but
  L1 + contradicted = stay L1)

### Day 120: Final state

| Entity | Type | Tier | Salience | Status |
|--------|------|------|----------|--------|
| coffee | topic | L1 | 0.14 | contradicted, fading |
| tea | topic | L0 | 0.65 | 3 episodes, nearing L1 |
| Koramangala | place | L2 | 0.30 | evergreen floor |
| coffee shop | place | L1 | 0.30 | evergreen floor |

Annie can now answer:
- "Do I like coffee?" -- "You used to, but you switched to tea around Day 95
  because of headaches."
- "Where's that cafe?" -- "Koramangala" (L2, always findable)
- "What am I drinking lately?" -- "Tea" (L0, high salience, recent)
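The salience figures in this walkthrough follow directly from the decay formula. A few spot-checks, remembering that `last_seen` resets on each mention:

```python
def salience(days_since_last_seen: float, half_life: float = 30) -> float:
    return 2 ** (-days_since_last_seen / half_life)

# Spot-checks against the coffee walkthrough:
round(salience(3), 3)        # Day 3,  last seen Day 0  -> 0.933
round(salience(8), 3)        # Day 8,  last seen Day 0  -> 0.831
round(salience(62 - 35), 3)  # Day 62, last seen Day 35 -> 0.536
round(salience(91 - 35), 3)  # Day 91, last seen Day 35 -> 0.274
```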

---

## Entity Lifecycle Diagram

```
  CONVERSATION                 L0 (Episodic)              L1 (Semantic)              L2 (Pattern)
  ===========                  ============               =============              ============

  "I like Nike"  --------->  [Nike, topic]
                             salience: 1.0
                             episodes: 1
                                  |
  (5 more mentions)              |
                                  v
  Day 8, ep=3   --------->  [Nike, topic] ============>  [Nike, topic]
                             (age>7, ep>=3)              salience: 0.83
                                                         validated fact
                                                              |
  Day 45:                                                     |
  "These Hokas                                                |
   are amazing"  --------->  [Hoka, topic]               [Nike, topic]
                             salience: 1.0               contradicted=True
                             episodes: 1                 t_invalid set
                                                              |
                                                              | (stays L1,
                                                              |  cannot promote)
                                                              v
                                                         [Nike, topic]
                                                         salience: 0.04
                                                         fading...

  -----------------------------------------------------------------------

  "Rajesh" (person)
  mentioned in 50+          [Rajesh, person] ========>  [Rajesh, person] ========>  [Rajesh, person]
  episodes over              salience: 1.0              salience: 0.95             salience: 0.30+
  120+ days                  ep=1                       ep=50                      EVERGREEN
                                                        validated                  deep identity
```

---

## Summary Table

| Property | L0 (Episodic) | L1 (Semantic) | L2 (Pattern) |
|----------|---------------|---------------|--------------|
| **Age** | <7 days | 7-90 days | >90 days |
| **Min episodes** | 1 | 3 | 3 |
| **Eligible types** | all 9 types | all 9 types | person, place, relationship, habit, emotion |
| **Contains** | raw observations | validated facts | identity patterns |
| **Contradiction** | noted, stays L0 | noted, stays L1 | demoted to L1 |
| **Decay** | standard (30-day half-life) | standard | standard |
| **Evergreen floor** | 0.3 for person/place/relationship | same | same |
| **Example** | "Mentioned coffee today" | "Likes coffee" | "Morning person" |
| **Retrieval** | BM25 keyword | hybrid (vector + BM25 + graph) | hybrid |

---

## Test Coverage

The memory tier system has 30 tests across 6 test classes in
`services/context-engine/tests/test_memory_tiers.py`:

| Test Class | Tests | What it covers |
|------------|-------|----------------|
| `TestClassifyTier` | 8 | L0/L1/L2 classification, boundary conditions, type eligibility |
| `TestComputeDecay` | 5 | Decay formula, 30-day half-life, evergreen floor |
| `TestNightlyDecay` | 4 | Batch processing, salience updates, promotion during decay |
| `TestTierStats` | 3 | Per-tier counts, low-salience flagging, empty list |
| `TestTierConstants` | 6 | Threshold values, eligible types, floor value |
| `TestContradictedPromotion` | 4 | Demotion from L2, promotion blocking, non-contradicted sanity |

Key test examples:

- `test_contradicted_l2_demotes_to_l1` -- An L2 person entity with 50 episodes
  and 120 days of history is demoted to L1 when contradicted
- `test_30_day_old_half_decay` -- Verifies the decay factor is approximately
  0.5 after 30 days (within 0.4-0.6 tolerance)
- `test_evergreen_person_has_floor` -- A person entity 365 days old still has
  salience >= 0.3

---

## Configuration Reference

All tunable constants live in two files:

**`services/context-engine/config.py`:**

| Constant | Value | Purpose |
|----------|-------|---------|
| `TEMPORAL_DECAY_HALF_LIFE_DAYS` | 30 | Half-life for salience decay |
| `CONFIDENCE_THRESHOLDS` | per-type dict | Minimum confidence for entity extraction |
| `DEFAULT_CONFIDENCE_THRESHOLD` | 0.5 | Fallback for unknown entity types |
| `CONTRADICTION_LLM_MODEL` | `qwen3:32b` | Specialist model for contradiction detection |

**`services/context-engine/memory_tiers.py`:**

| Constant | Value | Purpose |
|----------|-------|---------|
| `L0_TO_L1_AGE_DAYS` | 7 | Minimum age for L1 promotion |
| `L0_TO_L1_MIN_EPISODES` | 3 | Minimum episode mentions for L1 |
| `L1_TO_L2_AGE_DAYS` | 90 | Minimum age for L2 promotion |
| `L2_ELIGIBLE_TYPES` | person, place, relationship, habit, emotion | Types that can reach L2 |
| `EVERGREEN_TYPES` | person, place, relationship | Types with decay floor |
| `EVERGREEN_DECAY_FLOOR` | 0.3 | Minimum salience for evergreen entities |

---

## What Comes Next

The memory tier system is implemented (`memory_tiers.py`, 182 lines, fully
tested). The remaining integration work is in Sprint M2 of the build plan:

1. **Wire `memory_tiers.py` into the nightly scheduler** -- currently the
   `apply_nightly_decay()` function exists but needs a scheduler (APScheduler
   or asyncio cron) to trigger it at 3 AM
2. **cuGraph community detection for L2** -- identify behavioral clusters
   across months of L1 entities, then promote cluster-representative entities
   to L2
3. **Dashboard visualization** -- the Aquarium dashboard already has type
   swim-lane columns and tier depth bands in `memoryZone.ts`; entities need to
   render as dots that move between tiers with promotion trail animations
4. **Contradiction pipeline** -- wire Graphiti's `t_invalid` events to the
   `contradicted` flag on `MemoryEntity`, triggering demotion at the next
   nightly job
