# Research: Data Storage Strategy for her-os

**Date:** 2026-02-24
**Context:** Phase 0 — answering 6 open questions about data storage, hardware requirements, portability, multi-user, and business model for a self-hosted personal AI platform on NVIDIA DGX Spark.

**Prerequisites:** ADR-017 (Docker Deployment) decided. Phase 0 validation complete (20/21 technologies). All benchmarks passed.

---

## Table of Contents

1. [Minimum Hardware (No GPU)](#1-minimum-hardware-no-gpu)
2. [Disk Usage Projections](#2-disk-usage-projections)
3. [Multi-Device Convergence](#3-multi-device-convergence)
4. [Data Portability](#4-data-portability)
5. [Household Mode](#5-household-mode)
6. [Revenue Model](#6-revenue-model)
7. [Summary of Recommendations](#7-summary-of-recommendations)
8. [Sources](#8-sources)

---

## 1. Minimum Hardware (No GPU)

### The Question

Can her-os run without a DGX Spark? What about a regular laptop with no NVIDIA GPU?

### Component-by-Component Degradation Analysis

Every component in the her-os stack was validated on DGX Spark (aarch64, GB10 Blackwell, 128 GB unified). The question is what happens when you remove the GPU from the equation.

| Component | GPU Mode (DGX Spark) | CPU Fallback | Latency Multiplier | Viable? |
|-----------|---------------------|-------------|---------------------|---------|
| **Qwen3-Embedding-0.6B** | 69ms (1.1 GB VRAM) | ~350-500ms (CPU, FP32) | 5-7x slower | Yes, but degrades real-time feel |
| **Qwen3-Embedding-8B** | 342ms (14.1 GB VRAM) | ~4-8s (CPU, FP32, needs 16+ GB RAM) | 12-23x slower | Batch only, unusable real-time |
| **Whisper large-v3 STT** | 483ms/5s audio (8.75 GB VRAM) | ~8-15s/5s audio (CPU, faster-whisper) | 17-31x slower | Yes, but laggy. Use smaller model. |
| **GLiNER NER** | 132ms (CPU, validated) | 132ms (already CPU) | 1x (no change) | Yes, fully viable |
| **Kokoro-82M TTS** | 30ms short text (GPU) | ~50ms short text (CPU) | 1.7x slower | Yes, fully viable |
| **cuVS vector search** | 0.25ms/query (GPU) | N/A — hard GPU dependency | N/A | **No. Use FAISS-CPU.** |
| **FAISS-CPU vector search** | 0.81ms/query (validated) | 0.81ms/query | 1x (already CPU) | Yes, drop-in replacement |
| **cuGraph analytics** | 1.0ms PageRank (GPU) | N/A — hard GPU dependency | N/A | **No. Use NetworkX.** |
| **NetworkX analytics** | Not benchmarked | ~50-200ms PageRank (10K nodes, CPU) | 50-200x slower | Yes, but slower. Adequate for small graphs. |
| **PostgreSQL** | 0.44ms/query (Docker) | Same | 1x | Yes |
| **Redis** | 0.034ms/GET (Docker) | Same | 1x | Yes |
| **Neo4j** | 1.9ms/lookup (Docker) | Same | 1x | Yes |
| **Qdrant** | 5.71ms/search (Docker) | Same | 1x | Yes |
| **Claude API** | 2.8s (network) | Same | 1x | Yes (the LLM runs in the cloud) |

### Hard Blockers vs. Graceful Degradation

**Hard blockers (require code changes, not just slower):**
- **cuVS** — no CPU mode. Replace with FAISS-CPU (validated at 0.81ms, only 3.2x slower).
- **cuGraph** — no CPU mode. Replace with NetworkX (standard Python graph library). Slower, but adequate for personal-scale graphs (<100K nodes).

**Graceful degradation (works, just slower):**
- Embedding: Use 0.6B model only (skip 8B entirely). On CPU, expect ~350-500ms per query instead of 69ms. Still usable but noticeably laggy.
- STT: Use `faster-whisper` with the `small` or `medium` model on CPU instead of `large-v3` on GPU. Expect 2-5s per 30s of audio, versus roughly 3s per 30s for `large-v3` on GPU (483ms per 5s clip, per the benchmark above). Transcription still works, just delayed.
- TTS: Kokoro CPU is 1.7x slower than GPU — barely noticeable (50ms vs. 30ms for short text).
- NER: Already CPU. No change.

### CPU-Only Mode: What It Looks Like

| Metric | GPU Mode (DGX Spark) | CPU-Only Mode | Target |
|--------|---------------------|---------------|--------|
| Context retrieval pipeline | **71.9ms** | ~360-520ms | <100ms (fails) |
| Voice pipeline (STT+TTS, excl. LLM) | **513ms** | ~3-8s | <900ms (fails) |
| Entity extraction (NER) | **132ms** | **132ms** | <500ms (passes) |
| Memory search (vector) | **0.25ms** (cuVS) | **0.81ms** (FAISS-CPU) | <10ms (passes) |
| Graph traversal | **2.3ms** (cuGraph) | ~100-200ms (NetworkX) | <50ms (fails on large graphs) |
| Daily reflection generation | ~3-5s (Claude API) | ~3-5s (Claude API) | N/A (batch, not real-time) |

**The minimum usable experience on CPU-only:**
- Memory search works (FAISS-CPU + Qdrant are fine).
- Entity extraction works (GLiNER is CPU-native).
- Daily reflection/morning debrief works (Claude API is cloud, no GPU needed).
- Voice feels laggy (3-8s delay) but functional.
- Real-time context retrieval misses the <100ms target but is usable at ~400-500ms.
- Graph analytics are slow but functional for small personal graphs.

### Minimum Hardware Specifications

| Tier | Hardware | RAM | Storage | Experience |
|------|----------|-----|---------|------------|
| **Tier 1: DGX Spark** (reference) | GB10 Blackwell, 6144 CUDA cores | 128 GB unified | 3.7 TB | Full experience. All benchmarks pass. |
| **Tier 2: NVIDIA Desktop GPU** | RTX 4060 or better (8+ GB VRAM) | 32 GB RAM | 256 GB SSD | Near-full experience. Whisper/embedding on GPU. cuVS/cuGraph need x86 CUDA. |
| **Tier 3: Apple Silicon** | M2 Pro or better | 32 GB unified | 256 GB SSD | Good experience. PyTorch MPS backend for embedding/STT. No cuVS/cuGraph (use FAISS-CPU + NetworkX). |
| **Tier 4: CPU-Only** | Any modern x86-64 (8+ cores) | 16 GB RAM minimum, 32 GB recommended | 128 GB SSD | Degraded but usable. Memory search and daily reflection work. Voice is laggy. Skip 8B model. |
| **Tier 5: Minimum viable** | 4-core x86-64, no GPU | 8 GB RAM | 64 GB SSD | Bare minimum. Use `tiny` or `small` Whisper. 0.6B embedding. No graph analytics. Very slow voice. |

### Recommendation

**Implement a `HER_OS_MODE` environment variable:**

```
HER_OS_MODE=full       # DGX Spark — all GPU features
HER_OS_MODE=gpu        # Generic NVIDIA GPU — embedding + STT on GPU, FAISS-CPU for vectors
HER_OS_MODE=apple      # Apple Silicon — MPS backend, FAISS-CPU, NetworkX
HER_OS_MODE=cpu        # CPU-only — everything on CPU, smaller models
```

The mode selects:
- Which embedding model to load (8B+0.6B vs. 0.6B-only vs. smaller)
- Which vector search backend (cuVS vs. FAISS-CPU)
- Which graph analytics backend (cuGraph vs. NetworkX)
- Which STT model/size (large-v3 GPU vs. medium CPU vs. small CPU)
- Whether to pre-load models or load on-demand
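
A minimal dispatch sketch for this mode switch (the component names and model identifiers here are illustrative assumptions, not the real config schema):

```python
# Hypothetical sketch: map HER_OS_MODE to the backend choices described above.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class BackendConfig:
    embedding_models: tuple   # which embedding models to load
    vector_backend: str       # "cuvs" or "faiss-cpu"
    graph_backend: str        # "cugraph" or "networkx"
    stt_model: str            # Whisper model size
    preload_models: bool      # pre-load at startup vs. load on demand

MODES = {
    "full":  BackendConfig(("qwen3-8b", "qwen3-0.6b"), "cuvs", "cugraph", "large-v3", True),
    "gpu":   BackendConfig(("qwen3-0.6b",), "faiss-cpu", "networkx", "large-v3", True),
    "apple": BackendConfig(("qwen3-0.6b",), "faiss-cpu", "networkx", "medium", True),
    "cpu":   BackendConfig(("qwen3-0.6b",), "faiss-cpu", "networkx", "medium", False),
}

def load_config() -> BackendConfig:
    mode = os.environ.get("HER_OS_MODE", "full")
    try:
        return MODES[mode]
    except KeyError:
        raise ValueError(f"Unknown HER_OS_MODE: {mode!r}") from None
```

A table-driven dispatch keeps the mode logic in one place, so adding another tier later is a one-line change rather than scattered `if` branches.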

**For launch: support `full` (DGX Spark) only.** Add `cpu` mode as a community contribution path after the core product works. The Docker Compose stack already targets DGX Spark (ADR-017). CPU mode is a separate `docker-compose.cpu.yml` with no GPU reservations, different model configs, and FAISS-CPU + NetworkX substituted.

---

## 2. Disk Usage Projections

### The Question

How much storage does one user consume per year? When does the 3.7 TB DGX Spark disk fill up?

### Assumptions

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Active capture hours/day | 8-16 hours | Omi worn during waking hours. Not all is speech. |
| Actual speech hours/day | 2-4 hours | Average person speaks 2-3 hours/day (work calls, family, social). Heavy conversationalist: 4-6 hours. |
| Words per minute (speech) | 130 wpm | Average English conversational rate |
| Characters per word (avg) | 5.5 | Including spaces |
| Transcript segments per hour | ~120 | Omi sends ~30s segments, so ~120 per hour of speech |
| Entities extracted per hour | ~20-50 | People, topics, promises, emotions per hour of conversation |
| Embedding dimensions | 4096 (float32) | Qwen3-Embedding model dimension |
| Bytes per embedding vector | 16,384 bytes (16 KB) | 4096 dims x 4 bytes/float32 |
| Entity file avg size | ~2 KB | YAML frontmatter + Markdown body |
| Conversations per day | 8-15 | Discrete conversation sessions (gap >15 min = new session) |
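
As a sanity check, the assumptions above reduce to simple arithmetic. A back-of-envelope sketch (the helper name is ours, not her-os code):

```python
# Back-of-envelope daily vector storage from the assumption table above.
EMBED_DIMS = 4096
BYTES_PER_FLOAT32 = 4
BYTES_PER_VECTOR = EMBED_DIMS * BYTES_PER_FLOAT32  # 16,384 bytes per embedding

def daily_qdrant_bytes(speech_hours: float, segments_per_hour: int = 120,
                       new_entities: int = 10) -> int:
    """Raw vector bytes per day, before HNSW index overhead (~1.5x on top)."""
    chunks = int(speech_hours * segments_per_hour)
    return (chunks + new_entities) * BYTES_PER_VECTOR

light = daily_qdrant_bytes(2.0)   # 2 hours of speech/day
heavy = daily_qdrant_bytes(4.0)   # 4 hours of speech/day
print(f"daily: {light / 1e6:.1f}-{heavy / 1e6:.1f} MB")
print(f"yearly (raw vectors only): {light * 365 / 1e9:.1f}-{heavy * 365 / 1e9:.1f} GB")
```

This lands in the same range as the Qdrant projections below (~1.4-2.9 GB/year of raw vectors before index overhead), which is why embeddings dominate the storage budget.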

### Per-Component Storage Estimates

#### PostgreSQL (Conversation Records + Metadata)

| Data Type | Per Record | Records/Day | Daily | Yearly |
|-----------|-----------|-------------|-------|--------|
| Transcript segments | ~500 bytes (text + metadata) | 240-480 (2-4 hrs) | 120-240 KB | 44-88 MB |
| Conversation sessions | ~2 KB (summary + metadata) | 8-15 | 16-30 KB | 6-11 MB |
| Emotion/sentiment records | ~200 bytes | 8-15 | 1.6-3 KB | 0.6-1.1 MB |
| Promise/commitment records | ~300 bytes | 2-5 | 0.6-1.5 KB | 0.2-0.5 MB |
| Event records | ~300 bytes | 3-8 | 0.9-2.4 KB | 0.3-0.9 MB |
| Indexes + overhead | ~30% of data | - | - | ~15-30 MB |
| **PostgreSQL total** | | | **~140-280 KB/day** | **~65-130 MB/year** |

PostgreSQL is extremely compact for this workload. Even with 10 years of data, the database would be ~650 MB-1.3 GB. Negligible on 3.7 TB.

#### Neo4j (Knowledge Graph)

| Data Type | Per Item | Items/Day | Daily | Yearly |
|-----------|---------|-----------|-------|--------|
| Entity nodes (Person, Topic, etc.) | ~500 bytes (properties) | 5-15 new entities | 2.5-7.5 KB | 1-3 MB |
| Relationship edges | ~200 bytes | 20-60 | 4-12 KB | 1.5-4.4 MB |
| Temporal properties | ~100 bytes/edge | 20-60 | 2-6 KB | 0.7-2.2 MB |
| Indexes (30+ Graphiti indexes) | ~50% of data | - | - | ~1.5-5 MB |
| Transaction logs | ~2x data | - | - | ~5-15 MB |
| **Neo4j total** | | | **~20-50 KB/day** | **~10-30 MB/year** |

Neo4j's overhead is dominated by indexes and transaction logs, not the actual data. A personal knowledge graph grows slowly — most days add only a few new entities (you talk to the same people, about many of the same topics). The graph broadens gradually.

**Entity count projections:**

| Timeframe | Estimated Unique Entities | Estimated Edges | Neo4j Data Size |
|-----------|--------------------------|-----------------|-----------------|
| 1 month | 200-500 | 1,000-3,000 | ~5-15 MB |
| 6 months | 800-2,000 | 5,000-15,000 | ~30-90 MB |
| 1 year | 1,500-4,000 | 10,000-40,000 | ~60-180 MB |
| 5 years | 5,000-15,000 | 40,000-150,000 | ~300-900 MB |
| 10 years | 8,000-25,000 | 80,000-300,000 | ~600 MB-2 GB |

#### Qdrant (Vector Embeddings)

This is the **largest storage consumer**. Every transcript chunk and every entity gets a 4096-dimensional float32 embedding.

| Data Type | Per Item | Items/Day | Daily | Yearly |
|-----------|---------|-----------|-------|--------|
| Transcript chunk embeddings | 16 KB (4096 x float32) | 240-480 chunks | 3.8-7.7 MB | 1.4-2.8 GB |
| Entity embeddings | 16 KB | 5-15 new | 80-240 KB | 29-88 MB |
| HNSW index overhead | ~1.5x vector data | - | - | ~2.1-4.2 GB |
| Payload storage | ~200 bytes/point | 245-495 | 49-99 KB | 18-36 MB |
| **Qdrant total** | | | **~5.7-12 MB/day** | **~3.5-7.1 GB/year** |

**Qdrant is the dominant storage consumer.** The HNSW index alone grows with the embedding count. After 10 years: ~35-70 GB.

**Optimization options (if storage becomes a concern):**
- **Scalar quantization:** Qdrant supports quantizing float32 to uint8, reducing vector storage by 4x (from 16 KB to 4 KB per vector). Slight quality loss (~2-5% recall). Would reduce 10-year estimate from 35-70 GB to ~9-18 GB.
- **Dimensionality reduction:** Qwen3-Embedding supports Matryoshka dimensions — can truncate from 4096 to 2048 or 1024. Halving dimensions halves storage. Quality impact: ~3-5% on retrieval benchmarks.
- **Archival pruning:** Embeddings for transcript chunks older than N years could be removed (keep entity embeddings, discard chunk-level detail). Re-embeddable from stored transcripts if needed.
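
Enabling scalar quantization is a collection-level config change. A hedged sketch of the Qdrant REST payload, using only the stdlib (the collection name is assumed; verify the exact schema against the Qdrant docs for your version before applying):

```python
# Sketch: enable int8 scalar quantization on an existing Qdrant collection
# via PATCH /collections/{name}. Assumes Qdrant is reachable on localhost:16333.
import json
import urllib.request

payload = {
    "quantization_config": {
        "scalar": {
            "type": "int8",      # float32 -> uint8: ~4x smaller vector storage
            "quantile": 0.99,    # clip outliers before quantizing
            "always_ram": True,  # keep quantized vectors in RAM for fast search
        }
    }
}

def enable_quantization(collection: str = "her_os_chunks") -> None:
    req = urllib.request.Request(
        f"http://localhost:16333/collections/{collection}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    urllib.request.urlopen(req)  # raises on non-2xx responses
```

Original float32 vectors are retained on disk by default, so quantization can be rolled back without re-embedding.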

#### Entity Files (Human-Readable, Host Bind Mount)

| Data Type | Per File | Files Growth/Day | Daily | Yearly |
|-----------|---------|-----------------|-------|--------|
| New entity files | ~2 KB avg | 2-10 new files | 4-20 KB | 1.5-7.3 MB |
| Updated entity files | ~500 bytes delta | 10-30 updates | 5-15 KB | 1.8-5.5 MB |
| Skill files | ~3 KB avg | ~0 (manual creation) | 0 | ~100 KB |
| **Entity files total** | | | **~10-35 KB/day** | **~3.5-13 MB/year** |

Entity files are tiny. Even with 25,000 entity files after 10 years, total size would be ~50-100 MB. This is the most negligible storage component.

#### Redis (Cache, Bounded)

Redis is configured with `--maxmemory 1gb --maxmemory-policy allkeys-lru` (ADR-017). It never exceeds 1 GB. It stores:
- Session state and conversation context
- Recent embedding cache (LRU eviction)
- Pub/sub channels for WebSocket events
- Rate limiting counters

**Redis: constant 1 GB maximum.** No growth projection needed.

### Total Storage Summary

| Component | Year 1 | Year 3 | Year 5 | Year 10 |
|-----------|--------|--------|--------|---------|
| PostgreSQL | 65-130 MB | 195-390 MB | 325-650 MB | 650 MB-1.3 GB |
| Neo4j | 60-180 MB | 180-540 MB | 300-900 MB | 600 MB-2 GB |
| Qdrant | 3.5-7.1 GB | 10.5-21.3 GB | 17.5-35.5 GB | 35-71 GB |
| Entity files | 3.5-13 MB | 10.5-39 MB | 17.5-65 MB | 35-130 MB |
| Redis | 1 GB (fixed) | 1 GB (fixed) | 1 GB (fixed) | 1 GB (fixed) |
| Docker images + models | 38 GB (fixed) | 38 GB (fixed) | 38 GB (fixed) | 38 GB (fixed) |
| **Total** | **~43-47 GB** | **~50-61 GB** | **~57-76 GB** | **~76-113 GB** |

### When Does Storage Become a Problem?

| Disk | Capacity | Time to 50% Full | Time to 80% Full |
|------|----------|-------------------|-------------------|
| DGX Spark (3.7 TB) | 3,500 GB usable | **200+ years** (moderate use) | **300+ years** |
| External 1 TB SSD | 930 GB usable | ~55-105 years | ~90-175 years |
| External 512 GB SSD | 465 GB usable | ~25-50 years | ~40-85 years |

(Projection basis: ~39 GB fixed for Docker images and models, paid once, plus ~4-8 GB/year of growth per the summary table above. Treating the fixed cost as a yearly cost would understate these horizons by roughly 5x.)

**Storage will never be a problem on DGX Spark for a single user.** Even at the heavy-use upper bound (113 GB after 10 years), that is 3% of the 3.7 TB disk. The 38 GB of Docker images and models is the largest fixed cost, and it only needs to be paid once.

**If future features increase storage dramatically** (audio retention, screen capture, photo memory), Qdrant scalar quantization and Matryoshka dimension truncation provide 4-8x reduction. The DGX Spark also supports external storage via USB-C and Thunderbolt.

---

## 3. Multi-Device Convergence

### The Question

One user, multiple input devices. Where does data converge? How to handle overlaps?

### Architecture: her-os-core Is the Single Convergence Point

```
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│  Omi Wearable   │   │  Phone Mic App  │   │  Laptop Text    │
│  (BLE → Flutter)│   │  (Direct STT)   │   │  (Web UI input) │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         ▼                     ▼                     ▼
    POST /webhook/         POST /webhook/        POST /api/
    transcript             transcript             message
    (device_id: omi-001)  (device_id: phone-001) (device_id: web-001)
         │                     │                     │
         └─────────────────────┴─────────────────────┘
                               │
                    ┌──────────▼──────────┐
                    │    her-os-core      │
                    │  (FastAPI Gateway)  │
                    │                     │
                    │  Device Registry    │
                    │  Dedup Engine       │
                    │  Session Manager    │
                    │  Context Engine     │
                    └─────────────────────┘
```

All devices send data to the same her-os-core instance. The convergence is architectural — there is exactly one database, one knowledge graph, one vector store. No sync protocol needed.

### Device Identity and Source Tagging

Every incoming transcript segment carries device metadata per the Omi webhook contract:

```json
{
  "session_id": "S-2026-02-24-001",
  "device_id": "omi-001",
  "device_type": "omi_pendant",
  "segments": [...],
  "client_timestamp": "2026-02-24T10:30:00Z",
  "offline_buffered": false,
  "meta": {
    "firmware_version": "1.2.3",
    "battery_level": 0.85
  }
}
```

**Device Registry table (PostgreSQL):**

| Column | Type | Example |
|--------|------|---------|
| `device_id` | VARCHAR(64) PK | `omi-001` |
| `device_type` | ENUM | `omi_pendant`, `phone_mic`, `web_input`, `email`, `calendar` |
| `user_id` | FK → users | `rajesh` |
| `display_name` | VARCHAR | "Omi (pendant)" |
| `registered_at` | TIMESTAMP | 2026-02-24T00:00:00Z |
| `last_seen_at` | TIMESTAMP | 2026-02-24T10:30:00Z |
| `priority` | INT | 1 (lower = higher priority for dedup) |
| `is_active` | BOOL | true |

### Handling Duplicate/Overlapping Captures

**The problem:** Omi (pendant) and phone microphone in the same room will capture the same conversation. The transcripts will differ slightly (different mic positions, different STT accuracy), but represent the same speech.

**Deduplication strategy: temporal-spatial fingerprinting.**

1. **Time window matching:** If two segments from different devices arrive within a 30-second window and contain the same speaker(s), flag as potential overlap.

2. **Semantic similarity check:** Embed both segments with the 0.6B model. If cosine similarity > 0.85, they likely describe the same speech.

3. **Priority-based resolution:** The device with higher priority wins. Priority order:
   - Omi pendant (closest to speaker, best mic quality, always-on)
   - Phone microphone (backup, higher latency)
   - Web text input (manual, highest authority — user typed it)

4. **Merge, don't discard:** Keep both segments in PostgreSQL (for audit trail), but only the priority device's segment feeds into entity extraction and graph updates. The lower-priority segment is marked `is_duplicate: true` with a reference to the primary segment.

```sql
-- Deduplication status on transcript_segments table.
-- PostgreSQL has no inline ENUM column syntax: create the type first.
CREATE TYPE segment_dedup_status AS ENUM (
  'primary',      -- This is the authoritative segment
  'duplicate',    -- Duplicate of another segment (reference in dedup_primary_id)
  'merged'        -- Combined from multiple sources
);
ALTER TABLE transcript_segments ADD COLUMN dedup_status segment_dedup_status
  NOT NULL DEFAULT 'primary';
ALTER TABLE transcript_segments ADD COLUMN dedup_primary_id UUID REFERENCES transcript_segments(id);
ALTER TABLE transcript_segments ADD COLUMN dedup_similarity FLOAT;  -- Cosine similarity score
```

5. **Quality-based override:** If the lower-priority device has significantly higher transcription quality (measured by confidence scores or word count), it can override the priority device. This handles the case where Omi's mic is muffled but the phone picks up clearly.
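
Steps 1-3 and 5 condense into a small decision function. A sketch with stubbed inputs (the 30s window and 0.85 similarity threshold come from above; the quality margin and all names are hypothetical):

```python
# Hypothetical dedup decision for two overlapping segments from different
# devices. Embedding and cosine similarity are computed elsewhere and passed in.
from dataclasses import dataclass

TIME_WINDOW_S = 30      # step 1: temporal overlap window
SIM_THRESHOLD = 0.85    # step 2: cosine similarity cutoff
QUALITY_MARGIN = 0.15   # step 5: confidence gap that overrides device priority

@dataclass
class Segment:
    device_priority: int   # lower = higher priority (Omi=1, phone=2, ...)
    timestamp: float       # seconds since epoch
    confidence: float      # STT confidence, 0..1

def resolve_overlap(a: Segment, b: Segment, cosine_sim: float) -> tuple:
    """Return (primary, duplicate), or (None, None) if not an overlap."""
    if abs(a.timestamp - b.timestamp) > TIME_WINDOW_S:
        return None, None                 # not temporally overlapping
    if cosine_sim < SIM_THRESHOLD:
        return None, None                 # different speech, keep both as primary
    primary, dup = (a, b) if a.device_priority <= b.device_priority else (b, a)
    # Step 5: a clearly better transcript wins regardless of device priority
    if dup.confidence - primary.confidence > QUALITY_MARGIN:
        primary, dup = dup, primary
    return primary, dup
```

Both segments are still persisted either way; only the returned `primary` feeds entity extraction, and `dup` is stored with `dedup_status = 'duplicate'`.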

### Offline Buffering

**The Omi Flutter app already handles this** (per the platform architecture):

- SQLite retry queue on the phone persists webhook payloads across app restarts
- Exponential backoff for failed deliveries
- 200 MB cap on buffered data (~8-16 hours of transcripts)
- `offline_buffered: true` flag in the webhook payload tells her-os this data arrived late
- her-os processes offline-buffered segments in chronological order (by `client_timestamp`, not arrival time)

**her-os-core handles late-arriving data gracefully:**
- Session boundaries are computed from `client_timestamp`, not server receipt time
- Entity extraction runs on the buffered segments in order
- Knowledge graph updates are idempotent — re-processing a segment that already arrived via another device is harmless

### Multi-Device Scenarios

| Scenario | Behavior |
|----------|----------|
| Omi + phone in same room | Dedup via similarity. Omi wins (higher priority). Phone segment marked duplicate. |
| Omi only (normal use) | Standard single-device flow. No dedup needed. |
| Phone only (Omi not worn) | Phone is sole source. All segments are primary. |
| Web text input | Always primary (user explicitly typed). Highest authority. |
| Omi goes offline, phone takes over | Phone segments are primary during gap. Omi segments (when they arrive late) are deduped against phone's coverage. |
| Two Omis (future: left ear + right ear) | Merge as stereo capture. Dedup normally. |

---

## 4. Data Portability

### The Question

Can users migrate from one machine to another? What about cross-architecture?

### What Needs to Be Migrated

| Component | Location | Size (Year 1) | Migration Method |
|-----------|----------|---------------|-----------------|
| PostgreSQL | Docker volume | 65-130 MB | `pg_dump` / `pg_restore` |
| Neo4j | Docker volume | 60-180 MB | `neo4j-admin database dump` / `load` |
| Qdrant | Docker volume | 3.5-7.1 GB | Qdrant snapshot API |
| Entity files | `~/her-os-data/entities/` | 3.5-13 MB | `tar` / `rsync` / `git push` |
| Redis | Docker volume | <1 GB | Not needed (cache, rebuilds automatically) |
| ML models | Docker volume | ~19 GB | Re-download on new machine (or copy volume) |
| Configuration | `.env` + `config/` | <1 MB | `cp` / version control |
| Skills | `~/her-os-data/skills/` | <1 MB | `tar` / `rsync` / `git push` |
| IDENTITY.md / Soul.md | `~/her-os-data/` | <10 KB | Copy |

### Migration Scripts

**`scripts/backup.sh` (already planned in ADR-017):**

```bash
#!/bin/bash
# Full backup of all her-os data
# Usage: ./scripts/backup.sh [output_dir]
set -euo pipefail

BACKUP_DIR="${1:-$HOME/her-os-backups/$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$BACKUP_DIR"

echo "=== her-os backup to $BACKUP_DIR ==="

# 1. PostgreSQL dump
echo "[1/5] Dumping PostgreSQL..."
docker exec her-os-postgres pg_dump -U heros -Fc heros > "$BACKUP_DIR/postgres.dump"

# 2. Neo4j dump (requires stopping neo4j, or using APOC export)
echo "[2/5] Dumping Neo4j..."
docker exec her-os-neo4j neo4j-admin database dump neo4j --to-path=/tmp/neo4j-backup
docker cp her-os-neo4j:/tmp/neo4j-backup/neo4j.dump "$BACKUP_DIR/neo4j.dump"

# 3. Qdrant snapshot
echo "[3/5] Snapshotting Qdrant..."
curl -s -X POST "http://localhost:16333/collections/her_os_entities/snapshots" \
  | jq -r '.result.name' \
  | xargs -I{} curl -s -o "$BACKUP_DIR/qdrant-entities.snapshot" \
    "http://localhost:16333/collections/her_os_entities/snapshots/{}"
curl -s -X POST "http://localhost:16333/collections/her_os_chunks/snapshots" \
  | jq -r '.result.name' \
  | xargs -I{} curl -s -o "$BACKUP_DIR/qdrant-chunks.snapshot" \
    "http://localhost:16333/collections/her_os_chunks/snapshots/{}"

# 4. Entity files
echo "[4/5] Archiving entity files..."
tar czf "$BACKUP_DIR/entities.tar.gz" -C "$HOME/her-os-data" entities/ skills/ IDENTITY.md 2>/dev/null || true

# 5. Configuration
echo "[5/5] Copying configuration..."
cp .env "$BACKUP_DIR/env.backup"
tar czf "$BACKUP_DIR/config.tar.gz" config/

echo "=== Backup complete: $BACKUP_DIR ==="
du -sh "$BACKUP_DIR"
```

**`scripts/restore.sh`:**

```bash
#!/bin/bash
# Restore her-os data from backup
# Usage: ./scripts/restore.sh <backup_dir>
set -euo pipefail

BACKUP_DIR="${1:?Usage: restore.sh <backup_dir>}"
[ -d "$BACKUP_DIR" ] || { echo "Backup dir not found: $BACKUP_DIR"; exit 1; }

echo "=== her-os restore from $BACKUP_DIR ==="

# 1. Restore PostgreSQL
echo "[1/5] Restoring PostgreSQL..."
docker exec -i her-os-postgres pg_restore -U heros -d heros --clean --if-exists < "$BACKUP_DIR/postgres.dump"

# 2. Restore Neo4j
echo "[2/5] Restoring Neo4j..."
docker cp "$BACKUP_DIR/neo4j.dump" her-os-neo4j:/tmp/neo4j.dump
docker exec her-os-neo4j neo4j-admin database load neo4j --from-path=/tmp --overwrite-destination

# 3. Restore Qdrant
echo "[3/5] Restoring Qdrant snapshots..."
for snapshot in "$BACKUP_DIR"/qdrant-*.snapshot; do
  collection=$(basename "$snapshot" .snapshot | sed 's/qdrant-/her_os_/')
  curl -s -X POST "http://localhost:16333/collections/$collection/snapshots/upload" \
    -H "Content-Type: multipart/form-data" \
    -F "snapshot=@$snapshot"
done

# 4. Restore entity files
echo "[4/5] Restoring entity files..."
tar xzf "$BACKUP_DIR/entities.tar.gz" -C "$HOME/her-os-data"

# 5. Restore configuration
echo "[5/5] Restoring configuration..."
cp "$BACKUP_DIR/env.backup" .env
tar xzf "$BACKUP_DIR/config.tar.gz"

echo "=== Restore complete. Restart services: docker compose down && docker compose up -d ==="
```

### Cross-Architecture Migration

| Migration Path | Complexity | Notes |
|----------------|-----------|-------|
| **DGX Spark → DGX Spark** | Low | Same arch (aarch64). Binary-compatible. Direct dump/restore. |
| **DGX Spark → x86-64 Linux** | Medium | PostgreSQL dumps are architecture-independent (`pg_dump` output is portable in both plain-text and custom `-Fc` formats). Neo4j dumps are Java-based (cross-platform). Qdrant snapshots are binary — **vectors may need re-indexing** (HNSW index is architecture-specific). Entity files are plain text (trivial). |
| **x86-64 → DGX Spark** | Medium | Same as above, reverse direction. Qdrant HNSW re-index on restore. |
| **DGX Spark → macOS (Apple Silicon)** | Medium-High | Same data portability as x86. Docker images change (ARM64 macOS vs. ARM64 Linux). Re-download models. |

**Key insight: the only cross-architecture problem is Qdrant's HNSW index.** The solution is to restore vectors (raw data) and rebuild the index on the new machine:

```python
# Sketch (assumes qdrant-client): after restoring raw vectors, re-applying the
# optimizer config nudges Qdrant to rebuild the HNSW index in the background.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:16333")
for name in ("her_os_entities", "her_os_chunks"):
    client.update_collection(
        collection_name=name,
        optimizers_config=models.OptimizersConfigDiff(indexing_threshold=20000),
    )
# Nuclear option: drop the collections and re-embed from stored transcripts
# (slow but guaranteed to work on any architecture).
```

**Recommended migration flow:**

```
Old machine:
  1. ./scripts/backup.sh ~/migration-backup
  2. rsync -avz ~/migration-backup/ newmachine:~/migration-backup/
  3. rsync -avz ~/her-os-data/ newmachine:~/her-os-data/

New machine:
  1. git clone https://github.com/her-os/deploy
  2. cp ~/migration-backup/env.backup .env  (edit passwords)
  3. docker compose up -d  (downloads images + models)
  4. ./scripts/restore.sh ~/migration-backup
  5. docker compose restart  (pick up restored data)
```

### Full Export Command

**`scripts/export.sh` — GDPR-compliant full data export:**

```bash
#!/bin/bash
# Export all user data in portable, human-readable formats
# Usage: ./scripts/export.sh [output_dir]
set -euo pipefail

EXPORT_DIR="${1:-$HOME/her-os-export-$(date +%Y%m%d)}"
mkdir -p "$EXPORT_DIR"/{transcripts,entities,graph}

echo "=== her-os full data export ==="

# 1. Transcripts as JSONL (human-readable)
docker exec her-os-postgres psql -U heros -d heros \
  -c "COPY (SELECT * FROM transcript_segments ORDER BY client_timestamp) TO STDOUT WITH (FORMAT csv, HEADER)" \
  > "$EXPORT_DIR/transcripts/all_segments.csv"

# 2. Entity files (already human-readable)
cp -r ~/her-os-data/entities/. "$EXPORT_DIR/entities/"
cp -r ~/her-os-data/skills/ "$EXPORT_DIR/skills/" 2>/dev/null || true
cp ~/her-os-data/IDENTITY.md "$EXPORT_DIR/" 2>/dev/null || true

# 3. Knowledge graph as Cypher export (requires APOC; expects NEO4J_PASSWORD in the environment)
docker exec her-os-neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
  "CALL apoc.export.cypher.all(null, {stream: true})" \
  > "$EXPORT_DIR/graph/full_graph.cypher"

# 4. Conversations as JSON (with all metadata)
docker exec her-os-postgres psql -U heros -d heros \
  -c "COPY (SELECT json_agg(c) FROM conversations c) TO STDOUT" \
  > "$EXPORT_DIR/transcripts/conversations.json"

# 5. Metadata about the export
cat > "$EXPORT_DIR/EXPORT_README.md" << 'EXPORTEOF'
# her-os Data Export

This directory contains a complete export of your her-os data.

## Contents

- `transcripts/` — All conversation transcripts (CSV + JSON)
- `entities/` — All extracted entities (Markdown + YAML)
- `skills/` — Your custom skills
- `graph/` — Full knowledge graph (Cypher format, importable to any Neo4j instance)
- `IDENTITY.md` — Annie's identity document

## Formats

All data is in open, human-readable formats. No proprietary encodings.
You can read every file with a text editor.

## Re-import

To import into a new her-os instance, use `scripts/restore.sh`.
To import the graph into standalone Neo4j, run the Cypher file.
EXPORTEOF

echo "=== Export complete: $EXPORT_DIR ==="
du -sh "$EXPORT_DIR"
```

### GDPR Considerations

| Requirement | How her-os Satisfies It |
|-------------|------------------------|
| **Right to access** (Art. 15) | `scripts/export.sh` — full data export in human-readable formats |
| **Right to erasure** (Art. 17) | `scripts/delete-user.sh` — drops all PG tables, deletes Neo4j graph, removes Qdrant collections, deletes entity files |
| **Right to portability** (Art. 20) | Export in CSV, JSON, Markdown, Cypher — all machine-readable, open formats |
| **Right to rectification** (Art. 16) | Entity files are editable (they are Markdown files on disk). Graph re-indexes from files. |
| **Data minimization** (Art. 5) | No raw audio retention. Transcripts only. Configurable retention periods. |

---

## 5. Household Mode

### The Question

Multiple people, one DGX Spark, separate memory spaces. How?

### Option Analysis

| Approach | Isolation | Complexity | Resource Use | Privacy Guarantee |
|----------|-----------|-----------|-------------|-------------------|
| **A. Separate Docker Compose stacks** | Complete | Low | High (6 containers x N users) | Perfect — separate DBs, separate networks |
| **B. Multi-tenant single stack** | Logical (row-level) | High | Low (shared services) | Moderate — code bug could leak data |
| **C. Namespace within single DB** | Schema-level | Medium | Low | Good — schema isolation in PG/Neo4j |
| **D. Hybrid (shared infra, separate app)** | Per-service | Medium | Medium | Good — shared PG/Redis, separate Neo4j/Qdrant |

### Recommendation: Option A — Separate Stacks

**For a privacy-first personal AI, complete isolation is the only acceptable choice.** Person A should never see Person B's memories under any circumstance — not through a code bug, not through a misconfigured query, not through a shared cache.

**Implementation:**

```
~/her-os-data/
├── rajesh/
│   ├── entities/
│   ├── skills/
│   ├── IDENTITY.md
│   └── .env
├── partner/
│   ├── entities/
│   ├── skills/
│   ├── IDENTITY.md
│   └── .env
└── shared/
    └── models/  (shared HuggingFace cache — models are not personal data)
```

```yaml
# docker-compose.rajesh.yml
name: her-os-rajesh
services:
  postgres:
    container_name: her-os-rajesh-postgres
    ports: ["15432:5432"]
    volumes: [rajesh-postgres-data:/var/lib/postgresql/data]
  neo4j:
    container_name: her-os-rajesh-neo4j
    ports: ["17474:7474", "17687:7687"]
  qdrant:
    container_name: her-os-rajesh-qdrant
    ports: ["16333:6333"]
  redis:
    container_name: her-os-rajesh-redis
    ports: ["16379:6379"]
  her-os-core:
    container_name: her-os-rajesh-core
    ports: ["8000:8000"]
    volumes:
      - shared-model-cache:/data/models:ro   # Shared, read-only
      - ~/her-os-data/rajesh/entities:/data/entity-files
    environment:
      HER_OS_USER: rajesh
      WEBHOOK_PATH: /webhook/rajesh/transcript
```

```yaml
# docker-compose.partner.yml
name: her-os-partner
services:
  postgres:
    container_name: her-os-partner-postgres
    ports: ["25432:5432"]
  # ... same pattern, different ports, different volumes
  her-os-core:
    container_name: her-os-partner-core
    ports: ["8001:8000"]
    volumes:
      - shared-model-cache:/data/models:ro   # Shared, read-only
      - ~/her-os-data/partner/entities:/data/entity-files
```

### Resource Budget for Household Mode

| Resource | Per Instance | 2 Users | 3 Users | DGX Spark Total |
|----------|-------------|---------|---------|-----------------|
| **VRAM (GPU models)** | ~10.4 GB (0.6B + Whisper + Kokoro) | ~20.8 GB | ~31.2 GB | 128 GB (24% for 3 users) |
| **RAM (infra services)** | ~8 GB (PG 2G + Neo4j 4G + Redis 1G + Qdrant 4G) | ~16 GB | ~24 GB | 128 GB shared with VRAM |
| **RAM (her-os-core)** | ~4-6 GB (FastAPI + Python) | ~8-12 GB | ~12-18 GB | |
| **CPU cores** | 2-4 active | 4-8 active | 6-12 active | 20 cores (Grace) |
| **Disk (year 1)** | ~43-47 GB | ~48-56 GB | ~53-65 GB | 3.7 TB |
| **Docker containers** | 6 (5 infra + 1 app) | 12 | 18 | Plenty of capacity |

**DGX Spark comfortably supports 2-3 concurrent users.** The bottleneck is VRAM for GPU models, not CPU/RAM/disk. With 128 GB unified memory, even 3 full instances use only ~31 GB VRAM for models — 24% of capacity.

**Optimization: shared model serving.** Instead of loading models in each her-os-core container, run a shared model inference server (similar to Ollama's pattern). All instances query the same model service. This reduces VRAM from ~31 GB (3 instances) to ~10.4 GB (1 instance serving all).

```
┌──────────────────────────────────────────────────────┐
│  Shared Model Server (GPU)                           │
│  - Qwen3-Embedding-0.6B (1.1 GB)                    │
│  - Whisper large-v3 (8.75 GB)                        │
│  - Kokoro-82M (0.5 GB)                               │
│  Endpoint: http://model-server:9000                  │
│  Total VRAM: ~10.4 GB (regardless of user count)     │
└──────────────────────────────────────────────────────┘
         ▲                    ▲                    ▲
         │                    │                    │
┌────────┴────────┐  ┌───────┴────────┐  ┌───────┴────────┐
│ her-os-rajesh   │  │ her-os-partner │  │ her-os-child   │
│ (CPU-only app)  │  │ (CPU-only app) │  │ (CPU-only app) │
│ Port 8000       │  │ Port 8001      │  │ Port 8002      │
│ Own PG/Neo4j/Qd │  │ Own PG/Neo4j/Qd│  │ Own PG/Neo4j/Qd│
└─────────────────┘  └────────────────┘  └────────────────┘
```
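If the shared model server speaks HTTP, each her-os-core instance only needs a thin client. A sketch assuming an OpenAI-style `/v1/embeddings` route on the `model-server:9000` endpoint from the diagram — both the route and the model name are assumptions, since the server's API is not yet specified:

```python
import json
import urllib.request

MODEL_SERVER = "http://model-server:9000"  # assumed shared endpoint from the diagram

def build_embed_request(texts: list[str], base: str = MODEL_SERVER):
    """Build the URL and JSON body for an embedding call."""
    body = json.dumps({"model": "qwen3-embedding-0.6b", "input": texts}).encode()
    return f"{base}/v1/embeddings", body

def embed(texts: list[str]) -> list:
    """POST to the shared server; it serves all users, so VRAM stays at one model copy."""
    url, body = build_embed_request(texts)
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]
```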

### Omi Device to User Mapping

Each Omi device has a unique BLE address (device_id). The mapping lives in the `device_registry` table:

| Omi Device ID | User | Instance | Webhook URL |
|---------------|------|----------|-------------|
| `omi-AB12CD34` | Rajesh | her-os-rajesh | `http://localhost:8000/webhook/rajesh/transcript` |
| `omi-EF56GH78` | Partner | her-os-partner | `http://localhost:8001/webhook/partner/transcript` |

The Flutter app supports multiple webhook URLs (per the platform architecture). Each Omi device routes to its owner's her-os instance.
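The lookup itself is a single indexed query. A sketch of `device_registry` resolution, using SQLite in place of PostgreSQL for illustration — the column names follow the table above, and `route_webhook` is a hypothetical helper:

```python
import sqlite3

# Stand-in for the PostgreSQL device_registry table described above.
SCHEMA = """
CREATE TABLE device_registry (
    device_id   TEXT PRIMARY KEY,
    user_name   TEXT NOT NULL,
    instance    TEXT NOT NULL,
    webhook_url TEXT NOT NULL
);
"""

def route_webhook(conn: sqlite3.Connection, device_id: str) -> str:
    """Resolve an incoming Omi device_id to its owner's webhook URL."""
    row = conn.execute(
        "SELECT webhook_url FROM device_registry WHERE device_id = ?", (device_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unregistered device: {device_id}")
    return row[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO device_registry VALUES (?, ?, ?, ?)",
             ("omi-AB12CD34", "rajesh", "her-os-rajesh",
              "http://localhost:8000/webhook/rajesh/transcript"))
```

An unregistered device raises rather than silently routing to a default instance, which preserves the per-user isolation guarantees.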

### Privacy Enforcement

| Layer | Isolation Mechanism |
|-------|-------------------|
| **Network** | Separate Docker bridge networks per stack. Containers cannot reach across stacks. |
| **Database** | Separate PostgreSQL instances rather than schema-level isolation within one instance (which could leak via cross-schema joins). No shared database. |
| **Graph** | Separate Neo4j instances. No shared graph. |
| **Vectors** | Separate Qdrant instances. No shared collections. |
| **Files** | Separate entity file directories on host. Linux file permissions enforce access control. |
| **Cache** | Separate Redis instances. No shared cache. |
| **Models** | Shared model weights (read-only). Models contain no personal data. |
| **API keys** | Separate `.env` files. Each user provides their own Anthropic API key (or shares one with separate billing tags). |

---

## 6. Revenue Model

### The Question

If her-os is self-hosted and privacy-first, what do users pay for?

### Cost Structure for the User

**Fixed costs (one-time):**

| Item | Cost | Notes |
|------|------|-------|
| NVIDIA DGX Spark | ~$3,999 | The hardware. Required for full GPU experience. |
| Omi wearable | ~$89 | Audio capture device. |
| **Total hardware** | **~$4,088** | One-time purchase. |

**Recurring costs (monthly):**

| Item | Cost/Month | Notes |
|------|-----------|-------|
| Claude API (Anthropic) | $15-50/month | The dominant cost. Entity extraction, daily reflection, reasoning. |
| Electricity (DGX Spark) | ~$3-5/month | ~140W under load, ~4W idle. ~$0.10/kWh US average. |
| Internet | Already paid | Needed for Claude API calls and Omi data. |
| **Total recurring** | **~$18-55/month** | Mostly Claude API. |

**Claude API cost breakdown (estimated):**

| Operation | Frequency | Tokens/Op | Monthly Tokens | Monthly Cost |
|-----------|-----------|-----------|----------------|-------------|
| Entity extraction | 8-15 convos/day | ~2,000 input + 500 output | ~750K in + 225K out | ~$5.65 |
| Daily reflection | 1/day | ~5,000 input + 2,000 output | ~150K in + 60K out | ~$1.35 |
| Memory search/reasoning | 10-20 queries/day | ~3,000 input + 1,000 output | ~1.5M in + 600K out | ~$13.50 |
| Meditation (nightly) | 1/day | ~10,000 input + 3,000 output | ~300K in + 90K out | ~$2.25 |
| Ad-hoc conversations | 5-10/day | ~2,000 input + 1,000 output | ~600K in + 300K out | ~$6.30 |
| **Total Claude API** | | | **~3.3M in + 1.3M out** | **~$29/month** |

*(Based on Claude Sonnet at ~$3/M input, $15/M output tokens. Routing routine extraction, ad-hoc chat, and the nightly meditation through Claude Haiku drops the total to roughly $16-18/month.)*
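These estimates are easy to sanity-check mechanically. A sketch using the token figures from the table and the stated per-million prices — all inputs are estimates, not measured usage:

```python
# Monthly token estimates from the table above: (input_tokens, output_tokens).
MONTHLY_TOKENS = {
    "entity_extraction": (750_000, 225_000),
    "daily_reflection":  (150_000, 60_000),
    "memory_search":     (1_500_000, 600_000),
    "meditation":        (300_000, 90_000),
    "ad_hoc":            (600_000, 300_000),
}
# ($ per million input tokens, $ per million output tokens).
PRICES = {"sonnet": (3.00, 15.00), "haiku": (0.25, 1.25)}

def monthly_cost(routing: dict[str, str]) -> float:
    """Sum the monthly cost, routing each operation to the model named in `routing`."""
    total = 0.0
    for op, (tok_in, tok_out) in MONTHLY_TOKENS.items():
        p_in, p_out = PRICES[routing[op]]
        total += tok_in / 1e6 * p_in + tok_out / 1e6 * p_out
    return round(total, 2)

all_sonnet = {op: "sonnet" for op in MONTHLY_TOKENS}
# Phase 1 idea: Haiku for routine extraction and ad-hoc chat, Sonnet for the rest.
mixed = {**all_sonnet, "entity_extraction": "haiku", "ad_hoc": "haiku"}
```

Routing everything through Sonnet lands near $29/month; moving extraction and ad-hoc chat to Haiku brings it to roughly $18/month.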

### What Products in This Space Charge

| Product | Model | Price | What You Get |
|---------|-------|-------|-------------|
| **Rewind AI** (pre-Meta) | SaaS subscription | $20/month | Screen recording + search. Cloud-processed. |
| **Limitless AI** (pre-Meta) | Hardware + subscription | $99 pendant + $29-49/month | Wearable + transcription + AI search. Cloud. |
| **Notion AI** | Per-user subscription | $10/user/month | AI writing/search within Notion. |
| **Mem.ai** | Freemium + subscription | Free / $8-15/month | AI-powered note organization. |
| **Obsidian (Sync + Publish)** | One-time + subscription | $50 one-time / $8-16/month sync | Note app + optional cloud sync. |
| **Plaud NotePin** | Hardware + subscription | $179 + subscription | Meeting recording + AI summaries. |
| **Compass** | Hardware + subscription | $99 + $14/month | Wearable + transcription. |
| **1Password** | Subscription | $3-5/month (individual) | Password manager. Local-first, self-hostable. |
| **Tailscale** | Freemium | Free personal / $5-18/user/month teams | VPN. Self-hostable (Headscale). |

### Revenue Model Options

| Model | How It Works | Revenue/User/Year | Pros | Cons |
|-------|-------------|-------------------|------|------|
| **A. Open-source + donations** | Free software, Patreon/GitHub Sponsors | $0-5 | Community goodwill, maximum adoption | Unsustainable as sole income |
| **B. One-time license** | Buy once, own forever | $99-299 (one-time) | Simple, aligns with self-hosted ethos | No recurring revenue. Updates become charity. |
| **C. Annual subscription** | Pay yearly for updates + support | $99-199/year | Predictable revenue. Funds development. | "Why am I paying monthly for software I host?" friction. |
| **D. Freemium + premium features** | Core free, advanced features paid | $0-199/year | Low barrier to entry. Upsell path. | What's "premium" in a self-hosted app? |
| **E. Support + SLA** | Software free, support is paid | $0-500/year | Works for enterprise. | Individuals won't pay for support. |
| **F. Dual license** | AGPLv3 (free) + commercial license (paid) | $99-499/year (commercial) | Protects against cloud resellers. Individuals use free. | Legal complexity. |

### Recommendation: Model F (Dual License) + Model C (Optional Subscription)

**Tier structure:**

| Tier | Price | What's Included |
|------|-------|-----------------|
| **Community** (AGPLv3) | Free | Full software. Self-hosted. Community support (GitHub Issues, Discord). All features. No artificial limits. Must share modifications (AGPL). |
| **Personal** (annual) | $99/year | Same software + commercial license (no AGPL share requirement). Priority bug fixes. Early access to new features. Direct email support. |
| **Household** (annual) | $199/year | Personal tier + multi-user setup scripts + household management UI. Up to 5 users on one DGX Spark. |
| **Patron** (annual) | $499/year | Household tier + name in credits + quarterly video call with developers + influence on roadmap priorities. |

**Why this works:**

1. **Privacy-first users get the full product for free.** No bait-and-switch. No feature gates. The AGPL license ensures modifications stay open-source, which is good for the ecosystem.

2. **The $99/year Personal tier is priced below Claude API costs** ($20-50/month = $240-600/year). Users who can afford a $4,000 DGX Spark and $20+/month in API costs will gladly pay $99/year (~$8/month) for a commercial license, support, and early access. It is a rounding error on their total cost.

3. **The AGPL prevents cloud resellers** from taking her-os, hosting it as a SaaS, and undercutting the project. Anyone who modifies and distributes her-os must share their code. A commercial license exempts paying users from this requirement.

4. **Comparison with the market:** Limitless charged $29-49/month ($348-588/year) for a cloud service. Compass charges $14/month ($168/year). her-os at $99/year is a significant discount, and the user owns their data.

### What Users Actually Pay For (Value Proposition)

The user is not paying for software they "already have." They are paying for:

| Value | Description |
|-------|-------------|
| **Continued development** | New features, new dimensions, new skills. The 6-dimension roadmap is multi-year. |
| **Model updates** | As new embedding models, STT models, and LLM APIs improve, her-os integrates them. |
| **Security patches** | Self-hosted software needs security updates. Dependency vulnerabilities, CVEs. |
| **Compatibility testing** | NVIDIA releases new CUDA versions, Docker updates, Python upgrades. Someone needs to validate the stack. |
| **Documentation and guides** | Setup guides, troubleshooting, skill authoring tutorials. |
| **Community** | Discord, GitHub, knowledge sharing with other users running the same stack. |

### Cost Comparison: her-os vs. Alternatives

| Solution | Year 1 Cost | Year 2+ Cost | Data Ownership | Privacy |
|----------|------------|-------------|----------------|---------|
| **her-os (Community)** | $4,088 (hardware) + $240-660 (API) | $240-660/year (API only) | Complete | Self-hosted |
| **her-os (Personal)** | $4,088 + $240-660 + $99 | $240-660 + $99/year | Complete | Self-hosted |
| **Limitless (pre-Meta)** | $99 (pendant) + $348-588 | $348-588/year | Meta owns it | Cloud (Meta) |
| **Compass** | $99 (pendant) + $168 | $168/year | Cloud | Cloud (encrypted) |
| **DIY (no her-os)** | $4,088 (hardware) + manual setup | $0 + your time | Complete | Self-hosted |

**her-os competes on data ownership and long-term recurring cost (year 2+).** The upfront hardware premium amortizes over the device's multi-year lifetime against premium cloud subscriptions (~$350-600/year), and the user retains full ownership of their data forever.

### Claude API Cost Reduction Roadmap

The largest recurring cost is Claude API. The roadmap to reduce it:

| Phase | Strategy | Estimated Cost Reduction |
|-------|----------|------------------------|
| **Phase 1** (launch) | Claude Haiku for routine extraction, Sonnet for reasoning only | 40-60% reduction |
| **Phase 2** | Local LLM (Qwen 3 30B on DGX Spark) for routine extraction | 70-80% reduction |
| **Phase 3** | Fine-tuned local models for entity extraction + summarization | 85-90% reduction |
| **Phase 4** | Claude API only for complex reasoning and novel situations | 90-95% reduction |

**At Phase 4, recurring cost drops to ~$2-5/month in API fees.** The DGX Spark has enough compute (1 PFLOP FP4) to run a 30B local model at 60 tokens/second (already validated by the community). The investment in local hardware pays for itself.
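The phase progression amounts to a routing policy: routine work goes to the local model, novel or complex reasoning falls through to the API. A minimal sketch of such a policy — the operation names and the novelty threshold are illustrative, not part of her-os:

```python
# Illustrative hybrid router for Phase 2+: routine operations run on the local
# Qwen 3 30B model; everything else falls through to the Claude API.
LOCAL_OPS = {"entity_extraction", "summarization", "meditation"}

def route(operation: str, novelty_score: float = 0.0) -> str:
    """Return 'local' or 'claude'. High novelty forces the API even for routine ops."""
    if operation in LOCAL_OPS and novelty_score <= 0.8:
        return "local"
    return "claude"
```

Phase 3's fine-tuned extractors would simply widen the `LOCAL_OPS` set, pushing more traffic off the API.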

---

## 7. Summary of Recommendations

### Decisions

| # | Question | Recommendation | Confidence |
|---|----------|---------------|------------|
| 1 | Minimum hardware | DGX Spark is primary target. CPU-only mode is architecturally possible via `HER_OS_MODE` env var. Hard blockers: cuVS → FAISS-CPU, cuGraph → NetworkX. Launch with `full` mode only; add `cpu` mode post-launch. Minimum: 8-core CPU, 16 GB RAM, no GPU. | High |
| 2 | Disk usage | ~43-47 GB year 1, ~76-113 GB year 10. Qdrant vectors are the dominant consumer. DGX Spark's 3.7 TB will never fill from a single user's data. Scalar quantization available as 4x reduction if needed. | High |
| 3 | Multi-device | Single her-os-core instance is the convergence point. Temporal-semantic deduplication for overlapping captures. Priority-based device resolution (Omi > phone > web). Offline buffering handled by Flutter app. | High |
| 4 | Data portability | `scripts/backup.sh` + `scripts/restore.sh` for machine-to-machine migration. `scripts/export.sh` for GDPR-compliant human-readable export. Cross-architecture: PG/Neo4j dumps are portable, Qdrant HNSW index needs rebuild. Entity files are plain text. | High |
| 5 | Household mode | Separate Docker Compose stacks per user (complete isolation). Shared model volume (read-only). DGX Spark supports 2-3 concurrent users comfortably. Future optimization: shared model inference server. | High |
| 6 | Revenue model | Dual-license: AGPLv3 (free, full features) + commercial license tiers ($99-499/year). Users pay for continued development, updates, support. Claude API cost ($20-50/month) is the user's biggest expense — roadmap to reduce via local models. | Medium (business model needs market validation) |

### Implementation Priority

| Priority | Task | Blocked By |
|----------|------|-----------|
| **P0 (Sprint 1)** | `HER_OS_MODE` env var in config (even if only `full` is implemented) | Nothing |
| **P0 (Sprint 1)** | Device registry table in PostgreSQL | Nothing |
| **P1 (Sprint 2)** | `scripts/backup.sh` | Working data stores |
| **P1 (Sprint 2)** | `scripts/restore.sh` | `scripts/backup.sh` |
| **P2 (Sprint 3)** | Deduplication engine for multi-device | Entity extraction pipeline |
| **P2 (Sprint 3)** | `scripts/export.sh` (GDPR export) | Working data stores |
| **P3 (Sprint 4+)** | Household mode (multi-stack scripts) | Single-user mode stable |
| **P3 (Sprint 4+)** | CPU-only mode (`docker-compose.cpu.yml`) | Single-user mode stable |
| **P4 (Post-MVP)** | License + billing infrastructure | Revenue model validated |

### New Entity Types

This research introduces one new table but no new entity types:

| Table | Purpose | Added By |
|-------|---------|----------|
| `device_registry` | Maps device_id → user, device_type, priority, last_seen | Section 3 (Multi-Device) |

### Open Questions Resolved

| Question | Resolution |
|----------|-----------|
| "Can her-os run without GPU?" | Yes, with degraded latency. CPU-only mode is viable for memory search + daily reflection. Voice is laggy. |
| "When does disk fill up?" | Never, for a single user on DGX Spark. ~113 GB after 10 years = 3% of 3.7 TB. |
| "How do multiple devices converge?" | Single her-os-core instance. Dedup via temporal-semantic matching. Priority-based device resolution. |
| "Can users migrate machines?" | Yes. PG dump + Neo4j dump + Qdrant snapshot + tar entity files. Cross-arch needs Qdrant re-index. |
| "Can multiple people share one DGX Spark?" | Yes. Separate Docker Compose stacks. 2-3 users comfortable. Shared models. |
| "What do users pay for?" | AGPLv3 (free) + commercial license ($99-499/year). Claude API is the main recurring cost ($20-50/month), reducible via local models. |

---

## 8. Sources

### Storage and Database

1. PostgreSQL documentation: physical storage, row sizes, TOAST — https://www.postgresql.org/docs/16/storage.html
2. Neo4j storage architecture, property graph model — https://neo4j.com/docs/operations-manual/current/configuration/store-formats/
3. Qdrant storage documentation, scalar quantization, HNSW index — https://qdrant.tech/documentation/guides/quantization/
4. FAISS-CPU benchmarks, IVF-Flat on ARM64 — validated Phase 0 (0.81ms/query on 10K x 4096-dim)
5. Redis maxmemory and eviction policies — https://redis.io/docs/management/config/

### Hardware and GPU

6. NVIDIA DGX Spark specifications: GB10, 128 GB unified, 3.7 TB, 6144 CUDA cores — https://www.nvidia.com/en-us/products/workstations/dgx-spark/
7. Phase 0 validation results: all benchmark data from `docs/PHASE0-TITAN-VALIDATION.md`
8. cuVS (replaces FAISS-GPU): 0.25ms/query IVF-Flat — validated Phase 0
9. cuGraph PageRank: 2.9ms on 10K nodes — validated Phase 0
10. NetworkX CPU performance estimates: ~10-100x slower than cuGraph for graph algorithms — https://networkx.org/documentation/stable/

### Competing Products and Pricing

11. Rewind AI pricing (pre-Meta acquisition): $20/month — https://web.archive.org/web/2025/https://www.rewind.ai/pricing
12. Limitless AI pricing (pre-Meta acquisition): $29-49/month — https://www.limitless.ai/pricing
13. Compass AI wearable: $99 + $14/month — https://www.compass.tech
14. Plaud NotePin: $179 + subscription — https://www.plaud.ai
15. Notion AI: $10/user/month — https://www.notion.so/product/ai

### Privacy and Legal

16. GDPR Articles 15-20 (access, erasure, portability, rectification) — https://gdpr-info.eu/
17. CCPA 2026 compliance requirements — https://secureprivacy.ai/blog/ccpa-requirements-2026-complete-compliance-guide
18. AGPL v3 license text — https://www.gnu.org/licenses/agpl-3.0.en.html

### Architecture Patterns

19. Docker Compose multi-service orchestration — NVIDIA dgx-spark-playbooks reference
20. HuggingFace model caching and volume mounts — validated in ADR-017
21. Omi webhook contract: `POST /webhook/transcript` — per platform architecture (`~/workplace/hackathons-2026/omi-hackathon-blr-nudge/docs/PLATFORM-ARCHITECTURE.md`)
22. OpenClaw self-hosted model, dual licensing — https://github.com/openclaw
23. Ollama shared model serving pattern — https://ollama.ai

### Claude API Pricing

24. Anthropic Claude API pricing (as of Feb 2026): Sonnet ~$3/M input, $15/M output; Haiku ~$0.25/M input, $1.25/M output — https://www.anthropic.com/pricing
25. Qwen 3 30B performance on DGX Spark: ~60 tokens/second, 49 GB memory — community benchmarks
