# Phase 0: Titan Technology Validation

**Started:** 2026-02-24
**Hardware:** DGX Spark (NVIDIA GB10 Blackwell, aarch64 ARM64)

---

## 1. Environment Survey (completed)

### Hardware Profile

| Component | Value |
|-----------|-------|
| Architecture | **aarch64** (ARM64, NOT x86) |
| OS | Ubuntu 24.04.4 LTS (Noble Numbat) |
| Kernel | 6.17.0-1008-nvidia (PREEMPT_DYNAMIC) |
| GPU | NVIDIA GB10 (Blackwell) |
| GPU Driver | 580.126.09 |
| CUDA Runtime | **13.0** |
| GPU Memory | 128 GB unified (shared CPU/GPU) |
| Total RAM | 119 GB |
| Disk | 3.7 TB total, **3.5 TB free** (3% used) |
| Swap | 15 GB |
| GPU Temperature | 43°C idle |
| GPU Power | 4W idle / N/A cap |
| GPU Utilization | 0% (Xorg + gnome-shell only) |

### Software Installed

| Software | Version | Notes |
|----------|---------|-------|
| Python | 3.12.3 | System python, includes pip 24.0 + venv |
| Docker | 29.1.3 | User in docker group (no sudo needed) |
| pip | 24.0 | `/usr/lib/python3/dist-packages/pip` |
| venv | Available | `python3 -m venv` works |
| Node.js | **22.22.0** | Installed via nodesource apt |
| npm | **10.9.4** | Bundled with Node.js |
| Claude Code CLI | **2.1.52** | `~/.local/bin/claude` (official installer) |
| nvcc | **12.0** (V12.0.140) | CUDA toolkit (note: runtime is 13.0, toolkit is 12.0) |

### Docker Images Present

| Image | Tag |
|-------|-----|
| ghcr.io/github/github-mcp-server | latest |
| ghcr.io/open-webui/open-webui | ollama |

### Networking

| Check | Result |
|-------|--------|
| PyPI reachable | ✅ Yes |
| SSH | Port 22 |
| Port 11000 | In use (likely open-webui) |
| DNS | Local resolver running |

### NOT Installed

- Redis ❌ (will use Docker)
- PostgreSQL ❌ (will use Docker)
- PyTorch ❌

---

## 2. System Install Commands (Rajesh runs with sudo)

### 2a. Node.js 22 LTS (ARM64)

```bash
# Install Node.js 22 LTS via NodeSource
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs

# Verify
node --version   # expect: v22.x.x
npm --version    # expect: 10.x.x
```

### 2b. CUDA Development Toolkit (for nvcc)

The NVIDIA driver (580.126.09) and CUDA runtime (13.0) are installed, but `nvcc` was not available out of the box. It is needed for compiling CUDA extensions from source (PyTorch, CTranslate2, etc.).

```bash
# Check what's available
apt-cache search cuda-toolkit 2>/dev/null
apt-cache search nvidia-cuda 2>/dev/null

# Option A: If nvidia-cuda-toolkit is available
sudo apt install -y nvidia-cuda-toolkit

# Option B: If NVIDIA's repo is configured
sudo apt install -y cuda-toolkit-13-0

# Verify
nvcc --version
```

**Note:** If neither works, PyTorch wheels with pre-built CUDA support may still function — the nvcc requirement is primarily for building from source. We'll test this in Tier 2 validation.
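
Whether the pre-built-wheel path works can be probed before committing to a toolkit install. A minimal sketch — the helper below is ours for illustration, not part of any NVIDIA tooling:

```python
# Probe whether prebuilt CUDA wheels work without a local nvcc.
# Prebuilt wheels ship their own compiled kernels, so nvcc is only
# needed when building extensions from source.
import importlib.util

def cuda_wheel_status():
    """Return (torch_installed, cuda_available), degrading gracefully
    when torch is not installed at all."""
    if importlib.util.find_spec("torch") is None:
        return (False, False)
    import torch  # imported lazily so the probe never hard-fails
    return (True, torch.cuda.is_available())
```

If this reports `(True, False)`, the wheel was built without CUDA (or the driver is not visible) and a source build — hence `nvcc` — becomes relevant.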

### 2c. Claude Code CLI

```bash
# Official installer (no sudo needed)
curl -fsSL https://claude.ai/install.sh | bash

# Verify
~/.local/bin/claude --version
~/.local/bin/claude -p "Say hello" --output-format json
```

---

## 3. Validation Results

### Tier 1 — Sprint 1 Blockers

| # | Technology | Status | Version | Install Method | Latency | Notes |
|---|-----------|--------|---------|----------------|---------|-------|
| 1 | Python 3.12 venv | ✅ | 3.12.3 | Built-in | - | pip 24.0, venv works, ARM64 wheels available |
| 2 | Node.js (ARM64) | ✅ | 22.22.0 | apt (nodesource) | - | npm 10.9.4 bundled |
| 3 | Claude Code CLI | ✅ | 2.1.52 | `claude.ai/install.sh` | 2.8s (2.3s API) | `~/.local/bin/claude`, JSON output works |
| 4 | FastAPI + uvicorn | ✅ | 0.133.0 / 0.41.0 | pip | - | /health endpoint, async, HTTP 200 confirmed |
| 5 | PostgreSQL 16 | ✅ | 16 (alpine) | Docker | - | ARM64 image, container running on :15432 |
| 6 | Redis 7 | ✅ | 7.4.8 | Docker | 0.005-0.007ms/op | ARM64 image, pub/sub works. Docker Hub IPv6 flaky — retry if pull fails |
| 7 | asyncpg | ✅ | 0.31.0 | pip (C ext) | 0.07ms insert, 0.09ms query | ARM64 wheel exists, C extension compiled fine |
| 8 | SQLAlchemy async | ✅ | 2.0.46 | pip | 0.03ms/insert (ORM batched) | Full ORM + asyncpg engine, create_all works |

### Tier 2 — Entity Storage + Search

| # | Technology | Status | Version | Install Method | Latency | VRAM | Notes |
|---|-----------|--------|---------|----------------|---------|------|-------|
| 9 | PyTorch (Blackwell) | ✅ | 2.12.0.dev+cu128 | pip (nightly cu128 index) | 0.38ms matmul, 12 TFLOPS FP16 | 45.6 MB idle | **RISK ELIMINATED**: aarch64 nightly has CUDA. Compute cap 12.1, 128.5 GB. Install: `pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128` |
| 10 | Qwen3-Embedding-8B | ✅ | 8B (4096-dim) | HuggingFace + transformers | 37ms/sentence (batch=16) | 14.13 GB | **RISK ELIMINATED**: Runs on Blackwell FP16. 11.0% of 128GB. CLS pooling needs tuning (use mean pooling). |
| 11 | Mem0 (self-hosted) | ✅ | 0.x | pip + Qdrant | - | - | **RISK ELIMINATED**: Connects to Qdrant. LLM ops need Claude adapter. |
| 12 | Qdrant | ✅ | latest (ARM64) | Docker | 5.71ms/search (1K vecs, 4096-dim) | - | Batched inserts (100/batch for 4096-dim). |
| 13 | Neo4j 5 | ✅ | 5.26.21 community | Docker | 1.9ms/lookup, 4.9ms/create | - | Full Cypher, relationships, temporal queries. |
| 14 | Graphiti | ✅ | 0.28.1 | pip + Neo4j | - | - | 30+ indexes created. Needs Claude LLM adapter (only ships OpenAI). |
| 15 | FalkorDB | ❌ | - | Docker | - | - | Docker pull blocked by persistent DNS timeout. IPv6 not disabled. Fallback: Neo4j (validated, production choice). |

### Tier 3 — Voice + NER (Sprint 2+)

| # | Technology | Status | Version | Install Method | Latency | VRAM | Notes |
|---|-----------|--------|---------|----------------|---------|------|-------|
| 16 | DeBERTa/GLiNER | ✅ | 0.2.25 | pip (gliner) | 120ms/text (CPU) | ~500 MB | Excellent NER: Person 0.98, Date 0.97, Org 0.84. 8 entity types (100-text benchmark). |
| 17 | Whisper (PyTorch) | ✅ | large-v3 (1.55B) | pip (openai-whisper) | **476ms/30s audio (GPU)** = 0.016x RTF | 8.75 GB | **62x faster than real-time on GPU!** PyTorch-native Whisper works perfectly. |
| 17b | faster-whisper | ⚠️ | 1.2.1 (CTranslate2 4.7.1) | pip | 352ms/30s (CPU) | CPU only | CTranslate2 PyPI aarch64 wheels are CPU-only. The `mekopa/whisperx-blackwell` Docker container has CTranslate2 compiled from source with full CUDA. |
| 18 | Kokoro-82M TTS | ✅ | 0.9.4 | pip (kokoro) | **0.238x RTF (GPU patched)** | GPU | **GPU works with TorchSTFT patch** (avoid complex tensors). 1.6x faster than CPU. nvrtc SM_120 workaround: patch `transform()` + `inverse()` to avoid Jiterator. |
| 19 | Pipecat | ✅ | 0.0.103 | pip (pipecat-ai) | - | - | WhisperSTT + AnthropicLLM + KokoroTTS services all available. Pipeline construction works. |
| 20 | cuGraph | ✅ | 26.02.00 | pip (`cugraph-cu13 --extra-index-url=https://pypi.nvidia.com`) | PageRank 2.9ms (10K nodes) | GPU | **FIXED**: Was using wrong package (`cu12` not `cu13`) and missing NVIDIA PyPI index. ARM64 wheels exist! |
| 21 | cuVS (replaces FAISS-GPU) | ✅ | 26.2.0 | pip (`cuvs-cu13 --extra-index-url=https://pypi.nvidia.com`) | **0.25ms/query IVF-Flat** (3.2x vs FAISS-CPU) | GPU | GPU vector search. CAGRA + IVF-Flat. Superior to FAISS-GPU. Official ARM64 + CUDA 13 wheels. |
| 21b | FAISS-CPU | ✅ | 1.13.2 | pip (faiss-cpu) | 0.81ms/query (10K×4096) | CPU | Backup for in-process search. cuVS is primary GPU path. |
| 22 | pyannote Diarization | ✅ | 3.3.2 (speaker-diarization-3.1) | mekopa/whisperx-blackwell container | **0.060s/10s audio = 166x RT** (after JIT) | 198 MB peak | WhisperX container (sidecar). 2 fixes: `add_safe_globals()` + `check_version` no-op. 211s first-run JIT warmup. HF gated model. |

**Legend:** ⏳ Untested | 🔄 Testing | ✅ Pass | ❌ Fail | ⚠️ Pass with workaround

### Tier 3 Summary

- **Whisper large-v3 (PyTorch GPU)** — **62x faster than real-time!** 476ms for 30s audio on GPU. 8.75 GB VRAM. OpenAI Whisper via PyTorch (not CTranslate2) is the production STT path.
- **GLiNER** — Production-quality NER. 120ms/text on CPU is acceptable for transcript processing (not real-time).
- **Kokoro TTS (GPU patched)** — **0.238x RTF on GPU** (1.6x faster than CPU). Requires monkey-patching `TorchSTFT.transform()` and `.inverse()` to avoid complex tensors that trigger Jiterator/nvrtc on SM_120.
- **Pipecat** — Framework works. All three services (STT/LLM/TTS) importable. Ready for voice pipeline construction.
- **cuGraph** — **GPU graph analytics working!** Key: use `cugraph-cu13` (not `cu12`) with `--extra-index-url=https://pypi.nvidia.com`. PageRank: 2.9ms on 10K nodes.
- **cuVS** — **GPU vector search working!** Replaces FAISS-GPU. IVF-Flat: 0.25ms/query (3.2x faster than FAISS-CPU). CAGRA index for large-scale batch search.
- **FAISS-CPU** — 0.81ms/query backup. cuVS is the primary GPU vector search.
- **pyannote Diarization (GPU)** — **166x faster than real-time!** 0.060s for 10s audio (after 211s JIT warmup). 198 MB peak VRAM. Runs in `mekopa/whisperx-blackwell` container (sidecar, not in her-os-core). Needs `add_safe_globals()` for PyTorch 2.6+ and `check_version` no-op for NGC semver bug. HF gated model requires license acceptance.

### Key Breakthrough: `--extra-index-url=https://pypi.nvidia.com`

**Many NVIDIA packages for aarch64 + CUDA 13 are on pypi.nvidia.com, NOT on standard PyPI.** Always use:

```bash
pip install <package>-cu13 --extra-index-url=https://pypi.nvidia.com
```

This unlocked cuGraph, cuVS, cuDF, and the full RAPIDS stack on DGX Spark.

### Kokoro GPU Patch (for production)

Blackwell's nvrtc doesn't recognize SM_120, so elementwise ops on complex CUDA tensors trigger the Jiterator (runtime kernel compiler) and fail. The fix is to decompose complex ops into real/imaginary parts:

```python
import torch
import kokoro.istftnet as istftnet

def _patched_transform(self, input_data):
    fwd = torch.stft(input_data, self.filter_length, self.hop_length, self.win_length,
                     window=self.window.to(input_data.device), return_complex=False)
    real, imag = fwd[..., 0], fwd[..., 1]
    return torch.sqrt(real**2 + imag**2 + 1e-8), torch.atan2(imag, real)

def _patched_inverse(self, magnitude, phase):
    real = magnitude * torch.cos(phase)
    imag = magnitude * torch.sin(phase)
    return torch.istft(torch.complex(real, imag), self.filter_length, self.hop_length,
                       self.win_length, window=self.window.to(magnitude.device)).unsqueeze(-2)

istftnet.TorchSTFT.transform = _patched_transform
istftnet.TorchSTFT.inverse = _patched_inverse
```

---

## 4. Validation Protocol

For each technology:

```
1. Create isolated environment (venv or Docker container)
2. Install with version pinning
3. Run minimal "hello world" test
4. Run stress test (1000 ops, measure p50/p95/p99 latency)
5. Document: version, install method, result, latency, VRAM usage, workarounds
6. If FAIL: test alternative, document fallback plan
```
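
Step 4 of the protocol can be expressed as a minimal harness: run an op N times and report p50/p95/p99 in milliseconds. A sketch — `op()` stands in for any of the real per-technology tests:

```python
import time

def bench(op, n=1000):
    """Time op() n times; return p50/p95/p99 latency in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        op()
        samples.append((time.perf_counter() - t0) * 1000.0)  # ms
    samples.sort()
    def pct(p):
        # nearest-rank percentile on the sorted samples
        return samples[min(n - 1, int(p / 100.0 * n))]
    return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}
```

Usage is e.g. `bench(lambda: redis_client.get("k"))` inside the isolated venv or container from step 1.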

---

## 5. Risk Assessment

### Risk Resolution

| Risk | Severity | Outcome |
|------|----------|---------|
| CUDA 13.0 + ARM64 PyTorch | **HIGH** | **RESOLVED** — nightly cu128 index works. 12 TFLOPS FP16. |
| CTranslate2 (faster-whisper) on ARM64 | **HIGH** | **RESOLVED** — PyPI aarch64 wheels are CPU-only, but `mekopa/whisperx-blackwell` Docker has it compiled from source with CUDA. PyTorch Whisper also works (62x RT). |
| Qwen3-Embedding-8B VRAM | **MEDIUM** | **RESOLVED** — 14.13 GB. But 341ms latency fails <100ms target. Use 0.6B (69ms) for real-time. |
| asyncpg C extension on ARM64 | **MEDIUM** | **RESOLVED** — ARM64 wheel exists. 0.07ms/insert. |
| Qdrant Docker ARM64 | **LOW** | **RESOLVED** — Works perfectly. 5.71ms/search. |
| Neo4j Docker ARM64 | **LOW** | **RESOLVED** — Works perfectly. 1.9ms/lookup. |
| Blackwell SM_120 CUDA JIT | **HIGH** | **RESOLVED** — Only affects complex tensor ops (Jiterator). Fix: decompose to real/imag. Kokoro patched. |
| Docker Hub DNS on Titan | **LOW** | **KNOWN** — IPv6 timeouts. Disable IPv6 or use `--dns 8.8.8.8`. |
| RAPIDS on ARM64 | **HIGH** | **RESOLVED** — `pip install <pkg>-cu13 --extra-index-url=https://pypi.nvidia.com`. cuGraph + cuVS working. |

### Fallback Plans

| If This Fails... | Fallback |
|-------------------|----------|
| PyTorch native | NVIDIA NGC PyTorch container (Docker) |
| Qwen3-Embedding-8B | Smaller model (e5-large-v2, 335M) or API-based embedding |
| faster-whisper | whisper.cpp (native C++, good ARM support) |
| Mem0 | Direct Qdrant + manual memory management |
| Graphiti | Direct Neo4j Cypher + manual temporal queries |
| FalkorDB | Stay with Neo4j (primary choice) |
| cuGraph | NetworkX (CPU-only, slower but works) |
| FAISS-GPU | Qdrant (already validated) or HNSWlib |

---

## 6. Architecture Decisions (Based on Validation)

### Confirmed Stack

| Component | Technology | Benchmark | Notes |
|-----------|-----------|-----------|-------|
| Runtime | Python 3.12 + FastAPI 0.133 | - | All ARM64 compatible |
| Database | PostgreSQL 16 (Docker) | **0.44ms** query | asyncpg 0.31.0, SQLAlchemy 2.0.46 |
| Cache | Redis 7.4.8 (Docker) | **0.034ms** GET | Sub-ms ops, pub/sub works |
| Graph DB | **Neo4j 5** (not FalkorDB) | 1.9ms/lookup | FalkorDB blocked, Neo4j fully validated |
| Vector DB | Qdrant (Docker) | 5.71ms/search | On 4096-dim vectors |
| Embeddings (real-time) | **Qwen3-Embedding-0.6B** | **69ms** single, **4.7ms** batch | 1.1 GB VRAM. Meets <100ms target. |
| Embeddings (batch/offline) | Qwen3-Embedding-8B | 342ms single, 51ms batch | 14.1 GB VRAM. Better quality, offline re-indexing. |
| ML Framework | PyTorch 2.12.0.dev+cu128 | 12 TFLOPS FP16 | Nightly with CUDA, compute cap 12.1 |
| Memory | Mem0 (self-hosted) + Qdrant | - | Needs Claude LLM adapter |
| Knowledge Graph | Graphiti 0.28.1 + Neo4j | - | Needs Claude LLM adapter |
| NER | GLiNER 0.2.25 | **132ms**/text CPU | Excellent quality: Person 0.98, Date 0.97 |
| STT | **OpenAI Whisper large-v3 (GPU)** | **483ms**/5s audio | 0.097x RTF (10x RT), 8.75 GB VRAM |
| TTS | **Kokoro-82M 0.9.4 (GPU patched)** | **30ms** short text | 0.016x RTF (62x RT), TorchSTFT patch required |
| Voice Pipeline | Pipecat 0.0.103 | - | STT + LLM + TTS orchestration |
| Vector Search (GPU) | **cuVS 26.2.0** | **0.17ms**/query | IVF-Flat on 100K vectors |
| Vector Search (CPU backup) | FAISS-CPU 1.13.2 | 0.81ms/query | Backup for in-process search |
| Graph Analytics (GPU) | **cuGraph 26.02.00** | **1.0ms** PageRank, **2.3ms** BFS | On 10K nodes |
| LLM | Claude CLI 2.1.52 | 2.8s | Subprocess invocation |

### Key Learnings

1. **PyTorch on Blackwell:** Must use nightly cu128 index. Standard pip gives CPU-only on aarch64. `pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128`
2. **NVIDIA PyPI index is critical:** Many NVIDIA packages for aarch64 + CUDA 13 live on `https://pypi.nvidia.com`, NOT standard PyPI. Always use `--extra-index-url=https://pypi.nvidia.com` for RAPIDS packages. Use `-cu13` suffix (not `-cu12`).
3. **CUDA JIT (nvrtc):** SM_120 not recognized by Jiterator. Affects **only** elementwise ops on complex CUDA tensors. Fix: avoid complex tensors (use real/imag decomposition). Standard PyTorch ops (matmul, conv, FFT) work perfectly.
4. **Whisper: PyTorch > CTranslate2 (PyPI):** CTranslate2 PyPI aarch64 wheels lack CUDA, but the `mekopa/whisperx-blackwell` Docker container has it compiled from source with full CUDA. For bare-metal, `openai-whisper` via PyTorch on GPU gives 62x real-time with large-v3 (1.55B params, 8.75 GB VRAM).
5. **cuVS replaces FAISS-GPU:** cuVS is NVIDIA's successor to FAISS-GPU with aarch64 wheels. 3.2x faster on IVF-Flat.
6. **Graphiti + Mem0:** Both only ship OpenAI LLM client. **Sprint 1 task: write Claude adapter.**
7. **Docker Hub DNS:** Intermittent on Titan. Use `--dns 8.8.8.8` flag or disable IPv6 system-wide.
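
For learning 6, the adapter will wrap the already-validated CLI invocation from section 2c (`claude -p … --output-format json`). A rough sketch of the shape only — Graphiti and Mem0 each define their own client protocol, so the class and method names here are placeholders, and the JSON field name should be verified against the CLI's actual output:

```python
import json
import subprocess

class ClaudeCLIClient:
    """Placeholder adapter: wraps the Claude Code CLI as an LLM backend."""

    def __init__(self, binary="claude"):
        self.binary = binary

    def build_cmd(self, prompt):
        # Mirrors the invocation validated in section 2c.
        return [self.binary, "-p", prompt, "--output-format", "json"]

    def complete(self, prompt):
        proc = subprocess.run(self.build_cmd(prompt),
                              capture_output=True, text=True, check=True)
        # Field name assumed here; confirm against the CLI's JSON schema.
        return json.loads(proc.stdout).get("result", "")
```

The real adapters must also map each framework's chat/tool-call interface onto this, which is the actual Sprint 1 work.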

---

## 7. Definitive Benchmark Results (2026-02-24)

All 8 components tested. Docker services (PostgreSQL, Redis) running on Titan.

### Context Retrieval Pipeline (target: <100ms)

| Component | 0.6B Embedding | 8B Embedding |
|-----------|---------------|-------------|
| Embedding (single sentence) | **69.4ms** | 341.6ms |
| Embedding (batch 16, per-sentence) | **4.7ms** | 50.9ms |
| cuVS vector search (100K, top-10) | **0.17ms** | 0.17ms |
| cuGraph BFS (10K nodes) | **2.3ms** | 2.3ms |
| **Pipeline total** | **71.9ms ✓ PASS** | **344.0ms ✗ OVER** |
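
The pipeline totals above are straight sums of the component rows; a quick check against the <100ms target (latencies copied from the table, names illustrative):

```python
def pipeline_total(stages_ms, budget_ms=100.0):
    """Sum per-stage latencies and compare against the budget."""
    total = sum(stages_ms.values())
    return total, total < budget_ms

# 0.6B path: embed + cuVS search + cuGraph BFS
fast, fast_ok = pipeline_total(
    {"embed_0_6b": 69.4, "cuvs_search": 0.17, "cugraph_bfs": 2.3})
# 8B path: same stages, slower embedder
slow, slow_ok = pipeline_total(
    {"embed_8b": 341.6, "cuvs_search": 0.17, "cugraph_bfs": 2.3})
```

The embedding stage dominates both paths, which is why the 0.6B/8B split (ADR-016) is the deciding factor.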

### Voice Pipeline (target: <900ms, excl. LLM)

| Component | Latency | RTF |
|-----------|---------|-----|
| Whisper large-v3 STT (5s audio, GPU) | **483ms** | 0.097x (10x real-time) |
| Whisper large-v3 STT (30s audio, GPU) | **481ms** | 0.016x (62x real-time) |
| Kokoro TTS (short text, GPU) | **30ms** | 0.016x (62x real-time) |
| Kokoro TTS (medium text, GPU) | **40ms** | 0.014x (71x real-time) |
| **Pipeline total (5s input)** | **513ms ✓ PASS** | — |

### Infrastructure Services

| Service | Latency |
|---------|---------|
| Redis GET | **0.034ms** |
| Redis SET | **0.036ms** |
| PostgreSQL SELECT+ORDER+LIMIT | **0.44ms** |
| cuGraph PageRank (10K nodes) | **1.0ms** |
| cuGraph BFS (10K nodes) | **2.3ms** |
| GLiNER NER (6 entity types) | **132.5ms** |

### VRAM Budget

| Model | VRAM |
|-------|------|
| Qwen3-Embedding-0.6B | 1.1 GB |
| Qwen3-Embedding-8B | 14.1 GB |
| Whisper large-v3 | 8.75 GB |
| Kokoro-82M | ~0.5 GB |
| **Real-time config (0.6B + Whisper + Kokoro)** | **~10.4 GB** |
| **Full config (both embeddings + Whisper + Kokoro)** | **~24.5 GB** |
| **Available** | **128 GB** |
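
The summed rows work out as follows (GB figures copied from the table; the Kokoro number is the table's estimate):

```python
# VRAM budget arithmetic from the table above.
models = {
    "qwen3_embedding_0_6b": 1.1,
    "qwen3_embedding_8b": 14.1,
    "whisper_large_v3": 8.75,
    "kokoro_82m": 0.5,  # estimate
}
realtime = (models["qwen3_embedding_0_6b"]
            + models["whisper_large_v3"]
            + models["kokoro_82m"])   # ~10.4 GB
full = sum(models.values())           # ~24.5 GB
headroom = 128.0 - full               # unified memory leaves >100 GB free
```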

### Embedding Model Decision (ADR-016: Progressive Retrieval)

**Dual-model strategy adopted:**
- **Real-time:** Qwen3-Embedding-0.6B (69ms, 1.1 GB VRAM, MTEB retrieval 61.83)
- **Nightly re-index:** Qwen3-Embedding-8B (51ms/sentence batch, 14.1 GB VRAM, MTEB retrieval 69.44)
- **Async deep-search:** When 0.6B confidence is low, 8B re-searches asynchronously and Annie follows up
- **Both loaded simultaneously:** 15.2 GB total (12% of 128 GB)
- **FP4 rejected:** GB10 (SM_121) lacks hardware FP4 tensor cores. Software 4-bit only gives ~1.5-2x (still >100ms)
- **NV-Embed-v2 rejected:** Pulse HQ's choice, but CC-BY-NC license (non-commercial only)
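
The dual-model flow can be sketched as follows; the confidence threshold, function names, and follow-up callback are illustrative placeholders, not the real ADR-016 interfaces:

```python
import asyncio

async def retrieve(query, search_fast, search_deep, follow_up,
                   min_confidence=0.7):
    """Progressive retrieval, heavily simplified: fast 0.6B pass first,
    async 8B deep search only when confidence is low."""
    # 1. Real-time pass with the 0.6B embedder (~72 ms budget above).
    hits, confidence = await search_fast(query)
    if confidence < min_confidence:
        # 2. Launch the 8B deep search in the background; Annie follows
        #    up with the better results when they land.
        async def _deep_pass():
            better, _ = await search_deep(query)
            await follow_up(better)
        asyncio.create_task(_deep_pass())
    return hits  # the fast answer is returned immediately either way
```

The point of the shape: the 8B pass never blocks the interactive path, so the <100ms budget holds even when a deep search is triggered.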

See `docs/TITAN-SETUP-RECIPES.md` for the complete gotchas and recipes document.
See ADR-016 in `docs/PROJECT.md` for the full progressive retrieval architecture.

### 7b. Docker Container Benchmark Results (2026-02-25, session 59)

**The NGC PyTorch container (`nvcr.io/nvidia/pytorch:25.11-py3`) provides a massive performance improvement over bare-metal pip installs.** It ships NVIDIA-optimized CUDA libraries, cuDNN tuning, and TensorRT integration that dramatically reduce inference latency.

#### Context Retrieval Pipeline (target: <100ms)

| Component | Docker (NGC) | Bare-metal | Change |
|-----------|:---:|:---:|:---:|
| Embed 0.6B (single) | **9.37ms** | 69.4ms | **-86.5%** |
| Embed 0.6B (batch/sent) | **1.08ms** | 4.7ms | -77.0% |
| Embed 8B (single) | **74.27ms** | 341.6ms | -78.3% |
| Embed 8B (batch/sent) | **7.94ms** | 50.9ms | -84.4% |
| cuVS search (100K) | 0.19ms | 0.17ms | +11.8% |
| cuGraph PageRank (10K) | **0.99ms** | 1.0ms | -1.0% |
| cuGraph BFS (10K) | **2.08ms** | 2.3ms | -9.6% |
| **Pipeline total** | **11.6ms ✓ PASS** | 71.9ms | **-83.9%** |

#### Voice Pipeline (target: <900ms)

| Component | Docker (NGC) | Bare-metal | Change |
|-----------|:---:|:---:|:---:|
| Whisper STT (5s) | **281.4ms** | 483ms | **-41.7%** |
| Whisper STT (10s) | **285.7ms** | — | — |
| Whisper STT (30s) | **288.7ms** | 481ms | -40.0% |
| Kokoro TTS | *(spacy bug)* | 30ms | — |
| **Pipeline total** | **281.4ms ✓ PASS** | 513ms | **-45.1%** |

#### Infrastructure

| Service | Docker | Bare-metal | Change |
|---------|:---:|:---:|:---:|
| GLiNER NER | **54.4ms** | 132.5ms | **-59.0%** |
| Redis SET | **0.023ms** | 0.036ms | -36.1% |
| Redis GET | **0.021ms** | 0.034ms | -38.2% |
| PostgreSQL | *(auth bug)* | 0.44ms | — |

#### Known Issues (session 59)
- **PostgreSQL:** `POSTGRES_PASSWORD` env var not passed to container (only `DATABASE_URL`). Fix committed.
- **Kokoro TTS:** spacy `en_core_web_sm` not installed in Docker image. Fix: added `python -m spacy download en_core_web_sm` to Dockerfile.

#### Key Insight
NGC PyTorch is **4-7x faster** for embedding inference and **~1.7x faster** for Whisper STT compared to bare-metal `pip install --pre torch --index-url nightly/cu128`. This validates the Docker-first deployment strategy (ADR-017).

---

### Remaining Actions

- [ ] Fix Docker DNS on Titan (disable IPv6: `sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1`)
- [ ] Write Claude LLM adapter for Graphiti (replaces OpenAI client)
- [ ] Write Claude LLM adapter for Mem0 (replaces OpenAI client)
- [x] Data Storage Strategy research → `docs/RESEARCH-DATA-STORAGE.md` (completed session 57)
- [x] Create Dockerfile + docker-compose.yml for reproducible deployment (completed session 57 — full Docker infrastructure: Dockerfile, docker-compose.yml, .env.example, requirements/{base,ml,voice}.txt, scripts/{install,backup,restore,export,upgrade,docker-certify}.sh, config/{postgres,qdrant}, her_os/{__init__,main}.py)
- [ ] Create `her_os/patches/kokoro_blackwell.py` — production TorchSTFT patch module
