# her-os Deployment Guide -- DGX Spark (Titan)

> Practical guide for deploying the full her-os stack on NVIDIA DGX Spark.
> Last updated: 2026-03-04

## 1. Hardware Requirements

| Spec | Value |
|------|-------|
| Architecture | **aarch64** (ARM64) |
| GPU | NVIDIA GB10 (Blackwell, SM_120/SM_121) |
| CUDA | 13.0 |
| Memory | 128 GB unified (shared CPU/GPU) |
| OS | Ubuntu 24.04 LTS |
| Python | 3.12+ |

### VRAM Budget (~25-40 GB of 128 GB)

| Component | VRAM | Notes |
|-----------|------|-------|
| Audio Pipeline (WhisperX + pyannote) | ~4 GB | Docker, CTranslate2 GPU |
| SER Sidecar (emotion2vec + audeering) | ~1.3 GB | Speech emotion recognition |
| Ollama (qwen3:8b) | ~5 GB | Entity extraction |
| llama-server (Qwen3.5-9B Q4_K_M) | ~6 GB | Annie voice local LLM |
| Annie Voice (Whisper STT + Kokoro TTS) | ~3.5 GB | In-process GPU |

## 2. Prerequisites

### NVIDIA Container Toolkit (one-time)

Pre-installed on DGX Spark but **NOT registered** with Docker by default:
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker info | grep -i runtime  # Must show "nvidia"
```

### DNS Fix (DGX Spark WiFi breaks periodically)

```bash
sudo mkdir -p /etc/systemd/resolved.conf.d
sudo tee /etc/systemd/resolved.conf.d/dns.conf << 'EOF'
[Resolve]
DNS=8.8.8.8 8.8.4.4
FallbackDNS=1.1.1.1 1.0.0.1
EOF
sudo systemctl restart systemd-resolved
```

### HuggingFace Token (pyannote gated model)

```bash
mkdir -p ~/.huggingface
echo "hf_YOUR_TOKEN" > ~/.huggingface/token
```

### Models to Download

```bash
docker exec ollama ollama pull qwen3:8b          # Entity extraction (~5 GB)
docker exec ollama ollama pull qwen3-embedding:8b  # Embeddings (~5 GB)
mkdir -p ~/models  # Download Qwen3.5-9B-Q4_K_M.gguf for llama-server
```
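After pulling, it is worth confirming both models actually landed. A tiny helper (hypothetical, not part of any run.sh) that greps an `ollama list` dump for a model name:

```shell
# has_model: exit 0 if the named model appears at the start of a line in an
# `ollama list` dump read from stdin. Name is matched literally at line start.
has_model() {
  grep -q "^$1" -
}

# On the Spark:
#   docker exec ollama ollama list | has_model "qwen3:8b" && echo present
```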

## 3. Service Startup Order (CRITICAL)

```
 1. PostgreSQL           -- database
 2. Neo4j + Qdrant       -- graph + vector store
 3. Ollama               -- local LLM
 4. Context Engine       -- shared brain
 5. Audio Pipeline       -- STT + diarization
 6. llama-server         -- Qwen3.5-9B
 7. SearXNG              -- web search
 8. Annie Voice          -- voice agent
 9. Dashboard            -- observability UI
```
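Because each step depends on the previous one being healthy, a restart script should block on the health endpoints from section 7 rather than sleep blindly. A minimal sketch (the 60s default timeout is an assumption; the Audio Pipeline's first warmup needs far more, see section 4.3):

```shell
# wait_for_http: poll a health URL until it answers 2xx or the timeout expires.
wait_for_http() {
  local url=$1 timeout=${2:-60} elapsed=0
  until curl -sf "$url" > /dev/null; do
    sleep 2
    elapsed=$((elapsed + 2))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timeout waiting for $url" >&2
      return 1
    fi
  done
}

# Example: gate step 5 on step 4 being up.
#   wait_for_http http://localhost:8100/health && cd services/audio-pipeline && ./run.sh
```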

## 4. Docker Compose Services

### 4.1 Infrastructure (Root Stack)

```bash
cd ~/workplace/her/her-os
cat > .env << 'EOF'
POSTGRES_PASSWORD=your_pg_password
REDIS_PASSWORD=your_redis_password
NEO4J_PASSWORD=her-os-graph
ANTHROPIC_API_KEY=sk-ant-...
EOF
docker compose up -d postgres redis neo4j qdrant
```

### 4.2 Context Engine + PostgreSQL

```bash
cd ~/workplace/her/her-os/services/context-engine
CE_TOKEN=$(python3 -c "import secrets; print(secrets.token_urlsafe(32))")
cat > .env << EOF
POSTGRES_PASSWORD=your_ce_pg_password
CONTEXT_ENGINE_TOKEN=$CE_TOKEN
ANTHROPIC_API_KEY=sk-ant-...
EXTRACTION_LLM=ollama/qwen3:8b
DAILY_LLM=ollama/qwen3:8b
OLLAMA_BASE_URL=http://host.docker.internal:11434
TRANSCRIPT_DIR=/home/rajesh/.local/share/her-os-audio/transcripts
EVENTS_DIR=/home/rajesh/.local/share/her-os-audio/events
NEO4J_PASSWORD=her-os-graph
EOF
./run.sh                     # build + start
curl http://localhost:8100/health
```
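The same token must later be passed to Annie Voice and the Dashboard, so a small helper (hypothetical) to read it back out of `.env` avoids copy-paste drift:

```shell
# ce_token: print the CONTEXT_ENGINE_TOKEN value from an env file (default ./.env).
ce_token() { grep '^CONTEXT_ENGINE_TOKEN=' "${1:-.env}" | cut -d= -f2; }

# On the Spark, verify the token against the running service (section 7 header/endpoint):
#   curl -s -H "X-Internal-Token: $(ce_token)" http://localhost:8100/v1/stats
```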

**Context Engine env vars:**

| Variable | Required | Default |
|----------|----------|---------|
| `POSTGRES_PASSWORD` | Yes | -- |
| `CONTEXT_ENGINE_TOKEN` | Yes | -- |
| `ANTHROPIC_API_KEY` | No | -- |
| `EXTRACTION_LLM` | No | `ollama/qwen3.5:27b` |
| `DAILY_LLM` | No | `ollama/qwen3.5:27b` |
| `OLLAMA_BASE_URL` | No | `http://host.docker.internal:11434` |
| `TRANSCRIPT_DIR` | No | (docker-compose default) |
| `NEO4J_PASSWORD` | No | `her-os-graph` |
| `QDRANT_URL` | No | `http://host.docker.internal:16333` |

**run.sh commands:** `./run.sh` (start), `./run.sh stop`, `./run.sh logs`, `./run.sh test`, `./run.sh rebuild`

**Shared transcript volume:** Both audio-pipeline and context-engine mount `/home/rajesh/.local/share/her-os-audio/transcripts` -- audio writes JSONL, context-engine watches via inotify.

### 4.3 Audio Pipeline + SER Sidecar

```bash
cd ~/workplace/her/her-os/services/audio-pipeline
./run.sh              # builds + starts both containers
./run.sh logs         # watch for "Warmup complete -- ready for traffic!"
```

First run: ~276s (CUDA JIT compilation). Subsequent: ~3-8s (cached at `~/.local/share/her-os-audio/cuda-cache/`).

**Hot deploy:** `git pull && docker restart her-os-audio` (3-8s with warm cache).

**Containers:** `her-os-audio` (:9100, published) + `her-os-ser` (:9101, bridge-internal only).

### 4.4 Ollama

```bash
docker start ollama
# Or first time:
docker run -d --gpus all --name ollama -p 11434:11434 -v ollama-data:/root/.ollama ollama/ollama:latest
docker exec ollama ollama pull qwen3:8b
```

### 4.5 SearXNG

```bash
cd ~/workplace/her/her-os/services/annie-voice
docker compose up -d   # SearXNG on 127.0.0.1:8888
```

## 5. Bare-Metal Services

### 5.1 llama-server (Qwen3.5-9B)

```bash
export LD_LIBRARY_PATH=/usr/local/cuda-13/compat:$LD_LIBRARY_PATH
nohup ~/llama-cpp-latest/build-gpu/bin/llama-server \
  --host 0.0.0.0 --port 8003 \
  -m ~/models/Qwen3.5-9B-Q4_K_M.gguf \
  --alias qwen3.5-9b --ctx-size 32768 --n-gpu-layers 999 -fa auto --jinja \
  > /tmp/llama-9b.log 2>&1 &
```

**Critical flags:** `--jinja` (Qwen3.5 template), `--ctx-size 32768` (minimum), `-fa auto` (flash attention).

### 5.2 Annie Voice

```bash
fuser -k 7860/tcp 2>/dev/null; sleep 2
cd ~/workplace/her/her-os/services/annie-voice
CONTEXT_ENGINE_TOKEN="<your-token>" \
CONTEXT_ENGINE_URL="http://localhost:8100" \
LLAMACPP_BASE_URL="http://localhost:8003/v1" \
VOICE_AGENT_HOST="0.0.0.0" \
nohup .venv/bin/python server.py > /tmp/annie-voice.log 2>&1 &
```

**Annie Voice env vars:**

| Variable | Required | Default |
|----------|----------|---------|
| `CONTEXT_ENGINE_TOKEN` | Yes | -- |
| `CONTEXT_ENGINE_URL` | No | `http://localhost:8100` |
| `ANTHROPIC_API_KEY` | If Claude | -- |
| `LLAMACPP_BASE_URL` | No | `http://localhost:8003/v1` |
| `OLLAMA_BASE_URL` | No | `http://localhost:11434/v1` |
| `VOICE_AGENT_HOST` | No | `localhost` |
| `VOICE_AGENT_PORT` | No | `7860` |
| `WHISPER_MODEL` | No | `large-v3-turbo` |
| `KOKORO_VOICE` | No | `af_heart` |
| `LLM_BACKEND` | No | `claude` |
| `CLAUDE_MODEL` | No | `claude-sonnet-4-5-20250929` |

**LLM backends:** `claude` (Anthropic API, all tools), `qwen3.5-9b` (local llama-server, no code tools), `auto` (routes to Claude).

### 5.3 Dashboard

```bash
cd ~/workplace/her/her-os/services/context-engine/dashboard
nohup npx vite --host 0.0.0.0 --port 5174 > /tmp/dashboard.log 2>&1 &
# First visit: http://<ip>:5174/?token=<CONTEXT_ENGINE_TOKEN>
# Subsequent: http://<ip>:5174/ (token persisted in localStorage)
```

## 6. Port Map

| Port | Service | Bind |
|------|---------|------|
| 5432 | PostgreSQL (Context Engine) | all |
| 15432 | PostgreSQL (root stack) | all |
| 16379 | Redis | all |
| 17474 | Neo4j HTTP | all |
| 17687 | Neo4j Bolt | all |
| 16333 | Qdrant HTTP | all |
| 16334 | Qdrant gRPC | all |
| 11434 | Ollama | all |
| 8100 | Context Engine | all |
| 9100 | Audio Pipeline | all |
| 9101 | SER Sidecar | bridge only |
| 8003 | llama-server | all |
| 8888 | SearXNG | **localhost only** |
| 7860 | Annie Voice | all |
| 5174 | Dashboard | all |
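
The table above can be walked mechanically with bash's `/dev/tcp` redirection; a sketch (9101 is omitted on purpose, since it is bridge-internal and not reachable from the host):

```shell
# check_port: report open/closed for a TCP port on localhost using bash /dev/tcp.
check_port() {
  (echo > "/dev/tcp/127.0.0.1/$1") 2>/dev/null && echo open || echo closed
}

# Probe every host-published port from the port map.
for p in 5432 15432 16379 17474 17687 16333 16334 11434 8100 9100 8003 8888 7860 5174; do
  printf '%-6s %s\n' "$p" "$(check_port "$p")"
done
```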

## 7. Health Checks

```bash
# Docker containers
docker ps --format 'table {{.Names}}\t{{.Status}}'

# Bare-metal processes
fuser 7860/tcp 5174/tcp 8003/tcp 2>/dev/null

# Service-level checks
curl -s http://localhost:8100/health                  # Context Engine
curl -s http://localhost:9100/health                  # Audio Pipeline
curl -s http://localhost:7860/health                  # Annie Voice
curl -s http://localhost:11434/api/tags               # Ollama
curl -s http://localhost:8003/health                  # llama-server
curl -s "http://localhost:8888/search?q=test&format=json" | head -c 200  # SearXNG
curl -s http://localhost:16333/collections            # Qdrant
curl -s http://localhost:17474                        # Neo4j

# Context Engine detailed check
TOKEN="<your-token>"
curl -s -H "X-Internal-Token: $TOKEN" http://localhost:8100/v1/stats | python3 -m json.tool

# Ollama GPU verification (MUST show GPU %, NOT "100% CPU")
docker exec ollama nvidia-smi
docker exec ollama ollama ps

# Quick SSH check from remote
ssh titan "docker ps --format 'table {{.Names}}\t{{.Status}}' && fuser 7860/tcp 5174/tcp 8003/tcp 2>/dev/null"
```

## 8. Common Gotchas and Troubleshooting

### llama-server: CUDA errors on startup

**Cause:** Missing `LD_LIBRARY_PATH`. **Fix:** `export LD_LIBRARY_PATH=/usr/local/cuda-13/compat:$LD_LIBRARY_PATH`

### Annie Voice: no memory context

**Cause:** `CONTEXT_ENGINE_TOKEN` not set or mismatched. **Fix:** Ensure token matches Context Engine's `.env`.

### WebRTC secure context (SSH tunnel required)

**Symptom:** "WebRTC not supported" at `http://<ip>:7860`.
**Cause:** `getUserMedia` requires HTTPS or localhost.
**Fix:** `ssh -L 7860:localhost:7860 titan` then open `http://localhost:7860`.

### Ollama GPU: NVML initialization failed

**Symptom:** `docker exec ollama nvidia-smi` fails, models run on CPU.
**Fix:**
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker restart ollama
```

### Audio Pipeline: ~276s first warmup

**Cause:** CUDA JIT compiling PTX to SASS for Blackwell SM_121. Cache persists at `~/.local/share/her-os-audio/cuda-cache/`. Subsequent restarts: 3-8s.

### Context Engine: cannot reach Ollama

**Cause:** Linux Docker lacks `host.docker.internal` by default.
**Fix:** Already handled via `extra_hosts: ["host.docker.internal:host-gateway"]` in docker-compose.yml.

### DNS timeout during Docker build

**Fix:** `sudo resolvectl dns wlP9s9 8.8.8.8 8.8.4.4` then `docker compose build --no-cache`.

### Kokoro TTS: nvrtc SM_120 error

**Fix:** `blackwell_patch.py` auto-applied by `kokoro_tts.py`. No manual action needed.

### Qwen3.5: empty or slow responses

**Cause:** Thinking mode is ON. **Fix:** Start with `--jinja`, use `--ctx-size 32768`, ensure API calls set `"chat_template_kwargs": {"enable_thinking": false}`.
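
A request shape with thinking disabled looks like this (model alias and endpoint match the llama-server setup in section 5.1; llama-server exposes an OpenAI-compatible API):

```shell
# Build and validate a chat payload with thinking mode turned off.
PAYLOAD='{
  "model": "qwen3.5-9b",
  "messages": [{"role": "user", "content": "Say hello in five words."}],
  "chat_template_kwargs": {"enable_thinking": false}
}'
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"   # prints "payload ok"

# On the Spark:
#   curl -s http://localhost:8003/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$PAYLOAD"
```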

### Quick Reference Table

| Problem | Fix |
|---------|-----|
| Context Engine 401 | Token mismatch -- must be identical everywhere |
| No entities extracted | `docker exec ollama ollama list` -- model must be pulled |
| Dashboard no events | Check browser devtools for 401 on `/v1/events/stream` |
| Graphiti sync fails | `curl http://localhost:17474`, check NEO4J_PASSWORD |
| Audio Pipeline no segments | Very quiet audio produces 0 Whisper segments |
| Slow LLM responses | GPU contention -- check if sweep is running |

## 9. Log Locations

| Service | View Command |
|---------|--------------|
| Context Engine | `cd services/context-engine && docker compose logs -f context-engine` |
| PostgreSQL (CE) | `cd services/context-engine && docker compose logs -f postgres` |
| Audio Pipeline | `cd services/audio-pipeline && ./run.sh logs` |
| SER Sidecar | `cd services/audio-pipeline && ./run.sh logs-ser` |
| Ollama | `docker logs -f ollama` |
| SearXNG | `docker logs -f searxng` |
| Neo4j | `docker logs -f her-os-neo4j` |
| llama-server | `tail -f /tmp/llama-9b.log` |
| Annie Voice | `tail -f /tmp/annie-voice.log` |
| Dashboard | `tail -f /tmp/dashboard.log` |

**What to watch:** Ingest (`Ingested session X: N segments`), extraction (`Extracted N entities`), Graphiti sync (`episode created`), audio sweep (`Sweep complete` every 60s), Annie connection (`Client connected`).

## 10. Restart Procedures

### Full Stack Restart (After Reboot)

```bash
# 1. DNS fix (WiFi)
sudo resolvectl dns wlP9s9 8.8.8.8 8.8.4.4 && sudo resolvectl domain wlP9s9 "~."

# 2. Infrastructure
cd ~/workplace/her/her-os && docker compose up -d postgres redis neo4j qdrant

# 3. Ollama
docker start ollama

# 4. Context Engine
cd ~/workplace/her/her-os/services/context-engine && docker compose up -d

# 5. Audio Pipeline
cd ~/workplace/her/her-os/services/audio-pipeline && ./run.sh

# 6. llama-server
export LD_LIBRARY_PATH=/usr/local/cuda-13/compat:$LD_LIBRARY_PATH
nohup ~/llama-cpp-latest/build-gpu/bin/llama-server \
  --host 0.0.0.0 --port 8003 -m ~/models/Qwen3.5-9B-Q4_K_M.gguf \
  --alias qwen3.5-9b --ctx-size 32768 --n-gpu-layers 999 -fa auto --jinja \
  > /tmp/llama-9b.log 2>&1 &

# 7. SearXNG
cd ~/workplace/her/her-os/services/annie-voice && docker compose up -d

# 8. Annie Voice
fuser -k 7860/tcp 2>/dev/null; sleep 2
cd ~/workplace/her/her-os/services/annie-voice
CONTEXT_ENGINE_TOKEN="<token>" CONTEXT_ENGINE_URL="http://localhost:8100" \
LLAMACPP_BASE_URL="http://localhost:8003/v1" VOICE_AGENT_HOST="0.0.0.0" \
nohup .venv/bin/python server.py > /tmp/annie-voice.log 2>&1 &

# 9. Dashboard
cd ~/workplace/her/her-os/services/context-engine/dashboard
nohup npx vite --host 0.0.0.0 --port 5174 > /tmp/dashboard.log 2>&1 &
```

### Individual Service Restarts

**Context Engine:** `cd services/context-engine && docker compose up -d --build`

**Audio Pipeline (hot):** `git pull && docker restart her-os-audio`

**Audio Pipeline (full):** `cd services/audio-pipeline && ./run.sh build && ./run.sh`

**Annie Voice:** `fuser -k 7860/tcp; sleep 2` then start command from section 5.2

**llama-server:** `fuser -k 8003/tcp; sleep 2` then start command from section 5.1

**Ollama:** `docker restart ollama`

## 11. Architecture Diagram

```
                      +-----------+
                      |  Omi App  |   (BLE -> Opus -> VAD -> STT -> webhook)
                      +-----+-----+
                            |  POST /webhook/transcript
                            v
     +----------------------------------------------+
     |          Audio Pipeline (Docker :9100)        |
     |  WhisperX + pyannote + diarization            |
     |  Writes: /data/transcripts/SESSION.jsonl      |
     +---+------------------------------------------+
         |  JSONL (shared volume)    |  SER (bridge net)
         v                          v
     +------------------+   +--------------------+
     | Context Engine   |   |  SER Sidecar       |
     | Docker :8100     |   |  :9101 (internal)  |
     | inotify->ingest  |   +--------------------+
     | extract->PG+Neo4j|
     +---+---------+----+
         |         |
    +----+--+ +----+---+ +--------+
    |Ollama | | Neo4j  | | Qdrant |
    |:11434 | | :17687 | | :16333 |
    +-------+ +--------+ +--------+
         |
         v  /v1/context, SSE
     +----------------------------------------------+
     |        Annie Voice (bare-metal :7860)         |
     |  Whisper STT (GPU) + Kokoro TTS (GPU)         |
     |  LLM: Claude / Qwen3.5-9B / Ollama           |
     +--------+-----------------+-------------------+
              ^                 ^
         +--------+      +------------+
         | SearXNG|      | llama-srvr |
         | :8888  |      | :8003      |
         +--------+      +------------+

     +----------------------------------------------+
     |      Dashboard (Vite :5174)                   |
     |  Canvas 2D aquarium + SSE from :8100          |
     +----------------------------------------------+
```
