# Research: NVIDIA Open Models for her-os (Nemotron, Cosmos, GR00T)

**Date:** 2026-02-25 (Session 63), updated 2026-02-27 (Session 73)
**Status:** Research complete, updated with Feb 2026 landscape scan (Section 14)
**Source:** YouTube — [Reasoning at the Edge with NVIDIA Open Models](https://www.youtube.com/watch?v=u4ZA7XH7rN8) (49 min, NVIDIA Developer)
**Transcript:** `~/workplace/her/her-player/downloads/u4ZA7XH7rN8/subtitles.vtt`
**Context:** Three families of NVIDIA open models designed for edge/embedded inference. Evaluated for potential her-os use on DGX Spark (GB10 Blackwell, 128GB unified memory).

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Nemotron 3 Family (Language / Agentic AI)](#2-nemotron-3-family-language--agentic-ai)
3. [Cosmos Reason2 (Vision-Language Models)](#3-cosmos-reason2-vision-language-models)
4. [Isaac GR00T (Robotics)](#4-isaac-gr00t-robotics)
5. [NIM Containers on DGX Spark](#5-nim-containers-on-dgx-spark)
6. [Inference Engines: NIM vs Ollama vs llama.cpp](#6-inference-engines-nim-vs-ollama-vs-llamacpp)
7. [NVFP4 Quantization (Blackwell-Native)](#7-nvfp4-quantization-blackwell-native)
8. [Memory Budget on DGX Spark](#8-memory-budget-on-dgx-spark)
9. [Licensing](#9-licensing)
10. [Language Support & Kannada Gap](#10-language-support--kannada-gap)
11. [her-os Relevance: Vision-Language for Image/Video Understanding](#11-her-os-relevance-vision-language-for-imagevideo-understanding)
12. [Recommendation (Original, Session 63)](#12-recommendation)
13. [References](#13-references)
14. [Status Update — February 2026](#14-status-update--february-2026)

---

## 1. Executive Summary

NVIDIA released three families of open models optimized for edge inference:

| Family | Purpose | Sizes | her-os Relevance |
|--------|---------|-------|-------------------|
| **Nemotron 3** | Language/agentic AI | 30B (Nano), 100B (Super), 500B (Ultra) | Entity extraction (blocked by Kannada gap) |
| **Cosmos Reason2** | Vision-language (VLM) | 2B, 8B | **High** — image/video interpretation |
| **Isaac GR00T** | Robotics foundation model | N2/N2.5 | Not relevant |

**Key finding:** Nemotron 3 Nano 30B is impressive (MoE, only 3.5B active params, 55+ tok/s on edge) but **lacks Kannada** — a dealbreaker for primary entity extraction. However, the **Cosmos Reason2 VLMs** (2B/8B) are strong candidates for adding image and video understanding to her-os context capture.

---

## 2. Nemotron 3 Family (Language / Agentic AI)

### Architecture

Nemotron 3 uses a **hybrid Mamba-2 + Transformer** architecture with **Mixture of Experts (MoE)** routing:
- Linear-time Mamba-2 layers for long sequences (1M context window)
- Sparse MoE routing activates only a fraction of total parameters per token
- Result: frontier reasoning at tiny inference cost

### Model Tiers

| Tier | Total Params | Active Params | Target Hardware | Speed |
|------|-------------|---------------|-----------------|-------|
| **Nano** | 30B | 3.5B (A3B) | Jetson Thor, Orin AGX, DGX Spark | 55 t/s (Thor), 35 t/s (Orin AGX) |
| **Super** | ~100B | ~10B | DGX Spark, workstations | Not yet released |
| **Ultra** | ~500B | ~50B | DGX B200, clusters | Expected H1 2026 |

### Nemotron 3 Nano 30B-A3B — Details

- **Architecture:** Mamba-2 hybrid + MoE, 30B total, 3.5B active per token
- **Context window:** 1M tokens
- **Benchmark quality:** Competitive with Llama 3.3 70B on many benchmarks despite 20x fewer active params
- **Availability:** HuggingFace, Ollama (`ollama run nemotron-3-nano`), NIM container
- **Quantizations:** BF16 (~60GB), FP8 (~32GB), **NVFP4 (~20GB, Blackwell-optimized)**, GGUF Q4 (~18-20GB)
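
The quantization sizes above follow from bits-per-parameter arithmetic. A back-of-envelope sketch (the effective bits-per-param values are assumptions; real footprints add embedding/output layers kept at higher precision, KV cache, and runtime buffers, which is why the table's numbers run a few GB higher than the raw weight math):

```python
# Rough weight-memory estimate from parameter count and effective bits per parameter.
# Effective-bit values are assumptions: quant scales/metadata add overhead beyond the
# nominal bit width, and actual footprints also include KV cache and runtime buffers.
EFFECTIVE_BITS = {
    "bf16": 16.0,
    "fp8": 8.0,
    "nvfp4": 4.5,    # 4-bit weights + per-block scale factors (assumed)
    "gguf_q4": 4.8,  # typical Q4_K_M average bits/weight (assumed)
}

def weight_gb(total_params_billion: float, fmt: str) -> float:
    """Approximate weight memory in GB (decimal) for a given format."""
    return total_params_billion * EFFECTIVE_BITS[fmt] / 8  # B params * bits / 8 = GB

for fmt in EFFECTIVE_BITS:
    print(f"Nemotron 3 Nano 30B @ {fmt}: ~{weight_gb(30, fmt):.0f} GB weights")
```

Note that for an MoE model the *total* parameter count drives memory (all 30B expert weights must be resident) while the *active* count (3.5B) drives per-token speed.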

### Nemotron Nano 9B v2

- Smaller variant, runs on Orin Nano (9 tok/s)
- NIM container: `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:latest`
- Good for constrained devices, less interesting for DGX Spark

### Nemotron Nano 12B v2 VL (Vision-Language)

- **12B vision-language model** based on Nemotron architecture
- NIM container: `nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl:1.5.0`
- Can process images alongside text
- Potential her-os use: understanding photos shared in conversations, reading screenshots

---

## 3. Cosmos Reason2 (Vision-Language Models)

### What It Is

Cosmos Reason2 models are **Vision-Language Models (VLMs)** designed for **physical world reasoning** — understanding spatial relationships, object interactions, physics, and temporal sequences from images and video.

### Architecture

- Based on **Qwen3-VL** (Alibaba's vision-language model)
- Fine-tuned by NVIDIA for physical reasoning tasks
- Available in 2B and 8B sizes

### Sizes and Hardware

| Model | Params | Target Hardware | Use Case |
|-------|--------|-----------------|----------|
| **Cosmos Reason2 2B** | 2B | Jetson Orin Nano/NX | Lightweight visual reasoning |
| **Cosmos Reason2 8B** | 8B | Jetson AGX Orin, DGX Spark | Full visual reasoning |

### Capabilities

- **Spatial reasoning:** Understanding object positions, sizes, relationships in images
- **Temporal reasoning:** Understanding sequences of events in video
- **Physical reasoning:** Predicting what happens next (gravity, collisions, fluid dynamics)
- **Scene understanding:** Describing complex scenes with multiple objects and interactions
- **Video analysis:** Processing video frames for action recognition and event detection

### Why This Matters for her-os

Cosmos Reason2 models could enable:
1. **Photo context capture:** When Rajesh takes/receives photos, Annie understands what's in them
2. **Screenshot reading:** Understanding UI screenshots, documents, receipts shared in conversations
3. **Video clip interpretation:** Understanding short video clips shared via messaging
4. **Visual memory:** "Remember that restaurant we saw in the photo last week?"
5. **Document understanding:** Reading handwritten notes, whiteboards, printed documents
6. **Medical image context:** Understanding medical reports/images discussed in conversations (privacy-sensitive)

The 8B model at ~16GB fits easily alongside the her-os stack on DGX Spark.

---

## 4. Isaac GR00T (Robotics)

Isaac GR00T N2/N2.5 is a robotics foundation model for humanoid robot control. It uses a dual-system architecture (a vision-language "thinking" module plus a diffusion-transformer "acting" module). **Not relevant to her-os** — included here for completeness.

---

## 5. NIM Containers on DGX Spark

### Available NIM Containers

| Model | Container Image | Status on DGX Spark |
|-------|----------------|---------------------|
| Nemotron 3 Nano 30B | `nvcr.io/nim/nvidia/nemotron-3-nano:latest` | **Reported issues on ARM64** |
| Nemotron Nano 9B v2 | `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:latest` | Untested |
| Nemotron Nano 12B v2 VL | `nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl:1.5.0` | Untested |
| Cosmos Reason2 | TBD (check NGC catalog) | Untested |

### Deployment Command (Standard)

```bash
export NGC_API_KEY=<YOUR_KEY>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker login nvcr.io -u '$oauthtoken' -p "$NGC_API_KEY"
docker run -it --rm --gpus all --shm-size=16GB \
  -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 nvcr.io/nim/nvidia/nemotron-3-nano:latest
```

NIM exposes an **OpenAI-compatible API** at `http://localhost:8000/v1/chat/completions`.
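
A minimal client sketch against that endpoint, using only the Python standard library (the model identifier is an assumption; query `GET /v1/models` on the running container for the actual served name):

```python
import json
import urllib.request

NIM_URL = "http://localhost:8000/v1"  # local NIM endpoint from the docker run above

def build_payload(prompt: str, model: str = "nvidia/nemotron-3-nano-30b-a3b") -> dict:
    # Standard OpenAI chat-completions request body; the model name is assumed.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }

def chat(prompt: str) -> str:
    """POST one chat turn to the NIM container and return the reply text."""
    req = urllib.request.Request(
        f"{NIM_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API shape is OpenAI-compatible, the same client works unchanged against Ollama or llama.cpp's server if NIM fails on ARM64.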

### DGX Spark ARM64 Compatibility — CRITICAL RISK

There are multiple community reports of NIM issues on DGX Spark (ARM64 Grace CPU):

1. **Missing ARM64 images:** Many NIM containers are `linux/amd64` only, causing pull failures on DGX Spark (`linux/arm64/v8`)
   - Forum: https://forums.developer.nvidia.com/t/missing-official-native-arm64-nim-images-for-essential-ai-models/350681

2. **CUDA 13 + ONNX Runtime:** `cudaErrorSymbolNotFound` on ARM64 embedding NIM containers
   - Forum: https://forums.developer.nvidia.com/t/dgx-spark-gb10-arm64-embedding-nim-llama-3-2-nv-embedqa-1b-v2-1-10-0-fails-with-cudaerrorsymbolnotfound-onnx-runtime/354998

3. **Nemotron 3 Nano BF16 on DGX Spark:** Initially "Does not work", later a pre-built container fix was posted
   - Discussion: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/discussions/13

4. **vLLM SM_121 support:** Open issue for Blackwell DGX Spark GPU architecture
   - Issue: https://github.com/vllm-project/vllm/issues/31128

**Positive signal:** NIM LLM containers v1.14.0-pb5.0+ claim both AMD64 and ARM64 support. NVIDIA is actively fixing this — DGX Spark is their flagship personal AI hardware.

---

## 6. Inference Engines: NIM vs Ollama vs llama.cpp

### Performance Comparison on DGX Spark

| Engine | Nemotron 3 Nano 30B | Notes |
|--------|---------------------|-------|
| **llama.cpp** | 63-74 t/s | Proven, community-validated, best single-user choice |
| **llama.cpp + NVFP4** | 65-67 t/s | With CUDA graph acceleration |
| **Ollama** | ~60-70 t/s | 3-4 t/s slower than raw llama.cpp (wrapper overhead) |
| **NIM (TensorRT-LLM)** | Theoretically fastest | ARM64 compatibility uncertain |
| **vLLM** | Good for concurrent | Overkill for single-user, SM_121 support pending |

**Source:** "Choosing an Inference Engine on DGX Spark" — https://medium.com/sparktastic/choosing-an-inference-engine-on-dgx-spark-8a312dfcaac6

### Recommendation

1. **Try NIM first** — if it works, you get TensorRT-LLM optimization for free
2. **If NIM fails on ARM64, use Ollama** — proven easy path, ~60-70 t/s
3. **For maximum performance, build llama.cpp from source** on DGX Spark with SM_121 CUDA kernels
   - NVIDIA playbook: https://build.nvidia.com/spark/nemotron

---

## 7. NVFP4 Quantization (Blackwell-Native)

NVFP4 is a **Blackwell-native quantization format** (SM_120/121) that delivers:
- **4x higher throughput** vs BF16
- **99.4% of BF16 accuracy** retention
- ~20GB memory for Nemotron 3 Nano 30B (vs 60GB BF16)

NVIDIA released an official model: `NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4` on HuggingFace.

This is the **recommended quantization** for DGX Spark — purpose-built for the Blackwell architecture.

---

## 8. Memory Budget on DGX Spark

### Current her-os Stack (from Phase 0 validation)

| Component | VRAM |
|-----------|------|
| NV-Embed 0.6B (fallback) | 1.2 GB |
| NV-Embed 8B (primary) | 14.1 GB |
| Whisper large-v3 | 8.75 GB |
| Kokoro TTS | ~0.5 GB |
| FalkorDB | ~2 GB |
| Qdrant | ~2 GB |
| PostgreSQL | ~1 GB |
| **Subtotal** | **~30 GB** |

### Adding NVIDIA Models

| Addition | VRAM | Running Total |
|----------|------|---------------|
| Nemotron 3 Nano 30B (NVFP4) | ~20 GB | ~50 GB |
| Cosmos Reason2 8B | ~16 GB | ~66 GB |
| Nemotron Nano 12B VL | ~24 GB | ~90 GB |
| **128 GB unified memory** | | **38-62 GB free** |

**Verdict:** DGX Spark can run the full her-os stack + Nemotron Nano + Cosmos Reason2 simultaneously with ~62 GB to spare. Adding the 12B VL model too still leaves ~38 GB free.

Not all models need to run simultaneously — VLM inference is on-demand (load when photo/video arrives, unload after).
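
The on-demand pattern can be as simple as a context manager that guarantees the unload step runs even if inference fails (the `load`/`unload` callables are placeholders; in practice they might start/stop a container or hit a server's load/unload endpoint):

```python
from contextlib import contextmanager

@contextmanager
def on_demand_model(load, unload):
    """Load a model for the duration of a block, always unloading afterwards."""
    model = load()
    try:
        yield model
    finally:
        unload()  # reclaim unified memory even if the block raised

# Usage sketch (load_cosmos_8b / unload_cosmos_8b are hypothetical helpers):
# with on_demand_model(load_cosmos_8b, unload_cosmos_8b) as vlm:
#     description = vlm.describe(image_bytes)
```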

---

## 9. Licensing

### NVIDIA Open Model License

All three families (Nemotron, Cosmos, GR00T) use the **NVIDIA Open Model License**:
- **Commercial use:** Allowed
- **Derivative works:** Allowed
- **Distribution:** Allowed
- **Guardrail clause:** If you remove safety guardrails, you must replace with equivalent

### NIM Container License

- **NVIDIA Developer Program (free):** Self-hosted NIM for research/development/experimentation on up to 16 GPUs
- **Enterprise License:** Required only for "production" deployments (not relevant for personal AI)
- Source: https://developer.nvidia.com/blog/access-to-nvidia-nim-now-available-free-to-developer-program-members/

**Verdict:** her-os use case (personal, self-hosted, 2 DGX Sparks) is clearly within free Developer Program scope.

---

## 10. Language Support & Kannada Gap

### Nemotron 3 Nano — Supported Languages

English plus the following languages:
> Arabic, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Vietnamese

**Kannada is NOT supported.** This is a dealbreaker for her-os entity extraction from Kannada-English code-mixed transcripts (3 of our 8 eval transcripts are Kannada-heavy).

### Cosmos Reason2 — Language Support

Based on Qwen3-VL, which supports **119 languages including Kannada**. However, the Cosmos fine-tuning is primarily English/Chinese. Visual reasoning tasks are less language-dependent than entity extraction: the model describes what it *sees*, and the output language is secondary.

### Implications for her-os

- **Entity extraction:** Nemotron cannot replace Claude/Haiku for Kannada transcripts
- **Visual understanding:** Cosmos Reason2 is viable — image/video descriptions can be in English even for a Kannada-speaking user
- **Potential hybrid:** Use Nemotron for English-only reasoning tasks (summarization, planning, agentic), Cosmos for visual, Claude/Haiku for Kannada entity extraction

---

## 11. her-os Relevance: Vision-Language for Image/Video Understanding

### The Opportunity

her-os currently captures conversational context via audio (Omi wearable → STT → entity extraction). Adding visual understanding opens new context dimensions:

#### Near-term Use Cases (Cosmos Reason2 8B)

1. **Photo interpretation:** "Annie, what's in this photo?" → Scene description, object identification, text reading (OCR)
2. **Receipt/document capture:** Photo of receipt → extract merchant, amount, items, date
3. **Medical report reading:** Photo of lab report → extract values (TSH 6.5, BP 120/80) into health entities
4. **Screenshot understanding:** Screenshot of conversation → extract context, names, topics
5. **Visual memory search:** "Remember that building we saw?" → search visual memories by description

#### Medium-term Use Cases

6. **Video clip interpretation:** Short video from messaging → describe what's happening, who's visible, what's said
7. **Continuous visual context:** If Omi adds camera or paired with smart glasses → ambient visual understanding
8. **Whiteboard/handwriting:** Photo of whiteboard → extract action items, diagrams, flow descriptions
9. **Food/health tracking:** Photo of meal → estimate contents for health dimension

#### Architecture Integration

```
Visual Input (photo/video/screenshot)
    ↓
Cosmos Reason2 8B (on Titan, ~16GB VRAM)
    ↓
Structured description (JSON: objects, text, scene, actions)
    ↓
Entity Extraction (Claude/Haiku — same pipeline as audio)
    ↓
Knowledge Graph (Graphiti — visual entities linked to temporal context)
```

The VLM acts as a **visual-to-text bridge** — it converts images/video into structured descriptions that feed into the existing entity extraction pipeline. No new graph schema needed.
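
A sketch of the bridge's first hop, assuming the VLM is served behind an OpenAI-compatible vision API (the base64 `image_url` content format follows the OpenAI convention; the actual Cosmos Reason2 serving schema and model name are assumptions to verify against the container docs):

```python
import base64

def build_vlm_request(image_bytes: bytes, model: str = "nvidia/cosmos-reason2-8b") -> dict:
    """Build a chat request asking the VLM for a structured scene description."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    prompt = (
        "Describe this image as JSON with keys: "
        "objects (list), visible_text (list), scene (string), actions (list)."
    )
    return {
        "model": model,  # assumed identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 512,
    }
```

The JSON description the VLM returns then enters the entity-extraction pipeline exactly like an audio transcript, which is why no new graph schema is needed.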

### Nemotron 12B VL — Alternative

The Nemotron Nano 12B v2 VL model is another option:
- Larger (12B vs 8B) but Nemotron-native architecture
- NIM container available: `nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl:1.5.0`
- Same language limitation as text Nemotron (no Kannada for text generation, but visual features are language-agnostic)

---

## 14. Status Update — February 2026

**Date:** 2026-02-27 (Session 73)
**Status:** Comprehensive web research on Nemotron evolution since session 63

---

### 14.1 Nemotron Model Family — Current Landscape

#### Released Models (Available Now)

| Model | Params (Total/Active) | Architecture | Multimodal? | Languages | VRAM (NVFP4) | Status |
|-------|----------------------|--------------|-------------|-----------|--------------|--------|
| **Nemotron 3 Nano 30B-A3B** | 30B / 3.5B | Hybrid Mamba-2 + MoE | No | 20 | ~20 GB | Available |
| **Nemotron Nano 9B v2** | 9B | Dense | No | ~10 | ~5 GB | Available |
| **Nemotron Nano 12B v2** | 12B | Hybrid Mamba-Transformer | No | 10 | ~7 GB | Available |
| **Nemotron Nano 12B v2 VL** | 12B | Hybrid + RADIOv2.5 vision | **Yes** (Image+Video) | 10 | ~7 GB | Available, NIM ready |
| **Nemotron-4-Mini-Hindi 4B** | 4B | Dense (from Nemotron-4 15B) | No | 3 (En/Hi/Hinglish) | ~2 GB | Available |
| **Nemotron Speech 0.6B** | 0.6B | FastConformer | N/A (ASR) | English only | <1 GB | Available |

#### Announced But NOT Yet Released

| Model | Params (Total/Active) | Expected | Notes |
|-------|----------------------|----------|-------|
| **Nemotron 3 Super** | ~100B / ~10B | **H1 2026** (Q1 originally) | High-accuracy reasoning, latent MoE, NVFP4 training |
| **Nemotron 3 Ultra** | ~500B / ~50B | **H1 2026** | Complex AI, needs DGX B200/clusters |

**Critical note:** As of Feb 27, 2026, Nemotron 3 Super and Ultra have NOT been released despite the original "Q1 2026" target. They remain "expected H1 2026." No language support details have been disclosed for Super/Ultra.

---

### 14.2 Nemotron 3 Nano — Complete Language List

The full supported-language list (confirmed from multiple sources):

> English, Arabic, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Vietnamese

Plus 43 programming languages.

**Kannada: NOT supported. No Dravidian language is supported.** Hindi is the sole Indic language.

---

### 14.3 Nemotron Nano 12B v2 VL — Vision-Language Details

The VLM variant has been significantly updated since our session 63 research:

- **Architecture:** Hybrid Mamba-Transformer LLM + RADIOv2.5 vision encoder
- **Context:** 16K to 128K tokens (extended via multi-stage training)
- **Capabilities:** Multi-image, video understanding, visual Q&A, dense captioning, OCR
- **OCR benchmark:** #1 on OCRBench v2 (industry-leading)
- **Languages for VLM:** English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese (10 languages)
- **Kannada OCR/VLM:** NOT tested, NOT in the OCR training data (English + Chinese + "other languages")
- **NIM container:** `nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl:1.5.0` (available)
- **Key feature:** Efficient Video Sampling (EVS) — prunes temporally static patches for faster video inference

**her-os relevance:** Strong for English document/receipt/screenshot OCR. Cannot be relied on for Kannada script OCR.
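
EVS itself operates on vision-transformer patches inside the model; as a rough illustration of the same idea (NOT NVIDIA's actual algorithm), a preprocessor can drop frames that barely change relative to the last kept frame:

```python
def sample_dynamic_frames(frames, threshold=0.05):
    """Keep frame 0, then only frames whose mean absolute pixel change
    versus the last KEPT frame exceeds the threshold.
    `frames` is a list of equal-length flat pixel lists (values 0.0-1.0)."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        ref = frames[kept[-1]]
        diff = sum(abs(a - b) for a, b in zip(ref, frames[i])) / len(frames[i])
        if diff > threshold:
            kept.append(i)
    return kept

# Three tiny frames: two identical, then a change -> indices 0 and 2 survive.
frames = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(sample_dynamic_frames(frames))  # [0, 2]
```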

---

### 14.4 Nemotron Speech ASR

- **Model:** `nemotron-speech-streaming-en-0.6b` — English-only, FastConformer architecture
- **Design:** Cache-aware streaming ASR for real-time voice agents
- **Indic languages:** NOT supported natively. However:
  - **Gnani.ai** fine-tuned Nemotron Speech for Indic languages, achieving 15x inference cost reduction, scaling to 10M+ calls/day
  - Gnani is building a 14B speech-to-speech model on Nemotron Speech + NeMo
  - NVIDIA acknowledged on HuggingFace that Indic language support is "something they'll look into adding to the roadmap"
- **Multilingual Nemotron Speech:** Not yet planned publicly. Current model is English-focused.

**her-os relevance:** Not useful — we already have Whisper large-v3 (99 languages, 62x RT on Titan). Nemotron Speech would only matter if it could beat Whisper on Kannada, which it cannot since it is English-only.

---

### 14.5 Nemotron-Personas-India Dataset

NVIDIA released **Nemotron-Personas-India**, the first open synthetic dataset of Indic personas aligned to India's real-world demographics:
- Addresses code-switching between English and Hindi
- Regional occupational categories
- Cultural context essential for trust and adoption

This signals NVIDIA recognizes the Indic language gap but is addressing it via data/fine-tuning rather than base model expansion. The dataset could theoretically be used to fine-tune Nemotron 3 Nano for Hindi-English code-switching, but **Kannada is not mentioned** in the Personas-India dataset scope.

---

### 14.6 Benchmark Comparisons — Nemotron 3 Nano vs Qwen3.5 vs Others

| Benchmark | Nemotron 3 Nano 30B-A3B | Qwen3 30B-A3B | Qwen3.5 35B-A3B | Notes |
|-----------|------------------------|---------------|-----------------|-------|
| MATH | 82.88% | 61.14% | — | Nemotron significantly ahead |
| HumanEval | 78.05% | 70.73% | — | Nemotron ahead |
| AIME 2025 | 89.1% | 85.0% | — | Nemotron ahead |
| LiveCodeBench v6 | 68.3% | 66.0% | — | Nemotron ahead |
| Arena-Hard-v2 | 67.7% | 57.8% | — | Nemotron significantly ahead |
| MMLU-Pro | 78.3% | 80.9% | 85.3% | Qwen3.5 leads |
| Throughput (H200) | 3.3x vs Qwen3 | baseline | — | Nemotron far more efficient |
| Context window | 1M tokens | 128K | 262K (1M w/ YaRN) | Nemotron and Qwen3.5 both long |
| **Kannada** | **NO** | **YES (119 langs)** | **YES (201 langs)** | **Dealbreaker** |
| **Multimodal VLM** | **NO** (text-only) | **NO** | **YES** (Image+Video) | Qwen3.5 unique advantage |

**Throughput advantage is real:** On single H200, Nemotron 3 Nano provides 3.3x higher throughput than Qwen3-30B at similar quality. The hybrid Mamba-2 architecture is genuinely more efficient.

**But for her-os, the language gap is disqualifying.** Nemotron 3 Nano cannot process Kannada-English code-mixed transcripts, and has no VLM capability. Qwen3.5-35B-A3B covers both gaps (201 languages + native VLM).

---

### 14.7 The Indic Ecosystem — Models That DO Support Kannada

While Nemotron itself does not support Kannada, the broader NVIDIA-adjacent ecosystem has significant Kannada-capable models:

#### Sarvam AI (NVIDIA NeMo-trained)

| Model | Params (Total/Active) | Languages | Kannada? | Status |
|-------|----------------------|-----------|----------|--------|
| **Sarvam-M (Sarvam 1)** | 24B | 11 (En + 10 Indic) | **YES** | Available, API free |
| **Sarvam-30B (Vikram)** | 30B / ~1B (MoE) | 22 Indic + En | **YES** | Launched Feb 18, 2026 |
| **Sarvam-105B (Indus)** | 105B / ~9B (MoE) | 22 Indic + En | **YES** | Beta (iOS/Android/Web), Feb 20, 2026 |
| Saaras v3 (STT) | — | 23 languages | **YES** | Available |
| Bulbul v3 (TTS) | — | 11 languages | **YES** | Available |

- Pre-trained from scratch on ~16-22 trillion tokens using **NVIDIA NeMo + Megatron-LM**
- Post-trained with NeMo RL
- Uses NeMo Curator for data + subset of NVIDIA Nemotron datasets
- **Open sourcing announced**, but HuggingFace weights are not yet confirmed as downloadable
- Sarvam-105B context: 128K tokens
- Sarvam-30B context: 32K tokens

#### BharatGen Param2 (IndiaAI Initiative)

| Model | Params (Total/Active) | Languages | Kannada? | Status |
|-------|----------------------|-----------|----------|--------|
| **Param2-17B-A2.4B** | 17B / 2.4B (MoE) | 23 (En + Hi + 21 Indic) | **YES** | Launched Feb 2026, HuggingFace available |

- Pre-trained from scratch on ~22 trillion tokens
- **Shared routing experts** specifically designed for Indian language diversity (code-switching, cross-lingual representations)
- Benchmarked with Kannada in ARC Challenge (Indic) and TriviaQA (Indic MCQ) macro-averaged scores
- Sovereign AI initiative — government-backed
- Available on HuggingFace: `bharatgenai/Param2-17B-A2.4B-Thinking`

---

### 14.8 NVIDIA's India Strategy

NVIDIA's approach to Indic languages is **ecosystem enablement**, not direct model support:

1. **NeMo framework** — used by Sarvam, BharatGen, Gnani to train Indic models
2. **Nemotron datasets** — subsets used as training data for Indic fine-tuning
3. **Nemotron-Personas-India** — synthetic Indic persona dataset (Hindi focus)
4. **Nemotron-4-Mini-Hindi** — dedicated Hindi model (but Hindi only, no Kannada)
5. **NIM containers** — deployment platform for all models
6. **DGX Spark** — target hardware for India AI mission

This means: **NVIDIA will likely never ship a Nemotron-Kannada model directly.** Instead, they enable the Indian ecosystem (Sarvam, BharatGen, Gnani) to build Kannada models on NVIDIA infrastructure. For her-os, the path to Kannada is through these ecosystem models, not through Nemotron itself.

---

### 14.9 Updated Recommendation for her-os

#### Nemotron Verdict: STILL NOT for Kannada Entity Extraction

The session 63 conclusion holds. No Nemotron model supports Kannada. Nemotron 3 Super/Ultra are delayed and their language support is unknown.

#### What Changed Since Session 63

| Change | Impact on her-os |
|--------|-----------------|
| **Nemotron 3 Super/Ultra delayed** | Cannot count on them for H1 2026. Original "may have broader language support" hope unverifiable. |
| **Sarvam-30B and 105B launched** | Potential Kannada-capable local models IF weights become downloadable. MoE architecture (1B and 9B active) fits DGX Spark well. |
| **BharatGen Param2 on HuggingFace** | 17B/2.4B MoE with explicit Kannada benchmarks. Smallest VRAM (~10-12 GB Q4). Worth evaluating for Kannada entity extraction. |
| **Nemotron Nano VL tops OCR benchmarks** | Best-in-class OCR but only for supported languages (not Kannada). |
| **Nemotron Speech is English-only** | Irrelevant for her-os (Whisper large-v3 is superior for multilingual). |

#### Revised Action Items

| Action | Priority | Rationale |
|--------|----------|-----------|
| **Evaluate BharatGen Param2-17B on Titan** | **HIGH** | Available on HuggingFace now, explicit Kannada benchmarks, MoE (2.4B active) = fast inference, ~10-12 GB Q4 |
| **Monitor Sarvam-30B/105B weight release** | **HIGH** | If open-sourced on HuggingFace, Sarvam-30B (1B active MoE) could be the Kannada specialist |
| **Test Qwen3.5-35B-A3B Kannada OCR** | **MEDIUM** | Already validated on Titan but Kannada OCR scored 2-5/10 — needs targeted improvement or fallback |
| **Keep Nemotron Nano VL for English OCR** | **LOW** | Top OCR benchmark but no Kannada — use only for English documents/receipts |
| **Watch Nemotron 3 Super release** | **MONITOR** | May expand language support but no guarantees |
| **Do NOT fine-tune Nemotron for Kannada** | **FIRM** | No Kannada in pretraining data = catastrophic tokenization. Better to use models pretrained with Kannada. |

#### The Emerging Architecture

```
Kannada-English code-mixed transcripts
    ↓
Qwen3.5-35B-A3B (primary, 201 langs, VLM)     ← VALIDATED
    ↓
If Kannada extraction quality insufficient:
    ↓
BharatGen Param2-17B (Kannada specialist)       ← EVALUATE ON TITAN
    or
Sarvam-30B (when weights available)             ← MONITOR
    ↓
Entity Extraction Pipeline
    ↓
Knowledge Graph (Graphiti)
```

This parallels the existing ADR-019 dual-model routing (Qwen3.5 primary + Qwen3 specialist) but adds a **Kannada specialist lane** using Indian-origin models that were actually pretrained on Kannada data.
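
The routing decision in the diagram above can key off the Unicode Kannada block (U+0C80 through U+0CFF). A minimal sketch (the threshold and model labels are illustrative, not validated values):

```python
def kannada_share(text: str) -> float:
    """Fraction of script-bearing characters that fall in the Kannada Unicode block."""
    relevant = [ch for ch in text if ch.isalpha() or 0x0C80 <= ord(ch) <= 0x0CFF]
    if not relevant:
        return 0.0
    kn = sum(1 for ch in relevant if 0x0C80 <= ord(ch) <= 0x0CFF)
    return kn / len(relevant)

def route(transcript: str, threshold: float = 0.15) -> str:
    """Send Kannada-heavy code-mixed transcripts to the specialist lane."""
    if kannada_share(transcript) >= threshold:
        return "kannada-specialist"   # e.g. Param2-17B / Sarvam-30B (to be evaluated)
    return "primary"                  # e.g. Qwen3.5-35B-A3B (validated)

print(route("schedule a meeting tomorrow"))         # primary
print(route("ನಾಳೆ meeting ಇದೆ, remind me please"))   # kannada-specialist
```

Script share is a crude proxy (romanized Kannada would slip through), but it is cheap enough to run on every transcript before dispatch.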

---

## 12. Recommendation

### Immediate (Session 63+)

| Action | Priority | Rationale |
|--------|----------|-----------|
| **Do NOT use Nemotron for entity extraction** | Firm | Kannada gap is a dealbreaker |
| **Evaluate Cosmos Reason2 8B on Titan** | Medium | Image/video understanding is a new capability dimension |
| **Try NIM container pull on Titan** | Low | Test ARM64 compatibility, may save time later |

### When To Revisit

- **Nemotron 3 Super (~100B):** When released (expected Q1 2026). May have broader language support.
- **Nemotron Kannada:** If NVIDIA adds Indic language support in future releases (track release notes)
- **Cosmos for her-os visual dimension:** When building Dimension 3 (Life Logger), integrate VLM as visual context bridge

### What NOT To Do

- Don't replace Claude/Haiku with Nemotron for entity extraction (Kannada gap)
- Don't invest in NIM container debugging until ARM64 support matures
- Don't run all models simultaneously — use on-demand loading for VLM

---

## 13. References

### NVIDIA Official

1. **Nemotron 3 Nano NIM Deploy:** https://build.nvidia.com/nvidia/nemotron-3-nano-30b-a3b/deploy
2. **Nemotron 3 Nano NVFP4 (HuggingFace):** https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
3. **Nemotron 3 Nano BF16 (HuggingFace):** https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
4. **Nemotron Nano 12B VL NIM:** NGC catalog — `nvcr.io/nim/nvidia/nemotron-nano-12b-v2-vl:1.5.0`
5. **NIM Free for Developers:** https://developer.nvidia.com/blog/access-to-nvidia-nim-now-available-free-to-developer-program-members/
6. **DGX Spark Nemotron Playbook:** https://build.nvidia.com/spark/nemotron

### Community / Troubleshooting

7. **Choosing an Inference Engine on DGX Spark:** https://medium.com/sparktastic/choosing-an-inference-engine-on-dgx-spark-8a312dfcaac6
8. **Missing ARM64 NIM Images (Forum):** https://forums.developer.nvidia.com/t/missing-official-native-arm64-nim-images-for-essential-ai-models/350681
9. **CUDA 13 ARM64 Embedding NIM Failure:** https://forums.developer.nvidia.com/t/dgx-spark-gb10-arm64-embedding-nim-llama-3-2-nv-embedqa-1b-v2-1-10-0-fails-with-cudaerrorsymbolnotfound-onnx-runtime/354998
10. **Nemotron BF16 DGX Spark Fix:** https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/discussions/13
11. **vLLM SM_121 Blackwell Support:** https://github.com/vllm-project/vllm/issues/31128

### Source Video

12. **YouTube:** "Reasoning at the Edge with NVIDIA Open Models: Nemotron, Cosmos, and Isaac GR00T on Jetson" — https://www.youtube.com/watch?v=u4ZA7XH7rN8 (49 min, NVIDIA Developer channel)

### Session 73 Update (Feb 2026)

13. **Nemotron 3 Family Research Page:** https://research.nvidia.com/labs/nemotron/Nemotron-3/
14. **Nemotron Nano VL OCR Blog:** https://developer.nvidia.com/blog/new-nvidia-llama-nemotron-nano-vision-language-model-tops-ocr-benchmark-for-accuracy/
15. **Nemotron Speech ASR (MarkTechPost):** https://www.marktechpost.com/2026/01/06/nvidia-ai-released-nemotron-speech-asr-a-new-open-source-transcription-model-designed-from-the-ground-up-for-low-latency-use-cases-like-voice-agents/
16. **Nemotron-Personas-India:** https://huggingface.co/blog/nvidia/nemotron-personas-india
17. **Sarvam-30B/105B Launch (BusinessToday):** https://www.businesstoday.in/tech-today/story/sarvam-ai-launches-30b-and-105b-open-source-models-516729-2026-02-18
18. **Sarvam-3 Open Source (TechCrunch):** https://techcrunch.com/2026/02/18/indian-ai-lab-sarvams-new-models-are-a-major-bet-on-the-viability-of-open-source-ai/
19. **BharatGen Param2 17B (HuggingFace):** https://huggingface.co/bharatgenai/Param2-17B-A2.4B-Thinking
20. **BharatGen Param2 Launch (News9):** https://www.news9live.com/technology/artificial-intelligence/bharatgen-param2-17b-moe-model-indian-languages-indiaai-impact-summit-2026-2932750
21. **NVIDIA India AI Mission:** https://blogs.nvidia.com/blog/india-ai-mission-infrastructure-models/
22. **India Enterprises Indic LLMs (NVIDIA Blog):** https://blogs.nvidia.com/blog/llms-indian-languages/
23. **Nemotron 3 Nano Benchmarks (llm-stats):** https://llm-stats.com/blog/research/nemotron-3-nano-launch
24. **Nemotron 3 Nano vs Qwen3 (llm-stats):** https://llm-stats.com/models/compare/nemotron-3-nano-30b-a3b-vs-qwen3-30b-a3b
25. **Nemotron Nano V2 VL Paper:** https://arxiv.org/html/2511.03929v2
26. **NIM on DGX Spark:** https://build.nvidia.com/spark/nim-llm
27. **Gnani + Nemotron Speech for Indic:** https://blogs.nvidia.com/blog/llms-indian-languages/
28. **NVIDIA NeMo Framework:** https://github.com/NVIDIA-NeMo/NeMo
