# Research: Qwen3 Family — Full Pipeline Evaluation for Annie

**Date:** 2026-03-31
**Status:** Complete research. No immediate adoption recommended. Several models worth monitoring.
**Scope:** Every Qwen3, Qwen3.5, and Qwen3-Next model evaluated against Annie's voice/AI pipeline on Panda (RTX 5070 Ti, 16 GB) and Titan (DGX Spark, 128 GB).

---

## Table of Contents

1. [Complete Model Inventory](#1-complete-model-inventory)
2. [Pipeline Slot 1: STT (Speech-to-Text)](#2-pipeline-slot-1-stt)
3. [Pipeline Slot 2: LLM (Language Model)](#3-pipeline-slot-2-llm)
4. [Pipeline Slot 3: TTS (Text-to-Speech)](#4-pipeline-slot-3-tts)
5. [Pipeline Slot 4: Omni (All-in-One Voice)](#5-pipeline-slot-4-omni-all-in-one-voice)
6. [Pipeline Slot 5: Vision (Phone Screen Reading)](#6-pipeline-slot-5-vision-phone-screen-reading)
7. [Pipeline Slot 6: Embedding (Semantic Search)](#7-pipeline-slot-6-embedding-semantic-search)
8. [Bonus: Qwen3-Coder](#8-bonus-qwen3-coder)
9. [Kannada Language Support Matrix](#9-kannada-language-support-matrix)
10. [Hardware Fit Matrix](#10-hardware-fit-matrix)
11. [Best-of-Breed Recommendation](#11-best-of-breed-recommendation)
12. [What to Monitor](#12-what-to-monitor)
13. [Sources](#13-sources)

---

## 1. Complete Model Inventory

### Qwen3 Family (released Apr 2025 -- Jan 2026)

| Model | Params | Active | Type | Release | License | Open Weights |
|-------|-------:|-------:|------|---------|---------|:------------:|
| Qwen3-0.6B | 0.6B | 0.6B | Dense LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-1.7B | 1.7B | 1.7B | Dense LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-4B | 4B | 4B | Dense LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-8B | 8B | 8B | Dense LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-16B-A3B | 16B | 3B | MoE LLM | Jul 2025 | Apache 2.0 | Yes |
| Qwen3-30B-A3B | 30B | 3B | MoE LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-32B | 32B | 32B | Dense LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-235B-A22B | 235B | 22B | MoE LLM | Apr 2025 | Apache 2.0 | Yes |
| Qwen3-ASR-0.6B | 0.6B | 0.6B | ASR | Jan 2026 | Apache 2.0 | Yes |
| Qwen3-ASR-1.7B | 1.7B | 1.7B | ASR | Jan 2026 | Apache 2.0 | Yes |
| Qwen3-TTS-0.6B | 0.6B | 0.6B | TTS | Jan 2026 | Apache 2.0 | Yes |
| Qwen3-TTS-1.7B | 1.7B | 1.7B | TTS | Jan 2026 | Apache 2.0 | Yes |
| Qwen3-VL-2B | 2B | 2B | Vision-LLM | Oct 2025 | Apache 2.0 | Yes |
| Qwen3-VL-4B | 4B | 4B | Vision-LLM | Oct 2025 | Apache 2.0 | Yes |
| Qwen3-VL-8B | 8B | 8B | Vision-LLM | Oct 2025 | Apache 2.0 | Yes |
| Qwen3-VL-32B | 32B | 32B | Vision-LLM | Oct 2025 | Apache 2.0 | Yes |
| Qwen3-VL-30B-A3B | 30B | 3B | Vision MoE | Oct 2025 | Apache 2.0 | Yes |
| Qwen3-VL-235B-A22B | 235B | 22B | Vision MoE | Oct 2025 | Apache 2.0 | Yes |
| Qwen3-Omni-30B-A3B | 30B | 3B | Omni (speech I/O) | Sept 2025 | Apache 2.0 | Yes |
| Qwen3-Embedding-0.6B | 0.6B | 0.6B | Embedding | Jun 2025 | Apache 2.0 | Yes |
| Qwen3-Embedding-4B | 4B | 4B | Embedding | Jun 2025 | Apache 2.0 | Yes |
| Qwen3-Embedding-8B | 8B | 8B | Embedding | Jun 2025 | Apache 2.0 | Yes |
| Qwen3-Coder-30B-A3B | 30B | 3B | Code MoE | 2025 | Apache 2.0 | Yes |
| Qwen3-Coder-480B-A35B | 480B | 35B | Code MoE | 2025 | Apache 2.0 | Yes |

### Qwen3.5 Family (released Feb -- Mar 2026)

| Model | Params | Active | Type | Release | License | Open Weights |
|-------|-------:|-------:|------|---------|---------|:------------:|
| Qwen3.5-0.8B | 0.8B | 0.8B | Dense LLM | Mar 2026 | Apache 2.0 | Yes |
| Qwen3.5-2B | 2B | 2B | Dense LLM | Mar 2026 | Apache 2.0 | Yes |
| Qwen3.5-4B | 4B | 4B | Dense LLM | Mar 2026 | Apache 2.0 | Yes |
| Qwen3.5-9B | 9B | 9B | Dense LLM | Mar 2026 | Apache 2.0 | Yes |
| Qwen3.5-27B | 27B | 27B | Dense LLM | Feb 2026 | Apache 2.0 | Yes |
| Qwen3.5-35B-A3B | 35B | 3B | MoE LLM | Feb 2026 | Apache 2.0 | Yes |
| Qwen3.5-122B-A10B | 122B | 10B | MoE LLM | Feb 2026 | Apache 2.0 | Yes |
| Qwen3.5-397B-A17B | 397B | 17B | MoE LLM | Feb 2026 | Apache 2.0 | Yes |
| Qwen3.5-Omni-Plus | ~397B? | ~17B? | Omni (speech I/O) | Mar 30, 2026 | TBD | **No** (API only) |
| Qwen3.5-Omni-Flash | ~35B? | ~3B? | Omni (speech I/O) | Mar 30, 2026 | TBD | **No** (API only) |
| Qwen3.5-Omni-Light | ~9B? | ~3B? | Omni (speech I/O) | Mar 30, 2026 | TBD | **No** (API only) |

### Qwen3-Next Family (Mar 2026)

| Model | Params | Active | Type | Release | License | Open Weights |
|-------|-------:|-------:|------|---------|---------|:------------:|
| Qwen3-Next-80B-A3B | 80B | 3.9B | Hybrid MoE LLM | Mar 2026 | Apache 2.0 | Yes |
| Qwen3-Coder-Next-80B-A3B | 80B | 3B | Code Hybrid MoE | Mar 2026 | Apache 2.0 | Yes |

### Qwen3-2507 Updates (Jul 2025 refresh)

| Model | Params | Active | Type | Notes |
|-------|-------:|-------:|------|-------|
| Qwen3-235B-A22B-Instruct-2507 | 235B | 22B | MoE LLM | Updated instruct tune |
| Qwen3-30B-A3B-Instruct-2507 | 30B | 3B | MoE LLM | Updated instruct tune |
| Qwen3-4B-Instruct-2507 | 4B | 4B | Dense LLM | Updated instruct tune |

---

## 2. Pipeline Slot 1: STT

### Current Solution
- **Titan:** Nemotron Speech 0.6B (431ms avg, 2.49 GB VRAM, English only)
- **Panda:** IndicConformerASR 600M (145ms, 303 MB, 22 Indian langs, no English) + Whisper medium (~2 GB, code-mixed fallback; later upgraded to large-v3 -- see Section 11)

### Qwen3 Alternative: Qwen3-ASR

| Feature | Qwen3-ASR-0.6B | Qwen3-ASR-1.7B | Nemotron Speech 0.6B | IndicConformerASR 600M |
|---------|:--------------:|:--------------:|:--------------------:|:---------------------:|
| Params | 0.6B | 1.7B | 0.6B | 600M |
| VRAM (est.) | ~2 GB | ~3.5 GB | 2.49 GB | 303 MB |
| Languages | 30 + 22 Chinese dialects | 30 + 22 Chinese dialects | English only | 22 Indian |
| **Hindi** | **Yes** | **Yes** | No | Yes |
| **Kannada** | **No** | **No** | No | **Yes** |
| Tamil | No | No | No | Yes |
| English | Yes | Yes | Yes | No |
| Code-mixed | Unknown (single-lang inference) | Unknown | No | No |
| Streaming | Yes | Yes | Yes (RNNT) | No |
| TTFT | ~211ms (vLLM benchmark) | ~215ms | ~130ms | 145ms (batch) |
| Throughput | 2000x at concurrency 128 | SOTA among open-source | — | 21x real-time |
| License | Apache 2.0 | Apache 2.0 | Open weights | MIT |
| aarch64 (Titan) | Needs torchaudio (blocker) | Needs torchaudio (blocker) | Works | Works |
| x86_64 (Panda) | Works | Works | Works | Works |

### Qwen3-ASR Language List (30 languages)

zh, en, yue, ar, de, fr, es, pt, id, it, ko, ru, th, vi, ja, tr, **hi**, ms, nl, sv, da, fi, pl, cs, fil, fa, el, hu, mk, ro

**Critical finding:** Qwen3-ASR supports Hindi but NOT Kannada, Tamil, Bengali, Telugu, or Marathi -- no Dravidian or other South Indian language is covered. The 22 Chinese dialects pad the "52 languages" marketing count but are irrelevant for Annie.

### Verdict: STT

**No change.** Keep current stack.

- **Titan (English voice):** Nemotron Speech 0.6B -- already optimized for our pipeline, lower TTFT
- **Panda (Indian langs):** IndicConformerASR 600M -- covers Kannada + all 22 Indian scheduled languages, only 303 MB
- **Panda (code-mixed):** Whisper -- auto-detect, good enough per "meaning over accuracy" principle; medium was later upgraded to large-v3 after benchmarking (Section 11)

Qwen3-ASR would be a downgrade for Annie because it lacks Kannada. It also has the torchaudio aarch64 blocker on Titan.
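The three-way routing in the verdict above can be sketched as a small dispatcher. This is a hypothetical sketch: the backend strings are labels for this document's models, not real API identifiers, and the language-code set is an illustrative subset of IndicConformerASR's 22 languages.

```python
# Hypothetical STT router mirroring the verdict above; backend names
# are labels, not real API identifiers.

# Illustrative subset of IndicConformerASR's 22 Indian languages.
INDIAN_LANGS = {"kn", "hi", "ta", "te", "bn", "mr", "ml", "gu"}

def route_stt(lang_hint: str) -> str:
    """Pick an STT backend from a language hint: 'en', an Indian
    language code like 'kn', or 'mixed' for code-mixed speech."""
    if lang_hint == "en":
        return "nemotron-speech-0.6b"   # Titan: English, lowest TTFT
    if lang_hint in INDIAN_LANGS:
        return "indicconformer-600m"    # Panda: 22 Indian languages
    return "whisper"                    # Panda: code-mixed fallback
```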

---

## 3. Pipeline Slot 2: LLM

### Current Solution
- **Titan:** Nemotron 3 Nano 30B-A3B NVFP4 (18 GB, 48-65 tok/s, voice + extraction + daily)
- **Beast:** Nemotron 3 Super 120B-A12B NVFP4 (~80 GB, text chat + background agents)

### Qwen3/3.5 LLM Alternatives

| Model | Total | Active | VRAM (NVFP4/INT4) | Tool Calling | Throughput | Fits Panda? | Fits Titan? |
|-------|------:|-------:|-------------------:|:------------:|:----------:|:-----------:|:-----------:|
| Qwen3.5-9B | 9B | 9B | ~5 GB (INT4) | Yes | Moderate | **Yes** | Yes |
| Qwen3.5-27B | 27B | 27B | ~15 GB (INT4) | Yes | Low | Tight | Yes |
| Qwen3.5-35B-A3B | 35B | 3B | ~18 GB (NVFP4) | Yes (thinking) | High | Tight | Yes |
| Qwen3.5-122B-A10B | 122B | 10B | ~75 GB (NVFP4) | Yes | 14-15 tok/s | No | Yes (tight) |
| Qwen3-Next-80B-A3B | 80B | 3.9B | ~40 GB (INT4) | Yes | Very high | No | Yes |
| Qwen3.5-397B-A17B | 397B | 17B | ~100 GB (NVFP4) | Yes | ~152 tok/s | No | Barely |
| **Nemotron Nano 30B** | 30B | 3B | **18 GB** | Yes (89.1% AIME) | **48-65 tok/s** | No | **Yes (current)** |
| **Nemotron Super 120B** | 120B | 12B | **~80 GB** | Yes (97%) | **458-484 tok/s** | No | **Beast (current)** |

### Head-to-Head: Nemotron Nano vs Qwen3-30B-A3B

| Benchmark | Nemotron Nano 30B | Qwen3-30B-A3B |
|-----------|:-----------------:|:-------------:|
| AIME 2025 (no tools) | **89.1%** | 85.0% |
| AIME 2025 (with tools) | **99.2%** | — |
| Arena-Hard-v2 | **67.7%** | 57.8% |
| Throughput (H200, 8K/16K) | **3.3x higher** | 1x baseline |

Nemotron Nano wins on tool calling accuracy AND throughput, both critical for Annie's real-time voice pipeline.

### Qwen3-Next-80B-A3B: Interesting but Impractical

Qwen3-Next-80B-A3B uses a hybrid Transformer-Mamba architecture (the same design direction as Nemotron). With only 3.9B active params, it reportedly outperforms Qwen3-235B on many benchmarks. But:
- ~40 GB INT4 -- too large for Panda, would consume most of Titan alongside other models
- No NVFP4 yet
- Untested on DGX Spark aarch64

### Verdict: LLM

**No change.** Nemotron Nano remains the best choice for Titan voice pipeline (tool calling + throughput + VRAM efficiency). Nemotron Super on Beast handles heavy reasoning.

The only Qwen that could replace Nano would be Qwen3.5-35B-A3B (~18 GB NVFP4), but Nemotron Nano beats it on tool calling benchmarks and throughput, which are the two most important metrics for a real-time voice agent.

---

## 4. Pipeline Slot 3: TTS

### Current Solution
- **Titan:** Kokoro v0.19 (0.5 GB, ~30ms first-packet, English only)
- **Panda:** IndicF5 (1.7 GB, RTF 0.808, 11 Indian languages, voice cloning)

### Qwen3 Alternative: Qwen3-TTS

| Feature | Qwen3-TTS-0.6B | Qwen3-TTS-1.7B | Kokoro v0.19 | IndicF5 |
|---------|:--------------:|:--------------:|:------------:|:-------:|
| Params | 0.6B | 1.7B | 82M | 400M |
| VRAM | ~2-5 GB | ~2-7 GB | 0.5 GB | 1.7 GB |
| First-packet latency | **97ms** | ~120ms | **~30ms** | ~2100ms (RTF 0.808) |
| Languages | 10 (CJK + Western) | 10 (CJK + Western) | English | 11 Indian |
| **Kannada** | **No** | **No** | No | **Yes** |
| Hindi | No | No | No | **Yes** |
| Voice cloning | Yes (3-sec) | Yes (3-sec) | No | Yes (3-sec) |
| Quality | Claimed SOTA (vs ElevenLabs) | Claimed SOTA | Good | Good |
| Natural voice control | Yes ("speak warmly") | Yes | No | No |
| Streaming | Yes | Yes | Yes | No |
| aarch64 (Titan) | **Broken** (torchaudio) | **Broken** (torchaudio) | **Works** (patched) | N/A (Panda only) |
| x86_64 (Panda) | Works | Works | Works | Works |

### Qwen3-TTS Supported Languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian.

**No Indian languages whatsoever.** No Kannada, Hindi, Tamil, Bengali, Telugu, Marathi, Malayalam, or Gujarati.

### DGX Spark torchaudio Fix (NEW)

A fix has been reported: `uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130` pulls aarch64 CUDA wheels that include torchaudio. However, these are CUDA 13.0 wheels, which may conflict with Titan's CUDA 12.8 environment. Not yet verified on our hardware.

### Could Qwen3-TTS Replace Kokoro on Panda for English?

Theoretically yes -- better quality, voice cloning, natural voice control. But:
1. 4-14x more VRAM (2-7 GB vs 0.5 GB)
2. 3x higher first-packet latency (97ms vs 30ms)
3. Adds no Indian language capability
4. torchaudio dependency adds fragility

### Verdict: TTS

**No change.** Keep Kokoro on Titan for English, IndicF5 on Panda for Indian languages.

Qwen3-TTS offers superior quality but no Indian language support, higher latency than Kokoro, and much higher VRAM. If Annie ever needs high-quality English voice cloning AND we have VRAM headroom on Panda, the 0.6B CustomVoice variant would be the one to try -- but that is a nice-to-have, not a need.

---

## 5. Pipeline Slot 4: Omni (All-in-One Voice)

### Current Solution
Cascade pipeline: STT (Nemotron Speech/IndicConformer) -> LLM (Nemotron Nano) -> TTS (Kokoro/IndicF5)
Total VRAM: ~21-23 GB on Titan, ~4 GB on Panda
Total latency: ~680ms (Titan English), ~2.7s (Panda Kannada)

### Qwen3 Alternatives

| Model | Total Params | Active | VRAM (INT4) | End-to-End Latency | Speech In | Speech Out | Open Weights |
|-------|:-----------:|:------:|:-----------:|:------------------:|:---------:|:----------:|:------------:|
| Qwen3-Omni-30B-A3B | 30B | 3B | ~18 GB (Q4_K_M) | 234ms first-packet | 119 langs | 10 langs | **Yes** |
| Qwen3.5-Omni-Plus | ~397B? | ~17B? | ~100+ GB | Unknown | 113 langs | 36 langs | No |
| Qwen3.5-Omni-Flash | ~35B? | ~3B? | ~20 GB (est.) | Unknown | 113 langs | 36 langs | No |
| Qwen3.5-Omni-Light | ~9B? | ~3B? | ~5-8 GB (est.) | Unknown | 113 langs | 36 langs | No |

### Qwen3-Omni-30B-A3B (the one with open weights)

This is the only Omni model you can actually run today. Key specs:
- **Architecture:** Thinker (MoE, text reasoning) + Talker (speech generation)
- **Speech input:** 119 languages (ASR built-in)
- **Speech output:** 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
- **No Kannada speech output.** Same 10 languages as Qwen3-TTS.
- **Kannada speech input:** Unknown -- 119 languages is broad but the specific list is not published. Hindi is likely included; Kannada is uncertain.
- **VRAM:** ~18 GB (Q4_K_M GGUF) -- fits Panda or Titan
- **DGX Spark (Titan):** Needs torchaudio (same blocker). On Panda (x86_64), should work.
- **Serving:** vLLM-Omni project supports it; Ollama does NOT support speech output

### Qwen3.5-Omni (announced Mar 30, 2026)

**No open weights.** API-only via Alibaba Cloud Bailian. Three sizes announced (Plus/Flash/Light) but:
- No download links on HuggingFace
- No VRAM measurements possible
- No aarch64 testing possible
- License not confirmed
- Release of open weights: unknown timeline (could be weeks to months)

The Flash variant (~35B/3B, ~20 GB INT4) would be the most interesting for Annie IF/WHEN it drops. It claims 113-language ASR + 36-language TTS, which would cover far more Indian languages than the Qwen3-Omni's 10.

### Verdict: Omni

**Not now. Monitor Qwen3.5-Omni-Flash/Light for open weight release.**

Qwen3-Omni-30B-A3B (the one we CAN run) has two blockers:
1. **No Kannada speech output** (only 10 languages)
2. **torchaudio aarch64 blocker** on Titan

It COULD work on Panda as an all-in-one English voice agent (~18 GB Q4_K_M), but we already have a working pipeline that is better optimized per component.

The game-changer would be Qwen3.5-Omni-Flash with open weights -- its 36 TTS languages likely include Hindi and possibly Kannada. Worth monitoring.

---

## 6. Pipeline Slot 5: Vision (Phone Screen Reading)

### Current Solution
No dedicated vision model deployed. Phone screen reading is planned for the Pixel 9a via ADB screenshots + OCR.

### Qwen3-VL Family

| Model | Total | Active | VRAM (BF16) | VRAM (INT4) | OCR Languages | Fits Panda? | Fits Titan? |
|-------|------:|-------:|:-----------:|:-----------:|:-------------:|:-----------:|:-----------:|
| Qwen3-VL-2B | 2B | 2B | ~4 GB | ~2 GB | 32 | **Yes** | Yes |
| Qwen3-VL-4B | 4B | 4B | ~8 GB | ~3 GB | 32 | **Yes** | Yes |
| Qwen3-VL-8B | 8B | 8B | ~16 GB | ~5-6 GB | 32 | Tight | Yes |
| Qwen3-VL-30B-A3B | 30B | 2.4B (vision active) | ~57 GB | ~18 GB | 32 | Tight | Yes |
| Qwen3-VL-32B | 32B | 32B | ~64 GB | ~21 GB | 32 | No | Yes |

### Key Capabilities for Annie's Phone Use

- **32-language OCR** (expanded from 19 in Qwen2-VL)
- Low-light, blur, and tilt-robust recognition
- Phone screen understanding (UI elements, buttons, text)
- Document parsing with layout + position info
- Video understanding (up to 400s at 1 FPS)
- 256K context (extendable to 1M)
- Instruct and Thinking variants available

### Best Fit for Panda (16 GB)

**Qwen3-VL-2B** (INT4: ~2 GB) or **Qwen3-VL-4B** (INT4: ~3 GB):
- Load on-demand for phone screen reading
- Unload when not needed to preserve VRAM for STT/TTS
- Qwen3-VL-2B + IndicConformerASR + IndicF5 + Whisper = ~2 + 0.3 + 1.7 + 2 = ~6 GB total. Fits easily.
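The VRAM arithmetic above can be double-checked in a few lines. The figures are this document's estimates, not measurements:

```python
# Panda (16 GB) VRAM budget check using this document's estimates, in GB.
PANDA_VRAM_GB = 16.0

models_gb = {
    "qwen3-vl-2b (INT4)": 2.0,
    "indicconformer-600m": 0.303,
    "indicf5": 1.7,
    "whisper-medium": 2.0,
}

total_gb = sum(models_gb.values())
headroom_gb = PANDA_VRAM_GB - total_gb
print(f"total={total_gb:.1f} GB, headroom={headroom_gb:.1f} GB")
# total=6.0 GB, headroom=10.0 GB
```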

### Best Fit for Titan (128 GB)

**Qwen3-VL-8B** (INT4: ~5-6 GB) would be the quality sweet spot. Could load via Ollama with KEEP_ALIVE=30s like the embedding model.

Or **Qwen3-VL-30B-A3B** (INT4: ~18 GB) for maximum quality at MoE efficiency -- same VRAM as Nemotron Nano.

### Verdict: Vision

**ADOPT Qwen3-VL-2B on Panda for phone screen reading when Pixel 9a arrives.** This is the clearest win in the entire Qwen3 family for Annie.

- Only ~2 GB VRAM (INT4), loads on-demand
- 32-language OCR covers Hindi + Kannada script on screen
- Handles UI element recognition, document parsing
- Available via Ollama (`ollama run qwen3-vl:2b`)
- No torchaudio dependency
- x86_64 Panda has no aarch64 issues

For higher-quality vision tasks (photo understanding, complex documents), Qwen3-VL-8B on Titan via Ollama would be the upgrade path.
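Since Qwen3-VL-2B would be served through Ollama, screen reading reduces to one HTTP call against Ollama's `/api/generate` endpoint with the screenshot passed as a base64 image. A minimal sketch, assuming the `qwen3-vl:2b` tag from the verdict above and a default Ollama install on localhost; the prompt wording is illustrative:

```python
import base64
import json

def build_screen_read_request(png_bytes: bytes, question: str) -> dict:
    """Build an Ollama /api/generate payload: prompt text plus the
    screenshot as a base64 string; stream=False yields one JSON reply."""
    return {
        "model": "qwen3-vl:2b",
        "prompt": question,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,
    }

# e.g. screenshot bytes captured with `adb exec-out screencap -p`
payload = build_screen_read_request(b"<png bytes>", "List the buttons on screen.")
body = json.dumps(payload)  # POST to http://localhost:11434/api/generate
```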

---

## 7. Pipeline Slot 6: Embedding (Semantic Search)

### Current Solution
- **Titan:** qwen3-embedding:8b via Ollama (14 GB, KEEP_ALIVE=30s)

### Qwen3-Embedding Family

| Model | Params | VRAM (est.) | MTEB Score | Languages | Fits Panda? |
|-------|-------:|:-----------:|:----------:|:---------:|:-----------:|
| Qwen3-Embedding-0.6B | 0.6B | ~1-2 GB | Good | 100+ | **Yes** |
| Qwen3-Embedding-4B | 4B | ~3-5 GB | Very good | 100+ | **Yes** |
| **Qwen3-Embedding-8B** | **8B** | **~14 GB** | **#1 MTEB multilingual (70.58)** | **100+** | Tight |

### Analysis

Annie already runs the strongest embedding model in the Qwen3 family (8B, #1 on the MTEB multilingual leaderboard), deployed on Titan.

**For Panda**, if local embedding is ever needed (e.g., for phone-local semantic search):
- Qwen3-Embedding-0.6B (~1-2 GB) would be the right choice
- Lightweight enough to coexist with STT + TTS + Vision models

### Key Features

- Instruction-based embeddings (task-specific prompts improve results)
- Flexible vector dimensions (can reduce for storage efficiency)
- 32K context length (can process long documents)
- Multilingual including Indian languages
- Dual-encoder architecture optimized for retrieval
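The instruction-based embedding feature above works by prefixing the *query* side with a task description while documents are embedded as-is; the `Instruct:`/`Query:` template below follows the Qwen3-Embedding model card, though the task wording here is an illustrative placeholder:

```python
def format_query(task: str, query: str) -> str:
    """Wrap a query with its task instruction, per Qwen3-Embedding's
    instruction-aware format; documents get no prefix."""
    return f"Instruct: {task}\nQuery: {query}"

q = format_query(
    "Given a user question, retrieve memory entries that answer it",
    "what did Mom ask about yesterday",
)
```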

### Verdict: Embedding

**No change.** Already using the best available model (Qwen3-Embedding-8B on Titan). No Qwen3.5 embedding model exists yet.

---

## 8. Bonus: Qwen3-Coder

### Models

| Model | Total | Active | VRAM (INT4) | Key Benchmark | Fits Panda? | Fits Titan? |
|-------|------:|-------:|:-----------:|:-------------:|:-----------:|:-----------:|
| Qwen3-Coder-30B-A3B | 30B | 3B | ~18 GB | Good | Tight | Yes |
| Qwen3-Coder-Next-80B-A3B | 80B | 3B | ~40 GB | SWE-Bench ~70% | No | Yes |
| Qwen3-Coder-480B-A35B | 480B | 35B | ~200+ GB | SOTA open-source | No | No |

### Relevance to Annie

Annie already uses Claude Code CLI for code tasks (paper-to-notebook, self-programming). A local coding model could reduce API costs for simple code generation tasks.

**Qwen3-Coder-Next-80B-A3B** is interesting: 80B total but only 3B active params, with SWE-Bench-Pro scores comparable to models 10-20x larger. But at ~40 GB INT4, it would consume significant Titan VRAM alongside Nemotron Nano.

### Verdict: Coder

**Not needed now.** Claude Code CLI handles coding tasks. If we ever want a local code model, Qwen3-Coder-Next-80B-A3B would be the one -- but only after verifying it fits alongside existing Titan workloads.

---

## 9. Kannada Language Support Matrix

This is the most critical question for Annie. Mom speaks Kannada; Rajesh mixes Kannada-English.

| Model | Kannada STT | Kannada TTS | Hindi STT | Hindi TTS | English |
|-------|:-----------:|:-----------:|:---------:|:---------:|:-------:|
| Qwen3-ASR-0.6B/1.7B | **No** | N/A | **Yes** | N/A | Yes |
| Qwen3-TTS-0.6B/1.7B | N/A | **No** | N/A | **No** | Yes |
| Qwen3-Omni-30B-A3B | Uncertain | **No** (10 langs) | Likely | **No** | Yes |
| Qwen3.5-Omni-Flash (announced) | Likely (113 ASR) | **Maybe** (36 TTS) | Likely | Likely | Yes |
| IndicConformerASR 600M | **Yes** | N/A | **Yes** | N/A | No |
| IndicF5 | N/A | **Yes** | N/A | **Yes** | Weak |
| Kokoro | N/A | No | N/A | No | **Yes** |
| Nemotron Speech 0.6B | No | N/A | No | N/A | **Yes** |
| Nemotron Nano 30B (LLM) | Understands | Generates text | **Yes** | **Yes** | **Yes** |

**Key insight:** No Qwen3 model currently supports Kannada in any modality. The only models with Kannada support are AI4Bharat models (IndicConformerASR, IndicF5) already deployed on Panda. The Nemotron Nano LLM understands and generates Kannada text but does not do speech.

**The Qwen3.5-Omni series** (when open weights drop) MIGHT change this -- 113-language ASR and 36-language TTS could include Kannada. But this is unconfirmed and unavailable.

---

## 10. Hardware Fit Matrix

### Panda (RTX 5070 Ti, 16 GB VRAM, x86_64)

| Model | VRAM (INT4/Q4) | Coexist with STT+TTS? | Use Case | Recommendation |
|-------|:--------------:|:----------------------:|----------|:--------------:|
| Qwen3-VL-2B | ~2 GB | **Yes** (total ~6 GB) | Phone screen reading | **ADOPT** |
| Qwen3-VL-4B | ~3 GB | **Yes** (total ~7 GB) | Better phone OCR | Consider |
| Qwen3-Embedding-0.6B | ~1-2 GB | Yes | Local semantic search | If needed |
| Qwen3-TTS-0.6B | ~2-5 GB | Yes (if Kokoro removed) | English voice cloning | Not needed |
| Qwen3-ASR-0.6B | ~2 GB | Yes | Hindi STT (no Kannada) | Not needed |
| Qwen3.5-9B | ~5 GB | Tight | Small local LLM | Not needed |
| Qwen3-Omni-30B-A3B | ~18 GB Q4 | **No** (alone fills GPU) | All-in-one voice | Not practical |

### Titan (DGX Spark, 128 GB unified, aarch64)

| Model | VRAM | Alongside Nemotron Nano (18 GB)? | Use Case | Recommendation |
|-------|:----:|:--------------------------------:|----------|:--------------:|
| Qwen3-VL-8B | ~5-6 GB | **Yes** (total ~65 GB) | High-quality vision | Consider (Ollama) |
| Qwen3.5-35B-A3B | ~18 GB | Yes (total ~77 GB) | Alternative LLM | Not better than Nano |
| Qwen3.5-122B-A10B | ~75 GB | **No** (would need to replace Nano) | Powerful LLM | Use Beast instead |
| Qwen3-Coder-Next-80B-A3B | ~40 GB | Tight (total ~100 GB) | Local code agent | Not needed |
| Qwen3-Omni-30B-A3B | ~18 GB Q4 | Yes (if torchaudio fixed) | All-in-one voice | Blocked by torchaudio |

### Beast (DGX Spark, 128 GB unified, aarch64)

Beast is already running Nemotron Super 120B (~80 GB). No room for additional large models. Small utility models could coexist:
- Qwen3-Embedding-0.6B (~1-2 GB) -- if needed for Beast-local search
- Qwen3-VL-2B (~2 GB) -- if Beast handles vision tasks

---

## 11. Best-of-Breed Recommendation

### Current Pipeline -- Best-of-Breed (Session 380, benchmarked)

| Slot | Component | Model | Location | VRAM | Latency | Status |
|------|-----------|-------|----------|-----:|--------:|--------|
| STT (English) | Nemotron Speech 0.6B | Titan | 2.49 GB | 431ms | Deployed |
| STT (Indian, pure) | IndicConformerASR 600M | Panda | 303 MB | **145ms** | Deployed |
| STT (code-mixed) | **Whisper large-v3** | Panda | **6,029 MB** | **805ms** | **Deployed** (perfect Kannada) |
| LLM (voice) | Nemotron Nano 30B NVFP4 | Titan | 18 GB | ~500ms | Deployed |
| LLM (text/agents) | Nemotron Super 120B NVFP4 | Beast | ~80 GB | — | Deployed |
| TTS (English) | Kokoro v0.19 | Titan | 0.5 GB | 30ms | Deployed |
| TTS (Indian) | **IndicF5 EPSS7+BF16** | Panda | **1,347 MB** | **285ms** | **Deployed** (RTF 0.082) |
| Vision | **Qwen3-VL-2B** | Panda | **1,900 MB** | — | **Deployed** (Ollama 0.19.0) |
| Embedding | qwen3-embedding:8b | Titan | 14 GB | — | Deployed |

**Panda VRAM total: ~9.6 GB / 16 GB (~40% free)**

**Full voice pipeline: ~1.0s (pure Kannada) or ~1.6s (code-mixed) -- conversational speed.**

### Key Decisions from Benchmarking

- **Whisper medium → large-v3**: Medium garbles Kannada ("ನಾಮಸ್ಕರಾ ನಾನೆ ವಾನೆ"). Large-v3 is perfect ("ನಮಸ್ಕಾರ ನಾನು ಆನೀ"). Worth the 3 GB VRAM increase.
- **Whisper large-v3-turbo rejected**: Fastest (226ms) but misdetects Kannada as Tamil in auto-detect mode.
- **IndicF5 NFE32 → EPSS7+BF16**: 8x speedup (2284ms → 285ms). FP16 is broken (the vocoder emits ComplexHalf noise); BF16 works because its 8-bit exponent matches FP32's dynamic range.
- **Qwen3-VL-2B deployed**: Phone screen reading ready for when Pixel 9a arrives.
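The IndicF5 figures above are easy to sanity-check, since RTF = synthesis time / audio duration. The implied clip length is inferred from the quoted numbers, not a measurement:

```python
# Sanity-check the quoted IndicF5 numbers (RTF = synth time / audio duration).
nfe32_ms, epss7_ms = 2284.0, 285.0
speedup = nfe32_ms / epss7_ms              # quoted as 8x

rtf = 0.082
clip_s = (epss7_ms / 1000.0) / rtf         # implied benchmark-clip length
print(f"speedup={speedup:.1f}x, clip~{clip_s:.1f}s")
# speedup=8.0x, clip~3.5s
```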

### Still to Monitor

| Slot | Model | Location | VRAM | Priority | When |
|------|-------|----------|-----:|:--------:|------|
| Vision (upgrade) | Qwen3-VL-8B | Titan (Ollama) | ~5-6 GB | LOW | If 2B quality insufficient |
| Omni (future) | Qwen3.5-Omni-Flash | TBD | ~20 GB? | MONITOR | When open weights drop |

### What Would Make Qwen3 Worth Adopting?

1. **Qwen3.5-Omni-Flash open weights** -- If it truly supports 36-language TTS including Kannada, it could replace the entire cascade pipeline (STT+LLM+TTS) with a single model. Monitor HuggingFace.
2. **torchaudio aarch64 fix verified on Titan** -- Unblocks Qwen3-TTS and Qwen3-Omni on DGX Spark. The cu130 wheel fix needs testing.
3. **Qwen3-ASR adding Kannada** -- If a future Qwen3-ASR version covers Dravidian languages, it could replace IndicConformerASR for a more unified stack.

---

## 12. What to Monitor

| Item | Why | Check Frequency |
|------|-----|:---------------:|
| Qwen3.5-Omni-Flash open weights on HuggingFace | Could replace entire voice cascade | Weekly |
| Qwen3.5-Omni 36-language TTS list | Confirm Kannada/Hindi in TTS output | When weights drop |
| torchaudio aarch64 CUDA cu130 fix on DGX Spark | Unblocks all Qwen speech models on Titan | Monthly |
| Qwen3-ASR language expansion (Dravidian) | Replace IndicConformerASR if Kannada added | Quarterly |
| Qwen3-Embedding updates (larger models) | May be irrelevant -- already using 8B #1 | Quarterly |
| Qwen3-VL updates (VL-2B Thinking variant?) | Better phone screen understanding | Quarterly |

---

## 13. Sources

### Official Qwen Repositories
- [QwenLM/Qwen3 -- GitHub](https://github.com/QwenLM/Qwen3)
- [QwenLM/Qwen3.5 -- GitHub](https://github.com/QwenLM/Qwen3.5)
- [QwenLM/Qwen3-Omni -- GitHub](https://github.com/QwenLM/Qwen3-Omni)
- [QwenLM/Qwen3-ASR -- GitHub](https://github.com/QwenLM/Qwen3-ASR)
- [QwenLM/Qwen3-TTS -- GitHub](https://github.com/QwenLM/Qwen3-TTS)
- [QwenLM/Qwen3-VL -- GitHub](https://github.com/QwenLM/Qwen3-VL)
- [QwenLM/Qwen3-Embedding -- GitHub](https://github.com/QwenLM/Qwen3-Embedding)
- [QwenLM/Qwen3-Coder -- GitHub](https://github.com/QwenLM/Qwen3-Coder)

### HuggingFace Model Cards
- [Qwen/Qwen3-Omni-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct)
- [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)
- [Qwen/Qwen3-ASR-0.6B](https://huggingface.co/Qwen/Qwen3-ASR-0.6B)
- [Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)
- [Qwen/Qwen3-TTS-12Hz-0.6B-Base](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base)
- [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B)
- [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B)
- [Qwen/Qwen3-VL-2B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct-GGUF)
- [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)
- [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next)
- [Qwen3.5 Collection -- HuggingFace](https://huggingface.co/collections/Qwen/qwen35)

### Technical Reports and Benchmarks
- [Qwen3-Omni Technical Report -- arXiv](https://arxiv.org/abs/2509.17765)
- [Qwen3-ASR Technical Report -- arXiv](https://arxiv.org/abs/2601.21337)
- [Qwen3-TTS Technical Report -- arXiv](https://arxiv.org/abs/2601.15621)
- [Qwen3 Embedding Blog -- Qwen](https://qwenlm.github.io/blog/qwen3-embedding/)
- [Qwen3-Coder Blog -- Qwen](https://qwenlm.github.io/blog/qwen3-coder/)
- [Qwen3.5 Blog -- Qwen](https://qwen.ai/blog?id=qwen3.5)
- [Qwen3-Coder-Next Blog -- Qwen](https://qwen.ai/blog?id=qwen3-coder-next)

### DGX Spark / aarch64 Compatibility
- [vLLM Issue #35519 -- NVFP4 ARM64 Bug (fixed)](https://github.com/vllm-project/vllm/issues/35519)
- [torchaudio ARM64 Issue -- NVIDIA Forum](https://forums.developer.nvidia.com/t/support-for-qwen3-tts-on-dgx-spark-gb10-torchaudio-installation-failure-on-arm64/359663)
- [vLLM-Omni for Qwen3-TTS on DGX Spark](https://forums.developer.nvidia.com/t/running-vllm-omni-for-qwen3-tts-voice-design-voice-clone-on-dgx-spark/361255)
- [NVFP4 Qwen3-Coder-30B on DGX Spark](https://astrujic.medium.com/nvfp4-qwen3-coder-30b-a3b-instruct-on-dgx-spark-a44cdf0df858)

### Comparison and Analysis
- [Qwen3.5-35B vs Nemotron Nano -- AwesomeAgents](https://awesomeagents.ai/tools/qwen-3-5-35b-a3b-vs-nemotron-3-nano/)
- [Nemotron Nano vs Qwen3-30B -- llm-stats](https://llm-stats.com/models/compare/nemotron-3-nano-30b-a3b-vs-qwen3-30b-a3b)
- [Qwen3-VL OCR and Document Processing -- DeepWiki](https://deepwiki.com/QwenLM/Qwen3-VL/5.1-ocr-and-document-processing)
- [Qwen3.5-Omni 215 SOTA Benchmarks -- ToolMesh](https://www.toolmesh.ai/news/qwen3-5-omni-model-released-sota-vibe-coding)
- [Qwen3.5-Omni Multimodal Launch -- Aihola](https://aihola.com/article/qwen35-omni-multimodal-voice-launch)
- [MTEB Leaderboard -- Modal](https://modal.com/blog/mteb-leaderboard-article)
- [Qwen Wikipedia](https://en.wikipedia.org/wiki/Qwen)
- [LM Studio Qwen3 Models](https://lmstudio.ai/models/qwen3)
- [Ollama Qwen3-VL](https://ollama.com/library/qwen3-vl)
- [Hardware Requirements for Qwen3 -- Hardware Corner](https://www.hardware-corner.net/qwen3-coder-next-hardware-requirements/)
- [GPU Requirements for Qwen3 -- apxml](https://apxml.com/posts/qwen-3-5-system-requirement-vram-guide)
- [Best Vision Models Locally -- InsiderLLM](https://insiderllm.com/guides/vision-models-locally/)
- [Qwen3.5-Omni -- MarkTechPost](https://www.marktechpost.com/2026/03/30/alibaba-qwen-team-releases-qwen3-5-omni-a-native-multimodal-model-for-text-audio-video-and-realtime-interaction/)
