{
  "timestamp": "2026-04-14T23:00+05:30",
  "updated": "2026-04-15 (post-research correction)",
  "verdict": "deferred_pending_vllm_omni_source_build_on_aarch64",
  "verdict_correction_note": "ORIGINAL VERDICT WAS WRONG. I tested mainline `vllm` (which does not register voxtral_tts) but FAILED to test the SEPARATE `vllm-omni` pip package, which DOES register it via `vllm_omni/model_executor/models/registry.py` (module `voxtral_tts`, classes VoxtralTTSForConditionalGeneration, VoxtralTTSAudioGeneration, VoxtralTTSAudioTokenizer). The correct runtime is `vllm-omni serve … --omni` (NOT `vllm serve`, which falls back to mainline). Primary-source confirmation: https://github.com/vllm-project/vllm-omni/blob/main/vllm_omni/model_executor/models/registry.py, plus issue #2388, where another user hit our exact error and Mistral contributor y123456y78 confirmed the --omni fix. See also the TTS Q2 2026 roadmap at vllm-omni issue #2115.",
  "rationale_original_wrong": "Voxtral-4B-TTS-2603 reports model_type='voxtral_tts' which has no registry entry in MAINLINE vLLM — correct observation, wrong conclusion. Only 'voxtral' (STT) and 'voxtral_realtime' are in mainline. The TTS variant lives in the vllm-omni package, not mainline.",

  "install_paths_probed": {
    "2A_vllm_omni_docker": {
      "status": "blocked_but_non_blocking",
      "reason": "The vllm/vllm-omni Docker Hub repo publishes amd64-only images. But Docker is NOT the only install path — a source build of the `vllm-omni` pip package on aarch64 is the maintainer-recommended path."
    },
    "2B_vllm_main": {
      "status": "confirmed_does_not_support_voxtral_tts",
      "reason": "Mainline vllm (0.18.2rc1 production + 0.19.1rc1 nightly) deliberately does NOT register voxtral_tts — TTS arch lives in the separate vllm-omni package. Testing mainline was looking in the wrong place."
    },
    "2C_hf_transformers": {
      "status": "disallowed_by_vendor",
      "reason": "Mistral HF card sets 'inference: false'. Use vllm-omni instead."
    },
    "2D_mudler_pure_c": {
      "status": "not_tested_this_session",
      "reason": "vllm-omni is now the recommended path; mudler/voxtral-tts.c is a fallback."
    },
    "2E_vllm_omni_source_build_aarch64": {
      "status": "UNTESTED_BUT_IS_THE_CORRECT_PATH",
      "reason": "The pip package `vllm-omni` (NOT Docker) registers voxtral_tts. On aarch64 Titan it must be source-built: `git clone https://github.com/vllm-project/vllm-omni && cd vllm-omni && uv pip install -e .`. Maintainers explicitly recommend the source build given fast iteration. No verified DGX Spark deploy reports in issues as of 2026-04-14, but no known blockers either.",
      "command": "vllm-omni serve mistralai/Voxtral-4B-TTS-2603 --tokenizer-mode mistral --omni --stage-configs-path vllm_omni/model_executor/stage_configs/voxtral_tts.yaml",
      "latest_versions": {
        "stable": "vllm-omni v0.18.0 (2026-03-28)",
        "rc": "v0.19.0rc1 (2026-04-04)",
        "main_head": "weekly commits from Mistral team (patrickvonplaten)"
      },
      "open_prs_to_watch": [
        "#2790 (2026-04-14) — ref_audio upload fix, likely needed for voice-clone from our Samantha ref",
        "#2405 (2026-04-01) — VoxtralTTSConfig @strict AttributeError fix, cherry-pick if hitting recent transformers"
      ]
    }
  },

  "phases_completed": {
    "A0_samantha_extraction": {
      "status": "complete",
      "output": "3 female voice candidates extracted + pitch-filtered",
      "user_confirmation": "2026-04-14 — user confirmed samantha_movie_v3_dearTheodore_35s.wav is 100% Samantha (volume was low, normalized via ffmpeg loudnorm -16 LUFS → RMS 512->4863, 9.5x boost). Promoted to samantha_movie_primary.wav."
    },
    "0_gemma_baseline": {
      "status": "complete",
      "p50_ms": 190,
      "abort_threshold_2x_ms": 380
    },
    "1_voxtral_feasibility": "complete_as_no-go — 2A/2B/2C blocked, 2D not tested (original conclusion; superseded by the 2E correction in verdict_correction_note)",
    "2_voxtral_container": "failed_deterministically",
    "3_benchmark_utterances": "not_reached",
    "4_ab_scoring": "not_reached",
    "5_post_flight": "complete — Gemma restored cleanly at p50=190ms, drift 1.0x"
  },

  "gemma_impact": {
    "baseline_p50_ms": 190,
    "post_flight_p50_ms": 190,
    "drift_ratio": 1.0,
    "outage_duration_minutes": 10,
    "user_mode": "pause-gemma",
    "post_flight_gate_passed": true
  },

  "voxtral_weights_cached_gb": 7.5,
  "voxtral_weights_path_titan": "~/.cache/huggingface/hub/models--mistralai--Voxtral-4B-TTS-2603/",
  "voxtral_total_download_time_minutes": 6,
  "vllm_nightly_image_size_gb": 8.81,
  "vllm_nightly_pull_time_minutes": 9,

  "samantha_references": {
    "primary": {
      "path": "services/audio-pipeline/voice-references/samantha_movie_primary.wav",
      "originally": "samantha_movie_v3_dearTheodore_35s.wav",
      "source_timestamp_s": 5750,
      "duration_s": 34.7,
      "median_f0_hz": 183.2,
      "text_preview": "It's good, it's good. It's really good, it's good. Okay? Listen. Dear Theodore Twombly...",
      "user_confirmed_samantha": true,
      "volume_normalized": true,
      "normalization_filter": "ffmpeg loudnorm I=-16:LRA=11:TP=-1.5"
    },
    "alternates": [
      "services/audio-pipeline/voice-references/samantha_movie_v2_keepwalking_38s.wav (camera-directing scene, 37.6s, 207Hz — not yet user-confirmed)",
      "services/audio-pipeline/voice-references/samantha_movie_v1_goodish_41s.wav (41.1s, 190.5Hz — not user-confirmed; possibly rejected)"
    ]
  },

  "recommendation_for_user_CORRECTED_2026-04-15": [
    "PRIMARY PATH: source-build vllm-omni on Titan: `git clone https://github.com/vllm-project/vllm-omni && cd vllm-omni && uv pip install -e .`. Launch with `vllm-omni serve mistralai/Voxtral-4B-TTS-2603 --tokenizer-mode mistral --omni --stage-configs-path vllm_omni/model_executor/stage_configs/voxtral_tts.yaml`. This is the MAINTAINER-RECOMMENDED path and the only first-party supported runtime for voxtral_tts. No pre-built aarch64 wheel exists, so the source build is required, but there are no known blockers.",
    "Watch PRs: #2790 (ref_audio upload fix, 2026-04-14) and #2405 (VoxtralTTSConfig @strict fix, 2026-04-01). Cherry-pick if hitting transformers version conflicts.",
    "PARALLEL PATH (low risk, immediate value): swap services/audio-pipeline/voice-references/samantha_movie_primary.wav into Chatterbox's voice-clone reference. Gets Samantha's voice on the current stack immediately, independent of the Voxtral outcome.",
    "IF the vllm-omni source build fails on aarch64: probe Branch 2D (mudler/voxtral-tts.c) as a fallback — less verified than vllm-omni but CUDA-free.",
    "IF both fail: continue the 20-model TTS survey in docs/NEXT-SESSION-TTS-ALTERNATIVES.md. Next candidates: Chatterbox-Turbo, CosyVoice 2, F5-TTS."
  ],

  "artifacts_committed": {
    "scripts": [
      "scripts/extract_samantha_from_movie.py",
      "scripts/filter_samantha_by_pitch.py",
      "scripts/rank_samantha_by_voice_similarity.py",
      "scripts/benchmark_utils.py",
      "scripts/benchmark_voxtral_titan.py",
      "scripts/run-voxtral-bench.sh",
      "scripts/generate_chatterbox_baseline.sh",
      "scripts/tts_ab_score.py"
    ],
    "voice_references": [
      "services/audio-pipeline/voice-references/samantha_movie_primary.wav (normalized v3)",
      "services/audio-pipeline/voice-references/samantha_movie_v1_goodish_41s.wav",
      "services/audio-pipeline/voice-references/samantha_movie_v2_keepwalking_38s.wav",
      "services/audio-pipeline/voice-references/samantha_movie_v3_dearTheodore_35s.wav"
    ],
    "branch": "voxtral-bench-20260414",
    "verdict_json": "docs/BENCHMARK-VOXTRAL-TITAN-20260414-2300.json"
  }
}
