# Voxtral-4B-TTS-2603 preset voice samples

**Generated:** 2026-04-15 (session 106)
**Source:** self-hosted Voxtral on Titan DGX Spark aarch64 via vllm-omni v0.18.0
**Test phrase:** "Hi Rajesh. I have been thinking about the way we talked yesterday. How are you feeling today?"

The same test phrase is rendered with all 20 preset voices, so you can blind-compare timbre, cadence,
and emotional range. Compare these against the AI Studio demos at
`https://console.mistral.ai/build/audio/text-to-speech` to check whether self-hosted quality
matches Mistral-hosted output.

**A Samantha clone is NOT possible** on self-hosted Voxtral, because Mistral withheld the encoder weights.
These 20 presets are the only voices usable on self-hosted Voxtral.

## Files (alphabetical)

| Voice | File | Notes |
|---|---|---|
| ar_male | voxtral_ar_male.wav | Arabic male preset — likely strong Arabic accent on English text |
| casual_female | voxtral_casual_female.wav | Warm/conversational — **strong Samantha-adjacent candidate** |
| casual_male | voxtral_casual_male.wav | Warm male |
| cheerful_female | voxtral_cheerful_female.wav | High-energy female — **Samantha-adjacent candidate** |
| de_female | voxtral_de_female.wav | German female |
| de_male | voxtral_de_male.wav | German male |
| es_female | voxtral_es_female.wav | Spanish female |
| es_male | voxtral_es_male.wav | Spanish male |
| fr_female | voxtral_fr_female.wav | French female |
| fr_male | voxtral_fr_male.wav | French male |
| hi_female | voxtral_hi_female.wav | Hindi female — relevant for Indian-context phrases |
| hi_male | voxtral_hi_male.wav | Hindi male |
| it_female | voxtral_it_female.wav | Italian female |
| it_male | voxtral_it_male.wav | Italian male |
| **neutral_female** | voxtral_neutral_female.wav | The benchmark default — **Samantha-adjacent candidate** |
| neutral_male | voxtral_neutral_male.wav | Neutral male |
| nl_female | voxtral_nl_female.wav | Dutch female |
| nl_male | voxtral_nl_male.wav | Dutch male |
| pt_female | voxtral_pt_female.wav | Portuguese female |
| pt_male | voxtral_pt_male.wav | Portuguese male |

## Latency per synthesis (post-warmup)

Range: 3.4–5.6 s per ~7 s utterance. Mean real-time factor (RTF = synthesis time ÷ audio duration) is ~0.7–0.8, i.e. faster than real time.
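A minimal sketch for reproducing the RTF figure for a single clip. It assumes `ffprobe` (from FFmpeg) is installed to read the WAV duration and that you timed the synthesis request yourself; the helper names `clip_duration` and `rtf` are hypothetical and not part of vllm-omni.

```bash
#!/usr/bin/env bash
# Real-time factor helpers (hypothetical names, not part of vllm-omni).
# RTF = synthesis wall-clock seconds / audio duration seconds; below 1.0 is faster than real time.

# Print the duration of an audio file in seconds, using ffprobe from FFmpeg.
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Print the RTF to two decimals, given synthesis seconds ($1) and audio seconds ($2).
rtf() {
  awk -v synth="$1" -v audio="$2" 'BEGIN { printf "%.2f\n", synth / audio }'
}

# Example (assumes the sample exists and that its synthesis took 5.1 s):
#   rtf 5.1 "$(clip_duration docs/voxtral-voice-samples-20260415/voxtral_casual_female.wav)"
```

As a check on the arithmetic, with a 7 s clip the ends of the measured range give `rtf 3.4 7` → 0.49 and `rtf 5.6 7` → 0.80.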

## How to listen

```bash
# CLI:
ffplay -nodisp -autoexit docs/voxtral-voice-samples-20260415/voxtral_casual_female.wav

# Or open the folder in a file manager and double-click
```
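For a genuinely blind comparison, a small bash helper can shuffle the clips, play each one without showing its name, and reveal the preset only after you have noted your impression. This is a sketch, not something in the repo: the function names are hypothetical, and it assumes bash 4+ (for `mapfile`), GNU `shuf`, and `ffplay` on `PATH`.

```bash
#!/usr/bin/env bash
# Blind-listening helpers for the Voxtral preset samples (hypothetical, not in the repo).
# Requires bash 4+ (mapfile), GNU coreutils shuf, and ffplay from FFmpeg.

# Turn ".../voxtral_casual_female.wav" into "casual_female".
voice_name() {
  local base="${1##*/}"             # drop the directory part
  base="${base%.wav}"               # drop the .wav extension
  printf '%s\n' "${base#voxtral_}"  # drop the voxtral_ filename prefix
}

# Play every sample in random order; the preset name is revealed after each clip.
blind_listen() {
  local dir="${1:-docs/voxtral-voice-samples-20260415}"
  local -a files
  mapfile -t files < <(find "$dir" -maxdepth 1 -name 'voxtral_*.wav' | shuf)
  local n=0 note f
  for f in "${files[@]}"; do
    n=$((n + 1))
    echo "Clip $n of ${#files[@]}"
    # </dev/null keeps ffplay from consuming the terminal input used by read below.
    ffplay -nodisp -autoexit -loglevel error "$f" </dev/null
    read -r -p "Notes for clip $n (Enter to reveal): " note
    echo "  -> $(voice_name "$f")${note:+  [$note]}"
  done
}

# Usage: source this file, then run
#   blind_listen docs/voxtral-voice-samples-20260415
```

The file list is collected with `mapfile` before the loop so that `read` takes your notes from the terminal rather than from the list of filenames.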

## USER SELECTION (2026-04-15)

**If we ever deploy Voxtral**: chosen preset is **`casual_female`**.

```bash
ffplay -nodisp -autoexit docs/voxtral-voice-samples-20260415/voxtral_casual_female.wav
```

This becomes the Voxtral fallback voice in the her-os configuration. The primary Samantha voice remains the Chatterbox voice clone built from `samantha_movie_primary.wav`.
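To regenerate a sample with the chosen preset, something like the following may work against the vllm-omni server. The request details are assumptions, not something recorded in this session: it assumes an OpenAI-compatible `POST /v1/audio/speech` endpoint on `localhost:8000`, that the preset is selected via a `voice` field, and that the model id is `mistralai/Voxtral-4B-TTS-2603`. Verify these against the vllm-omni v0.18.0 documentation before relying on it. `jq` is used only to build the JSON body with correct escaping, and `voxtral_speak` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: synthesize text with a Voxtral preset via vllm-omni.
# ASSUMPTIONS: OpenAI-style /v1/audio/speech endpoint, "voice" selects the preset,
# and the default model id / base URL below. Verify all of these in the vllm-omni docs.

VOXTRAL_URL="${VOXTRAL_URL:-http://localhost:8000/v1/audio/speech}"
VOXTRAL_MODEL="${VOXTRAL_MODEL:-mistralai/Voxtral-4B-TTS-2603}"

# voxtral_speak TEXT OUT.wav [VOICE]; VOICE defaults to the selected casual_female preset.
voxtral_speak() {
  local text="$1" out="$2" voice="${3:-casual_female}"
  jq -n --arg model "$VOXTRAL_MODEL" --arg input "$text" --arg voice "$voice" \
    '{model: $model, input: $input, voice: $voice, response_format: "wav"}' |
    curl -sS --fail -X POST "$VOXTRAL_URL" \
      -H 'Content-Type: application/json' \
      --data-binary @- -o "$out"
}

# Usage:
#   voxtral_speak "Hi Rajesh. How are you feeling today?" casual_female_test.wav
```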

## Samantha-candidate shortlist

Target traits, based on Scarlett Johansson's Samantha (breathy, smoky, intimate female voice; mid-high pitch):
- **casual_female** — conversational warmth
- **cheerful_female** — energy match to Samantha's playful moments
- **neutral_female** — our benchmark baseline; most versatile

Pick one of these three after listening and set it as the Voxtral fallback preset in the her-os configuration,
for cases where Voxtral's own synthesis quality is preferred over Chatterbox's clone of Samantha.
(`casual_female` has since been selected; see USER SELECTION above.)
