# Research: Google Pixel 9a On-Device Speech Recognition

**Date:** 2026-03-31
**Status:** Research complete. Verdict: DON'T USE for Annie's primary pipeline.
**Relevance:** Whether Pixel 9a can replace Panda-based STT (IndicConformerASR / Whisper)

---

## Executive Summary

The Pixel 9a CAN do on-device speech recognition, including offline Kannada, via the Android `SpeechRecognizer` API and Gboard voice typing. However, it is **NOT suitable as Annie's primary STT** for several critical reasons:

1. **Pixel 9a ships with Gemini Nano XXS** (not XS) -- text-only, no multimodal audio understanding
2. **No programmatic API** to capture transcription from system features (Recorder, Live Caption)
3. **Code-mixed Kannada-English is poorly supported** on-device
4. **No phone call transcription access** via public APIs
5. **ML Kit GenAI Speech Recognition requires Pixel 10** (not available on 9a)
6. **Latency is acceptable but unquantified** -- no published ms-level benchmarks for Indian languages

**Recommendation: Keep the Panda-based pipeline** (IndicConformerASR 145ms + Whisper fallback). The Pixel stays as a dumb terminal (mic + speaker). Monitor Sarvam Edge SDK for a future on-device option.

---

## 1. Google's On-Device Speech Recognition

### 1.1 The Model

Google uses an **RNN-Transducer (RNN-T)** model for on-device ASR, first deployed in Gboard on Pixel phones in 2019:
- **Original size:** 450 MB full model, compressed to **~80 MB** via int8 quantization
- **Architecture:** End-to-end all-neural, streaming character-by-character output
- **Speed:** Runs faster than real-time on a single CPU core
- **TPU acceleration:** Tensor G4 has a dedicated TPU for ML inference, further accelerating ASR

The model has evolved significantly since 2019. Current Pixel phones (Tensor G3/G4) use updated versions with better accuracy and multilingual support, but Google has not published the exact model architecture or size for the 2025/2026 generation.

### 1.2 Latency

Google does not publish precise latency numbers for on-device ASR on Pixel. What is known:
- **Connection setup:** ~150-250ms on modern devices before recognizer is ready
- **Streaming output:** Characters appear in real-time as you speak (no wait-for-utterance-end)
- **Subjective speed:** "Words appear as fast as you can speak" (Pixel 6+ marketing)
- **No network round-trip:** Eliminates the 200-500ms cloud latency

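**Estimated total latency:** ~200-400ms from speech start to first text output (setup + first character). This is in the same range as our IndicConformerASR figure (145ms for 3s audio), but the two numbers measure different things: the Pixel streams partial text while the user is still speaking, whereas IndicConformerASR runs inference only after the complete utterance has been captured.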
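
To make the streaming-versus-batch distinction concrete, here is a minimal sketch of the arithmetic. It assumes the batch recognizer cannot start until the whole utterance has been captured and ignores endpointing and transport delays; neither assumption has been measured. The class and method names are hypothetical.

```java
// Sketch: time from speech start until text is available, in milliseconds.
// Assumption (not measured): a batch recognizer starts only after the whole
// utterance has been captured; endpointing and transport delays are ignored.
final class LatencyBudget {

    // Batch: wait for the full utterance, then run inference on it.
    static int batchTimeToText(int utteranceMs, int inferenceMs) {
        return utteranceMs + inferenceMs;
    }

    // Streaming: first partial text arrives after recognizer setup plus the
    // delay until the first character is emitted.
    static int streamingTimeToFirstText(int setupMs, int firstCharMs) {
        return setupMs + firstCharMs;
    }

    public static void main(String[] args) {
        // 3 s utterance with the 145 ms IndicConformerASR inference time.
        System.out.println("batch: " + batchTimeToText(3000, 145) + " ms");
        // Upper end of the estimate: 250 ms setup, and the 150 ms first-character
        // delay implied by the ~400 ms total.
        System.out.println("streaming: " + streamingTimeToFirstText(250, 150) + " ms");
    }
}
```

Under these assumptions the streaming path shows its first text well before the batch path, but time to the final transcript depends on endpointing delay, which neither figure captures.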

### 1.3 Language Support

Google's offline speech recognition supports **~50 languages** including:
- **English (en-IN)** -- YES, on-device
- **Hindi (hi-IN)** -- YES, on-device
- **Kannada (kn-IN)** -- YES, listed as supported for offline download
- Plus: Bengali, Gujarati, Malayalam, Marathi, Tamil, Telugu, Urdu

**How to enable:** Settings > System > Languages & Input > On-screen keyboard > Gboard > Voice typing > Offline speech recognition > Download language pack

### 1.4 Code-Mixed Kannada-English

**NOT natively supported.** The Android `SpeechRecognizer` takes a single language via `RecognizerIntent.EXTRA_LANGUAGE` (e.g., `kn-IN` or `en-IN`). It does NOT:
- Auto-detect language switches mid-sentence
- Handle bilingual code-mixed input
- Produce mixed-script output

If set to `kn-IN`: English words will be garbled into Kannada characters.
If set to `en-IN`: Kannada words will be garbled or dropped.

**This is the same fundamental limitation as IndicConformerASR.** Neither handles code-mixing. Only Sarvam Saaras v3 API and Whisper (to some degree) handle code-mixed Indian speech.
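
Whether a recognizer ever emits mixed-script text can be checked mechanically when testing. The sketch below classifies a transcript by the Unicode scripts of its letters, using `Character.UnicodeScript` from the Java standard library. The class name and the `Kind` categories are hypothetical, and the check says nothing about whether the words themselves are correct.

```java
// Sketch: report which of the two scripts of interest appear in a transcript.
final class ScriptMix {
    enum Kind { KANNADA_ONLY, LATIN_ONLY, MIXED, OTHER }

    static Kind classify(String transcript) {
        boolean kannada = false;
        boolean latin = false;
        for (int i = 0; i < transcript.length(); ) {
            int cp = transcript.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, spaces, punctuation, combining vowel signs
            }
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.KANNADA) {
                kannada = true;
            } else if (script == Character.UnicodeScript.LATIN) {
                latin = true;
            }
        }
        if (kannada && latin) return Kind.MIXED;
        if (kannada) return Kind.KANNADA_ONLY;
        if (latin) return Kind.LATIN_ONLY;
        return Kind.OTHER; // neither Kannada nor Latin letters found
    }

    public static void main(String[] args) {
        // The escaped word is Kannada "naanu" ("I"), followed by an English word.
        System.out.println(classify("\u0CA8\u0CBE\u0CA8\u0CC1 office"));
    }
}
```

Running every test transcript through a check like this gives a quick count of how often each locale setting produces Kannada-only, Latin-only, or mixed output.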

### 1.5 Quality vs Our Models

| Criterion | Pixel On-Device | IndicConformerASR | Whisper Medium |
|-----------|----------------|-------------------|----------------|
| Pure Kannada | Good (Google quality) | Best (22 Indian langs specialist) | Weak |
| Code-mixed | No | No | Partial (auto-detect) |
| English | Excellent | Not supported | Excellent |
| Latency | ~200-400ms streaming | 145ms (3s batch) | ~500ms (3s batch) |
| Runs on | Pixel 9a (8 GB RAM) | Panda GPU (303 MB VRAM) | Panda GPU (~2 GB VRAM) |
| Programmability | Limited (see below) | Full Python API | Full Python API |

---

## 2. Android SpeechRecognizer API

### 2.1 Two Creation Methods

```java
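import android.speech.SpeechRecognizer;

// Method 1: Default recognizer (may send audio to the cloud)
SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);

// Method 2: On-device only (API 31+, Android 12+).
// Throws UnsupportedOperationException when no on-device service exists, so check first.
if (SpeechRecognizer.isOnDeviceRecognitionAvailable(context)) {
    SpeechRecognizer onDevice = SpeechRecognizer.createOnDeviceSpeechRecognizer(context);
}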
```

### 2.2 EXTRA_PREFER_OFFLINE

```java
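Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
// EXTRA_LANGUAGE_MODEL is required for ACTION_RECOGNIZE_SPEECH
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, true);   // a hint, not a guarantee
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "kn-IN");      // one locale per request
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);  // enables onPartialResults()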
```

This flag asks the recognizer to prefer offline recognition; it is a preference, not a guarantee. On a Pixel 9a with the Kannada pack downloaded, recognition should run fully offline.

### 2.3 Programmatic Access

**YES, it can be used from an app.** Results arrive through `RecognitionListener` callbacks registered on the `SpeechRecognizer`:
- `onPartialResults()` -- streaming partial transcription
- `onResults()` -- final transcription
- `onError()` -- error handling

**Requirements:**
- `RECORD_AUDIO` permission
- All `SpeechRecognizer` methods must be called on the main thread (from other threads, post the call through a main-looper `Handler`)
- Must call `destroy()` when done

### 2.4 Background Mode

**LIMITED.** The API is "not intended to be used for continuous recognition" per the docs. Limitations:
- No built-in wake word / keyword detection
- Consumes significant battery for long sessions
- May be killed by the OS in background
- A foreground service declared with `foregroundServiceType="microphone"` and holding `RECORD_AUDIO` helps, but continued execution is not guaranteed

### 2.5 Phone Call Audio

**Third-party apps CANNOT transcribe phone calls.** Playback capture (configured through `AudioPlaybackCaptureConfiguration`) explicitly excludes voice call audio. This is a privacy/legal restriction at the OS level.

---

## 3. Pixel-Specific Features

### 3.1 Google Recorder App

- **On-device transcription:** YES, uses the same on-device ASR engine
- **API or Intent:** NO public API. No documented Intent to start transcription programmatically.
- **Languages:** English plus a small set of other languages for transcription; Kannada support is unconfirmed
- **Gemini Nano integration:** Recorder uses Gemini Nano for summarization (text-to-summary), not for speech-to-text

### 3.2 Live Caption / Live Translate

- **Live Caption:** Works during phone calls on Pixel. Uses on-device ASR.
- **Programmatic access:** NO. It's a system accessibility service. No API to read caption text.
- **Languages:** Primarily English for Live Caption
- **Expressive Captions (Android 16):** Adds emotion detection ([joy], [sadness]) -- system feature only

### 3.3 Gemini Nano on Pixel 9a

**CRITICAL LIMITATION:** The Pixel 9a uses **Gemini Nano XXS** (extra-extra-small), NOT Nano XS:
- **Text-only:** Can only interpret text input -- no images, no audio
- **8 GB RAM constraint:** Forces smaller model
- **Missing features vs Pixel 9:** No Call Notes, no Pixel Screenshots AI, no multimodal understanding
- **On-demand activation:** Not always-on (unlike Pixel 9's Nano XS)

Gemini Nano XXS cannot be used for speech recognition. It can only process text that has already been transcribed by the separate ASR engine.

### 3.4 ML Kit GenAI Speech Recognition API

**NOT available on Pixel 9a.** This is the newest API (2025/2026):
- Two modes: Basic (traditional ASR) and Advanced (Gemini-powered, broader language coverage)
- **Requires Pixel 10** -- explicitly stated in docs ("available on Pixel 10 devices, with more devices in development")
- API level 31+ required
- Would be the ideal solution if it were available on Pixel 9a

---

## 4. Kannada Support Specifically

### 4.1 Offline Language Pack

**YES, Kannada is available as an offline download** for Google Voice Typing on Pixel:
- Download via: Gboard Settings > Voice Typing > Offline Speech Recognition > Kannada
- Estimated size: ~80-100 MB (similar to English pack at 85 MB)
- Once downloaded: Fully offline Kannada STT

### 4.2 Quality Assessment

Google's Kannada ASR quality (rough estimates from available data):
- **Pure Kannada:** ~86% accuracy offline (from one research comparison)
- **IndicConformerASR:** Higher accuracy for pure Kannada (specialist model, 22 Indian langs)
- **Google Cloud STT Chirp 3:** Better than on-device for Kannada (cloud model is larger)

The on-device Kannada quality is "good enough for casual use" but likely inferior to IndicConformerASR's specialist model for pure Kannada transcription.

### 4.3 Code-Mixed: The Dealbreaker

Google Voice Typing does NOT support code-mixed Kannada-English on-device. The `SpeechRecognizer` API accepts a single language locale. This is the same limitation as IndicConformerASR. Per the "meaning over accuracy" insight from session 379, this may be acceptable if the LLM can understand garbled code-mixed output.

---

## 5. Capturing Transcription Results

### 5.1 From an App (Best Path)

Build a minimal Android app (or Expo/React Native app) that:
1. Creates a recognizer with `SpeechRecognizer.createOnDeviceSpeechRecognizer(context)`
2. Sets locale to `kn-IN` or `en-IN`
3. Listens via `RecognitionListener.onPartialResults()` for streaming text
4. Sends text over WebSocket to Titan

**This is the cleanest programmatic path.**
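
Step 4 needs some wire format for the text sent to Titan. The sketch below shows one possible shape. The field names (`locale`, `text`, `final`) and the class name are hypothetical, since nothing here defines the actual protocol. The encoder is handwritten only to keep the example dependency-free; an Android app would more likely use the platform's `org.json.JSONObject`.

```java
// Sketch: encode one partial or final transcript as a JSON text frame.
// The message shape is an assumption, not an agreed protocol.
final class TranscriptMessage {

    static String toJson(String locale, String text, boolean isFinal) {
        return "{\"locale\":\"" + escape(locale)
                + "\",\"text\":\"" + escape(text)
                + "\",\"final\":" + isFinal + "}";
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c); // Kannada and other non-ASCII text passes through as-is
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A partial result from onPartialResults() would be sent with isFinal = false.
        System.out.println(toJson("en-IN", "turn on the \"kitchen\" light", false));
    }
}
```

Carrying the `final` flag lets the backend ignore partials if it only wants complete utterances, while still allowing early processing later without a protocol change.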

### 5.2 AccessibilityService

An AccessibilityService can observe on-screen text changes (e.g., text appearing in Live Transcribe or Gboard). However:
- Fragile: Depends on UI element IDs that Google may change
- Privacy concerns: Google Play may reject apps that use AccessibilityService for non-accessibility purposes
- Side-loaded apps avoid Play Store restrictions

### 5.3 Tasker + AutoVoice

- Tasker can trigger speech recognition and capture results in `%VOICE` variable
- AutoVoice provides dedicated speech recognition plugin
- Can run offline with PocketSphinx mode
- Latency: ~680ms median with offline-only mode
- Can forward results via HTTP/WebSocket to backend
- **Best for quick prototyping** without building a custom app

### 5.4 ADB-Based Capture

No direct ADB command triggers `SpeechRecognizer`. However:
- uiautomator2 can tap the microphone button in any app
- Can read resulting text from input fields
- Roundabout but works for automation

---

## 6. Phone Call Transcription

### 6.1 SpeechRecognizer During Calls

**Cannot access call audio.** Android's audio routing does not allow third-party apps to capture the telephony audio stream. The playback capture API (`AudioPlaybackCaptureConfiguration`) explicitly excludes `USAGE_VOICE_COMMUNICATION`.

### 6.2 Call Screen (Pixel Feature)

- Pixel's Call Screen uses on-device ASR to transcribe incoming callers
- **No programmatic access.** It's a system-level feature in the Phone app.
- Cannot be triggered or read from third-party apps

### 6.3 Call Notes (Pixel 9, NOT 9a)

- Uses Gemini Nano XS multimodal to understand call audio and generate summaries
- **NOT available on Pixel 9a** (requires Nano XS, 9a has Nano XXS)

### 6.4 Live Caption During Calls

- Works during phone calls on Pixel
- Displays captions on screen
- **No API to read the caption text programmatically**
- Could theoretically be scraped via AccessibilityService (fragile)

---

## 7. Revised Architecture Analysis

### 7.1 Proposed: Pixel-Based STT

```
Pixel 9a (mic → on-device STT → text) → WebSocket → Titan (LLM)
Titan → text response → Panda (TTS) → audio → Pixel speaker
```

**Pros:**
- Eliminates audio streaming from Pixel to Panda
- Lower bandwidth (text vs audio)
- Privacy: audio never leaves the phone
- Works without USB connection (WiFi WebSocket)

**Cons:**
- No code-mixed Kannada-English support
- Quality inferior to IndicConformerASR for pure Kannada
- Requires building/sideloading an Android app
- Cannot transcribe phone calls
- Gemini Nano XXS limitation means no advanced on-device AI
- Single-language only per session

### 7.2 Current: Panda-Based STT (Recommended to Keep)

```
Pixel 9a (mic → raw audio) → USB/WebSocket → Panda (STT) → text → Titan (LLM)
Titan → text response → Panda (TTS) → audio → Pixel speaker
```

**Pros:**
- IndicConformerASR: 145ms, 303 MB VRAM, best Kannada accuracy
- Whisper fallback: handles code-mixed input
- Full Python control over ASR pipeline
- Can switch models without touching the phone
- GPU-accelerated (RTX 5070 Ti)

**Cons:**
- Requires audio streaming (USB ADB or WebSocket)
- Audio leaves the phone (but stays on local network)
- Panda must be running

### 7.3 Verdict: Keep Panda Pipeline

The Pixel's on-device STT does not offer enough advantage to justify the architecture change:

| Factor | Pixel STT | Panda STT | Winner |
|--------|-----------|-----------|--------|
| Pure Kannada accuracy | Good (~86%) | Best (IndicConformer) | **Panda** |
| Code-mixed KN-EN | No | Whisper partial | **Panda** |
| English accuracy | Excellent | Whisper excellent | Tie |
| Latency | ~200-400ms streaming | 145ms batch | **Panda** (for batch) |
| Programmability | Limited API | Full Python | **Panda** |
| Model flexibility | Google's model only | Any model | **Panda** |
| Phone call access | No | N/A | Neither |
| Privacy | Audio on-device | Audio on LAN | **Pixel** (marginal) |
| Setup effort | Build Android app | Already running | **Panda** |

---

## 8. Future Options to Monitor

### 8.1 Sarvam Edge SDK (MOST PROMISING)

Announced February 2026 by Sarvam AI:
- **74M parameter ASR model**, 294 MB footprint
- **10 Indian languages including Kannada**
- **<300ms latency** on Snapdragon 8 Gen 3 (8.5x real-time)
- **Runs entirely on-device**, no cloud dependency
- **Automatic language identification** -- no manual language selection needed
- **SDK NOT YET PUBLIC** -- in development with OEM partners

**If Sarvam Edge SDK becomes available and supports code-mixed input, it would be a game-changer for on-device Indian language STT on Pixel.** Monitor: https://www.sarvam.ai/blogs/sarvam-edge

### 8.2 ML Kit GenAI Speech Recognition on Pixel 9a

Currently Pixel 10 only. Google says "more devices in development." If backported to Pixel 9a:
- Advanced mode uses on-device Gemini for broader language coverage
- May handle code-mixed input better than basic SpeechRecognizer
- API level 31+ (Pixel 9a qualifies)

### 8.3 Pixel 10a (Tensor G5)

The Pixel 10a (expected late 2026) with Tensor G5:
- 18-25% faster ML inference than G4
- May ship with Nano XS instead of XXS
- ML Kit GenAI Speech Recognition would be native
- Consider waiting if the 9a purchase isn't urgent

---

## 9. Practical Quick Win: Tasker Prototype

If you want to **test** Pixel on-device Kannada STT quality without building a custom Android app:

1. Install Tasker + AutoVoice on Pixel 9a
2. Download Kannada offline language pack in Gboard
3. Create Tasker profile: Speech recognition → HTTP POST to Titan
4. Test with pure Kannada, pure English, and code-mixed speech
5. Evaluate: Does the LLM respond correctly despite transcription errors?

This takes ~30 minutes to set up and gives empirical data on whether on-device STT is "good enough" for the LLM pipeline.
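
To turn steps 4 and 5 into numbers, the results can be tallied per speech category. The sketch below is one way to do that. The class and method names are hypothetical, and whether the LLM "responded correctly" is still judged by hand for each trial.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: tally whether the LLM responded correctly for each speech category.
final class SttTrialLog {
    enum Category { PURE_KANNADA, PURE_ENGLISH, CODE_MIXED }

    // Per category: index 0 = correct responses, index 1 = total trials.
    private final Map<Category, int[]> counts = new EnumMap<>(Category.class);

    void record(Category category, boolean llmRespondedCorrectly) {
        int[] c = counts.computeIfAbsent(category, k -> new int[2]);
        if (llmRespondedCorrectly) {
            c[0]++;
        }
        c[1]++;
    }

    // Fraction of trials with a correct LLM response; NaN when nothing was recorded.
    double passRate(Category category) {
        int[] c = counts.get(category);
        return (c == null || c[1] == 0) ? Double.NaN : (double) c[0] / c[1];
    }

    public static void main(String[] args) {
        SttTrialLog log = new SttTrialLog();
        log.record(Category.CODE_MIXED, true);
        log.record(Category.CODE_MIXED, false);
        for (Category category : Category.values()) {
            System.out.println(category + ": " + log.passRate(category));
        }
    }
}
```

A low pass rate for code-mixed speech alongside a high one for pure Kannada would confirm that code-mixing, not general recognition quality, is the limiting factor.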

---

## Sources

- [Google Research: An All-Neural On-Device Speech Recognizer](https://research.google/blog/an-all-neural-on-device-speech-recognizer/)
- [Android SpeechRecognizer API Reference](https://developer.android.com/reference/android/speech/SpeechRecognizer)
- [ML Kit GenAI Speech Recognition API](https://developers.google.com/ml-kit/genai/speech-recognition/android)
- [How the Pixel 9a's on-device AI is worse than the Pixel 9's](https://www.androidpolice.com/google-pixel-9a-gemini-nano-xxs/)
- [Pixel 9a Gemini Nano XXS limitations](https://www.androidauthority.com/pixel-9a-gemini-model-3536829/)
- [Gboard voice typing configuration](https://www.keyboardapps.net/gboard-voice-typing)
- [Sarvam Edge announcement](https://www.sarvam.ai/blogs/sarvam-edge)
- [Sarvam Edge guide (Analytics Vidhya)](https://www.analyticsvidhya.com/blog/2026/03/sarvam-edge/)
- [Android Developers Blog: Recorder with Gemini Nano](https://android-developers.googleblog.com/2024/08/recorder-app-on-pixel-sees-boost-in-engagement-with-gemini-nano.html)
- [Google Tensor G4 AI capabilities](https://www.allaboutai.com/ai-tech/google-pixel-9a-tensor-g4/)
- [Tasker/AutoVoice speech recognition](https://joaoapps.com/autovoice/)
- [Live Caption during calls](https://support.google.com/accessibility/android/answer/9350862?hl=en)
- [Google Pixel March 2026 Feature Drop](https://blog.google/products-and-platforms/devices/pixel/march-2026-pixel-drop/)
- [Android 16 accessibility features](https://www.engadget.com/mobile/smartphones/android-is-getting-a-slew-of-new-accessibility-features-190016358.html)
