# Research: Transcript Correction UI & Feedback Loop

**Date:** 2026-03-12
**Trigger:** Rajesh asked "Sometimes especially some Indian names, places etc, the STT gets it wrong. Is there a way I can edit or correct the data?"
**Decision:** Build a transcript correction system in five phases: backend API → WhisperX hotwords → dashboard UI → feedback loop → verification/deploy.

## Existing Infrastructure (already built)

| Component | File | Status | What it does |
|-----------|------|--------|-------------|
| ASR fuzzy matching | `services/context-engine/asr_correct.py` | DEPLOYED | Levenshtein matching (≥0.7) against top 500 known entities |
| LLM post-correction | `services/audio-pipeline/llm_correct.py` | OFF (`LLM_CORRECT_ENABLED=0`) | Qwen3.5-9B rewrites transcript with known names |
| Entity validation API | `main.py: PATCH /v1/entities/validations/{id}` | DEPLOYED | Accept/reject extracted entities (no UI) |
| Speaker/emotion feedback | `audio-pipeline: POST /v1/feedback` | DEPLOYED | Correct speaker label + emotion, sets `pinned=true` |
| Segment pinning | `db.py: delete_overlapping_unpinned()` | DEPLOYED | Pinned segments survive background sweep re-transcription |

**Gap:** No transcript-level text correction. No dashboard UI for any corrections. No `initial_prompt` hotwords in WhisperX.
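For orientation, the deployed post-decode fuzzy pass works roughly like this. A minimal sketch only: the entity list is hypothetical, and `difflib`'s ratio stands in for the real Levenshtein similarity in `asr_correct.py` (both score in [0, 1], thresholded at 0.7):

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the top-500 known-entity list.
KNOWN_ENTITIES = ["Rajesh", "Koramangala", "Bengaluru", "Amma"]

def correct_token(token: str, known: list[str] = KNOWN_ENTITIES,
                  threshold: float = 0.7) -> str:
    """Return the closest known entity if its similarity is >= threshold,
    otherwise return the token unchanged."""
    best, best_score = token, threshold
    for entity in known:
        score = SequenceMatcher(None, token.lower(), entity.lower()).ratio()
        if score >= best_score:
            best, best_score = entity, score
    return best
```

A production pass would also respect word boundaries and skip tokens that are already exact entity matches; this sketch only shows the thresholding.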

## Industry Approaches

### 3-Tier Correction Model

| Tier | When | Example | Latency cost |
|------|------|---------|-------------|
| Pre-decode (hotwords) | Before STT | Whisper `initial_prompt`, Deepgram custom vocabulary | Zero |
| Post-decode (automated) | After STT, before storage | `asr_correct.py`, `llm_correct.py` | ~50-200ms |
| Human-in-the-loop (UI) | User reviews transcript | Otter.ai edit mode, Descript correction wizard | Manual |

### Otter.ai UX Pattern
- Click "Edit" button → enter edit mode (not always-on)
- Blue highlight tracks word-by-word during playback
- Click any word → jump playhead to timestamp
- Inline contentEditable correction
- Speaker labels per paragraph, timestamps per paragraph
- **Not open-source.** No usable API (closed beta only).

### Descript UX Pattern (most sophisticated)
- "Edit like a doc" — transcript IS the editing surface
- `Option+C` enters "Correct text" mode (blue outline, corrections don't affect audio)
- `Hold E + click word` → correct that word (pauses playback, resumes after)
- Low-confidence word flagging + "Correction Wizard" (guided error review)
- Wordbar: visual timeline showing word alignment boundaries
- **Not open-source.** Limited partner API (media import/export only).

## Open-Source Alternatives Evaluated

### Tier 1: Best fit for vanilla TypeScript (her-os dashboard)

| Project | License | Tech Stack | Word-Level | Stars |
|---------|---------|------------|-----------|-------|
| **hyperaudio-lite** | MIT | Vanilla JS, zero deps | Yes (`data-m`, `data-d` attributes) | 159 |
| hyperaudio-lite-editor | AGPL-3.0 | Vanilla JS + contentEditable | Yes | — |
| oTranscribe | MIT | Vanilla JS | Paragraph-level only | — |

**hyperaudio-lite** is the recommended foundation:
```html
<span data-m="12340" data-d="500">Hello</span>
<span data-m="12840" data-d="300">world</span>
```
- `data-m` = start time in ms, `data-d` = duration in ms
- Click any word → jump audio playhead
- Synchronized word highlighting during playback
- Zero dependencies, MIT licensed, ~5KB
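The span markup above can be generated server-side from word-level timings. A sketch, assuming words arrive as `{start, end, text}` dicts with times in seconds (field names are an assumption, matching the DPE-style shape):

```python
from html import escape

def words_to_spans(words: list[dict]) -> str:
    """Render word timings (seconds) as hyperaudio-lite spans:
    data-m = start in ms, data-d = duration in ms."""
    spans = []
    for w in words:
        start_ms = round(w["start"] * 1000)
        dur_ms = round((w["end"] - w["start"]) * 1000)
        spans.append(
            f'<span data-m="{start_ms}" data-d="{dur_ms}">{escape(w["text"])}</span>'
        )
    return "\n".join(spans)
```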

### Tier 2: React-based (architecture reference only)

| Project | License | Notes |
|---------|---------|-------|
| pietrop/slate-transcript-editor | MIT | DPE data format is good reference |
| BBC react-transcript-editor | MIT | Draft.js (deprecated), archived |

**DPE (Digital Paper Edit) format** — good internal data model reference:
```json
{
  "words": [{"start": 1.5, "end": 1.8, "text": "Hello"}],
  "paragraphs": [{"speaker": "Rajesh", "start": 1.5, "end": 5.2}]
}
```
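The flat word list plus paragraph ranges means paragraphs are joined to words by timestamp. A minimal sketch of that join, assuming inclusive paragraph bounds (an assumption; DPE itself does not mandate the comparison):

```python
from dataclasses import dataclass

@dataclass
class Word:
    start: float
    end: float
    text: str

@dataclass
class Paragraph:
    speaker: str
    start: float
    end: float

def words_in_paragraph(words: list[Word], para: Paragraph) -> list[Word]:
    """Select the words whose timing falls inside the paragraph range."""
    return [w for w in words if para.start <= w.start and w.end <= para.end]
```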

### Tier 3: Full applications (too heavy to embed)

| Project | License | Why not |
|---------|---------|---------|
| Scriberr | MIT | Full Go+Svelte app |
| transcribee | AGPL-3.0 | Full Python+CRDT stack |
| WGBH transcript-editor | MIT | Ruby on Rails monolith |

### Decision: hyperaudio-lite (MIT) + custom editing layer

- Use hyperaudio-lite's span model for display + click-to-seek
- Build custom editing layer (~200-400 lines TS) for contentEditable correction
- Don't use AGPL hyperaudio-lite-editor — write our own to avoid license issues
- Reference DPE format for internal data model

## WhisperX `initial_prompt` Integration

**Current state:** WhisperX `transcribe()` calls in `pipeline.py` do NOT use `initial_prompt`.

**Integration point:** faster-whisper's `WhisperModel.transcribe()` accepts an `initial_prompt` parameter directly; WhisperX's batched pipeline takes the same option via the `asr_options` dict passed to `whisperx.load_model()`, not as a `transcribe()` argument:
```python
# In pipeline.py — hotwords go in at model-load time via asr_options
self._whisper_model = whisperx.load_model(
    model_name, device, compute_type=compute_type,
    asr_options={"initial_prompt": "Rajesh, Koramangala, Bengaluru, Amma"},  # NEW
)
result = self._whisper_model.transcribe(
    audio, batch_size=16, language=pin_lang
)
```

**Constraint:** Whisper's prompt window is 224 tokens. Limit to top ~100 most-corrected proper nouns.
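A sketch of packing the ranked hotword list into that budget. The 4-characters-per-token estimate is a rough heuristic, not the real Whisper tokenizer; swap in the actual tokenizer for an exact count:

```python
def build_hotword_prompt(ranked_names: list[str], token_budget: int = 224) -> str:
    """Greedily pack the most-corrected names into Whisper's prompt window.

    Token counts are estimated at ~1 token per 4 characters, plus one
    token for the joining comma/space; this is a heuristic only.
    """
    prompt_parts: list[str] = []
    used = 0
    for name in ranked_names:
        est = max(1, len(name) // 4) + 1
        if used + est > token_budget:
            break
        prompt_parts.append(name)
        used += est
    return ", ".join(prompt_parts)
```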

**Feedback loop:**
1. User corrects "Koramangla" → "Koramangala" in dashboard UI
2. Correction stored in `segment_corrections` table
3. Audio-pipeline refreshes hotword list every 5 minutes
4. Next STT session uses corrected names as `initial_prompt`
5. Whisper decoder favors these spellings → fewer errors

## Implementation Plan (5 phases)

See plan file for full details. Summary:

| Phase | What | Effort |
|-------|------|--------|
| 1 | Backend: session browse + segment correct API + DB table | Medium |
| 2 | WhisperX: wire `initial_prompt` with correction hotwords | Small |
| 3 | Dashboard: transcript correction panel (hyperaudio-lite pattern) | Large |
| 4 | Feedback loop: corrections → hotwords → better STT | Small |
| 5 | Verification + deploy | Small |

## Anti-Patterns to Avoid

1. **Don't mutate JSONL files** — JSONL is immutable source of truth. Corrections update PostgreSQL only.
2. **Don't skip `pinned=true`** — Background sweep will overwrite unpinned corrections.
3. **Don't exceed 224 tokens in `initial_prompt`** — Whisper's prompt window is limited.
4. **Don't use React/Vue** — Dashboard is vanilla TypeScript. No framework dependencies.
5. **Don't auto-save on every keystroke** — Wait for Enter/blur to submit corrections.
6. **Don't normalize corrections across sessions** — Each correction is session-specific evidence.

## References

- [hyperaudio-lite (MIT)](https://github.com/hyperaudio/hyperaudio-lite)
- [hyperaudio-lite-editor (AGPL)](https://github.com/hyperaudio/hyperaudio-lite-editor)
- [oTranscribe (MIT)](https://github.com/oTranscribe/oTranscribe)
- [pietrop/slate-transcript-editor (MIT)](https://github.com/pietrop/slate-transcript-editor)
- [Descript correction mode](https://help.descript.com/hc/en-us/articles/10119613609229-Correct-your-transcript)
- [Otter.ai edit mode](https://help.otter.ai/hc/en-us/articles/360047731754-Edit-a-conversation)
- [WhisperX GitHub](https://github.com/m-bain/whisperX)
