Closes the TTS schema layer. The v0.2 render pipeline auto-runs an
audit chain after each chapter narration:
F5 render → narration_runs (succeeded)
→ ffmpeg chunk into ~30s windows
→ Whisper-large-v3 STT each chunk
→ word-level diff vs source chapter text
→ mismatches → narration_findings (kind=pronunciation|skip|insert)
→ ffmpeg silence/clip detect → narration_findings (kind=glitch)
→ (optional) Gemini Flash audio review pass
→ narration_findings (kind=prosody|tone)
→ unresolved crits trigger automatic re-roll with new seed
Distinct from audit_findings: that table is canon/continuity at the
text layer, populated by the third-Opus canon-audit pass.
narration_findings is audio-quality only, populated by detectors
that consume the rendered WAV.
The 'detector' field captures which model produced the finding so
we can tune thresholds per detector when one over- or under-flags.
cobb's audio agent intuition was right: STT-and-diff catches the
'name came out wrong' case airtight, and a separate audio-native
LLM call catches the subtler 'this sentence sounded weird' cases
Whisper can't see.