cobb/skald

Kayos 4a91e0738d	schema: narration_findings — audio-layer audit table	2026-05-13 10:10:04 -07:00

Closes the TTS schema layer. The v0.2 render pipeline auto-runs an audit chain after each chapter narration:

  F5 render → narration_runs (succeeded)
    → ffmpeg chunk into ~30s windows
    → Whisper-large-v3 STT each chunk
    → word-level diff vs source chapter text
    → mismatches → narration_findings (kind=pronunciation|skip|insert)
    → ffmpeg silence/clip detect → narration_findings (kind=glitch)
    → (optional) Gemini Flash audio review pass → narration_findings (kind=prosody|tone)
    → unresolved crits trigger automatic re-roll with new seed

Distinct from audit_findings: that table is canon/continuity at the text layer, populated by the third-Opus canon-audit pass. narration_findings is audio-quality only, populated by detectors that consume the rendered WAV. The 'detector' field captures which model produced the finding so we can tune thresholds per detector when one over- or under-flags.

cobb's audio agent intuition was right: STT-and-diff catches the 'name came out wrong' case airtight, and a separate audio-native LLM call catches the subtler 'this sentence sounded weird' cases Whisper can't see.
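
The word-level diff step in the audit chain above could look roughly like the sketch below. It is an illustration only, not the code in this repo: the `Finding` dataclass, `tokenize`, and `diff_findings` are hypothetical names, the field names are guesses at what a narration_findings row might carry, and mapping a substituted word to kind=pronunciation is an assumption (a substitution can also be an STT mishearing). It uses only Python's standard `difflib`.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical in-memory shape of one narration_findings row."""
    kind: str       # "pronunciation", "skip", or "insert"
    expected: str   # words from the source chapter text
    heard: str      # words from the STT transcript
    detector: str   # which model produced the finding


def tokenize(text: str) -> list[str]:
    # Lowercase and drop punctuation so the diff compares words only.
    return re.findall(r"[a-z0-9']+", text.lower())


def diff_findings(source_text: str, transcript: str,
                  detector: str = "whisper-large-v3") -> list[Finding]:
    src = tokenize(source_text)
    hyp = tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=src, b=hyp, autojunk=False)

    # Assumed mapping from diff opcode to finding kind:
    #   replace -> a word came out differently (pronunciation)
    #   delete  -> source words missing from the audio (skip)
    #   insert  -> extra words present in the audio (insert)
    kind_for = {"replace": "pronunciation", "delete": "skip", "insert": "insert"}

    findings = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        findings.append(Finding(
            kind=kind_for[tag],
            expected=" ".join(src[i1:i2]),
            heard=" ".join(hyp[j1:j2]),
            detector=detector,
        ))
    return findings


if __name__ == "__main__":
    source = "Skald rode north to the fjord at dawn."
    heard = "Scald rode north the fjord at early dawn."
    for f in diff_findings(source, heard):
        print(f)
```

In a real pipeline each ~30s chunk's transcript would be diffed against the matching span of chapter text, and recording the `detector` on every row is what lets thresholds be tuned per detector later.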
0001_init.sql	scaffold v0.1: postgres+pgvector inside-container, schema, markdown ingest, CLI	2026-05-13 09:04:28 -07:00
0002_voices_and_pronunciation.sql	schema: voices + pronunciation_overrides + narration_runs (v0.2 prep)	2026-05-13 10:07:32 -07:00
0003_narration_findings.sql	schema: narration_findings — audio-layer audit table	2026-05-13 10:10:04 -07:00