skald

cobb/skald


Kayos 465c94b745 schema: voices + pronunciation_overrides + narration_runs (v0.2 prep)	2026-05-13 10:07:32 -07:00
0001_init.sql	scaffold v0.1: postgres+pgvector inside-container, schema, markdown ingest, CLI	2026-05-13 09:04:28 -07:00
0002_voices_and_pronunciation.sql	schema: voices + pronunciation_overrides + narration_runs (v0.2 prep)	2026-05-13 10:07:32 -07:00

Kayos 465c94b745 schema: voices + pronunciation_overrides + narration_runs (v0.2 prep)

TTS layer landed as schema-only — synthesis pipeline ships in v0.2.
Putting the tables in v0.1 means imports already carry the right
shape; we won't need a 'migrate every existing story' pass later.

Decisions locked 2026-05-13:
- Engine: F5-TTS (best 8GB FOSS option, mid-2026 SOTA)
- Default voice source: LJ Speech (Linda Johnson; released into the
  public domain specifically for TTS training, so sharing or uploading
  generated audio is airtight. The 'AI-consent-released' license
  posture is the difference between 'should be fine' and
  'definitely fine.')
- Variety voices: Hi-Fi TTS speaker IDs (Apache 2.0, same consent
  shape). LibriVox is optional but never default.
- Pronunciation overrides DB layer (story-scoped + global) to fix
  proper-noun mispronunciation, the real TTS-quality gap against
  Cobb's bar of 'must not wake me up.' A pre-pass with Opus extracts
  proper nouns + IPA, the operator verifies, and the table caches
  the result forever.
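A minimal sketch of how the two override scopes might be consulted at synthesis time. It assumes that story-scoped rows win over global ones and that a miss falls through to the TTS engine's own pronunciation; neither rule is spelled out above. The function `resolve_pronunciation`, the dict shapes standing in for `pronunciation_overrides` rows, the case-folded keys, and the example words are all hypothetical.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for pronunciation_overrides rows:
# story-scoped rows keyed by (story_id, word), global rows keyed by word.
StoryOverrides = dict[tuple[int, str], str]
GlobalOverrides = dict[str, str]


def resolve_pronunciation(
    word: str,
    story_id: int,
    story_overrides: StoryOverrides,
    global_overrides: GlobalOverrides,
) -> Optional[str]:
    """Return an IPA string for `word`, or None to let the engine decide.

    Assumed precedence: a story-scoped override beats a global one, so a
    name read differently in one story does not change it everywhere.
    """
    key = word.casefold()  # assumption: words are stored case-folded
    story_ipa = story_overrides.get((story_id, key))
    if story_ipa is not None:
        return story_ipa
    return global_overrides.get(key)  # None when no global row exists


story_rows = {(7, "kaelith"): "ˈkeɪ.lɪθ"}
global_rows = {"kaelith": "ˈkaɪ.lɪθ", "niamh": "niːv"}

print(resolve_pronunciation("Kaelith", 7, story_rows, global_rows))  # ˈkeɪ.lɪθ
print(resolve_pronunciation("Kaelith", 8, story_rows, global_rows))  # ˈkaɪ.lɪθ
print(resolve_pronunciation("Niamh", 7, story_rows, global_rows))    # niːv
print(resolve_pronunciation("Cobb", 7, story_rows, global_rows))     # None
```

Returning None instead of a guess keeps the fallback explicit: the caller hands the word to the engine unchanged when no operator-verified row exists.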

Tables:
- voices — name, license, reference_path/text, sample_rate, default flag
- pronunciation_overrides — story-scoped or global, IPA/arpabet
- narration_runs — TTS audit trail mirroring generation_runs
- stories.preferred_voice_id FK

Unique constraints:
- one default voice (partial index)
- one row per (story, word) override
- one global row per word
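A minimal sketch of how the three uniqueness rules can be expressed, using an in-memory SQLite database as a stand-in for Postgres. The actual DDL in 0002_voices_and_pronunciation.sql is not shown here and will differ in details such as identity columns. Columns beyond those named above (`is_default`, `word`, `ipa`), the index names, and the voice names are hypothetical. The point it illustrates: a plain UNIQUE (story_id, word) treats every NULL story_id as distinct, so global rows need their own partial index.

```python
import sqlite3

# In-memory SQLite as a stand-in for the Postgres schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE voices (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        license     TEXT NOT NULL,
        is_default  BOOLEAN NOT NULL DEFAULT 0
    );
    -- One default voice: only rows where is_default is true enter the
    -- index, so any number of non-default voices can coexist.
    CREATE UNIQUE INDEX one_default_voice ON voices (is_default) WHERE is_default;

    CREATE TABLE stories (
        id                  INTEGER PRIMARY KEY,
        preferred_voice_id  INTEGER REFERENCES voices (id)
    );

    CREATE TABLE pronunciation_overrides (
        id        INTEGER PRIMARY KEY,
        story_id  INTEGER REFERENCES stories (id),  -- NULL = global row
        word      TEXT NOT NULL,
        ipa       TEXT NOT NULL,
        UNIQUE (story_id, word)  -- one row per (story, word)
    );
    -- Global rows (story_id IS NULL) slip past the UNIQUE above, so
    -- this partial index enforces one global row per word.
    CREATE UNIQUE INDEX one_global_per_word
        ON pronunciation_overrides (word) WHERE story_id IS NULL;
    """
)


def rejected(sql: str, params: tuple) -> bool:
    """Run one INSERT; return True if SQLite raised an IntegrityError."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


add_voice = "INSERT INTO voices (name, license, is_default) VALUES (?, ?, ?)"
print(rejected(add_voice, ("lj-speech", "public domain", 1)))   # False
print(rejected(add_voice, ("hifi-speaker-a", "Apache 2.0", 0)))  # False
print(rejected(add_voice, ("hifi-speaker-b", "Apache 2.0", 1)))  # True: second default

conn.execute("INSERT INTO stories (id) VALUES (1)")
add_override = "INSERT INTO pronunciation_overrides (story_id, word, ipa) VALUES (?, ?, ?)"
print(rejected(add_override, (None, "kaelith", "ˈkaɪ.lɪθ")))  # False: first global row
print(rejected(add_override, (None, "kaelith", "ˈkeɪ.lɪθ")))  # True: second global row
print(rejected(add_override, (1, "kaelith", "ˈkeɪ.lɪθ")))     # False: story-scoped row
print(rejected(add_override, (1, "kaelith", "ˈkiː.lɪθ")))     # True: same (story, word)
```

On Postgres 15 or later, `UNIQUE NULLS NOT DISTINCT (story_id, word)` could fold the last two rules into one constraint; the partial index works on older versions as well.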

2026-05-13 10:07:32 -07:00
