skald

cobb/skald

Author	SHA1	Message	Date
Kayos	c9bd38034c	multi-voice: per-character dialogue rendering Schema: characters.voice_id + characters.slug (migration 0007). voice_id is FK to voices(id); slug is the stable lowercase token the narrate_prep pass uses inside [voice:slug]...[/voice]. Forge::narrate_prep takes &[CharacterSpeaker]. System prompt expanded to instruct the author to wrap dialogue lines in voice tags based on a roster supplied in the user prompt (slug + name + short hint from key_facts). Unattributed dialogue stays unwrapped and inherits the narrator voice. skald narrate substitutes [voice:<character-slug>] → [voice:<kokoro-voice-name>] right before sending to Kokoro, using characters.voice_id JOIN voices.reference_path as the map. Slugs with no voice or no character row fall back to the narrator voice defensively (logged as warn). kokoro_server.py v0.4: splitter recognises [voice:X]...[/voice] blocks at the paragraph level. Each text node carries an optional voice attribution; renderer feeds it to Kokoro per-segment. Outside voice blocks the request's default voice is used. voices_used is reported back so callers can verify multi-voice actually ran. Only kokoro-routed renders pre-process voice tags; F5 paths leave the tags in place (F5 multi-voice not implemented). Defensive fallback: orphan/unclosed [/voice] markers are silently absorbed rather than failing the render.	2026-05-14 08:35:33 -07:00
Kayos	330bc8bde2	migration 0005: idempotent ADD COLUMN IF NOT EXISTS Caught when redeploying after the 0006 patch: the live DB had migration 5 stamped with a stale checksum + the column already present, so neither re-apply nor checksum-only-fix worked cleanly. Making 0005 idempotent fixes both paths.	2026-05-13 20:32:41 -07:00
Kayos	2ed3d3373a	migration 0006: extend generation_runs.kind to allow narrate_prep Migration 0005 added the chapters.body_md_tts column but missed this check constraint update — caught at runtime when prepare-narration tried to insert kind='narrate_prep'. Postgres doesn't ALTER CHECK in place; we drop + re-add.	2026-05-13 20:28:59 -07:00
Kayos	89c35fd9d3	narrate: body_md_tts column + narrate_prep pass + Kokoro routing Two new things working together: 1. Migration 0005 adds chapters.body_md_tts (nullable). Narrate path prefers it over body_md when present — that's the annotated-for-audiobook variant. Falls back to body_md if not set. 2. New Forge::narrate_prep pass: author (or House) annotates prose with [breath] / [pause:Xs] / [scene] beat markers AND occasional humanizing narrator stumbles (em-dash repetition, self-correction, hesitation — sparingly, 1-3 per chapter). Apart from stumbles, the prose is verbatim. Author voice threads through. 3. New CLI: 'skald prepare-narration --chapter <uuid> [--author slug] [--overwrite]'. Records as generation_runs row kind=narrate_prep. 4. skald narrate now routes by voice.source — kokoro_* voices hit KOKORO_URL (Apache 2.0 stack, audiobook-tuned with the v0.2 render-and-stitch server), everything else hits F5_TTS_URL (voice-cloning path). Voice DB row carries source as the dispatch key. Why no new tag for narrator stumbles: em-dash repetition and self-correction are just prose patterns Kokoro reads correctly because of its punctuation cues. No new server-side machinery.	2026-05-13 20:24:38 -07:00
Kayos	713ba41977	v0.3 step 1: migration 0004 + authors module + web form panels Migration 0004 — authors + author_revisions + stories.author_id + stories.author_revision_id + stories.cross_story_memory + author_corpus. Soul versioning built in from day one per cobb's locked decisions: - authors.id immutable identity (slug + display_name + tagline + model) - author_revisions tracks each soul revision with n monotonic - Partial unique index 'idx_author_revisions_current' enforces exactly one is_current=true per author - stories.author_revision_id pins to the exact soul used at gen time (so 'this was the Orson Black active when chapter 8 was written' is always recoverable) - author_corpus tracks 'authored' + 'read' relationships for the v0.3 cross-story memory toggle skald-core::authors module — CRUD: get_by_slug, get_with_current_revision, get_current_revision, get_revision, create_or_get (idempotent), add_revision (transactional, demotes prior is_current=true), assign_to_story (also touches author_corpus). Web v0.1 forms (the second feedback bucket — 'no way to make new stories', 'no options for sequels'): handlers + form panels + POST routes for /stories/new and /stories/:id/continue. Both create a story stub with status='seed'; actual generation will be fired by 'skald continue' (next commit) walking seed rows. Norse visual revamp + mobile collapse deferred — vetting full gen is the priority per cobb's 'green light for v0.3'. Coming back to the aesthetic after the pipeline works end-to-end against a real Orson Black-authored Chapter 8 of Coast-Down.	2026-05-13 12:01:29 -07:00
Kayos	4a91e0738d	schema: narration_findings — audio-layer audit table Closes the TTS schema layer. The v0.2 render pipeline auto-runs an audit chain after each chapter narration: F5 render → narration_runs (succeeded) → ffmpeg chunk into ~30s windows → Whisper-large-v3 STT each chunk → word-level diff vs source chapter text → mismatches → narration_findings (kind=pronunciation|skip|insert) → ffmpeg silence/clip detect → narration_findings (kind=glitch) → (optional) Gemini Flash audio review pass → narration_findings (kind=prosody|tone) → unresolved crits trigger automatic re-roll with new seed. Distinct from audit_findings: that table is canon/continuity at the text layer, populated by the third-Opus canon-audit pass. narration_findings is audio-quality only, populated by detectors that consume the rendered WAV. The 'detector' field captures which model produced the finding so we can tune thresholds per detector when one over- or under-flags. cobb's audio agent intuition was right: STT-and-diff catches the 'name came out wrong' case airtight, and a separate audio-native LLM call catches the subtler 'this sentence sounded weird' cases Whisper can't see.	2026-05-13 10:10:04 -07:00
Kayos	465c94b745	schema: voices + pronunciation_overrides + narration_runs (v0.2 prep) TTS layer landed as schema-only — synthesis pipeline ships in v0.2. Putting the tables in v0.1 means imports already carry the right shape; we won't need a 'migrate every existing story' pass later. Decisions locked 2026-05-13: - Engine: F5-TTS (best 8GB FOSS option, mid-2026 SOTA) - Default voice source: LJ Speech (Linda Johnson, PD released specifically for TTS training — airtight for sharing/uploading generated audio. The 'AI-consent-released' license posture is the difference between 'should be fine' and 'definitely fine.') - Variety voices: Hi-Fi TTS speaker IDs (Apache 2.0, same consent shape). LibriVox is optional but never default. - Pronunciation overrides DB layer (story-scoped + global) to fix proper-noun mispronunciation — the actual TTS-quality gap on Cobb's bar of 'must not wake me up.' Pre-pass with Opus extracts proper nouns + IPA, operator verifies, table caches forever. Tables: - voices — name, license, reference_path/text, sample_rate, default flag - pronunciation_overrides — story-scoped or global, IPA/arpabet - narration_runs — TTS audit trail mirroring generation_runs - stories.preferred_voice_id FK Unique constraints: - one default voice (partial index) - one row per (story, word) override - one global row per word	2026-05-13 10:07:32 -07:00
Kayos	f575ad3722	scaffold v0.1: postgres+pgvector inside-container, schema, markdown ingest, CLI Skald is a generic story-writer. The database is the product; the binary is the tooling. Everything story-specific lives in rows, not in code. cwho's monorepo + binary-per-role pattern transplanted to this domain. What this commit ships: - Cargo workspace (resolver=3, edition 2024): skald-core (lib) + skald (bin) - Migration 0001: stories, characters, canon_facts, chapters, chapter_summaries, passages (vector(1536)), generation_runs, audit_findings, tags. pgvector + pg_trgm extensions. ivfflat index deferred until we have data (post-import the first ~1k passages and add the index). - skald-core::ingest — markdown parser for the cwho/coast-down shape: '# Title' → '## Chapter N — date' headings → '# Continuity Bible' section with character roster (real + fictional sub-sections) + setting / mystery / historical / liberty / hook sub-sections. Decomposed into structured rows; original bullet body preserved in key_facts/body fields for fidelity. 6 unit tests cover the shape. - skald-core::db — Postgres connection pool + migration runner. - skald-core::models — row types via sqlx::FromRow. - skald binary — clap CLI: 'serve' (http + migrations) and 'import-markdown' (one-shot ingest). - Dockerfile — multi-stage: rust:1.95-bookworm builder, pgvector/pgvector:pg17 runtime, tini under PID 1, custom entrypoint.sh that boots embedded postgres then execs skald serve. - compose.yml — singleton container, postgres data in volume, story corpus mounted read-only at /seed. Decisions locked 2026-05-13: 1. DB in same container 'till we have a real working tool' (cobb) 2. postgres+pgvector (NOT sqlite) — keeps semantic-search story 3. Network-not-socket connection (postgresql://localhost:5432) from day one so future split is config-only, not code-rewrite Not yet wired: - Web UI - clawdforge calls (gen → cleanup → canon-audit pipeline) - Embedding pass - TTS sidecar	2026-05-13 09:04:28 -07:00

8 commits
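Commit c9bd38034c (multi-voice) has skald narrate rewrite `[voice:<character-slug>]` into `[voice:<kokoro-voice-name>]` just before the Kokoro call, falling back to the narrator voice for unmapped slugs. Here is a minimal sketch of that substitution. It assumes the characters.voice_id JOIN voices lookup has already been loaded into a `HashMap` from slug to voice name. The function name `rewrite_voice_tags`, and the choice to return warnings instead of logging them, are hypothetical.

```rust
use std::collections::HashMap;

/// Rewrites `[voice:<character-slug>]` opening tags to `[voice:<kokoro-voice-name>]`.
/// Closing `[/voice]` tags pass through untouched. Slugs missing from `voices`
/// are rewritten to the narrator voice and reported in the returned warnings
/// (a hypothetical stand-in for the warn-level log the commit mentions).
fn rewrite_voice_tags(
    text: &str,
    voices: &HashMap<String, String>,
    narrator: &str,
) -> (String, Vec<String>) {
    const OPEN: &str = "[voice:";
    let mut out = String::with_capacity(text.len());
    let mut warnings = Vec::new();
    let mut rest = text;

    while let Some(start) = rest.find(OPEN) {
        // Copy everything before the tag verbatim.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(']') {
            Some(end) => {
                let slug = &after_open[..end];
                let voice = match voices.get(slug) {
                    Some(v) => v.as_str(),
                    None => {
                        warnings.push(format!("no voice for slug '{slug}', using narrator"));
                        narrator
                    }
                };
                out.push_str(OPEN);
                out.push_str(voice);
                out.push(']');
                rest = &after_open[end + 1..];
            }
            None => {
                // Malformed tag with no closing bracket: keep the remainder as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    (out, warnings)
}
```

Because only opening tags change, the rewrite never has to track nesting. The Kokoro-side splitter still sees well-formed `[voice:X]...[/voice]` pairs wherever the input had them.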
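The kokoro_server.py v0.4 splitter from the same commit is Python, but its tag handling can be sketched in Rust. This sketch makes three assumptions: paragraphs are separated by blank lines, voice state resets at each paragraph boundary (so an unclosed `[voice:X]` runs to the end of its paragraph), and an orphan `[/voice]` is simply dropped. The `Segment` type is hypothetical. The commit only says orphan and unclosed markers are absorbed rather than failing the render.

```rust
/// One renderable piece of text; `voice: None` means "use the request's default voice".
#[derive(Debug, PartialEq)]
struct Segment {
    text: String,
    voice: Option<String>,
}

const OPEN: &str = "[voice:";
const CLOSE: &str = "[/voice]";

/// Splits input into paragraphs, then into voice-attributed segments.
fn split_voice_blocks(input: &str) -> Vec<Segment> {
    let mut segments = Vec::new();
    for paragraph in input.split("\n\n") {
        let mut rest = paragraph;
        let mut voice: Option<String> = None; // reset per paragraph
        loop {
            // Find whichever marker comes first.
            let (idx, is_open) = match (rest.find(OPEN), rest.find(CLOSE)) {
                (Some(o), Some(c)) if o < c => (o, true),
                (_, Some(c)) => (c, false),
                (Some(o), None) => (o, true),
                (None, None) => break,
            };
            push_text(&mut segments, &rest[..idx], &voice);
            if is_open {
                let after = &rest[idx + OPEN.len()..];
                match after.find(']') {
                    Some(end) => {
                        voice = Some(after[..end].to_string());
                        rest = &after[end + 1..];
                    }
                    None => {
                        // Malformed opening tag: keep it as plain text.
                        push_text(&mut segments, &rest[idx..], &voice);
                        rest = "";
                        break;
                    }
                }
            } else {
                // Closing tag, matched or orphan: return to the default voice.
                voice = None;
                rest = &rest[idx + CLOSE.len()..];
            }
        }
        push_text(&mut segments, rest, &voice);
    }
    segments
}

/// Appends trimmed, non-empty text under the given voice.
fn push_text(segments: &mut Vec<Segment>, text: &str, voice: &Option<String>) {
    let text = text.trim();
    if !text.is_empty() {
        segments.push(Segment { text: text.to_string(), voice: voice.clone() });
    }
}
```

Collecting the distinct `voice` values across the returned segments gives the kind of `voices_used` report the server sends back.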
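Migrations 0005 and 0006 (commits 330bc8bde2 and 2ed3d3373a) rely on two standard Postgres idioms. `ADD COLUMN IF NOT EXISTS` turns a repeated ADD COLUMN into a no-op. A CHECK constraint is changed by dropping it and adding it back. The helpers below are a hypothetical sketch that emits both statements. The constraint name assumes Postgres's default `<table>_<column>_check` naming, and every `kind` value other than `narrate_prep` is illustrative.

```rust
/// Idempotent ADD COLUMN (the 0005 fix): re-running it against a database
/// that already has the column does nothing.
fn add_column_if_missing(table: &str, column: &str, sql_type: &str) -> String {
    format!("ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {column} {sql_type};")
}

/// Rebuilds a `CHECK (column IN (...))` constraint (the 0006 pattern).
/// Postgres cannot alter a CHECK in place, so it is dropped and re-added;
/// `DROP ... IF EXISTS` keeps the drop half re-runnable as well.
fn rebuild_kind_check(table: &str, column: &str, kinds: &[&str]) -> String {
    // Assumed default constraint name; the real migration may use another.
    let constraint = format!("{table}_{column}_check");
    let allowed = kinds
        .iter()
        .map(|k| format!("'{}'", k.replace('\'', "''"))) // escape single quotes
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        "ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint};\n\
         ALTER TABLE {table} ADD CONSTRAINT {constraint} CHECK ({column} IN ({allowed}));"
    )
}
```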
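Commit 89c35fd9d3 routes each render by the voice row's `source` column and prefers `body_md_tts` over `body_md`. A minimal sketch of both decisions follows. The `Engine` enum and the function names are hypothetical. The `kokoro_` prefix and the `KOKORO_URL` / `F5_TTS_URL` variable names come from the commit message.

```rust
/// Which TTS backend a render is sent to.
#[derive(Debug, PartialEq)]
enum Engine {
    Kokoro, // KOKORO_URL: audiobook-tuned render-and-stitch server
    F5,     // F5_TTS_URL: voice-cloning path
}

/// Dispatch on voice.source: kokoro_* voices go to Kokoro, everything else to F5-TTS.
fn route(voice_source: &str) -> Engine {
    if voice_source.starts_with("kokoro_") {
        Engine::Kokoro
    } else {
        Engine::F5
    }
}

/// Use the annotated-for-audiobook body when present, else the plain body.
fn narration_text<'a>(body_md: &'a str, body_md_tts: Option<&'a str>) -> &'a str {
    body_md_tts.unwrap_or(body_md)
}

/// Looks up the endpoint URL from the environment variable for the chosen engine.
fn endpoint(engine: &Engine) -> Option<String> {
    let var = match engine {
        Engine::Kokoro => "KOKORO_URL",
        Engine::F5 => "F5_TTS_URL",
    };
    std::env::var(var).ok()
}
```

Keeping the dispatch key in the database row means a new voice can switch engines without a code change, which fits the scaffold commit's "everything story-specific lives in rows" rule.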
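Commit 713ba41977 describes `add_revision` as transactional: it demotes the prior `is_current=true` row, and the partial unique index `idx_author_revisions_current` allows only one current revision per author. The in-memory sketch below models that contract. The struct and field names are hypothetical, and `soul` stands in for whatever a revision actually stores.

```rust
#[derive(Debug, Clone)]
struct AuthorRevision {
    n: u32,          // monotonic per author, starting at 1
    soul: String,    // hypothetical stand-in for the revision body
    is_current: bool,
}

#[derive(Debug, Default)]
struct Author {
    slug: String,
    revisions: Vec<AuthorRevision>,
}

impl Author {
    /// Demote every existing revision, then append the new one as the only
    /// current row with n = previous max + 1. Returns the new n.
    fn add_revision(&mut self, soul: &str) -> u32 {
        for r in self.revisions.iter_mut() {
            r.is_current = false;
        }
        let n = self.revisions.iter().map(|r| r.n).max().unwrap_or(0) + 1;
        self.revisions.push(AuthorRevision { n, soul: soul.to_string(), is_current: true });
        n
    }

    /// The revision a newly generated chapter would pin via stories.author_revision_id.
    fn current(&self) -> Option<&AuthorRevision> {
        self.revisions.iter().find(|r| r.is_current)
    }
}
```

In SQL the order matters as well. The demote has to run before the insert in the same transaction, or the partial unique index rejects the new current row. A story that pinned an earlier revision keeps pointing at it after later revisions are added.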
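Commit 4a91e0738d turns a word-level diff between the Whisper transcript and the source text into `pronunciation`, `skip`, and `insert` findings. The commit does not say how words are aligned or classified. This sketch assumes three things: an LCS alignment over lowercased, punctuation-stripped words; that a skipped source word paired with an inserted transcript word in the same gap is a mispronunciation; and that leftover words in the gap are plain skips or inserts.

```rust
#[derive(Debug, PartialEq)]
enum Finding {
    Pronunciation { expected: String, heard: String },
    Skip { expected: String },
    Insert { heard: String },
}

/// Lowercase and strip surrounding punctuation so "Dawn." matches "dawn".
fn normalize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

fn diff_words(source: &str, transcript: &str) -> Vec<Finding> {
    let (a, b) = (normalize(source), normalize(transcript));
    let (n, m) = (a.len(), b.len());
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut findings = Vec::new();
    let (mut skipped, mut inserted): (Vec<String>, Vec<String>) = (Vec::new(), Vec::new());
    let (mut i, mut j) = (0, 0);
    while i < n || j < m {
        if i < n && j < m && a[i] == b[j] {
            // A matched word closes the current gap.
            flush(&mut findings, &mut skipped, &mut inserted);
            i += 1;
            j += 1;
        } else if j < m && (i == n || lcs[i][j + 1] >= lcs[i + 1][j]) {
            inserted.push(b[j].clone());
            j += 1;
        } else {
            skipped.push(a[i].clone());
            i += 1;
        }
    }
    flush(&mut findings, &mut skipped, &mut inserted);
    findings
}

/// Pair a gap's skipped and inserted words as mispronunciations; leftovers
/// become plain skips or inserts.
fn flush(findings: &mut Vec<Finding>, skipped: &mut Vec<String>, inserted: &mut Vec<String>) {
    let pairs = skipped.len().min(inserted.len());
    for (expected, heard) in skipped.drain(..pairs).zip(inserted.drain(..pairs)) {
        findings.push(Finding::Pronunciation { expected, heard });
    }
    findings.extend(skipped.drain(..).map(|expected| Finding::Skip { expected }));
    findings.extend(inserted.drain(..).map(|heard| Finding::Insert { heard }));
}
```

On "Mara walked to Kettleby at dawn." against a transcript of "mara walked to kettle bee at dawn", this yields a pronunciation finding (kettleby heard as kettle) plus an insert (bee). That is the "name came out wrong" case the commit says STT-and-diff catches.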
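Commit 465c94b745 scopes pronunciation overrides either to a story or globally, with one row per (story, word) and one global row per word. The sketch below assumes that a story-scoped row wins over the global row for the same word, and that words are matched case-insensitively. Neither rule is stated in the commit. It also uses `u64` story ids in place of the real key type. The HashMap key plays the role of both unique constraints.

```rust
use std::collections::HashMap;

/// (story id, or None for a global row; lowercased word)
type Key = (Option<u64>, String);

/// Hypothetical in-memory mirror of pronunciation_overrides (value: IPA).
#[derive(Default)]
struct Overrides {
    rows: HashMap<Key, String>,
}

impl Overrides {
    /// Insert or replace, like an upsert on the unique constraints.
    fn set(&mut self, story: Option<u64>, word: &str, ipa: &str) {
        self.rows.insert((story, word.to_lowercase()), ipa.to_string());
    }

    /// Story-scoped override first, then the global row, else None.
    fn lookup(&self, story: u64, word: &str) -> Option<&str> {
        let w = word.to_lowercase();
        let scoped: Key = (Some(story), w.clone());
        let global: Key = (None, w);
        self.rows
            .get(&scoped)
            .or_else(|| self.rows.get(&global))
            .map(String::as_str)
    }
}
```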
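Commit f575ad3722 describes the ingest shape as a `# Title`, then chapter headings of the form `## Chapter N`, an em dash, and a date, then a `# Continuity Bible` section. A minimal sketch of the chapter-walking half follows. The function names are hypothetical, and the Continuity Bible parsing is out of scope here: the walk just stops at that heading. The em dash is written as `'\u{2014}'`.

```rust
#[derive(Debug, PartialEq)]
struct ChapterHeading<'a> {
    number: u32,
    date: &'a str,
}

/// Parses `## Chapter N <em dash> date`; returns None for any other line.
fn parse_chapter_heading(line: &str) -> Option<ChapterHeading<'_>> {
    let rest = line.trim().strip_prefix("## Chapter ")?;
    let (num, date) = rest.split_once('\u{2014}')?;
    Some(ChapterHeading {
        number: num.trim().parse::<u32>().ok()?,
        date: date.trim(),
    })
}

/// Returns the story title and each chapter's heading plus raw body,
/// stopping at `# Continuity Bible`.
fn chapters(md: &str) -> (Option<&str>, Vec<(ChapterHeading<'_>, String)>) {
    let mut title = None;
    let mut out: Vec<(ChapterHeading<'_>, String)> = Vec::new();
    for line in md.lines() {
        if line.trim() == "# Continuity Bible" {
            break;
        }
        if let Some(h) = parse_chapter_heading(line) {
            out.push((h, String::new()));
        } else if let Some(t) = line.strip_prefix("# ") {
            // First top-level heading is the title.
            if title.is_none() {
                title = Some(t.trim());
            }
        } else if let Some((_, body)) = out.last_mut() {
            body.push_str(line);
            body.push('\n');
        }
    }
    (title, out)
}
```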