Commit graph

12 commits

Author SHA1 Message Date
2820d173e8 forge: dedup pass — the fix half of the audit loop
Adds `skald dedup --story <id>`: reads a story's most recent
prose-audit findings and walks every chapter, handing the author
the chapter prose + the findings with instructions to rephrase
ONLY the flagged repetitions (each recurrence made distinct) and
fix flagged continuity errors — everything else stays verbatim.
A surgical dedup, not a rewrite. Overwrites body_md, clears
body_md_tts so the chapter is re-prepped before narration. High
effort (prose-craft). Migration 0011 adds the 'dedup' run kind.

Completes the QC loop: audit (find) -> dedup (fix) -> re-audit.
2026-05-15 14:49:08 -07:00
fd7a34ac1d forge: prose-quality audit pass + anti-repetition directives
Adds `skald audit --story <id>`: a whole-story QC pass that reads
every chapter end to end and flags repetition, template tics,
self-restatement and continuity drift — the gate before a story
goes to narration, where repetition that a silent reader skims is
glaring when read aloud. Runs at max effort (real reasoning work,
worth the spend); findings land in audit_findings and print.

Also hardens the gen + cleanup directives to hunt repetition at
the source: re-phrase recurring motifs fresh, no stacked template
anaphora, dialogue echoed verbatim at most once.

Migration 0010: 'prose_audit' generation_runs.kind, 'repetition'
audit_findings.area.
2026-05-15 11:19:04 -07:00
575749b774 web: audiobook player — stitched-file playback with chapter seek
Adds GET /stories/{id}/listen: one <audio> element over a story's
stitched audiobook file plus a clickable chapter list. Clicking a
chapter seeks; the chapter under the playhead highlights as it
plays. Chapter offsets are summed from each chapter's latest
succeeded narration_run duration — the same order the file was
stitched. One small inline script, the web UI's first JS.
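
The offset arithmetic can be sketched as below. This is a minimal Rust illustration, not the player's inline script (which is JS); `chapter_offsets` and `chapter_at` are hypothetical names, and durations are assumed to be in seconds, already in stitch order.

```rust
/// Start offset (seconds) of each chapter inside the stitched file:
/// a running sum of the durations of the chapters before it.
pub fn chapter_offsets(durations_secs: &[f64]) -> Vec<f64> {
    let mut offsets = Vec::with_capacity(durations_secs.len());
    let mut elapsed = 0.0_f64;
    for &d in durations_secs {
        offsets.push(elapsed);
        elapsed += d;
    }
    offsets
}

/// Index of the chapter under the playhead: the last chapter whose start
/// offset is at or before `playhead_secs`. None when there are no chapters
/// or the playhead is negative.
pub fn chapter_at(offsets: &[f64], playhead_secs: f64) -> Option<usize> {
    offsets.iter().rposition(|&start| start <= playhead_secs)
}
```

Seeking to chapter `i` then amounts to setting the playhead to `offsets[i]`, and highlighting is `chapter_at` applied to the current time.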

New stories.audiobook_path column (migration 0009) holds the
served path; the story page shows a "listen" action when set.
2026-05-15 07:30:56 -07:00
d2442f0a87 forge: rewrite pass — re-author prose in an author's voice
New Forge::rewrite + PassKind::Rewrite. An author re-authors
existing chapter prose entirely in their voice — sentence rhythm,
word choice, paragraph shape all become theirs — while canon
(names, dates, places, events, order, technical facts) is preserved
exactly. Not editing; re-authoring. SystemMode::Replace, max effort.

skald rewrite --chapter <uuid> [--author slug] overwrites body_md
with the rewritten version. The pre-rewrite prose is stashed in the
new chapters.body_md_original column on first rewrite (migration
0008, idempotent) so the original is never lost. body_md_tts is
cleared — it was annotated against the old prose and must be
regenerated by a fresh prepare-narration.
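
A minimal in-memory sketch of the overwrite rule above. The `Chapter` struct and `apply_rewrite` are hypothetical stand-ins for the real row type and DB update; only the three column names come from this commit.

```rust
/// Hypothetical in-memory stand-in for the three chapter columns involved.
#[derive(Debug, Clone, PartialEq)]
pub struct Chapter {
    pub body_md: String,
    pub body_md_original: Option<String>,
    pub body_md_tts: Option<String>,
}

/// Overwrite the prose with a rewritten version.
pub fn apply_rewrite(chapter: &mut Chapter, rewritten: String) {
    // Stash the pre-rewrite prose on the first rewrite only, so later
    // rewrites never replace the true original.
    if chapter.body_md_original.is_none() {
        chapter.body_md_original = Some(chapter.body_md.clone());
    }
    chapter.body_md = rewritten;
    // The TTS annotation was made against the old prose, so drop it.
    chapter.body_md_tts = None;
}
```

Calling `apply_rewrite` twice leaves `body_md_original` holding the prose from before the first call.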

prepare-narration gains --single-voice: skips the character speaker
roster so no [voice:X] dialogue tags are inserted, only beat
markers. Right for one-voice narration.

Migration 0008 also extends generation_runs.kind to allow 'rewrite'.
2026-05-14 21:35:20 -07:00
c9bd38034c multi-voice: per-character dialogue rendering
Schema: characters.voice_id + characters.slug (migration 0007).
voice_id is FK to voices(id); slug is the stable lowercase token
the narrate_prep pass uses inside [voice:slug]...[/voice].

Forge::narrate_prep takes &[CharacterSpeaker]. System prompt
expanded to instruct the author to wrap dialogue lines in voice
tags based on a roster supplied in the user prompt (slug + name +
short hint from key_facts). Unattributed dialogue stays unwrapped
and inherits the narrator voice.

skald narrate substitutes [voice:<character-slug>] →
[voice:<kokoro-voice-name>] right before sending to Kokoro, using
characters.voice_id JOIN voices.reference_path as the map. Slugs
with no voice or no character row fall back to the narrator voice
defensively (logged as warn).
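
The substitution step might look roughly like this. A sketch only: `substitute_voice_tags` is a hypothetical name, the slug-to-voice map stands in for the characters/voices join, and the warn log on fallback is omitted.

```rust
use std::collections::HashMap;

/// Rewrite every `[voice:<slug>]` opening tag to `[voice:<engine voice>]`.
/// Slugs missing from the map fall back to `narrator_voice`.
pub fn substitute_voice_tags(
    text: &str,
    slug_to_voice: &HashMap<String, String>,
    narrator_voice: &str,
) -> String {
    const OPEN: &str = "[voice:";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        // An opening tag with no closing bracket is left untouched.
        let Some(end) = after_open.find(']') else { break };
        let slug = &after_open[..end];
        let voice = slug_to_voice
            .get(slug)
            .map(String::as_str)
            .unwrap_or(narrator_voice);
        out.push_str(&rest[..start]);
        out.push_str(OPEN);
        out.push_str(voice);
        out.push(']');
        rest = &after_open[end + 1..];
    }
    out.push_str(rest);
    out
}
```

Closing `[/voice]` markers pass through unchanged, since only the name inside the opening tag needs mapping.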

kokoro_server.py v0.4: splitter recognises [voice:X]...[/voice]
blocks at the paragraph level. Each text node carries an optional
voice attribution; renderer feeds it to Kokoro per-segment. Outside
voice blocks the request's default voice is used. voices_used is
reported back so callers can verify multi-voice actually ran.

Only kokoro-routed renders pre-process voice tags; F5 paths leave
the tags in place (F5 multi-voice not implemented). Defensive
fallback: orphan/unclosed [/voice] markers are silently absorbed
rather than failing the render.
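
For illustration, the splitting behaviour can be sketched as follows. This is written in Rust for consistency with the rest of the repo and is not the code in kokoro_server.py; `Segment`, `split_voice_blocks` and `push_segment` are hypothetical names, and "absorbed" is assumed to mean the stray marker is dropped from the text.

```rust
/// One renderable piece of text, with the voice it should be read in.
/// `voice: None` means "use the request's default voice".
#[derive(Debug, PartialEq)]
pub struct Segment {
    pub voice: Option<String>,
    pub text: String,
}

/// Split text into default-voice and attributed segments.
pub fn split_voice_blocks(input: &str) -> Vec<Segment> {
    const OPEN: &str = "[voice:";
    const CLOSE: &str = "[/voice]";
    let mut segments = Vec::new();
    let mut rest = input;
    loop {
        let Some(start) = rest.find(OPEN) else { break };
        let after_open = &rest[start + OPEN.len()..];
        let Some(name_end) = after_open.find(']') else { break };
        let body = &after_open[name_end + 1..];
        // An unclosed block is assumed to run to the end of the input.
        let (inner, tail) = match body.find(CLOSE) {
            Some(close) => (&body[..close], &body[close + CLOSE.len()..]),
            None => (body, ""),
        };
        push_segment(&mut segments, None, &rest[..start]);
        push_segment(&mut segments, Some(&after_open[..name_end]), inner);
        rest = tail;
    }
    push_segment(&mut segments, None, rest);
    segments
}

fn push_segment(segments: &mut Vec<Segment>, voice: Option<&str>, text: &str) {
    // Orphan close markers are dropped rather than treated as errors.
    let cleaned = text.replace("[/voice]", "");
    let trimmed = cleaned.trim();
    if !trimmed.is_empty() {
        segments.push(Segment {
            voice: voice.map(str::to_string),
            text: trimmed.to_string(),
        });
    }
}
```

The distinct `voice` values across the returned segments are what a `voices_used` report could be built from.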
2026-05-14 08:35:33 -07:00
330bc8bde2 migration 0005: idempotent ADD COLUMN IF NOT EXISTS
Caught when redeploying after the 0006 patch: the live DB had
migration 5 stamped with a stale checksum + the column already
present, so neither re-apply nor checksum-only-fix worked cleanly.
Making 0005 idempotent fixes both paths.
2026-05-13 20:32:41 -07:00
2ed3d3373a migration 0006: extend generation_runs.kind to allow narrate_prep
Migration 0005 added the chapters.body_md_tts column but missed
this check constraint update — caught at runtime when
prepare-narration tried to insert kind='narrate_prep'.

Postgres can't change a CHECK constraint's expression in place, so
the migration drops the constraint and re-adds it.
2026-05-13 20:28:59 -07:00
89c35fd9d3 narrate: body_md_tts column + narrate_prep pass + Kokoro routing
Four new things working together:

1. Migration 0005 adds chapters.body_md_tts (nullable). Narrate path
   prefers it over body_md when present — that's the annotated-for-
   audiobook variant. Falls back to body_md if not set.

2. New Forge::narrate_prep pass: author (or House) annotates prose
   with [breath] / [pause:Xs] / [scene] beat markers AND occasional
   humanizing narrator stumbles (em-dash repetition, self-correction,
   hesitation — sparingly, 1-3 per chapter). Apart from stumbles, the
   prose is verbatim. Author voice threads through.

3. New CLI: 'skald prepare-narration --chapter <uuid> [--author slug]
   [--overwrite]'. Records as generation_runs row kind=narrate_prep.

4. skald narrate now routes by voice.source — kokoro_* voices hit
   KOKORO_URL (Apache 2.0 stack, audiobook-tuned with the v0.2 render-
   and-stitch server), everything else hits F5_TTS_URL (voice-cloning
   path). Voice DB row carries source as the dispatch key.
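
Items 1 and 4 reduce to two small decisions, sketched below under assumptions: `Chapter`, `Engine`, `narration_text`, `engine_for` and `endpoint_for` are hypothetical names, and the URLs are passed in rather than read from KOKORO_URL / F5_TTS_URL.

```rust
/// Hypothetical stand-in for the two prose columns on a chapter row.
pub struct Chapter {
    pub body_md: String,
    pub body_md_tts: Option<String>,
}

/// Text to narrate: the annotated variant when present, else the plain prose.
pub fn narration_text(chapter: &Chapter) -> &str {
    chapter
        .body_md_tts
        .as_deref()
        .unwrap_or(chapter.body_md.as_str())
}

#[derive(Debug, PartialEq)]
pub enum Engine {
    Kokoro,
    F5,
}

/// Dispatch on the voice row's `source`: kokoro_* goes to Kokoro,
/// everything else goes to the F5 voice-cloning path.
pub fn engine_for(source: &str) -> Engine {
    if source.starts_with("kokoro_") {
        Engine::Kokoro
    } else {
        Engine::F5
    }
}

/// Pick the base URL for a voice source.
pub fn endpoint_for<'a>(source: &str, kokoro_url: &'a str, f5_url: &'a str) -> &'a str {
    match engine_for(source) {
        Engine::Kokoro => kokoro_url,
        Engine::F5 => f5_url,
    }
}
```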

Why no new tag for narrator stumbles: em-dash repetition and self-
correction are just prose patterns Kokoro reads correctly because of
its punctuation cues. No new server-side machinery.
2026-05-13 20:24:38 -07:00
713ba41977 v0.3 step 1: migration 0004 + authors module + web form panels
Migration 0004 — authors + author_revisions + stories.author_id +
stories.author_revision_id + stories.cross_story_memory +
author_corpus. Soul versioning built in from day one per cobb's
locked decisions:
- authors.id immutable identity (slug + display_name + tagline + model)
- author_revisions tracks each soul revision with n monotonic
- Partial unique index 'idx_author_revisions_current' enforces
  exactly one is_current=true per author
- stories.author_revision_id pins to the exact soul used at gen
  time (so 'this was the Orson Black active when chapter 8 was
  written' is always recoverable)
- author_corpus tracks 'authored' + 'read' relationships for the
  v0.3 cross-story memory toggle

skald-core::authors module — CRUD: get_by_slug,
get_with_current_revision, get_current_revision, get_revision,
create_or_get (idempotent), add_revision (transactional, demotes
prior is_current=true), assign_to_story (also touches
author_corpus).
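
The add_revision invariant (n monotonic, exactly one is_current per author) can be modelled in memory as below. A sketch only: the real function runs in a Postgres transaction, and the `soul` field name and the `Vec`-based store are assumptions.

```rust
/// Hypothetical in-memory shape of one author_revisions row.
#[derive(Debug, Clone, PartialEq)]
pub struct AuthorRevision {
    pub n: u32,
    pub soul: String,
    pub is_current: bool,
}

/// Append a new revision for one author: demote whatever was current,
/// then add the new revision as current with the next value of n.
pub fn add_revision(revisions: &mut Vec<AuthorRevision>, soul: &str) -> u32 {
    let next_n = revisions.iter().map(|r| r.n).max().unwrap_or(0) + 1;
    for r in revisions.iter_mut() {
        r.is_current = false;
    }
    revisions.push(AuthorRevision {
        n: next_n,
        soul: soul.to_string(),
        is_current: true,
    });
    next_n
}

/// The single revision flagged current, if the author has any revisions.
pub fn current_revision(revisions: &[AuthorRevision]) -> Option<&AuthorRevision> {
    revisions.iter().find(|r| r.is_current)
}
```

In the database the demote and the insert have to share one transaction; otherwise the partial unique index would reject the second current row.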

Web v0.1 forms (the second feedback bucket — 'no way to make new
stories', 'no options for sequels'): handlers + form panels +
POST routes for /stories/new and /stories/:id/continue. Both
create a story stub with status='seed'; actual generation will be
fired by 'skald continue' (next commit) walking seed rows.

Norse visual revamp + mobile collapse deferred — vetting full gen
is the priority per cobb's 'green light for v0.3'. Coming back to
the aesthetic after the pipeline works end-to-end against a real
Orson Black-authored Chapter 8 of Coast-Down.
2026-05-13 12:01:29 -07:00
4a91e0738d schema: narration_findings — audio-layer audit table
Closes the TTS schema layer. The v0.2 render pipeline auto-runs an
audit chain after each chapter narration:

  F5 render → narration_runs (succeeded)
    → ffmpeg chunk into ~30s windows
    → Whisper-large-v3 STT each chunk
    → word-level diff vs source chapter text
    → mismatches → narration_findings (kind=pronunciation|skip|insert)
    → ffmpeg silence/clip detect → narration_findings (kind=glitch)
    → (optional) Gemini Flash audio review pass
      → narration_findings (kind=prosody|tone)
    → unresolved crits trigger automatic re-roll with new seed
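
The word-level diff step could be sketched with a plain LCS alignment. The mapping of diff operations to finding kinds here is an assumption (paired substitution = pronunciation, lone deletion = skip, lone insertion = insert), and `Finding` and `diff_words` are hypothetical names; a real pass would also normalise case and punctuation first.

```rust
#[derive(Debug, PartialEq)]
pub enum Finding {
    /// Source word missing from the transcript.
    Skip(String),
    /// Transcript word that is not in the source.
    Insert(String),
    /// Source word replaced by a different transcript word.
    Pronunciation { expected: String, heard: String },
}

/// Align source text against an STT transcript word by word.
pub fn diff_words(source: &str, transcript: &str) -> Vec<Finding> {
    let a: Vec<&str> = source.split_whitespace().collect();
    let b: Vec<&str> = transcript.split_whitespace().collect();
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut findings = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.len() || j < b.len() {
        if i < a.len() && j < b.len() && a[i] == b[j] {
            i += 1;
            j += 1;
        } else if i < a.len() && j < b.len() && lcs[i + 1][j + 1] == lcs[i][j] {
            // Consuming both words loses no later match: a substitution.
            findings.push(Finding::Pronunciation {
                expected: a[i].to_string(),
                heard: b[j].to_string(),
            });
            i += 1;
            j += 1;
        } else if j == b.len() || (i < a.len() && lcs[i + 1][j] >= lcs[i][j + 1]) {
            findings.push(Finding::Skip(a[i].to_string()));
            i += 1;
        } else {
            findings.push(Finding::Insert(b[j].to_string()));
            j += 1;
        }
    }
    findings
}
```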

Distinct from audit_findings: that table is canon/continuity at the
text layer, populated by the third-Opus canon-audit pass.
narration_findings is audio-quality only, populated by detectors
that consume the rendered WAV.

The 'detector' field captures which model produced the finding so
we can tune thresholds per detector when one over- or under-flags.

cobb's audio agent intuition was right: STT-and-diff catches the
'name came out wrong' case airtight, and a separate audio-native
LLM call catches the subtler 'this sentence sounded weird' cases
Whisper can't see.
2026-05-13 10:10:04 -07:00
465c94b745 schema: voices + pronunciation_overrides + narration_runs (v0.2 prep)
TTS layer landed as schema-only — synthesis pipeline ships in v0.2.
Putting the tables in v0.1 means imports already carry the right
shape; we won't need a 'migrate every existing story' pass later.

Decisions locked 2026-05-13:
- Engine: F5-TTS (best 8GB FOSS option, mid-2026 SOTA)
- Default voice source: LJ Speech (Linda Johnson, PD released
  specifically for TTS training — airtight for sharing/uploading
  generated audio. The 'AI-consent-released' license posture is
  the difference between 'should be fine' and 'definitely fine.')
- Variety voices: Hi-Fi TTS speaker IDs (Apache 2.0, same consent
  shape). LibriVox is optional but never default.
- Pronunciation overrides DB layer (story-scoped + global) to fix
  proper-noun mispronunciation — the actual TTS-quality gap on
  cobb's bar of 'must not wake me up.' Pre-pass with Opus extracts
  proper nouns + IPA, operator verifies, table caches forever.

Tables:
- voices — name, license, reference_path/text, sample_rate, default flag
- pronunciation_overrides — story-scoped or global, IPA/arpabet
- narration_runs — TTS audit trail mirroring generation_runs
- stories.preferred_voice_id FK

Unique constraints:
- one default voice (partial index)
- one row per (story, word) override
- one global row per word
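
A lookup over the two override scopes might look like this. Sketch only: `PronunciationOverrides` is a hypothetical in-memory model, and the rule that a story-scoped row wins over the global row for the same word is an assumption, not something this commit states. The map keys mirror the two uniqueness constraints.

```rust
use std::collections::HashMap;

/// In-memory model of pronunciation_overrides.
#[derive(Default)]
pub struct PronunciationOverrides {
    // (story id, word) -> pronunciation: one row per (story, word).
    story_scoped: HashMap<(String, String), String>,
    // word -> pronunciation: one global row per word.
    global: HashMap<String, String>,
}

impl PronunciationOverrides {
    pub fn set_global(&mut self, word: &str, ipa: &str) {
        self.global.insert(word.to_string(), ipa.to_string());
    }

    pub fn set_for_story(&mut self, story_id: &str, word: &str, ipa: &str) {
        self.story_scoped
            .insert((story_id.to_string(), word.to_string()), ipa.to_string());
    }

    /// Story-scoped override first, then the global one, else None.
    pub fn lookup(&self, story_id: &str, word: &str) -> Option<&str> {
        self.story_scoped
            .get(&(story_id.to_string(), word.to_string()))
            .or_else(|| self.global.get(word))
            .map(String::as_str)
    }
}
```

Matching here is exact and case-sensitive; how the real table normalises words is not specified.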
2026-05-13 10:07:32 -07:00
f575ad3722 scaffold v0.1: postgres+pgvector inside-container, schema, markdown ingest, CLI
Skald is a generic story-writer. The database is the product; the
binary is the tooling. Everything story-specific lives in rows, not
in code. cwho's monorepo + binary-per-role pattern transplanted to
this domain.

What this commit ships:
- Cargo workspace (resolver=3, edition 2024): skald-core (lib) +
  skald (bin)
- Migration 0001: stories, characters, canon_facts, chapters,
  chapter_summaries, passages (vector(1536)), generation_runs,
  audit_findings, tags. pgvector + pg_trgm extensions. ivfflat
  index deferred until we have data (import the first ~1k
  passages, then add the index).
- skald-core::ingest — markdown parser for the cwho/coast-down shape:
  '# Title' → '## Chapter N — date' headings → '# Continuity Bible'
  section with character roster (real + fictional sub-sections) +
  setting / mystery / historical / liberty / hook sub-sections.
  Decomposed into structured rows; original bullet body preserved
  in key_facts/body fields for fidelity. 6 unit tests cover the
  shape.
- skald-core::db — Postgres connection pool + migration runner.
- skald-core::models — row types via sqlx::FromRow.
- skald binary — clap CLI: 'serve' (http + migrations) and
  'import-markdown' (one-shot ingest).
- Dockerfile — multi-stage: rust:1.95-bookworm builder, pgvector/
  pgvector:pg17 runtime, tini as PID 1, custom entrypoint.sh
  that boots embedded postgres then execs skald serve.
- compose.yml — singleton container, postgres data in volume,
  story corpus mounted read-only at /seed.

Decisions locked 2026-05-13:
1. DB in same container 'till we have a real working tool' (cobb)
2. postgres+pgvector (NOT sqlite): keeps the semantic-search story viable
3. Network-not-socket connection (postgresql://localhost:5432) from
   day one so future split is config-only, not code-rewrite

Not yet wired:
- Web UI
- clawdforge calls (gen → cleanup → canon-audit pipeline)
- Embedding pass
- TTS sidecar
2026-05-13 09:04:28 -07:00