skald

4 commits 3 branches 0 tags 304 KiB

Author	SHA1	Message	Date
Kayos	f71b533e52	v0.2 scaffold: vendor clawdforge SDK + forge module + Whisper plan	2026-05-13 10:18:56 -07:00

The Rust SDK already existed at Sulkta-Coop/clawdforge clients/rust/: async, reqwest-based, bearer-auth, exposes Client::run() + Session for multi-turn. Vendoring it into vendor/clawdforge so skald is self-contained: no git submodule and no need to clone the clawdforge repo next to skald. Trade-off accepted: updates require a manual re-copy until both sides stabilize and we publish to a private cargo registry.

What landed:
- vendor/clawdforge/: full SDK source from Sulkta-Coop/clawdforge HEAD. Pinned in skald-core/Cargo.toml as a path dep.
- skald-core/src/forge.rs: three-pass orchestration shell. Forge wraps clawdforge::Client; generate() / cleanup() / audit() each build a RunRequest with the right system prompt + model alias (always opus), call client.run(), return a PassOutput. Prompt templates are TODO stubs (SYSTEM_GEN_TODO etc); filling in the actual prose-craft prompts is its own deep session.
- skald-core/src/config.rs: ForgeConfig { base_url, app_token, model }. Resolved by the binary from env (CLAWDFORGE_URL + CLAWDFORGE_TOKEN); lib stays env-agnostic.
- skald-core::AuditFinding + AuditResponse: parse shape for what the third-Opus canon audit returns, ready to map onto audit_findings rows.
- docs/tts-pipeline.md: full plan for v0.2 narration + post-TTS audit chain. Whisper-large-v3 STT does text-to-text verification on every render; an optional Gemini Flash audio pass catches subjective issues (prosody, tone) Whisper can't see. Reroll loop on crit findings.

What's still stubbed:
- Prompt templates in forge.rs (gen / cleanup / audit): placeholders that describe the role but don't constrain output shape yet.
- context.rs (assemble the LLM context blob from DB rows): entire module TBD.
- No CLI subcommand yet for invoking forge; that comes after context.rs.

Naming note: in Rust 2024 'gen' is a reserved keyword (for generators), so the method is Forge::generate(), not Forge::gen().
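The "resolved by the binary from env; lib stays env-agnostic" split described above could look roughly like the sketch below. Only the struct fields, the two environment variable names, and the "always opus" alias come from the commit message; `from_lookup` and `MissingVar` are hypothetical names introduced for illustration, and the real skald-core/src/config.rs may be shaped differently.

```rust
/// Connection settings for the clawdforge client.
/// Field names follow the commit message; the rest is an illustrative sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ForgeConfig {
    pub base_url: String,
    pub app_token: String,
    pub model: String,
}

/// Hypothetical error type: names the variable that could not be resolved.
#[derive(Debug, PartialEq)]
pub struct MissingVar(pub &'static str);

impl ForgeConfig {
    /// Build a config from any key lookup (hypothetical helper).
    /// The library never touches the process environment itself; the caller
    /// decides where the values come from.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, MissingVar>
    where
        F: Fn(&str) -> Option<String>,
    {
        let base_url = lookup("CLAWDFORGE_URL").ok_or(MissingVar("CLAWDFORGE_URL"))?;
        let app_token = lookup("CLAWDFORGE_TOKEN").ok_or(MissingVar("CLAWDFORGE_TOKEN"))?;
        Ok(Self {
            base_url,
            app_token,
            // The commit message says the model alias is always opus.
            model: "opus".to_string(),
        })
    }
}
```

Under these assumptions the binary would call `ForgeConfig::from_lookup(|k| std::env::var(k).ok())`, while tests can pass a closure over fixed values without mutating the environment.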
Kayos	4a91e0738d	schema: narration_findings - audio-layer audit table	2026-05-13 10:10:04 -07:00

Closes the TTS schema layer. The v0.2 render pipeline auto-runs an audit chain after each chapter narration:

  F5 render → narration_runs (succeeded)
    → ffmpeg chunk into ~30s windows
    → Whisper-large-v3 STT each chunk
    → word-level diff vs source chapter text
    → mismatches → narration_findings (kind=pronunciation|skip|insert)
    → ffmpeg silence/clip detect → narration_findings (kind=glitch)
    → (optional) Gemini Flash audio review pass → narration_findings (kind=prosody|tone)
    → unresolved crits trigger automatic re-roll with new seed

Distinct from audit_findings: that table is canon/continuity at the text layer, populated by the third-Opus canon-audit pass. narration_findings is audio-quality only, populated by detectors that consume the rendered WAV. The 'detector' field captures which model produced the finding so we can tune thresholds per detector when one over- or under-flags.

cobb's audio agent intuition was right: STT-and-diff catches the 'name came out wrong' case airtight, and a separate audio-native LLM call catches the subtler 'this sentence sounded weird' cases Whisper can't see.
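The "word-level diff vs source chapter text" step can be sketched as a longest-common-subsequence alignment whose gaps are classified into the three finding kinds named above. This is a minimal sketch, not the pipeline's actual detector: the `Finding` enum, `words`, and `diff_words` are hypothetical names, and the normalisation (lowercasing, stripping surrounding punctuation) is an assumption.

```rust
/// Hypothetical in-memory form of a text-vs-transcript mismatch.
#[derive(Debug, PartialEq)]
pub enum Finding {
    /// A source word was replaced by a different transcribed word.
    Pronunciation { expected: String, heard: String },
    /// A source word is missing from the transcript.
    Skip { expected: String },
    /// The transcript contains a word the source does not.
    Insert { heard: String },
}

/// Lowercase each word and strip surrounding punctuation (assumed normalisation).
fn words(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

pub fn diff_words(source: &str, transcript: &str) -> Vec<Finding> {
    let (a, b) = (words(source), words(transcript));
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let is_match = |i: usize, j: usize| i < a.len() && j < b.len() && a[i] == b[j];
    let mut findings = Vec::new();
    let (mut i, mut j) = (0usize, 0usize);
    while i < a.len() || j < b.len() {
        if is_match(i, j) {
            i += 1;
            j += 1;
            continue;
        }
        // Collect one gap between matched words.
        let (mut missed, mut extra) = (Vec::new(), Vec::new());
        while (i < a.len() || j < b.len()) && !is_match(i, j) {
            if j == b.len() || (i < a.len() && lcs[i + 1][j] >= lcs[i][j + 1]) {
                missed.push(a[i].clone());
                i += 1;
            } else {
                extra.push(b[j].clone());
                j += 1;
            }
        }
        // Pair missed and extra words positionally; leftovers are skips or inserts.
        let (mut m, mut e) = (missed.into_iter(), extra.into_iter());
        loop {
            match (m.next(), e.next()) {
                (Some(expected), Some(heard)) => {
                    findings.push(Finding::Pronunciation { expected, heard })
                }
                (Some(expected), None) => findings.push(Finding::Skip { expected }),
                (None, Some(heard)) => findings.push(Finding::Insert { heard }),
                (None, None) => break,
            }
        }
    }
    findings
}
```

The quadratic table is acceptable for ~30s chunks; a production detector would likely also need fuzzy matching so that homophones and number formatting do not register as mispronunciations.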
Kayos	465c94b745	schema: voices + pronunciation_overrides + narration_runs (v0.2 prep)	2026-05-13 10:07:32 -07:00

TTS layer landed as schema-only; the synthesis pipeline ships in v0.2. Putting the tables in v0.1 means imports already carry the right shape; we won't need a 'migrate every existing story' pass later.

Decisions locked 2026-05-13:
- Engine: F5-TTS (best 8GB FOSS option, mid-2026 SOTA)
- Default voice source: LJ Speech (Linda Johnson, PD, released specifically for TTS training; airtight for sharing/uploading generated audio. The 'AI-consent-released' license posture is the difference between 'should be fine' and 'definitely fine.')
- Variety voices: Hi-Fi TTS speaker IDs (Apache 2.0, same consent shape). LibriVox is optional but never default.
- Pronunciation overrides DB layer (story-scoped + global) to fix proper-noun mispronunciation, the actual TTS-quality gap on Cobb's bar of 'must not wake me up.' Pre-pass with Opus extracts proper nouns + IPA, operator verifies, table caches forever.

Tables:
- voices: name, license, reference_path/text, sample_rate, default flag
- pronunciation_overrides: story-scoped or global, IPA/arpabet
- narration_runs: TTS audit trail mirroring generation_runs
- stories.preferred_voice_id FK

Unique constraints:
- one default voice (partial index)
- one row per (story, word) override
- one global row per word
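The story-scoped-plus-global override layer implies a lookup order: a story's own entry wins, and the global entry is the fallback. The sketch below models that in memory; the `Overrides` type and its methods are hypothetical, the case-insensitive key is an assumption, and the IPA strings in the usage note are illustrative values only. Keying the maps by (story, word) and by word mirrors the two uniqueness constraints listed above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory mirror of the pronunciation_overrides table.
#[derive(Default)]
pub struct Overrides {
    /// At most one entry per (story, word), like the per-story unique constraint.
    per_story: HashMap<(i64, String), String>,
    /// At most one global entry per word.
    global: HashMap<String, String>,
}

impl Overrides {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert or replace a story-scoped pronunciation.
    pub fn set_story(&mut self, story_id: i64, word: &str, ipa: &str) {
        self.per_story
            .insert((story_id, word.to_lowercase()), ipa.to_string());
    }

    /// Insert or replace a global pronunciation.
    pub fn set_global(&mut self, word: &str, ipa: &str) {
        self.global.insert(word.to_lowercase(), ipa.to_string());
    }

    /// Story-scoped entry first, then the global one, else no override.
    pub fn resolve(&self, story_id: i64, word: &str) -> Option<&str> {
        let key = word.to_lowercase();
        self.per_story
            .get(&(story_id, key.clone()))
            .or_else(|| self.global.get(&key))
            .map(String::as_str)
    }
}
```

For example, after `set_global("Skald", "skɑːld")` and `set_story(7, "Skald", "skald")`, story 7 resolves to its own entry while every other story falls back to the global one.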
Kayos	f575ad3722	scaffold v0.1: postgres+pgvector inside-container, schema, markdown ingest, CLI	2026-05-13 09:04:28 -07:00

Skald is a generic story-writer. The database is the product; the binary is the tooling. Everything story-specific lives in rows, not in code. cwho's monorepo + binary-per-role pattern transplanted to this domain.

What this commit ships:
- Cargo workspace (resolver=3, edition 2024): skald-core (lib) + skald (bin)
- Migration 0001: stories, characters, canon_facts, chapters, chapter_summaries, passages (vector(1536)), generation_runs, audit_findings, tags. pgvector + pg_trgm extensions. ivfflat index deferred until we have data (post-import the first ~1k passages and add the index).
- skald-core::ingest: markdown parser for the cwho/coast-down shape: '# Title' → '## Chapter N — date' headings → '# Continuity Bible' section with character roster (real + fictional sub-sections) + setting / mystery / historical / liberty / hook sub-sections. Decomposed into structured rows; original bullet body preserved in key_facts/body fields for fidelity. 6 unit tests cover the shape.
- skald-core::db: Postgres connection pool + migration runner.
- skald-core::models: row types via sqlx::FromRow.
- skald binary: clap CLI: 'serve' (http + migrations) and 'import-markdown' (one-shot ingest).
- Dockerfile: multi-stage: rust:1.95-bookworm builder, pgvector/pgvector:pg17 runtime, tini under PID 1, custom entrypoint.sh that boots embedded postgres then execs skald serve.
- compose.yml: singleton container, postgres data in volume, story corpus mounted read-only at /seed.

Decisions locked 2026-05-13:
1. DB in same container 'till we have a real working tool' (cobb)
2. postgres+pgvector (NOT sqlite): keeps semantic-search story
3. Network-not-socket connection (postgresql://localhost:5432) from day one so future split is config-only, not code-rewrite

Not yet wired:
- Web UI
- clawdforge calls (gen → cleanup → canon-audit pipeline)
- Embedding pass
- TTS sidecar
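The chapter half of the markdown shape described above ('## Chapter N — date' headings, ended by the next top-level '# ' heading) can be sketched with the standard library alone. This is an assumption-laden illustration, not skald-core::ingest itself: `ChapterHeading`, `parse_chapter_heading`, and `split_chapters` are hypothetical names, the date is kept as an opaque string, and the Continuity Bible decomposition is left out entirely.

```rust
/// Hypothetical parsed form of a '## Chapter N — date' heading.
#[derive(Debug, PartialEq)]
pub struct ChapterHeading {
    pub number: u32,
    /// Kept verbatim; the date format is not specified in the commit message.
    pub date: String,
}

/// Returns Some only for lines shaped like '## Chapter N — date'.
pub fn parse_chapter_heading(line: &str) -> Option<ChapterHeading> {
    let rest = line.trim().strip_prefix("## Chapter ")?;
    // The separator is the literal dash used in the heading format.
    let (number, date) = rest.split_once('—')?;
    Some(ChapterHeading {
        number: number.trim().parse().ok()?,
        date: date.trim().to_string(),
    })
}

/// Splits a document into (heading, body) pairs, one per chapter.
pub fn split_chapters(markdown: &str) -> Vec<(ChapterHeading, String)> {
    let mut chapters: Vec<(ChapterHeading, String)> = Vec::new();
    let mut in_chapter = false;
    for line in markdown.lines() {
        if let Some(heading) = parse_chapter_heading(line) {
            chapters.push((heading, String::new()));
            in_chapter = true;
        } else if line.starts_with("# ") {
            // A top-level heading (title or Continuity Bible) closes the chapter.
            in_chapter = false;
        } else if in_chapter {
            if let Some((_, body)) = chapters.last_mut() {
                body.push_str(line);
                body.push('\n');
            }
        }
    }
    // Drop the blank lines that surround each chapter body.
    for (_, body) in &mut chapters {
        *body = body.trim().to_string();
    }
    chapters
}
```

Keeping the heading parser separate from the splitter makes each piece testable on its own, in the spirit of the unit tests mentioned above.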