The Rust SDK already existed at Sulkta-Coop/clawdforge clients/rust/ — async,
reqwest-based, bearer-auth, exposes Client::run() + Session for multi-turn.
It is now vendored into vendor/clawdforge so skald is self-contained: no
git submodule, and no need to clone the clawdforge repo next to skald.
Trade-off accepted: updates require manual re-copy until both sides
stabilize and we publish to a private cargo registry.
What landed:
- vendor/clawdforge/ — full SDK source from Sulkta-Coop/clawdforge HEAD.
Wired into skald-core/Cargo.toml as a path dependency.
- skald-core/src/forge.rs — three-pass orchestration shell. Forge wraps
clawdforge::Client; generate() / cleanup() / audit() each build a
RunRequest with the right system prompt + model alias (always opus),
call client.run(), return a PassOutput.
Prompt templates are TODO stubs (SYSTEM_GEN_TODO etc) — filling in the
actual prose-craft prompts is its own deep session.
- skald-core/src/config.rs — ForgeConfig { base_url, app_token, model }.
Resolved by the binary from env (CLAWDFORGE_URL + CLAWDFORGE_TOKEN);
lib stays env-agnostic.
- skald-core::AuditFinding + AuditResponse — parse shape for what the
third-Opus canon audit returns, ready to map onto audit_findings rows.
- docs/tts-pipeline.md — full plan for v0.2 narration + post-TTS audit
chain. Whisper-large-v3 STT does text-to-text verification on every
render; an optional Gemini Flash audio pass catches subjective issues
(prosody, tone) Whisper can't see. Reroll loop on crit findings.
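
A minimal sketch of the forge.rs and config.rs pieces listed above, under loud assumptions: `Runner` is a hypothetical synchronous stand-in for clawdforge::Client::run() (the vendored SDK is async), the `RunRequest` and `PassOutput` fields are guesses, and the cleanup/audit constant names are extrapolated from SYSTEM_GEN_TODO. It only illustrates the shape: each pass builds a request with its own system prompt and the opus alias, and the env lookup is injected so the lib never reads the process environment.

```rust
/// Fields named in the commit; resolved by the binary, never by the lib.
#[derive(Debug, Clone, PartialEq)]
pub struct ForgeConfig {
    pub base_url: String,
    pub app_token: String,
    pub model: String,
}

impl ForgeConfig {
    /// `lookup` is injected; the binary would pass `|k| std::env::var(k).ok()`.
    pub fn resolve(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let get = |key: &str| lookup(key).ok_or_else(|| format!("{key} is not set"));
        Ok(Self {
            base_url: get("CLAWDFORGE_URL")?,
            app_token: get("CLAWDFORGE_TOKEN")?,
            model: "opus".to_string(), // every pass uses the opus alias
        })
    }
}

/// Hypothetical request shape; the real RunRequest lives in the vendored SDK.
#[derive(Debug, Clone, PartialEq)]
pub struct RunRequest {
    pub system: &'static str,
    pub model: String,
    pub prompt: String,
}

/// Hypothetical output shape for one pass.
#[derive(Debug, Clone, PartialEq)]
pub struct PassOutput {
    pub text: String,
}

/// Stand-in for clawdforge::Client::run(); synchronous here for brevity.
pub trait Runner {
    fn run(&self, req: RunRequest) -> Result<String, String>;
}

// Placeholder prompts, mirroring the TODO stubs (names partly assumed).
const SYSTEM_GEN_TODO: &str = "TODO: generation prompt";
const SYSTEM_CLEANUP_TODO: &str = "TODO: cleanup prompt";
const SYSTEM_AUDIT_TODO: &str = "TODO: audit prompt";

pub struct Forge<R: Runner> {
    pub runner: R,
    pub config: ForgeConfig,
}

impl<R: Runner> Forge<R> {
    /// Shared plumbing: build the request, run it, wrap the text.
    fn pass(&self, system: &'static str, prompt: &str) -> Result<PassOutput, String> {
        let req = RunRequest {
            system,
            model: self.config.model.clone(),
            prompt: prompt.to_string(),
        };
        self.runner.run(req).map(|text| PassOutput { text })
    }

    pub fn generate(&self, context: &str) -> Result<PassOutput, String> {
        self.pass(SYSTEM_GEN_TODO, context)
    }

    pub fn cleanup(&self, draft: &str) -> Result<PassOutput, String> {
        self.pass(SYSTEM_CLEANUP_TODO, draft)
    }

    pub fn audit(&self, chapter: &str) -> Result<PassOutput, String> {
        self.pass(SYSTEM_AUDIT_TODO, chapter)
    }
}
```

Keeping `Runner` as a trait also lets tests swap in a fake client without a network call; whether the real Forge is generic or holds the concrete client is not something the commit states.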
What's still stubbed:
- Prompt templates in forge.rs (gen / cleanup / audit) — placeholders
that describe the role but don't constrain output shape yet.
- context.rs (assemble the LLM context blob from DB rows) — entire module
TBD.
- No CLI subcommand yet for invoking forge — that comes after context.rs.
Naming note: in Rust 2024 'gen' is a reserved keyword (for generators),
so the method is Forge::generate(), not Forge::gen().
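
For illustration, the naming choice looks like this. The unit struct below is a toy stand-in, not the real Forge; the raw-identifier form `r#gen` is the escape hatch that still compiles, but every call site would have to spell it the same way, which is why a plain `generate` reads better.

```rust
/// Toy stand-in for the real Forge type.
pub struct Forge;

impl Forge {
    /// The name the commit settled on.
    pub fn generate(&self) -> &'static str {
        "generate"
    }

    /// Writing `fn gen` is rejected in edition 2024; the raw identifier
    /// compiles, at the cost of `forge.r#gen()` at every call site.
    pub fn r#gen(&self) -> &'static str {
        "gen"
    }
}
```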
Closes the TTS schema layer. The v0.2 render pipeline auto-runs an
audit chain after each chapter narration:
F5 render → narration_runs (succeeded)
  → ffmpeg chunk into ~30s windows
  → Whisper-large-v3 STT each chunk
  → word-level diff vs source chapter text
  → mismatches → narration_findings (kind=pronunciation|skip|insert)
  → ffmpeg silence/clip detect → narration_findings (kind=glitch)
  → (optional) Gemini Flash audio review pass
      → narration_findings (kind=prosody|tone)
  → unresolved crits trigger automatic re-roll with new seed
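
One possible shape for the word-level diff step is an edit-distance alignment over normalized words, sketched below. The `Finding` enum, the normalization rules (lowercase, strip surrounding punctuation), and the mapping of substitutions to kind=pronunciation are all assumptions for illustration; chunking and the Whisper call are not shown, so `transcript` is simply whatever text the STT produced.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Finding {
    /// Source word was heard as a different word.
    Pronunciation { expected: String, heard: String },
    /// Source word is missing from the transcript.
    Skip { expected: String },
    /// Transcript contains a word the source does not.
    Insert { heard: String },
}

/// Lowercase and strip surrounding punctuation so "Harbor." matches "harbor".
fn normalize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Align source and transcript by word-level edit distance, then walk the
/// table backwards to recover which edits were made.
pub fn diff_words(source: &str, transcript: &str) -> Vec<Finding> {
    let (s, t) = (normalize(source), normalize(transcript));
    let (n, m) = (s.len(), t.len());

    // dp[i][j] = edit distance between s[..i] and t[..j]
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        dp[i][0] = i;
    }
    for j in 0..=m {
        dp[0][j] = j;
    }
    for i in 1..=n {
        for j in 1..=m {
            let sub = dp[i - 1][j - 1] + usize::from(s[i - 1] != t[j - 1]);
            dp[i][j] = sub.min(dp[i - 1][j] + 1).min(dp[i][j - 1] + 1);
        }
    }

    let (mut i, mut j) = (n, m);
    let mut out = Vec::new();
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && dp[i][j] == dp[i - 1][j - 1] + usize::from(s[i - 1] != t[j - 1]) {
            // Diagonal step: a match, or a substitution if the words differ.
            if s[i - 1] != t[j - 1] {
                out.push(Finding::Pronunciation {
                    expected: s[i - 1].clone(),
                    heard: t[j - 1].clone(),
                });
            }
            i -= 1;
            j -= 1;
        } else if i > 0 && dp[i][j] == dp[i - 1][j] + 1 {
            out.push(Finding::Skip { expected: s[i - 1].clone() });
            i -= 1;
        } else {
            out.push(Finding::Insert { heard: t[j - 1].clone() });
            j -= 1;
        }
    }
    out.reverse(); // backtrace runs end-to-start
    out
}
```

A plain substitution cannot tell a real mispronunciation from an STT spelling variant or homophone, so a production detector would likely need a phonetic comparison or an allow-list on top of this.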
Distinct from audit_findings: that table is canon/continuity at the
text layer, populated by the third-Opus canon-audit pass.
narration_findings is audio-quality only, populated by detectors
that consume the rendered WAV.
The 'detector' field captures which model produced the finding so
we can tune thresholds per detector when one over- or under-flags.
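
As a rough sketch of how the 'detector' field could drive per-detector tuning and the re-roll decision: the enum variants follow the kinds and detectors named in this message, but the `severity`, `confidence`, and `resolved` fields and the threshold map are assumptions, not the actual column set.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Kind {
    Pronunciation,
    Skip,
    Insert,
    Glitch,
    Prosody,
    Tone,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Detector {
    WhisperDiff,
    FfmpegDetect,
    GeminiFlash,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warn,
    Crit,
}

/// Hypothetical in-memory view of one narration_findings row.
#[derive(Debug, Clone)]
pub struct NarrationFinding {
    pub kind: Kind,
    pub detector: Detector,
    pub severity: Severity,
    pub confidence: f32, // assumed column: detector's own confidence, 0.0..=1.0
    pub resolved: bool,
}

/// True when any unresolved crit clears its detector's threshold.
/// Detectors without an entry default to 0.0, i.e. every crit counts.
pub fn needs_reroll(
    findings: &[NarrationFinding],
    thresholds: &HashMap<Detector, f32>,
) -> bool {
    findings.iter().any(|f| {
        let min = thresholds.get(&f.detector).copied().unwrap_or(0.0);
        !f.resolved && f.severity == Severity::Crit && f.confidence >= min
    })
}
```

Raising one detector's threshold then quiets an over-flagging model without touching the others.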
Cobb's audio-agent intuition was right: STT-and-diff reliably catches
the 'name came out wrong' case, and a separate audio-native LLM call
catches the subtler 'this sentence sounded weird' cases Whisper can't
see.
TTS layer landed as schema-only — synthesis pipeline ships in v0.2.
Putting the tables in v0.1 means imports already carry the right
shape; we won't need a 'migrate every existing story' pass later.
Decisions locked 2026-05-13:
- Engine: F5-TTS (best 8GB FOSS option, mid-2026 SOTA)
- Default voice source: LJ Speech (Linda Johnson; public domain, released
specifically for TTS training), which makes sharing/uploading generated
audio airtight. The 'AI-consent-released' license posture is the
difference between 'should be fine' and 'definitely fine.'
- Variety voices: Hi-Fi TTS speaker IDs (CC BY 4.0, so attribution is
required). LibriVox is optional but never default.
- Pronunciation-overrides DB layer (story-scoped + global) to fix
proper-noun mispronunciation, the real TTS-quality gap against Cobb's
bar of 'must not wake me up.' A pre-pass with Opus extracts proper
nouns + IPA, the operator verifies, and the table caches the result
forever.
Tables:
- voices — name, license, reference_path/text, sample_rate, default flag
- pronunciation_overrides — story-scoped or global, IPA/arpabet
- narration_runs — TTS audit trail mirroring generation_runs
- stories.preferred_voice_id FK
Unique constraints:
- one default voice (partial index)
- one row per (story, word) override
- one global row per word
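
A small in-memory sketch of how the two override scopes could resolve at render time. The map keys mirror the two uniqueness rules above; the precedence (story-scoped beats global), the case-insensitive match, and the ARPAbet-style strings in the usage note are assumptions, since the message does not spell them out.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the pronunciation_overrides table.
#[derive(Debug, Default)]
pub struct Overrides {
    /// Keyed by (story_id, word): at most one row per (story, word).
    story: HashMap<(i64, String), String>,
    /// Keyed by word: at most one global row per word.
    global: HashMap<String, String>,
}

impl Overrides {
    /// Insert or replace the story-scoped override for `word`.
    pub fn set_story(&mut self, story_id: i64, word: &str, phonemes: &str) {
        self.story
            .insert((story_id, word.to_lowercase()), phonemes.to_string());
    }

    /// Insert or replace the global override for `word`.
    pub fn set_global(&mut self, word: &str, phonemes: &str) {
        self.global.insert(word.to_lowercase(), phonemes.to_string());
    }

    /// Story-scoped override wins; otherwise fall back to the global row.
    pub fn lookup(&self, story_id: i64, word: &str) -> Option<&str> {
        let key = word.to_lowercase();
        self.story
            .get(&(story_id, key.clone()))
            .or_else(|| self.global.get(&key))
            .map(String::as_str)
    }
}
```

For example, after `set_global("Cairn", "K EH1 R N")` and `set_story(1, "Cairn", "K AA1 R N")`, story 1 gets its own rendering while every other story falls back to the global one.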
Skald is a generic story-writer. The database is the product; the
binary is the tooling. Everything story-specific lives in rows, not
in code. cwho's monorepo + binary-per-role pattern transplanted to
this domain.
What this commit ships:
- Cargo workspace (resolver=3, edition 2024): skald-core (lib) +
skald (bin)
- Migration 0001: stories, characters, canon_facts, chapters,
chapter_summaries, passages (vector(1536)), generation_runs,
audit_findings, tags. pgvector + pg_trgm extensions. ivfflat
index deferred until we have data: import the first ~1k passages,
then add the index.
- skald-core::ingest — markdown parser for the cwho/coast-down shape:
'# Title' → '## Chapter N — date' headings → '# Continuity Bible'
section with character roster (real + fictional sub-sections) +
setting / mystery / historical / liberty / hook sub-sections.
Decomposed into structured rows; original bullet body preserved
in key_facts/body fields for fidelity. 6 unit tests cover the
shape.
- skald-core::db — Postgres connection pool + migration runner.
- skald-core::models — row types via sqlx::FromRow.
- skald binary — clap CLI: 'serve' (http + migrations) and
'import-markdown' (one-shot ingest).
- Dockerfile — multi-stage: rust:1.95-bookworm builder, pgvector/
pgvector:pg17 runtime, tini as PID 1, custom entrypoint.sh
that boots embedded postgres then execs skald serve.
- compose.yml — singleton container, postgres data in volume,
story corpus mounted read-only at /seed.
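
To make the ingest shape concrete, here is a stripped-down sketch of the heading split only. It is not the skald-core::ingest parser: the `Story` and `Chapter` types are hypothetical, the Continuity Bible is kept as one raw string instead of being decomposed into roster and sub-section rows, and the date is left as free text.

```rust
#[derive(Debug, PartialEq)]
pub struct Chapter {
    pub number: u32,
    pub date: String,
    pub body: String,
}

#[derive(Debug, Default, PartialEq)]
pub struct Story {
    pub title: String,
    pub chapters: Vec<Chapter>,
    /// Raw text under '# Continuity Bible'; the real parser decomposes it.
    pub bible: String,
}

pub fn parse_story(md: &str) -> Story {
    enum Section {
        Preamble,
        Chapter,
        Bible,
    }

    let mut story = Story::default();
    let mut section = Section::Preamble;

    for line in md.lines() {
        // '## Chapter N <dash> date': split once on U+2014.
        if let Some(rest) = line.strip_prefix("## Chapter ") {
            let (num, date) = rest.split_once('\u{2014}').unwrap_or((rest, ""));
            if let Ok(number) = num.trim().parse::<u32>() {
                story.chapters.push(Chapter {
                    number,
                    date: date.trim().to_string(),
                    body: String::new(),
                });
                section = Section::Chapter;
                continue;
            }
        }
        // Top-level headings: the bible marker, or the first one as the title.
        if let Some(rest) = line.strip_prefix("# ") {
            if rest.trim() == "Continuity Bible" {
                section = Section::Bible;
            } else if story.title.is_empty() {
                story.title = rest.trim().to_string();
            }
            continue;
        }
        // Everything else is body text for whichever section is open.
        match section {
            Section::Chapter => {
                if let Some(ch) = story.chapters.last_mut() {
                    ch.body.push_str(line);
                    ch.body.push('\n');
                }
            }
            Section::Bible => {
                story.bible.push_str(line);
                story.bible.push('\n');
            }
            Section::Preamble => {}
        }
    }
    story
}
```

Sub-section headings inside the bible ('## ...') fall through to the body branch here, which is where a fuller parser would branch into roster, setting, and hook handling.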
Decisions locked 2026-05-13:
1. DB in same container 'till we have a real working tool' (cobb)
2. postgres+pgvector (NOT sqlite), which keeps the semantic-search path open
3. Network-not-socket connection (postgresql://localhost:5432) from
day one so future split is config-only, not code-rewrite
Not yet wired:
- Web UI
- clawdforge calls (gen → cleanup → canon-audit pipeline)
- Embedding pass
- TTS sidecar