Public-flip audit: env-driven paths, scrub audit-ticket prefixes, terser README

Lucy bind paths + LAN host pins replaced with env defaults. Repository URLs
→ git.sulkta.com. Audit-changelog scaffolding stripped from inline comments
(technical reasoning preserved). README sheds marketing scaffolding. AI-speak
in load-bearing prompts/SOULs left alone — that IS the product.
Cobb Hayes 2026-05-27 11:42:58 -07:00
parent 4402c53979
commit 346cea515d
21 changed files with 325 additions and 474 deletions

130
README.md

@@ -1,95 +1,79 @@
# skald
Long-form story-writer with canon-keeping, sequel continuity, and
(future) self-hosted audiobook narration. Database is the source of
truth — the writer is the tooling.
self-hosted audiobook narration. The database is the source of truth;
the binary is the tooling.
Named for the Old Norse poets who composed and memorized kings'
sagas across generations.
Named for the Old Norse poets who composed and memorized kings' sagas
across generations.
## Status: v0.1 — scaffold
What's wired:
## What's wired
- Rust workspace (`skald-core` + `skald`)
- Postgres schema for stories, characters, canon facts, chapters,
passages, generation runs, audit findings, tags
- pgvector extension installed for future similarity search
- pgvector for future similarity search
- `skald import-markdown` ingests a story file (chapters + bible)
into the schema
- `skald serve` exposes `/health` and runs migrations on boot
- Single-container deploy: postgres + skald in one image
Wired (this commit):
- clawdforge Rust SDK vendored at `vendor/clawdforge/` (upstream:
`Sulkta-Coop/clawdforge` `clients/rust/`)
- `skald-core::forge` — three-pass orchestration shell (gen / cleanup /
audit). Prompts are TODO stubs; pipeline plumbing is in place.
Not yet wired:
- Web UI (the inbox + browse + queue surface)
- Prompt templates for the three passes (heavy prompt-engineering
work — own session)
- `skald-core::context` — assemble the LLM context blob from DB rows
(bible + characters + parent prose summaries + similarity-matched
passages)
- Embeddings backfill + ivfflat index
- TTS sidecar container + post-render audit chain (see
`docs/tts-pipeline.md`)
## v0.1 smoke
```bash
docker compose -p skald up -d
docker exec skald skald import-markdown \
--path /seed/coast-down.md \
--title "The Coast-Down"
curl http://lucy:7780/health
# → { ok: true, db_ok: true, story_count: 1, ... }
```
- `skald serve` exposes `/health` + the web inspector and runs
migrations on boot
- `skald continue` runs gen → cleanup → audit per chapter, with
multi-chapter batching (cap 20)
- `skald rewrite` re-authors a chapter in a named author's voice
- `skald audit` runs whole-story prose-quality audit; `skald dedup`
is the surgical fix half of the loop
- `skald prepare-narration` annotates a chapter with `[breath]` /
`[pause:Xs]` / `[scene]` beats and per-character `[voice:...]`
tags
- `skald narrate` renders a chapter to audio via one of three TTS
engines (F5-TTS, Kokoro-82M, Tortoise-TTS) — see `engines/`
- Named-author "soul" personas via `skald authors seed`; author
voice replaces the model's base system prompt for gen/cleanup/
rewrite/dedup/narrate_prep
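The narration beats listed above (`[breath]`, `[pause:Xs]`, `[scene]`, `[voice:...]`) could be recognised with something like the sketch below. The exact tag grammar is not spelled out in this README, so the `Beat` enum, `parse_tag`, and the rule that `X` is a non-negative decimal number of seconds are all assumptions for illustration, not skald's actual parser.

```rust
/// Hypothetical in-memory form of one narration beat tag.
#[derive(Debug, PartialEq)]
enum Beat {
    Breath,
    Pause(f32),    // seconds
    Scene,
    Voice(String), // voice name as written in the tag
}

/// Parse a single bracketed tag such as "[pause:1.5s]".
/// Returns None for anything that does not match the assumed grammar.
fn parse_tag(tag: &str) -> Option<Beat> {
    // Require and strip the surrounding brackets.
    let inner = tag.strip_prefix('[')?.strip_suffix(']')?;
    match inner.split_once(':') {
        // Tags without an argument.
        None => match inner {
            "breath" => Some(Beat::Breath),
            "scene" => Some(Beat::Scene),
            _ => None,
        },
        // "[pause:Xs]": X must parse as a finite, non-negative number.
        Some(("pause", secs)) => secs
            .strip_suffix('s')?
            .parse::<f32>()
            .ok()
            .filter(|s| s.is_finite() && *s >= 0.0)
            .map(Beat::Pause),
        // "[voice:name]": any non-empty name is accepted.
        Some(("voice", name)) if !name.is_empty() => Some(Beat::Voice(name.to_string())),
        Some(_) => None,
    }
}
```

Returning `None` for unknown tags lets a caller pass unrecognised bracketed text through as ordinary prose instead of failing the whole chapter.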
## Schema (cheat sheet)
```
stories → meta + status + parent/root for series
characters → real or fictional, story-scoped
canon_facts → setting, mystery, theme, rule, historical_anchor, hook
chapters → full prose body
chapter_summaries → short summaries for cheap context loading
passages → paragraph-level + embedding vector(1536)
generation_runs → every LLM call logged
audit_findings → canon audit output (severity + area)
tags → arbitrary labels
stories                  meta + status + parent/root for series
authors                  persona identity (slug, display_name, model)
author_revisions         versioned souls; one current per author
characters               real or fictional, story-scoped, voice-mappable
canon_facts              setting, mystery, theme, rule, historical_anchor, hook
chapters                 full prose body + optional body_md_tts annotation
chapter_summaries        short summaries for cheap context loading
passages                 paragraph-level + embedding vector(1536)
voices                   TTS voice rows (F5 ref clips / Kokoro / Tortoise names)
pronunciation_overrides  per-story + global respellings for proper nouns
generation_runs          every LLM call logged
audit_findings           audit pass output (severity + area)
narration_runs           per-chapter TTS renders
```
## Architecture (v0.1 + the plan)
## Quickstart
```
┌─────────────────────────────────┐
│         skald container         │
│  ┌───────────┐  ┌────────────┐  │
│  │ postgres  │  │ skald-rust │  │
│  │ pgvector  │←─│ axum + cli │  │
│  │ localhost │  │ :7780      │  │
│  └───────────┘  └─────┬──────┘  │
└───────────────────────┼─────────┘
                        │ HTTP (future)
                  ┌──────────┐
                  │clawdforge│
                  └─────┬────┘
                   opus calls
```sh
docker compose up -d
docker exec skald skald import-markdown --path /seed/<story>.md \
--title "<title>"
curl http://localhost:7780/health
```
v1.0+: extract postgres to its own container on db-net. skald
becomes pure stateless rust, connects via `DATABASE_URL`. Migration
is a connection-string change + a network move; the binary doesn't
care where the DB lives.
The compose file expects `POSTGRES_PASSWORD` and (optionally)
`CLAWDFORGE_URL` + `CLAWDFORGE_TOKEN` in `.env`. Story markdown
goes into `./seed/`; postgres data persists in `./pgdata/`.
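As a rough illustration of the required-versus-optional split described above, the sketch below reads the three variables through a lookup function. `Settings` and `load_settings` are hypothetical names, and treating the clawdforge pair as all-or-nothing is an assumption; the README only says the compose file expects these values, not how the binary consumes them.

```rust
/// Hypothetical settings bundle built from the variables named in `.env`.
#[derive(Debug, PartialEq)]
struct Settings {
    postgres_password: String,
    /// (url, token); present only when both variables are set.
    clawdforge: Option<(String, String)>,
}

/// Build Settings from any key -> value lookup.
/// Taking a closure keeps the function testable without touching the real environment.
fn load_settings(get: impl Fn(&str) -> Option<String>) -> Result<Settings, String> {
    // Required: fail loudly when missing or empty.
    let postgres_password = get("POSTGRES_PASSWORD")
        .filter(|v| !v.is_empty())
        .ok_or_else(|| "POSTGRES_PASSWORD must be set".to_string())?;

    // Optional: only usable as a pair (assumption, see lead-in).
    let clawdforge = match (get("CLAWDFORGE_URL"), get("CLAWDFORGE_TOKEN")) {
        (Some(url), Some(token)) => Some((url, token)),
        _ => None,
    };

    Ok(Settings { postgres_password, clawdforge })
}
```

Against the real process environment this would be called as `load_settings(|k| std::env::var(k).ok())`.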
## Architecture
v0.1 ships postgres inside the skald container — singleton until
the tool stabilises. To extract postgres later, swap the runtime
base to `debian:bookworm-slim`, drop `entrypoint.sh`, and point
`DATABASE_URL` at the external pg. The binary doesn't care where
the DB lives.
The generation passes call out to `clawdforge` (a bearer-token-gated
HTTP wrapper around `claude -p`). The Rust client is vendored at
`vendor/clawdforge/`. TTS calls go HTTP+JSON to the per-engine
sidecars under `engines/`.
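To make "bearer-token-gated HTTP" concrete, the sketch below builds the raw text of such a request using only the standard library. It is not the vendored clawdforge client's API: `bearer_post` is a hypothetical helper, and the host and path passed to it are placeholders, since the README does not name clawdforge's endpoints.

```rust
/// Build the wire text of an HTTP/1.1 POST carrying a JSON body and a
/// bearer token. Hypothetical helper for illustration only.
fn bearer_post(host: &str, path: &str, token: &str, json_body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\n\
         Host: {host}\r\n\
         Authorization: Bearer {token}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {json_body}",
        // Content-Length counts bytes, which is what str::len returns.
        len = json_body.len(),
    )
}
```

The only thing separating this from an unauthenticated call is the `Authorization: Bearer <token>` header; a server gated this way would be expected to reject requests that lack it.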
## License
MIT.
MIT — see `LICENSE`.