skald/engines/README.md
Cobb Hayes 346cea515d Public-flip audit: env-driven paths, scrub audit-ticket prefixes, terser README
2026-05-27 11:42:58 -07:00


Skald TTS engines

This subtree holds the per-engine sidecars that skald's narrate path talks to over HTTP. Each engine has the same contract:

  • POST /synthesize — same JSON shape across engines so skald's one Rust client (skald-core::narrate::Narrator) deserializes all of them. See engines/<name>/server.py for the per-engine implementation.
  • GET /healthz — boot probe + model-loaded flag.
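The boot probe can be sketched from the client side as a small polling loop. This is only an illustration: the `model_loaded` field name, the retry counts, and the helper names are assumptions, not the documented response shape. Check engines/<name>/server.py for the real one.

```python
import json
import time
import urllib.request


def fetch_healthz(base_url: str, timeout: float = 2.0) -> dict:
    """GET <base_url>/healthz and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(base_url, fetch=fetch_healthz, attempts=30, delay=1.0,
                     sleep=time.sleep):
    """Poll /healthz until the engine reports its model as loaded.

    ASSUMPTION: the body is a JSON object with a boolean "model_loaded" key.
    Returns True once loaded, False if every attempt fails.
    """
    for _ in range(attempts):
        try:
            body = fetch(base_url)
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: not ready yet.
            body = {}
        if isinstance(body, dict) and body.get("model_loaded") is True:
            return True
        sleep(delay)
    return False
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running sidecar.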

Skald routes per-request by voices.source: a kokoro_* source goes to $KOKORO_URL, a tortoise_* source goes to $TORTOISE_URL, anything else (lj_speech, generic) goes to $F5_TTS_URL.
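The routing rule above amounts to a prefix match with a fallback. Here is a minimal Python sketch of it. The real router lives in skald's Rust client, so the function names here are hypothetical and only the prefixes and environment variable names come from this README.

```python
import os

# Prefix of voices.source -> env var holding that engine's base URL.
_PREFIX_TO_ENV = (
    ("kokoro_", "KOKORO_URL"),
    ("tortoise_", "TORTOISE_URL"),
)
# Anything that matches no prefix (lj_speech, generic, ...) goes to F5.
_FALLBACK_ENV = "F5_TTS_URL"


def engine_env_var(source: str) -> str:
    """Return the env var name to consult for a given voices.source value."""
    for prefix, env_name in _PREFIX_TO_ENV:
        if source.startswith(prefix):
            return env_name
    return _FALLBACK_ENV


def engine_url(source: str, env=os.environ) -> str:
    """Resolve the engine base URL; raises KeyError if the variable is unset."""
    return env[engine_env_var(source)]
```

For example, `engine_env_var("kokoro_af_heart")` yields `"KOKORO_URL"`, while `engine_env_var("lj_speech")` falls through to `"F5_TTS_URL"`.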

Engines

| Dir | Engine | License (code/weights) | VRAM | Speed | Voices |
|---|---|---|---|---|---|
| f5-tts/ | SWivid F5-TTS v1 | MIT / CC-BY-NC | ~5GB | fast (~2x real-time on 2070S) | voice cloning (LJ Speech reference shipped) |
| kokoro/ | hexgrad Kokoro-82M | Apache 2.0 / Apache 2.0 | ~1GB | very fast (~50x real-time) | 50+ named presets (af_, am_, bf_, bm_) |
| tortoise/ | neonbjb Tortoise-TTS | Apache 2.0 / Apache 2.0 | ~5GB | slow (~0.014x real-time, ~74 s per second of audio on 2070S, standard preset) | 26 named built-ins (lj, freeman, daniel, weaver, jlaw, etc.) |
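To make the speed column concrete, the sketch below converts the approximate real-time factors into wall-clock estimates. The figures are the rough ones quoted in the table, and the function name is hypothetical. Treat the results as order-of-magnitude estimates rather than benchmarks.

```python
# Audio seconds produced per wall-clock second, taken from the table above.
REAL_TIME_FACTOR = {
    "f5-tts": 2.0,
    "kokoro": 50.0,
    "tortoise": 1.0 / 74.0,  # ~0.014x, i.e. ~74 s of compute per 1 s of audio
}


def synth_seconds(engine: str, audio_seconds: float) -> float:
    """Estimate wall-clock seconds needed to synthesize audio_seconds of speech."""
    return audio_seconds / REAL_TIME_FACTOR[engine]


if __name__ == "__main__":
    # A 10-minute passage (600 s of audio) on each engine.
    for name in REAL_TIME_FACTOR:
        print(f"{name}: ~{synth_seconds(name, 600):.0f} s")
```

For a 10-minute passage this gives roughly 12 s on Kokoro, 300 s on F5-TTS, and about 44,400 s (a little over 12 hours) on Tortoise, which is why Tortoise is reserved for runs where its quality justifies the wait.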

Branch model

main carries the vanilla version of each engine: what you'd get from a clean pip install <engine> plus the FastAPI sidecar + control-tag splitter. No engine-specific kludges. Safe to look at without context.

engine/<name> branches hold engine-tuned tweaks that don't generalise. Examples:

  • engine/kokoro: doubled-?? prosody hack for the 82M's weak question intonation, paragraph/scene/breath gap durations tuned for af_heart's pacing, and notes on how respellings need to be all-lowercase to avoid letter-by-letter spell-out by misaki.
  • engine/tortoise: GPU exclusivity coordinator (stops F5 + Kokoro before a Tortoise run since the 2070 Super can't host all three at once), preset choice ergonomics, and character→tortoise-voice seed assignments.
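As an illustration of the kind of text preprocessing the engine/kokoro branch describes, the sketch below doubles lone question marks and lowercases respellings before they reach the engine. It is a guess at the shape of those tweaks, not the branch's actual code. The function names and the respelling table format are assumptions.

```python
import re

# A "?" that is not already part of a "??" run.
_SINGLE_QMARK = re.compile(r"(?<!\?)\?(?!\?)")


def double_question_marks(text: str) -> str:
    """Turn each lone '?' into '??' (assumed form of the prosody hack)."""
    return _SINGLE_QMARK.sub("??", text)


def apply_respellings(text: str, respellings: dict) -> str:
    """Replace whole words with their respelling, forced to lowercase.

    Lowercasing follows the note above: mixed-case or upper-case respellings
    risk being spelled out letter by letter.
    """
    for word, respelling in respellings.items():
        pattern = rf"\b{re.escape(word)}\b"
        # A function replacement avoids backslash interpretation in the respelling.
        text = re.sub(pattern, lambda _m, r=respelling.lower(): r, text)
    return text
```

Running `double_question_marks("Really? Yes.")` gives `"Really?? Yes."`, and text that already contains `??` is left unchanged.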

To deploy a tuned engine, check out the engine's branch in the build dir and run docker compose up -d --build:

git fetch && git checkout engine/kokoro
docker compose up -d --build

GPU coordination

On an 8GB card, F5 + Kokoro can co-reside (~5GB + ~1GB = ~6GB). Adding Tortoise (~5GB) would bring the total to ~11GB, so Tortoise needs the GPU largely to itself. The engine/tortoise branch carries a script that stops kokoro + f5 before a Tortoise run and restarts them after. Replace this with proper coordination once more VRAM is available.
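The budget check and the stop/restart pattern can be sketched as follows. This is not the script from the engine/tortoise branch. The compose service names (`kokoro`, `f5-tts`) and the helper names are assumptions, and the VRAM figures are the approximate ones from the engines table.

```python
import subprocess
from contextlib import contextmanager

# Approximate VRAM per engine in GB, from the engines table.
VRAM_GB = {"f5-tts": 5, "kokoro": 1, "tortoise": 5}


def fits(engines, card_gb: int = 8) -> bool:
    """True if the listed engines' combined VRAM estimate fits on the card."""
    return sum(VRAM_GB[e] for e in engines) <= card_gb


def _compose(*args: str) -> None:
    """Run `docker compose <args>` and raise if it exits non-zero."""
    subprocess.run(["docker", "compose", *args], check=True)


@contextmanager
def gpu_exclusive(others=("kokoro", "f5-tts"), run=_compose):
    """Stop the other engines for the duration of the block, then restart them.

    ASSUMPTION: `others` are compose service names. The restart happens in a
    finally clause so a failed Tortoise run does not leave them stopped.
    """
    run("stop", *others)
    try:
        yield
    finally:
        run("start", *others)
```

Here `fits(["f5-tts", "kokoro"])` is true (6GB on an 8GB card), while `fits(["f5-tts", "kokoro", "tortoise"])` is false (11GB). A Tortoise run would then be wrapped as `with gpu_exclusive(): ...`.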