skald/engines
Kayos 7a96031aa6 engine/tortoise: GPU exclusivity wrapper + kludges notes
Adds the Tortoise-specific tooling that main intentionally omits:

- engines/tortoise/exclusive-gpu.sh wraps any command, stops F5 +
  Kokoro on the GPU, restarts Tortoise to clear stale CUDA contexts,
  waits for healthz, runs the command, restarts the engines on EXIT
  trap. Solves the 8GB OOM that took down the first smoke.

- engines/tortoise/hacks.md captures the speed reality (~74x real-
  time slowdown on the 2070 Super at standard preset) and the
  pronunciation-overrides cross-engine compatibility note.

Deploy from this branch when you want Tortoise's tuning. Main's
vanilla Tortoise is for the cross-engine reference + future
'we have more VRAM now' cleanup.
2026-05-14 09:42:09 -07:00

Skald TTS engines

This subtree holds the per-engine sidecars that skald's narrate path talks to over HTTP. Each engine has the same contract:

  • POST /synthesize — same JSON shape across engines so skald's one Rust client (skald-core::narrate::Narrator) deserializes all of them. See engines/<name>/server.py for the per-engine implementation.
  • GET /healthz — boot probe + model-loaded flag.
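
A caller that wants to block until an engine is usable could poll GET /healthz along the lines of the sketch below. It is illustrative only: the README does not give the response schema, so the `model_loaded` field name is an assumption, and `fetch_healthz` / `wait_until_ready` are hypothetical helper names rather than anything in the skald tree.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_healthz(base_url: str, timeout: float = 2.0) -> dict:
    """GET <base_url>/healthz and parse the JSON body."""
    url = f"{base_url.rstrip('/')}/healthz"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(
    probe: Callable[[], dict],
    attempts: int = 30,
    delay: float = 1.0,
    flag: str = "model_loaded",  # assumed field name, not confirmed by the README
) -> bool:
    """Call probe() until the model-loaded flag is truthy or attempts run out."""
    for attempt in range(attempts):
        try:
            if probe().get(flag):
                return True
        except (OSError, ValueError):
            # Connection refused / timeout (OSError) or a non-JSON body
            # (ValueError): treat the engine as still booting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe in as a callable keeps the retry loop testable without a running sidecar; in practice it would be something like `lambda: fetch_healthz(os.environ["TORTOISE_URL"])`.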

Skald routes per-request by voices.source: a kokoro_* source goes to $KOKORO_URL, a tortoise_* source goes to $TORTOISE_URL, anything else (lj_speech, generic) goes to $F5_TTS_URL.
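
The routing rule can be sketched in a few lines. The real router lives in skald's Rust client; this Python version only illustrates the prefix logic, and `engine_url_for` plus the example source strings are hypothetical. The environment variable names are the ones given above.

```python
import os
from typing import Mapping


def engine_url_for(source: str, env: Mapping[str, str] = os.environ) -> str:
    """Pick the sidecar base URL for a voices.source value."""
    if source.startswith("kokoro_"):
        return env["KOKORO_URL"]
    if source.startswith("tortoise_"):
        return env["TORTOISE_URL"]
    # Everything else (lj_speech, generic, ...) falls through to F5-TTS.
    return env["F5_TTS_URL"]
```

Note the fallthrough: an unrecognised or misspelled source is sent to F5-TTS rather than rejected, which is what the rule as written implies.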

Engines

| Dir | Engine | License (code/weights) | VRAM | Speed | Voices |
|---|---|---|---|---|---|
| f5-tts/ | SWivid F5-TTS v1 | MIT / CC-BY-NC | ~5GB | fast (~2x real-time on 2070S) | voice cloning (LJ Speech reference shipped) |
| kokoro/ | hexgrad Kokoro-82M | Apache 2.0 / Apache 2.0 | ~1GB | very fast (~50x real-time) | 50+ named presets (af_, am_, bf_, bm_) |
| tortoise/ | neonbjb Tortoise-TTS | Apache 2.0 / Apache 2.0 | ~5GB | slow (~0.014x real-time, i.e. ~74 s of compute per second of audio on 2070S, standard preset) | 26 named built-ins (lj, freeman, daniel, weaver, jlaw, etc.) |
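
To make the speed figures concrete, the sketch below converts them into rough wall-clock estimates. The multipliers are the approximate values quoted above (Tortoise's ~0.014x is 1/74); `SPEED_X_REALTIME` and `wall_seconds` are hypothetical names, and real timings will vary with text, preset and GPU load.

```python
# Approximate "x real-time" multipliers quoted for each engine.
# Values above 1 are faster than playback, values below 1 are slower.
SPEED_X_REALTIME = {
    "f5-tts": 2.0,
    "kokoro": 50.0,
    "tortoise": 1 / 74,  # ~0.014x, i.e. ~74 s of compute per second of audio
}


def wall_seconds(engine: str, audio_seconds: float) -> float:
    """Estimate synthesis time for a clip of the given length."""
    return audio_seconds / SPEED_X_REALTIME[engine]


if __name__ == "__main__":
    for name in SPEED_X_REALTIME:
        print(f"{name}: {wall_seconds(name, 60.0):.1f} s for 60 s of audio")
```

By these figures a one-minute clip costs about 1.2 s on Kokoro, 30 s on F5-TTS, and roughly 74 minutes on Tortoise.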

Branch model

main carries the vanilla version of each engine: what you'd get from a clean pip install <engine> plus the FastAPI sidecar + control-tag splitter. No engine-specific kludges. Safe to look at without context.

engine/<name> branches hold engine-tuned tweaks that don't generalise. Examples:

  • engine/kokoro: doubled-?? prosody hack for the 82M's weak question intonation, paragraph/scene/breath gap durations tuned for af_heart's pacing, notes on how respellings need to be all-lowercase to avoid letter-by-letter spell-out by misaki.
  • engine/tortoise: GPU exclusivity coordinator (stops F5 + Kokoro before a Tortoise run since the 2070 Super can't host all three at once), preset choice ergonomics, character→tortoise-voice seed assignments.

When deploying an engine to Lucy, the build dir at /mnt/cache/appdata/<engine>/build/ tracks the engine's branch:

cd /mnt/cache/appdata/kokoro/build
git fetch && git checkout engine/kokoro
docker compose -p <name> up -d --build

GPU coordination (2070 Super)

The 8GB card is the bottleneck. F5 + Kokoro can co-reside (~5GB + ~1GB, about 6GB total). Adding Tortoise (~5GB) would bring the total to roughly 11GB, so Tortoise needs the GPU largely to itself. The engine/tortoise branch will carry the script that stops kokoro + f5 before a tortoise run and restarts them after. Replace with proper coordination once we have more VRAM.
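
The budget arithmetic and the stop/run/restart pattern can be sketched as follows. This is an illustration of the idea, not the branch's script: the VRAM figures are the approximate ones from the engines table, `fits` and `exclusive_gpu` are hypothetical names, and the `stop` / `start` callables stand in for whatever actually controls the containers.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator

CARD_GB = 8.0  # 2070 Super
# Approximate per-engine VRAM use, taken from the engines table.
VRAM_GB = {"f5-tts": 5.0, "kokoro": 1.0, "tortoise": 5.0}


def fits(engines: Iterable[str], budget_gb: float = CARD_GB) -> bool:
    """True if the engines' approximate VRAM use fits on the card."""
    return sum(VRAM_GB[name] for name in engines) <= budget_gb


@contextmanager
def exclusive_gpu(
    stop: Callable[[str], None],
    start: Callable[[str], None],
    others: Iterable[str] = ("f5-tts", "kokoro"),
) -> Iterator[None]:
    """Stop the co-resident engines for the duration of a Tortoise run."""
    names = list(others)
    try:
        for name in names:
            stop(name)
        yield
    finally:
        # Restart even if the wrapped work (or a stop call) raised.
        for name in names:
            start(name)
```

With these figures `fits(["f5-tts", "kokoro"])` holds (6GB of 8GB) while adding "tortoise" does not (11GB). The `finally` clause ensures the other engines come back even when the Tortoise run fails.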