Adds the Tortoise-specific tooling that main intentionally omits:
- engines/tortoise/exclusive-gpu.sh wraps any command, stops F5 + Kokoro on the GPU, restarts Tortoise to clear stale CUDA contexts, waits for healthz, runs the command, restarts the engines on EXIT trap. Solves the 8GB OOM that took down the first smoke.
- engines/tortoise/hacks.md captures the speed reality (~74x real-time slowdown on the 2070 Super at standard preset) and the pronunciation-overrides cross-engine compatibility note.

Deploy from this branch when you want Tortoise's tuning. Main's vanilla Tortoise is for the cross-engine reference + future 'we have more VRAM now' cleanup.
Skald TTS engines
This subtree holds the per-engine sidecars that skald's narrate path talks to over HTTP. Each engine has the same contract:
- POST /synthesize: same JSON shape across engines so skald's one Rust client (skald-core::narrate::Narrator) deserializes all of them. See engines/<name>/server.py for the per-engine implementation.
- GET /healthz: boot probe + model-loaded flag.
Skald routes per-request by voices.source: a kokoro_* source
goes to $KOKORO_URL, a tortoise_* source goes to $TORTOISE_URL,
anything else (lj_speech, generic) goes to $F5_TTS_URL.
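
The routing rule above can be sketched in a few lines. This is an illustration only, written in Python rather than skald's Rust: the function name `engine_url_for` and the demo URLs are hypothetical, and the sketch assumes the `kokoro_*` and `tortoise_*` patterns are plain prefix matches on the source string.

```python
import os


def engine_url_for(source: str, env=os.environ) -> str:
    """Pick the sidecar base URL for a voices.source value.

    Hypothetical helper: the real routing lives in skald's Rust client.
    """
    if source.startswith("kokoro_"):
        return env["KOKORO_URL"]
    if source.startswith("tortoise_"):
        return env["TORTOISE_URL"]
    # Everything else (lj_speech, generic, ...) falls through to F5-TTS.
    return env["F5_TTS_URL"]


if __name__ == "__main__":
    # Placeholder URLs for the demo; real values come from the environment.
    demo_env = {
        "KOKORO_URL": "http://kokoro.example",
        "TORTOISE_URL": "http://tortoise.example",
        "F5_TTS_URL": "http://f5.example",
    }
    for src in ("kokoro_af_heart", "tortoise_freeman", "lj_speech"):
        print(src, "->", engine_url_for(src, demo_env))
```

Passing the environment as a parameter keeps the function testable without touching real environment variables.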
Engines
| Dir | Engine | License (code/weights) | VRAM | Speed | Voices |
|---|---|---|---|---|---|
| f5-tts/ | SWivid F5-TTS v1 | MIT / CC-BY-NC | ~5GB | fast (~2x real-time on 2070S) | voice cloning (LJ Speech reference shipped) |
| kokoro/ | hexgrad Kokoro-82M | Apache 2.0 / Apache 2.0 | ~1GB | very fast (~50x real-time) | 50+ named presets (af_, am_, bf_, bm_) |
| tortoise/ | neonbjb Tortoise-TTS | Apache 2.0 / Apache 2.0 | ~5GB | slow (~0.014x real-time, ~74s/s of audio on 2070S, standard preset) | 26 named built-ins (lj, freeman, daniel, weaver, jlaw, etc.) |
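
To make the speed column concrete, here is a rough wall-clock estimate built from the approximate real-time factors in the table. The dictionary and function names are hypothetical, and the figures are the table's ballpark numbers for the 2070 Super, not measurements taken by this snippet.

```python
# Approximate real-time factors from the table: seconds of audio produced
# per second of wall-clock time on the 2070 Super.
REAL_TIME_FACTOR = {
    "f5-tts": 2.0,
    "kokoro": 50.0,
    "tortoise": 1 / 74,  # ~74 s of compute per second of audio
}


def synthesis_seconds(engine: str, audio_seconds: float) -> float:
    """Estimate wall-clock seconds needed to synthesize audio_seconds of speech."""
    return audio_seconds / REAL_TIME_FACTOR[engine]


if __name__ == "__main__":
    for engine in REAL_TIME_FACTOR:
        minutes = synthesis_seconds(engine, 60.0) / 60.0
        print(f"{engine}: about {minutes:.2f} min per minute of audio")
```

By this estimate, one minute of audio takes roughly half a minute on F5-TTS, about a second on Kokoro, and roughly 74 minutes on Tortoise at the standard preset.
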
Branch model
main carries the vanilla version of each engine: what you'd
get from a clean pip install <engine> plus the FastAPI sidecar
and control-tag splitter. No engine-specific kludges. Safe to look at without context.
engine/<name> branches hold engine-tuned tweaks that don't
generalise. Examples:
- engine/kokoro: doubled-?? prosody hack for the 82M's weak question intonation, paragraph/scene/breath gap durations tuned for af_heart's pacing, notes on how respellings need to be all-lowercase to avoid letter-by-letter spell-out by misaki.
- engine/tortoise: GPU exclusivity coordinator (stops F5 + Kokoro before a Tortoise run since the 2070 Super can't host all three at once), preset choice ergonomics, character→tortoise-voice seed assignments.
When deploying an engine to Lucy, the build dir at
/mnt/cache/appdata/<engine>/build/ tracks the engine's branch:

    cd /mnt/cache/appdata/kokoro/build
    git fetch && git checkout engine/kokoro
    docker compose -p <name> up -d --build

GPU coordination (2070 Super)
The 8GB card is the bottleneck. F5 + Kokoro can co-reside (~5GB +
~1GB). Tortoise pushes the budget over and needs the GPU largely
to itself — the engine/tortoise branch will carry the script
that stops kokoro + f5 before a tortoise run and restarts them
after. Replace with proper coordination once we have more VRAM.
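
The budget arithmetic and the stop/run/restart pattern can be sketched as follows. This is a minimal illustration, not the branch's script: `fits_on_card`, `gpu_exclusive`, and the `stop`/`start` callables are hypothetical names, and the VRAM figures are the approximate values from the engines table.

```python
from contextlib import contextmanager

CARD_GB = 8  # 2070 Super
# Approximate per-engine VRAM use, taken from the engines table.
APPROX_VRAM_GB = {"f5-tts": 5, "kokoro": 1, "tortoise": 5}


def fits_on_card(engines, card_gb=CARD_GB):
    """Return True if the listed engines fit in VRAM together."""
    return sum(APPROX_VRAM_GB[name] for name in engines) <= card_gb


@contextmanager
def gpu_exclusive(stop, start, others=("kokoro", "f5-tts")):
    """Stop the other engines for the duration of the block, then restart them.

    `stop` and `start` are caller-supplied callables taking an engine name
    (for example, thin wrappers around whatever manages the containers).
    """
    stopped = []
    try:
        for name in others:
            stop(name)
            stopped.append(name)
        yield
    finally:
        # Restart whatever was actually stopped, even if the run failed.
        for name in reversed(stopped):
            start(name)


if __name__ == "__main__":
    print(fits_on_card(["f5-tts", "kokoro"]))              # 6GB against 8GB
    print(fits_on_card(["f5-tts", "kokoro", "tortoise"]))  # 11GB against 8GB
    with gpu_exclusive(lambda n: print("stop", n), lambda n: print("start", n)):
        print("run tortoise job here")
```

The `finally` clause plays the role that an EXIT trap plays in a shell script: the other engines come back even when the Tortoise run raises.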