The python FastAPI sidecars have lived ad-hoc at /mnt/cache/appdata/ <engine>/build/ on Lucy without version control. Bringing them into the skald repo so the engine code travels with the cross-engine routing it depends on. This commit lands the VANILLA version of each engine on main: engines/f5-tts/ SWivid F5-TTS (CC-BY-NC weights flagged) engines/kokoro/ hexgrad Kokoro-82M (Apache 2.0 top to bottom) engines/tortoise/ neonbjb Tortoise-TTS (Apache 2.0 top to bottom) Engine-specific kludges (question doubling, GPU coordination, pause-duration tuning) get layered on engine/* branches per the README. Main stays the safe-to-read baseline. |
||
|---|---|---|
| .. | ||
| f5-tts | ||
| kokoro | ||
| tortoise | ||
| README.md | ||
Skald TTS engines
This subtree holds the per-engine sidecars that skald's narrate path talks to over HTTP. Each engine has the same contract:
POST /synthesize— same JSON shape across engines so skald's one Rust client (skald-core::narrate::Narrator) deserializes all of them. Seeengines/<name>/server.pyfor the per-engine implementation.GET /healthz— boot probe + model-loaded flag.
Skald routes per-request by voices.source: a kokoro_* source
goes to $KOKORO_URL, a tortoise_* source goes to $TORTOISE_URL,
anything else (lj_speech, generic) goes to $F5_TTS_URL.
Engines
| Dir | Engine | License (code/weights) | VRAM | Speed | Voices |
|---|---|---|---|---|---|
f5-tts/ |
SWivid F5-TTS v1 | MIT / CC-BY-NC | ~5GB | fast (~2x real-time on 2070S) | voice cloning (LJ Speech reference shipped) |
kokoro/ |
hexgrad Kokoro-82M | Apache 2.0 / Apache 2.0 | ~1GB | very fast (~50x real-time) | 50+ named presets (af_, am_, bf_, bm_) |
tortoise/ |
neonbjb Tortoise-TTS | Apache 2.0 / Apache 2.0 | ~5GB | slow (~0.014x real-time, ~74s/s of audio on 2070S, standard preset) | 26 named built-ins (lj, freeman, daniel, weaver, jlaw, etc.) |
Branch model
main carries the vanilla version of each engine — what you'd
get from a clean pip install <engine> plus the FastAPI sidecar
- control-tag splitter. No engine-specific kludges. Safe to look at without context.
engine/<name> branches hold engine-tuned tweaks that don't
generalise. Examples:
engine/kokoro— doubled-??prosody hack for the 82M's weak question intonation, paragraph/scene/breath gap durations tuned for af_heart's pacing, notes on how respellings need to be all- lowercase to avoid letter-by-letter spell-out by misaki.engine/tortoise— GPU exclusivity coordinator (stops F5 + Kokoro before a Tortoise run since the 2070 Super can't host all three at once), preset choice ergonomics, character→tortoise- voice seed assignments.
When deploying an engine to Lucy, the build dir at
/mnt/cache/appdata/<engine>/build/ tracks the engine's branch:
cd /mnt/cache/appdata/kokoro/build
git fetch && git checkout engine/kokoro
docker compose -p <name> up -d --build
GPU coordination (2070 Super)
The 8GB card is the bottleneck. F5 + Kokoro can co-reside (~5GB +
~1GB). Tortoise pushes the budget over and needs the GPU largely
to itself — the engine/tortoise branch will carry the script
that stops kokoro + f5 before a tortoise run and restarts them
after. Replace with proper coordination once we have more VRAM.