engines: import f5-tts + kokoro + tortoise sidecars into the tree

The Python FastAPI sidecars have lived ad hoc at
/mnt/cache/appdata/<engine>/build/ on Lucy without version control.
This commit brings them into the skald repo so the engine code
travels with the cross-engine routing it depends on.

This commit lands the VANILLA version of each engine on main:

  engines/f5-tts/    SWivid F5-TTS (CC-BY-NC weights flagged)
  engines/kokoro/    hexgrad Kokoro-82M (Apache 2.0 top to bottom)
  engines/tortoise/  neonbjb Tortoise-TTS (Apache 2.0 top to bottom)

Engine-specific kludges (question doubling, GPU coordination,
pause-duration tuning) get layered on engine/* branches per the
README. Main stays the safe-to-read baseline.
parent 1c3fc11484
commit d1631ddffe
10 changed files with 1115 additions and 0 deletions
engines/README.md (normal file, 58 lines added)
@@ -0,0 +1,58 @@
# Skald TTS engines

This subtree holds the per-engine sidecars that skald's narrate path
talks to over HTTP. Each engine has the same contract:

- `POST /synthesize`: same JSON shape across engines, so skald's
  one Rust client (`skald-core::narrate::Narrator`) deserializes
  all of them. See `engines/<name>/server.py` for the per-engine
  implementation.
- `GET /healthz`: boot probe + model-loaded flag.
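
As a rough illustration of the boot probe, the sketch below polls `GET /healthz` until an engine answers. It only checks for HTTP 200, because the JSON field that carries the model-loaded flag is not spelled out here. The function names, the retry counts, and the example base URL are all assumptions made for illustration, not part of the sidecars.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, non-2xx status, etc.
        return False


def wait_until_healthy(base_url, attempts=30, delay=1.0,
                       probe=http_ok, sleep=time.sleep):
    """Poll <base_url>/healthz up to `attempts` times.

    `probe` and `sleep` are injectable so the loop can be exercised
    without a running engine (hypothetical helper, not skald's API).
    """
    url = base_url.rstrip("/") + "/healthz"
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8000")` (the port is a placeholder) returns `False` after roughly 30 seconds if nothing answers.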

Skald routes per-request by `voices.source`: a `kokoro_*` source
goes to `$KOKORO_URL`, a `tortoise_*` source goes to `$TORTOISE_URL`,
and anything else (`lj_speech`, generic) goes to `$F5_TTS_URL`.
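
The routing rule above can be sketched in a few lines. The real logic lives in skald's Rust client, so this Python version is only an illustration: `engine_env_var` and `engine_url` are hypothetical names, and the rule is assumed to be a plain prefix match on `voices.source`.

```python
import os

# Source prefix -> environment variable holding that engine's base URL.
PREFIX_TO_ENV = (
    ("kokoro_", "KOKORO_URL"),
    ("tortoise_", "TORTOISE_URL"),
)
# Everything else (lj_speech, generic) falls through to F5-TTS.
FALLBACK_ENV = "F5_TTS_URL"


def engine_env_var(source: str) -> str:
    """Name of the env var that holds the engine URL for a voice source."""
    for prefix, env_name in PREFIX_TO_ENV:
        if source.startswith(prefix):
            return env_name
    return FALLBACK_ENV


def engine_url(source: str, env=os.environ) -> str:
    """Resolve the engine base URL; raises KeyError if the variable is unset."""
    return env[engine_env_var(source)]
```

Under these assumptions a source such as `kokoro_af_heart` (an illustrative name) resolves to `$KOKORO_URL`, while `lj_speech` resolves to `$F5_TTS_URL`.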

## Engines

| Dir | Engine | License (code / weights) | VRAM | Speed | Voices |
|---|---|---|---|---|---|
| `f5-tts/` | SWivid F5-TTS v1 | MIT / **CC-BY-NC** | ~5 GB | fast (~2x real-time on 2070S) | voice cloning (LJ Speech reference shipped) |
| `kokoro/` | hexgrad Kokoro-82M | Apache 2.0 / Apache 2.0 | ~1 GB | very fast (~50x real-time) | 50+ named presets (`af_*`, `am_*`, `bf_*`, `bm_*`) |
| `tortoise/` | neonbjb Tortoise-TTS | Apache 2.0 / Apache 2.0 | ~5 GB | **slow** (~0.014x real-time, ~74 s per second of audio on 2070S, standard preset) | 26 named built-ins (lj, freeman, daniel, weaver, jlaw, etc.) |
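
To make the speed column concrete, here is a small back-of-the-envelope sketch that turns the approximate real-time factors from the table into wall-clock estimates. The dictionary and function name are hypothetical, and the factors are the rough figures quoted above, not fresh measurements.

```python
# Approximate speed factors from the table: seconds of audio produced
# per wall-clock second on the 2070 Super.
REAL_TIME_FACTOR = {
    "f5-tts": 2.0,
    "kokoro": 50.0,
    "tortoise": 0.014,  # standard preset
}


def estimated_synthesis_seconds(engine: str, audio_seconds: float) -> float:
    """Rough wall-clock time needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / REAL_TIME_FACTOR[engine]
```

With these figures, one minute of audio comes out at about 30 s on F5-TTS, about 1.2 s on Kokoro, and well over an hour on Tortoise.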

## Branch model

`main` carries the **vanilla** version of each engine: what you'd
get from a clean `pip install <engine>` plus the FastAPI sidecar
and control-tag splitter. No engine-specific kludges. Safe to read
without context.

`engine/<name>` branches hold engine-tuned tweaks that don't
generalise. Examples:

- `engine/kokoro`: doubled-`??` prosody hack for the 82M's weak
  question intonation, paragraph/scene/breath gap durations tuned
  for `af_heart`'s pacing, and notes on why respellings need to be
  all-lowercase (otherwise misaki spells them out letter by letter).
- `engine/tortoise`: GPU exclusivity coordinator (stops F5 and
  Kokoro before a Tortoise run, since the 2070 Super can't host
  all three at once), preset choice ergonomics, and
  character→tortoise-voice seed assignments.

When deploying an engine to Lucy, the build dir at
`/mnt/cache/appdata/<engine>/build/` tracks the engine's branch:

```bash
cd /mnt/cache/appdata/kokoro/build
# Switch the build dir to this engine's branch.
git fetch && git checkout engine/kokoro
# <name> is a placeholder for the engine's compose project name.
docker compose -p <name> up -d --build
```

## GPU coordination (2070 Super)

The 8 GB card is the bottleneck. F5 and Kokoro can co-reside (~5 GB +
~1 GB = ~6 GB). Adding Tortoise's ~5 GB would push the total to
~11 GB, over budget, so Tortoise needs the GPU largely to itself. The
`engine/tortoise` branch will carry the script that stops Kokoro and
F5 before a Tortoise run and restarts them afterwards. Replace with
proper coordination once we have more VRAM.
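
The stop-then-restart idea can be sketched as a context manager. This is not the script from the `engine/tortoise` branch, which may look quite different. The VRAM figures are the approximate ones from the engines table, the compose project names passed in are assumptions, and `gpu_exclusive` and `fits_on_card` are hypothetical names.

```python
import subprocess
from contextlib import contextmanager

# Approximate VRAM per engine in GB (from the engines table) and card size.
VRAM_GB = {"f5-tts": 5, "kokoro": 1, "tortoise": 5}
CARD_GB = 8


def fits_on_card(engines) -> bool:
    """True if the engines' approximate VRAM adds up to within the card."""
    return sum(VRAM_GB[name] for name in engines) <= CARD_GB


def _compose(project: str, action: str) -> None:
    # Runs `docker compose -p <project> stop` or `... start`.
    subprocess.run(["docker", "compose", "-p", project, action], check=True)


@contextmanager
def gpu_exclusive(other_projects, run=_compose):
    """Stop the other engines for the duration of the block, then restart them.

    `other_projects` are compose project names (assumed, not taken from
    the repo). `run` is injectable so the flow can be tested without Docker.
    """
    stopped = []
    try:
        for project in other_projects:
            run(project, "stop")
            stopped.append(project)
        yield
    finally:
        # Restart whatever was actually stopped, even if the run failed.
        for project in stopped:
            run(project, "start")
```

A Tortoise run would then sit inside `with gpu_exclusive(["f5-tts", "kokoro"]): ...`. The `try`/`finally` matters here: if the Tortoise run raises, the other two engines still come back up.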