diff --git a/engines/tortoise/exclusive-gpu.sh b/engines/tortoise/exclusive-gpu.sh
new file mode 100755
index 0000000..fb2b0a4
--- /dev/null
+++ b/engines/tortoise/exclusive-gpu.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# Tortoise GPU exclusivity wrapper. The 2070 Super (8GB) can't host
+# F5 (~4.5GB) + Kokoro (~2.7GB) + Tortoise (~5GB peak) simultaneously,
+# so we stop the other two engines for the duration of a Tortoise run
+# and restart them after.
+#
+# Usage:
+#   exclusive-gpu.sh <command> [args...]
+#
+# Example:
+#   exclusive-gpu.sh docker exec skald skald narrate --chapter
+#
+# Exits with the wrapped command's status (or 1 if Tortoise never
+# becomes healthy). Restarts the engines regardless of success or
+# failure (trap on EXIT).
+set -euo pipefail
+
+STOP_ENGINES=(f5-tts kokoro)
+
+cleanup() {
+  local rc=$?
+  echo "[exclusive-gpu] restarting engines"
+  for engine in "${STOP_ENGINES[@]}"; do
+    docker start "$engine" >/dev/null 2>&1 || \
+      echo "[exclusive-gpu] failed to restart $engine — investigate"
+  done
+  return "$rc"
+}
+trap cleanup EXIT
+
+echo "[exclusive-gpu] stopping engines: ${STOP_ENGINES[*]}"
+for engine in "${STOP_ENGINES[@]}"; do
+  docker stop "$engine" >/dev/null 2>&1 || true
+done
+
+# Restart Tortoise so it starts from a clean CUDA context now that the
+# other engines are stopped (their CUDA contexts can linger briefly).
+docker restart tortoise >/dev/null
+echo "[exclusive-gpu] waiting for tortoise healthz..."
+ready=0
+for _ in {1..30}; do
+  # Capture the body before matching: under pipefail, `curl | grep -q`
+  # can report failure if grep exits before curl finishes writing.
+  if body=$(curl -sf http://192.168.0.5:7795/healthz) &&
+     [[ "$body" == *'"loaded":true'* ]]; then
+    ready=1
+    break
+  fi
+  sleep 2
+done
+if (( ready != 1 )); then
+  echo "[exclusive-gpu] tortoise not healthy after ~60s; aborting" >&2
+  exit 1
+fi
+
+echo "[exclusive-gpu] running: $*"
+"$@"
diff --git a/engines/tortoise/hacks.md b/engines/tortoise/hacks.md
new file mode 100644
index 0000000..ee335b4
--- /dev/null
+++ b/engines/tortoise/hacks.md
@@ -0,0 +1,77 @@
+# Tortoise engine — kludges branched off main
+
+This branch carries the engine-specific tweaks that don't generalise
+to F5 / Kokoro. Tortoise is the audiobook-quality engine, but its
+trade-offs are real and need explicit handling: GPU memory and speed.
+
+## 1. GPU exclusivity
+
+**File:** `exclusive-gpu.sh`.
+
+The 2070 Super has 8GB. F5 (~4.5GB) + Kokoro (~2.7GB) + Tortoise
+(~5GB peak) sums to ~12GB, well over budget. The first Tortoise smoke
+test caught it: `torch.OutOfMemoryError: ... 9.31 MiB is free`.
+
+Solution: stop the other two engines for the duration of a Tortoise
+run. The script wraps any command: it stops `f5-tts` and `kokoro`,
+restarts `tortoise` to give it a clean CUDA context, polls `/healthz`
+until the model reports loaded (aborting after ~60s), runs the wrapped
+command, and finally restarts the stopped engines from an EXIT trap,
+on success or failure.
+
+```bash
+./exclusive-gpu.sh docker exec skald skald narrate --chapter
+```
+
+Remove when: a GPU upgrade (e.g. P40 24GB or 3090 24GB) lets all three
+engines co-reside.
+
+## 2. Speed — slow, batch-only
+
+Tortoise at the `standard` preset is **~74x slower than real-time** on
+the 2070 Super (smoke: 6.5s of audio took 478s wall clock). The
+33-min Chapter 2 render is estimated at ~8 hours, which is roughly
+15x real-time, well below the smoke's 74x; presumably the 6.5s clip
+was dominated by fixed per-run overhead. At a flat 74x the same
+chapter would take ~40 hours, so time a full chapter before planning
+around either figure. Either way, Tortoise suits overnight batched
+runs, NOT interactive rendering.
+
+Quality presets and their approximate wall-clock time for a 3000-word
+chapter:
+- `ultra_fast` — ~1h, noticeable quality drop
+- `fast` — ~2h
+- `standard` — ~6-8h, the recommended bar
+- `high_quality` — ~24h, marginally better than `standard`
+
+For most use, `standard` is right. Reserve `high_quality` for short
+prologues or named samples.
+
+## 3. Voice mapping format
+
+Tortoise's voice roster (`lj`, `freeman`, `daniel`, etc.) lives
+behind `source='tortoise_tts'` in the `voices` table. The character
+slug → Tortoise voice mapping is independent of the Kokoro mapping,
+so a story can have BOTH a Kokoro and a Tortoise mapping live in
+parallel; the one used is picked at render time via
+`story.preferred_voice_id` or the `--voice` flag.
+
+Tortoise voices can warble or stutter at chunk boundaries, because
+`tortoise.api.TextToSpeech.tts_with_preset` is called per chunk and
+re-conditions the voice each time. Acceptable for v0.1; future work
+could pass precomputed `conditioning_latents` for tighter cohesion.
+
+## 4. No respelling overrides for Tortoise (yet)
+
+The `pronunciation_overrides` rows in the DB are seeded with
+lowercase-syllable respellings tuned for Kokoro's misaki tokenizer.
+Tortoise uses a different phonemizer (`g2p_en`), which handles many
+of those proper nouns better natively, but some are still mangled.
+
+For now, narrate's substitution applies the same overrides regardless
+of engine, so Tortoise sees `prip-yat` for "Pripyat": the same input,
+read by a different phonemizer, comes out differently. Usually fine,
+but audit the output after each batch.
+
+Future: per-engine override sets, or an `engine` column on
+`pronunciation_overrides`.
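
The GPU-exclusivity budget arithmetic can also be run as a pre-flight check. That check doubles as the test for the "remove when" condition after a GPU upgrade. The sketch below makes a few assumptions:

- `fits_on_gpu` is a hypothetical helper.
- The footprints are the hacks.md peaks converted to MiB (4.5GB ≈ 4608, 2.7GB ≈ 2765, 5GB = 5120). They are the quoted estimates, not new measurements.

```bash
#!/bin/bash
# Pre-flight GPU budget check (a sketch; fits_on_gpu is a hypothetical
# helper). Footprints are the approximate peaks quoted in hacks.md,
# converted to MiB, not fresh measurements.

# fits_on_gpu <capacity_mib> <footprint_mib>...
# Prints the total and returns 0 only if the footprints fit.
fits_on_gpu() {
  local capacity=$1 total=0 mib
  shift
  for mib in "$@"; do
    total=$(( total + mib ))
  done
  echo "need ${total} MiB of ${capacity} MiB"
  (( total <= capacity ))
}

F5_MIB=4608        # ~4.5GB
KOKORO_MIB=2765    # ~2.7GB
TORTOISE_MIB=5120  # ~5GB peak

# Capacity could instead come from the card itself, e.g.
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
if fits_on_gpu 8192 "$F5_MIB" "$KOKORO_MIB" "$TORTOISE_MIB"; then
  echo "all three engines fit; exclusive-gpu.sh can be retired"
else
  echo "over budget; keep wrapping Tortoise runs in exclusive-gpu.sh"
fi
```

On the 8GB card the total (12493 MiB) is over budget, so the wrapper stays. With a 24GB card the same call passes. The check has no safety margin. CUDA context overhead and fragmentation mean a budget that fits on paper can still OOM, so a real version should leave headroom.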
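
The speed figures in section 2 can be cross-checked with a little arithmetic. The sketch below derives the real-time factor from the smoke run. It then projects the 33-minute Chapter 2 at two rates: a flat 74x, and the ~14.5x implied by the ~8-hour estimate. The helper names `rtf` and `render_hours` are hypothetical, and `awk` handles the floating-point maths.

```bash
#!/bin/bash
# Back-of-envelope Tortoise render-time arithmetic (a sketch; rtf and
# render_hours are hypothetical helpers, not part of the repo).

# rtf <wall_seconds> <audio_seconds>: wall-clock seconds per second of audio.
rtf() {
  awk -v w="$1" -v a="$2" 'BEGIN { printf "%.1f\n", w / a }'
}

# render_hours <audio_minutes> <factor>: projected wall-clock hours.
render_hours() {
  awk -v m="$1" -v f="$2" 'BEGIN { printf "%.1f\n", m * f / 60 }'
}

rtf 478 6.5            # 73.5: the smoke run
render_hours 33 74     # 40.7: Chapter 2 at a flat 74x
render_hours 33 14.5   # 8.0: the factor the ~8-hour estimate implies
```

The two projections differ by a factor of five, which suggests a single 6.5s clip is a poor predictor of a chapter-length run. Timing one real chapter is the cheapest way to settle which factor holds.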
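
The proposed `engine` column on `pronunciation_overrides` implies a lookup with a fallback. The order assumed here, which the doc does not specify, is:

1. A row for the current engine.
2. Otherwise, an engine-agnostic row.
3. Otherwise, the word unchanged.

The sketch below is an in-memory illustration. A bash associative array stands in for the real table. The `respell` function, the `<engine>|<word>` key format and the `any` wildcard are all hypothetical.

```bash
#!/bin/bash
# Sketch of the proposed `engine` column on pronunciation_overrides.
# Hypothetical throughout: the OVERRIDES array stands in for the table,
# keys are "<engine>|<lowercased word>", and "any" marks a row that
# applies to every engine. Needs bash 4+ (associative arrays, ${var,,}).

declare -A OVERRIDES=(
  ["kokoro|pripyat"]="prip-yat"   # tuned for Kokoro's misaki tokenizer
)

# respell <engine> <word>
# Prints the engine-specific respelling if there is one, else the
# engine-agnostic ("any") one, else the word unchanged.
respell() {
  local engine=$1 word=$2
  local key=${word,,}
  if [[ -n "${OVERRIDES["$engine|$key"]+set}" ]]; then
    printf '%s\n' "${OVERRIDES["$engine|$key"]}"
  elif [[ -n "${OVERRIDES["any|$key"]+set}" ]]; then
    printf '%s\n' "${OVERRIDES["any|$key"]}"
  else
    printf '%s\n' "$word"
  fi
}

respell kokoro Pripyat     # prip-yat
respell tortoise Pripyat   # Pripyat: no Tortoise-specific row
```

Under this scheme, the existing Kokoro-tuned rows would carry `engine = kokoro`. Tortoise would then receive the raw word until Tortoise-specific rows are added for the names it still mangles.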