engine/tortoise: GPU exclusivity wrapper + kludges notes

Adds the Tortoise-specific tooling that main intentionally omits:

- engines/tortoise/exclusive-gpu.sh wraps any command, stops F5 +
  Kokoro on the GPU, restarts Tortoise to clear stale CUDA contexts,
  waits for healthz, runs the command, restarts the engines on EXIT
  trap. Solves the 8GB OOM that took down the first smoke.

- engines/tortoise/hacks.md captures the speed reality (~74x real-
  time slowdown on the 2070 Super at standard preset) and the
  pronunciation-overrides cross-engine compatibility note.

Deploy from this branch when you want Tortoise's tuning. Main's
vanilla Tortoise is for the cross-engine reference + future
'we have more VRAM now' cleanup.
Kayos 2026-05-14 09:42:09 -07:00
parent d1631ddffe
commit 7a96031aa6
2 changed files with 118 additions and 0 deletions

engines/tortoise/exclusive-gpu.sh

@@ -0,0 +1,47 @@
#!/bin/bash
# Tortoise GPU exclusivity wrapper. The 2070 Super (8GB) can't host
# F5 (~4.5GB) + Kokoro (~2.7GB) + Tortoise (~5GB peak) simultaneously,
# so we stop the other two engines for the duration of a Tortoise run
# and restart them after.
#
# Usage:
#   exclusive-gpu.sh <command...>
#
# Example:
#   exclusive-gpu.sh docker exec skald skald narrate --chapter <uuid>
#
# Exits with the wrapped command's status. Restarts the engines
# regardless of success/failure (trap on EXIT).
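set -euo pipefail

# Refuse to run without a command; otherwise the engines would be
# stopped and restarted for nothing.
if [ "$#" -eq 0 ]; then
  echo "usage: exclusive-gpu.sh <command...>" >&2
  exit 2
fi

STOP_ENGINES=(f5-tts kokoro)

# Runs on every exit path; the script's exit status is left as it was.
cleanup() {
  local rc=$?
  echo "[exclusive-gpu] restarting engines"
  for engine in "${STOP_ENGINES[@]}"; do
    docker start "$engine" >/dev/null 2>&1 || \
      echo "[exclusive-gpu] failed to restart $engine; investigate" >&2
  done
  return "$rc"
}
trap cleanup EXIT

echo "[exclusive-gpu] stopping engines: ${STOP_ENGINES[*]}"
for engine in "${STOP_ENGINES[@]}"; do
  # Best effort: an engine that is already stopped is not an error.
  docker stop "$engine" >/dev/null 2>&1 || true
done

# Restart Tortoise to clean up any cached GPU allocations from the
# now-stopped engines (their CUDA contexts can linger briefly).
docker restart tortoise >/dev/null

echo "[exclusive-gpu] waiting for tortoise healthz..."
healthy=0
for _ in {1..30}; do
  if curl -sf http://192.168.0.5:7795/healthz | grep -q '"loaded":true'; then
    healthy=1
    break
  fi
  sleep 2
done
# Don't start a long render against an engine that never loaded.
if [ "$healthy" -ne 1 ]; then
  echo "[exclusive-gpu] tortoise not healthy after 30 checks; aborting" >&2
  exit 1
fi

echo "[exclusive-gpu] running: $*"
"$@"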

engines/tortoise/hacks.md Normal file

@@ -0,0 +1,71 @@
# Tortoise engine: kludges branched off main

This branch carries the engine-specific tweaks that don't generalise
to F5 / Kokoro. Tortoise is the audiobook-quality engine, but its two
trade-offs, speed and GPU memory, are real and need explicit handling.

## 1. GPU exclusivity

**File:** `exclusive-gpu.sh`.

The 2070 Super has 8GB of VRAM. F5 (~4.5GB) + Kokoro (~2.7GB) +
Tortoise (~5GB peak) sums to ~12GB, well over budget. The first
Tortoise smoke caught it: `torch.OutOfMemoryError: ... 9.31 MiB is free`.

Solution: stop the other two engines for the duration of a Tortoise
run. The script wraps any command, stops `f5-tts` + `kokoro`,
restarts `tortoise` to clear its CUDA context, waits for healthz,
runs the wrapped command, then restarts the engines from an EXIT trap
(success or failure).

```bash
./exclusive-gpu.sh docker exec skald skald narrate --chapter <uuid>
```

Remove when: a GPU upgrade (P40 24GB / 3090 24GB / etc.) lets all
three engines co-reside.
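
The stop / run / restart-on-EXIT pattern can be tried without Docker.
The sketch below is a stand-in, not the wrapper itself: `stop_engines`,
`start_engines` and `run_exclusive` are hypothetical names, and the two
engine helpers only log instead of calling `docker stop` / `docker start`.

```bash
#!/bin/bash
# Hypothetical stand-ins for the docker stop/start calls in the wrapper.
stop_engines()  { echo "stop f5-tts kokoro"; }
start_engines() { echo "start f5-tts kokoro"; }

# Run a command with the engines "stopped". The body is a subshell, so
# the EXIT trap fires when the subshell ends, whether the command
# succeeded or failed.
run_exclusive() (
  trap start_engines EXIT
  stop_engines
  "$@"
  exit "$?"   # forward the wrapped command's status to the caller
)

# A failing command still triggers the restart.
run_exclusive sh -c 'echo "rendering"; exit 3' || echo "wrapped command exited with $?"
```

The subshell keeps the trap scoped to a single run, which is the same
guarantee the real script gets from being its own process.
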
## 2. Speed: slow, batch-only

Tortoise at `standard` preset is **~74x slower than real-time** on
the 2070 Super (smoke: 6.5s of audio took 478s wall clock). A 33-min
Chapter 2 render is estimated at ~8 hours, though a flat 74x would put
it nearer 40 hours (33 min × 74 ≈ 2,440 min); the gap suggests the
short smoke clip overstates the sustained ratio, so re-time on a full
chapter before relying on either number. Either way, Tortoise is
acceptable for overnight batched runs but NOT for interactive rendering.

Quality presets and their approximate wall-clock time for a
3000-word chapter:

- `ultra_fast`: ~1h, noticeable quality drop
- `fast`: ~2h
- `standard`: ~6-8h, the recommended bar
- `high_quality`: ~24h, marginally better than standard

For most use, `standard` is right. Reserve `high_quality` for
short prologues or named samples.
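
The figures in this section are ratio arithmetic, so they are easy to
recompute when a new smoke run comes in. A minimal sketch, assuming a
constant slowdown factor; `slowdown` and `estimate_hours` are made-up
helper names, not part of skald:

```bash
#!/bin/bash
# slowdown WALL_SECONDS AUDIO_SECONDS
# Prints wall-clock seconds spent per second of audio produced.
slowdown() {
  awk -v wall="$1" -v audio="$2" 'BEGIN { printf "%.1f\n", wall / audio }'
}

# estimate_hours AUDIO_MINUTES FACTOR
# Prints projected wall-clock hours if FACTOR held for the whole render.
estimate_hours() {
  awk -v min="$1" -v factor="$2" 'BEGIN { printf "%.1f\n", min * factor / 60 }'
}

slowdown 478 6.5        # the smoke run from this section
estimate_hours 33 74    # a 33-min chapter at a flat 74x
```

With the smoke numbers the first call prints 73.5, matching the ~74x
above. The second assumes the factor stays constant over a whole
chapter, which is the pessimistic case if the smoke's 478s included
one-off startup cost.
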
## 3. Voice mapping format

Tortoise's voice roster (`lj`, `freeman`, `daniel`, etc.) lives
behind `source='tortoise_tts'` in the `voices` table. The character
slug → Tortoise voice mapping is independent of the Kokoro mapping:
a story can have BOTH a Kokoro and a Tortoise mapping live in
parallel, with one picked at render time via
`story.preferred_voice_id` or the `--voice` flag.

Tortoise voices may warble or stutter at chunk boundaries because
the `tortoise.api.TextToSpeech.tts_with_preset` call is per-chunk
and re-conditions the voice each time. Acceptable for v0.1; future
work could feed `conditioning_latents` directly for tighter cohesion.

## 4. No respelling overrides for Tortoise (yet)

The `pronunciation_overrides` rows in the DB are seeded with
lowercase-syllable respellings tuned for Kokoro's misaki tokenizer.
Tortoise uses a different phonemizer (`g2p_en`), which handles many
of those proper nouns better natively, though it still mangles some.

For now, narrate's substitution applies the same overrides regardless
of engine, so Tortoise also sees `prip-yat` for "Pripyat": same
input, but a different phonemizer interprets it differently. Usually
OK, but audit after each batch.

Future: per-engine override sets, OR an `engine` column on
`pronunciation_overrides`.
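
One way the per-engine lookup could behave, sketched in bash with a
`case` statement standing in for the `pronunciation_overrides` table.
Everything here is hypothetical: the `engine:word` key scheme, the
`default` pseudo-engine for shared rows, the helper names, and the
Tortoise-specific row are illustrations, not the current schema or
narrate's code.

```bash
#!/bin/bash
# Hypothetical override table keyed by "engine:word". "default" rows
# apply to every engine, which is how overrides behave today.
lookup_override() {
  case "$1" in
    default:Pripyat)  echo "prip-yat" ;;   # existing Kokoro-tuned respelling
    tortoise:Pripyat) echo "Pripyat" ;;    # made-up row: leave it to g2p_en
    *) return 1 ;;
  esac
}

# respell ENGINE WORD
# Tries the engine-specific row first, then the default row, and falls
# back to the word unchanged.
respell() {
  lookup_override "$1:$2" || lookup_override "default:$2" || echo "$2"
}

respell kokoro Pripyat
respell tortoise Pripyat
```

The same precedence maps onto an `engine` column: match the row for the
active engine first, then fall back to a row with no engine set.
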