engine/tortoise: GPU exclusivity wrapper + kludges notes

Adds the Tortoise-specific tooling that main intentionally omits:

- engines/tortoise/exclusive-gpu.sh wraps any command, stops F5 + Kokoro on the GPU, restarts Tortoise to clear stale CUDA contexts, waits for healthz, runs the command, restarts the engines on EXIT trap. Solves the 8GB OOM that took down the first smoke.
- engines/tortoise/hacks.md captures the speed reality (~74x real-time slowdown on the 2070 Super at standard preset) and the pronunciation-overrides cross-engine compatibility note.

Deploy from this branch when you want Tortoise's tuning. Main's vanilla Tortoise is for the cross-engine reference + future 'we have more VRAM now' cleanup.
2026-05-14 09:42:09 -07:00 · 7a96031aa6
commit 7a96031aa6
parent d1631ddffe
2 changed files with 118 additions and 0 deletions
--- a/engines/tortoise/exclusive-gpu.sh
+++ b/engines/tortoise/exclusive-gpu.sh
@@ -0,0 +1,47 @@
+#!/bin/bash
+# Tortoise GPU exclusivity wrapper. The 2070 Super (8GB) can't host
+# F5 (~4.5GB) + Kokoro (~2.7GB) + Tortoise (~5GB peak) simultaneously,
+# so we stop the other two engines for the duration of a Tortoise run
+# and restart them after.
+#
+# Usage:
+#   exclusive-gpu.sh <command...>
+#
+# Example:
+#   exclusive-gpu.sh docker exec skald skald narrate --chapter <uuid>
+#
+# Exits with the wrapped command's status. Restarts the engines
+# regardless of success/failure (trap on EXIT).
+set -euo pipefail
+
+STOP_ENGINES=(f5-tts kokoro)
+
+cleanup() {
+    local rc=$?
+    echo "[exclusive-gpu] restarting engines"
+    for engine in "${STOP_ENGINES[@]}"; do
+        docker start "$engine" >/dev/null 2>&1 || \
+            echo "[exclusive-gpu] failed to restart $engine — investigate"
+    done
+    return "$rc"
+}
+trap cleanup EXIT
+
+echo "[exclusive-gpu] stopping engines: ${STOP_ENGINES[*]}"
+for engine in "${STOP_ENGINES[@]}"; do
+    docker stop "$engine" >/dev/null 2>&1 || true
+done
+
+# Restart Tortoise to clean up any cached GPU allocations from the
+# now-stopped engines (their CUDA contexts can linger briefly).
+docker restart tortoise >/dev/null
+echo "[exclusive-gpu] waiting for tortoise healthz..."
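+healthy=0
+for _ in {1..30}; do
+    if curl -sf http://192.168.0.5:7795/healthz | grep -q '"loaded":true'; then healthy=1; break; fi
+    sleep 2
+done
+[ "$healthy" -eq 1 ] || { echo "[exclusive-gpu] tortoise not healthy after 30 checks" >&2; exit 1; }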
+
+echo "[exclusive-gpu] running: $*"
+"$@"
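The EXIT-trap behaviour that exclusive-gpu.sh relies on (engines restarted on success or failure, wrapped command's status passed through) can be checked in isolation, without Docker or a GPU. A minimal sketch: `run_exclusive` and `restart_engines` are hypothetical names, and the `echo` stands in for the real `docker start` loop.

```bash
#!/bin/bash
# Stand-alone sketch of the EXIT-trap pattern; no Docker needed.
# run_exclusive and restart_engines are hypothetical names, and the
# echo is a stand-in for the script's `docker start` loop.
run_exclusive() (
    restart_engines() { echo "[demo] restarting engines"; }
    trap restart_engines EXIT   # fires when this subshell exits, pass or fail
    "$@"                        # wrapped command; its status becomes ours
)

rc=0
run_exclusive sh -c 'exit 3' || rc=$?
echo "wrapped command exited with $rc"
```

The function body runs in a subshell so the trap is scoped to one invocation; the real script gets the same effect by being its own process.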
--- a/engines/tortoise/hacks.md
+++ b/engines/tortoise/hacks.md
@@ -0,0 +1,71 @@
+# Tortoise engine — kludges branched off main
+
+This branch carries the engine-specific tweaks that don't generalise
+to F5 / Kokoro. Tortoise is the audiobook-quality engine, but its
+speed and GPU-memory trade-offs are real and need explicit handling.
+
+## 1. GPU exclusivity
+
+**File:** `exclusive-gpu.sh`.
+
+The 2070 Super has 8GB. F5 (~4.5GB) + Kokoro (~2.7GB) + Tortoise
+(~5GB peak) sums to ~12GB — over budget. First Tortoise smoke
+caught it: `torch.OutOfMemoryError: ... 9.31 MiB is free`.
+
+Solution: stop the other two engines for the duration of a Tortoise
+run. The script wraps any command, stops `f5-tts` + `kokoro`,
+restarts `tortoise` to clean its CUDA context, waits for healthz,
+runs the wrapped command, then restarts the engines on EXIT trap
+(success or failure).
+
+```bash
+./exclusive-gpu.sh docker exec skald skald narrate --chapter <uuid>
+```
+
+Remove when: GPU upgrade (P40 24GB / 3090 24GB / etc) lets all three
+engines co-reside.
+
+## 2. Speed — slow, batch-only
+
+Tortoise at `standard` preset is **~74x slower than real-time** on
+the 2070 Super (smoke: 6.5s of audio took 478s wall clock). A 33-min
+Chapter 2 render would take ~8 hours. Tortoise is acceptable for
+overnight batched runs but NOT interactive rendering.
+
+Quality presets and their approx wall-clock for a 3000-word chapter:
+- `ultra_fast` — ~1h, noticeable quality drop
+- `fast` — ~2h
+- `standard` — ~6-8h, the recommended bar
+- `high_quality` — ~24h, marginally better than standard
+
+For most use, `standard` is right. Reserve `high_quality` for
+short prologues or named samples.
+
+## 3. Voice mapping format
+
+Tortoise's voice roster (`lj`, `freeman`, `daniel`, etc.) lives
+behind `source='tortoise_tts'` in the `voices` table. The character
+slug → Tortoise voice mapping is independent of the Kokoro mapping:
+a story can have BOTH a Kokoro and a Tortoise mapping live in
+parallel, picked at render time via `story.preferred_voice_id` or
+the `--voice` flag.
+
+Tortoise voices can warble or stutter at chunk boundaries because
+the `tortoise.api.TextToSpeech.tts_with_preset` call is per-chunk
+and re-conditions the voice each time. Acceptable for v0.1; future
+work could feed `conditioning_latents` directly for tighter cohesion.
+
+## 4. No respelling overrides for Tortoise (yet)
+
+The `pronunciation_overrides` rows in the DB are seeded with
+lowercase-syllable respellings tuned for Kokoro's misaki tokenizer.
+Tortoise uses a different phonemizer (`g2p_en`) which handles many
+of those proper nouns better natively — but some still mangle.
+
+For now, narrate's substitution applies the same overrides regardless
+of engine, which means Tortoise sees `prip-yat` for "Pripyat": the
+same input, interpreted differently by a different phonemizer. Usually
+OK, but audit after each batch.
+
+Future: per-engine override sets, OR an `engine` column on
+`pronunciation_overrides`.
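
The speed figures in section 2 of hacks.md can be sanity-checked with a linear extrapolation from the smoke measurement (6.5s of audio in 478s wall clock). This is a minimal sketch, assuming render time scales linearly with audio length, which a single short smoke run does not establish. `estimate_hours` is a hypothetical helper, not part of the repo.

```bash
#!/bin/bash
# Linear extrapolation from the smoke run quoted in hacks.md section 2.
# estimate_hours is a hypothetical helper; linear scaling is an assumption.
# Usage: estimate_hours <audio_minutes> [smoke_audio_s] [smoke_wall_s]
estimate_hours() {
    local audio_minutes=$1
    local smoke_audio_s=${2:-6.5}   # seconds of audio produced in the smoke
    local smoke_wall_s=${3:-478}    # wall-clock seconds the smoke took
    awk -v m="$audio_minutes" -v a="$smoke_audio_s" -v w="$smoke_wall_s" \
        'BEGIN { printf "%.1f\n", m * 60 * (w / a) / 3600 }'
}

estimate_hours 33   # prints 40.4
```

At the smoke ratio, a 33-min chapter comes out near 40 hours rather than ~8, so both the ratio and the chapter estimate are worth re-measuring on a full chapter before scheduling batches.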