Catches up engines/tortoise/server.py with what's been deployed on
Lucy through tonight's smoke iterations:
0.2 — _chunk_for_tortoise splits text nodes at sentence boundaries
(max 220 chars) before each tts_with_preset call. Fixes the
end-of-prompt gibberish past tortoise's ~20s reliable horizon.
0.3 — _get_voice now .to(DEVICE) cached samples + latents. Without
this, non-lj voices crash with 'Expected all tensors to be on
the same device, but found cpu and cuda:0'.
0.4 — [voice:NAME pitch=N rate=R][/voice] tag syntax. librosa
pitch_shift + time_stretch applied per-chunk for single-voice
multi-character renders. The strategy survived the design
table — but the librosa phase-vocoder artifacts at ±5 semitones
ate the quality on the 2070 Super. Parked here for the GPU
rebuild; modulation works architecturally, just needs better
stretching algorithm (rubberband) + more headroom.
Production stayed Kokoro. Coast-Down preferred_voice_id reverted
to kokoro_af_heart in the live DB after this experiment.
The python FastAPI sidecars have lived ad-hoc at /mnt/cache/appdata/
<engine>/build/ on Lucy without version control. Bringing them into
the skald repo so the engine code travels with the cross-engine
routing it depends on.
This commit lands the VANILLA version of each engine on main:
engines/f5-tts/ SWivid F5-TTS (CC-BY-NC weights flagged)
engines/kokoro/ hexgrad Kokoro-82M (Apache 2.0 top to bottom)
engines/tortoise/ neonbjb Tortoise-TTS (Apache 2.0 top to bottom)
Engine-specific kludges (question doubling, GPU coordination,
pause-duration tuning) get layered on engine/* branches per the
README. Main stays the safe-to-read baseline.