Catches up engines/tortoise/server.py with what's been deployed on
Lucy through tonight's smoke iterations:
0.2 — _chunk_for_tortoise splits text nodes at sentence boundaries
(max 220 chars) before each tts_with_preset call. Fixes the
end-of-prompt gibberish past tortoise's ~20s reliable horizon.
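
For 0.2, a minimal sketch of sentence-boundary chunking under the 220-char cap. This is not the code in server.py: chunk_text and _split_words are hypothetical names, and the greedy packing of several short sentences into one chunk, plus the word-boundary fallback for over-long sentences, are assumptions about how such a splitter could behave.

```python
import re

MAX_CHARS = 220  # cap quoted in the 0.2 note


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Fallback (assumed): break one over-long sentence on word boundaries."""
    parts, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        # A single word longer than max_chars is kept whole rather than cut.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then go to its own tts_with_preset call, keeping every render inside the reliable horizon.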
0.3 — _get_voice now .to(DEVICE) cached samples + latents. Without
this, non-lj voices crash with 'Expected all tensors to be on
the same device, but found cpu and cuda:0'.
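
For 0.3, a rough sketch of the device move. It assumes the cached voice data is a tensor, or a plain list/tuple of tensors (possibly nested, possibly containing None); move_to_device is a hypothetical helper, not the name used in server.py. It is duck-typed on .to() so it runs without torch installed.

```python
from typing import Any


def move_to_device(obj: Any, device: Any) -> Any:
    """Recursively move anything with a .to() method onto `device`."""
    if obj is None:
        return None
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type around the moved items
        # (plain list/tuple only; namedtuples are not handled here).
        return type(obj)(move_to_device(item, device) for item in obj)
    if hasattr(obj, "to"):
        # torch.Tensor.to(device) returns a tensor on the target device.
        return obj.to(device)
    return obj
```

With torch, the cached pair could be unpacked as samples, latents = move_to_device(cached, DEVICE), where cached stands in for whatever _get_voice actually stores.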
0.4 — [voice:NAME pitch=N rate=R][/voice] tag syntax. librosa
pitch_shift + time_stretch applied per-chunk for single-voice
multi-character renders. The strategy survived the design
table, but the librosa phase-vocoder artifacts at ±5 semitones
ate the quality on the 2070 Super. Parked here for the GPU
rebuild; modulation works architecturally, it just needs a better
stretching algorithm (rubberband) and more headroom.
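
For 0.4, a minimal sketch of parsing the tag syntax into per-voice segments. Segment and parse_voice_tags are hypothetical names. The sketch also assumes that the spoken text sits between the opening and closing tag, that pitch defaults to 0 semitones and rate to 1.0 when omitted, and that untagged text keeps the default voice (voice=None).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    text: str
    voice: Optional[str] = None  # None: untagged text, default voice
    pitch: float = 0.0           # semitones (assumed default: no shift)
    rate: float = 1.0            # stretch factor (assumed default: unchanged)


TAG_RE = re.compile(
    r"\[voice:(?P<name>[\w-]+)(?P<attrs>[^\]]*)\](?P<body>.*?)\[/voice\]",
    re.DOTALL,
)
ATTR_RE = re.compile(r"(\w+)=([-+]?\d+(?:\.\d+)?)")


def parse_voice_tags(text: str) -> list[Segment]:
    """Split text into plain and [voice:...]...[/voice] segments, in order."""
    segments, pos = [], 0
    for match in TAG_RE.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append(Segment(plain))
        attrs = dict(ATTR_RE.findall(match.group("attrs")))
        segments.append(
            Segment(
                text=match.group("body").strip(),
                voice=match.group("name"),
                pitch=float(attrs.get("pitch", 0.0)),
                rate=float(attrs.get("rate", 1.0)),
            )
        )
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment(tail))
    return segments
```

After rendering a segment, its audio could be passed through librosa.effects.pitch_shift(y, sr=sr, n_steps=seg.pitch) and librosa.effects.time_stretch(y, rate=seg.rate); whether server.py applies them in that order is not stated here.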
Production stayed on Kokoro. Coast-Down preferred_voice_id reverted
to kokoro_af_heart in the live DB after this experiment.
| File |
|---|
| compose.yml |
| Dockerfile |
| exclusive-gpu.sh |
| hacks.md |
| server.py |