skald/engines/tortoise
Kayos 9df378f799 engine/tortoise: sentence chunking + device fix + pitch/rate modulation
Catches up engines/tortoise/server.py with what's been deployed on
Lucy through tonight's smoke iterations:

0.2 — _chunk_for_tortoise splits text nodes at sentence boundaries
      (max 220 chars) before each tts_with_preset call. Fixes the
      end-of-prompt gibberish past tortoise's ~20s reliable horizon.
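
A minimal sketch of the 0.2 chunking idea, not the code in server.py:
the function names and the fallback for overlong sentences are
assumptions; only the 220-character limit and the sentence-boundary
split come from the note above.

```python
import re

MAX_CHARS = 220  # limit quoted in the commit message


def _split_words(sentence, max_chars):
    """Fallback (assumed): break one overlong sentence on whitespace."""
    parts, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_sentences(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then go through its own tts_with_preset call.
A single word longer than max_chars is left oversized in this sketch.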

0.3 — _get_voice now calls .to(DEVICE) on cached samples + latents.
      Without this, non-lj voices crash with 'Expected all tensors
      to be on the same device, but found cpu and cuda:0'.
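
The 0.3 fix can be sketched roughly as below. This is an illustration
under assumptions: move_to_device is a hypothetical helper, and it
assumes the cached samples and latents are (possibly nested) lists or
tuples of torch tensors, or None. It duck-types on .to() so it does not
import torch itself.

```python
def move_to_device(obj, device):
    """Recursively move tensor-like objects in nested lists/tuples."""
    if obj is None:
        return None
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type with every item moved.
        return type(obj)(move_to_device(item, device) for item in obj)
    if hasattr(obj, "to"):
        # torch.Tensor.to(device) returns a tensor on the target device.
        return obj.to(device)
    return obj


# Hypothetical use inside a voice cache lookup:
#   samples, latents = cache[name]
#   samples = move_to_device(samples, DEVICE)
#   latents = move_to_device(latents, DEVICE)
```

Moving the tensors at lookup time keeps the cache itself device-agnostic,
at the cost of a copy per call when the cache lives on the CPU.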

0.4 — [voice:NAME pitch=N rate=R][/voice] tag syntax. librosa
      pitch_shift + time_stretch applied per-chunk for single-voice
      multi-character renders. The strategy survived the design
      table, but librosa's phase-vocoder artifacts at ±5 semitones
      ate the quality on the 2070 Super. Parked here for the GPU
      rebuild; modulation works architecturally, it just needs a
      better stretching algorithm (rubberband) + more headroom.
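
A rough sketch of how the 0.4 tag syntax and per-chunk modulation might
fit together. Everything here is illustrative rather than the code in
server.py: Segment, parse_voice_tags and modulate are hypothetical names,
and it assumes pitch is in semitones, rate is a speed multiplier, and
untagged text uses the default voice. The librosa calls are
librosa.effects.pitch_shift and librosa.effects.time_stretch.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TAG = re.compile(
    r"\[voice:(?P<name>[\w-]+)(?P<attrs>[^\]]*)\](?P<body>.*?)\[/voice\]",
    re.DOTALL,
)
ATTR = re.compile(r"(pitch|rate)=([+-]?\d+(?:\.\d+)?)")


@dataclass
class Segment:
    text: str
    voice: Optional[str] = None  # None means the default voice (assumed)
    pitch: float = 0.0           # semitones (assumed unit)
    rate: float = 1.0            # speed multiplier (assumed unit)


def parse_voice_tags(text: str) -> List[Segment]:
    """Split text into plain and [voice:...]...[/voice] segments."""
    segments, pos = [], 0
    for m in TAG.finditer(text):
        plain = text[pos:m.start()].strip()
        if plain:
            segments.append(Segment(plain))
        attrs = {k: float(v) for k, v in ATTR.findall(m.group("attrs"))}
        body = m.group("body").strip()
        if body:
            segments.append(Segment(body, m.group("name"),
                                    attrs.get("pitch", 0.0),
                                    attrs.get("rate", 1.0)))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment(tail))
    return segments


def modulate(audio, sr, seg: Segment):
    """Apply the segment's pitch/rate to one rendered chunk."""
    if seg.pitch == 0.0 and seg.rate == 1.0:
        return audio  # skip the phase vocoder when nothing changes
    import librosa  # imported lazily; only needed for modulated chunks
    if seg.pitch != 0.0:
        audio = librosa.effects.pitch_shift(audio, sr=sr, n_steps=seg.pitch)
    if seg.rate != 1.0:
        audio = librosa.effects.time_stretch(audio, rate=seg.rate)
    return audio
```

Nested tags are not handled in this sketch. Swapping the two librosa
calls for a rubberband-based stretcher would only touch modulate.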

Production stayed Kokoro. Coast-Down preferred_voice_id reverted
to kokoro_af_heart in the live DB after this experiment.
2026-05-14 19:08:43 -07:00
compose.yml engines: import f5-tts + kokoro + tortoise sidecars into the tree 2026-05-14 09:40:01 -07:00
Dockerfile engines: import f5-tts + kokoro + tortoise sidecars into the tree 2026-05-14 09:40:01 -07:00
exclusive-gpu.sh engine/tortoise: GPU exclusivity wrapper + kludges notes 2026-05-14 09:42:09 -07:00
hacks.md engine/tortoise: GPU exclusivity wrapper + kludges notes 2026-05-14 09:42:09 -07:00
server.py engine/tortoise: sentence chunking + device fix + pitch/rate modulation 2026-05-14 19:08:43 -07:00