skald

Sulkta-OSS/skald


Commit graph

Author	SHA1	Message	Date

Kayos

9df378f799

engine/tortoise: sentence chunking + device fix + pitch/rate modulation

Catches up engines/tortoise/server.py with what's been deployed on
Lucy through tonight's smoke iterations:

0.2 — _chunk_for_tortoise splits text nodes at sentence boundaries
      (max 220 chars) before each tts_with_preset call. Fixes the
      end-of-prompt gibberish past tortoise's ~20s reliable horizon.

0.3 — _get_voice now moves cached samples + latents to DEVICE
      (.to(DEVICE)). Without this, non-lj voices crash with
      'Expected all tensors to be on the same device, but found
      cpu and cuda:0'.

0.4 — [voice:NAME pitch=N rate=R][/voice] tag syntax. librosa
      pitch_shift + time_stretch applied per-chunk for single-voice
      multi-character renders. The strategy survived the design
      table — but the librosa phase-vocoder artifacts at ±5 semitones
      ate the quality on the 2070 Super. Parked here for the GPU
      rebuild; modulation works architecturally, it just needs a
      better stretching algorithm (rubberband) + more headroom.
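
A minimal sketch of the two text-side steps described in 0.2 and 0.4, for illustration only. It is not the code in engines/tortoise/server.py: the names parse_segments, chunk_sentences and Segment are hypothetical, the regex for the [voice:NAME pitch=N rate=R]...[/voice] tag is an assumed reading of the syntax quoted above, and the defaults (pitch 0 semitones, rate 1.0) are assumptions. Only the 220-character limit comes from the commit message. The audio side (tts_with_preset, librosa) is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

MAX_CHARS = 220  # chunk limit quoted in the commit message

# Assumed grammar for the tag; the real parser may differ.
VOICE_TAG = re.compile(
    r"\[voice:(?P<name>[\w-]+)(?P<attrs>[^\]]*)\](?P<body>.*?)\[/voice\]",
    re.DOTALL,
)
ATTR = re.compile(r"(pitch|rate)=([-+]?\d*\.?\d+)")


@dataclass
class Segment:
    text: str
    voice: Optional[str] = None  # None means "use the default voice"
    pitch: float = 0.0           # semitones (assumed unit)
    rate: float = 1.0            # speed multiplier (assumed meaning)


def parse_segments(text: str) -> List[Segment]:
    """Split text into untagged runs and [voice:...] tagged runs, in order."""
    segments: List[Segment] = []
    pos = 0
    for m in VOICE_TAG.finditer(text):
        plain = text[pos:m.start()].strip()
        if plain:
            segments.append(Segment(plain))
        attrs = {k: float(v) for k, v in ATTR.findall(m.group("attrs"))}
        body = m.group("body").strip()
        if body:
            segments.append(
                Segment(body, m.group("name"),
                        attrs.get("pitch", 0.0), attrs.get("rate", 1.0))
            )
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(Segment(tail))
    return segments


def chunk_sentences(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # A single sentence longer than the limit is hard-split,
        # on whitespace where possible.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, a render loop would call parse_segments once per prompt, then chunk_sentences on each segment's text, and synthesize each chunk with that segment's voice before applying any pitch or rate change to the resulting audio.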

Production stayed Kokoro. Coast-Down preferred_voice_id reverted
to kokoro_af_heart in the live DB after this experiment.

2026-05-14 19:08:43 -07:00

Kayos

d1631ddffe

engines: import f5-tts + kokoro + tortoise sidecars into the tree

The Python FastAPI sidecars have lived ad hoc at /mnt/cache/appdata/
<engine>/build/ on Lucy without version control. Bringing them into
the skald repo so the engine code travels with the cross-engine
routing it depends on.

This commit lands the VANILLA version of each engine on main:

  engines/f5-tts/    SWivid F5-TTS (CC-BY-NC weights flagged)
  engines/kokoro/    hexgrad Kokoro-82M (Apache 2.0 top to bottom)
  engines/tortoise/  neonbjb Tortoise-TTS (Apache 2.0 top to bottom)

Engine-specific kludges (question doubling, GPU coordination,
pause-duration tuning) get layered on engine/* branches per the
README. Main stays the safe-to-read baseline.

2026-05-14 09:40:01 -07:00

2 commits