skald — engine/kokoro variant
This branch is the Kokoro-82M TTS backend variant of
skald. It carries
engine-specific tuning for Kokoro that doesn't generalise to the
other backends; everything else tracks main.
For the full project — the story-writer, the schema, the narration
pipeline — see main and the root README.md.
What's different here
Kokoro-82M is the fast, audiobook-quality narrator. At 82M
parameters it's tiny and quick (~50x real-time on a modest GPU) but
has a couple of rough edges this branch works around in
engines/kokoro/server.py:
- Question prosody: a single ? reads flat, so interrogatives get a stronger rising contour applied at synth time.
- Pacing gaps: paragraph / scene / breath gap durations tuned for long-form prose narration.
- Pronunciation respellings: Kokoro's phonemizer treats consecutive capitals as initialisms, so proper-noun respellings are seeded lowercase-syllabified.
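As a rough illustration of the respelling idea, here is a minimal sketch of a whole-word substitution pass that could run on text before it reaches the phonemizer. The table entries, the `RESPELLINGS` name, and the `apply_respellings` helper are all hypothetical; the actual logic in engines/kokoro/server.py may look quite different.

```python
import re

# Hypothetical respelling table: keys are proper nouns as written in the
# story, values are lowercase, hyphen-syllabified respellings. The entries
# are illustrative only and are not taken from the skald repository.
RESPELLINGS = {
    "Aoife": "ee-fah",
    "Siobhan": "shi-vawn",
}


def apply_respellings(text: str, table: dict[str, str] = RESPELLINGS) -> str:
    """Replace whole-word matches of each table key with its respelling."""
    if not table:
        return text
    # Longest keys first, so a longer name wins over a shorter one that
    # happens to be its prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

Keeping the respellings lowercase means the substituted text contains no runs of capitals for the phonemizer to read as an initialism.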
Usage
The Kokoro sidecar speaks the same POST /synthesize + GET /healthz contract as the other engines (see engines/README.md).
Point skald's KOKORO_URL at it and route kokoro_* voices to it.
docker compose up -d # skald + postgres
# bring up the Kokoro sidecar from engines/kokoro/
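For a quick smoke test of the sidecar, a client along these lines may help. It is only a sketch: the two paths come from the contract named above, but the JSON field names (`text`, `voice`), the helper names, and the example URL and voice id in the usage note are assumptions. engines/README.md is the authority on the real request shape.

```python
import json
import urllib.request


def build_synthesize_request(base_url: str, text: str, voice: str) -> urllib.request.Request:
    """Build a POST /synthesize request.

    The JSON body shape ({"text": ..., "voice": ...}) is an assumption,
    not something confirmed by the skald documentation.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/synthesize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses; treat them as "down".
        return False
```

With the sidecar running, something like `urllib.request.urlopen(build_synthesize_request("http://localhost:8880", "Is anyone there?", "kokoro_narrator"))` would send one request. Both the port and the voice id here are placeholders.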
License
AGPL-3.0-or-later — see LICENSE.