skald — engine/kokoro variant

This branch is the Kokoro-82M TTS backend variant of skald. It carries engine-specific tuning for Kokoro that doesn't generalise to the other backends; everything else tracks main.

For the full project — the story-writer, the schema, the narration pipeline — see main and the root README.md.

What's different here

Kokoro-82M is the fast, audiobook-quality narrator. At 82M parameters it's tiny and quick (~50x real-time on a modest GPU), but it has a few rough edges that this branch works around in engines/kokoro/server.py:

  • Question prosody — single ? reads flat, so interrogatives get a stronger rising contour applied at synth time.
  • Pacing gaps — paragraph / scene / breath gap durations tuned for long-form prose narration.
  • Pronunciation respellings — Kokoro's phonemizer treats consecutive capitals as initialisms, so proper-noun respellings are seeded lowercase-syllabified.
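To make the respelling workaround concrete, here is a minimal sketch of how a lowercase-syllabified table could be applied to text before synthesis. It is not the code in engines/kokoro/server.py: the names RESPELLINGS and apply_respellings, and the example entries, are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical respelling table (illustrative entries, not taken from the repo).
# Values are lowercase and hyphen-syllabified so the phonemizer never sees a
# run of capitals that it could read out as an initialism.
RESPELLINGS = {
    "Aoife": "ee-fuh",
    "Yggdrasil": "ig-druh-sil",
}


def apply_respellings(text: str, table: dict[str, str]) -> str:
    """Replace whole-word proper nouns with their lowercase respellings."""
    if not table:
        return text
    for name, respelling in table.items():
        # Reject any respelling containing capitals; it would defeat the purpose.
        if respelling != respelling.lower():
            raise ValueError(f"respelling for {name!r} must be lowercase: {respelling!r}")
    # Longest names first so a short name never shadows a longer one.
    names = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

Calling apply_respellings("Aoife climbed Yggdrasil.", RESPELLINGS) returns "ee-fuh climbed ig-druh-sil.", which is the string that would then be handed to the synthesizer.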

Usage

The Kokoro sidecar speaks the same POST /synthesize + GET /healthz contract as the other engines (see engines/README.md). Point skald's KOKORO_URL at it and route kokoro_* voices to it.

docker compose up -d            # skald + postgres
# bring up the Kokoro sidecar from engines/kokoro/
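
Once the sidecar is up, a quick way to check it from Python is sketched below. Only the POST /synthesize and GET /healthz paths come from this README; the JSON field names ("text", "voice"), the base URL, and the helper names are assumptions, so check engines/README.md for the actual request schema.

```python
import json
import urllib.request


def healthz_request(base_url: str) -> urllib.request.Request:
    """Build the GET /healthz request for a sidecar at base_url."""
    return urllib.request.Request(base_url.rstrip("/") + "/healthz", method="GET")


def synthesize_request(base_url: str, text: str, voice: str) -> urllib.request.Request:
    """Build the POST /synthesize request.

    The body shape is an assumption for illustration, not the documented schema.
    """
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /healthz answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(healthz_request(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

With the sidecar running, is_healthy(os.environ["KOKORO_URL"]) should return True, and passing synthesize_request(...) to urllib.request.urlopen would send the synthesis call; the voice argument would be one of the kokoro_* voice names configured in skald.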

License

AGPL-3.0-or-later — see LICENSE.