engine/kokoro: question doubling + kludges notes
Re-applies the Kokoro-specific hacks that main intentionally omits: - _emphasize_questions doubles '?' to '??' so the 82M's flat interrogative prosody gets a rising-pitch cue - engines/kokoro/hacks.md documents this and the other Kokoro- tuned bits (gap durations, lowercase-only respellings) with the 'remove when we move to a bigger model' marker Deploy from this branch to /mnt/cache/appdata/kokoro/build/ when you want the tuned version. Main's vanilla Kokoro is for reference / future cleanup.
This commit is contained in:
parent
d1631ddffe
commit
36922706a2
2 changed files with 60 additions and 1 deletions
48
engines/kokoro/hacks.md
Normal file
48
engines/kokoro/hacks.md
Normal file
|
|
@ -0,0 +1,48 @@
|
|||
# Kokoro engine — kludges branched off main
|
||||
|
||||
This branch carries the engine-specific tweaks that don't generalise
|
||||
to F5 / Tortoise. Each one is a real workaround for a real Kokoro-82M
|
||||
limitation, not a stylistic choice — when we move to a bigger model
|
||||
these should disappear.
|
||||
|
||||
## 1. Doubled `??` for question prosody
|
||||
|
||||
**File:** `server.py` — `_emphasize_questions` + `_QUESTION_RE`.
|
||||
|
||||
Kokoro-82M's prosody on single `?` is flat — interrogatives read like
|
||||
declaratives. The 82M parameter cap shows up here. Doubling the mark
|
||||
to `??` triggers a noticeably stronger rising-pitch contour.
|
||||
|
||||
Tried + works: `??`. Tried + worse: `?!` (sounds shouty), trailing
|
||||
spaces (no effect).
|
||||
|
||||
Remove when: upgrading to a bigger model OR a Kokoro version with
|
||||
better prosody control.
|
||||
|
||||
## 2. Paragraph / scene / breath gap durations
|
||||
|
||||
**File:** `server.py` — `PARAGRAPH_GAP_S=0.7`, `SCENE_GAP_S=1.5`,
|
||||
`BREATH_GAP_S=0.4`.
|
||||
|
||||
These were eyeballed against af_heart's natural pacing for long-form
|
||||
prose. Other voices (e.g. am_michael's slower delivery) may want
|
||||
shorter gaps; a per-voice override map would be more correct but
|
||||
isn't worth the complexity yet.
|
||||
|
||||
The 2026-05-14 feedback was "some pauses are a tad too long" — the
|
||||
0.7/1.5/0.4 may want to drop to 0.5/1.2/0.3 if confirmed.
|
||||
|
||||
## 3. Pronunciation respellings as ALL-LOWERCASE
|
||||
|
||||
**File:** *(data, not code — pronunciation_overrides DB table)*
|
||||
|
||||
Kokoro's misaki phonemizer treats consecutive uppercase letters as
|
||||
initialisms ("PRIP-yat" → "P-R-I-P yat"). The seeded respellings in
|
||||
`pronunciation_overrides WHERE phoneme_format='respelling'` must
|
||||
therefore use lowercase syllabification: `prip-yat`, `dyat-loff`,
|
||||
`bryu-hah-noff`. Stress marking is lost.
|
||||
|
||||
For tortoise this constraint may not hold (different phonemizer);
|
||||
the respelling format is currently kokoro-tuned. Future: per-engine
|
||||
phoneme_format buckets, or have skald narrate pass the engine name
|
||||
when selecting overrides.
|
||||
|
|
@ -113,12 +113,23 @@ def _parse_tag(match: re.Match) -> float:
|
|||
return dur / 1000.0 if unit == "ms" else dur
|
||||
|
||||
|
||||
# [HACK — engine/kokoro] Kokoro-82M has weak question prosody on a
|
||||
# single `?`. Doubling the question mark to `??` reliably triggers a
|
||||
# more interrogative rising-pitch contour without changing semantics.
|
||||
# Skip if already doubled or part of an interrobang. See hacks.md.
|
||||
_QUESTION_RE = re.compile(r"(?<![?!])\?(?!\?)")
|
||||
|
||||
|
||||
def _emphasize_questions(text: str) -> str:
|
||||
return _QUESTION_RE.sub("??", text)
|
||||
|
||||
|
||||
def _expand_inline(text: str, voice: str | None) -> list[Node]:
|
||||
"""Expand inline [breath]/[pause]/[scene] tags inside a chunk
|
||||
of text that already has a single voice attribution. Voice
|
||||
blocks themselves are handled one level up in split_to_nodes."""
|
||||
out: list[Node] = []
|
||||
text = text.strip()
|
||||
text = _emphasize_questions(text.strip())
|
||||
if not text:
|
||||
return out
|
||||
cursor = 0
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue