engine/kokoro: question doubling + kludges notes

Re-applies the Kokoro-specific hacks that main intentionally
omits:

- _emphasize_questions doubles '?' to '??' so Kokoro-82M's flat
  interrogative prosody gets a rising-pitch cue
- engines/kokoro/hacks.md documents this and the other Kokoro-
  tuned bits (gap durations, lowercase-only respellings) with the
  'remove when we move to a bigger model' marker

Deploy from this branch to /srv/appdata/kokoro/build/ when
you want the tuned version. Main keeps vanilla Kokoro as a
reference and as the baseline for future cleanup.
Sulkta 2026-05-14 09:40:59 -07:00
parent 01ec9ffd0e
commit b5de9776a2


@@ -113,12 +113,23 @@ def _parse_tag(match: re.Match) -> float:
     return dur / 1000.0 if unit == "ms" else dur
 
 
+# [HACK — engine/kokoro] Kokoro-82M has weak question prosody on a
+# single `?`. Doubling the question mark to `??` reliably triggers a
+# more interrogative rising-pitch contour without changing semantics.
+# Skip if already doubled or part of an interrobang. See hacks.md.
+_QUESTION_RE = re.compile(r"(?<![?!])\?(?!\?)")
+
+
+def _emphasize_questions(text: str) -> str:
+    return _QUESTION_RE.sub("??", text)
+
+
 def _expand_inline(text: str, voice: str | None) -> list[Node]:
     """Expand inline [breath]/[pause]/[scene] tags inside a chunk
     of text that already has a single voice attribution. Voice
     blocks themselves are handled one level up in split_to_nodes."""
     out: list[Node] = []
-    text = text.strip()
+    text = _emphasize_questions(text.strip())
     if not text:
         return out
     cursor = 0