multi-voice: per-character dialogue rendering

Schema: characters.voice_id + characters.slug (migration 0007).
voice_id is a FK to voices(id); slug is the stable lowercase token
that the narrate_prep pass uses inside [voice:slug]...[/voice].
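The commit does not say how slugs are generated. One way to get a stable lowercase token is to derive it once from the character's name when the row is created and then store it, so a later rename does not break existing [voice:slug] tags. A minimal sketch; `slugify` is a hypothetical helper, not skald's code:

```rust
/// Hypothetical slug derivation: keep ASCII letters and digits (lowercased)
/// and collapse every other run of characters into a single '-'.
/// Non-ASCII letters are dropped, so names like "Zoë" need a manual slug.
fn slugify(name: &str) -> String {
    let mut slug = String::with_capacity(name.len());
    let mut pending_dash = false;
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() {
            // Emit a separator only between two kept runs, never at the start.
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.push(ch.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    slug
}

fn main() {
    for name in ["Dr. Ana Ruiz", "  Old Tom  ", "Marie-Claire"] {
        println!("{name:?} -> {}", slugify(name));
    }
}
```

Trailing separators are never emitted because a dash is only written when another kept character follows it.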

Forge::narrate_prep now takes &[CharacterSpeaker]. The system prompt
is expanded to instruct the author to wrap dialogue lines in voice
tags based on a roster supplied in the user prompt (slug + name +
short hint from key_facts). Unattributed dialogue stays unwrapped
and inherits the narrator voice.
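Each roster entry carries a short hint trimmed from key_facts. A sketch of that trimming, assuming key_facts is plain text and that "trimmed" means the first sentence capped at a character budget; `CharacterRow`, `short_hint`, `roster`, and the 120-character cap are hypothetical. `CharacterSpeaker` mirrors the struct added in the diff.

```rust
/// Mirrors the `CharacterSpeaker` struct added in this commit.
#[derive(Debug, Clone)]
pub struct CharacterSpeaker {
    pub slug: String,
    pub name: String,
    pub hint: Option<String>,
}

/// Hypothetical subset of a `characters` row.
struct CharacterRow {
    slug: String,
    name: String,
    key_facts: Option<String>,
}

/// Keep the first sentence of `key_facts`, capped at `max_chars` characters.
/// Naive: an abbreviation such as "Dr." also ends the "sentence".
fn short_hint(key_facts: &str, max_chars: usize) -> Option<String> {
    let text = key_facts.trim();
    // Byte offset just past the first sentence terminator, or the whole text.
    let end = text
        .char_indices()
        .find(|&(_, c)| matches!(c, '.' | '!' | '?'))
        .map(|(i, c)| i + c.len_utf8())
        .unwrap_or(text.len());
    // Cap by chars, not bytes, so multi-byte characters are never split.
    let hint: String = text[..end].chars().take(max_chars).collect();
    let hint = hint.trim();
    if hint.is_empty() {
        None
    } else {
        Some(hint.to_string())
    }
}

/// Build the roster that would be passed to `narrate_prep`.
fn roster(rows: &[CharacterRow]) -> Vec<CharacterSpeaker> {
    rows.iter()
        .map(|r| CharacterSpeaker {
            slug: r.slug.clone(),
            name: r.name.clone(),
            hint: r.key_facts.as_deref().and_then(|k| short_hint(k, 120)),
        })
        .collect()
}

fn main() {
    let rows = vec![
        CharacterRow {
            slug: "mara".into(),
            name: "Mara Quill".into(),
            key_facts: Some("Ship's medic, dry humour. Hates the cold.".into()),
        },
        CharacterRow { slug: "tom".into(), name: "Old Tom".into(), key_facts: None },
    ];
    for s in roster(&rows) {
        println!("{s:?}");
    }
}
```

Rows without key_facts simply get no hint, which matches how the prompt builder skips empty hints.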

skald narrate substitutes [voice:<character-slug>] →
[voice:<kokoro-voice-name>] right before sending to Kokoro, using
characters.voice_id JOIN voices.reference_path as the map. Slugs
whose character has no voice, or that match no character row, fall
back to the narrator voice defensively (logged at warn level).
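A sketch of the substitution step, assuming the characters/voices join has already been resolved into an in-memory map from slug to Kokoro voice name. `substitute_voice_tags` is a hypothetical name, `eprintln!` stands in for whatever warn-level logger skald uses, and the voice names are placeholders.

```rust
use std::collections::HashMap;

const OPEN: &str = "[voice:";

/// Rewrite each `[voice:<slug>]` opener to `[voice:<kokoro-voice>]`.
/// Slugs missing from `voices` fall back to `narrator` with a warning.
/// `[/voice]` closers and all other text pass through unchanged.
fn substitute_voice_tags(text: &str, voices: &HashMap<String, String>, narrator: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after = &rest[start + OPEN.len()..];
        let Some(end) = after.find(']') else {
            // Opener never closed with ']': copy the remainder verbatim.
            out.push_str(&rest[start..]);
            return out;
        };
        let slug = &after[..end];
        let voice = match voices.get(slug) {
            Some(v) => v.as_str(),
            None => {
                // Stand-in for a warn-level log line.
                eprintln!("warn: no voice for slug {slug:?}, using narrator");
                narrator
            }
        };
        out.push_str(OPEN);
        out.push_str(voice);
        out.push(']');
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}

fn main() {
    let voices = HashMap::from([("mara".to_string(), "af_heart".to_string())]);
    let prose = r#"[voice:mara]"Hold on."[/voice] He ran. [voice:ghost]"Boo."[/voice]"#;
    println!("{}", substitute_voice_tags(prose, &voices, "am_adam"));
}
```

Only the opener is rewritten, so the closing `[/voice]` tags stay exactly as the narrate_prep pass emitted them.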

kokoro_server.py v0.4: the splitter recognises [voice:X]...[/voice]
blocks at the paragraph level. Each text node carries an optional
voice attribution, and the renderer feeds it to Kokoro per segment.
Outside voice blocks, the request's default voice is used. voices_used
is reported back so callers can verify multi-voice actually ran.
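kokoro_server.py is Python, but the splitter logic is sketched here in Rust to match the rest of the commit. It splits on blank lines, walks each paragraph for [voice:X]...[/voice] blocks, and tags every segment with an optional voice; `voices_used` lists the voices that would actually be rendered, counting unwrapped text under the default. All names are hypothetical, and the handling of stray markers (an orphan [/voice] is dropped; an opener with no closer is dropped and its text read in the default voice) is an assumption about what "silently absorbed" means, not a transcription of the server.

```rust
const OPEN: &str = "[voice:";
const CLOSE: &str = "[/voice]";

/// One renderable run of text. `voice: None` means "use the request default".
#[derive(Debug, PartialEq)]
struct Segment {
    voice: Option<String>,
    text: String,
}

/// Append trimmed text, merging with the previous segment if the voice matches.
fn push(segs: &mut Vec<Segment>, voice: Option<&str>, text: &str) {
    let text = text.trim();
    if text.is_empty() {
        return;
    }
    if let Some(last) = segs.last_mut() {
        if last.voice.as_deref() == voice {
            last.text.push(' ');
            last.text.push_str(text);
            return;
        }
    }
    segs.push(Segment { voice: voice.map(str::to_string), text: text.to_string() });
}

fn split_paragraph(par: &str) -> Vec<Segment> {
    let mut segs = Vec::new();
    let mut rest = par;
    loop {
        match (rest.find(OPEN), rest.find(CLOSE)) {
            // Orphan closer with no opener before it: keep the text, drop the marker.
            (open, Some(c)) if open.map_or(true, |o| c < o) => {
                push(&mut segs, None, &rest[..c]);
                rest = &rest[c + CLOSE.len()..];
            }
            (Some(o), _) => {
                push(&mut segs, None, &rest[..o]);
                let after = &rest[o + OPEN.len()..];
                let Some(name_end) = after.find(']') else {
                    // Malformed opener: absorb "[voice:" and read the rest as plain text.
                    push(&mut segs, None, after);
                    break;
                };
                let name = after[..name_end].trim();
                let voice = (!name.is_empty()).then_some(name);
                let body = &after[name_end + 1..];
                match body.find(CLOSE) {
                    Some(c) => {
                        push(&mut segs, voice, &body[..c]);
                        rest = &body[c + CLOSE.len()..];
                    }
                    None => {
                        // Unclosed block: absorb the opener, use the default voice.
                        push(&mut segs, None, body);
                        break;
                    }
                }
            }
            (None, _) => {
                push(&mut segs, None, rest);
                break;
            }
        }
    }
    segs
}

/// Split on blank lines, then split each paragraph into voiced segments.
fn split_text(text: &str) -> Vec<Vec<Segment>> {
    text.split("\n\n").map(split_paragraph).filter(|p| !p.is_empty()).collect()
}

/// Distinct voices that will actually be rendered, in first-use order.
fn voices_used(paragraphs: &[Vec<Segment>], default_voice: &str) -> Vec<String> {
    let mut used: Vec<String> = Vec::new();
    for seg in paragraphs.iter().flatten() {
        let v = seg.voice.as_deref().unwrap_or(default_voice);
        if !used.iter().any(|u| u.as_str() == v) {
            used.push(v.to_string());
        }
    }
    used
}

fn main() {
    let text = "She turned. [voice:af_heart]\"Run.\"[/voice] Nobody moved.\n\n\
                Stray closer[/voice] here. [voice:am_adam]\"Unclosed";
    let paragraphs = split_text(text);
    for (i, par) in paragraphs.iter().enumerate() {
        for seg in par {
            println!("p{i} {:?}: {}", seg.voice, seg.text);
        }
    }
    println!("voices_used = {:?}", voices_used(&paragraphs, "bm_george"));
}
```

Because the unclosed `am_adam` block is absorbed, that voice does not appear in `voices_used`, which is the kind of discrepancy the field lets a caller detect.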

Only kokoro-routed renders pre-process voice tags; F5 paths leave
the tags in place (F5 multi-voice not implemented). Defensive
fallback: orphan/unclosed [/voice] markers are silently absorbed
rather than failing the render.
Kayos 2026-05-14 08:35:33 -07:00
parent 330bc8bde2
commit c9bd38034c
6 changed files with 186 additions and 10 deletions


@@ -192,14 +192,20 @@ impl Forge {
     /// Orson Black places beats differently than another author
     /// would. Replace-mode if author is set; Append otherwise.
     ///
+    /// `characters` is the story's character roster. When provided,
+    /// the system prompt instructs the model to wrap dialogue in
+    /// `[voice:<slug>]"..."[/voice]` for multi-voice rendering. The
+    /// slug is mapped to a Kokoro voice id by skald's narrate path.
+    ///
     /// Hard rule the system prompt enforces: do not change a word
     /// of prose. Tags are additive only.
     pub async fn narrate_prep(
         &self,
         prose: &str,
         author: Option<&AuthorWithRevision>,
+        characters: &[CharacterSpeaker],
     ) -> anyhow::Result<PassOutput> {
-        let user_prompt = narrate_prep_user_prompt(prose);
+        let user_prompt = narrate_prep_user_prompt(prose, characters);
         let (system, mode) = match author {
             Some(a) => {
                 let scaffold = a
@@ -339,9 +345,9 @@ const HOUSE_CLEANUP_SYSTEM: &str = "You are a copy editor polishing a draft chap
 const SYSTEM_AUDIT: &str = "You are a canon auditor for long-form fiction. You compare a parent story and a new chapter against the bible. You flag continuity drift, character voice shift, retconned facts, dropped threads, timeline contradictions. You return STRUCTURED JSON ONLY — no commentary, no preamble. The exact shape: { \"findings\": [ { \"severity\": \"info\"|\"warn\"|\"crit\", \"area\": \"character\"|\"continuity\"|\"tone\"|\"fact\"|\"timeline\"|\"other\", \"body\": \"...\" } ] }. If no findings, return { \"findings\": [] }.";
-const NARRATE_PREP_DIRECTIVE: &str = "This is a NARRATION-ANNOTATION pass. You receive your own prose and prepare it for an audiobook reading. Two kinds of inserts are allowed:\n\n1. BEAT MARKERS (additive, not prose): `[breath]` (~400ms), `[pause:1.2s]` (explicit silence in seconds, e.g. 0.5s, 1.2s, 2s), `[scene]` (~1500ms scene break). Place where the prose's rhythm asks for them — after a hard one-line beat, before a turn in dialogue, on a paragraph that lands with weight.\n\n2. NARRATOR STUMBLES (humanizing prose-level inserts): a real narrator occasionally stumbles on a hard word, catches themselves, repeats. You may add these *sparingly* where the prose's pacing makes them feel right. Patterns: em-dash repetition (`Prip— Pripyat`), self-correction (`she — no, the wife — had been told`), hesitation (`the dose, the dose was`). USE SPARINGLY. Maybe 1-3 per chapter. Pick proper nouns, technical terms, or moments where the narrator might genuinely catch herself. Avoid stumbling on emotional climaxes — those should land clean.\n\nApart from stumbles, do NOT change a word of the original prose. Return the prose with beat markers and stumbles inline. No preamble. No commentary about your choices.";
+const NARRATE_PREP_DIRECTIVE: &str = "This is a NARRATION-ANNOTATION pass. You receive your own prose and prepare it for an audiobook reading. Three kinds of inserts are allowed:\n\n1. BEAT MARKERS (additive, not prose): `[breath]` (~400ms), `[pause:1.2s]` (explicit silence in seconds, e.g. 0.5s, 1.2s, 2s), `[scene]` (~1500ms scene break). Place where the prose's rhythm asks for them — after a hard one-line beat, before a turn in dialogue, on a paragraph that lands with weight.\n\n2. SPEAKER VOICE TAGS (multi-voice dialogue): wrap dialogue lines in `[voice:<slug>]\"...\"[/voice]` based on who is speaking. The roster of available speaker slugs is given in the user prompt. The dialogue itself stays verbatim — only the wrapper is added. If a line of dialogue is not clearly attributable to a roster speaker, leave it unwrapped (the narrator voice will read it). Quoted thoughts (italicized interior monologue) stay unwrapped — only spoken aloud dialogue gets a voice tag.\n\n3. NARRATOR STUMBLES (humanizing prose-level inserts): a real narrator occasionally stumbles on a hard word, catches themselves, repeats. You may add these *sparingly* where the prose's pacing makes them feel right. Patterns: em-dash repetition (`Prip— Pripyat`), self-correction (`she — no, the wife — had been told`), hesitation (`the dose, the dose was`). USE SPARINGLY. Maybe 1-3 per chapter. Pick proper nouns, technical terms, or moments where the narrator might genuinely catch herself. Avoid stumbling on emotional climaxes — those should land clean.\n\nApart from stumbles, do NOT change a word of the original prose. Return the prose with beat markers, voice tags, and stumbles inline. No preamble. No commentary about your choices.";
-const HOUSE_NARRATE_PREP_SYSTEM: &str = "You are a senior audiobook director annotating prose for narration. You insert (a) beat markers — `[breath]`, `[pause:Xs]`, `[scene]` — where a skilled narrator would breathe or pause, and (b) occasional humanizing narrator stumbles using em-dash repetition or self-correction (sparingly — maybe 1-3 per chapter, on proper nouns or hard words). Apart from those stumbles you do NOT change a word of the prose. Return the prose verbatim plus beat markers and (rare) stumbles inline. No preamble, no commentary.";
+const HOUSE_NARRATE_PREP_SYSTEM: &str = "You are a senior audiobook director annotating prose for narration. You insert (a) beat markers — `[breath]`, `[pause:Xs]`, `[scene]` — where a skilled narrator would breathe or pause, (b) speaker voice tags `[voice:<slug>]\"...\"[/voice]` wrapping dialogue based on who is speaking (roster supplied in user prompt; leave unattributed dialogue unwrapped), and (c) occasional humanizing narrator stumbles using em-dash repetition or self-correction (sparingly — maybe 1-3 per chapter, on proper nouns or hard words). Apart from those stumbles you do NOT change a word of the prose. Return the prose verbatim plus beat markers, voice tags, and (rare) stumbles inline. No preamble, no commentary.";
 // ─── User-prompt builders ───────────────────────────────────────
@@ -376,14 +382,53 @@ fn gen_user_prompt(
     out
 }
-fn narrate_prep_user_prompt(prose: &str) -> String {
+/// One row of the story's character roster, passed to narrate_prep
+/// so the LLM knows what speaker slugs to use in `[voice:slug]`
+/// tags. Built from skald's characters table.
+#[derive(Debug, Clone)]
+pub struct CharacterSpeaker {
+    pub slug: String,
+    pub name: String,
+    /// Short note (1 sentence) giving the LLM enough to disambiguate
+    /// who's speaking when prose says "she said". Pulled from
+    /// characters.key_facts but trimmed.
+    pub hint: Option<String>,
+}
+fn narrate_prep_user_prompt(prose: &str, characters: &[CharacterSpeaker]) -> String {
     let mut out = String::with_capacity(prose.len() + 512);
+    if !characters.is_empty() {
+        out.push_str("# Speaker roster\n\n");
+        out.push_str(
+            "Use these slugs in `[voice:<slug>]\"...\"[/voice]` wrappers on dialogue. \
+             Leave dialogue without a clear roster speaker unwrapped (the narrator \
+             voice will read it).\n\n",
+        );
+        for c in characters {
+            out.push_str("- `");
+            out.push_str(&c.slug);
+            out.push_str("` — ");
+            out.push_str(&c.name);
+            if let Some(h) = &c.hint {
+                if !h.trim().is_empty() {
+                    out.push_str(" (");
+                    out.push_str(h.trim());
+                    out.push(')');
+                }
+            }
+            out.push('\n');
+        }
+        out.push('\n');
+    }
     out.push_str("# Prose to annotate\n\n");
     out.push_str(prose);
     out.push_str(
-        "\n\n# Task\n\nReturn the prose above with `[breath]`, `[pause:Xs]`, and \
-         `[scene]` markers inserted at natural narration beats. Do not change \
-         any word. Do not skip any sentence. Return only the annotated prose.\n",
+        "\n\n# Task\n\nReturn the prose above with `[breath]`, `[pause:Xs]`, \
+         `[scene]` markers and `[voice:<slug>]\"...\"[/voice]` dialogue wrappers \
+         inserted appropriately. Do not change any word. Do not skip any \
+         sentence. Return only the annotated prose.\n",
     );
     out
 }
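The narrate_prep doc comment states a hard rule: the pass must not change a word of prose, and tags are additive only. One cheap way to check a model's output against that rule is to strip the known markers and compare word sequences. A sketch with hypothetical helpers (`is_marker`, `strip_markers`, `prose_unchanged`); because the directive also permits narrator stumbles, a strict check like this will flag stumbles as changes too, and a marker glued between two words with no spaces would merge them.

```rust
/// True if `inner` (the text between '[' and ']') is one of the pass's markers.
fn is_marker(inner: &str) -> bool {
    matches!(inner, "breath" | "scene" | "/voice")
        || inner.starts_with("pause:")
        || inner.starts_with("voice:")
}

/// Remove beat markers and voice tags, leaving every other character alone.
fn strip_markers(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find('[') {
        out.push_str(&rest[..start]);
        let tail = &rest[start..];
        match tail.find(']') {
            Some(end) if is_marker(&tail[1..end]) => rest = &tail[end + 1..],
            // Not a marker (e.g. a literal bracket in the prose): keep the '['.
            _ => {
                out.push('[');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Strict check: after stripping markers, the word sequence must be identical.
fn prose_unchanged(original: &str, annotated: &str) -> bool {
    let stripped = strip_markers(annotated);
    original.split_whitespace().eq(stripped.split_whitespace())
}

fn main() {
    let original = r#"He ran. "Stop," she said."#;
    let good = r#"He ran. [breath] [voice:mara]"Stop,"[/voice] she said. [pause:1.2s]"#;
    let bad = r#"He sprinted. [voice:mara]"Stop,"[/voice] she said."#;
    println!("good: {}", prose_unchanged(original, good)); // prints "good: true"
    println!("bad: {}", prose_unchanged(original, bad)); // prints "bad: false"
}
```

Comparing whitespace-split words rather than raw strings tolerates the extra spaces that marker insertion tends to leave behind.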