multi-voice: per-character dialogue rendering
Schema: characters.voice_id + characters.slug (migration 0007). voice_id is an FK to voices(id); slug is the stable lowercase token the narrate_prep pass uses inside [voice:slug]...[/voice].

Forge::narrate_prep now takes &[CharacterSpeaker]. The system prompt is expanded to instruct the author to wrap dialogue lines in voice tags, based on a roster supplied in the user prompt (slug + name + short hint from key_facts). Unattributed dialogue stays unwrapped and inherits the narrator voice.

skald narrate substitutes [voice:<character-slug>] → [voice:<kokoro-voice-name>] right before sending to Kokoro, using characters.voice_id JOIN voices.reference_path as the map. Slugs with no voice or no character row defensively fall back to the narrator voice (logged as warn).

kokoro_server.py v0.4: the splitter recognises [voice:X]...[/voice] blocks at the paragraph level. Each text node carries an optional voice attribution, and the renderer feeds it to Kokoro per segment. Outside voice blocks, the request's default voice is used. voices_used is reported back so callers can verify that multi-voice actually ran.

Only kokoro-routed renders pre-process voice tags; F5 paths leave the tags in place (F5 multi-voice is not implemented).

Defensive fallback: orphan [/voice] markers and unclosed [voice:X] blocks are silently absorbed rather than failing the render.
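The kokoro_server.py v0.4 splitter is Python and its code is not part of this page. As a rough illustration of the segmentation rules described above, here is a minimal Rust sketch. It assumes paragraphs are separated by blank lines, that an unclosed `[voice:X]` block ends implicitly where its paragraph ends, and that orphan `[/voice]` markers are simply dropped. `Segment` and `split_voice_segments` are hypothetical names, not the server's API.

```rust
/// One run of text to synthesize with a single voice.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub voice: String,
    pub text: String,
}

const OPEN: &str = "[voice:";
const CLOSE: &str = "[/voice]";

/// Push a trimmed, non-empty run and record its voice in first-use order.
fn push_segment(
    segments: &mut Vec<Segment>,
    voices_used: &mut Vec<String>,
    voice: &str,
    text: &str,
) {
    let text = text.trim();
    if text.is_empty() {
        return;
    }
    if !voices_used.iter().any(|v| v == voice) {
        voices_used.push(voice.to_string());
    }
    segments.push(Segment { voice: voice.to_string(), text: text.to_string() });
}

/// Split annotated prose into per-voice segments, paragraph by paragraph.
/// Returns the segments plus the distinct voices used (a `voices_used` analogue).
pub fn split_voice_segments(input: &str, default_voice: &str) -> (Vec<Segment>, Vec<String>) {
    let mut segments = Vec::new();
    let mut voices_used = Vec::new();

    for para in input.split("\n\n") {
        // Voice of the currently open block; reset per paragraph, so an
        // unclosed [voice:X] ends implicitly where its paragraph ends.
        let mut current: Option<String> = None;
        let mut rest = para;
        loop {
            // Locate whichever marker comes first.
            let (idx, is_open) = match (rest.find(OPEN), rest.find(CLOSE)) {
                (None, None) => break,
                (Some(o), Some(c)) if c < o => (c, false),
                (Some(o), _) => (o, true),
                (None, Some(c)) => (c, false),
            };
            let voice = current.as_deref().unwrap_or(default_voice);
            push_segment(&mut segments, &mut voices_used, voice, &rest[..idx]);
            if is_open {
                let after = &rest[idx + OPEN.len()..];
                let Some(end) = after.find(']') else {
                    // Malformed opener: keep the remainder as plain text.
                    push_segment(&mut segments, &mut voices_used, voice, &rest[idx..]);
                    rest = "";
                    break;
                };
                // A new opener inside an open block simply switches voice.
                let name = after[..end].trim();
                current = if name.is_empty() { None } else { Some(name.to_string()) };
                rest = &after[end + 1..];
            } else {
                // Closing marker; an orphan [/voice] is absorbed without error.
                current = None;
                rest = &rest[idx + CLOSE.len()..];
            }
        }
        let voice = current.as_deref().unwrap_or(default_voice);
        push_segment(&mut segments, &mut voices_used, voice, rest);
    }
    (segments, voices_used)
}
```

In this sketch the returned voice list includes the default voice whenever untagged text is rendered; whether the real server reports it the same way is not stated in the commit.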
parent 330bc8bde2
commit c9bd38034c
6 changed files with 186 additions and 10 deletions
@@ -192,14 +192,20 @@ impl Forge {
     /// Orson Black places beats differently than another author
     /// would. Replace-mode if author is set; Append otherwise.
+    ///
+    /// `characters` is the story's character roster. When provided,
+    /// the system prompt instructs the model to wrap dialogue in
+    /// `[voice:<slug>]"..."[/voice]` for multi-voice rendering. The
+    /// slug is mapped to a Kokoro voice id by skald's narrate path.
     ///
     /// Hard rule the system prompt enforces: do not change a word
     /// of prose. Tags are additive only.
     pub async fn narrate_prep(
         &self,
         prose: &str,
         author: Option<&AuthorWithRevision>,
+        characters: &[CharacterSpeaker],
     ) -> anyhow::Result<PassOutput> {
-        let user_prompt = narrate_prep_user_prompt(prose);
+        let user_prompt = narrate_prep_user_prompt(prose, characters);
         let (system, mode) = match author {
             Some(a) => {
                 let scaffold = a
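The slug-to-voice substitution that skald's narrate path performs before calling Kokoro can be sketched as a single string pass. This is an illustration under assumptions: the `HashMap` stands in for the `characters.voice_id JOIN voices.reference_path` lookup (a `None` value means the character row exists but has no voice; a missing key means there is no character row), warnings are returned instead of logged, and `substitute_voice_slugs` is a hypothetical name.

```rust
use std::collections::HashMap;

const OPEN: &str = "[voice:";

/// Rewrite every `[voice:<slug>]` opener to `[voice:<kokoro-voice>]`.
/// slug -> Some(voice): character has a voice; slug -> None: character row
/// exists without one; slug absent: no character row. Both gaps fall back
/// to the narrator voice. Returns the rewritten text plus warnings to log.
pub fn substitute_voice_slugs(
    text: &str,
    slug_to_voice: &HashMap<String, Option<String>>,
    narrator_voice: &str,
) -> (String, Vec<String>) {
    let mut out = String::with_capacity(text.len());
    let mut warnings = Vec::new();
    let mut rest = text;
    while let Some(idx) = rest.find(OPEN) {
        out.push_str(&rest[..idx]);
        let after = &rest[idx + OPEN.len()..];
        let Some(end) = after.find(']') else {
            // Malformed opener: copy the remainder verbatim.
            out.push_str(&rest[idx..]);
            return (out, warnings);
        };
        let slug = after[..end].trim();
        let voice = match slug_to_voice.get(slug) {
            Some(Some(v)) => v.as_str(),
            Some(None) => {
                warnings.push(format!("character `{slug}` has no voice; using narrator"));
                narrator_voice
            }
            None => {
                warnings.push(format!("no character row for `{slug}`; using narrator"));
                narrator_voice
            }
        };
        out.push_str(OPEN);
        out.push_str(voice);
        out.push(']');
        rest = &after[end + 1..];
    }
    // `[/voice]` closers never match "[voice:", so they pass through untouched.
    out.push_str(rest);
    (out, warnings)
}
```

Rewriting a fallback tag to the narrator voice, rather than stripping it, keeps the block structure intact for the splitter; the commit does not say which of the two skald actually does.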
@@ -339,9 +345,9 @@ const HOUSE_CLEANUP_SYSTEM: &str = "You are a copy editor polishing a draft chap
 
 const SYSTEM_AUDIT: &str = "You are a canon auditor for long-form fiction. You compare a parent story and a new chapter against the bible. You flag continuity drift, character voice shift, retconned facts, dropped threads, timeline contradictions. You return STRUCTURED JSON ONLY — no commentary, no preamble. The exact shape: { \"findings\": [ { \"severity\": \"info\"|\"warn\"|\"crit\", \"area\": \"character\"|\"continuity\"|\"tone\"|\"fact\"|\"timeline\"|\"other\", \"body\": \"...\" } ] }. If no findings, return { \"findings\": [] }.";
 
-const NARRATE_PREP_DIRECTIVE: &str = "This is a NARRATION-ANNOTATION pass. You receive your own prose and prepare it for an audiobook reading. Two kinds of inserts are allowed:\n\n1. BEAT MARKERS (additive, not prose): `[breath]` (~400ms), `[pause:1.2s]` (explicit silence in seconds, e.g. 0.5s, 1.2s, 2s), `[scene]` (~1500ms scene break). Place where the prose's rhythm asks for them — after a hard one-line beat, before a turn in dialogue, on a paragraph that lands with weight.\n\n2. NARRATOR STUMBLES (humanizing prose-level inserts): a real narrator occasionally stumbles on a hard word, catches themselves, repeats. You may add these *sparingly* where the prose's pacing makes them feel right. Patterns: em-dash repetition (`Prip— Pripyat`), self-correction (`she — no, the wife — had been told`), hesitation (`the dose, the dose was`). USE SPARINGLY. Maybe 1-3 per chapter. Pick proper nouns, technical terms, or moments where the narrator might genuinely catch herself. Avoid stumbling on emotional climaxes — those should land clean.\n\nApart from stumbles, do NOT change a word of the original prose. Return the prose with beat markers and stumbles inline. No preamble. No commentary about your choices.";
+const NARRATE_PREP_DIRECTIVE: &str = "This is a NARRATION-ANNOTATION pass. You receive your own prose and prepare it for an audiobook reading. Three kinds of inserts are allowed:\n\n1. BEAT MARKERS (additive, not prose): `[breath]` (~400ms), `[pause:1.2s]` (explicit silence in seconds, e.g. 0.5s, 1.2s, 2s), `[scene]` (~1500ms scene break). Place where the prose's rhythm asks for them — after a hard one-line beat, before a turn in dialogue, on a paragraph that lands with weight.\n\n2. SPEAKER VOICE TAGS (multi-voice dialogue): wrap dialogue lines in `[voice:<slug>]\"...\"[/voice]` based on who is speaking. The roster of available speaker slugs is given in the user prompt. The dialogue itself stays verbatim — only the wrapper is added. If a line of dialogue is not clearly attributable to a roster speaker, leave it unwrapped (the narrator voice will read it). Quoted thoughts (italicized interior monologue) stay unwrapped — only spoken aloud dialogue gets a voice tag.\n\n3. NARRATOR STUMBLES (humanizing prose-level inserts): a real narrator occasionally stumbles on a hard word, catches themselves, repeats. You may add these *sparingly* where the prose's pacing makes them feel right. Patterns: em-dash repetition (`Prip— Pripyat`), self-correction (`she — no, the wife — had been told`), hesitation (`the dose, the dose was`). USE SPARINGLY. Maybe 1-3 per chapter. Pick proper nouns, technical terms, or moments where the narrator might genuinely catch herself. Avoid stumbling on emotional climaxes — those should land clean.\n\nApart from stumbles, do NOT change a word of the original prose. Return the prose with beat markers, voice tags, and stumbles inline. No preamble. No commentary about your choices.";
 
-const HOUSE_NARRATE_PREP_SYSTEM: &str = "You are a senior audiobook director annotating prose for narration. You insert (a) beat markers — `[breath]`, `[pause:Xs]`, `[scene]` — where a skilled narrator would breathe or pause, and (b) occasional humanizing narrator stumbles using em-dash repetition or self-correction (sparingly — maybe 1-3 per chapter, on proper nouns or hard words). Apart from those stumbles you do NOT change a word of the prose. Return the prose verbatim plus beat markers and (rare) stumbles inline. No preamble, no commentary.";
+const HOUSE_NARRATE_PREP_SYSTEM: &str = "You are a senior audiobook director annotating prose for narration. You insert (a) beat markers — `[breath]`, `[pause:Xs]`, `[scene]` — where a skilled narrator would breathe or pause, (b) speaker voice tags `[voice:<slug>]\"...\"[/voice]` wrapping dialogue based on who is speaking (roster supplied in user prompt; leave unattributed dialogue unwrapped), and (c) occasional humanizing narrator stumbles using em-dash repetition or self-correction (sparingly — maybe 1-3 per chapter, on proper nouns or hard words). Apart from those stumbles you do NOT change a word of the prose. Return the prose verbatim plus beat markers, voice tags, and (rare) stumbles inline. No preamble, no commentary.";
 
 // ─── User-prompt builders ───────────────────────────────────────
 
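The directive only asks the model to use roster slugs, so a caller may want to check the model's output before rendering. A minimal sketch of such a post-check, assuming the tag syntax used in the prompts; `lint_voice_tags` is a hypothetical helper and is not part of this commit.

```rust
/// Hypothetical post-check for a narrate_prep result: reports voice tags
/// whose slug is not in the roster, orphan `[/voice]` closers, and blocks
/// that are never closed. An empty Vec means the tags look well-formed.
pub fn lint_voice_tags(annotated: &str, roster: &[&str]) -> Vec<String> {
    const OPEN: &str = "[voice:";
    const CLOSE: &str = "[/voice]";
    let mut problems = Vec::new();
    let mut open: Option<String> = None; // slug of the block currently open
    let mut rest = annotated;
    loop {
        // Locate whichever marker comes first.
        let (idx, is_open) = match (rest.find(OPEN), rest.find(CLOSE)) {
            (None, None) => break,
            (Some(o), Some(c)) if c < o => (c, false),
            (Some(o), _) => (o, true),
            (None, Some(c)) => (c, false),
        };
        if is_open {
            let after = &rest[idx + OPEN.len()..];
            let Some(end) = after.find(']') else {
                problems.push("malformed `[voice:` opener".to_string());
                break;
            };
            let slug = after[..end].trim().to_string();
            if !roster.contains(&slug.as_str()) {
                problems.push(format!("unknown slug `{slug}`"));
            }
            if let Some(prev) = open.replace(slug) {
                problems.push(format!("`{prev}` block not closed before next opener"));
            }
            rest = &after[end + 1..];
        } else {
            if open.take().is_none() {
                problems.push("orphan `[/voice]`".to_string());
            }
            rest = &rest[idx + CLOSE.len()..];
        }
    }
    if let Some(slug) = open {
        problems.push(format!("`{slug}` block never closed"));
    }
    problems
}
```

Output that fails the lint could be retried, or rendered as-is, since the narrate path already falls back to the narrator voice for slugs it cannot map.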
@@ -376,14 +382,53 @@ fn gen_user_prompt(
     out
 }
 
-fn narrate_prep_user_prompt(prose: &str) -> String {
+/// One row of the story's character roster, passed to narrate_prep
+/// so the LLM knows what speaker slugs to use in `[voice:slug]`
+/// tags. Built from skald's characters table.
+#[derive(Debug, Clone)]
+pub struct CharacterSpeaker {
+    pub slug: String,
+    pub name: String,
+    /// Short note (1 sentence) giving the LLM enough to disambiguate
+    /// who's speaking when prose says "she said". Pulled from
+    /// characters.key_facts but trimmed.
+    pub hint: Option<String>,
+}
+
+fn narrate_prep_user_prompt(prose: &str, characters: &[CharacterSpeaker]) -> String {
     let mut out = String::with_capacity(prose.len() + 512);
+
+    if !characters.is_empty() {
+        out.push_str("# Speaker roster\n\n");
+        out.push_str(
+            "Use these slugs in `[voice:<slug>]\"...\"[/voice]` wrappers on dialogue. \
+             Leave dialogue without a clear roster speaker unwrapped (the narrator \
+             voice will read it).\n\n",
+        );
+        for c in characters {
+            out.push_str("- `");
+            out.push_str(&c.slug);
+            out.push_str("` — ");
+            out.push_str(&c.name);
+            if let Some(h) = &c.hint {
+                if !h.trim().is_empty() {
+                    out.push_str(" (");
+                    out.push_str(h.trim());
+                    out.push(')');
+                }
+            }
+            out.push('\n');
+        }
+        out.push('\n');
+    }
+
     out.push_str("# Prose to annotate\n\n");
     out.push_str(prose);
     out.push_str(
-        "\n\n# Task\n\nReturn the prose above with `[breath]`, `[pause:Xs]`, and \
-         `[scene]` markers inserted at natural narration beats. Do not change \
-         any word. Do not skip any sentence. Return only the annotated prose.\n",
+        "\n\n# Task\n\nReturn the prose above with `[breath]`, `[pause:Xs]`, \
+         `[scene]` markers and `[voice:<slug>]\"...\"[/voice]` dialogue wrappers \
+         inserted appropriately. Do not change any word. Do not skip any \
+         sentence. Return only the annotated prose.\n",
     );
     out
 }
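The `hint` field is documented as one sentence pulled from `characters.key_facts` but trimmed; the trimming itself is not shown in this diff. A minimal sketch, assuming `key_facts` is free text, using a hypothetical `hint_from_key_facts` helper with a caller-chosen character cap:

```rust
/// Hypothetical: derive a one-sentence CharacterSpeaker hint from free-text
/// key_facts. Keeps the first sentence, caps it at `max_chars` characters,
/// and returns None when nothing usable is left.
pub fn hint_from_key_facts(key_facts: &str, max_chars: usize) -> Option<String> {
    let text = key_facts.trim();
    // First sentence: up to and including the first '.', '!' or '?' that is
    // followed by whitespace or ends the text (so "v1.2" is not a boundary).
    let mut end = text.len();
    let mut iter = text.char_indices().peekable();
    while let Some((i, ch)) = iter.next() {
        if matches!(ch, '.' | '!' | '?') {
            let at_boundary = iter.peek().map_or(true, |&(_, next)| next.is_whitespace());
            if at_boundary {
                end = i + ch.len_utf8();
                break;
            }
        }
    }
    let sentence = text[..end].trim();
    // Cap by chars rather than bytes so multi-byte text never splits mid-character.
    let capped: String = sentence.chars().take(max_chars).collect();
    let capped = capped.trim_end().to_string();
    if capped.is_empty() { None } else { Some(capped) }
}
```

The result would populate `CharacterSpeaker::hint`; `narrate_prep_user_prompt` already skips hints that are empty after trimming, so returning `None` here is a convenience rather than a requirement.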