sterilize: fan-out compound lines + filter identity rows in diff UI

Cobb spotted job 4's proposals weren't actually doing useful work —
"Toppings (Cinnamon Butter, Marshmallows, Ground Cinnamon, Butter, Etc)"
came back unchanged because the prompt was rigidly 1-in-1-out and
treated the whole compound line as a section header. Same for
"salt and ground black pepper to taste" — should be 2 separate
shopping list items but the parser kept them as one note.

Three changes:

1. STERILIZE_SYSTEM rewritten to allow fan-out. New return shape is
   list-of-lists: outer list mirrors input length, each inner list
   has 1 item (normal case) or N items (fan-out). Explicit fan-out
   rules cover the two patterns Cobb cares about:
     - "salt and pepper" / "X and Y to taste" → 2 items
     - "Toppings (a, b, c, etc)" / "Optional: A, B, C" → N items,
       wrapper word dropped, filler ("etc") skipped
   Plus a heuristic against accidentally splitting compound food
   names ("salt and vinegar chips", "macaroni and cheese" → keep).

2. _parse_batch + IngredientProposal + apply_recipe all updated for
   the new shape. IngredientProposal.parsed → parsed_items: list.
   apply_recipe iterates each child:
     - First child inherits the original Mealie row's id/referenceId/
       originalText so existing references stay live
     - Additional children are fresh dicts that copy the original's
       referenceId and originalText; Mealie generates ids on save
       when none provided
   Backward-compat fallback in apply_recipe accepts the legacy
   single-parsed shape so any in-flight job 4 proposals still apply
   cleanly.

3. /sterilize UI was→becomes table now:
   - Renders one row per parsed child (rowspan'd "was" cell when
     fanning out, with a "→³" superscript marker on the arrow)
   - Drops identity rows (1→1 case where parse matches original
     verbatim) so the diff shows only ACTUAL changes — fixes Cobb's
     "this doesn't look sterilized at all" complaint where every
     diff was identical
   - Cards with all-identity proposals show "no changes proposed
     (all ingredients already matched)" instead of an empty table
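
A minimal sketch of the list-of-lists contract from changes 1 and 2, assuming
the model reply has already been decoded from JSON. The names normalize_parses
and EMPTY_ITEM are hypothetical and only illustrate the normalization that
_parse_batch performs; the real method also builds IngredientParse objects.

```python
from typing import Any

# Hypothetical stand-in for the all-null junk-fallback item described in the prompt.
EMPTY_ITEM = {"quantity": None, "unit": None, "food": None, "note": None, "approx": False}


def normalize_parses(parses_raw: list[Any], expected: int) -> list[list[dict]]:
    """Coerce a decoded model reply into list-of-lists, one inner list per input."""
    if len(parses_raw) != expected:
        raise ValueError(f"parse count mismatch: got {len(parses_raw)}, expected {expected}")
    out: list[list[dict]] = []
    for p in parses_raw:
        # Accept a bare dict (legacy 1->1 shape) or a list of dicts (fan-out shape).
        items = p if isinstance(p, list) else [p]
        inner = [it for it in items if isinstance(it, dict)]
        # Never emit an empty inner list, so every input keeps its slot.
        out.append(inner or [dict(EMPTY_ITEM)])
    return out


reply = [
    # "salt and pepper" fans out into two children.
    [{"quantity": 1, "unit": "dash", "food": "salt", "note": "to taste", "approx": True},
     {"quantity": 1, "unit": "dash", "food": "black pepper", "note": "to taste", "approx": True}],
    # "1 small onion" returned in the legacy single-dict shape.
    {"quantity": 1, "unit": None, "food": "onion", "note": "small", "approx": False},
    # An empty inner list gets the fallback item.
    [],
]
normalized = normalize_parses(reply, expected=3)
print([len(inner) for inner in normalized])  # [2, 1, 1]
```

The outer length check comes first so that a reply which drops or adds an
input slot fails loudly instead of misaligning every later ingredient.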

Job 4's stored proposals use the legacy 1→1 shape so won't show fan-out
until re-walked. Recommend cancelling job 4 and starting a fresh job 5
with the new prompt to see the toppings line break out properly.
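
A rough sketch of how a stored legacy proposal and a new fan-out proposal both
flow through the same apply path. children_for and the flat food_name key are
hypothetical simplifications; the real apply_recipe resolves food and unit
names to Mealie ids, which is omitted here.

```python
def children_for(orig_ing: dict, prop: dict) -> list[dict]:
    """Expand one stored proposal into the ingredient rows that would be written."""
    items = prop.get("parsed_items")
    if not isinstance(items, list) or not items:
        # Legacy shape: a single dict under "parsed".
        legacy = prop.get("parsed")
        items = [legacy] if isinstance(legacy, dict) else []
    if not items:
        # Nothing usable: pass the original row through unchanged.
        return [dict(orig_ing)]

    rows = []
    for idx, parsed in enumerate(items):
        if idx == 0:
            # First child keeps id/referenceId/originalText of the original row.
            row = dict(orig_ing)
        else:
            # Later children are fresh rows with no id of their own.
            row = {
                "referenceId": orig_ing.get("referenceId"),
                "originalText": orig_ing.get("originalText") or orig_ing.get("display"),
            }
        row["quantity"] = parsed.get("quantity")
        row["note"] = parsed.get("note") or ""
        row["food_name"] = parsed.get("food")  # hypothetical flat field, see lead-in
        rows.append(row)
    return rows


orig = {"id": "abc", "referenceId": "ref-1", "originalText": "salt and pepper"}
legacy_prop = {"parsed": {"quantity": None, "food": None, "note": "salt and pepper"}}
fanout_prop = {"parsed_items": [
    {"quantity": 1, "food": "salt", "note": "to taste"},
    {"quantity": 1, "food": "black pepper", "note": "to taste"},
]}

legacy_rows = children_for(orig, legacy_prop)
fanout_rows = children_for(orig, fanout_prop)
print(len(legacy_rows), len(fanout_rows))  # 1 2
```

Under these assumptions a job 4 proposal still applies as a single row, which
is why it will not show fan-out until the recipe is re-walked.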
Kayos 2026-04-30 11:07:20 -07:00
parent a0ad363915
commit d359bed450
2 changed files with 203 additions and 84 deletions


@ -23,31 +23,52 @@ from .mealie import Mealie, MealieError
STERILIZE_SYSTEM = """You are a precise recipe ingredient parser. You ONLY output valid JSON.
You receive a list of free-form ingredient strings and must return a parallel
array where each item is parsed into structured form.
You receive a list of free-form ingredient strings and return a parallel
list of LISTS: one inner list per input. Most inputs map 1→1 (single item
inside the list). Compound lines that name multiple distinct foods MUST
fan out into multiple items so each food gets its own row on the shopping
list.
Output schema (per item):
Per-item schema:
{
"quantity": <number or null>, # numeric amount, fractions converted to decimals (1/2 -> 0.5)
"unit": <string or null>, # singular canonical form: "cup", "tbsp", "tsp", "oz", "lb", "g", "kg", "ml", "l", "clove", "slice", "can", "package", "piece", "pinch", "dash", "handful". null if no unit (e.g. "1 onion").
"food": <string or null>, # the core food noun in singular canonical form: "onion", "garlic", "rice", "olive oil". Strip prep state ("chopped", "diced") -- those go in note.
"quantity": <number or null>, # numeric amount; fractions → decimals (1/2 → 0.5)
"unit": <string or null>, # singular canonical: "cup", "tbsp", "tsp", "oz", "lb", "g", "kg", "ml", "l", "clove", "slice", "can", "package", "piece", "pinch", "dash", "handful". null if no unit (e.g. "1 onion").
"food": <string or null>, # core food noun, singular canonical lowercase: "onion", "garlic", "rice", "olive oil". Strip prep state ("chopped", "diced") into note.
"note": <string or null>, # prep state, brand, color, modifier: "chopped", "extra virgin", "yellow", "to taste"
"approx": <bool> # true if the input said "about" / "a pinch" / "to taste" / vague qty
"approx": <bool> # true if input said "about", "a pinch", "to taste", or otherwise vague
}
Rules:
- Convert fractions: "1/2" -> 0.5, "1 1/4" -> 1.25
- "a pinch", "a dash", "to taste" -> {quantity: null, approx: true, note: "to taste"}
- "1 small onion" -> {quantity: 1, unit: null, food: "onion", note: "small"}
- "2 cloves garlic, minced" -> {quantity: 2, unit: "clove", food: "garlic", note: "minced"}
- Section headers like "For the sauce:" -> all fields null EXCEPT note: "<header text>"
- If you genuinely cannot parse, set all fields null and put the original in note.
FAN-OUT RULES: return MULTIPLE items for one input when:
- "salt and pepper" / "salt and ground black pepper to taste" → split into 2 items, each
{quantity: 1, unit: "dash", food: "salt"|"black pepper", note: "to taste", approx: true}
- "Toppings (cinnamon butter, marshmallows, ground cinnamon, butter, etc)" or
"Optional: cilantro, lime, queso fresco" → one item per food in the comma list.
Drop the wrapper word ("Toppings", "Optional"); leave it OUT of food/note. Skip
filler words like "etc". Each item: quantity=null, unit=null, food=<name>, note=null, approx=true.
- "1 lemon, juice and zest" → 2 items: {qty:1, unit:null, food:"lemon juice"} and {qty:1, unit:null, food:"lemon zest"}
- DO NOT split "salt and vinegar chips" or "macaroni and cheese": those are
compound food names, not multi-food lines. Heuristic: if the words on either
side of "and" are a recognized standalone food, split; otherwise keep as one.
PARSE RULES (for the common 1→1 case):
- Convert fractions: "1/2" → 0.5, "1 1/4" → 1.25
- "a pinch" / "a dash" alone → {quantity: 1, unit: "pinch"|"dash", approx: true}
- "to taste" alone → {quantity: null, unit: null, food: <food>, note: "to taste", approx: true}
- "1 small onion" → {quantity: 1, unit: null, food: "onion", note: "small"}
- "2 cloves garlic, minced" → {quantity: 2, unit: "clove", food: "garlic", note: "minced"}
- "1.5 cups broccoli (coarsely chopped florets)" → {quantity: 1.5, unit: "cup", food: "broccoli", note: "coarsely chopped florets"}
- Section headers like "For the sauce:" → 1 item with all fields null EXCEPT
note: "<header text>" (so Mealie can preserve the header row)
- If you genuinely cannot parse (junk input), return 1 item with all fields null
and the original string in note.
- DO NOT add fields not in the schema.
- DO NOT wrap output in markdown fences.
- DO NOT include any prose before or after the JSON.
You will be given a JSON object: {"ingredients": ["str", "str", ...]}
You return: {"parses": [{...}, {...}, ...]} -- same length, same order.
Input shape: {"ingredients": ["str", "str", ...]}
Output shape: {"parses": [[{...}, {...}], [{...}], [{...}, {...}, {...}], ...]}
The outer list MUST have the same length as the input list. Each inner list
MUST contain at least 1 item (use the all-null junk-fallback if needed).
"""
@ -62,14 +83,16 @@ class IngredientParse:
@dataclass
class IngredientProposal:
"""One ingredient before vs after."""
"""One original ingredient → one or more parsed children. parsed_items
has length 1 in normal cases; >1 when a compound line was fanned out
("Toppings (a, b, c)" → 3 children, "salt and pepper" → 2 children)."""
index: int
original_display: str
original_quantity: float | None
original_unit_name: str | None
original_food_name: str | None
original_note: str | None
parsed: IngredientParse
parsed_items: list[IngredientParse]
class Sterilizer:
@ -81,17 +104,20 @@ class Sterilizer:
# --- public -------------------------------------------------------------
def preview_recipe(self, slug: str) -> dict:
"""Dry-run: parse all ingredients, return proposals without writing."""
"""Dry-run: parse all ingredients, return proposals without writing.
Each input ingredient produces one IngredientProposal whose
parsed_items list has length 1 (normal case) or N (fan-out)."""
recipe = self.mealie.get_recipe(slug)
ingredients = recipe.get("recipeIngredient") or []
if not ingredients:
return {"slug": slug, "name": recipe.get("name"), "proposals": []}
strings = [_render_ingredient_for_parse(ing) for ing in ingredients]
parses = self._parse_batch(strings)
parses_per_input = self._parse_batch(strings)
proposals: list[IngredientProposal] = []
for i, (ing, parse) in enumerate(zip(ingredients, parses)):
for i, (ing, items) in enumerate(zip(ingredients, parses_per_input)):
proposals.append(
IngredientProposal(
index=i,
@ -100,7 +126,7 @@ class Sterilizer:
original_unit_name=(ing.get("unit") or {}).get("name") if ing.get("unit") else None,
original_food_name=(ing.get("food") or {}).get("name") if ing.get("food") else None,
original_note=ing.get("note"),
parsed=parse,
parsed_items=items,
)
)
@ -138,40 +164,74 @@ class Sterilizer:
new_ingredients: list[dict] = []
for orig_ing, prop in zip(recipe.get("recipeIngredient") or [], proposals):
parsed = prop["parsed"]
new_ing = dict(orig_ing) # preserve id, refId, original_text
# Each proposal can produce 1+ parsed children (fan-out for
# compound inputs like "Toppings (a, b, c)" or "salt and pepper").
# Keep the proposal_json key flexible: prefer parsed_items but
# fall back to a single 'parsed' for backward-compat.
items = prop.get("parsed_items")
if not isinstance(items, list) or not items:
legacy = prop.get("parsed")
items = [legacy] if isinstance(legacy, dict) else []
if not items:
# Nothing to write — pass the original through unchanged
new_ingredients.append(dict(orig_ing))
continue
new_ing["quantity"] = parsed["quantity"]
for child_idx, parsed in enumerate(items):
if child_idx == 0:
# First child inherits id/refId/originalText from the
# original Mealie row, so existing references stay live
new_ing = dict(orig_ing)
else:
# Additional children are fresh rows. Mealie generates
# ids on save when none provided.
new_ing = {
# Inherit refId so all fan-out children belong to
# the same logical group as the original. Some
# Mealie versions tolerate dup refIds; others
# generate one if missing.
"referenceId": orig_ing.get("referenceId"),
"title": None,
"originalText": orig_ing.get("originalText") or orig_ing.get("display"),
"disableAmount": False,
}
food_name = (parsed.get("food") or "").strip()
if food_name:
food_id = self._resolve_food(
food_name, food_index,
create_missing=create_missing,
created_log=created_foods,
)
if food_id:
new_ing["food"] = {"id": food_id, "name": food_name}
new_ing["isFood"] = True
else:
# Section header style — clear food, mark not-food
new_ing["food"] = None
new_ing["isFood"] = False
new_ing["quantity"] = parsed.get("quantity")
unit_name = (parsed.get("unit") or "").strip()
if unit_name:
unit_id = self._resolve_unit(
unit_name, unit_index,
create_missing=create_missing,
created_log=created_units,
)
if unit_id:
new_ing["unit"] = {"id": unit_id, "name": unit_name}
else:
new_ing["unit"] = None
food_name = (parsed.get("food") or "").strip()
if food_name:
food_id = self._resolve_food(
food_name, food_index,
create_missing=create_missing,
created_log=created_foods,
)
if food_id:
new_ing["food"] = {"id": food_id, "name": food_name}
new_ing["isFood"] = True
else:
new_ing["food"] = None
new_ing["isFood"] = False
else:
# Section header style — clear food, mark not-food
new_ing["food"] = None
new_ing["isFood"] = False
new_ing["note"] = parsed.get("note") or ""
new_ingredients.append(new_ing)
unit_name = (parsed.get("unit") or "").strip()
if unit_name:
unit_id = self._resolve_unit(
unit_name, unit_index,
create_missing=create_missing,
created_log=created_units,
)
if unit_id:
new_ing["unit"] = {"id": unit_id, "name": unit_name}
else:
new_ing["unit"] = None
else:
new_ing["unit"] = None
new_ing["note"] = parsed.get("note") or ""
new_ingredients.append(new_ing)
recipe["recipeIngredient"] = new_ingredients
self.mealie.update_recipe(slug, recipe)
@ -283,7 +343,10 @@ class Sterilizer:
# --- private ------------------------------------------------------------
def _parse_batch(self, strings: list[str]) -> list[IngredientParse]:
def _parse_batch(self, strings: list[str]) -> list[list[IngredientParse]]:
"""Returns list-of-lists matching the input length. Each inner list
is the parses derived from one input string (1 in normal case, N
for fan-out, never 0)."""
prompt = json.dumps({"ingredients": strings}, ensure_ascii=False)
try:
resp = self.forge.run(
@ -305,17 +368,34 @@ class Sterilizer:
f"parse count mismatch: got {len(parses_raw)}, expected {len(strings)}"
)
out: list[IngredientParse] = []
out: list[list[IngredientParse]] = []
for p in parses_raw:
out.append(
IngredientParse(
quantity=_coerce_float(p.get("quantity")),
unit=_clean_str(p.get("unit")),
food=_clean_str(p.get("food")),
note=_clean_str(p.get("note")),
approx=bool(p.get("approx")),
# Defensive: accept either a single dict (legacy 1→1 shape) or
# a list of dicts (fan-out shape). Normalize to list-of-dicts.
items_raw = p if isinstance(p, list) else [p]
if not items_raw:
# Empty list — substitute a fallback all-null item so we
# never lose track of an input slot
items_raw = [{"quantity": None, "unit": None, "food": None,
"note": None, "approx": False}]
inner: list[IngredientParse] = []
for it in items_raw:
if not isinstance(it, dict):
continue
inner.append(
IngredientParse(
quantity=_coerce_float(it.get("quantity")),
unit=_clean_str(it.get("unit")),
food=_clean_str(it.get("food")),
note=_clean_str(it.get("note")),
approx=bool(it.get("approx")),
)
)
)
if not inner:
inner.append(IngredientParse(
quantity=None, unit=None, food=None, note=None, approx=False
))
out.append(inner)
return out
@staticmethod


@ -369,22 +369,43 @@
card.appendChild(head);
if (!p.preview_error && p.proposal_json && p.proposal_json.proposals) {
const tbl = document.createElement('table');
tbl.className = 'diff-table';
tbl.innerHTML = `
<thead><tr>
<th>was</th><th></th><th>becomes</th>
</tr></thead>
<tbody>
${p.proposal_json.proposals.map(rowProposal).join('')}
</tbody>`;
card.appendChild(tbl);
const rows = p.proposal_json.proposals.map(rowProposal).join('');
if (rows.trim()) {
const tbl = document.createElement('table');
tbl.className = 'diff-table';
tbl.innerHTML = `
<thead><tr><th>was</th><th></th><th>becomes</th></tr></thead>
<tbody>${rows}</tbody>`;
card.appendChild(tbl);
} else {
// All rows were identity matches — sonnet thinks this recipe is
// already clean. Show a marker; user can still skip/approve as
// a no-op apply (which will be cheap, just refreshes food.id
// resolution if any food row got renamed in Mealie).
const note = document.createElement('div');
note.className = 'proposal-meta';
note.style.marginTop = '6px';
note.textContent = 'no changes proposed (all ingredients already matched)';
card.appendChild(note);
}
}
return card;
}
function renderItem(pa) {
const parts = [];
if (pa.quantity !== null && pa.quantity !== undefined) parts.push(pa.quantity);
if (pa.unit) parts.push(pa.unit);
if (pa.food) parts.push(pa.food);
if (pa.note) parts.push(`(${pa.note})`);
return parts.length ? parts.join(' ') : '—';
}
function rowProposal(rp) {
const pa = rp.parsed || {};
// Render one or more rows for a single proposal. Fan-out shows
// multiple "becomes" rows under one "was" cell; identity rows
// (parse matches original verbatim) are dropped so the table
// only displays actual changes.
const wasParts = [];
if (rp.original_quantity !== null && rp.original_quantity !== undefined) wasParts.push(rp.original_quantity);
if (rp.original_unit_name) wasParts.push(rp.original_unit_name);
@ -392,18 +413,36 @@
if (rp.original_note) wasParts.push(`(${rp.original_note})`);
const wasStr = wasParts.length ? wasParts.join(' ') : (rp.original_display || '—');
const newParts = [];
if (pa.quantity !== null && pa.quantity !== undefined) newParts.push(pa.quantity);
if (pa.unit) newParts.push(pa.unit);
if (pa.food) newParts.push(pa.food);
if (pa.note) newParts.push(`(${pa.note})`);
const newStr = newParts.length ? newParts.join(' ') : '—';
const items = Array.isArray(rp.parsed_items) ? rp.parsed_items
: (rp.parsed ? [rp.parsed] : []);
const fanout = items.length > 1;
const renderedNew = items.map(renderItem);
return `<tr>
<td class="orig">${escapeHtml(wasStr)}</td>
<td></td>
<td class="new">${escapeHtml(newStr)}</td>
</tr>`;
// Decide which rows to keep. In the 1→1 case, drop if was==new.
// In the fan-out case, always show every child (even if one happens
// to match a piece of the original).
let rows;
if (!fanout) {
if (renderedNew.length === 0 || renderedNew[0] === wasStr) return '';
rows = renderedNew;
} else {
rows = renderedNew;
}
if (!rows.length) return '';
const arrow = fanout
? `→<sup style="color:var(--purple-bright); margin-left:2px;">${rows.length}</sup>`
: '→';
return rows.map((newStr, idx) => {
const wasCell = (idx === 0)
? `<td class="orig" rowspan="${rows.length}">${escapeHtml(wasStr)}</td>`
: '';
const arrowCell = (idx === 0)
? `<td rowspan="${rows.length}">${arrow}</td>`
: '';
return `<tr>${wasCell}${arrowCell}<td class="new">${escapeHtml(newStr)}</td></tr>`;
}).join('');
}
function escapeHtml(s) {