Three improvements driven by Cobb's review of the fan-out output:

1. Recipe context. _parse_batch now accepts an optional recipe_context
   dict carrying recipe_name, recipe_description, and recipe_steps.
   preview_recipe builds the context from the Mealie recipe and passes
   it through. The Sonnet prompt has new USE RECIPE CONTEXT WHEN
   AMBIGUOUS rules: "1 cup flour" is ambiguous (AP / bread / cake),
   and the cooking steps usually disambiguate ("knead until elastic" →
   bread flour, "sift with cocoa powder" + cake recipe → cake flour).
   When the steps don't disambiguate, Sonnet defaults to all-purpose
   flour. Step text is capped at 3000 chars so the user prompt stays
   modest. Brand/style hints in the description carry through too.

2. Spell + grammar cleanup. New SPELL/GRAMMAR CLEANUP rules in the
   prompt: silently fix typos in food and note ("tomatos" → "tomatoes",
   "chopped finly" → "chopped finely", "heavy cram" → "heavy cream")
   and normalize spacing. Critically, preserve EVERY semantic value:
   numeric quantities verbatim, every prep state, brand, color. When
   it is unclear whether something is a typo or intentional ("yellow
   squash" is a real food, not a typo), keep it. Original strings
   stay in originalText for audit / rollback.

3. Defensive food.id preservation in apply_recipe. Three new safeguards
   keep a Sonnet hallucination from dropping live recipe data:
   a) If Sonnet returns a single all-null parsed item but the original
      Mealie row had a real food.id, pass the original through
      verbatim. (Sonnet probably parse-failed; never blank a real link.)
   b) When Sonnet returns a food name for the first child that we can't
      resolve in Mealie's catalog AND the original had a food.id,
      preserve the original link rather than emit food=null.
   c) When Sonnet explicitly returns food=null on the first child of
      an ingredient that originally had a food.id, treat that as a
      misread and preserve the original. Real section headers (where
      the original was ALREADY foodless) still pass through cleanly.
   Net effect: no apply path can drop a recipe's existing food
   reference. Sonnet can ADD food links (good), CHANGE them (good),
   or fail to parse (we keep what was there). It cannot remove them.
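
For the food link on the first child, the safeguards reduce to one small decision. The following is a minimal sketch, not the code in apply_recipe: choose_food_link and its parameters are hypothetical names, and resolved_id stands in for whatever the catalog lookup returned (None when the name could not be resolved). Safeguard (a) additionally keeps the rest of the row; for the link itself it takes the same branch as (c).

```python
from typing import Optional


def choose_food_link(
    orig_food_id: Optional[str],
    parsed_food: Optional[str],
    resolved_id: Optional[str],
    is_first_child: bool = True,
) -> Optional[str]:
    """Return the food id to write, or None for a foodless row.

    Hypothetical helper, for illustration only.
    """
    name = (parsed_food or "").strip()
    if name and resolved_id:
        # Sonnet named a food and the catalog knows it: add or change the link.
        return resolved_id
    if is_first_child and orig_food_id:
        # Either the name could not be resolved (b) or Sonnet returned
        # food=null (c). Keep the link the row already had.
        return orig_food_id
    # A section header that was already foodless, or an extra fan-out child.
    return None
```

Only the first child can inherit the original link, because later fan-out children are new rows with no prior food.id to protect.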

The is_new_food field also benefits from recipe context: Sonnet has
more evidence to set is_new_food=false (matched a known canonical)
when the steps confirm the ingredient identity.
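
The 3000-char cap on step text from item 1 amounts to a running character budget. Here is a minimal sketch, assuming the steps arrive as plain strings; cap_steps is a hypothetical name, and the helper in the file reads the text out of the Mealie recipe's instruction dicts instead.

```python
def cap_steps(steps: list[str], budget: int = 3000) -> list[str]:
    """Keep steps in order until the character budget is spent."""
    kept: list[str] = []
    for text in steps:
        text = text.strip()
        if not text:
            continue  # skip empty steps without spending budget
        if budget <= 0:
            break
        if len(text) > budget:
            # Truncate the step that crosses the limit and mark the cut.
            text = text[:budget] + "…"
        kept.append(text)
        budget -= len(text)
    return kept
```

Early steps win under this scheme, so a very long recipe loses its last steps rather than having every step shortened.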
663 lines
29 KiB
Python
"""Ingredient sterilizer — turn Mealie's free-form ingredient strings into
structured (qty, unit, food, note) so shopping-list aggregation works.

Why this exists: Mealie has its own CRF parser, but it's mediocre and produces
inconsistent results. Cobb's hand-typed recipes have lots of "about 2 cups
cooked white rice" / "1 small handful kale" / "a pinch of salt" etc. that
slip past the parser. We send these to Sonnet via clawdforge and get back
clean structured form.

Flow:
  1. Fetch the recipe from Mealie
  2. Build a single batched prompt with all ingredients (one Sonnet call/recipe)
  3. Get back a parallel list of lists of {quantity, unit, food, note}
     (one inner list per ingredient; several items when a line fans out)
  4. (preview) return the proposal
  5. (apply) link each parse to existing Mealie food/unit (create if missing),
     then PUT the updated recipe back
"""

import json
from dataclasses import dataclass, asdict

from .forge import Forge, ForgeError
from .mealie import Mealie, MealieError


STERILIZE_SYSTEM_TEMPLATE = """You are a precise recipe ingredient parser. You ONLY output valid JSON.
You receive a list of free-form ingredient strings and return a parallel
list of LISTS, one inner list per input. Most inputs map 1→1 (single item
inside the list). Compound lines that name multiple distinct foods MUST
fan out into multiple items so each food gets its own row on the shopping
list.

Per-item schema:
{{
  "quantity": <number or null>,  # numeric amount; fractions → decimals (1/2 → 0.5)
  "unit": <string or null>,  # singular canonical: "cup", "tbsp", "tsp", "oz", "lb", "g", "kg", "ml", "l", "clove", "slice", "can", "package", "piece", "pinch", "dash", "handful". null if no unit (e.g. "1 onion").
  "food": <string or null>,  # core food noun. PREFER an exact match from the canonical catalog below; only invent a new name if nothing matches.
  "note": <string or null>,  # prep state, brand, color, modifier: "chopped", "extra virgin", "yellow", "to taste"
  "approx": <bool>,  # true if input said "about", "a pinch", "to taste", or otherwise vague
  "is_new_food": <bool>  # true if `food` is NOT in the canonical catalog and you're proposing a new name. false if `food` matches an existing canonical entry verbatim.
}}

CANONICAL FOOD CATALOG (use these names verbatim, exact case + spelling,
when the ingredient maps to one of them). Rows shown as:
  • <name> [aliases: a, b, c] or • <name> (plural: <plural>)

{foods}

CATALOG RULES (most important):
- If the ingredient string matches an entry by name OR pluralName OR alias,
  return the canonical NAME (the first column above) verbatim, exact case.
  Set is_new_food=false.
- Strip prep modifiers ("chopped", "diced", "minced", "halved") into note;
  the catalog name should be the bare food.
- Plural in input → singular in output if the canonical is singular.
  e.g. "2 onions" → food: "onion" (when "onion" is in the catalog).
- Branded variations (e.g. "Heinz ketchup") → use canonical "ketchup" (or
  whatever's in the catalog), put brand in note.
- Only set is_new_food=true when there's truly no reasonable match. Prefer
  matching even if imperfect (e.g. "kosher salt" → "salt" with note
  "kosher", is_new_food=false), since adding aliases later is easier than
  cleaning duplicate food rows.

FAN-OUT RULES: return MULTIPLE items for one input when:
- "salt and pepper" / "salt and ground black pepper to taste" → split into 2 items, each
  {{quantity: 1, unit: "dash", food: <catalog match>, note: "to taste", approx: true}}
- "Toppings (cinnamon butter, marshmallows, ground cinnamon, butter, etc)" or
  "Optional: cilantro, lime, queso fresco" → one item per food in the comma list.
  Drop the wrapper word ("Toppings", "Optional"); leave it OUT of food/note. Skip
  filler words like "etc". Each item: quantity=null, unit=null, food=<catalog match>, note=null, approx=true.
- "1 lemon, juice and zest" → 2 items: {{qty:1, unit:null, food:<catalog match for "lemon juice">}} and {{qty:1, unit:null, food:<catalog match for "lemon zest">}}.
- DO NOT split "salt and vinegar chips" or "macaroni and cheese"; those are
  compound food names, not multi-food lines. Heuristic: if the words on each
  side of "and" are each a recognized standalone food, split; otherwise keep as one.

PARSE RULES (for the common 1→1 case):
- Convert fractions: "1/2" → 0.5, "1 1/4" → 1.25
- "a pinch" / "a dash" alone → {{quantity: 1, unit: "pinch"|"dash", approx: true}}
- "to taste" alone → {{quantity: null, unit: null, food: <food>, note: "to taste", approx: true}}
- "1 small onion" → {{quantity: 1, unit: null, food: "onion", note: "small"}}
- "2 cloves garlic, minced" → {{quantity: 2, unit: "clove", food: "garlic", note: "minced"}}
- "1.5 cups broccoli (coarsely chopped florets)" → {{quantity: 1.5, unit: "cup", food: "broccoli", note: "coarsely chopped florets"}}
- Section headers like "For the sauce:" → 1 item with all fields null EXCEPT
  note: "<header text>", is_new_food: false (so Mealie can preserve the header row)
- If you genuinely cannot parse (junk input), return 1 item with all fields null
  and the original string in note, is_new_food: false.
- DO NOT add fields not in the schema.
- DO NOT wrap output in markdown fences.
- DO NOT include any prose before or after the JSON.

SPELL/GRAMMAR CLEANUP (no info loss):
- Silently fix obvious typos in food and note: "tomatos" → "tomatoes",
  "all-purpose flouur" → "all-purpose flour", "chopped finly" → "chopped finely",
  "1 cup heavy cram" → food: "heavy cream".
- Normalize spacing: "1   cup  rice" → quantity 1 / unit cup / food rice.
- Preserve EVERY semantic value: numeric quantities verbatim, every prep state,
  brand, color, cooking method. If you're unsure whether something is a typo
  or intentional (e.g. "yellow squash" is a real food, not a typo of "squash"),
  keep it.

USE RECIPE CONTEXT WHEN AMBIGUOUS:
- The user prompt may include `recipe_name`, `recipe_description`, and
  `recipe_steps` alongside the ingredients. When an ingredient is ambiguous
  (e.g. just "1 cup flour", which could be all-purpose, bread, cake, etc.), use
  the cooking steps to disambiguate.
    - Steps say "knead until smooth and elastic" → bread flour
    - Steps say "sift with cocoa powder" + recipe is a cake → cake flour
    - Steps don't disambiguate → default to all-purpose flour
- If the recipe description names a brand/style, use it.
- DO NOT add ingredients that aren't in the input list. Steps may mention
  garnishes that weren't in the ingredient list; leave them out.
- When the steps confirm what an ingredient is, set is_new_food=false
  (you have evidence it matches a known canonical) when possible.

Input shape:
  {{"recipe_name": "...", "recipe_description": "...",
    "recipe_steps": ["step text", ...],
    "ingredients": ["str", "str", ...]}}
The recipe_name/description/steps fields are optional context; they may
be empty or missing; ignore them in that case.

Output shape: {{"parses": [[{{...}}, {{...}}], [{{...}}], [{{...}}, {{...}}, {{...}}], ...]}}
The outer list MUST have the same length as the input list. Each inner list
MUST contain at least 1 item (use the all-null junk-fallback if needed).
"""


@dataclass
class IngredientParse:
    quantity: float | None
    unit: str | None
    food: str | None
    note: str | None
    approx: bool
    is_new_food: bool = False  # true when Sonnet proposes a new canonical name


@dataclass
class IngredientProposal:
    """One original ingredient → one or more parsed children. parsed_items
    has length 1 in normal cases; >1 when a compound line was fanned out
    ("Toppings (a, b, c)" → 3 children, "salt and pepper" → 2 children)."""
    index: int
    original_display: str
    original_quantity: float | None
    original_unit_name: str | None
    original_food_name: str | None
    original_note: str | None
    parsed_items: list[IngredientParse]


class Sterilizer:
    def __init__(self, *, mealie: Mealie, forge: Forge, model: str = "sonnet"):
        self.mealie = mealie
        self.forge = forge
        self.model = model
        # Lazy-loaded canonical food catalog from Mealie. Fetched once
        # per Sterilizer instance (so a bulk sterilize job pulls it
        # once and reuses across all 226 recipe parses).
        self._catalog_cache: list[dict] | None = None
        self._catalog_prompt: str | None = None

    # --- public -------------------------------------------------------------

    def preview_recipe(self, slug: str) -> dict:
        """Dry-run: parse all ingredients, return proposals without writing.

        Each input ingredient produces one IngredientProposal whose
        parsed_items list has length 1 (normal case) or N (fan-out).

        Recipe context (name + description + cooking steps) is bundled
        into the Sonnet call so ambiguous ingredients ("1 cup flour")
        can be disambiguated by what the recipe actually does with them."""
        recipe = self.mealie.get_recipe(slug)
        ingredients = recipe.get("recipeIngredient") or []
        if not ingredients:
            return {"slug": slug, "name": recipe.get("name"), "proposals": []}

        strings = [_render_ingredient_for_parse(ing) for ing in ingredients]
        recipe_context = _build_recipe_context(recipe)
        parses_per_input = self._parse_batch(strings, recipe_context=recipe_context)

        proposals: list[IngredientProposal] = []
        for i, (ing, items) in enumerate(zip(ingredients, parses_per_input)):
            proposals.append(
                IngredientProposal(
                    index=i,
                    original_display=ing.get("display") or "",
                    original_quantity=ing.get("quantity"),
                    # `or {}` covers both a missing and a null unit/food
                    original_unit_name=(ing.get("unit") or {}).get("name"),
                    original_food_name=(ing.get("food") or {}).get("name"),
                    original_note=ing.get("note"),
                    parsed_items=items,
                )
            )

        return {
            "slug": slug,
            "name": recipe.get("name"),
            "ingredient_count": len(ingredients),
            "proposals": [_proposal_to_dict(p) for p in proposals],
        }

    def apply_recipe(self, slug: str, *, create_missing: bool = True) -> dict:
        """Run preview, then write changes back to Mealie.

        For each ingredient we resolve (or create) Mealie food/unit by name,
        then assemble the new recipeIngredient list and PUT the recipe.

        Mealie normalizes food/unit names more aggressively than .lower()
        (its name_normalized strips punctuation + collapses whitespace +
        unicode-folds). So a local-cache miss followed by a blind create
        can hit Mealie's UNIQUE constraint on (name, group_id). We
        ALWAYS try the search endpoint as a tie-break before creating,
        and on a UNIQUE-violation 400 we re-search and adopt whatever
        Mealie has under that normalized form.
        """
        preview = self.preview_recipe(slug)
        proposals = preview["proposals"]
        if not proposals:
            return {"slug": slug, "updated": 0, "skipped": 0, "created_foods": [], "created_units": []}

        recipe = self.mealie.get_recipe(slug)
        food_index = self._build_name_index(self.mealie.list_foods())
        unit_index = self._build_name_index(self.mealie.list_units())
        created_foods: list[str] = []
        created_units: list[str] = []

        new_ingredients: list[dict] = []
        for orig_ing, prop in zip(recipe.get("recipeIngredient") or [], proposals):
            # Each proposal can produce 1+ parsed children (fan-out for
            # compound inputs like "Toppings (a, b, c)" or "salt and pepper").
            # Keep the proposal_json key flexible: prefer parsed_items but
            # fall back to a single 'parsed' for backward-compat.
            items = prop.get("parsed_items")
            if not isinstance(items, list) or not items:
                legacy = prop.get("parsed")
                items = [legacy] if isinstance(legacy, dict) else []
            if not items:
                # Nothing to write: pass the original through unchanged
                new_ingredients.append(dict(orig_ing))
                continue

            # Defensive: if Sonnet returned a single all-null item but the
            # original ingredient had a real Mealie food link, this is
            # almost certainly a hallucination/parse failure. Pass the
            # original through unchanged rather than blank the row.
            orig_food = orig_ing.get("food")
            if not isinstance(orig_food, dict):
                orig_food = {}
            orig_food_id = orig_food.get("id")
            orig_food_name = orig_food.get("name")
            if (
                len(items) == 1
                and not (items[0].get("food") or "").strip()
                and not (items[0].get("note") or "").strip()
                and (items[0].get("quantity") in (None, ""))
                and orig_food_id
            ):
                # Sonnet returned all-null for a row that already had data.
                # Preserve the original verbatim; never lose a recipe link.
                new_ingredients.append(dict(orig_ing))
                continue

            for child_idx, parsed in enumerate(items):
                if child_idx == 0:
                    # First child inherits id/refId/originalText from the
                    # original Mealie row, so existing references stay live
                    new_ing = dict(orig_ing)
                else:
                    # Additional children are fresh rows. Mealie generates
                    # ids on save when none provided.
                    new_ing = {
                        # Inherit refId so all fan-out children belong to
                        # the same logical group as the original. Some
                        # Mealie versions tolerate dup refIds; others
                        # generate one if missing.
                        "referenceId": orig_ing.get("referenceId"),
                        "title": None,
                        "originalText": orig_ing.get("originalText") or orig_ing.get("display"),
                        "disableAmount": False,
                    }

                new_ing["quantity"] = parsed.get("quantity")

                food_name = (parsed.get("food") or "").strip()
                if food_name:
                    food_id = self._resolve_food(
                        food_name, food_index,
                        create_missing=create_missing,
                        created_log=created_foods,
                    )
                    if food_id:
                        new_ing["food"] = {"id": food_id, "name": food_name}
                        new_ing["isFood"] = True
                    elif child_idx == 0 and orig_food_id:
                        # Sonnet wanted to set a different food but we
                        # couldn't resolve. Preserve the original link
                        # rather than blank; never lose recipe data.
                        new_ing["food"] = {"id": orig_food_id, "name": orig_food_name or ""}
                        new_ing["isFood"] = True
                    else:
                        new_ing["food"] = None
                        new_ing["isFood"] = False
                else:
                    # Sonnet returned no food. Distinguish two cases:
                    if child_idx == 0 and orig_food_id:
                        # Original had a Mealie food link → defensive: KEEP
                        # it. Sonnet probably misread the input as a section
                        # header but the recipe author already linked it.
                        new_ing["food"] = {"id": orig_food_id, "name": orig_food_name or ""}
                        new_ing["isFood"] = True
                    else:
                        # True section header (original was already foodless)
                        # OR fan-out child (these are net-new rows; if Sonnet
                        # didn't pick a food for them, drop quietly)
                        new_ing["food"] = None
                        new_ing["isFood"] = False

                unit_name = (parsed.get("unit") or "").strip()
                if unit_name:
                    unit_id = self._resolve_unit(
                        unit_name, unit_index,
                        create_missing=create_missing,
                        created_log=created_units,
                    )
                    if unit_id:
                        new_ing["unit"] = {"id": unit_id, "name": unit_name}
                    else:
                        new_ing["unit"] = None
                else:
                    new_ing["unit"] = None

                new_ing["note"] = parsed.get("note") or ""
                new_ingredients.append(new_ing)

        recipe["recipeIngredient"] = new_ingredients
        self.mealie.update_recipe(slug, recipe)

        return {
            "slug": slug,
            "updated": len(new_ingredients),
            "created_foods": created_foods,
            "created_units": created_units,
        }

    # --- food/unit resolution helpers --------------------------------------

    def _resolve_food(
        self,
        name: str,
        index: dict[str, str],
        *,
        create_missing: bool,
        created_log: list[str],
    ) -> str | None:
        """Find or create a Mealie food row, robust to normalization gaps."""
        key = name.lower()

        # Step 1: local cache hit (covers name + pluralName from list_foods)
        if key in index:
            return index[key]

        # Step 2: server-side search; Mealie does proper normalization here
        existing_id = self._search_for_match(name, "foods")
        if existing_id:
            index[key] = existing_id
            return existing_id

        # Step 3: create. If Mealie races us with a UNIQUE-constraint 400,
        # search again and use whatever it has under the normalized form.
        if not create_missing:
            return None
        try:
            created = self.mealie.create_food(name)
        except MealieError as e:
            msg = str(e)
            if "UNIQUE constraint" not in msg and "400" not in msg:
                raise
            # The row already exists under Mealie's normalized form: adopt
            # it, and don't report it as newly created.
            food_id = self._search_for_match(name, "foods")
            if not food_id:
                raise  # truly couldn't reconcile; let caller record error
            index[key] = food_id
            return food_id
        food_id = created.get("id")
        if food_id:
            index[key] = food_id
            created_log.append(name)
        return food_id

    def _resolve_unit(
        self,
        name: str,
        index: dict[str, str],
        *,
        create_missing: bool,
        created_log: list[str],
    ) -> str | None:
        """Find or create a Mealie unit row; same three steps as _resolve_food."""
        key = name.lower()
        if key in index:
            return index[key]
        existing_id = self._search_for_match(name, "units")
        if existing_id:
            index[key] = existing_id
            return existing_id
        if not create_missing:
            return None
        try:
            created = self.mealie.create_unit(name)
        except MealieError as e:
            msg = str(e)
            if "UNIQUE constraint" not in msg and "400" not in msg:
                raise
            # Adopt the existing row without logging it as created.
            unit_id = self._search_for_match(name, "units")
            if not unit_id:
                raise
            index[key] = unit_id
            return unit_id
        unit_id = created.get("id")
        if unit_id:
            index[key] = unit_id
            created_log.append(name)
        return unit_id

    def _search_for_match(self, name: str, kind: str) -> str | None:
        """Use Mealie's search endpoint to find a foods/units row matching
        `name`. Returns the id of the first item whose name or pluralName
        matches (case-insensitive) the query, else None."""
        target = name.strip().lower()
        if not target:
            return None
        listing = (self.mealie.list_foods(search=name)
                   if kind == "foods"
                   else self.mealie.list_units(search=name))
        items = listing.get("items") or listing.get("data") or []
        # Mealie's search returns ranked results; take the first exact-ish match
        for item in items:
            for field in ("name", "pluralName"):
                v = (item.get(field) or "").strip().lower()
                if v and v == target:
                    return item.get("id")
        # Fallback: if there's exactly one search hit, trust Mealie's ranker
        if len(items) == 1 and items[0].get("id"):
            return items[0]["id"]
        return None

    # --- private ------------------------------------------------------------

    # --- canonical food catalog (Mealie is source of truth) ----------------

    def _load_catalog(self) -> list[dict]:
        """Pull every food row from Mealie, 2000 rows per request. The user's
        session token scopes to their group, so this spans every household
        the user can see. That's fine: we want Sonnet to know all canonical
        names. Cached on the instance after first call.

        We use the underlying _get directly (not list_foods) so we can
        also pass a page param when one perPage=2000 request doesn't
        return everything in one shot."""
        if self._catalog_cache is not None:
            return self._catalog_cache
        out: list[dict] = []
        page = 1
        while page <= 20:  # defensive ceiling
            resp = self.mealie._get(
                "/api/foods", search="", perPage=2000, page=page
            )
            items = resp.get("items") or resp.get("data") or []
            out.extend(items)
            tp = resp.get("total_pages") or resp.get("totalPages") or 1
            if not items or page >= tp:
                break
            page += 1
        self._catalog_cache = out
        return out

    def _catalog_for_prompt(self) -> str:
        """Render the catalog as a bullet list for the system prompt.
        Cached on the instance so we don't rebuild this per batch."""
        if self._catalog_prompt is not None:
            return self._catalog_prompt
        items = self._load_catalog()
        lines: list[str] = []
        for it in items:
            name = (it.get("name") or "").strip()
            if not name:
                continue
            plural = (it.get("pluralName") or "").strip()
            aliases = it.get("aliases") or []
            # Aliases on Mealie can be a list of strings or a list of
            # {name, foodId} dicts depending on version. Normalize.
            alias_names: list[str] = []
            for a in aliases:
                if isinstance(a, str) and a.strip():
                    alias_names.append(a.strip())
                elif isinstance(a, dict):
                    n = (a.get("name") or "").strip()
                    if n:
                        alias_names.append(n)
            line = f" • {name}"
            if plural and plural.lower() != name.lower():
                line += f" (plural: {plural})"
            if alias_names:
                line += f" [aliases: {', '.join(alias_names)}]"
            lines.append(line)
        self._catalog_prompt = "\n".join(lines)
        return self._catalog_prompt

    def _system_prompt(self) -> str:
        """Build the full STERILIZE_SYSTEM prompt with the catalog spliced in."""
        return STERILIZE_SYSTEM_TEMPLATE.format(foods=self._catalog_for_prompt())

    # --- per-batch Sonnet call ---------------------------------------------

    def _parse_batch(
        self,
        strings: list[str],
        *,
        recipe_context: dict | None = None,
    ) -> list[list[IngredientParse]]:
        """Returns list-of-lists matching the input length. Each inner list
        is the parses derived from one input string (1 in normal case, N
        for fan-out, never 0).

        recipe_context is optional; when provided, it includes recipe_name,
        recipe_description, and recipe_steps so Sonnet can disambiguate
        unclear ingredients via the cooking steps."""
        body: dict = {"ingredients": strings}
        if recipe_context:
            for k in ("recipe_name", "recipe_description", "recipe_steps"):
                v = recipe_context.get(k)
                if v:
                    body[k] = v
        prompt = json.dumps(body, ensure_ascii=False)
        try:
            resp = self.forge.run(
                prompt=prompt,
                model=self.model,
                system=self._system_prompt(),
                timeout_secs=180,
            )
        except ForgeError as e:
            raise RuntimeError(f"clawdforge failed: {e}") from e

        result = resp.get("result")
        if not isinstance(result, dict) or "parses" not in result:
            raise RuntimeError(f"unexpected response shape: {str(result)[:200]}")

        parses_raw = result["parses"]
        if not isinstance(parses_raw, list) or len(parses_raw) != len(strings):
            raise RuntimeError(
                f"parse count mismatch: got {len(parses_raw)}, expected {len(strings)}"
            )

        out: list[list[IngredientParse]] = []
        for p in parses_raw:
            # Defensive: accept either a single dict (legacy 1→1 shape) or
            # a list of dicts (fan-out shape). Normalize to list-of-dicts.
            items_raw = p if isinstance(p, list) else [p]
            if not items_raw:
                # Empty list: substitute a fallback all-null item so we
                # never lose track of an input slot
                items_raw = [{"quantity": None, "unit": None, "food": None,
                              "note": None, "approx": False}]
            inner: list[IngredientParse] = []
            for it in items_raw:
                if not isinstance(it, dict):
                    continue
                inner.append(
                    IngredientParse(
                        quantity=_coerce_float(it.get("quantity")),
                        unit=_clean_str(it.get("unit")),
                        food=_clean_str(it.get("food")),
                        note=_clean_str(it.get("note")),
                        approx=bool(it.get("approx")),
                        is_new_food=bool(it.get("is_new_food")),
                    )
                )
            if not inner:
                inner.append(IngredientParse(
                    quantity=None, unit=None, food=None, note=None, approx=False
                ))
            out.append(inner)
        return out

    @staticmethod
    def _build_name_index(listing: dict) -> dict[str, str]:
        index: dict[str, str] = {}
        items = listing.get("items") or listing.get("data") or []
        for item in items:
            if name := item.get("name"):
                index[name.lower()] = item["id"]
            if plural := item.get("pluralName"):
                index[plural.lower()] = item["id"]
        return index


def _build_recipe_context(recipe: dict) -> dict:
    """Pull name + description + cooking steps off a Mealie recipe in a
    Sonnet-friendly shape. Capped to ~3000 chars total of step text so
    the user prompt doesn't blow past Sonnet's reasonable input size."""
    out: dict = {}
    name = (recipe.get("name") or "").strip()
    if name:
        out["recipe_name"] = name[:200]
    desc = (recipe.get("description") or "").strip()
    if desc:
        out["recipe_description"] = desc[:600]

    instructions = recipe.get("recipeInstructions") or []
    steps: list[str] = []
    char_budget = 3000
    for step in instructions:
        if not isinstance(step, dict):
            continue
        text = (step.get("text") or "").strip()
        if not text:
            continue
        if char_budget <= 0:
            break
        # Truncate individual step if needed
        if len(text) > char_budget:
            text = text[:char_budget] + "…"
        steps.append(text)
        char_budget -= len(text)
    if steps:
        out["recipe_steps"] = steps
    return out


def _render_ingredient_for_parse(ing: dict) -> str:
    """Best string representation of a Mealie ingredient for sending to Claude."""
    if ing.get("originalText"):
        return ing["originalText"]
    if ing.get("display"):
        return ing["display"]
    parts: list[str] = []
    if (q := ing.get("quantity")) is not None:
        parts.append(str(q))
    if u := ing.get("unit"):
        parts.append(u.get("name") or "")
    if f := ing.get("food"):
        parts.append(f.get("name") or "")
    if note := ing.get("note"):
        parts.append(note)
    return " ".join(p for p in parts if p).strip() or "(empty)"


def _coerce_float(v) -> float | None:
    if v is None:
        return None
    try:
        return float(v)
    except (TypeError, ValueError):
        return None


def _clean_str(v) -> str | None:
    if v is None:
        return None
    s = str(v).strip()
    return s or None


def _proposal_to_dict(p: IngredientProposal) -> dict:
    return asdict(p)