recipe enrichment: per-recipe Sonnet meta for smarter planning

The 'fancy data fun' Cobb wanted: pre-compute structured metadata for
every recipe so the plan generator can match preferences to actual
recipe characteristics, not just match keywords on names.

Sonnet returns per recipe:
  - tags[]: curated descriptors (high-protein, weeknight, one-pan,
    leftovers-good, kid-friendly, etc — picks 3-8 that genuinely apply)
  - cuisine, complexity (easy/medium/involved), estimated_minutes
  - meal_type (breakfast/lunch/dinner/snack/dessert/side/sauce/drink)
  - primary_protein (chicken/beef/pork/fish/seafood/tofu/...)
  - primary_carb (rice/pasta/bread/potato/tortilla/quinoa/...)
  - veg_forward (veg-forward/mixed/meat-forward)
  - comfort_tier (weeknight-easy/hearty-comfort/fancy-occasion/...)
  - season_fit[] + summary one-liner + best_for short phrase
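
For orientation, here is a rough sketch of what one such blob could look like for the stir-fry example used later in this message. The field names come from the list above; every value is invented for illustration, and `missing_keys` is a hypothetical helper, not part of the commit.

```python
import json

# Field names follow the list above; the values are made up for illustration.
EXAMPLE_META = {
    "tags": ["high-protein", "weeknight", "one-pan"],
    "cuisine": "asian",
    "complexity": "easy",
    "estimated_minutes": 30,
    "meal_type": "dinner",
    "primary_protein": "chicken",
    "primary_carb": "rice",
    "veg_forward": "mixed",
    "comfort_tier": "weeknight-easy",
    "season_fit": ["year-round"],
    "summary": "quick weeknight stir-fry with leftover-friendly portions",
    "best_for": "busy evenings",
}

EXPECTED_KEYS = {
    "tags", "cuisine", "complexity", "estimated_minutes", "meal_type",
    "primary_protein", "primary_carb", "veg_forward", "comfort_tier",
    "season_fit", "summary", "best_for",
}


def missing_keys(meta: dict) -> list[str]:
    """Return the expected field names that are absent from a metadata blob."""
    return sorted(EXPECTED_KEYS - set(meta))


# The blob is plain JSON-serializable data, which is what meta_json stores.
serialized = json.dumps(EXAMPLE_META, ensure_ascii=False)
```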

Schema:
- Migration 024: cauldron_recipe_meta keyed by (household_id, recipe_slug),
  meta_json + enrich_version (bumping the version invalidates the cache
  and forces re-walk). One row per Mealie recipe Cobb owns.
- Migration 025: cauldron_enrich_jobs — job runner state. No
  proposals/review needed since metadata is purely additive.
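
The version-based cache check described above can be sketched as a small pure function. This is a simplified illustration of the rule, assuming a row is a dict with an `enrich_version` key; `needs_enrich` is a hypothetical name, not a function in the commit.

```python
ENRICH_VERSION = 1  # bump when the prompt or schema changes


def needs_enrich(existing_row, current_version=ENRICH_VERSION, force=False):
    """Decide whether a recipe should be sent to the model again.

    existing_row is the cached metadata row for the recipe, or None if the
    recipe has never been enriched.
    """
    if force:
        return True  # caller asked to ignore the cache
    if existing_row is None:
        return True  # never enriched
    # A row written under an older version is treated as stale.
    return existing_row.get("enrich_version") != current_version
```

Because the upsert overwrites the row in place, a stale row stays readable by the plan generator until its replacement is written.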

Forge:
- enrich_recipe(recipe) builds a compact prompt with name + description
  + ingredients + steps (capped at 2000 chars total) + yields, asks
  Sonnet for the structured blob. _extract_recipe_meta validates and
  coerces types.
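
The step budget can be pictured as follows. This is a minimal sketch that assumes the steps are already plain strings (the real code reads them out of Mealie step dicts); `cap_steps` is a hypothetical helper name.

```python
def cap_steps(step_texts, budget=2000):
    """Keep step texts in order until the shared character budget is spent.

    The step that crosses the budget is truncated to fit; later steps are
    dropped entirely.
    """
    kept = []
    for raw in step_texts:
        text = raw.strip()
        if not text:
            continue  # ignore empty steps
        if budget <= 0:
            break  # nothing left to spend
        if len(text) > budget:
            text = text[:budget]  # truncate the step that overflows
        kept.append(text)
        budget -= len(text)
    return kept
```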

Module enrich_recipes.py:
- Daemon thread runner, walks all household recipes, skips already-
  enriched at current ENRICH_VERSION (idempotent), respects external
  cancel + stuck-job recovery. Skips cross-household recipes (Lake
  Elsinore stuff visible but not enrichable).
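
The cross-household skip boils down to comparing the recipe's household id with the caller's. A small sketch under the assumption that Mealie may expose the id under `householdId`, `household_id`, or a nested `household.id`, as the module's helpers do; the function names here are illustrative.

```python
def recipe_household_id(recipe):
    """Best-effort lookup of the household id on a Mealie recipe dict."""
    hid = recipe.get("householdId") or recipe.get("household_id")
    if hid:
        return hid
    nested = recipe.get("household")
    if isinstance(nested, dict):
        return nested.get("id")
    return None


def is_cross_household(recipe, user_household):
    """True only when both ids are known and they differ.

    A recipe with no discoverable household id is NOT skipped, which
    matches the permissive behaviour of the runner.
    """
    if not user_household:
        return False
    rec = recipe_household_id(recipe)
    return bool(rec) and rec != user_household
```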

Plan generator hookup:
- /api/plan/generate + regenerate now pull cauldron_recipe_meta and
  splice it into the recipe pool prompt. Each pool line goes from:
      - chicken-stir-fry: Chicken Stir Fry  [asian]
  to:
      - chicken-stir-fry: Chicken Stir Fry  [asian · easy · 30min ·
        protein:chicken · carb:rice · high-protein/weeknight/one-pan]
        quick weeknight stir-fry with leftover-friendly portions
  Sonnet now has rich attributes to actually match a 'high protein
  week' or 'comfort food' or 'quick' preference against, instead of
  guessing from titles.
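
A simplified sketch of how one pool line could be assembled from the metadata. It mirrors the ordering shown in the example above but is not the commit's exact code: `format_pool_line` is a hypothetical name, and the two spaces before the bracket and the four-space summary indent are assumptions taken from the example layout.

```python
def format_pool_line(slug, name, meta=None, mealie_tags=()):
    """Render one recipe as a prompt line, enriched with metadata if present."""
    meta = meta or {}
    extras = []
    if mealie_tags:
        extras.append(", ".join(list(mealie_tags)[:3]))
    cuisine = meta.get("cuisine")
    if cuisine and cuisine not in ("unknown", "other"):
        extras.append(cuisine)
    if meta.get("complexity"):
        extras.append(meta["complexity"])
    minutes = meta.get("estimated_minutes")
    if isinstance(minutes, int) and minutes > 0:
        extras.append(f"{minutes}min")
    protein = meta.get("primary_protein")
    if protein and protein != "none":
        extras.append(f"protein:{protein}")
    carb = meta.get("primary_carb")
    if carb and carb != "none":
        extras.append(f"carb:{carb}")
    veg = meta.get("veg_forward")
    if veg and veg != "mixed":
        extras.append(veg)  # "mixed" is the default, so it adds no signal
    tags = meta.get("tags") or []
    if tags:
        extras.append("/".join(tags[:5]))
    bracket = f"  [{' · '.join(extras)}]" if extras else ""
    line = f"- {slug}: {name}{bracket}"
    if meta.get("summary"):
        line += "\n    " + str(meta["summary"])[:140]
    return line
```

Recipes without metadata fall back to the bare `- slug: Name` form, so a partially enriched household still produces a valid pool.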

Endpoints:
- /enrich-recipes UI page (progress bar + start + force re-enrich +
  cancel; no review/approve since meta is additive)
- /api/recipes/enrich-{start,status,cancel} session-authed
- /api/admin/recipes/enrich-start bearer-authed for kayos kick-off
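
A hedged sketch of how a kick-off request to the admin endpoint might be built. The `started_by_sub` and `force` body fields match what the handler reads; the base URL, the token, and the `Authorization: Bearer ...` header format are assumptions, since the commit does not show how `require_bearer` checks credentials. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_admin_enrich_request(base_url, bearer_token, started_by_sub, force=False):
    """Build (but do not send) a POST to the admin enrich-start endpoint."""
    payload = json.dumps(
        {"started_by_sub": started_by_sub, "force": force}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/recipes/enrich-start",
        data=payload,
        method="POST",
        headers={
            # Assumed header format; adjust to whatever require_bearer expects.
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical host and credentials, for illustration only.
req = build_admin_enrich_request("https://cauldron.example/", "TOKEN", "user-sub-123")
```

Passing `req` to `urllib.request.urlopen` would send it; a 409 with `already_running` is the documented response when a job is already active for the household.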

Cost (one-time): ~5s/recipe × 226 = ~20 min walk. Subsequent runs
only process new/changed recipes.
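
The estimate is simple arithmetic; a quick check, assuming a flat 5 seconds per recipe and no retries or errors:

```python
SECONDS_PER_RECIPE = 5   # rough per-call latency quoted above
RECIPE_COUNT = 226       # household size quoted above


def walk_minutes(recipe_count, seconds_per_recipe=SECONDS_PER_RECIPE):
    """Estimated wall-clock minutes for a sequential walk."""
    return recipe_count * seconds_per_recipe / 60


estimate = walk_minutes(RECIPE_COUNT)  # a little under 19 minutes, i.e. "~20 min"
```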
Kayos 2026-04-30 20:08:20 -07:00
parent 820d65171b
commit 10849e0e95
6 changed files with 828 additions and 7 deletions

@@ -408,6 +408,52 @@ MIGRATIONS = [
ALTER TABLE cauldron_meal_plans
ADD COLUMN IF NOT EXISTS preference_prompt VARCHAR(1000)
""",
# 024 — Per-recipe AI-generated metadata. Sonnet looks at the full
# recipe (name, description, ingredients, steps, yields) and returns
# a structured blob: tags, cuisine, complexity, estimated_minutes,
# meal_type, primary_protein, primary_carb, veg_forward, comfort_tier,
# season_fit, summary, best_for. Plan generator uses this so "high
# protein week" becomes a real query, not just a vibe-prompt.
# enrich_version lets us bump the prompt and re-enrich without
# losing the prior data.
"""
CREATE TABLE IF NOT EXISTS cauldron_recipe_meta (
household_id BIGINT NOT NULL,
recipe_slug VARCHAR(255) NOT NULL,
meta_json JSON,
enrich_version INT NOT NULL DEFAULT 1,
last_enriched_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (household_id, recipe_slug),
INDEX idx_household (household_id),
FOREIGN KEY (household_id) REFERENCES cauldron_households(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
""",
# 025 — Recipe-enrichment bulk job state. Runs through every household
# recipe, calls Sonnet, persists meta. No apply/review step — meta is
# purely additive so we just write it. Same daemon-thread runner +
# cancel + stuck-recovery pattern.
"""
CREATE TABLE IF NOT EXISTS cauldron_enrich_jobs (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
household_id BIGINT NOT NULL,
started_by_sub VARCHAR(190) NOT NULL,
started_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
last_progress_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
finished_at DATETIME,
total_recipes INT NOT NULL DEFAULT 0,
enriched_count INT NOT NULL DEFAULT 0,
skipped_count INT NOT NULL DEFAULT 0,
error_count INT NOT NULL DEFAULT 0,
current_slug VARCHAR(255),
last_error VARCHAR(500),
state ENUM('running','done','failed','cancelled')
NOT NULL DEFAULT 'running',
INDEX idx_household_state (household_id, state),
FOREIGN KEY (household_id) REFERENCES cauldron_households(id) ON DELETE CASCADE,
FOREIGN KEY (started_by_sub) REFERENCES cauldron_users(authentik_sub) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
""",
]
@@ -1499,6 +1545,142 @@ class DB:
(proposal_id,),
)
    # --- recipe enrichment ------------------------------------------------

    ENRICH_VERSION = 1

    def get_recipe_meta(self, household_id: int, recipe_slug: str) -> dict | None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """SELECT recipe_slug, meta_json, enrich_version, last_enriched_at
                   FROM cauldron_recipe_meta
                   WHERE household_id=%s AND recipe_slug=%s""",
                (household_id, recipe_slug),
            )
            row = cur.fetchone()
            return dict(row) if row else None

    def list_recipe_meta_for_household(self, household_id: int) -> list[dict]:
        """Used by the plan generator to splice meta into the recipe pool prompt."""
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """SELECT recipe_slug, meta_json, enrich_version
                   FROM cauldron_recipe_meta
                   WHERE household_id=%s""",
                (household_id,),
            )
            return [dict(r) for r in cur.fetchall()]

    def upsert_recipe_meta(
        self,
        *,
        household_id: int,
        recipe_slug: str,
        meta_json: str,
        version: int,
    ) -> None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """INSERT INTO cauldron_recipe_meta
                       (household_id, recipe_slug, meta_json, enrich_version)
                   VALUES (%s, %s, %s, %s)
                   ON DUPLICATE KEY UPDATE
                       meta_json = VALUES(meta_json),
                       enrich_version = VALUES(enrich_version)""",
                (household_id, recipe_slug, meta_json, version),
            )

    def create_enrich_job(self, *, household_id: int, started_by_sub: str) -> int:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """INSERT INTO cauldron_enrich_jobs
                       (household_id, started_by_sub, state)
                   VALUES (%s, %s, 'running')""",
                (household_id, started_by_sub),
            )
            return cur.lastrowid

    def get_enrich_job(self, job_id: int) -> dict | None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute("SELECT * FROM cauldron_enrich_jobs WHERE id=%s", (job_id,))
            return cur.fetchone()

    def get_enrich_job_state(self, job_id: int) -> str | None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute("SELECT state FROM cauldron_enrich_jobs WHERE id=%s", (job_id,))
            row = cur.fetchone()
            return row["state"] if row else None

    def latest_enrich_job_for_household(self, household_id: int) -> dict | None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """SELECT * FROM cauldron_enrich_jobs
                   WHERE household_id=%s ORDER BY started_at DESC LIMIT 1""",
                (household_id,),
            )
            return cur.fetchone()

    def running_enrich_job_for_household(self, household_id: int) -> dict | None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """SELECT * FROM cauldron_enrich_jobs
                   WHERE household_id=%s AND state='running'
                   ORDER BY started_at DESC LIMIT 1""",
                (household_id,),
            )
            return cur.fetchone()

    def update_enrich_job_progress(
        self,
        job_id: int,
        *,
        enriched_delta: int = 0,
        skipped_delta: int = 0,
        error_delta: int = 0,
        current_slug: str | None = None,
        last_error: str | None = None,
    ) -> None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """UPDATE cauldron_enrich_jobs
                   SET enriched_count = enriched_count + %s,
                       skipped_count = skipped_count + %s,
                       error_count = error_count + %s,
                       current_slug = COALESCE(%s, current_slug),
                       last_error = COALESCE(%s, last_error),
                       last_progress_at = NOW()
                   WHERE id=%s""",
                (enriched_delta, skipped_delta, error_delta,
                 current_slug, last_error, job_id),
            )

    def finalize_enrich_job(self, job_id: int, *, state: str) -> None:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """UPDATE cauldron_enrich_jobs
                   SET state=%s,
                       finished_at = CASE WHEN %s IN ('done','failed','cancelled')
                                          THEN NOW() ELSE finished_at END,
                       last_progress_at = NOW(),
                       current_slug = NULL
                   WHERE id=%s
                     AND state NOT IN ('done','failed','cancelled')""",
                (state, state, job_id),
            )

    def fail_stuck_enrich_jobs(self, *, stale_minutes: int = 15) -> int:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(
                """UPDATE cauldron_enrich_jobs
                   SET state='failed',
                       finished_at=NOW(),
                       last_error=COALESCE(last_error, 'recovery: worker exited mid-run')
                   WHERE state='running'
                     AND last_progress_at < NOW() - INTERVAL %s MINUTE""",
                (stale_minutes,),
            )
            return cur.rowcount

    def fail_stuck_recipe_dedupe_jobs(self, *, stale_minutes: int = 15) -> int:
        with self.conn() as c, c.cursor() as cur:
            cur.execute(

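The stuck-job rule in `fail_stuck_enrich_jobs` is expressed in SQL; the same predicate can be sketched in Python to make the condition explicit. `is_stuck` is a hypothetical helper for illustration and is not part of the commit.

```python
from datetime import datetime, timedelta


def is_stuck(state, last_progress_at, now, stale_minutes=15):
    """A job counts as stuck when it still claims to be running but has not
    reported progress within the staleness window."""
    cutoff = now - timedelta(minutes=stale_minutes)
    return state == "running" and last_progress_at < cutoff


# Fixed timestamps so the example is deterministic.
NOW = datetime(2026, 4, 30, 20, 0, 0)
```

Since the worker is a daemon thread, a process restart kills it without updating the row; that is why the recovery runs at boot.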
cauldron/enrich_recipes.py
@@ -0,0 +1,179 @@
"""Recipe metadata enrichment — once per recipe, persist forever.

Walks the user's household recipes, calls forge.enrich_recipe(recipe)
on each one, persists the structured metadata to cauldron_recipe_meta
keyed by (household_id, recipe_slug).

No review/apply step: the metadata is purely additive. The plan
generator reads it next time it runs.

Idempotent: skips recipes already enriched at the current
db.DB.ENRICH_VERSION. Bumping the version (when the prompt or schema
changes) forces a re-walk.

Same daemon-thread + cancel + stuck-recovery pattern as the rest.
"""
from __future__ import annotations

import json
import logging
import threading

from .db import DB
from .forge import Forge, ForgeError
from .mealie import Mealie, MealieError

log = logging.getLogger(__name__)


def _household_id_for(mealie: Mealie) -> str | None:
    me = mealie.who_am_i()
    hid = me.get("householdId") or me.get("household_id")
    if not hid:
        h = me.get("household")
        if isinstance(h, dict):
            hid = h.get("id")
    return hid


def _recipe_household_id(recipe: dict) -> str | None:
    hid = recipe.get("householdId") or recipe.get("household_id")
    if hid:
        return hid
    h = recipe.get("household")
    if isinstance(h, dict):
        return h.get("id")
    return None


def run_enrich(
    *,
    db: DB,
    job_id: int,
    household_id: int,
    mealie: Mealie,
    forge: Forge,
    force: bool = False,
) -> None:
    """Walk all recipes in the user's household, enrich each via Sonnet,
    persist. Runs in a daemon thread; respects external cancel."""
    log.info("[enrich:%s] start (force=%s)", job_id, force)

    def _cancelled() -> bool:
        s = db.get_enrich_job_state(job_id)
        return s in ("cancelled", "failed", "done")

    try:
        user_household = _household_id_for(mealie)

        # Pull every recipe slug from Mealie (paginated)
        slugs: list[tuple[str, str]] = []
        page = 1
        while page <= 50:
            resp = mealie.list_recipes(page=page, per_page=100)
            items = resp.get("items") or []
            for r in items:
                slug = r.get("slug")
                name = r.get("name") or slug or ""
                if slug:
                    slugs.append((slug, name))
            tp = resp.get("total_pages") or resp.get("totalPages") or 1
            if not items or page >= tp:
                break
            page += 1

        with db.conn() as c, c.cursor() as cur:
            cur.execute(
                "UPDATE cauldron_enrich_jobs SET total_recipes=%s WHERE id=%s",
                (len(slugs), job_id),
            )

        for slug, name in slugs:
            if _cancelled():
                log.info("[enrich:%s] aborted (state changed)", job_id)
                return

            # Skip cross-household — only enrich what the user owns
            try:
                recipe = mealie.get_recipe(slug)
            except MealieError as e:
                msg = str(e)[:500]
                log.warning("[enrich:%s] get_recipe(%s): %s", job_id, slug, msg)
                db.update_enrich_job_progress(
                    job_id, error_delta=1, current_slug=slug, last_error=msg
                )
                continue
            if user_household:
                rec_hh = _recipe_household_id(recipe)
                if rec_hh and rec_hh != user_household:
                    db.update_enrich_job_progress(
                        job_id, skipped_delta=1, current_slug=slug
                    )
                    continue

            # Skip if already enriched at the current version (unless forced)
            if not force:
                existing = db.get_recipe_meta(household_id, slug)
                if existing and existing.get("enrich_version") == db.ENRICH_VERSION:
                    db.update_enrich_job_progress(
                        job_id, skipped_delta=1, current_slug=slug
                    )
                    continue

            db.update_enrich_job_progress(job_id, current_slug=slug)
            try:
                meta = forge.enrich_recipe(recipe)
            except (ForgeError, RuntimeError) as e:
                msg = str(e)[:500]
                log.warning("[enrich:%s] enrich_recipe(%s): %s", job_id, slug, msg)
                db.update_enrich_job_progress(
                    job_id, error_delta=1, current_slug=slug, last_error=msg
                )
                continue

            try:
                db.upsert_recipe_meta(
                    household_id=household_id,
                    recipe_slug=slug,
                    meta_json=json.dumps(meta, ensure_ascii=False),
                    version=db.ENRICH_VERSION,
                )
                db.update_enrich_job_progress(job_id, enriched_delta=1)
            except Exception as e:
                msg = str(e)[:500]
                log.warning("[enrich:%s] persist(%s): %s", job_id, slug, msg)
                db.update_enrich_job_progress(
                    job_id, error_delta=1, current_slug=slug, last_error=msg
                )

        db.finalize_enrich_job(job_id, state="done")
        log.info("[enrich:%s] done", job_id)
    except Exception:
        log.exception("[enrich:%s] crashed", job_id)
        try:
            db.finalize_enrich_job(job_id, state="failed")
        except Exception:
            pass


def spawn_thread(
    *,
    db: DB,
    job_id: int,
    household_id: int,
    mealie: Mealie,
    forge: Forge,
    force: bool = False,
) -> threading.Thread:
    t = threading.Thread(
        target=run_enrich,
        kwargs={
            "db": db, "job_id": job_id, "household_id": household_id,
            "mealie": mealie, "forge": forge, "force": force,
        },
        name=f"enrich-recipes-{job_id}",
        daemon=True,
    )
    t.start()
    return t

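The paginated slug walk in `run_enrich` can be exercised in isolation with a stand-in for the Mealie client. This is a sketch: `collect_slugs` and `fake_list_recipes` are hypothetical, and the response shape (`items` plus `total_pages` or `totalPages`) is taken from what the runner reads rather than from Mealie's documentation.

```python
def collect_slugs(list_recipes, max_pages=50, per_page=100):
    """Gather (slug, name) pairs across pages, with a hard page cap as a guard
    against a server that never reports a final page."""
    slugs = []
    page = 1
    while page <= max_pages:
        resp = list_recipes(page=page, per_page=per_page)
        items = resp.get("items") or []
        for r in items:
            slug = r.get("slug")
            if slug:  # entries without a slug cannot be fetched later
                slugs.append((slug, r.get("name") or slug))
        total_pages = resp.get("total_pages") or resp.get("totalPages") or 1
        if not items or page >= total_pages:
            break
        page += 1
    return slugs


def fake_list_recipes(page, per_page):
    """Two-page stand-in for the real client, for illustration only."""
    pages = {
        1: [{"slug": "a", "name": "A"}, {"slug": "b"}],
        2: [{"name": "no slug"}, {"slug": "c", "name": "C"}],
    }
    return {"items": pages.get(page, []), "totalPages": 2}
```

With `per_page=100` and a cap of 50 pages, the walk tops out at 5000 recipes per run.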
@@ -169,9 +169,11 @@ class Forge:
             slug = r.get("slug") or ""
             name = r.get("name") or slug
             tags = r.get("tags") or []
-            tag_str = ""
+            meta = r.get("meta") or {}
+            extras: list[str] = []
+            # First 3 Mealie tags
             if tags:
-                # First 3 tags only — keeps prompt token count under control
                 cleaned = []
                 for t in tags[:3]:
                     if isinstance(t, dict):
@@ -180,8 +182,32 @@ class Forge:
                         cleaned.append(t)
                 cleaned = [c for c in cleaned if c]
                 if cleaned:
-                    tag_str = f" [{', '.join(cleaned)}]"
-            pool_lines.append(f"- {slug}: {name}{tag_str}")
+                    extras.append(", ".join(cleaned))
+            # Sonnet-generated meta — the actual high-signal stuff
+            if meta:
+                if meta.get("cuisine") and meta["cuisine"] not in ("unknown", "other"):
+                    extras.append(meta["cuisine"])
+                if meta.get("complexity"):
+                    extras.append(meta["complexity"])
+                em = meta.get("estimated_minutes")
+                if isinstance(em, int) and em > 0:
+                    extras.append(f"{em}min")
+                if meta.get("primary_protein") and meta["primary_protein"] != "none":
+                    extras.append(f"protein:{meta['primary_protein']}")
+                if meta.get("primary_carb") and meta["primary_carb"] != "none":
+                    extras.append(f"carb:{meta['primary_carb']}")
+                if meta.get("veg_forward") and meta["veg_forward"] != "mixed":
+                    extras.append(meta["veg_forward"])
+                meta_tags = meta.get("tags") or []
+                if meta_tags:
+                    extras.append("/".join(meta_tags[:5]))
+                if meta.get("summary"):
+                    # Inline 1-line summary helps Sonnet match preferences
+                    summary = str(meta["summary"])[:140]
+                    pool_lines.append(f"- {slug}: {name} [{' · '.join(extras)}]\n {summary}")
+                    continue
+            extra_str = f" [{' · '.join(extras)}]" if extras else ""
+            pool_lines.append(f"- {slug}: {name}{extra_str}")

         pick_lines = []
         for p in picks:
@@ -354,6 +380,113 @@ class Forge:
        result = self.run(prompt, model=model or "sonnet", timeout_secs=60)
        return _extract_cluster_decision(result)

    def enrich_recipe(self, recipe: dict, *, model: str | None = None) -> dict:
        """Generate structured metadata for a recipe so the plan generator
        can match preferences to actual recipe characteristics, not just
        names.

        Input: a Mealie recipe dict (uses name + description + ingredients
        + instructions + yields + recipeYield).

        Output (validated):
            {
              "tags": [<curated descriptor strings>],
                  # e.g. "high-protein", "weeknight", "one-pan",
                  # "kid-friendly", "leftovers-good", "freezer-friendly"
              "cuisine": "<american|italian|asian|mexican|...|other|unknown>",
              "complexity": "easy|medium|involved",
              "estimated_minutes": <int>,
              "meal_type": "breakfast|lunch|dinner|snack|dessert|side|sauce|drink",
              "primary_protein": "<chicken|beef|pork|fish|tofu|beans|eggs|none|mixed>",
              "primary_carb": "<rice|pasta|bread|potato|tortilla|quinoa|none|mixed>",
              "veg_forward": "veg-forward|mixed|meat-forward",
              "comfort_tier": "<weeknight-easy|comfort|fancy|kid-friendly|...>",
              "season_fit": [<season strings>],
              "summary": "<one-line vibe>",
              "best_for": "<short phrase about when this is the right pick>"
            }

        Cheap call, idempotent: run once per recipe and cache forever
        (or until enrich_version bumps)."""
        # Build a compact recipe summary for the prompt
        ings = recipe.get("recipeIngredient") or []
        ing_lines: list[str] = []
        for i in ings[:30]:
            food = (i.get("food") or {}).get("name") if isinstance(i.get("food"), dict) else None
            qty = i.get("quantity")
            unit = (i.get("unit") or {}).get("name") if isinstance(i.get("unit"), dict) else None
            note = i.get("note") or ""
            line = ""
            if qty not in (None, ""):
                line += f"{qty} "
            if unit:
                line += f"{unit} "
            if food:
                line += food
            elif note:
                line += note
            if line.strip():
                ing_lines.append(line.strip())

        instructions = recipe.get("recipeInstructions") or []
        steps: list[str] = []
        char_budget = 2000
        for step in instructions:
            if not isinstance(step, dict):
                continue
            text = (step.get("text") or "").strip()
            if not text or char_budget <= 0:
                continue
            if len(text) > char_budget:
                text = text[:char_budget] + ""
            steps.append(text)
            char_budget -= len(text)

        prompt = (
            "Given the following recipe, return structured metadata to help "
            "an AI meal planner pick recipes that match user preferences "
            "('high protein week', 'carb load', 'light recovery', etc).\n\n"
            f"NAME: {recipe.get('name') or '(unnamed)'}\n"
            f"DESCRIPTION: {(recipe.get('description') or '').strip()[:400]}\n"
            f"YIELDS: {(recipe.get('recipeYield') or '').strip()[:80]}\n"
            f"INGREDIENTS:\n - " + "\n - ".join(ing_lines or ['(none listed)']) + "\n"
            f"STEPS:\n - " + "\n - ".join(steps or ['(none listed)']) + "\n\n"
            "Output JSON ONLY, no prose:\n"
            "{\n"
            ' "tags": [<curated descriptor strings — pick 3-8 from these or invent close variants: '
            '"high-protein","low-carb","high-carb","low-fat","high-fiber",'
            '"vegetarian","vegan","gluten-free","dairy-free","keto","paleo",'
            '"weeknight","weekend","one-pan","one-pot","sheet-pan","slow-cooker","instant-pot",'
            '"freezer-friendly","leftovers-good","kid-friendly","spicy","mild",'
            '"hearty","light","fresh","comfort","fancy","quick","make-ahead">],\n'
            ' "cuisine": "<american|italian|asian|mexican|mediterranean|indian|french|middle-eastern|other|unknown>",\n'
            ' "complexity": "<easy|medium|involved>",\n'
            ' "estimated_minutes": <int total time including prep>,\n'
            ' "meal_type": "<breakfast|lunch|dinner|snack|dessert|side|sauce|drink>",\n'
            ' "primary_protein": "<chicken|beef|pork|fish|seafood|tofu|tempeh|beans|eggs|cheese|nuts|none|mixed>",\n'
            ' "primary_carb": "<rice|pasta|bread|potato|tortilla|quinoa|noodles|grain|none|mixed>",\n'
            ' "veg_forward": "<veg-forward|mixed|meat-forward>",\n'
            ' "comfort_tier": "<weeknight-easy|hearty-comfort|fancy-occasion|kid-friendly|date-night|crowd-pleaser>",\n'
            ' "season_fit": [<one or more of "spring","summer","fall","winter","year-round">],\n'
            ' "summary": "<one-line vibe — what KIND of meal is this>",\n'
            ' "best_for": "<short phrase: when is this the right pick>"\n'
            "}\n\n"
            "Rules:\n"
            "- Return ONLY the JSON object, no markdown fences, no prose.\n"
            "- Be concrete: 'high-protein' goes in tags ONLY if the recipe genuinely "
            "qualifies (significant meat/eggs/dairy/protein source per serving).\n"
            "- estimated_minutes: best guess from prep + cook implied by steps. Dishes "
            "needing rise/marinade time count that time.\n"
            "- complexity: 'easy' = ≤30 min + ≤7 ingredients + simple technique; "
            "'medium' = 30-90 min OR moderate technique; 'involved' = >90 min OR "
            "advanced technique (lamination, fermentation, multi-component).\n"
            "- summary should describe the vibe / use-case, not just restate the name. "
            "e.g. 'quick weeknight stir-fry with leftover-friendly portions' beats "
            "'chicken stir fry with rice'.\n"
            "- When uncertain on a categorical, use 'unknown' or 'other' rather than guessing."
        )
        result = self.run(prompt, model=model or "sonnet", timeout_secs=90)
        return _extract_recipe_meta(result)

    def fetch_food_info(self, name: str, *, model: str | None = None) -> dict:
        """Ask Sonnet for density + unit class + common size of a single
        food. Returns a dict shaped like:
@@ -390,6 +523,54 @@ class Forge:
        return _extract_food_info(result)


def _extract_recipe_meta(forge_result: dict) -> dict:
    """Validate the recipe metadata blob from Sonnet. Coerces types,
    normalizes enums to lowercase, drops fields not in the schema."""
    if not isinstance(forge_result, dict):
        raise ForgeError("forge result not a dict")
    inner = forge_result.get("result", forge_result)
    if isinstance(inner, str):
        inner = _parse_json_blob(inner)
    if not isinstance(inner, dict):
        raise ForgeError(f"recipe meta not a dict: {str(inner)[:200]}")

    def _str(v, default=""):
        return str(v).strip().lower()[:64] if isinstance(v, str) and v.strip() else default

    def _str_long(v, default=""):
        return str(v).strip()[:300] if isinstance(v, str) and v.strip() else default

    def _str_list(v) -> list[str]:
        if not isinstance(v, list):
            return []
        out = []
        for item in v:
            if isinstance(item, str) and item.strip():
                out.append(item.strip().lower()[:48])
        return out[:12]

    def _int(v, default=0):
        try:
            return max(0, int(v))
        except (TypeError, ValueError):
            return default

    return {
        "tags": _str_list(inner.get("tags")),
        "cuisine": _str(inner.get("cuisine"), "unknown"),
        "complexity": _str(inner.get("complexity"), "medium"),
        "estimated_minutes": _int(inner.get("estimated_minutes")),
        "meal_type": _str(inner.get("meal_type"), "dinner"),
        "primary_protein": _str(inner.get("primary_protein"), "none"),
        "primary_carb": _str(inner.get("primary_carb"), "none"),
        "veg_forward": _str(inner.get("veg_forward"), "mixed"),
        "comfort_tier": _str(inner.get("comfort_tier"), "weeknight-easy"),
        "season_fit": _str_list(inner.get("season_fit")) or ["year-round"],
        "summary": _str_long(inner.get("summary")),
        "best_for": _str_long(inner.get("best_for")),
    }


def _extract_recipe_dedupe_decision(forge_result: dict) -> dict:
    if not isinstance(forge_result, dict):
        raise ForgeError("forge result not a dict")

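The coercion helpers inside `_extract_recipe_meta` are small enough to try on their own. The standalone copies below are a sketch with hypothetical names (`coerce_int`, `coerce_enum`), showing how malformed model output degrades to defaults instead of raising.

```python
def coerce_int(value, default=0):
    """Clamp to a non-negative int; fall back to default on junk input."""
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return default


def coerce_enum(value, default):
    """Lowercase, trim, and cap a categorical string; non-strings get the default."""
    if isinstance(value, str) and value.strip():
        return value.strip().lower()[:64]
    return default
```

Note that these helpers normalize shape only: an unexpected category such as `"brunch"` passes through unchanged, since no membership check against the enum lists is performed.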
@@ -33,7 +33,7 @@ from .config import load
 from .crypto import TokenCrypto
 from .db import DB
 from .forge import Forge, ForgeError
-from . import aggregator, bulk_sterilize, consolidate_foods, dedupe_recipes, foods
+from . import aggregator, bulk_sterilize, consolidate_foods, dedupe_recipes, enrich_recipes, foods
 from .mealie import Mealie, MealieError
 from .oidc import init_oauth
 from .recipe_index import flatten_recipe, refresh_household_index, search_index
@@ -125,6 +125,13 @@ def create_app() -> Flask:
    except Exception as e:
        app.logger.warning("recipe-dedupe stuck-job recovery failed: %s", e)

    try:
        n_failed = db.fail_stuck_enrich_jobs(stale_minutes=15)
        if n_failed:
            app.logger.info("failed %d stuck enrich jobs at boot", n_failed)
    except Exception as e:
        app.logger.warning("enrich stuck-job recovery failed: %s", e)

    oauth = init_oauth(
        app,
        issuer=cfg.oidc_issuer,
@@ -662,9 +669,23 @@ def create_app() -> Flask:
             db.set_plan_preference(plan["id"], preference)
             plan["preference_prompt"] = preference[:1000]

-        # Pull picks (with picker_subs) + recipe pool (slug+name+tags only)
+        # Pull picks + recipe pool. The pool now splices in cauldron_recipe_meta
+        # (Sonnet-generated per-recipe attributes — cuisine, complexity, macros,
+        # meal_type, primary_protein/carb, comfort_tier, summary) so the planner
+        # can match preferences to actual recipe characteristics, not just names.
         picks = db.list_household_picks_with_pickers(hid)
         rows = db.list_indexed_recipes(hid, limit=2000, offset=0)
+        meta_rows = db.list_recipe_meta_for_household(hid)
+        meta_by_slug: dict[str, dict] = {}
+        for mr in meta_rows:
+            blob = mr.get("meta_json")
+            if isinstance(blob, str):
+                try:
+                    meta_by_slug[mr["recipe_slug"]] = _json_loads(blob)
+                except Exception:
+                    pass
+            elif isinstance(blob, dict):
+                meta_by_slug[mr["recipe_slug"]] = blob
         recipes = []
         for r in rows:
             tags = []
@@ -676,7 +697,11 @@ def create_app() -> Flask:
                     raw = None
             if isinstance(raw, dict):
                 tags = raw.get("tags") or []
-            recipes.append({"slug": r["slug"], "name": r["name"], "tags": tags})
+            entry = {"slug": r["slug"], "name": r["name"], "tags": tags}
+            m = meta_by_slug.get(r["slug"])
+            if m:
+                entry["meta"] = m
+            recipes.append(entry)

         if not recipes:
             return jsonify({"error": "no_recipes_indexed"}), 409
@@ -1049,6 +1074,99 @@ def create_app() -> Flask:
        db.finalize_sterilize_job(job_id, state="cancelled")
        return jsonify({"ok": True})

    # ---------- recipe metadata enrichment -----------------------------

    @app.get("/enrich-recipes")
    @require_session
    def enrich_recipes_page():
        hid = current_household_id()
        if not hid:
            return redirect(url_for("connect_mealie_get"))
        latest = db.latest_enrich_job_for_household(hid)
        existing_count = len(db.list_recipe_meta_for_household(hid))
        return render_template(
            "enrich_recipes.html",
            active="enrich",
            latest_job=latest,
            existing_count=existing_count,
        )

    @app.post("/api/recipes/enrich-start")
    @require_session
    def enrich_recipes_start():
        u = session["user"]
        hid = current_household_id()
        if not hid:
            return jsonify({"error": "no household"}), 409
        active = db.running_enrich_job_for_household(hid)
        if active:
            return jsonify({"error": "already_running", "job_id": active["id"]}), 409
        client = current_user_mealie()
        if client is None:
            return redirect(url_for("connect_mealie_get"))
        body = request.get_json(silent=True) or {}
        force = bool(body.get("force"))
        job_id = db.create_enrich_job(household_id=hid, started_by_sub=u["sub"])
        enrich_recipes.spawn_thread(
            db=db, job_id=job_id, household_id=hid,
            mealie=client, forge=forge, force=force,
        )
        return jsonify({"ok": True, "job_id": job_id})

    @app.get("/api/recipes/enrich-status")
    @require_session
    def enrich_recipes_status():
        hid = current_household_id()
        if not hid:
            return jsonify({"error": "no household"}), 409
        job = db.latest_enrich_job_for_household(hid)
        if not job:
            return jsonify({"job": None})
        return jsonify({"job": _consolidate_job_payload(job)})

    @app.post("/api/recipes/enrich-cancel/<int:job_id>")
    @require_session
    def enrich_recipes_cancel(job_id: int):
        hid = current_household_id()
        if not hid:
            return jsonify({"error": "no household"}), 409
        job = db.get_enrich_job(job_id)
        if not job or job["household_id"] != hid:
            return jsonify({"error": "not_found"}), 404
        if job["state"] != "running":
            return jsonify({"error": f"bad_state:{job['state']}"}), 409
        db.finalize_enrich_job(job_id, state="cancelled")
        return jsonify({"ok": True})

    @app.post("/api/admin/recipes/enrich-start")
    @require_bearer
    def admin_enrich_recipes_start():
        body = request.get_json(silent=True) or {}
        sub = (body.get("started_by_sub") or "").strip()
        if not sub:
            return jsonify({"error": "started_by_sub required"}), 400
        hid = db.get_user_household_id(sub)
        if not hid:
            return jsonify({"error": "user has no household"}), 404
        active = db.running_enrich_job_for_household(hid)
        if active:
            return jsonify({"error": "already_running", "job_id": active["id"]}), 409
        blob = db.get_user_mealie_token_blob(sub)
        if not blob:
            return jsonify({"error": "user_not_connected_to_mealie"}), 409
        try:
            tok = crypto.decrypt(blob)
        except Exception:
            return jsonify({"error": "user_token_undecryptable"}), 500
        mealie = Mealie(base_url=cfg.mealie_api_url, api_token=tok)
        force = bool(body.get("force"))
        job_id = db.create_enrich_job(household_id=hid, started_by_sub=sub)
        enrich_recipes.spawn_thread(
            db=db, job_id=job_id, household_id=hid,
            mealie=mealie, forge=forge, force=force,
        )
        return jsonify({"ok": True, "job_id": job_id})

    # ---------- recipe dedupe ------------------------------------------

    @app.get("/dedupe-recipes")

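The plan-generator hookup has to cope with `meta_json` arriving either as a JSON string or as an already-decoded dict, depending on the database driver. A standalone sketch of that decoding step follows; `index_meta_by_slug` is a hypothetical name, `json.loads` stands in for the app's `_json_loads`, and the extra check that the decoded value is a dict is an addition not present in the handler.

```python
import json


def index_meta_by_slug(rows):
    """Map recipe_slug -> metadata dict, silently dropping unreadable rows."""
    by_slug = {}
    for row in rows:
        blob = row.get("meta_json")
        if isinstance(blob, str):
            try:
                parsed = json.loads(blob)
            except ValueError:
                continue  # corrupt JSON: treat the recipe as unenriched
            if isinstance(parsed, dict):
                by_slug[row["recipe_slug"]] = parsed
        elif isinstance(blob, dict):
            by_slug[row["recipe_slug"]] = blob
    return by_slug
```

Dropping a bad row only costs that recipe its enriched pool line; plan generation itself still proceeds.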
@@ -0,0 +1,158 @@
{% extends "_base.html" %}
{% block title %}Enrich Recipes · Cauldron{% endblock %}
{% block content %}
<style>
.progress-rail { width:100%; height:14px; background:var(--bg-2);
border:1px solid var(--line); border-radius:8px; overflow:hidden;
margin:12px 0 6px 0; }
.progress-fill { height:100%;
background:linear-gradient(90deg, var(--purple-deep), var(--purple-bright));
transition:width .3s ease; box-shadow:0 0 12px -2px var(--purple-glow); }
.progress-meta { color:var(--bone-dim); font-family:var(--mono); font-size:12px;
letter-spacing:.1em; display:flex; gap:18px; flex-wrap:wrap; }
.progress-meta strong { color:var(--bone); }
.stat-card { padding:14px; background:var(--bg-2); border:1px solid var(--line);
border-radius:8px; margin-bottom:14px; }
.stat-card .big { font-family:var(--serif); font-size:1.5em; color:var(--bone); }
.stat-card .lbl { color:var(--muted); font-family:var(--mono); font-size:11px;
letter-spacing:.15em; text-transform:uppercase; }
</style>
<div class="page-head">
<div class="crumb">// enrich · per-recipe metadata for smarter planning</div>
<h1>recipe <span class="accent">enrich</span></h1>
<div class="lede">
walk every household recipe and have sonnet generate structured metadata —
cuisine, complexity, macros, meal type, primary protein + carb,
comfort tier, one-line summary. the plan generator uses this so
"high protein week" actually filters the pool, not just biases the vibe.
</div>
</div>
<section class="panel">
<div class="panel-head">
<h2>state</h2>
<span class="pill" id="state-pill">loading…</span>
<span class="ctx" id="state-ctx"></span>
</div>
<div class="stat-card">
<div class="big" id="existing-count">{{ existing_count }}</div>
<div class="lbl">recipes already enriched in this household</div>
</div>
<div id="empty-pane" style="display:none;">
<p>kick off an enrichment run? walks every recipe and skips any already
enriched at the current schema version.</p>
<button class="btn btn-purple" id="start-btn" type="button" onclick="startRun(false)">🪄 enrich recipes</button>
<button class="btn" type="button" onclick="startRun(true)" title="ignore enrich_version cache and re-run all">↻ force re-enrich all</button>
<p class="muted" style="margin-top:8px;">
cost: ~5s/recipe via clawdforge. ~226 recipes ≈ 20 min the first time.
after that, only newly-imported / unenriched recipes process.
</p>
</div>
<div id="progress-pane" style="display:none;">
<div class="progress-rail"><div class="progress-fill" id="bar" style="width:0%;"></div></div>
<div class="progress-meta">
<span><strong id="enriched">0</strong> enriched</span>
<span><strong id="skipped">0</strong> already-enriched</span>
<span><strong id="errors">0</strong> errors</span>
<span>of <strong id="total">?</strong></span>
<span class="muted" id="current-slug"></span>
</div>
<div class="btn-row" style="margin-top:12px;">
<button class="btn" type="button" onclick="cancelJob()">cancel</button>
</div>
</div>
<div id="done-pane" style="display:none;">
<p id="done-line"></p>
<button class="btn btn-purple" type="button" onclick="startRun(false)">↻ enrich newly-imported</button>
<button class="btn" type="button" onclick="startRun(true)">↻ force re-enrich all</button>
</div>
<div id="failed-pane" style="display:none;">
<p style="color:var(--crit);" id="failed-line"></p>
<button class="btn btn-purple" type="button" onclick="startRun(false)">↻ retry</button>
</div>
</section>
<script>
let job = {{ (latest_job | tojson) if latest_job else 'null' }};
let pollTimer = null;
function $(id){return document.getElementById(id);}
function showPane(name){
for(const p of ['empty','progress','done','failed']){
$(`${p}-pane`).style.display = (p === name) ? '' : 'none';
}
}
function setStatePill(text, klass){
const el = $('state-pill'); el.textContent = text;
el.className = 'pill ' + (klass || 'pill-mute');
}
function paint(){
if(!job) return;
const total = job.total_recipes || 0;
const done = (job.enriched_count || 0) + (job.skipped_count || 0) + (job.error_count || 0);
const pct = total>0 ? Math.round((done/total)*100) : 0;
$('bar').style.width = pct+'%';
$('enriched').textContent = job.enriched_count || 0;
$('skipped').textContent = job.skipped_count || 0;
$('errors').textContent = job.error_count || 0;
$('total').textContent = total || '?';
$('current-slug').textContent = job.current_slug ? `· ${job.current_slug}` : '';
}
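async function fetchJob(){
try {
const r = await fetch('/api/recipes/enrich-status');
// a non-2xx response may not carry a JSON body; treat it as a failed poll
if(!r.ok) throw new Error('HTTP ' + r.status);
const d = await r.json();
job = d.job || null; route();
} catch(e){ console.error('status poll failed', e); }
}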
function route(){
if(!job){ stopPoll(); setStatePill('idle','pill-mute'); $('state-ctx').textContent=''; showPane('empty'); return; }
$('state-ctx').textContent = `started ${new Date(job.started_at).toLocaleString()}`;
const s = job.state;
if(s === 'running'){ setStatePill('walking','pill-ok'); paint(); showPane('progress'); startPoll(); }
else if(s === 'done'){
setStatePill('done','pill-mute');
const e = job.enriched_count || 0;
const sk = job.skipped_count || 0;
$('done-line').textContent = `enriched ${e} recipe${e===1?'':'s'} · ${sk} already-current.`;
showPane('done'); stopPoll();
}
else if(s === 'failed'){ setStatePill('failed','pill-mute'); $('failed-line').textContent = job.last_error || 'job failed'; showPane('failed'); stopPoll(); }
else if(s === 'cancelled'){ setStatePill('cancelled','pill-mute'); $('done-line').textContent='job cancelled.'; showPane('done'); stopPoll(); }
}
function startPoll(){ if(pollTimer) return; pollTimer = setInterval(fetchJob, 2000); }
function stopPoll(){ if(pollTimer){ clearInterval(pollTimer); pollTimer=null; } }
async function startRun(force){
const btn = $('start-btn');
if(btn){ btn.disabled = true; btn.textContent = 'kicking off…'; }
try {
const r = await fetch('/api/recipes/enrich-start',{
method:'POST', headers:{'Content-Type':'application/json'},
body: JSON.stringify({force: !!force}),
});
if(!r.ok){ const j = await r.json().catch(()=>({})); throw new Error(j.error || r.status); }
await fetchJob();
} catch(e){
alert('start failed: ' + e.message);
} finally {
// restore the button on success too, so it is usable if the empty pane is shown again
if(btn){ btn.disabled = false; btn.textContent = '🪄 enrich recipes'; }
}
}
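async function cancelJob(){
if(!job) return;
if(!confirm('cancel the running enrichment job?')) return;
try {
const r = await fetch('/api/recipes/enrich-cancel/'+job.id,{method:'POST'});
// surface server-side rejections instead of silently re-polling
if(!r.ok) throw new Error('HTTP ' + r.status);
await fetchJob();
}
catch(e){ alert('cancel failed: '+e.message); }
}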
// route() already calls startPoll() when the job is running
route();
</script>
{% endblock %}
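
The page script above derives the progress bar and the visible pane from the job payload. As a rough sketch of that logic in isolation, the two helpers below mirror `paint()` and `route()` as pure functions. The field names come from the script itself; the helper names `progressPercent` and `paneFor` and the sample numbers are hypothetical, introduced only for illustration.

```javascript
// Sketch only: pure helpers mirroring paint() and route() from the template.
// Field names (total_recipes, enriched_count, ...) are the ones the page reads;
// the function names and the sample job are made up for this example.

function progressPercent(job) {
  const total = job.total_recipes || 0;
  const done =
    (job.enriched_count || 0) +
    (job.skipped_count || 0) +
    (job.error_count || 0);
  // Avoid dividing by zero before the runner has counted the recipes.
  return total > 0 ? Math.round((done / total) * 100) : 0;
}

function paneFor(job) {
  if (!job) return 'empty';
  switch (job.state) {
    case 'running':
      return 'progress';
    case 'done':
    case 'cancelled': // the page reuses the done pane for cancelled jobs
      return 'done';
    case 'failed':
      return 'failed';
    default:
      return null; // unknown state: the page leaves the panes untouched
  }
}

const sample = {
  state: 'running',
  total_recipes: 226,
  enriched_count: 40,
  skipped_count: 10,
  error_count: 0,
};
console.log(progressPercent(sample), paneFor(sample)); // prints: 22 progress
```

Keeping these two decisions free of DOM access would make them easy to unit test, though the template as written inlines them, which is also reasonable for a page this small.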

@ -62,6 +62,9 @@
<p class="muted" style="margin-top:14px;">find duplicate recipes by name + ingredient similarity. sonnet picks the canonical to keep; you confirm per cluster before mealie deletes the others. permanent; review carefully.</p>
<p><a class="btn" href="/dedupe-recipes">🌀 dedupe recipes →</a></p>
<p class="muted" style="margin-top:14px;">have sonnet generate per-recipe metadata: tags, cuisine, complexity, estimated time, primary protein/carb, comfort tier, summary. the plan generator reads this so "high protein week" is a real query, not just a vibe.</p>
<p><a class="btn" href="/enrich-recipes">✨ enrich recipes →</a></p>
</section>
{% endif %}
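
The copy above promises that enriched metadata makes "high protein week" a real query. The actual plan generator runs server-side and, per the commit message, splices the metadata into the recipe pool prompt, so the following is only an illustration of the matching idea, not the author's implementation. It assumes each meta object carries a `tags` array as described in the commit message; the helper name `filterPoolByTags`, the `garlic-bread` entry, and the sample tag values are all hypothetical.

```javascript
// Illustration only: keep recipes whose enriched meta carries every wanted tag.
// filterPoolByTags and the sample data below are hypothetical.

function filterPoolByTags(pool, metaBySlug, wantedTags) {
  return pool.filter((recipe) => {
    const meta = metaBySlug[recipe.slug];
    // Recipes that have not been enriched yet can never match a tag query.
    if (!meta || !Array.isArray(meta.tags)) return false;
    return wantedTags.every((tag) => meta.tags.includes(tag));
  });
}

const pool = [
  { slug: 'chicken-stir-fry', name: 'Chicken Stir Fry' },
  { slug: 'garlic-bread', name: 'Garlic Bread' }, // made-up, unenriched
];

const metaBySlug = {
  'chicken-stir-fry': {
    tags: ['high-protein', 'weeknight', 'one-pan'], // sample values
    complexity: 'easy',
    estimated_minutes: 30,
  },
};

const matches = filterPoolByTags(pool, metaBySlug, ['high-protein']);
console.log(matches.map((r) => r.slug)); // prints: [ 'chicken-stir-fry' ]
```

One consequence worth noting: under this kind of strict matching, unenriched recipes drop out of tag queries entirely, which is a reason to run the enrichment walk before relying on preference filters.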