audit: fix all critical + high findings before dogfood

Six audit-driven fixes from 2026-05-02 punch list at memory/cauldron-codebase-audit.md.

CRITICAL
- F-1 routes: SSRF guard on /api/discover/scrape-start. Every URL is
  validated via discover_recipes.is_public_url() — parses host, rejects
  IP literals in private/loopback/link-local/multicast/reserved ranges,
  resolves hostnames via getaddrinfo and rejects if any A/AAAA is private.
  Defense-in-depth: _scrape_one re-validates before fetch in case any
  future caller bypasses the route. Rejected URLs are returned in the
  response payload so the user knows which were skipped.
- F-6 domain: prompt-injection mitigation on enrich_recipe + verify_allergens.
  New apply_allergen_safety_override() in forge.py runs regex pattern-
  matching against the raw ingredient text for the SIX anaphylaxis-class
  allergens (peanuts, nuts, shellfish, fish, eggs, sesame, dairy). On
  match, force contains.<allergen>=TRUE regardless of Sonnet output. False
  positives are recoverable; undetected anaphylaxis is not. Pork/soy/
  gluten not auto-overridden (religious/dietary or too-common).

HIGH
- F-2 routes: /api/discover/reject swapped from global status flip to
  per-household scope. New migration 039 cauldron_discover_skips
  (discover_id, household_id, skipped_by_sub, skipped_at) join table.
  list_discovered_recipes default view filters out caller-household
  skips; ?status=skipped surfaces them for unskip. Different households
  have different tastes.
- F-3a routes: /login?next= same-origin validation. Reject anything that
  doesn't start with `/`, AND reject `//evil.example` protocol-relative
  redirects. One-line fix.
- F-10 domain: Sterilizer.apply_recipe ingredient-count guard. Refuse to
  apply if Mealie's current recipeIngredient length differs from the
  preview's proposals length. Python's zip would silently truncate;
  user edits made during the 60-300s Sonnet window now raise
  RuntimeError instead of getting clobbered. Bulk runner already catches
  RuntimeError per-recipe, marks proposal stale.
- F-15 domain: aggregator qty=None safety net. Ingredients with no
  quantity now go to a separate no_qty_items list instead of being
  silently coerced to 0.0 (which then failed the `any(qty for ...)`
  truthiness check and dropped the food off the shopping list). If no
  other line was emitted, write a "qty unspecified" placeholder so the
  food APPEARS on the list. If a sized line WAS emitted, append a
  "+ N ingredient(s) with no quantity" note.

ALSO (one-liners called out in the punch list)
- Migration 029 DROP INDEX gets IF EXISTS — prevents boot-brick on
  partial-failure retry.
- Flavor B prefix prompt rule — Sonnet now told to keep `lib:`/`disc:`
  prefix verbatim; prevents intermittent 502s on the panel just shipped.
- list_discover_eligible_for_group switched from LEFT JOIN to NOT EXISTS
  subqueries — fixes F-5 data (LIMIT-shrink from cross-group import
  multiplication) and adds the per-household skip filter cleanly.

All edits AST-verified. Allergen regex tested with peanut/fish/clean
inputs — flips correctly, preserves Sonnet TRUEs, no over-broad coverage.

Mediums + lows from the audit are tracked in
memory/cauldron-codebase-audit.md and deferred until Cobb hits them
during dogfood.
This commit is contained in:
Kayos 2026-05-02 12:43:04 -07:00
parent 8752fcd340
commit 1c943ec2d8
6 changed files with 422 additions and 46 deletions

View file

@ -187,11 +187,20 @@ def _aggregate_one_food(
meta: dict,
) -> list[ShoppingLine]:
"""All ingredients for ONE food → 1+ ShoppingLines."""
# Bucket by unit class
# Bucket by unit class. Ingredients with qty=None go to a separate
# `no_qty` bucket so they DON'T silently disappear from the shopping
# list when Mealie's parser couldn't extract a number (audit F-15
# domain, 2026-05-02). The killer feature should surface "buy onion"
# even if the source recipe just said "1 onion, chopped" without a
# parseable quantity.
buckets: dict[str, list[tuple[Ingredient, float]]] = {
"mass": [], "volume": [], "count": [], "vague": [], "unknown": [],
}
no_qty_items: list[Ingredient] = []
for ing in items:
if ing.qty is None and (ing.unit or "").strip() == "":
no_qty_items.append(ing)
continue
cls = classify_unit(ing.unit)
buckets[cls].append((ing, ing.qty if ing.qty is not None else 0.0))
@ -202,6 +211,12 @@ def _aggregate_one_food(
for i in items
if (i.original_text or i.qty is not None or i.note)
]
# Pull no-qty original-text contributors into the contribs list so
# they appear under whatever line we emit (or the standalone fallback)
for ing in no_qty_items:
text = ing.original_text or _render(ing)
if text and text not in contribs:
contribs.append(text)
density = float(meta.get("density_g_per_ml") or 0) or None
@ -272,6 +287,25 @@ def _aggregate_one_food(
is_split=True,
))
# qty=None safety net: if every contributor was a no-qty ingredient
# (Mealie's parser couldn't extract a number), nothing else above
# produced a line. Emit a placeholder so the food APPEARS on the
# shopping list — Abby still needs to know to buy onions even if the
# recipe just said "1 onion, chopped". UI surfaces this as a
# "qty unspecified" hint, nudging Cobb to run sterilize.
if no_qty_items and not lines:
lines.append(ShoppingLine(
food=food, qty=None, unit="ea",
contributors=contribs,
notes=notes_acc + ["quantity unspecified — re-sterilize for an exact total"],
))
elif no_qty_items and lines:
# We DID emit a sized line — still flag that some contributors
# had unknown qty so the user knows the total may be incomplete.
lines[0].notes.append(
f"+ {len(no_qty_items)} ingredient(s) with no quantity"
)
return lines

View file

@ -481,10 +481,12 @@ MIGRATIONS = [
""",
# 029 — Old unique key was (plan_id, day) — now needs to include
# meal_type so a Monday can have breakfast AND lunch AND dinner.
# Drop-and-add, idempotent: catch the "doesn't exist" if already done.
# Drop-and-add, idempotent: IF EXISTS so a partial-failure retry
# doesn't brick boot if the index was already dropped before
# schema_migrations recorded the version.
"""
ALTER TABLE cauldron_meal_plan_slots
DROP INDEX uk_plan_day
DROP INDEX IF EXISTS uk_plan_day
""",
"""
ALTER TABLE cauldron_meal_plan_slots
@ -604,6 +606,25 @@ MIGRATIONS = [
FOREIGN KEY (household_id) REFERENCES cauldron_households(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
""",
# 039 — Per-household discover skips. Replaces the global
# `cauldron_discovered_recipes.status='rejected'` write — that flipped
# the row for EVERY household in EVERY group (audit finding F-2 routes,
# 2026-05-02). Different households have different tastes; skip is
# per-household. The global status column stays for spam URLs an admin
# wants to nuke for everyone (set via a future bearer-only endpoint);
# routine "not interested" goes here.
"""
CREATE TABLE IF NOT EXISTS cauldron_discover_skips (
discover_id BIGINT NOT NULL,
household_id BIGINT NOT NULL,
skipped_by_sub VARCHAR(190),
skipped_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (discover_id, household_id),
INDEX idx_household (household_id),
FOREIGN KEY (discover_id) REFERENCES cauldron_discovered_recipes(id) ON DELETE CASCADE,
FOREIGN KEY (household_id) REFERENCES cauldron_households(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
""",
]
@ -2338,44 +2359,95 @@ class DB:
row["imported_at"] = row["imported_at"].isoformat()
return row
def record_discover_skip(
self,
*,
discover_id: int,
household_id: int,
skipped_by_sub: str | None,
) -> None:
"""Per-household 'skip from discover' — INSERT IGNORE so re-clicking
is a no-op. PK is (discover_id, household_id). Only the caller's
household is affected; other households still see the row."""
with self.conn() as c, c.cursor() as cur:
cur.execute(
"""INSERT IGNORE INTO cauldron_discover_skips
(discover_id, household_id, skipped_by_sub)
VALUES (%s, %s, %s)""",
(discover_id, household_id, skipped_by_sub),
)
def unskip_discover(
self, *, discover_id: int, household_id: int
) -> int:
"""Undo a per-household skip. Returns rowcount."""
with self.conn() as c, c.cursor() as cur:
cur.execute(
"""DELETE FROM cauldron_discover_skips
WHERE discover_id=%s AND household_id=%s""",
(discover_id, household_id),
)
return cur.rowcount
def get_skipped_discover_ids_for_household(
self, *, household_id: int
) -> set[int]:
"""Used by /api/discover/search to filter out the caller's
household's skipped rows from the default view."""
with self.conn() as c, c.cursor() as cur:
cur.execute(
"""SELECT discover_id FROM cauldron_discover_skips
WHERE household_id=%s""",
(household_id,),
)
return {r["discover_id"] for r in (cur.fetchall() or [])}
def list_discover_eligible_for_group(
self, *, mealie_group_id: str | None, limit: int = 100
self,
*,
mealie_group_id: str | None,
skipping_household_id: int | None = None,
limit: int = 100,
) -> list[dict]:
"""Return enriched discover rows that no household in the given Mealie
group has imported yet. Used by Flavor B's pool builder — these are
the 'new gems' Hecate can suggest alongside library recipes.
group has imported yet and that the caller's household has not
skipped. Used by Flavor B's pool builder.
Switched 2026-05-02 from LEFT JOIN to NOT EXISTS subqueries to fix
F-5 data (LIMIT-shrink from cross-group import multiplication) and
to add the per-household skip filter cleanly.
If `mealie_group_id` is None we still return the unimported global
list (no group context yet first-boot edge case)."""
clauses = ["d.status = 'enriched'"]
args: list = []
if mealie_group_id:
clauses.append(
"NOT EXISTS (SELECT 1 FROM cauldron_discover_imports i "
"JOIN cauldron_households h ON h.id = i.household_id "
"WHERE i.discover_id = d.id AND h.mealie_group_id = %s)"
)
args.append(mealie_group_id)
else:
clauses.append(
"NOT EXISTS (SELECT 1 FROM cauldron_discover_imports i "
"WHERE i.discover_id = d.id)"
)
if skipping_household_id is not None:
clauses.append(
"NOT EXISTS (SELECT 1 FROM cauldron_discover_skips s "
"WHERE s.discover_id = d.id AND s.household_id = %s)"
)
args.append(int(skipping_household_id))
sql = (
"SELECT d.id, d.slug, d.source_url, d.name, d.description, "
"d.image_url, d.meta_json FROM cauldron_discovered_recipes d "
f"WHERE {' AND '.join(clauses)} "
"ORDER BY d.scraped_at DESC LIMIT %s"
)
args.append(int(limit))
with self.conn() as c, c.cursor() as cur:
if mealie_group_id:
cur.execute(
"""SELECT d.id, d.slug, d.source_url, d.name, d.description,
d.image_url, d.meta_json
FROM cauldron_discovered_recipes d
LEFT JOIN cauldron_discover_imports i
ON i.discover_id = d.id
LEFT JOIN cauldron_households h
ON h.id = i.household_id
AND h.mealie_group_id = %s
WHERE d.status = 'enriched'
AND h.id IS NULL
ORDER BY d.scraped_at DESC
LIMIT %s""",
(mealie_group_id, int(limit)),
)
else:
cur.execute(
"""SELECT d.id, d.slug, d.source_url, d.name, d.description,
d.image_url, d.meta_json
FROM cauldron_discovered_recipes d
LEFT JOIN cauldron_discover_imports i ON i.discover_id = d.id
WHERE d.status = 'enriched'
AND i.discover_id IS NULL
ORDER BY d.scraped_at DESC
LIMIT %s""",
(int(limit),),
)
cur.execute(sql, args)
return list(cur.fetchall() or [])
def get_discovered_recipe(self, discover_id: int) -> dict | None:

View file

@ -22,6 +22,9 @@ import logging
import threading
from urllib.parse import urlparse
import ipaddress as _ipaddr
import socket as _socket
from .db import DB
from .forge import Forge, ForgeError
@ -80,6 +83,67 @@ def list_seeds() -> list[dict]:
return [{"name": k, "count": len(v)} for k, v in SEED_URLS.items()]
def is_public_url(url: str) -> tuple[bool, str]:
"""SSRF guard: validate that `url` resolves to a non-private,
non-loopback, non-link-local IP. Returns (ok, reason).
Used by both `/api/discover/scrape-start` (pre-queue rejection) and
`_scrape_one` (defense-in-depth before fetch). Audit finding F-1
(CRIT, 2026-05-02): without this, any session user could queue URLs
pointing at Lucy's LAN, the docker bridge, or cloud metadata
endpoints (169.254.169.254, etc).
Strategy:
1. Parse the URL. Reject non-http(s) schemes.
2. Extract hostname. Reject if IP-literal pointing at private space.
3. Resolve via getaddrinfo (covers IPv4 + IPv6 + IDNs).
4. For each resolved address, reject if .is_private, .is_loopback,
.is_link_local, .is_multicast, .is_reserved, .is_unspecified.
Note: this is best-effort. A malicious resolver could DNS-rebind
between this check and the actual GET. recipe-scrapers also makes
its own HTTP calls in scrape_me those bypass this guard. Acceptable
for v0.1 (LAN-only deployment, family OIDC). The right answer
long-term is a custom requests transport that re-validates per-
connection like Mealie's pkgs/safehttp."""
from urllib.parse import urlparse
try:
parsed = urlparse(url)
except Exception as e:
return (False, f"unparseable url: {e}")
if parsed.scheme not in ("http", "https"):
return (False, f"scheme not allowed: {parsed.scheme!r}")
host = parsed.hostname
if not host:
return (False, "no hostname in url")
# Reject IP-literals pointing into private / loopback / link-local /
# multicast / reserved space directly (saves a DNS roundtrip too).
try:
ip = _ipaddr.ip_address(host)
if (ip.is_private or ip.is_loopback or ip.is_link_local
or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
return (False, f"ip in restricted range: {ip}")
return (True, "")
except ValueError:
pass # not an IP literal — resolve via DNS
# Resolve and check every returned address.
try:
infos = _socket.getaddrinfo(host, None, type=_socket.SOCK_STREAM)
except _socket.gaierror as e:
return (False, f"dns resolution failed: {e}")
for info in infos:
addr = info[4][0]
try:
ip = _ipaddr.ip_address(addr)
except ValueError:
continue
if (ip.is_private or ip.is_loopback or ip.is_link_local
or ip.is_multicast or ip.is_reserved or ip.is_unspecified):
return (False, f"resolves to restricted ip: {ip} (host={host})")
return (True, "")
def _slug_from_url(url: str) -> str | None:
"""Cheap slug fallback when the scraper doesn't expose one."""
try:
@ -139,6 +203,14 @@ def _to_mealie_shape(scraper, source_url: str) -> dict:
def _scrape_one(url: str) -> tuple[dict, str | None] | None:
"""Scrape a single URL. Returns (mealie_shape_dict, image_url) on
success. Returns None on any unrecoverable scraper error."""
# SSRF defense-in-depth: even though /api/discover/scrape-start
# validates URLs at queue time, re-check here so any future caller
# (cron, admin script, future bulk runner) can't bypass it.
ok, reason = is_public_url(url)
if not ok:
log.warning("[discover] refusing private/restricted url %s (%s)", url, reason)
return None
try:
from recipe_scrapers import scrape_me # type: ignore
except ImportError:

View file

@ -839,7 +839,14 @@ class Forge:
"- When uncertain on a categorical, use 'unknown' or 'other' rather than guessing."
)
result = self.run(prompt, model=model or "sonnet", timeout_secs=180)
return _extract_recipe_meta(result)
meta = _extract_recipe_meta(result)
# Anaphylaxis safety belt: regex-override `contains.*` toward TRUE
# for any anaphylaxis-class allergen detected in the raw ingredient
# text. Defends against prompt-injection via discover-scraped fields.
meta["contains"] = apply_allergen_safety_override(
meta.get("contains"), "\n".join(ing_lines)
)
return meta
def verify_allergens(
self,
@ -904,9 +911,17 @@ class Forge:
try:
result = self.run(prompt, model=model or "sonnet", timeout_secs=60)
except ForgeError:
# Verification failed — fall back to prior contains; better than nothing
return prior_contains or {}
return _extract_allergen_verification(result, prior_contains or {})
# Verification failed — fall back to prior contains; better than nothing.
# Still apply the regex safety belt so we don't trust unverified prior
# data on anaphylaxis-class allergens.
return apply_allergen_safety_override(
prior_contains or {}, "\n".join(ing_lines)
)
verified = _extract_allergen_verification(result, prior_contains or {})
# Anaphylaxis safety belt — regex-override toward TRUE for the
# six anaphylaxis-class allergens. Sonnet's TRUEs are preserved;
# only FALSEs that contradict regex evidence get flipped.
return apply_allergen_safety_override(verified, "\n".join(ing_lines))
def suggest_recipes(
self,
@ -1064,6 +1079,8 @@ class Forge:
'{"suggestions": [{"recipe_slug": "...", "fit_score": 1-5, "reason": "..."}]}\n\n'
"Rules:\n"
f"- Exactly {target} suggestion(s); each recipe_slug MUST be in the pool above\n"
"- recipe_slug values MUST be returned VERBATIM as shown — INCLUDING the "
"`lib:` or `disc:` prefix when present. Do not strip, simplify, or 'clean' these slugs.\n"
"- VARIETY: don't pick 3 of the same cuisine or 3 of the same primary_protein\n"
"- BIAS toward recipes whose meta best fits the family's picker profiles "
"and any week preference/targets stated above\n"
@ -1140,6 +1157,111 @@ class Forge:
return _extract_food_info(result)
# Anaphylaxis-class allergen safety override.
#
# The enrich + verify_allergens prompts interpolate user-controlled fields
# (recipe NAME, DESCRIPTION, INGREDIENTS, STEPS) verbatim into Sonnet's prompt.
# An adversarial discover-scraped recipe can theoretically inject text that
# nudges Sonnet to flip `contains.<allergen>` to FALSE. The audit doc calls
# this F-6 (CRITICAL).
#
# This override is a belt-and-suspenders safety check: after Sonnet produces
# `contains.*` booleans, we run regex pattern-matching on the raw ingredient
# text. For the SIX anaphylaxis-class allergens (peanuts, tree nuts, shellfish,
# fish, eggs, sesame, dairy), if a regex match is found, we force the bool to
# TRUE regardless of what Sonnet said. False positives are recoverable; an
# undetected anaphylaxis-class allergen is not.
#
# Pork/soy/gluten are NOT auto-overridden because they're either
# religious/dietary (pork) or so common that false positives would block
# 90 % of recipes (gluten via flour mentions in step text).
import re as _re
_ALLERGEN_SAFETY_PATTERNS: dict[str, _re.Pattern] = {
"peanuts": _re.compile(
r"\b(peanut|peanuts|peanut\s*butter|peanut\s*oil|groundnut)\b",
_re.IGNORECASE,
),
"nuts": _re.compile(
r"\b(almond|almonds|cashew|cashews|pecan|pecans|walnut|walnuts|"
r"pistachio|pistachios|hazelnut|hazelnuts|brazil\s*nut|"
r"macadamia|pine\s*nut|pine\s*nuts|chestnut|chestnuts|"
r"praline|nutella|frangipane|marzipan|amaretto|almond\s*flour|"
r"almond\s*meal|almond\s*milk|almond\s*butter|cashew\s*butter|"
r"hazelnut\s*spread|tree\s*nut|tree\s*nuts)\b",
_re.IGNORECASE,
),
"shellfish": _re.compile(
r"\b(shrimp|prawn|prawns|lobster|crab|crabmeat|scallop|scallops|"
r"clam|clams|mussel|mussels|oyster|oysters|crayfish|langoustine|"
r"crawfish|krill|squid|calamari|octopus|cuttlefish)\b",
_re.IGNORECASE,
),
"fish": _re.compile(
r"\b(salmon|tuna|cod|tilapia|halibut|anchovy|anchovies|"
r"sardine|sardines|trout|mackerel|snapper|sea\s*bass|"
r"swordfish|monkfish|perch|herring|mahi|mahi-mahi|"
r"flounder|sole|catfish|grouper|haddock|pollock|"
r"sea\s*bream|branzino|barramundi|bonito|kingfish|"
r"fish\s*sauce|nuoc\s*mam|fish\s*stock|fish\s*paste|"
r"worcestershire)\b",
_re.IGNORECASE,
),
"eggs": _re.compile(
r"\b(egg|eggs|egg\s*yolk|egg\s*yolks|egg\s*white|egg\s*whites|"
r"mayonnaise|mayo|aioli|hollandaise|carbonara|meringue|"
r"egg\s*wash|albumen|ovalbumin)\b",
_re.IGNORECASE,
),
"sesame": _re.compile(
r"\b(sesame|tahini|gomashio|gomasio|halva|halvah|benne|"
r"sesame\s*oil|sesame\s*seed|sesame\s*seeds|sesame\s*paste)\b",
_re.IGNORECASE,
),
"dairy": _re.compile(
r"\b(milk|cream|butter|cheese|yogurt|yoghurt|ghee|whey|casein|"
r"mascarpone|ricotta|feta|parmesan|parmigiano|mozzarella|"
r"cheddar|gouda|brie|camembert|gruyere|provolone|asiago|"
r"fontina|manchego|pecorino|goat\s*cheese|cream\s*cheese|"
r"sour\s*cream|crème\s*fraîche|creme\s*fraiche|buttermilk|"
r"half-and-half|half\s*and\s*half|condensed\s*milk|"
r"evaporated\s*milk|powdered\s*milk|dry\s*milk|"
r"clotted\s*cream|crema|leche|burrata|halloumi|paneer|"
r"queso|kefir|skyr|labneh|quark|cottage\s*cheese|"
r"ice\s*cream|gelato|custard)\b",
_re.IGNORECASE,
),
}
def apply_allergen_safety_override(
contains: dict | None, raw_ingredient_text: str
) -> dict:
"""Belt-and-suspenders: force `contains.<allergen>` TRUE for any
anaphylaxis-class allergen where a regex pattern matches the raw
ingredient text. Mitigation for prompt-injection attacks where Sonnet
is talked into setting a real allergen to FALSE.
Inputs:
- `contains`: the dict from Sonnet (may be None or partial)
- `raw_ingredient_text`: joined string of all ingredient names/notes
as scraped from the source. Step text is intentionally excluded
because it's noisier (step text mentions ingredients used vs. not).
Returns: a new dict with anaphylaxis-class allergens forced TRUE on
regex match. Sonnet's other allergen calls (pork/soy/gluten) are
preserved as-is. Original `contains` is not mutated.
"""
out = dict(contains or {})
if not raw_ingredient_text:
return out
for allergen, pat in _ALLERGEN_SAFETY_PATTERNS.items():
if pat.search(raw_ingredient_text):
# Override toward TRUE only — never FALSE-stomp Sonnet's TRUE
out[allergen] = True
return out
def _extract_allergen_verification(forge_result: dict, prior: dict) -> dict:
"""Pull the corrected contains dict out of the verify_allergens reply.
Falls back to prior on any shape problem verification is best-effort."""

View file

@ -284,8 +284,13 @@ def create_app() -> Flask:
@app.get("/login")
def login():
# Stash where to go after login
# Stash where to go after login. Validate same-origin path to
# close the open-redirect surface — `next=https://evil.example/...`
# would otherwise route an authenticated user to an attacker page
# right after OIDC handshake. Audit F-3a routes 2026-05-02.
nxt = request.args.get("next") or "/me"
if not nxt.startswith("/") or nxt.startswith("//") or nxt.startswith("/\\"):
nxt = "/me"
session["post_login_next"] = nxt
return oauth.cauldron.authorize_redirect(cfg.oidc_redirect_uri)
@ -1172,7 +1177,9 @@ def create_app() -> Flask:
my_household = db.get_household(hid)
my_group_id = (my_household or {}).get("mealie_group_id")
discover_rows = db.list_discover_eligible_for_group(
mealie_group_id=my_group_id, limit=80
mealie_group_id=my_group_id,
skipping_household_id=hid,
limit=80,
)
discover_by_id: dict[int, dict] = {}
for d in discover_rows:
@ -2080,7 +2087,9 @@ def create_app() -> Flask:
status_arg = (args.get("status") or "active").strip()
if status_arg == "active":
status: list[str] | str | None = ["enriched", "raw"]
elif status_arg == "all":
elif status_arg in ("all", "skipped"):
# `skipped` is a per-household virtual status, not a DB value
# — pull all rows and let the post-filter handle it.
status = None
else:
status = status_arg
@ -2123,9 +2132,24 @@ def create_app() -> Flask:
db.get_discover_imports_for_group(mealie_group_id=my_group_id)
if my_group_id else {}
)
# Per-household skips (audit F-2 routes — was a global flip).
# Default view filters out rows the caller's household has skipped;
# `?status=skipped` surfaces them so the user can unskip if needed.
my_skipped = (
db.get_skipped_discover_ids_for_household(household_id=my_hid)
if my_hid else set()
)
show_skipped = (status_arg == "skipped")
out = []
for r in rows:
is_skipped = r["id"] in my_skipped
if show_skipped and not is_skipped:
# `?status=skipped` view — only skipped rows for this household.
continue
if not show_skipped and is_skipped and status_arg in ("active", "enriched"):
# Default browse — hide rows this household has skipped.
continue
meta = r.get("meta_json")
if isinstance(meta, str):
try:
@ -2139,6 +2163,7 @@ def create_app() -> Flask:
# scraped_json can be heavy — drop it from list responses
r.pop("scraped_json", None)
r["meta_json"] = meta
r["skipped_by_my_household"] = is_skipped
imp = group_imports.get(r["id"])
if imp:
@ -2195,11 +2220,25 @@ def create_app() -> Flask:
@app.post("/api/discover/reject/<int:discover_id>")
@require_session
def discover_reject(discover_id: int):
"""Per-household 'skip from discover'. Audit F-2 routes 2026-05-02:
previously this flipped the GLOBAL `cauldron_discovered_recipes.status
= 'rejected'` field, hiding the recipe from every household in every
group. Now writes to `cauldron_discover_skips(discover_id, household_id)`
only the caller's household stops seeing it; other households are
unaffected. Different households have different tastes."""
u = session["user"]
row = db.get_discovered_recipe(discover_id)
if not row:
return jsonify({"error": "not_found"}), 404
db.set_discovered_status(discover_id, "rejected")
return jsonify({"ok": True})
hid = current_household_id()
if not hid:
return jsonify({"error": "no_household"}), 409
db.record_discover_skip(
discover_id=discover_id,
household_id=hid,
skipped_by_sub=u["sub"],
)
return jsonify({"ok": True, "scope": "household"})
@app.post("/api/discover/scrape-start")
@require_session
@ -2228,13 +2267,36 @@ def create_app() -> Flask:
urls = [x for x in urls if x.startswith(("http://", "https://"))][:50]
if not urls:
return jsonify({"error": "no valid http(s) urls"}), 400
# SSRF guard (audit F-1 routes, CRIT, 2026-05-02): every URL must
# resolve to a public IP. Reject any that hit private / loopback /
# link-local / multicast / reserved space — those are LAN, docker
# bridge, or cloud metadata endpoints. Apply BEFORE queueing so
# the caller gets a clean error per bad URL.
accepted: list[str] = []
rejected: list[dict] = []
for u_url in urls:
ok, reason = discover_recipes.is_public_url(u_url)
if ok:
accepted.append(u_url)
else:
rejected.append({"url": u_url, "reason": reason})
if not accepted:
return jsonify({
"error": "no_safe_urls",
"rejected": rejected,
}), 400
job_id = db.create_discover_job(
started_by_sub=u["sub"], source_seed=seed_name,
)
discover_recipes.spawn_thread(
db=db, job_id=job_id, forge=forge, urls=urls,
db=db, job_id=job_id, forge=forge, urls=accepted,
)
return jsonify({"ok": True, "job_id": job_id, "urls_queued": len(urls)})
return jsonify({
"ok": True,
"job_id": job_id,
"urls_queued": len(accepted),
"rejected": rejected,
})
@app.get("/api/discover/scrape-status")
@require_session

View file

@ -220,13 +220,27 @@ class Sterilizer:
return {"slug": slug, "updated": 0, "skipped": 0, "created_foods": [], "created_units": []}
recipe = self.mealie.get_recipe(slug)
# Mid-flight edit guard: preview was built from one snapshot of the
# recipe; we just re-fetched. If the user added or removed an
# ingredient in between (Sonnet preview can take 60-300s), Python's
# zip would silently truncate to the shorter list, dropping the
# user's edit OR mis-aligning proposals with new ingredient slots.
# Refuse to apply rather than scribble. Bulk runner catches the
# RuntimeError and marks the proposal stale; user retries cleanly.
mealie_ings = recipe.get("recipeIngredient") or []
if len(mealie_ings) != len(proposals):
raise RuntimeError(
f"recipe edited mid-flight: preview had {len(proposals)} ingredients "
f"but Mealie now has {len(mealie_ings)} — re-run sterilize on this slug"
)
food_index = self._build_name_index(self.mealie.list_foods())
unit_index = self._build_name_index(self.mealie.list_units())
created_foods: list[str] = []
created_units: list[str] = []
new_ingredients: list[dict] = []
for orig_ing, prop in zip(recipe.get("recipeIngredient") or [], proposals):
for orig_ing, prop in zip(mealie_ings, proposals):
# Each proposal can produce 1+ parsed children (fan-out for
# compound inputs like "Toppings (a, b, c)" or "salt and pepper").
# Keep the proposal_json key flexible: prefer parsed_items but