Commit graph

12 commits

Author SHA1 Message Date
7c7151186e channel: extract avatar from pageHeaderRenderer + metadata fallback
Channels on the newer pageHeaderRenderer layout (most channels with a
2024+ refreshed header — WTYP, etc.) were getting empty avatars and
banners because parse_channel_browse only extracted those from the
older c4TabbedHeaderRenderer branch.

Two fixes layered:

1. parse_page_header_avatar() — walks the deep ViewModel nest:
     header.content.pageHeaderViewModel.image
       .decoratedAvatarViewModel.avatar.avatarViewModel.image.sources[]
   Falls back to a couple of shallower nestings YT has used on this
   path historically. Returns ImageSet sorted by height ascending so
   .last() still picks the largest source.

2. metadata.channelMetadataRenderer.avatar.thumbnails[] backfill.
   This field is set whether the header is c4Tabbed or pageHeader, and
   it is the most reliable single avatar source. It is used only when
   both header branches came back empty, so we don't override a
   higher-quality header avatar.

Description-from-metadata extraction folded into the same metadata
walk to avoid walking the JSON tree twice.
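
The selection order described above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: the Image struct, pick_avatars and sort_ascending are hypothetical names, and the JSON walk itself is left out.

```rust
/// Hypothetical stand-in for one avatar source (not the crate's Image type).
#[derive(Debug, Clone, PartialEq)]
pub struct Image {
    pub url: String,
    pub height: i32,
}

/// Sort ascending by height so that `.last()` yields the largest source.
pub fn sort_ascending(mut sources: Vec<Image>) -> Vec<Image> {
    sources.sort_by_key(|img| img.height);
    sources
}

/// Prefer a header avatar (either header flavor); fall back to the metadata
/// avatar only when both header branches came back empty.
pub fn pick_avatars(
    c4_tabbed: Vec<Image>,
    page_header: Vec<Image>,
    metadata: Vec<Image>,
) -> Vec<Image> {
    let header = if c4_tabbed.is_empty() { page_header } else { c4_tabbed };
    let chosen = if header.is_empty() { metadata } else { header };
    sort_ascending(chosen)
}
```

With this ordering a populated header always wins, even when the metadata thumbnails are larger.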
2026-05-25 19:47:46 +00:00
e6fbbb79b4 channel: second-browse to Videos tab + parse lockupViewModel
Found via emulator smoke testing that channelInfo was returning an empty
recent_videos list, breaking the subscriptions feed.

Two root causes:
1. First browse of a channel by browseId lands on the HOME tab in
   2026 YT, not Videos. Home uses sectionListRenderer, not the
   richGridRenderer my parser expected. The Videos tab in the
   response carries an empty content block (you need a SECOND
   browse with the params token to populate it).
2. Channel video items on the Videos tab migrated from
   videoRenderer to lockupViewModel (YT made the switch ~2024).
   My old parser only handled videoRenderer.

Fix:
* fetch_channel_browse now does TWO browses — first for Home
  (header + metadata), second with params='EgZ2aWRlb3PyBgQKAjoA'
  for the Videos tab. Same magic constant NPE uses (audit Track
  A §2.4).
* parse_videos_tab handles BOTH videoRenderer (legacy/fallback)
  AND lockupViewModel (current). lockupViewModel parse extracts:
    - contentId → video ID
    - metadata.lockupMetadataViewModel.title.content → title
    - metadataRows[].metadataParts[].text.content → view-count
      ('1.1m views') + relative-age ('2 years ago') + uploader
    - contentImage.thumbnailViewModel.overlays[]
      .thumbnailBottomOverlayViewModel.badges[]
      .thumbnailBadgeViewModel.text → duration ('3:14:08')
    - contentImage.thumbnailViewModel.image.sources[] → thumbnails
* parse_videos_continuation pulls the continuation token from the
  Videos tab grid for pagination.

Second browse is best-effort: if it fails, recent_videos stays
empty and the channel header still populates from the first.
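
A rough sketch of the best-effort behaviour, assuming the two browses are passed in as closures. ChannelInfo is trimmed to two fields here, and channel_info_best_effort plus the String error type are hypothetical, chosen only to keep the example self-contained.

```rust
/// Trimmed, hypothetical stand-in for the crate's ChannelInfo.
#[derive(Debug, Default, PartialEq)]
pub struct ChannelInfo {
    pub name: String,
    pub recent_videos: Vec<String>,
}

/// The first browse (Home: header + metadata) is required.
/// The second browse (Videos tab) is best-effort.
pub fn channel_info_best_effort<H, V>(
    browse_home: H,
    browse_videos: V,
) -> Result<ChannelInfo, String>
where
    H: FnOnce() -> Result<String, String>,
    V: FnOnce() -> Result<Vec<String>, String>,
{
    // A failed first browse fails the whole call.
    let name = browse_home()?;
    // A failed second browse only leaves recent_videos empty.
    let recent_videos = browse_videos().unwrap_or_default();
    Ok(ChannelInfo { name, recent_videos })
}
```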

Verified the YT response shape by probing live channel
UCwwtUfy0-CqN50HfaFDzL0w (NCS Spektrem) — got 30+ lockup-style
video items with the expected fields.
2026-05-24 20:06:43 -07:00
aa07984631 Drop cdylib + staticlib from strawcore-core crate-type
Caught during the cargo-ndk cross-compile — strawcore-core was
emitting its own libstrawcore_core.so (~306 KB per ABI) into Straw's
jniLibs. That .so is never loaded by Android; the wrapper crate's
libstrawcore.so is the only entry point.

rlib only is what consumer crates need.
2026-05-24 17:40:37 -07:00
56089ffa3e Rename package to strawcore-core
Straw's wrapper crate already owns the name 'strawcore' (and that name
is baked into the Android .so file + Kotlin's System.loadLibrary call).
Renaming this extractor crate to 'strawcore-core' resolves the cargo
package-name collision so both can live in the same workspace dep tree.

Repo name on Gitea stays Sulkta-Coop/strawcore.
2026-05-24 17:28:38 -07:00
f79d8fb109 Phase 6 — Search + Channel + Playlist + LinkHandler
Pulls in the read-side extractor surfaces Straw needs at app open
(search bar) + on detail screens (channel + playlist).

src/youtube/linkhandler/
  * mod.rs       — ACCEPTED_HOSTS allowlist (youtube.com /
                   youtube-nocookie.com / youtu.be / m.youtube.com /
                   music.youtube.com); 27 Invidious mirror hosts
                   intentionally dropped (SPEC §6.6).
  * stream.rs    — extract_video_id() handles /watch?v= / youtu.be/ /
                   /embed/ / /shorts/ / /v/ / /live/ / attribution_link;
                   strict 11-char [A-Za-z0-9_-] validation.
  * channel.rs   — ChannelIdentifier enum (DirectId / Handle / Custom /
                   LegacyUser). Resolution to UC… id lands in
                   youtube/channel.rs.
  * playlist.rs  — extracts ?list=<PLid> from /playlist and /watch URLs.
  * search.rs    — SearchFilter enum + params() opaque base64 strings +
                   uses_music_endpoint() routing flag.
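
As an illustration of the host allowlist plus strict ID validation, here is a std-only sketch. It is an approximation under stated assumptions: stripping a leading "www." is assumed, attribution_link handling is omitted, and the hand-rolled URL splitting stands in for whatever URL parsing the crate actually uses.

```rust
const ACCEPTED_HOSTS: [&str; 5] = [
    "youtube.com",
    "youtube-nocookie.com",
    "youtu.be",
    "m.youtube.com",
    "music.youtube.com",
];

/// Strict check: exactly 11 characters from [A-Za-z0-9_-].
fn is_valid_video_id(id: &str) -> bool {
    id.len() == 11
        && id.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Text up to the next path, query, or fragment delimiter.
fn first_segment(s: &str) -> &str {
    s.split(|c: char| c == '/' || c == '?' || c == '#' || c == '&')
        .next()
        .unwrap_or("")
}

pub fn extract_video_id(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let (raw_host, path_and_query) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, ""),
    };
    // Assumption: "www." is stripped before the allowlist check.
    let lowered = raw_host.to_ascii_lowercase();
    let host = lowered.strip_prefix("www.").unwrap_or(lowered.as_str());
    if !ACCEPTED_HOSTS.iter().any(|h| *h == host) {
        return None;
    }
    let (path, query) = path_and_query
        .split_once('?')
        .unwrap_or((path_and_query, ""));
    let candidate = if host == "youtu.be" {
        first_segment(path.trim_start_matches('/'))
    } else if path == "/watch" {
        query
            .split('#')
            .next()
            .unwrap_or("")
            .split('&')
            .find_map(|pair| pair.strip_prefix("v="))
            .unwrap_or("")
    } else {
        ["/embed/", "/shorts/", "/v/", "/live/"]
            .iter()
            .find_map(|prefix| path.strip_prefix(*prefix))
            .map(first_segment)
            .unwrap_or("")
    };
    if is_valid_video_id(candidate) {
        Some(candidate.to_string())
    } else {
        None
    }
}
```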

src/youtube/search_extractor.rs
  * search(query, filter) → SearchInfo { query, corrected_query,
                                          videos, continuation_token }
  * Walks twoColumnSearchResultsRenderer → sectionListRenderer →
    itemSectionRenderer → videoRenderer (+ shelfRenderer recursion).
  * Parses YT duration strings, view-count abbreviations ('1.5M views'),
    publishedTimeText, ownerBadges verified flag, badge LIVE flag.
  * Music-search filters route to WEB_REMIX — flagged as not-yet-impl.
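
For the duration strings mentioned above, a minimal sketch assuming the usual "M:SS" / "H:MM:SS" shape. parse_duration is a hypothetical name and the crate's parser may accept more forms.

```rust
/// Parse "SS", "M:SS" or "H:MM:SS" into whole seconds.
/// Returns None for anything else (for example a "LIVE" badge).
pub fn parse_duration(text: &str) -> Option<u64> {
    let parts: Vec<&str> = text.trim().split(':').collect();
    if parts.len() > 3 {
        return None;
    }
    let mut total: u64 = 0;
    for part in parts {
        let value: u64 = part.parse().ok()?;
        // Shift the running total one base-60 place, then add this field.
        total = total.checked_mul(60)?.checked_add(value)?;
    }
    Some(total)
}
```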

src/youtube/suggestion_extractor.rs
  * suggestions(query) → Vec<String> via the suggestqueries-clients6
    endpoint; handles both XSSI-prefixed and bare JSON responses.
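
Handling both response shapes could look something like the sketch below. The prefix value is an assumption (the common ")]}'" anti-XSSI guard); the commit does not spell it out, and strip_xssi_prefix is a hypothetical helper name.

```rust
/// Assumed anti-XSSI guard; not confirmed by the commit message.
const XSSI_PREFIX: &str = ")]}'";

/// Return the JSON payload whether or not the body carries the guard prefix.
pub fn strip_xssi_prefix(body: &str) -> &str {
    let trimmed = body.trim_start();
    match trimmed.strip_prefix(XSSI_PREFIX) {
        Some(rest) => rest.trim_start(),
        None => trimmed,
    }
}
```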

src/youtube/channel.rs
  * resolve_handle_to_channel_id() via /youtubei/v1/navigation/resolve_url
  * channel_info(ChannelIdentifier) → ChannelInfo {
      name, description, avatars, banners, subscriber_count, verified,
      recent_videos, videos_continuation
    }
  * Parses both c4TabbedHeaderRenderer (most common) and the newer
    pageHeaderRenderer flavor.
  * subscriber_count parser handles K/M/B suffixes.
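
A sketch of K/M/B suffix handling, assuming the count is the first whitespace-separated token and that suffixes may be upper or lower case. parse_abbreviated_count is a hypothetical name; localized formats are not covered.

```rust
/// Parse strings such as "1.5M views" or "12K subscribers" into a count.
pub fn parse_abbreviated_count(text: &str) -> Option<u64> {
    let token = text.trim().split_whitespace().next()?;
    // Drop thousands separators such as "1,234".
    let token = token.replace(',', "");
    let (number, multiplier) = match token.chars().last()? {
        'k' | 'K' => (&token[..token.len() - 1], 1_000.0),
        'm' | 'M' => (&token[..token.len() - 1], 1_000_000.0),
        'b' | 'B' => (&token[..token.len() - 1], 1_000_000_000.0),
        _ => (token.as_str(), 1.0),
    };
    let value: f64 = number.parse().ok()?;
    if !value.is_finite() || value < 0.0 {
        return None;
    }
    Some((value * multiplier).round() as u64)
}
```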

src/youtube/playlist_extractor.rs
  * playlist_info(playlist_id) → PlaylistInfo with first-page video
    list + continuation_token. Browses with browseId='VL<id>'.
  * Walks playlistMetadataRenderer + playlistSidebarRenderer + the
    playlistVideoListRenderer.contents[] for video items.

Tests: 121 lib unit pass (+44 since Phase 5). All previous phase smoke
tests still green.

What's left:
* Phase 6 kiosks (Trending etc) — minor, deferred
* Phase 7 — UniFFI surface swap into Straw (Straw repo work)
* Phase 8 — delete rustypipe (Straw repo work)
2026-05-24 17:16:14 -07:00
b4286b8236 Phase 5 — PoTokenProvider trait + stream_extractor wiring
Mirrors NPE PoTokenProvider.java + PoTokenResult.java; defines the
host-injection surface for BotGuard attestation. The Rust crate stays
out of the BotGuard business — embedders (Straw on Android, future
Sulkta CLI via Browserless, etc.) supply their own impl.

src/youtube/potoken/mod.rs
  * PoTokenResult { player_request_po_token, streaming_data_po_token,
                    visitor_data }  + ::new + ::single constructors
  * PoTokenError (Unavailable, MintFailed) — FIX vs NPE: split 'declined'
    (Ok(None)) from 'errored' (Err) so callers can react differently
  * trait PoTokenProvider with 4 client-scoped methods; default impl
    returns Ok(None) so embedders can override just what they support
  * set_po_token_provider / clear_po_token_provider / po_token_provider
    static registration via RwLock<Option<Arc<dyn PoTokenProvider>>>

src/youtube/potoken/noop.rs
  * NoopPoTokenProvider — safe default
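
The contract above might be sketched like this. Field types and the two method names are assumptions (the commit only says there are 4 client-scoped methods), and the ::new / ::single constructors are omitted.

```rust
use std::sync::{Arc, RwLock};

/// Field types are assumed; only the field names come from the commit.
#[derive(Debug, Clone, PartialEq)]
pub struct PoTokenResult {
    pub player_request_po_token: String,
    pub streaming_data_po_token: Option<String>,
    pub visitor_data: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PoTokenError {
    Unavailable,
    MintFailed,
}

/// Ok(None) means "declined"; Err means "errored".
pub trait PoTokenProvider: Send + Sync {
    // Hypothetical method names standing in for the client-scoped methods.
    fn android_po_token(&self, _video_id: &str) -> Result<Option<PoTokenResult>, PoTokenError> {
        Ok(None)
    }
    fn ios_po_token(&self, _video_id: &str) -> Result<Option<PoTokenResult>, PoTokenError> {
        Ok(None)
    }
}

/// Safe default: inherits every default method, so it always declines.
pub struct NoopPoTokenProvider;
impl PoTokenProvider for NoopPoTokenProvider {}

static PROVIDER: RwLock<Option<Arc<dyn PoTokenProvider>>> = RwLock::new(None);

pub fn set_po_token_provider(provider: Arc<dyn PoTokenProvider>) {
    *PROVIDER.write().unwrap() = Some(provider);
}

pub fn clear_po_token_provider() {
    *PROVIDER.write().unwrap() = None;
}

pub fn po_token_provider() -> Option<Arc<dyn PoTokenProvider>> {
    PROVIDER.read().unwrap().clone()
}
```

An embedder would implement only the methods it supports and register the provider once at startup.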

src/youtube/stream_extractor.rs
  * resolve_po_token via options-first-then-provider helper
    (options_or_provider)
  * Android branch: pulls player_request_po_token + visitor_data into
    /player body, streams streaming_data_po_token through to URL &pot=
  * iOS branch: same shape, gated on fetch_ios_client AND non-empty
    provider result
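
The options-first-then-provider resolution and the &pot= append could be sketched as below. Signatures are assumptions; append_pot is a hypothetical helper and assumes the token is already URL-safe.

```rust
/// Use the token supplied in the call options if present; otherwise ask the
/// provider. The provider closure only runs when the option is None.
pub fn options_or_provider<F>(from_options: Option<String>, from_provider: F) -> Option<String>
where
    F: FnOnce() -> Option<String>,
{
    from_options.or_else(from_provider)
}

/// Append the streaming token as a `pot` query parameter.
pub fn append_pot(url: &str, streaming_po_token: Option<&str>) -> String {
    match streaming_po_token {
        Some(token) if !token.is_empty() => {
            let separator = if url.contains('?') { '&' } else { '?' };
            format!("{}{}pot={}", url, separator, token)
        }
        // No token: leave the URL untouched.
        _ => url.to_string(),
    }
}
```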

Kotlin side (PoTokenWebView lift into Straw via UniFFI's foreign-trait
bridge) is separate work — strawcore just owns the contract.

Tests: 77 lib unit pass (+4 since Phase 4) + 7 Phase 2 offline + 7
Phase 4 offline = 91 green.
2026-05-24 17:10:13 -07:00
a47e142ab7 Phase 4 (complete) — stream_extractor orchestrator
Wire the Android-primary fetch path + JSON-walking + URL post-processing
into a single stream_info(video_id) entry point. Mirrors NPE
YoutubeStreamExtractor.onFetchPage() per audit Track C §1.2.

src/youtube/stream_extractor.rs
  * stream_info(video_id) + stream_info_with(video_id, options)
  * fetch_android — reel endpoint (anonymous) OR /player (with po_token)
  * check_playability_status — maps to ContentUnavailable variants
    (AgeRestricted, GeoRestricted, Paid, Private, YoutubeMusicPremium,
    AccountTerminated, Other)
  * is_player_response_not_valid — decoy-video detection
  * populate_video_details + populate_microformat + populate_streams +
    populate_manifests + populate_captions
  * process_url — sig deobf path (signatureCipher → JS function call)
    + unconditional nsig deobf + cpn append + pot append
  * build_video_progressive / build_video_only / build_audio +
    push_*_dedup helpers (FIX: NPE bug — dedup by itag id, not by
    mediaFormat.id which collides 140/141)
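
To illustrate the dedup fix: itags 140 and 141 are both M4A, so keying on the media format would drop one of them, while keying on the itag keeps both. A minimal sketch with a hypothetical, trimmed AudioStream:

```rust
/// Trimmed, hypothetical stand-in for the crate's AudioStream.
#[derive(Debug, Clone, PartialEq)]
pub struct AudioStream {
    pub itag: u32,
    pub format: &'static str,
    pub bitrate_kbps: u32,
}

/// Push only when no stream with the same itag is present.
/// Returns true when the candidate was added.
pub fn push_audio_dedup(streams: &mut Vec<AudioStream>, candidate: AudioStream) -> bool {
    if streams.iter().any(|s| s.itag == candidate.itag) {
        return false;
    }
    streams.push(candidate);
    true
}
```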

Consolidated stream_helper's local ExtractionError into the crate-wide
exceptions::ExtractionError with a new DownloaderMissing variant.

Tests: 73 lib unit pass (+9 since Phase 4 partial) + 7 new Phase 4 offline
integration tests = 80 green. Live YT end-to-end smoke deferred
to Straw integration; the code path is in place.
2026-05-24 17:08:04 -07:00
cd98673684 Phase 4 (partial) — stream value types + InnerTube /player helpers
Lands the data shapes + the HTTP layer for stream extraction. The
extractor orchestrator + DASH manifest creator are deferred to the
next session — the parsing logic is dense enough to want a focused
pass.

src/stream/
  * mod.rs       — StreamInfo + StreamInfoItem (full + 'card' shapes)
                   mirroring NPE StreamInfo.java + StreamInfoItem.java
  * delivery.rs  — DeliveryMethod (Progressive/Dash/Hls/Torrent)
  * audio.rs     — AudioStream (itag, format, url, bitrate, codec,
                   audio_track_id, content_length, etc.)
  * video.rs     — VideoStream (itag, format, url, resolution, fps,
                   bandwidth, codec, video_only flag)
  * subtitles.rs — SubtitlesStream (url, lang, auto_generated, mime)

src/youtube/stream_helper.rs
  * generate_content_playback_nonce() — 16-char LCG-shuffled cpn
  * get_web_metadata_player_response       (microformat + thumbnails only)
  * get_web_embedded_player_response       (embed-url + signatureTimestamp)
  * get_android_player_response            (full Android /player + poToken)
  * get_android_reel_player_response       (no-poToken fallback)
  * get_ios_player_response                (iOS — flagged with 917 KiB cap
                                            warning in the doc comment)
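
A rough idea of a 16-character nonce drawn with a linear congruential generator. This is not the crate's generator: the alphabet, the LCG constants and the seed handling are all assumptions made for the sketch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed 64-character URL-safe alphabet.
const CPN_ALPHABET: &[u8] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Build a 16-character nonce from an LCG stream seeded by `seed`.
pub fn generate_cpn(seed: u64) -> String {
    let mut state = seed;
    let mut out = String::with_capacity(16);
    for _ in 0..16 {
        // One LCG step (multiplier and increment are illustrative constants).
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the high bits; the low bits of an LCG are weak.
        let index = (state >> 33) as usize % CPN_ALPHABET.len();
        out.push(CPN_ALPHABET[index] as char);
    }
    out
}

/// Convenience seed from the wall clock (hypothetical helper).
pub fn time_seed() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}
```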

Per-helper headers + URL shapes match audit Track C §2.7 verbatim:
Android/iOS hit gapis endpoint with mobile UA; WEB family hits
www.youtube.com with the WEB headers.

Tests: 64 lib unit pass (up from 62 in Phase 3).

Next session: full stream_extractor.rs orchestrator + dash_manifest/
creator + Phase 4 done-when smoke (extract NCS Spektrem).
2026-05-24 17:01:03 -07:00
3014410cba Phase 3 — InnerTube + itag
Port the YT client matrix + request envelope + itag lookup table.

src/youtube/
  * constants.rs       — ClientsConstants.java verbatim. All six live
                         clients (WEB, WEB_EMBEDDED_PLAYER,
                         WEB_MUSIC_ANALYTICS, ANDROID, IOS, plus the
                         WEB_REMIX values for completeness). Base URLs
                         + prettyPrint=false suffix.
  * client_request.rs  — ClientInfo / DeviceInfo / InnertubeClientRequestInfo
                         + the 5 factory constructors NPE exposes
                         (ofWebClient, ofWebEmbeddedPlayer, ofCharts,
                         ofAndroid, ofIos). build_envelope() emits the
                         InnerTube JSON in NPE's exact insertion order;
                         build_desktop_envelope() is the WEB-fast-path
                         used by search/browse/next/resolve_url/comments.
  * itag.rs            — 57-entry itag table (14 progressive + 10 audio +
                         33 video-only). MediaFormat enum + ItagType
                         enum + ItagItem struct + lookup().
  * parsing.rs         — consent toggle + cookie generator (SOCS=CAE= /
                         SOCS=CAISAiAD), WEB client-version cache + sw.js
                         scrape, WEB/mobile header builders (mobile
                         deliberately strips X-YouTube-Client-Name +
                         Origin/Referer + Cookie per audit Track A §6.2),
                         android/ios UA templates, visitor_data bootstrap
                         POST to /youtubei/v1/visitor_id.
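
The itag table shape might look like the sketch below, with five well-known entries standing in for the full 57. Variant and field names are assumptions, not the crate's definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ItagType {
    Progressive,
    Audio,
    VideoOnly,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MediaFormat {
    Mpeg4,
    M4a,
    WebmOpus,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ItagItem {
    pub id: u32,
    pub itag_type: ItagType,
    pub format: MediaFormat,
    pub label: &'static str,
}

/// Illustrative subset only.
const ITAGS: &[ItagItem] = &[
    ItagItem { id: 18, itag_type: ItagType::Progressive, format: MediaFormat::Mpeg4, label: "360p" },
    ItagItem { id: 22, itag_type: ItagType::Progressive, format: MediaFormat::Mpeg4, label: "720p" },
    ItagItem { id: 140, itag_type: ItagType::Audio, format: MediaFormat::M4a, label: "128k" },
    ItagItem { id: 251, itag_type: ItagType::Audio, format: MediaFormat::WebmOpus, label: "160k" },
    ItagItem { id: 137, itag_type: ItagType::VideoOnly, format: MediaFormat::Mpeg4, label: "1080p" },
];

/// Linear scan; fine for a table of this size.
pub fn lookup(itag: u32) -> Option<&'static ItagItem> {
    ITAGS.iter().find(|item| item.id == itag)
}
```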

PARITY notes flagged in code:
  * androidSdkVersion=36 + osVersion=16 but Android-15 in UA — NPE-intentional
  * mobile clients send NO X-YouTube-Client-* headers
  * audit doc says "53 entries" but tallies + NPE source = 57 ItagItems

Tests: 62 lib unit pass (up from 43 in Phase 2). All Phase 1 + Phase 2
smoke still green. Live InnerTube POSTs (visitor_data bootstrap +
/player) deferred to Phase 4 integration.
2026-05-24 16:57:47 -07:00
91639f26d1 Phase 2 — JS deobfuscator (rquickjs + ress)
Port NewPipeExtractor's JS pipeline: player.js fetch + cache, sig and
nsig function extraction, deobfuscation, sticky-error caching.

src/youtube/js/
  * runtime.rs        — rquickjs wrapper (mirrors utils/JavaScript.java)
                        compile_or_throw + run(snippet, name, parameter)
  * lexer.rs          — match_to_closing_brace via the `ress` JS scanner
                        (NPE's lexer is derived from the same crate
                        upstream)
  * extractor.rs      — iframe_api → embed page fallback for player.js
                        URL, regex-driven hash extraction, clean-and-fetch
  * signature.rs      — 6 sig fn name regexes (front-most-recent),
                        deobf-function-body via lexer w/ regex fallback,
                        helper-object + global-string-array extraction,
                        signatureTimestamp, snippet assembler
  * nsig.rs           — 8 nsig fn name regexes (incl. array-indirection),
                        body via lexer w/ regex fallback, fixupFunction
                        early-return strip
  * player_manager.rs — orchestrator + sticky-error cache mirroring
                        YoutubeJavaScriptPlayerManager
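
One way to picture the sticky-error cache: the first outcome, success or failure, is stored and replayed until the cache is cleared, so a failed extraction is not retried on every call. StickyCache and its String error type are hypothetical; the crate's player_manager may be structured differently.

```rust
use std::sync::Mutex;

/// Caches the first outcome of an expensive extraction, including errors.
pub struct StickyCache<T: Clone> {
    slot: Mutex<Option<Result<T, String>>>,
}

impl<T: Clone> StickyCache<T> {
    pub fn new() -> Self {
        StickyCache { slot: Mutex::new(None) }
    }

    /// Run `init` only if nothing is cached yet; otherwise replay the
    /// cached value or the cached error.
    pub fn get_or_try_init<F>(&self, init: F) -> Result<T, String>
    where
        F: FnOnce() -> Result<T, String>,
    {
        let mut slot = self.slot.lock().unwrap();
        if let Some(cached) = slot.as_ref() {
            return cached.clone();
        }
        let outcome = init();
        *slot = Some(outcome.clone());
        outcome
    }

    /// Forget the cached outcome so the next call tries again.
    pub fn clear(&self) {
        *self.slot.lock().unwrap() = None;
    }
}
```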

PORT DEVIATIONS from NPE (each flagged in code):
  * dropped the backref from the 6th sig fn name regex (Java's \2;
    Rust's `regex` crate is backtracking-free and has no backrefs), so
    we substitute a loose form
    that NPE itself half-broke per audit Track B §2.1)
  * dropped the Java atomic group `(?>...)` from helper-object regex —
    Rust's NFA is already linear-time
  * nsig fixup substitutes `(?:"undefined"|'undefined')` for the
    \1 backref; harmless loosening
  * sig and nsig assembled snippets prepend `var` — QuickJS rejects
    bare-assignment to undeclared identifiers; NPE relied on Rhino's
    non-strict mode

Tests:
  * 43 lib unit tests (up from 7 in Phase 1)
  * 7 Phase 2 offline integration tests against a hand-crafted
    minified synthetic player.js — exercises the full sig pipeline
    (build_deobfuscator → runtime::run) and nsig fixup_function
  * 7 Phase 1 live smoke tests still green

57/57 total green.
2026-05-24 16:53:19 -07:00
46201c731f Phase 1 — Foundation
Mirror NPE's dependency-free spine in Rust:

* exceptions   — NetworkError + ParsingError + ContentUnavailable
                 + ExtractionError tree, with reqwest/serde_json conversions
* localization — Localization + ContentCountry, default (en, GB)
* downloader/  — Downloader trait, Request builder, Response,
                 reqwest blocking default impl
* page         — continuation-token carrier
* image        — Image + ImageSet + ResolutionLevel
                 (HEIGHT_UNKNOWN/WIDTH_UNKNOWN = -1)
* metainfo     — title/content/url/url_text grab-bag
* service      — StreamingService trait + LinkType + ServiceInfo
* newpipe      — process-global Downloader / Localization /
                 ContentCountry singleton

Foundational invariants nailed down (per SPEC §3):
* HTTP non-2xx returns Ok(Response); only 429 throws NetworkError::Recaptcha
* Response header keys lowercase-normalized
* Request.add_header PARITY with NPE bug (silent overwrite);
  append_header is our clean addition
* default Localization is en-GB
* No cookie jar in the default downloader
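
Three of these invariants, sketched with simplified stand-in types. The real Request and Response carry more fields, and into_response is a hypothetical helper, not the crate's API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum NetworkError {
    Recaptcha,
}

#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub headers: HashMap<String, Vec<String>>,
}

#[derive(Debug, Default)]
pub struct Request {
    headers: HashMap<String, Vec<String>>,
}

impl Request {
    /// Parity behaviour: silently replaces any existing values.
    pub fn add_header(&mut self, name: &str, value: &str) -> &mut Self {
        self.headers.insert(name.to_string(), vec![value.to_string()]);
        self
    }

    /// Clean addition: keeps existing values and appends a new one.
    pub fn append_header(&mut self, name: &str, value: &str) -> &mut Self {
        self.headers
            .entry(name.to_string())
            .or_default()
            .push(value.to_string());
        self
    }

    pub fn header_values(&self, name: &str) -> &[String] {
        self.headers.get(name).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

/// Non-2xx statuses still yield Ok(Response); only 429 becomes an error.
/// Header keys are lowercased on the way in.
pub fn into_response(
    status: u16,
    raw_headers: Vec<(String, String)>,
) -> Result<Response, NetworkError> {
    if status == 429 {
        return Err(NetworkError::Recaptcha);
    }
    let mut headers: HashMap<String, Vec<String>> = HashMap::new();
    for (key, value) in raw_headers {
        headers.entry(key.to_ascii_lowercase()).or_default().push(value);
    }
    Ok(Response { status, headers })
}
```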

Tests: 7 unit + 7 live smoke against httpbin.org (gated on
'online-tests' feature). All green.
2026-05-24 16:32:36 -07:00
f44b46fab5 Initial commit
2026-05-24 16:26:57 -07:00