Commit graph

10 commits

Author SHA1 Message Date
aa07984631 Drop cdylib + staticlib from strawcore-core crate-type
Caught during the cargo-ndk cross-compile — strawcore-core was
emitting its own libstrawcore_core.so (~306 KB per ABI) into Straw's
jniLibs. That .so is never loaded by Android; the wrapper crate's
libstrawcore.so is the only entry point.

An rlib is all that consumer crates need.
2026-05-24 17:40:37 -07:00
56089ffa3e Rename package to strawcore-core
Straw's wrapper crate already owns the name 'strawcore' (and that name
is baked into the Android .so file + Kotlin's System.loadLibrary call).
Renaming this extractor crate to 'strawcore-core' resolves the cargo
package-name collision so both can live in the same workspace dep tree.

Repo name on Gitea stays Sulkta-Coop/strawcore.
2026-05-24 17:28:38 -07:00
f79d8fb109 Phase 6 — Search + Channel + Playlist + LinkHandler
Pulls in the read-side extractor surfaces Straw needs at app open
(search bar) + on detail screens (channel + playlist).

src/youtube/linkhandler/
  * mod.rs       — ACCEPTED_HOSTS allowlist (youtube.com /
                   youtube-nocookie.com / youtu.be / m.youtube.com /
                   music.youtube.com); 27 Invidious mirror hosts
                   intentionally dropped (SPEC §6.6).
  * stream.rs    — extract_video_id() handles /watch?v= / youtu.be/ /
                   /embed/ / /shorts/ / /v/ / /live/ / attribution_link;
                   strict 11-char [A-Za-z0-9_-] validation.
  * channel.rs   — ChannelIdentifier enum (DirectId / Handle / Custom /
                   LegacyUser). Resolution to UC… id lands in
                   youtube/channel.rs.
  * playlist.rs  — extracts ?list=<PLid> from /playlist and /watch URLs.
  * search.rs    — SearchFilter enum + params() opaque base64 strings +
                   uses_music_endpoint() routing flag.
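
A minimal, std-only sketch of the host allowlist plus the strict 11-character check described for stream.rs. It is illustrative rather than the crate's actual code: `is_valid_video_id` is a hypothetical helper name, the manual URL splitting stands in for whatever URL parser the crate uses, and attribution_link handling is left out.

```rust
/// Hypothetical helper: true when `id` is exactly 11 chars from [A-Za-z0-9_-].
fn is_valid_video_id(id: &str) -> bool {
    id.len() == 11
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Sketch of extract_video_id(): returns the id only for allowlisted hosts.
fn extract_video_id(url: &str) -> Option<String> {
    // Hosts taken from the ACCEPTED_HOSTS description above.
    const ACCEPTED_HOSTS: [&str; 5] = [
        "youtube.com",
        "youtube-nocookie.com",
        "youtu.be",
        "m.youtube.com",
        "music.youtube.com",
    ];

    // Drop the scheme, then split host from path + query.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
        .unwrap_or(url);
    let (raw_host, path_and_query) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let host_lc = raw_host.to_ascii_lowercase();
    let host = host_lc.strip_prefix("www.").unwrap_or(&host_lc);
    if !ACCEPTED_HOSTS.contains(&host) {
        return None;
    }

    // Ignore any fragment, then separate path and query string.
    let path_and_query = path_and_query.split('#').next().unwrap_or("");
    let (path, query) = match path_and_query.split_once('?') {
        Some((p, q)) => (p, q),
        None => (path_and_query, ""),
    };

    let candidate = if host == "youtu.be" {
        path.trim_start_matches('/').split('/').next()
    } else if path == "/watch" {
        query.split('&').find_map(|kv| kv.strip_prefix("v="))
    } else {
        ["/embed/", "/shorts/", "/v/", "/live/"]
            .iter()
            .find_map(|prefix| path.strip_prefix(*prefix))
            .and_then(|tail| tail.split('/').next())
    };

    candidate.filter(|id| is_valid_video_id(id)).map(str::to_owned)
}
```

Validating after extraction keeps the per-shape branches small: every URL form funnels into the same 11-character check, so a malformed id is rejected regardless of which path produced it.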

src/youtube/search_extractor.rs
  * search(query, filter) → SearchInfo { query, corrected_query,
                                          videos, continuation_token }
  * Walks twoColumnSearchResultsRenderer → sectionListRenderer →
    itemSectionRenderer → videoRenderer (+ shelfRenderer recursion).
  * Parses YT duration strings, view-count abbreviations ('1.5M views'),
    publishedTimeText, ownerBadges verified flag, badge LIVE flag.
  * Music-search filters route to WEB_REMIX — flagged as not-yet-impl.
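
As an illustration of the duration-string parsing mentioned above, here is a small sketch. The function name `parse_duration_seconds` is hypothetical, and the sketch assumes colon-separated "SS", "MM:SS" or "HH:MM:SS" text only; anything else (including a days component) is rejected rather than guessed at.

```rust
/// Hypothetical helper: parses "SS", "MM:SS" or "HH:MM:SS" into seconds.
/// Returns None for empty, non-numeric, or over-long input.
fn parse_duration_seconds(text: &str) -> Option<u64> {
    let parts: Vec<&str> = text.trim().split(':').collect();
    if parts.len() > 3 {
        return None;
    }
    let mut total: u64 = 0;
    for part in parts {
        // Each component must be plain ASCII digits.
        if part.is_empty() || !part.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        let value: u64 = part.parse().ok()?;
        // Shift the running total up one unit (x60) and add this component.
        total = total.checked_mul(60)?.checked_add(value)?;
    }
    Some(total)
}
```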

src/youtube/suggestion_extractor.rs
  * suggestions(query) → Vec<String> via the suggestqueries-clients6
    endpoint; handles both XSSI-prefixed and bare JSON responses.
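
Handling both response shapes can be as simple as stripping the guard before JSON parsing. The sketch below assumes the guard is the common `)]}'` prefix; the exact prefix the endpoint sends is an assumption here, and `strip_xssi_prefix` is a hypothetical name.

```rust
/// Hypothetical helper: removes a leading XSSI guard (assumed to be ")]}'")
/// and returns the remaining JSON text; bare JSON passes through unchanged.
fn strip_xssi_prefix(body: &str) -> &str {
    let trimmed = body.trim_start();
    match trimmed.strip_prefix(")]}'") {
        Some(rest) => rest.trim_start(),
        None => trimmed,
    }
}
```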

src/youtube/channel.rs
  * resolve_handle_to_channel_id() via /youtubei/v1/navigation/resolve_url
  * channel_info(ChannelIdentifier) → ChannelInfo {
      name, description, avatars, banners, subscriber_count, verified,
      recent_videos, videos_continuation
    }
  * Parses both c4TabbedHeaderRenderer (most common) and the newer
    pageHeaderRenderer flavor.
  * subscriber_count parser handles K/M/B suffixes.
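
A hedged sketch of K/M/B suffix handling, usable for both subscriber counts and view-count abbreviations. `parse_abbreviated_count` is a hypothetical name, and the sketch only looks at the first whitespace-separated token, so trailing words such as "subscribers" are ignored.

```rust
/// Hypothetical helper: parses "1.5M views", "12K subscribers" or "987"
/// into a whole number. Returns None when the first token is not numeric.
fn parse_abbreviated_count(text: &str) -> Option<u64> {
    let token = text.trim().split_whitespace().next()?;
    let token = token.replace(',', "");
    // Suffix letters are single-byte ASCII, so slicing off one byte is safe.
    let (number, multiplier) = match token.chars().last()? {
        'K' | 'k' => (&token[..token.len() - 1], 1_000.0),
        'M' | 'm' => (&token[..token.len() - 1], 1_000_000.0),
        'B' | 'b' => (&token[..token.len() - 1], 1_000_000_000.0),
        _ => (token.as_str(), 1.0),
    };
    let value: f64 = number.parse().ok()?;
    // Reject "inf", "nan" and negative values, which f64 parsing accepts.
    if !value.is_finite() || value < 0.0 {
        return None;
    }
    Some((value * multiplier).round() as u64)
}
```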

src/youtube/playlist_extractor.rs
  * playlist_info(playlist_id) → PlaylistInfo with first-page video
    list + continuation_token. Browses with browseId='VL<id>'.
  * Walks playlistMetadataRenderer + playlistSidebarRenderer + the
    playlistVideoListRenderer.contents[] for video items.

Tests: 121 lib unit pass (+44 since Phase 5). All previous phase smoke
tests still green.

What's left:
* Phase 6 kiosks (Trending etc) — minor, deferred
* Phase 7 — UniFFI surface swap into Straw (Straw repo work)
* Phase 8 — delete rustypipe (Straw repo work)
2026-05-24 17:16:14 -07:00
b4286b8236 Phase 5 — PoTokenProvider trait + stream_extractor wiring
Mirrors NPE PoTokenProvider.java + PoTokenResult.java; defines the
host-injection surface for BotGuard attestation. The Rust crate stays
out of the BotGuard business — embedders (Straw on Android, future
Sulkta CLI via Browserless, etc.) supply their own impl.

src/youtube/potoken/mod.rs
  * PoTokenResult { player_request_po_token, streaming_data_po_token,
                    visitor_data }  + ::new + ::single constructors
  * PoTokenError (Unavailable, MintFailed) — FIX vs NPE: split 'declined'
    (Ok(None)) from 'errored' (Err) so callers can react differently
  * trait PoTokenProvider with 4 client-scoped methods; default impl
    returns Ok(None) so embedders can override just what they support
  * set_po_token_provider / clear_po_token_provider / po_token_provider
    static registration via RwLock<Option<Arc<dyn PoTokenProvider>>>
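
The contract above could look roughly like the following. This is a sketch under assumptions, not the crate's code: the field types, the unit-variant error enum, and the method names `android_po_token` / `ios_po_token` are illustrative (only two of the four client-scoped methods are shown), and lock poisoning is handled with a plain `expect`.

```rust
use std::sync::{Arc, RwLock};

/// Field names follow the commit message; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct PoTokenResult {
    pub player_request_po_token: String,
    pub streaming_data_po_token: Option<String>,
    pub visitor_data: String,
}

/// "Errored" outcomes; "declined" is expressed as Ok(None) instead.
#[derive(Debug, PartialEq)]
pub enum PoTokenError {
    Unavailable,
    MintFailed,
}

/// Method names are hypothetical. Defaults decline, so an embedder only
/// overrides the clients it can actually mint tokens for.
pub trait PoTokenProvider: Send + Sync {
    fn android_po_token(&self, _video_id: &str) -> Result<Option<PoTokenResult>, PoTokenError> {
        Ok(None)
    }
    fn ios_po_token(&self, _video_id: &str) -> Result<Option<PoTokenResult>, PoTokenError> {
        Ok(None)
    }
}

/// Safe default: inherits every declining default method.
pub struct NoopPoTokenProvider;
impl PoTokenProvider for NoopPoTokenProvider {}

// Process-wide slot; RwLock::new is const, so no lazy initialisation is needed.
static PROVIDER: RwLock<Option<Arc<dyn PoTokenProvider>>> = RwLock::new(None);

pub fn set_po_token_provider(provider: Arc<dyn PoTokenProvider>) {
    *PROVIDER.write().expect("provider lock poisoned") = Some(provider);
}

pub fn clear_po_token_provider() {
    *PROVIDER.write().expect("provider lock poisoned") = None;
}

pub fn po_token_provider() -> Option<Arc<dyn PoTokenProvider>> {
    PROVIDER.read().expect("provider lock poisoned").clone()
}
```

Returning a cloned `Arc` from `po_token_provider` means callers never hold the lock while the provider does slow work such as minting a token.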

src/youtube/potoken/noop.rs
  * NoopPoTokenProvider — safe default

src/youtube/stream_extractor.rs
  * resolve_po_token via options-first-then-provider helper
    (options_or_provider)
  * Android branch: pulls player_request_po_token + visitor_data into
    /player body, streams streaming_data_po_token through to URL &pot=
  * iOS branch: same shape, gated on fetch_ios_client AND non-empty
    provider result

Kotlin side (PoTokenWebView lift into Straw via UniFFI's foreign-trait
bridge) is separate work — strawcore just owns the contract.

Tests: 77 lib unit pass (+4 since Phase 4) + 7 Phase 2 offline + 7
Phase 4 offline = 91 green.
2026-05-24 17:10:13 -07:00
a47e142ab7 Phase 4 (complete) — stream_extractor orchestrator
Wire the Android-primary fetch path + JSON-walking + URL post-processing
into a single stream_info(video_id) entry point. Mirrors NPE
YoutubeStreamExtractor.onFetchPage() per audit Track C §1.2.

src/youtube/stream_extractor.rs
  * stream_info(video_id) + stream_info_with(video_id, options)
  * fetch_android — reel endpoint (anonymous) OR /player (with po_token)
  * check_playability_status — maps to ContentUnavailable variants
    (AgeRestricted, GeoRestricted, Paid, Private, YoutubeMusicPremium,
    AccountTerminated, Other)
  * is_player_response_not_valid — decoy-video detection
  * populate_video_details + populate_microformat + populate_streams +
    populate_manifests + populate_captions
  * process_url — sig deobf path (signatureCipher → JS function call)
    + unconditional nsig deobf + cpn append + pot append
  * build_video_progressive / build_video_only / build_audio +
    push_*_dedup helpers (FIX: NPE bug — dedup by itag id, not by
    mediaFormat.id which collides 140/141)
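
To make the dedup fix concrete, here is a minimal sketch of keying on the itag. The `AudioStream` struct is trimmed down to three illustrative fields and `push_audio_dedup` is an assumed name following the push_*_dedup pattern; the real types carry more data.

```rust
/// Trimmed-down stand-in for the real stream type (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct AudioStream {
    itag: u32,
    format: &'static str,
    url: String,
}

/// Pushes `candidate` unless a stream with the same itag is already present.
/// Keying on the itag keeps 140 and 141 apart even though both are M4A,
/// which a format-based key would treat as duplicates.
fn push_audio_dedup(streams: &mut Vec<AudioStream>, candidate: AudioStream) -> bool {
    if streams.iter().any(|s| s.itag == candidate.itag) {
        return false;
    }
    streams.push(candidate);
    true
}
```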

Consolidated stream_helper's local ExtractionError into the crate-wide
exceptions::ExtractionError with a new DownloaderMissing variant.

Tests: 73 lib unit pass (+9 since Phase 4 partial) + 7 new Phase 4 offline
integration tests = 80 green. Live YT end-to-end smoke deferred
to Straw integration; the code path is in place.
2026-05-24 17:08:04 -07:00
cd98673684 Phase 4 (partial) — stream value types + InnerTube /player helpers
Lands the data shapes + the HTTP layer for stream extraction. The
extractor orchestrator + DASH manifest creator are deferred to the
next session — the parsing logic is dense enough to want a focused
pass.

src/stream/
  * mod.rs       — StreamInfo + StreamInfoItem (full + 'card' shapes)
                   mirroring NPE StreamInfo.java + StreamInfoItem.java
  * delivery.rs  — DeliveryMethod (Progressive/Dash/Hls/Torrent)
  * audio.rs     — AudioStream (itag, format, url, bitrate, codec,
                   audio_track_id, content_length, etc.)
  * video.rs     — VideoStream (itag, format, url, resolution, fps,
                   bandwidth, codec, video_only flag)
  * subtitles.rs — SubtitlesStream (url, lang, auto_generated, mime)

src/youtube/stream_helper.rs
  * generate_content_playback_nonce() — 16-char LCG-shuffled cpn
  * get_web_metadata_player_response       (microformat + thumbnails only)
  * get_web_embedded_player_response       (embed-url + signatureTimestamp)
  * get_android_player_response            (full Android /player + poToken)
  * get_android_reel_player_response       (no-poToken fallback)
  * get_ios_player_response                (iOS — flagged with 917 KiB cap
                                            warning in the doc comment)
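
One way a 16-character cpn can be produced with an LCG is sketched below. Everything specific here is an assumption for illustration: the 64-character URL-safe alphabet, the MMIX LCG constants, the clock-based seed, and the helper name `cpn_from_seed`. It is not meant as the crate's exact algorithm.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed alphabet: 64 URL-safe characters, so 6 bits select one symbol.
const CPN_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Hypothetical helper: derives 16 characters from an LCG seeded with `state`.
fn cpn_from_seed(mut state: u64) -> String {
    let mut out = String::with_capacity(16);
    for _ in 0..16 {
        // One LCG step (Knuth's MMIX multiplier and increment).
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // The top 6 bits are the best-mixed ones; they index the alphabet.
        let index = (state >> 58) as usize;
        out.push(CPN_ALPHABET[index] as char);
    }
    out
}

/// Seeds from the wall clock; falls back to 0 if the clock is before 1970.
fn generate_content_playback_nonce() -> String {
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0);
    cpn_from_seed(seed)
}
```

Splitting the seeded core from the clock-based wrapper keeps the generator testable: the same seed always yields the same nonce.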

Per-helper headers + URL shapes match audit Track C §2.7 verbatim:
Android/iOS hit gapis endpoint with mobile UA; WEB family hits
www.youtube.com with the WEB headers.

Tests: 64 lib unit pass (up from 62 in Phase 3).

Next session: full stream_extractor.rs orchestrator + dash_manifest/
creator + Phase 4 done-when smoke (extract NCS Spektrem).
2026-05-24 17:01:03 -07:00
3014410cba Phase 3 — InnerTube + itag
Port the YT client matrix + request envelope + itag lookup table.

src/youtube/
  * constants.rs       — ClientsConstants.java verbatim. All six live
                         clients (WEB, WEB_EMBEDDED_PLAYER,
                         WEB_MUSIC_ANALYTICS, ANDROID, IOS, plus the
                         WEB_REMIX values for completeness). Base URLs
                         + prettyPrint=false suffix.
  * client_request.rs  — ClientInfo / DeviceInfo / InnertubeClientRequestInfo
                         + the 5 factory constructors NPE exposes
                         (ofWebClient, ofWebEmbeddedPlayer, ofCharts,
                         ofAndroid, ofIos). build_envelope() emits the
                         InnerTube JSON in NPE's exact insertion order;
                         build_desktop_envelope() is the WEB-fast-path
                         used by search/browse/next/resolve_url/comments.
  * itag.rs            — 57-entry itag table (14 progressive + 10 audio +
                         33 video-only). MediaFormat enum + ItagType
                         enum + ItagItem struct + lookup().
  * parsing.rs         — consent toggle + cookie generator (SOCS=CAE= /
                         SOCS=CAISAiAD), WEB client-version cache + sw.js
                         scrape, WEB/mobile header builders (mobile
                         deliberately strips X-YouTube-Client-Name +
                         Origin/Referer + Cookie per audit Track A §6.2),
                         android/ios UA templates, visitor_data bootstrap
                         POST to /youtubei/v1/visitor_id.
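
For orientation, an itag table with a lookup() can be sketched as below. Only six widely known itags are listed instead of the full 57, and the field layout of `ItagItem` and the variant names are assumptions rather than the crate's definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaFormat {
    Mpeg4,
    M4a,
    Webm,
    WebmOpus,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ItagType {
    Video, // progressive: audio and video muxed together
    Audio,
    VideoOnly,
}

/// Field layout is illustrative, not the crate's actual struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ItagItem {
    id: u32,
    kind: ItagType,
    format: MediaFormat,
    resolution: Option<&'static str>,
    avg_bitrate_kbps: Option<u32>,
}

// A six-entry excerpt; the table described above has 57 entries.
static ITAGS: &[ItagItem] = &[
    ItagItem { id: 18, kind: ItagType::Video, format: MediaFormat::Mpeg4, resolution: Some("360p"), avg_bitrate_kbps: None },
    ItagItem { id: 22, kind: ItagType::Video, format: MediaFormat::Mpeg4, resolution: Some("720p"), avg_bitrate_kbps: None },
    ItagItem { id: 140, kind: ItagType::Audio, format: MediaFormat::M4a, resolution: None, avg_bitrate_kbps: Some(128) },
    ItagItem { id: 251, kind: ItagType::Audio, format: MediaFormat::WebmOpus, resolution: None, avg_bitrate_kbps: Some(160) },
    ItagItem { id: 137, kind: ItagType::VideoOnly, format: MediaFormat::Mpeg4, resolution: Some("1080p"), avg_bitrate_kbps: None },
    ItagItem { id: 248, kind: ItagType::VideoOnly, format: MediaFormat::Webm, resolution: Some("1080p"), avg_bitrate_kbps: None },
];

/// Linear scan is fine for a table this small.
fn lookup(id: u32) -> Option<&'static ItagItem> {
    ITAGS.iter().find(|item| item.id == id)
}
```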

PARITY notes flagged in code:
  * androidSdkVersion=36 + osVersion=16 but Android-15 in UA — NPE-intentional
  * mobile clients send NO X-YouTube-Client-* headers
  * audit doc says "53 entries" but tallies + NPE source = 57 ItagItems

Tests: 62 lib unit pass (up from 43 in Phase 2). All Phase 1 + Phase 2
smoke still green. Live InnerTube POSTs (visitor_data bootstrap +
/player) deferred to Phase 4 integration.
2026-05-24 16:57:47 -07:00
91639f26d1 Phase 2 — JS deobfuscator (rquickjs + ress)
Port NewPipeExtractor's JS pipeline: player.js fetch + cache, sig and
nsig function extraction, deobfuscation, sticky-error caching.

src/youtube/js/
  * runtime.rs        — rquickjs wrapper (mirrors utils/JavaScript.java)
                        compile_or_throw + run(snippet, name, parameter)
  * lexer.rs          — match_to_closing_brace via the `ress` JS scanner
                        (NPE's lexer is derived from the same crate
                        upstream)
  * extractor.rs      — iframe_api → embed page fallback for player.js
                        URL, regex-driven hash extraction, clean-and-fetch
  * signature.rs      — 6 sig fn name regexes (front-most-recent),
                        deobf-function-body via lexer w/ regex fallback,
                        helper-object + global-string-array extraction,
                        signatureTimestamp, snippet assembler
  * nsig.rs           — 8 nsig fn name regexes (incl. array-indirection),
                        body via lexer w/ regex fallback, fixupFunction
                        early-return strip
  * player_manager.rs — orchestrator + sticky-error cache mirroring
                        YoutubeJavaScriptPlayerManager

PORT DEVIATIONS from NPE (each flagged in code):
  * replaced the 6th sig fn name regex (it used the Java backref \2; Rust's
    `regex` crate has no backreferences, so we substitute a loose form of
    a pattern that NPE itself half-broke per audit Track B §2.1)
  * dropped the Java atomic group `(?>...)` from helper-object regex —
    Rust's NFA is already linear-time
  * nsig fixup substitutes `(?:"undefined"|'undefined')` for the
    \1 backref; harmless loosening
  * sig and nsig assembled snippets prepend `var` — QuickJS rejects
    bare-assignment to undeclared identifiers; NPE relied on Rhino's
    non-strict mode

Tests:
  * 43 lib unit tests (up from 7 in Phase 1)
  * 7 Phase 2 offline integration tests against a hand-crafted
    minified synthetic player.js — exercises the full sig pipeline
    (build_deobfuscator → runtime::run) and nsig fixup_function
  * 7 Phase 1 live smoke tests still green

57/57 total green.
2026-05-24 16:53:19 -07:00
46201c731f Phase 1 — Foundation
Mirror NPE's dependency-free spine in Rust:

* exceptions   — NetworkError + ParsingError + ContentUnavailable
                 + ExtractionError tree, with reqwest/serde_json conversions
* localization — Localization + ContentCountry, default (en, GB)
* downloader/  — Downloader trait, Request builder, Response,
                 reqwest blocking default impl
* page         — continuation-token carrier
* image        — Image + ImageSet + ResolutionLevel
                 (HEIGHT_UNKNOWN/WIDTH_UNKNOWN = -1)
* metainfo     — title/content/url/url_text grab-bag
* service      — StreamingService trait + LinkType + ServiceInfo
* newpipe      — process-global Downloader / Localization /
                 ContentCountry singleton

Foundational invariants nailed down (per SPEC §3):
* HTTP non-2xx returns Ok(Response); only 429 returns Err(NetworkError::Recaptcha)
* Response header keys lowercase-normalized
* Request.add_header PARITY with NPE bug (silent overwrite);
  append_header is our clean addition
* default Localization is en-GB
* No cookie jar in the default downloader

Tests: 7 unit + 7 live smoke against httpbin.org (gated on
'online-tests' feature). All green.
2026-05-24 16:32:36 -07:00
f44b46fab5 Initial commit
2026-05-24 16:26:57 -07:00