
3 commits

3014410cba Phase 3 — InnerTube + itag
Port the YT client matrix + request envelope + itag lookup table.

src/youtube/
  * constants.rs       — ClientsConstants.java verbatim. All five live
                         clients (WEB, WEB_EMBEDDED_PLAYER,
                         WEB_MUSIC_ANALYTICS, ANDROID, IOS), plus the
                         WEB_REMIX values for completeness. Base URLs
                         + prettyPrint=false suffix.
  * client_request.rs  — ClientInfo / DeviceInfo / InnertubeClientRequestInfo
                         + the 5 factory constructors NPE exposes
                         (ofWebClient, ofWebEmbeddedPlayer, ofCharts,
                         ofAndroid, ofIos). build_envelope() emits the
                         InnerTube JSON in NPE's exact insertion order;
                         build_desktop_envelope() is the WEB-fast-path
                         used by search/browse/next/resolve_url/comments.
  * itag.rs            — 57-entry itag table (14 progressive + 10 audio +
                         33 video-only). MediaFormat enum + ItagType
                         enum + ItagItem struct + lookup().
  * parsing.rs         — consent toggle + cookie generator (SOCS=CAE= /
                         SOCS=CAISAiAD), WEB client-version cache + sw.js
                         scrape, WEB/mobile header builders (mobile
                         deliberately strips X-YouTube-Client-Name +
                         Origin/Referer + Cookie per audit Track A §6.2),
                         android/ios UA templates, visitor_data bootstrap
                         POST to /youtubei/v1/visitor_id.
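A minimal sketch of the itag.rs shape, assuming a linear lookup over a static slice. The field names, the `quality` encoding (pixel height for video, kbps for audio) and the four sample rows are illustrative only; they are not the 57-entry table itself.

```rust
/// Container formats used by the sample rows below (subset, hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MediaFormat {
    Mpeg4,
    M4a,
    WebmaOpus,
}

/// Stream category of an itag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItagType {
    Video,     // progressive: audio and video muxed together
    Audio,
    VideoOnly,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ItagItem {
    pub id: u32,
    pub itag_type: ItagType,
    pub format: MediaFormat,
    /// Pixel height for video itags, bitrate in kbps for audio itags.
    pub quality: u32,
}

const fn entry(id: u32, itag_type: ItagType, format: MediaFormat, quality: u32) -> ItagItem {
    ItagItem { id, itag_type, format, quality }
}

/// Illustrative excerpt: one row per category plus an extra audio row.
static ITAG_TABLE: &[ItagItem] = &[
    entry(18, ItagType::Video, MediaFormat::Mpeg4, 360),
    entry(140, ItagType::Audio, MediaFormat::M4a, 128),
    entry(251, ItagType::Audio, MediaFormat::WebmaOpus, 160),
    entry(137, ItagType::VideoOnly, MediaFormat::Mpeg4, 1080),
];

/// Returns the table row for `id`, or None for an unknown itag.
pub fn lookup(id: u32) -> Option<&'static ItagItem> {
    ITAG_TABLE.iter().find(|e| e.id == id)
}
```

A linear scan is fine at 57 rows; a `match` or sorted slice would only matter if lookups became hot.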

PARITY notes flagged in code:
  * androidSdkVersion=36 + osVersion=16 but Android-15 in UA — NPE-intentional
  * mobile clients send NO X-YouTube-Client-* headers
  * audit doc says "53 entries" but tallies + NPE source = 57 ItagItems
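To make the WEB/mobile header split concrete, a sketch assuming a hypothetical `build_headers(kind, client_version, user_agent, cookie)` helper with lowercase keys. The values (`1` as the WEB client name id, `https://www.youtube.com` for Origin/Referer) stand in for whatever the constants module holds, and the real mobile builder may add headers this sketch leaves out.

```rust
use std::collections::BTreeMap;

/// Hypothetical client kinds; only WEB is treated as a browser.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClientKind {
    Web,
    Android,
    Ios,
}

/// Builds lowercase-keyed request headers for one InnerTube call.
/// WEB gets the browser-like set; mobile clients get only a User-Agent,
/// so no x-youtube-client-*, origin, referer or cookie headers.
pub fn build_headers(
    kind: ClientKind,
    client_version: &str,
    user_agent: &str,
    cookie: &str,
) -> BTreeMap<String, String> {
    let mut headers = BTreeMap::new();
    headers.insert("user-agent".to_string(), user_agent.to_string());
    if kind == ClientKind::Web {
        // Placeholder values; the real ones come from the constants module.
        headers.insert("x-youtube-client-name".to_string(), "1".to_string());
        headers.insert("x-youtube-client-version".to_string(), client_version.to_string());
        headers.insert("origin".to_string(), "https://www.youtube.com".to_string());
        headers.insert("referer".to_string(), "https://www.youtube.com".to_string());
        headers.insert("cookie".to_string(), cookie.to_string());
    }
    headers
}
```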

Tests: 62 lib unit tests pass (up from 43 in Phase 2). All Phase 1 live
smoke and Phase 2 offline integration tests still green. Live InnerTube
POSTs (visitor_data bootstrap + /player) deferred to Phase 4 integration.
2026-05-24 16:57:47 -07:00
91639f26d1 Phase 2 — JS deobfuscator (rquickjs + ress)
Port NewPipeExtractor's JS pipeline: player.js fetch + cache, sig and
nsig function extraction, deobfuscation, sticky-error caching.

src/youtube/js/
  * runtime.rs        — rquickjs wrapper (mirrors utils/JavaScript.java)
                        compile_or_throw + run(snippet, name, parameter)
  * lexer.rs          — match_to_closing_brace via the `ress` JS scanner
                        (NPE's lexer is derived from the same crate
                        upstream)
  * extractor.rs      — iframe_api → embed page fallback for player.js
                        URL, regex-driven hash extraction, clean-and-fetch
  * signature.rs      — 6 sig fn name regexes (front-most-recent),
                        deobf-function-body via lexer w/ regex fallback,
                        helper-object + global-string-array extraction,
                        signatureTimestamp, snippet assembler
  * nsig.rs           — 8 nsig fn name regexes (incl. array-indirection),
                        body via lexer w/ regex fallback, fixupFunction
                        early-return strip
  * player_manager.rs — orchestrator + sticky-error cache mirroring
                        YoutubeJavaScriptPlayerManager
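The sticky-error idea in player_manager.rs can be sketched as a cache that remembers the first outcome, failure included, until cleared. `StickyCache` and its methods are hypothetical names; the slot sits behind a `Mutex` on the assumption that, like NPE's static manager, it is shared process-wide, so the fallible computation runs at most once per clear.

```rust
use std::sync::Mutex;

/// Remembers the first outcome of a fallible computation, errors included,
/// so a failed extraction is not retried on every call until `clear()`.
pub struct StickyCache<T, E> {
    slot: Mutex<Option<Result<T, E>>>,
}

impl<T: Clone, E: Clone> StickyCache<T, E> {
    pub fn new() -> Self {
        Self { slot: Mutex::new(None) }
    }

    /// Returns the cached outcome, or runs `compute` once and caches its result.
    /// The lock is held while `compute` runs, so concurrent callers wait.
    pub fn get_or_try_init(&self, compute: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        let mut slot = self.slot.lock().unwrap();
        if let Some(cached) = &*slot {
            return cached.clone();
        }
        let outcome = compute();
        *slot = Some(outcome.clone());
        outcome
    }

    /// Forgets the cached outcome so the next call recomputes.
    pub fn clear(&self) {
        *self.slot.lock().unwrap() = None;
    }
}
```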

PORT DEVIATIONS from NPE (each flagged in code):
  * replaced the 6th sig fn name regex with a looser form: NPE's pattern
    uses the Java backreference \2, which Rust's `regex` crate cannot
    express (it guarantees linear-time matching and never backtracks);
    per audit Track B §2.1, NPE's own version was already half-broken
  * dropped the Java atomic group `(?>...)` from helper-object regex —
    Rust's NFA is already linear-time
  * nsig fixup substitutes `(?:"undefined"|'undefined')` for the
    \1 backref; harmless loosening
  * sig and nsig assembled snippets prepend `var` — QuickJS rejects
    bare-assignment to undeclared identifiers; NPE relied on Rhino's
    non-strict mode
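In ECMAScript strict mode, assigning to an undeclared identifier throws a ReferenceError, while sloppy mode silently creates a global; the `var` prefix removes that difference. A hypothetical assembler, assuming the extracted helper object and deobfuscation function both arrive as bare `name=...` assignments (the function name and snippet shape are illustrative, not the crate's API):

```rust
/// Hypothetical snippet assembler. `helper` is an extracted helper-object
/// assignment such as `Xy={...}`, `func` an extracted `Ab=function(a){...}`.
/// Each piece becomes a `var` declaration so strict-mode evaluation accepts it.
pub fn assemble_snippet(helper: &str, func: &str) -> String {
    let declare = |piece: &str| {
        // Normalise whitespace and any trailing semicolons first.
        let s = piece.trim().trim_end_matches(';');
        if s.starts_with("var ") {
            format!("{s};")
        } else {
            format!("var {s};")
        }
    };
    format!("{}{}", declare(helper), declare(func))
}
```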

Tests:
  * 43 lib unit tests (up from 7 in Phase 1)
  * 7 Phase 2 offline integration tests against a hand-crafted
    minified synthetic player.js — exercises the full sig pipeline
    (build_deobfuscator → runtime::run) and nsig fixup_function
  * 7 Phase 1 live smoke tests still green

57/57 total green.
2026-05-24 16:53:19 -07:00
46201c731f Phase 1 — Foundation
Mirror NPE's dependency-free spine in Rust:

* exceptions   — NetworkError + ParsingError + ContentUnavailable
                 + ExtractionError tree, with reqwest/serde_json conversions
* localization — Localization + ContentCountry, default (en, GB)
* downloader/  — Downloader trait, Request builder, Response,
                 reqwest blocking default impl
* page         — continuation-token carrier
* image        — Image + ImageSet + ResolutionLevel
                 (HEIGHT_UNKNOWN/WIDTH_UNKNOWN = -1)
* metainfo     — title/content/url/url_text grab-bag
* service      — StreamingService trait + LinkType + ServiceInfo
* newpipe      — process-global Downloader / Localization /
                 ContentCountry singleton
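One way to model the process-global `newpipe` singleton is a lazily initialised `RwLock` behind `std::sync::OnceLock`. This sketch covers only Localization and ContentCountry (the Downloader slot is omitted), and every name in it is hypothetical; it shows the (en, GB) default and a global setter.

```rust
use std::sync::{OnceLock, RwLock};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Localization {
    pub language: String,
    pub country: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContentCountry(pub String);

/// Hypothetical container for the process-wide settings.
struct Globals {
    localization: Localization,
    content_country: ContentCountry,
}

/// Returns the global slot, creating it with the (en, GB) defaults on first use.
fn globals() -> &'static RwLock<Globals> {
    static GLOBALS: OnceLock<RwLock<Globals>> = OnceLock::new();
    GLOBALS.get_or_init(|| {
        RwLock::new(Globals {
            localization: Localization { language: "en".into(), country: "GB".into() },
            content_country: ContentCountry("GB".into()),
        })
    })
}

pub fn set_localization(l: Localization) {
    globals().write().unwrap().localization = l;
}

pub fn localization() -> Localization {
    globals().read().unwrap().localization.clone()
}

pub fn content_country() -> ContentCountry {
    globals().read().unwrap().content_country.clone()
}
```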

Foundational invariants nailed down (per SPEC §3):
* HTTP non-2xx returns Ok(Response); only 429 returns Err(NetworkError::Recaptcha)
* Response header keys lowercase-normalized
* Request.add_header keeps PARITY with NPE's bug (adding a header name
  twice silently overwrites the earlier value); append_header is our
  clean addition
* default Localization is en-GB
* No cookie jar in the default downloader
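A std-only sketch of three of these invariants: the add/append header split, lowercase response header keys, and 429 as the only status that becomes an error. The `Request`, `Response` and `NetworkError::Recaptcha { url }` shapes and the `finish` helper are assumptions, not the crate's actual types, and no reqwest call is made.

```rust
use std::collections::BTreeMap;

#[derive(Default, Debug)]
pub struct Request {
    headers: BTreeMap<String, Vec<String>>,
}

impl Request {
    /// PARITY: replaces any existing values for `name` (NPE's overwrite bug).
    pub fn add_header(mut self, name: &str, value: &str) -> Self {
        self.headers.insert(name.to_string(), vec![value.to_string()]);
        self
    }

    /// Clean addition: keeps existing values and appends another one.
    pub fn append_header(mut self, name: &str, value: &str) -> Self {
        self.headers.entry(name.to_string()).or_default().push(value.to_string());
        self
    }

    pub fn header_values(&self, name: &str) -> &[String] {
        self.headers.get(name).map(Vec::as_slice).unwrap_or(&[])
    }
}

#[derive(Debug, PartialEq)]
pub enum NetworkError {
    Recaptcha { url: String },
}

pub struct Response {
    pub status: u16,
    pub headers: BTreeMap<String, Vec<String>>,
}

/// Non-2xx statuses still produce Ok(Response); only HTTP 429 is an error.
/// Header keys are lowercased; repeated keys keep every value.
pub fn finish(url: &str, status: u16, raw_headers: Vec<(String, String)>) -> Result<Response, NetworkError> {
    if status == 429 {
        return Err(NetworkError::Recaptcha { url: url.to_string() });
    }
    let mut headers: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (k, v) in raw_headers {
        headers.entry(k.to_ascii_lowercase()).or_default().push(v);
    }
    Ok(Response { status, headers })
}
```

Keeping a `Vec<String>` per key means repeated response headers such as Set-Cookie survive normalisation instead of collapsing to one value.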

Tests: 7 unit + 7 live smoke against httpbin.org (gated on
'online-tests' feature). All green.
2026-05-24 16:32:36 -07:00