docs: porting plan — NPE sig/nsig pipeline + globalVar indirection (M1)

Kayos 2026-05-24 11:43:20 -07:00
parent 6035e6db4e
commit a25907b494


@@ -0,0 +1,123 @@
# Porting NPE's player-JS pipeline into rustypipe
**Branch:** `kayos/m1-sig-port`
**Goal:** Replace `src/deobfuscate.rs`'s narrow regex approach with
NewPipeExtractor's full pipeline so the fork keeps working as YouTube
rotates its `player_ias.vflset/.../base.js`.
## The diagnosis
Upstream rustypipe 0.11.4 (June 2025) extracts the signature
deobfuscation function with six regex patterns aimed at the call site
(`var&&(var=SIGFN(decodeURIComponent(var)))`). On current YouTube player
`c2f7551f` (May 2026) all six miss. NewPipeExtractor master's six
patterns also miss on the same file — and NPE-master's nsig (throttling)
pipeline is openly broken (`TeamNewPipe/NewPipeExtractor#1339`, open
since 2026-02-03; the dev branch has had no sig/nsig commits in 60
days). The reason NPE *appears* to work in apps is that the
Innertube paths for Android / iOS / TV clients return stream URLs that
don't carry an obfuscated `s=` signature for most videos — sig deobf
is a fallback the typical playback path never reaches.
Two structural changes have happened since rustypipe was last cut:
1. **The sig fn call site now sometimes takes a numeric prefix arg.**
   New shape: `var&&(var=SIGFN(123,decodeURIComponent(var)))`. NPE's
   regex set has one pattern for this; rustypipe doesn't.
2. **YT routes literal token references through a global string array.**
   Near the top of every recent `player.js`:
   ```js
   var e="startsWith{redirector.googlevideo.com{split{...{decodeURIComponent{...".split("{")
   ```
   Calls then reference `e[N]` instead of the literal symbol. So an
   anchor like `decodeURIComponent` is no longer present at the sig-fn
   call site as text; it appears as `e[37]` (or whatever the index is).
NPE's pipeline handles (1) but not (2). To keep the fork robust we
handle both.
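A minimal sketch of the lookup behind change (2), assuming the array is declared as a double-quoted literal with no escaped quotes and a one-character delimiter. `GlobalStringArray`, `index_of_token` and `anchor_alternation` are illustrative names, not existing rustypipe or NPE APIs, and the struct stands in for the `(var_name, Vec<String>)` tuple planned for M1.3. A real extractor will probably need a regex to cope with other declaration shapes.

```rust
/// Parsed form of `var NAME="a{b{c".split("{")` (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct GlobalStringArray {
    pub var_name: String,
    pub entries: Vec<String>,
}

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'$'
}

/// Find the first `var NAME="...".split("D")` statement in `js`.
/// Assumption: no escaped quotes inside the literal, single-char delimiter.
pub fn extract_global_string_array(js: &str) -> Option<GlobalStringArray> {
    let mut search_from = 0;
    while let Some(rel) = js[search_from..].find("var ") {
        let name_start = search_from + rel + "var ".len();
        search_from = name_start;
        let rest = &js[name_start..];

        // NAME must be a plain identifier directly followed by `="`.
        let eq = rest.find("=\"")?;
        let name = &rest[..eq];
        if name.is_empty() || !name.bytes().all(is_ident_byte) {
            continue;
        }

        // The literal runs up to the next double quote.
        let literal_start = eq + 2;
        let literal_len = rest[literal_start..].find('"')?;
        let literal = &rest[literal_start..literal_start + literal_len];

        // Require `.split("D")` immediately after the closing quote.
        let tail = &rest[literal_start + literal_len + 1..];
        let Some(after) = tail.strip_prefix(".split(\"") else { continue };
        let mut chars = after.chars();
        let Some(delim) = chars.next() else { continue };
        if !chars.as_str().starts_with("\")") {
            continue;
        }

        return Some(GlobalStringArray {
            var_name: name.to_owned(),
            entries: literal.split(delim).map(str::to_owned).collect(),
        });
    }
    None
}

/// Index N such that `arr[N] == token`, if the token is in the array.
pub fn index_of_token(arr: &GlobalStringArray, token: &str) -> Option<usize> {
    arr.entries.iter().position(|e| e == token)
}

/// Regex fragment matching either the literal token or its `NAME[N]` alias.
/// Falls back to the bare token when the array does not contain it.
pub fn anchor_alternation(arr: &GlobalStringArray, token: &str) -> String {
    match index_of_token(arr, token) {
        Some(n) => {
            // `$` is legal in JS identifiers but special in regex syntax.
            let name = arr.var_name.replace('$', "\\$");
            format!("(?:{token}|{name}\\[{n}\\])")
        }
        None => token.to_owned(),
    }
}
```

The fragment returned by `anchor_alternation` is meant to be spliced into the existing call-site patterns wherever they currently hard-code `decodeURIComponent`.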
## What we're porting
| NPE file | Rust target | Notes |
|---|---|---|
| `YoutubeSignatureUtils.java` | `src/deobfuscate.rs` (rewritten) | Sig fn name + body + helper-obj + global-var assembly |
| `YoutubeThrottlingParameterUtils.java` | new `src/deobfuscate/throttling.rs` module | nsig fn name + body + early-return fixup |
| `utils/jsextractor/JavaScriptExtractor.matchToClosingBrace` | new `src/deobfuscate/jslexer.rs` | Find a `name=function` site, walk braces until balanced |
| `YoutubeJavaScriptPlayerManager.java` | already covered by rustypipe's `cache.rs` | We keep rustypipe's cache shape but extend the cached payload to include nsig fn + global var |
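The brace walk in the `jslexer.rs` row can be sketched as follows. This is a simplified stand-in for NPE's lexer-based `matchToClosingBrace`, not a port of it: it only skips quoted strings, so regex literals, comments and `${...}` inside template literals that contain braces will confuse it. The function name and signature are assumptions.

```rust
/// Return the slice of `js` that starts at `start_marker` (for example
/// `"Wka=function"`) and ends at the `}` that balances the first `{`
/// after it. Braces inside '...', "..." and `...` strings are ignored.
pub fn match_to_closing_brace<'a>(js: &'a str, start_marker: &str) -> Option<&'a str> {
    let start = js.find(start_marker)?;
    let bytes = js.as_bytes();
    let mut depth = 0usize;
    let mut quote: Option<u8> = None; // the quote byte we are inside, if any
    let mut i = start;

    while i < bytes.len() {
        let b = bytes[i];
        match quote {
            Some(q) => {
                if b == b'\\' {
                    i += 1; // skip the escaped byte
                } else if b == q {
                    quote = None;
                }
            }
            None => match b {
                b'"' | b'\'' | b'`' => quote = Some(b),
                b'{' => depth += 1,
                b'}' => {
                    // A `}` before any `{` means the marker was not a block start.
                    depth = depth.checked_sub(1)?;
                    if depth == 0 {
                        // `}` is ASCII, so `i + 1` is a valid char boundary.
                        return Some(&js[start..=i]);
                    }
                }
                _ => {}
            },
        }
        i += 1;
    }
    None // input ended before the braces balanced
}
```

Walking bytes rather than chars is safe here because every byte of a multi-byte UTF-8 sequence is non-ASCII and so can never be mistaken for a brace or quote.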
## Pipeline (the desired flow)
```
player.js (string)
  ├── extract_sig_fn_name        // 6+ regex patterns, w/ globalVar[N] retry
  │     │
  │     └── fall back to:        // globalVar[N] indirection
  │           1. extract_global_string_array_indices()
  │           2. find N where arr[N] == "decodeURIComponent"
  │           3. re-run patterns with `(?:decodeURIComponent|globalVar\[N\])`
  ├── extract_sig_fn_body        // lexer brace-walk, regex fallback
  ├── extract_global_var         // var X="...".split("{") (verbatim)
  ├── extract_helper_obj_name    // from inside fn body: [;,]NAME[..
  ├── extract_helper_obj_body    // var NAME={...};
  └── assemble:
        globalVar + ";" + helperObj + ";" + deobfFn + ";" + callerFn
        ── eval in rquickjs ──→ deobf_sig(input) ⇒ deobf(input)

player.js (string)
  ├── extract_nsig_fn_name       // 7 NPE patterns including arr-index variants
  │     │
  │     └── if array variant: resolve var NAME=[fn1,fn2,fnN]
  ├── extract_nsig_fn_body       // lexer brace-walk
  ├── fixup_early_return         // strip `if(typeof X==="undefined")return arg;`
  └── eval in rquickjs ──→ deobf_nsig(input) ⇒ deobf(input)
```
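The `assemble` step of the sig flow could look roughly like this. `SigParts` and `assemble_sig_script` are hypothetical names; the sketch assumes each extractor hands back a complete JS statement (for example `var Wka=function(a){...}`), which may not match what the ported extractors actually return. Only the `deobf_sig` caller name is taken from the diagram.

```rust
/// Pieces pulled out of player.js by the earlier steps (hypothetical type).
pub struct SigParts<'a> {
    /// `var X="...".split("{")`, absent on players without the indirection.
    pub global_var: Option<&'a str>,
    /// `var NAME={...};`
    pub helper_obj: &'a str,
    /// The full deobfuscation function statement.
    pub deobf_fn: &'a str,
    /// Name of the function defined by `deobf_fn`.
    pub deobf_fn_name: &'a str,
}

/// Fixed entry point the Rust side calls after evaluating the script.
pub const CALLER_NAME: &str = "deobf_sig";

/// Join the pieces in dependency order: global array, helper object,
/// deobfuscation function, then a thin caller wrapper.
pub fn assemble_sig_script(parts: &SigParts<'_>) -> String {
    let caller = format!(
        "function {}(input){{return {}(input)}}",
        CALLER_NAME, parts.deobf_fn_name
    );

    let mut pieces: Vec<&str> = Vec::new();
    if let Some(global_var) = parts.global_var {
        pieces.push(global_var);
    }
    pieces.push(parts.helper_obj);
    pieces.push(parts.deobf_fn);
    pieces.push(&caller);

    // Trim trailing semicolons so joining never produces an empty statement.
    pieces
        .iter()
        .map(|p| p.trim().trim_end_matches(';'))
        .filter(|p| !p.is_empty())
        .collect::<Vec<_>>()
        .join(";")
}
```

Keeping the wrapper name fixed means the evaluation side never needs to know the obfuscated function name, which changes with every player build.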
## Milestones
| ID | Subject | Effort | Gate |
|---|---|---|---|
| M1.1 | Port `matchToClosingBrace` (clean brace walker) to `src/deobfuscate/jslexer.rs` | S | Standalone unit test against a tiny `var Wka=function(d){return /,/}/` fixture |
| M1.2 | Replace `get_sig_fn_name` with NPE's 6 patterns (including `(\d+,)decodeURIComponent`) | S | T-1 fixture is the prior-working `9216d1f7` player + new fixture `c2f7551f.js` |
| M1.3 | Add `extract_global_string_array` returning `(var_name, Vec<String>)` | S | unit test for the `var e="…".split("{")` shape |
| M1.4 | Add `extract_helper_obj_name` from fn body + `extract_helper_obj_body` | S | unit test against the `qB={w8:..,EC:..,Np:..}` style fixture |
| M1.5 | Assemble globalVar + helperObj + sigFn + caller; round-trip via rquickjs | M | the existing `t_deobfuscate_sig` test fixture passes via new code path |
| M1.6 | Add globalVar[N] indirection retry to sig fn name extraction | M | new test: a fixture where the call site uses `e[N]` instead of `decodeURIComponent` |
| M1.7 | Port nsig pipeline (`YoutubeThrottlingParameterUtils`) — 7 patterns + array-resolution + early-return fixup | M | port + run NPE's `nsig_tests` table in `tests/sig_tests.rs` |
| M1.8 | Add live integration test downloading current `player.js` and asserting round-trip end-to-end | S | `cargo test --features live -- t_update` |
| M1.9 | Bump `Cargo.toml` to `0.12.0-sulkta.1`, tag, push to `Sulkta-Coop/rustypipe` `kayos/m1-sig-port` | S | clean release |
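For M1.7, the early-return fixup might be sketched like this. It only handles the exact compact shape quoted in the pipeline diagram, `if(typeof X==="undefined")return arg;`, with no optional whitespace, single-quoted strings or `==` variants; NPE's own pattern may accept more shapes. The helper names are assumptions.

```rust
const HEAD: &str = "if(typeof ";
const MID: &str = "===\"undefined\")return ";

/// Length of the leading JS identifier in `s` (ASCII letters, digits, `_`, `$`).
fn ident_len(s: &str) -> usize {
    s.bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_' || *b == b'$')
        .count()
}

/// If `s` starts with `if(typeof X==="undefined")return Y;`, return the
/// byte length of that whole guard statement.
fn guard_len(s: &str) -> Option<usize> {
    let after_head = s.strip_prefix(HEAD)?;
    let n1 = ident_len(after_head);
    if n1 == 0 {
        return None;
    }
    let after_mid = after_head[n1..].strip_prefix(MID)?;
    let n2 = ident_len(after_mid);
    if n2 == 0 {
        return None;
    }
    let remainder = after_mid[n2..].strip_prefix(';')?;
    Some(s.len() - remainder.len())
}

/// Remove every early-return guard of the shape above from `fn_body`,
/// leaving all other code (including other `typeof` checks) untouched.
pub fn fixup_early_return(fn_body: &str) -> String {
    let mut out = String::with_capacity(fn_body.len());
    let mut rest = fn_body;
    while let Some(pos) = rest.find(HEAD) {
        let (before, candidate) = rest.split_at(pos);
        out.push_str(before);
        match guard_len(candidate) {
            // Guard matched: drop it and continue after the `;`.
            Some(len) => rest = &candidate[len..],
            // Not a guard: keep the text and move past this occurrence.
            None => {
                out.push_str(HEAD);
                rest = &candidate[HEAD.len()..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Without this step the guard returns the input unchanged when the function is evaluated outside the full player environment, so the round-trip would appear to succeed while leaving the parameter untransformed.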
## Not in M1 (parking lot)
- Deno / external-JS-runtime swap (yt-dlp's path; we revisit if M1
doesn't hold).
- Caching the assembled deobf code across processes (cookie-jar style
on Android).
- N-tier fallback against multiple geo `player.js` variants if YT ever
splits them.
## Why this is safe-ish to ship
NPE's pipeline is what straw v0.1.0-X currently relies on for the rare
videos that hit the sig path. Porting it 1:1 to Rust gives us a
behavioural baseline equivalent to what NPE provides — no regression
from the Java side. The globalVar[N] indirection added in M1.6 is the
forward-looking piece that handles current `c2f7551f`-style
obfuscation NPE doesn't yet handle. If M1.6 turns out to be unnecessary
(e.g. NPE-dev lands its own fix first), we can bring our patterns back
into parity with NPE's and still keep our generalised resolution layer.
## Tracking
Workspace task IDs:
- `#226` parent — fork + ship the patched fork
- `#230` audit + port the sig pipeline (this milestone)
- `#231` build pipeline + crafting-table integration
When M1 lands, U-2..U-5 revival becomes a `Cargo.toml` dep flip in
`rust/strawcore/` + cherry-pick of the parked commits
(`7ff5ac79e..a13896f5e` on `Sulkta-Coop/straw`).