Phase 1 — Foundation
Mirror NPE's dependency-free spine in Rust:
  * exceptions — NetworkError + ParsingError + ContentUnavailable
    + ExtractionError tree, with reqwest/serde_json conversions
  * localization — Localization + ContentCountry, default (en, GB)
  * downloader/ — Downloader trait, Request builder, Response,
    reqwest blocking default impl
  * page — continuation-token carrier
  * image — Image + ImageSet + ResolutionLevel
    (HEIGHT_UNKNOWN/WIDTH_UNKNOWN = -1)
  * metainfo — title/content/url/url_text grab-bag
  * service — StreamingService trait + LinkType + ServiceInfo
  * newpipe — process-global Downloader / Localization /
    ContentCountry singleton

Foundational invariants nailed down (per SPEC §3):
  * HTTP non-2xx returns Ok(Response); only 429 throws NetworkError::Recaptcha
  * Response header keys lowercase-normalized
  * Request.add_header PARITY with NPE bug (silent overwrite);
    append_header is our clean addition
  * default Localization is en-GB
  * No cookie jar in the default downloader

Tests: 7 unit + 7 live smoke against httpbin.org (gated on
'online-tests' feature). All green.
parent f44b46fab5
commit 46201c731f
16 changed files with 2689 additions and 1 deletions
1528
Cargo.lock
generated
Normal file
File diff suppressed because it is too large
40
Cargo.toml
Normal file
@@ -0,0 +1,40 @@
[package]
name = "strawcore"
version = "0.1.0"
edition = "2021"
license = "GPL-3.0-or-later"
authors = ["Sulkta-Coop"]
repository = "http://192.168.0.5:3001/Sulkta-Coop/strawcore"
description = "Rust port of NewPipeExtractor (YT-only). Plugs into Straw via UniFFI."

[lib]
crate-type = ["rlib", "cdylib", "staticlib"]

[dependencies]
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls-webpki-roots", "blocking", "gzip"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "1"
parking_lot = "0.12"
url = "2"
once_cell = "1"

[dev-dependencies]
serde_json = "1"

[features]
default = []
# `online-tests` gates network-dependent integration tests. Enable with
# `cargo test --features online-tests` once an internet route is available.
online-tests = []

[profile.release]
strip = true
lto = "thin"
codegen-units = 1
panic = "abort"
opt-level = "z"

[profile.dev]
opt-level = 0
debug = 1
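The `online-tests` feature declared above is empty; it exists only to switch test code on. A minimal sketch of how a test module might be gated on it follows. The function and module names are hypothetical and are not taken from the crate.

```rust
/// Reports whether this build was compiled with the `online-tests` feature.
/// `cfg!` is evaluated at compile time, so the check costs nothing at runtime.
pub fn online_tests_enabled() -> bool {
    cfg!(feature = "online-tests")
}

// Hypothetical live-test module: it is compiled only for
// `cargo test --features online-tests`, so offline runs never need a network.
#[cfg(all(test, feature = "online-tests"))]
mod online {
    #[test]
    fn httpbin_is_reachable() {
        // A real test would drive the crate's downloader here.
    }
}
```

With the feature off, the `online` module is removed before type checking, so it cannot break an offline build.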
35
README.md
@@ -1,3 +1,36 @@
 # strawcore
 
-Rust port of NewPipeExtractor (YT-only). Plugs into Straw via UniFFI.
+Rust port of [NewPipeExtractor](https://github.com/TeamNewPipe/NewPipeExtractor) (v0.26.2), YouTube-only. Plugs into [Straw](http://192.168.0.5:3001/Sulkta-Coop/straw) via UniFFI.
+
+## Why this exists
+
+`rustypipe` regex-parses YouTube's `player.js` and reimplements the signature deobfuscator in Rust. Every YT player rotation breaks it. NPE embeds Mozilla Rhino and executes the JS function live — resilient by design, and that's the architecture we're mirroring.
+
+The rustypipe-backed Straw build (vc=15..17) also routed playback through iOS-progressive URLs, which hit a server-side ~917 KiB end-byte cap. NPE uses the Android client + po_token → DASH manifest path, which doesn't see the cap. Same fix, different layer.
+
+See `memory/npe-audit-2026-05-24/SPEC.md` in the workspace repo for the full plan.
+
+## Status
+
+| Phase | Subsystem | Status |
+|---|---|---|
+| 1 | Foundation (downloader + service spine) | **in progress** |
+| 2 | JS engine (rquickjs + ress) | pending |
+| 3 | InnerTube + itag table | pending |
+| 4 | Stream extractor + DASH | pending |
+| 5 | PoTokenProvider trait + Android JNI bridge | pending |
+| 6 | Search + Channel + Playlist + Kiosks | pending |
+| 7 | UniFFI surface swap | pending |
+| 8 | Delete rustypipe everywhere | pending |
+
+## Build + test
+
+```bash
+cargo build
+cargo test --lib                    # offline unit tests
+cargo test --features online-tests  # full smoke incl. live httpbin.org
+```
+
+## License
+
+GPL-3.0-or-later. NPE is GPL-3.0; this port inherits.
97
src/downloader/default_impl.rs
Normal file
@@ -0,0 +1,97 @@
// reqwest-backed default Downloader.
//
// Mirrors NewPipe-app's OkHttpDownloaderImpl behavior:
//   * blocking (mirrors NPE's sync surface; async is deferred to a later
//     phase that threads tokio through the whole tree)
//   * no cookie jar — apps hand-build Cookie headers
//   * up to 10 redirects, ~30s timeout
//   * HTTP 429 → NetworkError::Recaptcha; all other status codes surface
//     as Ok(Response)

use std::time::Duration;

use reqwest::blocking::Client;
use reqwest::redirect::Policy;

use crate::downloader::request::{Method, Request};
use crate::downloader::response::{Headers as RespHeaders, Response};
use crate::downloader::Downloader;
use crate::exceptions::NetworkError;

const DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);
const MAX_REDIRECTS: usize = 10;
const USER_AGENT: &str =
    "Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Mobile Safari/537.36";

pub struct ReqwestDownloader {
    client: Client,
}

impl ReqwestDownloader {
    pub fn new() -> Result<Self, NetworkError> {
        let client = Client::builder()
            .user_agent(USER_AGENT)
            .timeout(DEFAULT_TIMEOUT)
            .redirect(Policy::limited(MAX_REDIRECTS))
            .gzip(true)
            .build()?;
        Ok(Self { client })
    }

    pub fn with_client(client: Client) -> Self {
        Self { client }
    }
}

impl Downloader for ReqwestDownloader {
    fn execute(&self, request: Request) -> Result<Response, NetworkError> {
        let method = match request.method() {
            Method::Get => reqwest::Method::GET,
            Method::Head => reqwest::Method::HEAD,
            Method::Post => reqwest::Method::POST,
            Method::Put => reqwest::Method::PUT,
            Method::Delete => reqwest::Method::DELETE,
        };

        let mut builder = self.client.request(method, request.url());

        for (name, values) in request.headers() {
            for value in values {
                builder = builder.header(name, value);
            }
        }

        if let Some(loc) = request.localization() {
            if request.automatic_localization_header() {
                builder = builder.header("Accept-Language", loc.localization_code());
            }
        }

        if let Some(body) = request.body() {
            builder = builder.body(body.to_vec());
        }

        let resp = builder.send()?;

        let status = resp.status();
        let url_after_redirects = resp.url().to_string();

        if status.as_u16() == 429 {
            return Err(NetworkError::Recaptcha { url: url_after_redirects });
        }

        let code = status.as_u16();
        let message = status.canonical_reason().unwrap_or("").to_string();

        let mut headers: RespHeaders = RespHeaders::new();
        for (name, value) in resp.headers().iter() {
            let key = name.as_str().to_ascii_lowercase();
            let val = value.to_str().unwrap_or("").to_string();
            headers.entry(key).or_default().push(val);
        }

        let body = resp.text()?;

        Ok(Response::new(code, message, headers, body, url_after_redirects))
    }
}
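The two post-processing rules in `execute` (429 aborts, header keys are lowercased and grouped) can be exercised without a network. The sketch below isolates them using only the standard library. `check_status` and `normalize_headers` are hypothetical helper names, and the `NetworkError` here is a cut-down stand-in for the crate's enum.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the crate's `NetworkError` (assumption: only the
/// variant needed by this sketch is modeled).
#[derive(Debug, PartialEq)]
pub enum NetworkError {
    Recaptcha { url: String },
}

pub type Headers = BTreeMap<String, Vec<String>>;

/// Applies the status rule: 429 aborts, every other code is passed through.
pub fn check_status(code: u16, final_url: &str) -> Result<u16, NetworkError> {
    if code == 429 {
        return Err(NetworkError::Recaptcha { url: final_url.to_string() });
    }
    Ok(code)
}

/// Groups raw (name, value) pairs under lowercase keys, keeping repeated
/// values in arrival order.
pub fn normalize_headers(raw: &[(&str, &str)]) -> Headers {
    let mut headers = Headers::new();
    for (name, value) in raw {
        headers
            .entry(name.to_ascii_lowercase())
            .or_default()
            .push((*value).to_string());
    }
    headers
}
```

Keeping these rules as pure functions is one way to unit-test them offline; the commit itself keeps them inline in `execute`.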
48
src/downloader/mod.rs
Normal file
@@ -0,0 +1,48 @@
// Downloader contract — mirrors NPE's Downloader abstract class.
//
// Foundational invariants (SPEC §3, audited from NPE Downloader.java +
// OkHttpDownloaderImpl in the NewPipe-app):
//
//   * No automatic cookie jar. `Cookie:` header is hand-built per request.
//   * HTTP non-2xx is NOT an error. Only HTTP 429 throws
//     (NetworkError::Recaptcha). Every other 4xx/5xx surfaces as Ok(Response)
//     with the status set. Callers inspect themselves.
//   * Response::headers normalizes keys to lowercase (OkHttp does this for
//     NPE; we make it contractual).
//   * Request::add_header mirrors NPE's set-on-add bug — last write wins.
//     append_header is our clean addition.

pub mod default_impl;
pub mod request;
pub mod response;

use crate::exceptions::NetworkError;
use crate::localization::Localization;

pub use default_impl::ReqwestDownloader;
pub use request::{Request, RequestBuilder};
pub use response::Response;

pub trait Downloader: Send + Sync {
    fn execute(&self, request: Request) -> Result<Response, NetworkError>;

    fn get(&self, url: &str) -> Result<Response, NetworkError> {
        self.execute(Request::get(url).build())
    }

    fn get_localized(
        &self,
        url: &str,
        localization: Localization,
    ) -> Result<Response, NetworkError> {
        self.execute(Request::get(url).localization(Some(localization)).build())
    }

    fn head(&self, url: &str) -> Result<Response, NetworkError> {
        self.execute(Request::head(url).build())
    }

    fn post(&self, url: &str, body: Vec<u8>) -> Result<Response, NetworkError> {
        self.execute(Request::post(url, body).build())
    }
}
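Because only `execute` is required, an offline stub needs a single method and inherits the convenience wrappers. The sketch below illustrates that, along with the "callers inspect themselves" rule. `Request`, `Response`, and `NetworkError` are simplified stand-ins, and `FixedStatus` and `fetch_if_found` are hypothetical names that do not appear in the commit.

```rust
/// Simplified stand-ins for the crate's types (assumption: the real
/// `Request` and `Response` carry more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub method: &'static str,
    pub url: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response {
    pub code: u16,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum NetworkError {
    Recaptcha { url: String },
}

pub trait Downloader: Send + Sync {
    fn execute(&self, request: Request) -> Result<Response, NetworkError>;

    /// Convenience wrapper; implementors get it for free.
    fn get(&self, url: &str) -> Result<Response, NetworkError> {
        self.execute(Request { method: "GET", url: url.to_string() })
    }
}

/// Hypothetical offline stub: answers every request with a fixed status.
pub struct FixedStatus(pub u16);

impl Downloader for FixedStatus {
    fn execute(&self, request: Request) -> Result<Response, NetworkError> {
        if self.0 == 429 {
            return Err(NetworkError::Recaptcha { url: request.url });
        }
        Ok(Response {
            code: self.0,
            body: format!("{} {}", request.method, request.url),
        })
    }
}

/// Caller-side check: a 404 arrives as `Ok`, so the caller decides what it means.
pub fn fetch_if_found(d: &dyn Downloader, url: &str) -> Result<Option<String>, NetworkError> {
    let resp = d.get(url)?;
    Ok(if resp.code == 404 { None } else { Some(resp.body) })
}
```

In this sketch a 404 becomes `None` and never an error, while a 429 still propagates through `?`.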
192
src/downloader/request.rs
Normal file
@@ -0,0 +1,192 @@
// Request + RequestBuilder — mirrors NPE Request.java.
//
// PARITY: add_header silently overwrites instead of appending, per NPE
// Request.java:215-221. Callers depend on this. append_header is our
// own clean addition for callers we control.

use std::collections::BTreeMap;

use crate::localization::Localization;

pub type Headers = BTreeMap<String, Vec<String>>;

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Method {
    Get,
    Head,
    Post,
    Put,
    Delete,
}

impl Method {
    pub fn as_str(&self) -> &'static str {
        match self {
            Method::Get => "GET",
            Method::Head => "HEAD",
            Method::Post => "POST",
            Method::Put => "PUT",
            Method::Delete => "DELETE",
        }
    }
}

#[derive(Clone, Debug)]
pub struct Request {
    method: Method,
    url: String,
    headers: Headers,
    body: Option<Vec<u8>>,
    localization: Option<Localization>,
    automatic_localization_header: bool,
}

impl Request {
    pub fn get(url: impl Into<String>) -> RequestBuilder {
        RequestBuilder::new(Method::Get, url)
    }

    pub fn head(url: impl Into<String>) -> RequestBuilder {
        RequestBuilder::new(Method::Head, url)
    }

    pub fn post(url: impl Into<String>, body: Vec<u8>) -> RequestBuilder {
        RequestBuilder::new(Method::Post, url).body(Some(body))
    }

    pub fn method(&self) -> &Method {
        &self.method
    }

    pub fn url(&self) -> &str {
        &self.url
    }

    pub fn headers(&self) -> &Headers {
        &self.headers
    }

    pub fn body(&self) -> Option<&[u8]> {
        self.body.as_deref()
    }

    pub fn localization(&self) -> Option<&Localization> {
        self.localization.as_ref()
    }

    pub fn automatic_localization_header(&self) -> bool {
        self.automatic_localization_header
    }
}

#[derive(Clone, Debug)]
pub struct RequestBuilder {
    method: Method,
    url: String,
    headers: Headers,
    body: Option<Vec<u8>>,
    localization: Option<Localization>,
    automatic_localization_header: bool,
}

impl RequestBuilder {
    pub fn new(method: Method, url: impl Into<String>) -> Self {
        Self {
            method,
            url: url.into(),
            headers: BTreeMap::new(),
            body: None,
            localization: None,
            automatic_localization_header: true,
        }
    }

    /// PARITY with NPE Request.Builder.addHeader: silently overwrites any
    /// existing values for `name`. Callers downstream of NPE-derived code
    /// depend on this. For new code prefer [`Self::append_header`].
    pub fn add_header(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
        let key = lowercase(name.into());
        self.headers.insert(key, vec![value.into()]);
        self
    }

    /// Appends a value to `name`, creating the entry if absent. This is the
    /// behavior NPE's addHeader was intended to have. Use freely in our own
    /// code; avoid when porting NPE call sites that rely on overwrite.
    pub fn append_header(mut self, name: impl Into<String>, value: impl Into<String>) -> Self {
        let key = lowercase(name.into());
        self.headers.entry(key).or_default().push(value.into());
        self
    }

    pub fn headers(mut self, headers: Headers) -> Self {
        self.headers = headers
            .into_iter()
            .map(|(k, v)| (lowercase(k), v))
            .collect();
        self
    }

    pub fn body(mut self, body: Option<Vec<u8>>) -> Self {
        self.body = body;
        self
    }

    pub fn localization(mut self, localization: Option<Localization>) -> Self {
        self.localization = localization;
        self
    }

    pub fn automatic_localization_header(mut self, on: bool) -> Self {
        self.automatic_localization_header = on;
        self
    }

    pub fn build(self) -> Request {
        Request {
            method: self.method,
            url: self.url,
            headers: self.headers,
            body: self.body,
            localization: self.localization,
            automatic_localization_header: self.automatic_localization_header,
        }
    }
}

fn lowercase(s: String) -> String {
    s.to_ascii_lowercase()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn add_header_overwrites_parity() {
        let r = Request::get("https://x")
            .add_header("X-Foo", "first")
            .add_header("X-Foo", "second")
            .build();
        assert_eq!(r.headers().get("x-foo"), Some(&vec!["second".into()]));
    }

    #[test]
    fn append_header_accumulates() {
        let r = Request::get("https://x")
            .append_header("X-Foo", "first")
            .append_header("X-Foo", "second")
            .build();
        assert_eq!(
            r.headers().get("x-foo"),
            Some(&vec!["first".into(), "second".into()])
        );
    }

    #[test]
    fn headers_keys_lowercased() {
        let r = Request::get("https://x").add_header("Content-Type", "text/plain").build();
        assert!(r.headers().contains_key("content-type"));
        assert!(!r.headers().contains_key("Content-Type"));
    }
}
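The overwrite-versus-append distinction matters once the `Headers` map is flattened into individual header lines, which is what the default downloader's nested loop does. The sketch below shows that flattening using only the standard library. The free functions mirror the builder methods but are simplified, and `wire_lines` is a hypothetical name.

```rust
use std::collections::BTreeMap;

pub type Headers = BTreeMap<String, Vec<String>>;

/// Overwrite semantics, as in `RequestBuilder::add_header`.
pub fn add_header(headers: &mut Headers, name: &str, value: &str) {
    headers.insert(name.to_ascii_lowercase(), vec![value.to_string()]);
}

/// Append semantics, as in `RequestBuilder::append_header`.
pub fn append_header(headers: &mut Headers, name: &str, value: &str) {
    headers
        .entry(name.to_ascii_lowercase())
        .or_default()
        .push(value.to_string());
}

/// Flattens the map into one `name: value` line per stored value.
/// `BTreeMap` iterates in key order, so the output is deterministic.
pub fn wire_lines(headers: &Headers) -> Vec<String> {
    headers
        .iter()
        .flat_map(|(name, values)| values.iter().map(move |v| format!("{name}: {v}")))
        .collect()
}
```

Two `add_header` calls for one name yield a single line, while two `append_header` calls yield two.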
96
src/downloader/response.rs
Normal file
@@ -0,0 +1,96 @@
// Response — mirrors NPE Response.java.
//
// Header keys are lowercased (SPEC §3 invariant #3). latest_url tracks the
// final URL after redirect chasing — used by every linkHandler and the
// channel resolver loop.

use std::collections::BTreeMap;

pub type Headers = BTreeMap<String, Vec<String>>;

#[derive(Clone, Debug)]
pub struct Response {
    response_code: u16,
    response_message: String,
    response_headers: Headers,
    response_body: String,
    latest_url: String,
}

impl Response {
    pub fn new(
        response_code: u16,
        response_message: impl Into<String>,
        response_headers: Headers,
        response_body: impl Into<String>,
        latest_url: impl Into<String>,
    ) -> Self {
        let response_headers = response_headers
            .into_iter()
            .map(|(k, v)| (k.to_ascii_lowercase(), v))
            .collect();
        Self {
            response_code,
            response_message: response_message.into(),
            response_headers,
            response_body: response_body.into(),
            latest_url: latest_url.into(),
        }
    }

    pub fn response_code(&self) -> u16 {
        self.response_code
    }

    pub fn response_message(&self) -> &str {
        &self.response_message
    }

    pub fn response_headers(&self) -> &Headers {
        &self.response_headers
    }

    pub fn response_body(&self) -> &str {
        &self.response_body
    }

    pub fn latest_url(&self) -> &str {
        &self.latest_url
    }

    pub fn header(&self, name: &str) -> Option<&str> {
        let key = name.to_ascii_lowercase();
        self.response_headers.get(&key).and_then(|v| v.first()).map(String::as_str)
    }

    pub fn headers(&self, name: &str) -> Vec<&str> {
        let key = name.to_ascii_lowercase();
        match self.response_headers.get(&key) {
            Some(v) => v.iter().map(String::as_str).collect(),
            None => Vec::new(),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn header_lookup_case_insensitive() {
        let mut h = Headers::new();
        h.insert("content-type".into(), vec!["application/json".into()]);
        let r = Response::new(200, "OK", h, "{}", "https://x");
        assert_eq!(r.header("Content-Type"), Some("application/json"));
        assert_eq!(r.header("CONTENT-TYPE"), Some("application/json"));
    }

    #[test]
    fn headers_normalized_to_lowercase() {
        let mut h = Headers::new();
        h.insert("X-Foo".into(), vec!["bar".into()]);
        let r = Response::new(200, "OK", h, "", "https://x");
        assert!(r.response_headers().contains_key("x-foo"));
        assert!(!r.response_headers().contains_key("X-Foo"));
    }
}
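A typical consumer of the case-insensitive lookup is a content-type check before parsing a body. The sketch below assumes such a check would be built on a `header`-style lookup. `is_json` is a hypothetical helper, and the free function `header` is a simplified version of `Response::header`.

```rust
use std::collections::BTreeMap;

pub type Headers = BTreeMap<String, Vec<String>>;

/// Case-insensitive first-value lookup, the same idea as `Response::header`.
/// Assumes the map keys are already lowercase.
pub fn header<'a>(headers: &'a Headers, name: &str) -> Option<&'a str> {
    headers
        .get(&name.to_ascii_lowercase())
        .and_then(|values| values.first())
        .map(String::as_str)
}

/// Hypothetical helper: true when the media type (the part before any `;`)
/// is `application/json`, ignoring ASCII case and parameters such as charset.
pub fn is_json(headers: &Headers) -> bool {
    header(headers, "Content-Type")
        .and_then(|value| value.split(';').next())
        .map(|mime| mime.trim().eq_ignore_ascii_case("application/json"))
        .unwrap_or(false)
}
```

A missing header is treated as "not JSON" here; whether that default suits a given call site is a design choice left to the caller.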
91
src/exceptions.rs
Normal file
@@ -0,0 +1,91 @@
// Mirrors NewPipeExtractor's exception tree (extractor/src/main/java/org/schabi/newpipe/extractor/exceptions/*).
//
// NPE's hierarchy:
//   ExtractionException (root)
//   ├── ParsingException
//   ├── ContentNotAvailableException
//   │     ├── AgeRestrictedContentException
//   │     ├── GeographicRestrictionException
//   │     ├── PaidContentException
//   │     ├── PrivateContentException
//   │     ├── YoutubeMusicPremiumContentException
//   │     ├── SoundCloudGoPlusContentException
//   │     └── AccountTerminatedException
//   └── ReCaptchaException
//
// NetworkError is the IOException-equivalent — only transport-level failures
// throw it. HTTP non-2xx returns Ok(Response). HTTP 429 is the one
// "downloader-aborts" condition, surfaced as NetworkError::Recaptcha.

use thiserror::Error;

#[derive(Debug, Error)]
pub enum NetworkError {
    #[error("network transport: {0}")]
    Transport(String),

    #[error("HTTP 429 reCAPTCHA at {url}")]
    Recaptcha { url: String },
}

#[derive(Debug, Error)]
pub enum ParsingError {
    #[error("regex didn't match: {0}")]
    RegexMiss(String),

    #[error("missing field: {0}")]
    MissingField(String),

    #[error("unexpected JSON shape: {0}")]
    JsonShape(String),

    #[error("invalid input: {0}")]
    Invalid(String),
}

#[derive(Debug, Error)]
pub enum ContentUnavailable {
    #[error("age restricted")]
    AgeRestricted,
    #[error("geo restricted")]
    GeoRestricted,
    #[error("paid content")]
    Paid,
    #[error("private content")]
    Private,
    #[error("YouTube Music premium content")]
    YoutubeMusicPremium,
    #[error("SoundCloud Go+ content")]
    SoundCloudGoPlus,
    #[error("account terminated")]
    AccountTerminated,
    #[error("unavailable: {0}")]
    Other(String),
}

#[derive(Debug, Error)]
pub enum ExtractionError {
    #[error("network: {0}")]
    Network(#[from] NetworkError),

    #[error("parsing: {0}")]
    Parsing(#[from] ParsingError),

    #[error("content unavailable: {0}")]
    ContentUnavailable(#[from] ContentUnavailable),

    #[error("{0}")]
    Other(String),
}

impl From<reqwest::Error> for NetworkError {
    fn from(e: reqwest::Error) -> Self {
        NetworkError::Transport(e.to_string())
    }
}

impl From<serde_json::Error> for ParsingError {
    fn from(e: serde_json::Error) -> Self {
        ParsingError::JsonShape(e.to_string())
    }
}
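The `#[from]` attributes are what let extractor code use `?` across layers. The sketch below writes out the `From` impls by hand, so it runs without `thiserror`, and shows a network error and a parsing error both funnelling into `ExtractionError`. The enums are cut down to one variant each, and `title_of` and `extract_title` are hypothetical functions.

```rust
#[derive(Debug, PartialEq)]
pub enum NetworkError {
    Transport(String),
}

#[derive(Debug, PartialEq)]
pub enum ParsingError {
    MissingField(String),
}

#[derive(Debug, PartialEq)]
pub enum ExtractionError {
    Network(NetworkError),
    Parsing(ParsingError),
}

// Hand-written equivalents of the conversions `#[from]` would generate.
impl From<NetworkError> for ExtractionError {
    fn from(e: NetworkError) -> Self {
        ExtractionError::Network(e)
    }
}

impl From<ParsingError> for ExtractionError {
    fn from(e: ParsingError) -> Self {
        ExtractionError::Parsing(e)
    }
}

/// Hypothetical parser: expects a body of the form `title=<text>`.
fn title_of(body: &str) -> Result<String, ParsingError> {
    body.strip_prefix("title=")
        .map(str::to_string)
        .ok_or_else(|| ParsingError::MissingField("title".to_string()))
}

/// Each `?` converts the layer-specific error into `ExtractionError`.
pub fn extract_title(fetched: Result<String, NetworkError>) -> Result<String, ExtractionError> {
    let body = fetched?; // NetworkError -> ExtractionError::Network
    let title = title_of(&body)?; // ParsingError -> ExtractionError::Parsing
    Ok(title)
}
```

Callers that only care about success can match on `ExtractionError` once, while the inner variant still records which layer failed.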
72
src/image.rs
Normal file
@@ -0,0 +1,72 @@
// Image + ImageSet + ResolutionLevel. Mirrors NPE Image.java.
//
// HEIGHT_UNKNOWN / WIDTH_UNKNOWN are -1 sentinels per SPEC §3 invariant #10
// — kept as i32, not Option<u32>, because several JSON output sites encode
// this directly.

pub const HEIGHT_UNKNOWN: i32 = -1;
pub const WIDTH_UNKNOWN: i32 = -1;

#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
pub enum ResolutionLevel {
    Low,
    Medium,
    High,
    Unknown,
}

impl ResolutionLevel {
    pub fn from_height(height: i32) -> Self {
        if height == HEIGHT_UNKNOWN {
            ResolutionLevel::Unknown
        } else if height <= 175 {
            ResolutionLevel::Low
        } else if height <= 720 {
            ResolutionLevel::Medium
        } else {
            ResolutionLevel::High
        }
    }
}

#[derive(Clone, Debug)]
pub struct Image {
    url: String,
    height: i32,
    width: i32,
    estimated_resolution_level: ResolutionLevel,
}

impl Image {
    pub fn new(
        url: impl Into<String>,
        height: i32,
        width: i32,
        estimated_resolution_level: ResolutionLevel,
    ) -> Self {
        Self {
            url: url.into(),
            height,
            width,
            estimated_resolution_level,
        }
    }

    pub fn url(&self) -> &str {
        &self.url
    }

    pub fn height(&self) -> i32 {
        self.height
    }

    pub fn width(&self) -> i32 {
        self.width
    }

    pub fn estimated_resolution_level(&self) -> ResolutionLevel {
        self.estimated_resolution_level
    }
}

pub type ImageSet = Vec<Image>;
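The height buckets in `from_height` are easiest to read from their boundary values, and a common use is choosing a thumbnail from an `ImageSet`. The sketch below reuses the same thresholds. The `rank` method and `best_image` function are hypothetical additions for illustration, and `Image` is reduced to two public fields.

```rust
pub const HEIGHT_UNKNOWN: i32 = -1;

#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
pub enum ResolutionLevel {
    Low,
    Medium,
    High,
    Unknown,
}

impl ResolutionLevel {
    /// Same thresholds as the crate: <=175 low, <=720 medium, otherwise high.
    pub fn from_height(height: i32) -> Self {
        if height == HEIGHT_UNKNOWN {
            ResolutionLevel::Unknown
        } else if height <= 175 {
            ResolutionLevel::Low
        } else if height <= 720 {
            ResolutionLevel::Medium
        } else {
            ResolutionLevel::High
        }
    }

    /// Hypothetical ordering used only by this sketch: unknown sorts lowest.
    fn rank(self) -> u8 {
        match self {
            ResolutionLevel::Unknown => 0,
            ResolutionLevel::Low => 1,
            ResolutionLevel::Medium => 2,
            ResolutionLevel::High => 3,
        }
    }
}

/// Simplified image: just a URL and a height.
pub struct Image {
    pub url: String,
    pub height: i32,
}

/// Picks the image whose height falls in the highest resolution bucket.
pub fn best_image(images: &[Image]) -> Option<&Image> {
    images
        .iter()
        .max_by_key(|img| ResolutionLevel::from_height(img.height).rank())
}
```

Ranking `Unknown` lowest is an assumption of this sketch; the crate's enum deliberately derives no ordering.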
24
src/lib.rs
Normal file
@@ -0,0 +1,24 @@
// Rust port of NewPipeExtractor (YT-only).
//
// Phase 1 lays the dependency-free spine that everything else builds on:
// errors, localization, the Downloader contract, value types, the
// StreamingService trait, and the NewPipe singleton. None of this module
// tree knows anything about YouTube yet — that lands in Phase 3+.

pub mod downloader;
pub mod exceptions;
pub mod image;
pub mod localization;
pub mod metainfo;
pub mod newpipe;
pub mod page;
pub mod service;

pub use downloader::{Downloader, Request, Response};
pub use exceptions::{ExtractionError, NetworkError, ParsingError};
pub use image::{Image, ImageSet, ResolutionLevel};
pub use localization::{ContentCountry, Localization};
pub use metainfo::MetaInfo;
pub use newpipe::NewPipe;
pub use page::Page;
pub use service::{LinkType, ServiceInfo, StreamingService};
109
src/localization.rs
Normal file
@@ -0,0 +1,109 @@
// Localization + ContentCountry. Per SPEC §3 invariant #9, the DEFAULT
// Localization is ("en", "GB") — not en-US, not the system locale.
// NPE's Localization.java exposes ~100 country codes; we ship a small
// in-source set today and grow as needed.

use std::fmt;

#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct Localization {
    language_code: String,
    country_code: Option<String>,
}

impl Localization {
    pub fn new(language_code: impl Into<String>, country_code: Option<String>) -> Self {
        Self { language_code: language_code.into(), country_code }
    }

    pub fn from_localization_code(code: &str) -> Option<Self> {
        let (lang, country) = code.split_once('-').unwrap_or((code, ""));
        if lang.is_empty() {
            return None;
        }
        Some(Self {
            language_code: lang.to_string(),
            country_code: if country.is_empty() { None } else { Some(country.to_string()) },
        })
    }

    pub fn language_code(&self) -> &str {
        &self.language_code
    }

    pub fn country_code(&self) -> Option<&str> {
        self.country_code.as_deref()
    }

    pub fn localization_code(&self) -> String {
        match &self.country_code {
            Some(c) => format!("{}-{}", self.language_code, c),
            None => self.language_code.clone(),
        }
    }
}

impl Default for Localization {
    fn default() -> Self {
        Self::new("en", Some("GB".into()))
    }
}

impl fmt::Display for Localization {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.localization_code())
    }
}

#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ContentCountry {
    country_code: String,
}

impl ContentCountry {
    pub fn new(country_code: impl Into<String>) -> Self {
        Self { country_code: country_code.into() }
    }

    pub fn country_code(&self) -> &str {
        &self.country_code
    }
}

impl Default for ContentCountry {
    fn default() -> Self {
        Self::new("GB")
    }
}

impl fmt::Display for ContentCountry {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.country_code)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn default_is_en_gb() {
        let l = Localization::default();
        assert_eq!(l.language_code(), "en");
        assert_eq!(l.country_code(), Some("GB"));
        assert_eq!(l.localization_code(), "en-GB");
    }

    #[test]
    fn parse_localization_code() {
        let l = Localization::from_localization_code("en-US").unwrap();
        assert_eq!(l.language_code(), "en");
        assert_eq!(l.country_code(), Some("US"));

        let l = Localization::from_localization_code("de").unwrap();
        assert_eq!(l.language_code(), "de");
        assert_eq!(l.country_code(), None);

        assert!(Localization::from_localization_code("").is_none());
    }
}
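Since `from_localization_code` returns `Option` and the default is fixed at en-GB, a caller that takes a user-supplied locale string can combine the two. The sketch below shows one way to do that. The struct is a simplified copy with shorter method names, and `resolve` is a hypothetical helper.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Localization {
    pub language: String,
    pub country: Option<String>,
}

impl Localization {
    /// Same splitting rule as `from_localization_code`: the first `-`
    /// separates language from country, and an empty language is rejected.
    pub fn parse(code: &str) -> Option<Self> {
        let (lang, country) = code.split_once('-').unwrap_or((code, ""));
        if lang.is_empty() {
            return None;
        }
        Some(Self {
            language: lang.to_string(),
            country: if country.is_empty() { None } else { Some(country.to_string()) },
        })
    }

    /// Renders `lang` or `lang-COUNTRY`, the value sent as Accept-Language.
    pub fn code(&self) -> String {
        match &self.country {
            Some(c) => format!("{}-{}", self.language, c),
            None => self.language.clone(),
        }
    }
}

impl Default for Localization {
    fn default() -> Self {
        Self { language: "en".to_string(), country: Some("GB".to_string()) }
    }
}

/// Hypothetical helper: parse an optional user setting, falling back to en-GB.
pub fn resolve(code: Option<&str>) -> Localization {
    code.and_then(Localization::parse).unwrap_or_default()
}
```

An empty or missing setting falls back to the invariant default and not to the system locale, matching the note at the top of the file.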
53
src/metainfo.rs
Normal file
@@ -0,0 +1,53 @@
// MetaInfo — mirrors NPE MetaInfo.java.
//
// Carries "info card" style data (knowledge-panel boxes, COVID/election
// warning banners, etc.) attached to a stream or search result. Paired URLs
// + URL texts — same indices.

use url::Url;

#[derive(Clone, Debug, Default)]
pub struct MetaInfo {
    title: String,
    content: String,
    urls: Vec<Url>,
    url_texts: Vec<String>,
}

impl MetaInfo {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn title(&self) -> &str {
        &self.title
    }

    pub fn set_title(&mut self, title: impl Into<String>) -> &mut Self {
        self.title = title.into();
        self
    }

    pub fn content(&self) -> &str {
        &self.content
    }

    pub fn set_content(&mut self, content: impl Into<String>) -> &mut Self {
        self.content = content.into();
        self
    }

    pub fn urls(&self) -> &[Url] {
        &self.urls
    }

    pub fn url_texts(&self) -> &[String] {
        &self.url_texts
    }

    pub fn add_url(&mut self, url: Url, text: impl Into<String>) -> &mut Self {
        self.urls.push(url);
        self.url_texts.push(text.into());
        self
    }
}
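Because URLs and their texts live in two parallel vectors, consumers have to walk them in lockstep. The sketch below does this with `zip`. It stores URLs as plain strings to avoid depending on the `url` crate (a simplification), and `links` and `render_links` are hypothetical helpers.

```rust
/// Simplified MetaInfo: URLs are plain strings here, not `url::Url`.
#[derive(Default)]
pub struct MetaInfo {
    urls: Vec<String>,
    url_texts: Vec<String>,
}

impl MetaInfo {
    /// Pushes to both vectors together so the indices stay paired.
    pub fn add_url(&mut self, url: &str, text: &str) -> &mut Self {
        self.urls.push(url.to_string());
        self.url_texts.push(text.to_string());
        self
    }

    /// Walks the two vectors in lockstep, yielding (text, url) pairs.
    pub fn links(&self) -> Vec<(&str, &str)> {
        self.url_texts
            .iter()
            .zip(self.urls.iter())
            .map(|(text, url)| (text.as_str(), url.as_str()))
            .collect()
    }
}

/// Hypothetical renderer: one Markdown link per (text, url) pair.
pub fn render_links(info: &MetaInfo) -> Vec<String> {
    info.links()
        .into_iter()
        .map(|(text, url)| format!("[{text}]({url})"))
        .collect()
}
```

Routing every insertion through `add_url` is what keeps the pairing invariant intact, which is presumably why the struct exposes no separate setters for the two vectors.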
68
src/newpipe.rs
Normal file
@@ -0,0 +1,68 @@
// NewPipe singleton — mirrors NPE NewPipe.java.
//
// Holds the process-global Downloader + preferred Localization +
// preferred ContentCountry. init() once at startup, then call sites read
// the globals through these getters.
//
// Concrete service registration lands in Phase 3+ once YoutubeService
// exists. Phase 1 only wires the globals.

use std::sync::Arc;

use parking_lot::RwLock;

use crate::downloader::Downloader;
use crate::localization::{ContentCountry, Localization};

pub struct NewPipe {
    downloader: RwLock<Option<Arc<dyn Downloader>>>,
    preferred_localization: RwLock<Localization>,
    preferred_content_country: RwLock<ContentCountry>,
}

impl NewPipe {
    pub fn instance() -> &'static NewPipe {
        use once_cell::sync::Lazy;
        static INSTANCE: Lazy<NewPipe> = Lazy::new(|| NewPipe {
            downloader: RwLock::new(None),
            preferred_localization: RwLock::new(Localization::default()),
            preferred_content_country: RwLock::new(ContentCountry::default()),
        });
        &INSTANCE
    }

    pub fn init(downloader: Arc<dyn Downloader>) {
        *Self::instance().downloader.write() = Some(downloader);
    }

    pub fn init_full(
        downloader: Arc<dyn Downloader>,
        localization: Localization,
        content_country: ContentCountry,
    ) {
        let np = Self::instance();
        *np.downloader.write() = Some(downloader);
        *np.preferred_localization.write() = localization;
        *np.preferred_content_country.write() = content_country;
    }

    pub fn downloader() -> Option<Arc<dyn Downloader>> {
        Self::instance().downloader.read().clone()
    }

    pub fn preferred_localization() -> Localization {
        Self::instance().preferred_localization.read().clone()
    }

    pub fn preferred_content_country() -> ContentCountry {
        Self::instance().preferred_content_country.read().clone()
    }

    pub fn set_preferred_localization(localization: Localization) {
        *Self::instance().preferred_localization.write() = localization;
    }

    pub fn set_preferred_content_country(content_country: ContentCountry) {
        *Self::instance().preferred_content_country.write() = content_country;
    }
}
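The init-once, read-everywhere pattern can also be expressed with the standard library alone. The sketch below swaps in `std::sync::OnceLock` and `std::sync::RwLock` for the commit's `once_cell::sync::Lazy` and `parking_lot::RwLock`, so it is not the crate's implementation. The `Registry` type, the one-method `Downloader` trait, and `Dummy` are hypothetical.

```rust
use std::sync::{Arc, OnceLock, RwLock};

/// Cut-down trait for this sketch; the crate's `Downloader` has `execute` etc.
pub trait Downloader: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Hypothetical std-only counterpart of the `NewPipe` singleton.
pub struct Registry {
    downloader: RwLock<Option<Arc<dyn Downloader>>>,
}

impl Registry {
    /// Lazily creates the process-global instance on first access.
    pub fn instance() -> &'static Registry {
        static INSTANCE: OnceLock<Registry> = OnceLock::new();
        INSTANCE.get_or_init(|| Registry { downloader: RwLock::new(None) })
    }

    /// Installs (or replaces) the global downloader.
    pub fn init(downloader: Arc<dyn Downloader>) {
        // std's RwLock can be poisoned by a panicking writer; unwrap
        // surfaces that here instead of hiding it.
        *Self::instance().downloader.write().unwrap() = Some(downloader);
    }

    /// Returns a cheap `Arc` clone, so callers never hold the lock while
    /// performing I/O.
    pub fn downloader() -> Option<Arc<dyn Downloader>> {
        Self::instance().downloader.read().unwrap().clone()
    }
}

pub struct Dummy;

impl Downloader for Dummy {
    fn name(&self) -> &'static str {
        "dummy"
    }
}
```

One visible difference from the commit's version is lock poisoning: `parking_lot` locks do not poison, which is why the original getters need no `unwrap`.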
79
src/page.rs
Normal file
79
src/page.rs
Normal file
|
|
@ -0,0 +1,79 @@
|
|||
// Page — continuation token carrier. Mirrors NPE Page.java.
|
||||
//
|
||||
// Used everywhere "the next page" is paginated through an opaque token
|
||||
// (search results, channel videos, playlist videos, comments). The fields
|
||||
// are deliberately a grab-bag — NPE callers stuff whatever they need to
|
||||
// resume.
|
||||
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
#[derive(Clone, Debug, Default)]
|
||||
pub struct Page {
|
||||
url: Option<String>,
|
||||
id: Option<String>,
|
||||
ids: Vec<String>,
|
||||
body: Option<Vec<u8>>,
|
||||
cookies: BTreeMap<String, String>,
|
||||
}
|
||||
|
||||
impl Page {
|
||||
pub fn new() -> Self {
|
||||
Self::default()
|
||||
}
|
||||
|
||||
pub fn with_url(url: impl Into<String>) -> Self {
|
||||
Self { url: Some(url.into()), ..Self::default() }
|
||||
}
|
||||
|
||||
pub fn url(&self) -> Option<&str> {
|
||||
self.url.as_deref()
|
||||
}
|
||||
|
||||
pub fn set_url(&mut self, url: Option<String>) -> &mut Self {
|
||||
self.url = url;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn id(&self) -> Option<&str> {
|
||||
self.id.as_deref()
|
||||
}
|
||||
|
||||
pub fn set_id(&mut self, id: Option<String>) -> &mut Self {
|
||||
self.id = id;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn ids(&self) -> &[String] {
|
||||
&self.ids
|
||||
}
|
||||
|
||||
pub fn set_ids(&mut self, ids: Vec<String>) -> &mut Self {
|
||||
self.ids = ids;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn body(&self) -> Option<&[u8]> {
|
||||
self.body.as_deref()
|
||||
}
|
||||
|
||||
pub fn set_body(&mut self, body: Option<Vec<u8>>) -> &mut Self {
|
||||
self.body = body;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn cookies(&self) -> &BTreeMap<String, String> {
|
||||
&self.cookies
|
||||
}
|
||||
|
||||
pub fn set_cookies(&mut self, cookies: BTreeMap<String, String>) -> &mut Self {
|
||||
self.cookies = cookies;
|
||||
self
|
||||
}
|
||||
|
||||
pub fn is_valid(&self) -> bool {
|
||||
self.url.as_deref().map(|s| !s.is_empty()).unwrap_or(false)
|
||||
|| self.id.as_deref().map(|s| !s.is_empty()).unwrap_or(false)
|
||||
|| !self.ids.is_empty()
|
||||
|| self.body.as_ref().map(|b| !b.is_empty()).unwrap_or(false)
|
||||
}
|
||||
}
|
||||
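Note on Page: a caller typically keeps requesting pages until the token it gets back no longer passes `is_valid()`. The sketch below illustrates that loop under stated assumptions. `PageToken` is a one-field stand-in for `Page`, and `fetch_page` is a hypothetical in-memory backend with three fixed pages. Neither is part of this commit.

```rust
// Hypothetical one-field stand-in for Page: only the `id` token is kept.
#[derive(Clone, Debug, Default)]
pub struct PageToken {
    id: Option<String>,
}

impl PageToken {
    pub fn with_id(id: impl Into<String>) -> Self {
        Self { id: Some(id.into()) }
    }

    pub fn id(&self) -> Option<&str> {
        self.id.as_deref()
    }

    // Same rule as Page::is_valid, restricted to the single field.
    pub fn is_valid(&self) -> bool {
        self.id.as_deref().map_or(false, |s| !s.is_empty())
    }
}

// Hypothetical backend: returns one page of items plus the token for the
// following page. The last page hands back an empty (invalid) token.
pub fn fetch_page(token: Option<&PageToken>) -> (Vec<&'static str>, PageToken) {
    match token.and_then(|t| t.id()) {
        None => (vec!["a", "b"], PageToken::with_id("p2")),
        Some("p2") => (vec!["c"], PageToken::with_id("p3")),
        _ => (vec!["d"], PageToken::default()),
    }
}

// Drains every page by following continuation tokens until one is invalid.
pub fn collect_all() -> Vec<&'static str> {
    let (mut items, mut next) = fetch_page(None);
    while next.is_valid() {
        let (more, following) = fetch_page(Some(&next));
        items.extend(more);
        next = following;
    }
    items
}
```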
63
src/service.rs
Normal file
|
|
@ -0,0 +1,63 @@
|
|||
// StreamingService trait + ServiceInfo + LinkType. Mirrors NPE
|
||||
// StreamingService.java.
|
||||
//
|
||||
// Phase 1 keeps this dependency-free — no extractor traits, no per-service
|
||||
// linkHandler factories. YouTube's concrete impl lands in Phase 3 once the
|
||||
// JS engine is in place (Phase 2).
|
||||
//
|
||||
// Service IDs are stable persistence keys per SPEC §3 invariant #6:
|
||||
// YouTube = 0, even if we never implement another service.
|
||||
|
||||
use std::fmt;
|
||||
|
||||
#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
|
||||
pub enum LinkType {
|
||||
None,
|
||||
Stream,
|
||||
Channel,
|
||||
Playlist,
|
||||
}
|
||||
|
||||
#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
|
||||
pub enum MediaCapability {
|
||||
Audio,
|
||||
Video,
|
||||
LiveStream,
|
||||
Comments,
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct ServiceInfo {
|
||||
name: String,
|
||||
media_capabilities: Vec<MediaCapability>,
|
||||
}
|
||||
|
||||
impl ServiceInfo {
|
||||
pub fn new(name: impl Into<String>, media_capabilities: Vec<MediaCapability>) -> Self {
|
||||
Self { name: name.into(), media_capabilities }
|
||||
}
|
||||
|
||||
pub fn name(&self) -> &str {
|
||||
&self.name
|
||||
}
|
||||
|
||||
pub fn media_capabilities(&self) -> &[MediaCapability] {
|
||||
&self.media_capabilities
|
||||
}
|
||||
}
|
||||
|
||||
pub trait StreamingService: Send + Sync {
|
||||
fn service_id(&self) -> u32;
|
||||
fn service_info(&self) -> &ServiceInfo;
|
||||
|
||||
fn base_url(&self) -> &str;
|
||||
}
|
||||
|
||||
impl fmt::Debug for dyn StreamingService {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
f.debug_struct("StreamingService")
|
||||
.field("service_id", &self.service_id())
|
||||
.field("name", &self.service_info().name())
|
||||
.finish()
|
||||
}
|
||||
}
|
||||
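Note on StreamingService: the concrete YouTube implementation is deferred to Phase 3, so the following is only a guess at what a minimal implementor could look like. The trait and `ServiceInfo` are re-declared in trimmed form so the block compiles alone. `YoutubeService`, `describe` and the base URL string are assumptions. Only the service ID of 0 comes from the comment above.

```rust
// Trimmed re-declaration of ServiceInfo (capabilities omitted for brevity).
pub struct ServiceInfo {
    name: String,
}

impl ServiceInfo {
    pub fn new(name: impl Into<String>) -> Self {
        Self { name: name.into() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

// Same three methods as the trait in the diff.
pub trait StreamingService: Send + Sync {
    fn service_id(&self) -> u32;
    fn service_info(&self) -> &ServiceInfo;
    fn base_url(&self) -> &str;
}

// Hypothetical implementor; the real one lands in a later phase.
pub struct YoutubeService {
    info: ServiceInfo,
}

impl YoutubeService {
    pub fn new() -> Self {
        Self { info: ServiceInfo::new("YouTube") }
    }
}

impl StreamingService for YoutubeService {
    // Stable persistence key: YouTube = 0.
    fn service_id(&self) -> u32 {
        0
    }

    fn service_info(&self) -> &ServiceInfo {
        &self.info
    }

    // Assumed base URL, for illustration only.
    fn base_url(&self) -> &str {
        "https://www.youtube.com"
    }
}

// Works through a trait object, as callers holding `dyn StreamingService` would.
pub fn describe(service: &dyn StreamingService) -> String {
    format!(
        "{} (id {}) at {}",
        service.service_info().name(),
        service.service_id(),
        service.base_url()
    )
}
```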
95
tests/foundation_smoke.rs
Normal file
|
|
@ -0,0 +1,95 @@
|
|||
// Phase 1 smoke — exercises the foundation against live httpbin.org.
|
||||
//
|
||||
// Per SPEC §4 Phase 1 "Done when": build a Request, send through default
|
||||
// Downloader, parse Response, confirm latest_url follows redirects.
|
||||
//
|
||||
// These tests hit the network, so they are gated on the `online-tests` feature; offline CI
|
||||
// runs aren't broken.
|
||||
|
||||
#![cfg(feature = "online-tests")]
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use strawcore::downloader::request::Request;
|
||||
use strawcore::downloader::ReqwestDownloader;
|
||||
use strawcore::exceptions::NetworkError;
|
||||
use strawcore::localization::{ContentCountry, Localization};
|
||||
use strawcore::{Downloader, NewPipe};
|
||||
|
||||
#[test]
|
||||
fn get_through_default_downloader() {
|
||||
let dl = ReqwestDownloader::new().expect("build downloader");
|
||||
let resp = dl.get("https://httpbin.org/get").expect("transport");
|
||||
assert_eq!(resp.response_code(), 200);
|
||||
assert!(resp.response_body().contains("\"url\""));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn latest_url_follows_redirects() {
|
||||
let dl = ReqwestDownloader::new().expect("build downloader");
|
||||
let resp = dl
|
||||
.get("https://httpbin.org/redirect/3")
|
||||
.expect("transport");
|
||||
assert_eq!(resp.response_code(), 200);
|
||||
assert!(
|
||||
resp.latest_url().ends_with("/get"),
|
||||
"latest_url should land at /get after 3 redirects, got {}",
|
||||
resp.latest_url()
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn non_2xx_returns_ok_not_err() {
|
||||
let dl = ReqwestDownloader::new().expect("build downloader");
|
||||
let resp = dl.get("https://httpbin.org/status/404").expect("transport");
|
||||
assert_eq!(resp.response_code(), 404);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn http_429_surfaces_as_recaptcha_err() {
|
||||
let dl = ReqwestDownloader::new().expect("build downloader");
|
||||
let err = dl.get("https://httpbin.org/status/429").expect_err("429 must be NetworkError");
|
||||
match err {
|
||||
NetworkError::Recaptcha { url } => assert!(url.contains("/status/429")),
|
||||
other => panic!("expected Recaptcha, got {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn localization_header_attached_when_enabled() {
|
||||
let dl = ReqwestDownloader::new().expect("build downloader");
|
||||
let req = Request::get("https://httpbin.org/headers")
|
||||
.localization(Some(Localization::new("en", Some("GB".into()))))
|
||||
.build();
|
||||
let resp = dl.execute(req).expect("transport");
|
||||
assert_eq!(resp.response_code(), 200);
|
||||
assert!(
|
||||
resp.response_body().to_ascii_lowercase().contains("accept-language"),
|
||||
"Accept-Language should be echoed by httpbin"
|
||||
);
|
||||
assert!(resp.response_body().contains("en-GB"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn header_keys_lowercased_in_response() {
|
||||
let dl = ReqwestDownloader::new().expect("build downloader");
|
||||
let resp = dl.get("https://httpbin.org/get").expect("transport");
|
||||
for (k, _) in resp.response_headers() {
|
||||
assert_eq!(k, &k.to_ascii_lowercase(), "header key {k} not lowercased");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn newpipe_singleton_wires_downloader() {
|
||||
let dl = Arc::new(ReqwestDownloader::new().expect("build downloader"));
|
||||
NewPipe::init_full(
|
||||
dl.clone(),
|
||||
Localization::default(),
|
||||
ContentCountry::default(),
|
||||
);
|
||||
|
||||
let from_global = NewPipe::downloader().expect("downloader registered");
|
||||
let resp = from_global.get("https://httpbin.org/get").expect("transport");
|
||||
assert_eq!(resp.response_code(), 200);
|
||||
assert_eq!(NewPipe::preferred_localization().localization_code(), "en-GB");
|
||||
}
|
||||
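Note on the invariants these tests pin down: a non-2xx status still yields Ok, only 429 becomes an error, and response header keys are lowercased. The offline sketch below shows one way that mapping could be written. It is not the crate's downloader: `build_response`, the simplified `Response` with public fields, and the single-variant `NetworkError` are hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical, single-variant stand-in for the crate's NetworkError.
#[derive(Debug, PartialEq, Eq)]
pub enum NetworkError {
    Recaptcha { url: String },
}

// Hypothetical simplified response: status code plus normalised headers.
#[derive(Debug)]
pub struct Response {
    pub code: u16,
    pub headers: BTreeMap<String, Vec<String>>,
}

// Turns a raw status and header list into a Response.
// 429 is the only status that becomes an error; every other status,
// including 4xx and 5xx, is returned as Ok for the caller to inspect.
pub fn build_response(
    code: u16,
    url: &str,
    raw_headers: &[(&str, &str)],
) -> Result<Response, NetworkError> {
    if code == 429 {
        return Err(NetworkError::Recaptcha { url: url.to_string() });
    }
    let mut headers = BTreeMap::new();
    for (key, value) in raw_headers {
        // Lowercase the key so lookups do not depend on server casing;
        // repeated keys accumulate their values in order.
        headers
            .entry(key.to_ascii_lowercase())
            .or_insert_with(Vec::new)
            .push(value.to_string());
    }
    Ok(Response { code, headers })
}
```

Keeping this logic free of any HTTP client makes the three invariants checkable without the `online-tests` feature.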