camsoda update

Simon
2026-06-22 13:22:37 +00:00
parent 570a77c90b
commit 342c7dc098
3 changed files with 245 additions and 576 deletions


@@ -73,7 +73,7 @@ This is the current implementation inventory as of this snapshot of the repo. Us
| `eporner` | `mainstream-tube` | no | no | HTML scraper for eporner.com (5M+ videos); card selector `div.mb[data-id]` with inline duration/rating/views/uploader; thumbnails at `static-eu-cdn.eporner.com` (no proxy needed); pagination uses `/{N}/` suffix (page 1 = no suffix, page 2 = `/2/`); search queries map to `/tag/{slug}/` (eporner redirects all keyword searches to tag pages — 404 tag pages still return related content); supports sort: new/popular/rated/best; 65 hardcoded categories via `cat:`, `tag:`, `pornstar:`, `uploader:` query shortcuts; background-loads pornstar name→URL map from `/pornstar-list/`; yt-dlp resolves `video.url` natively (Eporner extractor); no proxy needed. | | `eporner` | `mainstream-tube` | no | no | HTML scraper for eporner.com (5M+ videos); card selector `div.mb[data-id]` with inline duration/rating/views/uploader; thumbnails at `static-eu-cdn.eporner.com` (no proxy needed); pagination uses `/{N}/` suffix (page 1 = no suffix, page 2 = `/2/`); search queries map to `/tag/{slug}/` (eporner redirects all keyword searches to tag pages — 404 tag pages still return related content); supports sort: new/popular/rated/best; 65 hardcoded categories via `cat:`, `tag:`, `pornstar:`, `uploader:` query shortcuts; background-loads pornstar name→URL map from `/pornstar-list/`; yt-dlp resolves `video.url` natively (Eporner extractor); no proxy needed. |
| `xnxx` | `mainstream-tube` | no | no | HTML scraper for xnxx.com (10M+ videos); unified card parser handles two formats: `div.thumb-block[data-eid]` (search) and `div.thumb-block.video[data-video='{"id":...}']` (hits); eid extracted from `/video-{eid}/{slug}` URL path; thumbnails at `thumb-cdn77.xnxx-cdn.com` and `thumbs-gcore.xnxx-cdn.com` (no proxy, no Referer needed); 0-indexed pagination (page 1 = `/hits`, page N = `/hits/{N-1}`); default feed is `/hits` (most-viewed — xnxx has no chronological listing); search via `/search/{slug}` (works for keywords and tags); supports `tag:`, `cat:`, `category:` query shortcuts; yt-dlp resolves `video.url` natively (XNXX extractor, returns 4-7 HLS formats); no proxy needed. | | `xnxx` | `mainstream-tube` | no | no | HTML scraper for xnxx.com (10M+ videos); unified card parser handles two formats: `div.thumb-block[data-eid]` (search) and `div.thumb-block.video[data-video='{"id":...}']` (hits); eid extracted from `/video-{eid}/{slug}` URL path; thumbnails at `thumb-cdn77.xnxx-cdn.com` and `thumbs-gcore.xnxx-cdn.com` (no proxy, no Referer needed); 0-indexed pagination (page 1 = `/hits`, page N = `/hits/{N-1}`); default feed is `/hits` (most-viewed — xnxx has no chronological listing); search via `/search/{slug}` (works for keywords and tags); supports `tag:`, `cat:`, `category:` query shortcuts; yt-dlp resolves `video.url` natively (XNXX extractor, returns 4-7 HLS formats); no proxy needed. |
| `xhamster` | `mainstream-tube` | no | no | HTML scraper for xhamster.com; card selector `div[data-video-type="video"]` with `data-video-id`; thumbnails via `img[data-role="thumb-preview-img"]` at `ic-vt-nss.xhcdn.com` (no proxy, no Referer needed); pagination via `?page=N` query param (browse feeds use infinite-scroll so only search reliably returns different content per page); feeds: `/newest` (default), `/most-viewed`, `/best`; categories via `/categories/{slug}`; channels via `/channels/{slug}`; 43 hardcoded categories as `categories` option; uploader type inferred from URL path (`/channels/` → channel, `/creators/` → creator, `/pornstars/` → pornstar); supports `cat:`/`category:` and `channel:` query shortcuts, plus static category name matching; preview mp4 clips from `data-previewvideo` attribute; yt-dlp resolves `video.url` natively (xHamster extractor, 28 formats); no proxy needed. | | `xhamster` | `mainstream-tube` | no | no | HTML scraper for xhamster.com; card selector `div[data-video-type="video"]` with `data-video-id`; thumbnails via `img[data-role="thumb-preview-img"]` at `ic-vt-nss.xhcdn.com` (no proxy, no Referer needed); pagination via `?page=N` query param (browse feeds use infinite-scroll so only search reliably returns different content per page); feeds: `/newest` (default), `/most-viewed`, `/best`; categories via `/categories/{slug}`; channels via `/channels/{slug}`; 43 hardcoded categories as `categories` option; uploader type inferred from URL path (`/channels/` → channel, `/creators/` → creator, `/pornstars/` → pornstar); supports `cat:`/`category:` and `channel:` query shortcuts, plus static category name matching; preview mp4 clips from `data-previewvideo` attribute; yt-dlp resolves `video.url` natively (xHamster extractor, 28 formats); no proxy needed. |
| `camsoda` | `live-cams` | no | no | JSON-API provider for camsoda.com recorded model clips. camsoda.com is hard Cloudflare-protected: direct requests and yt-dlp both get HTTP 403, and FlareSolverr was unreliable during development, so the only reliable path through CF is the shared requester's Jina mirror fallback (`r.jina.ai/http://...`, `X-Return-Format: html`); note Jina rate-limits per IP, so multi-provider builds that burst many concurrent fetches see 429s; validate with a single-provider build (`HOT_TUB_PROVIDER=camsoda`) which makes one fetch at a time. The `/media` page is a CF-protected SPA whose SSR ignores `?page`/`?p`/`sort` (it always renders the same default 60 items); pagination/sort/tag are client-only XHR to a JSON API discovered in the (non-CF-protected) static `pages-media-MediaMainPage` bundle: `GET https://www.camsoda.com/api/v1/media/list/video?page=N&sort_by=<sort>&tag=<slug>` returning `{"result":true,"data":[...]}`; Jina returns that JSON wrapped in a `<pre>`, so the provider extracts the `{...}` slice and parses it (this gives real pagination across ~166 pages, plus sort and tag filtering, fixing the old HTML-scrape that couldn't paginate or search). Rich item fields come straight from the JSON: `name`→title, `username`→uploader slug, `user_display_name`→uploader, `duration` (seconds), `created_at`→`uploadedAt`, `thumbnail_url` (direct `media-secure.camsoda.com`, no proxy/referer needed). `sort_by` values: `date_added` (default/new), `popular`, `popular_all_time` (top). 49-tag catalog (extracted from the bundle) is exposed via the `categories` option (sanitized out of `/api/status` like other big catalogs, but honored in `/api/videos`) and routed by `tag:`/`cat:`/`category:` prefixes or a bare keyword that exactly matches a tag slug; there is no keyword media-search endpoint, so other bare queries fall back to the default listing for the server's client-side substring filter. 
`model:`/`uploader:`/`user:`/`performer:` prefixes browse a performer's SSR `/{username}/media` page, parsed via anchor selectors (`[class*="media-item-module__title"]` / `[class*="media-item-module__subtitle"]`). `video.url` is the page URL (`/{username}/media/{slug}/{id}`); recorded clips are token-gated (`token_price`>0, `is_free_no_auth` effectively always false) and CF blocks both browser and yt-dlp, so no `formats` are populated and playback is not resolvable in this environment (`check.py` reports these as expected CF warnings, not errors, because `www.camsoda.com` is in its CF allowlist). No proxy needed. | | `camsoda` | `live-cams` | no | no | Live-cam provider for camsoda.com (chaturbate-style: `live` performers streaming now, `video.url` = the room page, `is_live=true`, no `formats`). camsoda.com is hard Cloudflare-protected: direct requests and yt-dlp both get HTTP 403, and FlareSolverr was unreliable during development, so the live-browse API is reached through the shared requester's Jina mirror fallback (`r.jina.ai/http://...`, `X-Return-Format: html`); Jina rate-limits per IP, so the provider caches each fetched feed URL for 60s (and serves stale items on a 429 rather than emptying the feed), and a single-provider build (`HOT_TUB_PROVIDER=camsoda`) validates most cleanly (one fetch at a time). Endpoint (found in the non-CF static `main.js` bundle): `GET https://www.camsoda.com/api/v1/browse/react{route}?p=N` returning a body with a top-level `userList` array (Jina wraps it in `<pre>`, so the provider slices out the `{...}` and parses it with `serde_json::Value`, like the chaturbate provider). Per-cam fields: `username`→id + room URL (`/{username}`), `subjectText`→title (html-decoded, falls back to `displayName`), `displayName`→uploader, `connectionCount`→views (string or number tolerated), `thumbUrl`→thumb (direct `media.livemediahost.com` CDN, no proxy/referer), `status` (skip `offline`), `vr`/`private` surfaced as tags. 
Category option `category` uses verified `browse/react` routes — `all`(featured)/`girls`/`trans`/`couples`/`voyeur-cams`/`new` (`/male` is NOT a path route, camsoda gates male via `gender-hide`); `cat:`/`category:` prefixes and a bare keyword matching a category id route there too. Search: `GET browse/react/search/{dashed-query}?sortByConnection=1` (single connection-sorted result set, no real paging). Playback: `video.url` is the live room page; the room and the token-gated edge HLS (`*.livemediahost.com`) are both Cloudflare-protected, so HLS can't be resolved server-side and no `formats` are populated — yt-dlp has a `Camsoda` live extractor that resolves the room on a non-CF-blocked client, and `check.py` reports the sandbox's CF 403s as expected warnings (`www.camsoda.com` is in its CF allowlist), not errors. The earlier recorded-`/media` JSON scrape was replaced because clips were token-gated/non-playable; live cams are the site's actual product. No proxy needed. |
| `xvideos` | `mainstream-tube` | no | no | HTML scraper for xvideos.com; handles two card formats: homepage (`div.thumb-block[data-id][data-eid]`) uses `p.title a[title]` + `data-pvv` on img, best-of-month page uses `div.thumb-block.video[data-video=JSON]` with `div.title a` text + `previewVideo` JSON key; thumbnails at `thumb-cdn77.xvideos-cdn.com` / `thumbs-gcore.xvideos-cdn.com` (no proxy needed); latest: `/` (page 1) / `/new/{N-1}` (page N≥2); best-of-month: `/best/{YYYY-MM}` (previous calendar month), page N: `/best/{YYYY-MM}/{N-1}`; search: `/?k={query}` / `/?k={query}&p={N-1}` (0-indexed); tag shortcuts: `/tags/{slug}/{N-1}`; category shortcuts: `/c/{Name}-{ID}/{N-1}` (38 hardcoded categories); `cat:`, `tag:`, `uploader:` query prefix routing; yt-dlp resolves `video.url` natively (XVideos extractor → HLS formats); CDN preview mp4 in `preview` field; no proxy needed. | | `xvideos` | `mainstream-tube` | no | no | HTML scraper for xvideos.com; handles two card formats: homepage (`div.thumb-block[data-id][data-eid]`) uses `p.title a[title]` + `data-pvv` on img, best-of-month page uses `div.thumb-block.video[data-video=JSON]` with `div.title a` text + `previewVideo` JSON key; thumbnails at `thumb-cdn77.xvideos-cdn.com` / `thumbs-gcore.xvideos-cdn.com` (no proxy needed); latest: `/` (page 1) / `/new/{N-1}` (page N≥2); best-of-month: `/best/{YYYY-MM}` (previous calendar month), page N: `/best/{YYYY-MM}/{N-1}`; search: `/?k={query}` / `/?k={query}&p={N-1}` (0-indexed); tag shortcuts: `/tags/{slug}/{N-1}`; category shortcuts: `/c/{Name}-{ID}/{N-1}` (38 hardcoded categories); `cat:`, `tag:`, `uploader:` query prefix routing; yt-dlp resolves `video.url` natively (XVideos extractor → HLS formats); CDN preview mp4 in `preview` field; no proxy needed. |
| `wowxxx` | `studio-network` | no | no | HTML scraper for wow.xxx premium aggregator; default feed `/latest-updates/`, page 2 `/{N}/` suffix (for example `/latest-updates/2/`), search `/search/{query}/relevance/` with the same page suffix; supports `site:`/`studio:`/`network:`/`model:`/`pornstar:`/`tag:`/`cat:` query shortcuts to direct archive routes; list cards expose preview clips (`cast.wow.xxx/preview/*.mp4`), thumbnails (`img.wow.xxx/.../medium@2x/1.jpg`), duration, rating, views, site (as uploader), and model tags; `video.url` is the detail page URL and yt-dlp resolves HTML5 MP4 formats dynamically; no proxy needed. | | `wowxxx` | `studio-network` | no | no | HTML scraper for wow.xxx premium aggregator; default feed `/latest-updates/`, page 2 `/{N}/` suffix (for example `/latest-updates/2/`), search `/search/{query}/relevance/` with the same page suffix; supports `site:`/`studio:`/`network:`/`model:`/`pornstar:`/`tag:`/`cat:` query shortcuts to direct archive routes; list cards expose preview clips (`cast.wow.xxx/preview/*.mp4`), thumbnails (`img.wow.xxx/.../medium@2x/1.jpg`), duration, rating, views, site (as uploader), and model tags; `video.url` is the detail page URL and yt-dlp resolves HTML5 MP4 formats dynamically; no proxy needed. |

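The `camsoda` row above describes a 60-second per-URL cache that serves stale items when the Jina mirror answers with a 429. A minimal sketch of that idea follows, using only the standard library. `FeedCache`, `Items`, and `get_or_fetch` are hypothetical names chosen for illustration; they are not the repo's `VideoCache` API, and the fetch step is passed in as a closure rather than a real HTTP call.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for the provider's list of feed items.
type Items = Vec<String>;

/// URL-keyed cache: fresh entries are served directly, expired entries are
/// kept as a fallback for failed fetches.
struct FeedCache {
    ttl: Duration,
    entries: HashMap<String, (SystemTime, Items)>,
}

impl FeedCache {
    fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Return cached items while they are younger than `ttl`; otherwise call
    /// `fetch`. If `fetch` fails (for example on a 429 from the mirror), fall
    /// back to the expired items, if any, instead of returning an empty feed.
    fn get_or_fetch<F>(&mut self, url: &str, fetch: F) -> Result<Items, String>
    where
        F: FnOnce(&str) -> Result<Items, String>,
    {
        let stale = match self.entries.get(url) {
            Some((stored_at, items)) => {
                // A clock error is treated as age zero, i.e. still fresh.
                let age = stored_at.elapsed().unwrap_or_default();
                if age < self.ttl {
                    return Ok(items.clone());
                }
                Some(items.clone())
            }
            None => None,
        };

        match fetch(url) {
            Ok(items) => {
                self.entries
                    .insert(url.to_string(), (SystemTime::now(), items.clone()));
                Ok(items)
            }
            // Serve the expired copy when there is one; otherwise surface the error.
            Err(err) => stale.ok_or(err),
        }
    }
}
```

Keying on the full feed URL means each category and page gets its own entry, which matches the "caches each fetched feed URL" wording in the row.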

@@ -3,84 +3,34 @@ use crate::api::ClientVersion;
use crate::providers::{Provider, report_provider_error, requester_or_default}; use crate::providers::{Provider, report_provider_error, requester_or_default};
use crate::status::*; use crate::status::*;
use crate::util::cache::VideoCache; use crate::util::cache::VideoCache;
use crate::util::time::parse_time_to_seconds;
use crate::videos::{ServerOptions, VideoItem}; use crate::videos::{ServerOptions, VideoItem};
use async_trait::async_trait; use async_trait::async_trait;
use chrono::NaiveDateTime;
use error_chain::error_chain; use error_chain::error_chain;
use htmlentity::entity::{ICodedDataTrait, decode}; use htmlentity::entity::{ICodedDataTrait, decode};
use scraper::{Html, Selector};
use serde::Deserialize;
use std::collections::HashSet;
pub const CHANNEL_METADATA: crate::providers::ProviderChannelMetadata = pub const CHANNEL_METADATA: crate::providers::ProviderChannelMetadata =
crate::providers::ProviderChannelMetadata { crate::providers::ProviderChannelMetadata {
group_id: "live-cams", group_id: "live-cams",
tags: &["cams", "amateur", "recordings", "clips"], tags: &["live", "cams", "amateur", "webcam"],
}; };
const BASE_URL: &str = "https://www.camsoda.com"; const BASE_URL: &str = "https://www.camsoda.com";
const CHANNEL_ID: &str = "camsoda"; const CHANNEL_ID: &str = "camsoda";
/// Recorded-media listing API. The site's `/media` page is a CF-protected SPA /// Live-cam browse API. The `/` SPA loads this over XHR; it is Cloudflare-
/// that loads this JSON endpoint over XHR for every page/sort/tag change: /// protected on direct access (HTTP 403), so the shared requester's Jina mirror
/// GET /api/v1/media/list/video?page=N&sort_by=<sort>&tag=<slug> /// fallback is what gets through. Response body has `userList` at the top level.
/// Direct access is Cloudflare-challenged (HTTP 403), so the shared requester const API_BROWSE: &str = "https://www.camsoda.com/api/v1/browse/react";
/// falls back to the Jina mirror, which returns the JSON wrapped in a `<pre>`.
const API_LIST: &str = "https://www.camsoda.com/api/v1/media/list/video";
/// Tag catalog exposed by the media filter dropdown (extracted from the /// Category routes appended to `browse/react` (verified against the live API).
/// MediaMainPage bundle). Used for the `categories` option and to route bare / /// `id` is what the client sends back in the `category` option; `route` is the
/// `tag:`/`cat:` queries straight to a tag archive instead of dropping them. /// path segment (empty = the default featured feed).
const MEDIA_TAGS: &[&str] = &[ const CATEGORIES: &[(&str, &str, &str)] = &[
"amateur", ("all", "Featured", ""),
"anal", ("girls", "Girls", "/girls"),
"asian", ("trans", "Trans", "/trans"),
"ass", ("couples", "Couples", "/couples"),
"bbw", ("voyeur-cams", "Voyeur Cams", "/voyeur-cams"),
"big-ass", ("new", "New", "/girls/new"),
"big-tits",
"black",
"blonde-hair",
"blowjob",
"bondage",
"brown-hair",
"college",
"cosplay",
"creampie",
"cum",
"curvy",
"dildo",
"dp",
"ebony",
"facial",
"feet",
"fetish",
"hairy-pussy",
"hd",
"japanese",
"latina",
"lesbian",
"lovense",
"lush",
"massage",
"masturbation",
"milf",
"muscle",
"ohmibod",
"outdoor",
"petite",
"pov",
"public",
"red-hair",
"shaved-pussy",
"small-tits",
"squirting",
"swallow",
"teen-18",
"threesome",
"toys",
"tranny",
"voyeur",
]; ];
error_chain! { error_chain! {
@@ -96,237 +46,118 @@ error_chain! {
} }
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct CamsodaProvider { pub struct CamsodaProvider;
url: String,
}
#[derive(Debug, Clone, PartialEq, Eq)] #[derive(Debug, Clone, PartialEq, Eq)]
enum Target { enum Target {
/// Default/tag listing via the JSON API. `tag` is `None` for "all". /// Live browse feed at `browse/react{route}?p=N`.
Listing { tag: Option<String> }, Browse { route: String },
/// A specific model's recorded-media page (`/{username}/media`). /// Keyword search at `browse/react/search/{query}?sortByConnection=1`.
Model { username: String }, Search { query: String },
}
/// Subset of the `media/list` JSON item fields the provider consumes. All
/// nullable fields are modelled as `Option` so a stray `null` never aborts the
/// whole page parse.
#[derive(Debug, Deserialize)]
struct ApiMediaItem {
id: i64,
#[serde(default)]
name: Option<String>,
#[serde(default)]
slug: Option<String>,
#[serde(default)]
duration: Option<i64>,
#[serde(default)]
created_at: Option<String>,
#[serde(default)]
thumbnail_url: Option<String>,
#[serde(default)]
user_display_name: Option<String>,
#[serde(default)]
username: Option<String>,
#[serde(default)]
is_video: Option<bool>,
}
#[derive(Debug, Deserialize)]
struct ApiResponse {
#[serde(default)]
data: Vec<ApiMediaItem>,
} }
impl CamsodaProvider { impl CamsodaProvider {
pub fn new() -> Self { pub fn new() -> Self {
Self { CamsodaProvider
url: BASE_URL.to_string(),
}
} }
fn build_channel(&self, _clientversion: ClientVersion) -> Channel { fn build_channel(&self, _clientversion: ClientVersion) -> Channel {
let cat_options = std::iter::once(FilterOption { let cat_options = CATEGORIES
id: "all".to_string(), .iter()
title: "All".to_string(), .map(|(id, title, _)| FilterOption {
id: id.to_string(),
title: title.to_string(),
}) })
.chain(MEDIA_TAGS.iter().map(|slug| FilterOption {
id: slug.to_string(),
title: Self::pretty_tag(slug),
}))
.collect::<Vec<_>>(); .collect::<Vec<_>>();
Channel { Channel {
id: CHANNEL_ID.to_string(), id: CHANNEL_ID.to_string(),
name: "CamSoda".to_string(), name: "CamSoda".to_string(),
description: description: "CamSoda live webcams — free adult cam shows streaming right now."
"CamSoda model video clips — recorded amateur cam shows uploaded by performers."
.to_string(), .to_string(),
premium: false, premium: false,
favicon: "https://www.google.com/s2/favicons?sz=64&domain=camsoda.com".to_string(), favicon: "https://www.google.com/s2/favicons?sz=64&domain=camsoda.com".to_string(),
status: "active".to_string(), status: "active".to_string(),
categories: vec![], categories: vec![],
options: vec![ options: vec![ChannelOption {
ChannelOption { id: "category".to_string(),
id: "sort".to_string(), title: "Category".to_string(),
title: "Sort".to_string(), description: "Browse a CamSoda live-cam category.".to_string(),
description: "Order the CamSoda media feed.".to_string(),
systemImage: "list.number".to_string(),
colorName: "blue".to_string(),
options: vec![
FilterOption {
id: "new".to_string(),
title: "Newest".to_string(),
},
FilterOption {
id: "popular".to_string(),
title: "Popular".to_string(),
},
FilterOption {
id: "top".to_string(),
title: "Popular (All Time)".to_string(),
},
],
multiSelect: false,
},
ChannelOption {
id: "categories".to_string(),
title: "Categories".to_string(),
description: "Filter CamSoda media by tag.".to_string(),
systemImage: "square.grid.2x2".to_string(), systemImage: "square.grid.2x2".to_string(),
colorName: "orange".to_string(), colorName: "orange".to_string(),
options: cat_options, options: cat_options,
multiSelect: false, multiSelect: false,
}, }],
],
nsfw: true, nsfw: true,
cacheDuration: Some(1800), cacheDuration: Some(60),
} }
} }
/// Map a Hot Tub sort id to the API's `sort_by` value. /// Resolve the category id (from the option or a query prefix) to a route.
fn map_sort(sort: &str) -> &'static str { fn route_for_category(value: &str) -> Option<String> {
match sort.trim().to_ascii_lowercase().as_str() { let key = value.trim().trim_start_matches('#').to_ascii_lowercase();
"popular" | "trending" | "hot" | "featured" => "popular",
"top" | "rated" | "best" | "mostviewed" | "most_viewed" | "popular_all_time" => {
"popular_all_time"
}
// "new", "newest", "latest", "recent", empty, anything else
_ => "date_added",
}
}
/// Lowercase/space-normalize a value for tag lookups.
fn normalize_key(s: &str) -> String {
s.trim()
.trim_start_matches('#')
.replace(['_', ' '], "-")
.to_ascii_lowercase()
}
/// Resolve a user-supplied value to a known tag slug, if it matches one.
fn resolve_tag(value: &str) -> Option<String> {
let key = Self::normalize_key(value);
if key.is_empty() { if key.is_empty() {
return None; return None;
} }
MEDIA_TAGS CATEGORIES
.iter() .iter()
.find(|slug| **slug == key) .find(|(id, _, _)| *id == key)
.map(|slug| slug.to_string()) .map(|(_, _, route)| route.to_string())
} }
/// Pretty display title for a tag slug (e.g. `big-tits` -> `Big Tits`). /// Decide what to fetch from the query and the selected category option.
fn pretty_tag(slug: &str) -> String {
slug.split('-')
.map(|word| match word {
"dp" => "DP".to_string(),
"pov" => "POV".to_string(),
"bbw" => "BBW".to_string(),
"hd" => "HD".to_string(),
"18" => "18".to_string(),
other => {
let mut chars = other.chars();
match chars.next() {
Some(first) => {
first.to_uppercase().collect::<String>() + chars.as_str()
}
None => String::new(),
}
}
})
.collect::<Vec<_>>()
.join(" ")
}
/// Resolve the fetch target from the query and the selected category option.
fn pick_target(query: Option<&str>, category: Option<&str>) -> Target { fn pick_target(query: Option<&str>, category: Option<&str>) -> Target {
// An explicitly selected category option wins.
if let Some(cat) = category { if let Some(cat) = category {
let cat = cat.trim(); if let Some(route) = Self::route_for_category(cat) {
if !cat.is_empty() && cat != "all" { return Target::Browse { route };
if let Some(tag) = Self::resolve_tag(cat) {
return Target::Listing { tag: Some(tag) };
}
// Unknown but non-empty: still pass a slug through to the API.
return Target::Listing {
tag: Some(Self::normalize_key(cat)),
};
} }
} }
let Some(query) = query.map(str::trim).filter(|v| !v.is_empty()) else { let Some(query) = query.map(str::trim).filter(|v| !v.is_empty()) else {
return Target::Listing { tag: None }; return Target::Browse {
route: String::new(),
};
}; };
// Model shortcuts browse a performer's media page. // `cat:`/`category:` prefixes route to a known category.
for prefix in &["uploader:", "model:", "user:", "performer:"] { for prefix in &["category:", "cat:"] {
if let Some(username) = query.strip_prefix(prefix) {
let username = username.trim().to_lowercase();
if !username.is_empty() {
return Target::Model { username };
}
}
}
// Tag/category shortcuts route straight to a tag archive.
for prefix in &["tag:", "cat:", "category:"] {
if let Some(rest) = query.strip_prefix(prefix) { if let Some(rest) = query.strip_prefix(prefix) {
let slug = Self::resolve_tag(rest).unwrap_or_else(|| Self::normalize_key(rest)); if let Some(route) = Self::route_for_category(rest) {
if !slug.is_empty() { return Target::Browse { route };
return Target::Listing { tag: Some(slug) };
} }
} }
} }
// A bare keyword that exactly matches a known tag is much better served // A bare keyword that exactly matches a category id goes to that feed.
// by that tag archive than by the (non-existent) media search endpoint. if let Some(route) = Self::route_for_category(query) {
if let Some(tag) = Self::resolve_tag(query) { return Target::Browse { route };
return Target::Listing { tag: Some(tag) };
} }
// No media keyword search exists; fall back to the default listing and Target::Search {
// let the server apply its client-side substring filter for quoted query: query.to_string(),
// queries. }
Target::Listing { tag: None }
} }
fn build_api_url(tag: Option<&str>, sort: &str, page: u16) -> String { fn build_url(&self, target: &Target, page: u16) -> String {
let page = page.max(1); let page = page.max(1);
match tag { match target {
Some(tag) if !tag.is_empty() && tag != "all" => { Target::Browse { route } => format!("{}{}?p={}", API_BROWSE, route, page),
format!("{API_LIST}?page={page}&sort_by={sort}&tag={tag}") Target::Search { query } => {
let q = Self::slug_query(query);
// Search is a single connection-sorted result set (no paging).
format!("{}/search/{}?sortByConnection=1", API_BROWSE, q)
} }
_ => format!("{API_LIST}?page={page}&sort_by={sort}"),
} }
} }
fn build_model_url(&self, username: &str, page: u16) -> String { /// Encode a free-text query into the `/search/<q>` path segment.
if page <= 1 { fn slug_query(query: &str) -> String {
format!("{}/{}/media", self.url, username) query
} else { .trim()
format!("{}/{}/media?page={}", self.url, username, page) .to_ascii_lowercase()
} .split_whitespace()
.collect::<Vec<_>>()
.join("-")
} }
fn clean_text(text: &str) -> String { fn clean_text(text: &str) -> String {
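The routing added in this hunk can be exercised in isolation. The sketch below restates `route_for_category`, `pick_target`, `slug_query`, and `build_url` as free functions so that they compile on their own. The two-column `CATEGORIES` table (titles dropped) and the free-function signatures are simplifications assumed here, not the provider's exact code.

```rust
const API_BROWSE: &str = "https://www.camsoda.com/api/v1/browse/react";

/// (id, route) pairs; an empty route is the default featured feed.
const CATEGORIES: &[(&str, &str)] = &[
    ("all", ""),
    ("girls", "/girls"),
    ("trans", "/trans"),
    ("couples", "/couples"),
    ("voyeur-cams", "/voyeur-cams"),
    ("new", "/girls/new"),
];

#[derive(Debug, PartialEq, Eq)]
enum Target {
    Browse { route: String },
    Search { query: String },
}

/// Map a category id (case-insensitive, optional leading `#`) to its route.
fn route_for_category(value: &str) -> Option<&'static str> {
    let key = value.trim().trim_start_matches('#').to_ascii_lowercase();
    if key.is_empty() {
        return None;
    }
    CATEGORIES
        .iter()
        .find(|(id, _)| *id == key)
        .map(|(_, route)| *route)
}

/// Selected category first, then `category:`/`cat:` prefixes, then a bare
/// keyword equal to a category id, and finally free-text search.
fn pick_target(query: Option<&str>, category: Option<&str>) -> Target {
    if let Some(route) = category.and_then(route_for_category) {
        return Target::Browse { route: route.to_string() };
    }
    let Some(query) = query.map(str::trim).filter(|q| !q.is_empty()) else {
        return Target::Browse { route: String::new() };
    };
    for prefix in ["category:", "cat:"] {
        if let Some(route) = query.strip_prefix(prefix).and_then(route_for_category) {
            return Target::Browse { route: route.to_string() };
        }
    }
    if let Some(route) = route_for_category(query) {
        return Target::Browse { route: route.to_string() };
    }
    Target::Search { query: query.to_string() }
}

/// Lowercase the query and join its words with dashes for `/search/<q>`.
fn slug_query(query: &str) -> String {
    query
        .trim()
        .to_ascii_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join("-")
}

/// Browse feeds are paged with `?p=N` (clamped to at least 1); search ignores
/// the page because it is a single connection-sorted result set.
fn build_url(target: &Target, page: u16) -> String {
    let page = page.max(1);
    match target {
        Target::Browse { route } => format!("{API_BROWSE}{route}?p={page}"),
        Target::Search { query } => {
            format!("{API_BROWSE}/search/{}?sortByConnection=1", slug_query(query))
        }
    }
}
```

As in the diff, a selected category that resolves to a route (including `all`) takes precedence over the query, so a search only runs when no known category is selected.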
@@ -336,91 +167,103 @@ impl CamsodaProvider {
decoded.split_whitespace().collect::<Vec<_>>().join(" ") decoded.split_whitespace().collect::<Vec<_>>().join(" ")
} }
fn parse_created_at(value: &str) -> Option<u64> {
NaiveDateTime::parse_from_str(value.trim(), "%Y-%m-%dT%H:%M:%S")
.ok()
.map(|dt| dt.and_utc().timestamp())
.and_then(|ts| u64::try_from(ts).ok())
}
/// Extract the JSON object from a body that may be wrapped in HTML by the /// Extract the JSON object from a body that may be wrapped in HTML by the
/// Jina mirror (`<pre>{...}</pre>`) or returned raw. /// Jina mirror (`<pre>{...}</pre>`) or returned raw.
fn extract_json(body: &str) -> Option<&str> { fn extract_json(body: &str) -> Option<&str> {
let start = body.find('{')?; let start = body.find('{')?;
let end = body.rfind('}')?; let end = body.rfind('}')?;
if end > start { (end > start).then(|| &body[start..=end])
Some(&body[start..=end])
} else {
None
}
} }
/// Parse the `media/list` JSON response into rich `VideoItem`s. /// Parse the `browse/react` JSON (top-level `userList`) into live
fn parse_api_items(body: &str, tag: Option<&str>) -> Result<Vec<VideoItem>> { /// `VideoItem`s. Done with `serde_json::Value` so a stray field shape never
/// aborts the whole page (mirrors the chaturbate provider).
fn parse_items(body: &str) -> Result<Vec<VideoItem>> {
let json = Self::extract_json(body) let json = Self::extract_json(body)
.ok_or_else(|| Error::from("no JSON object found in response".to_string()))?; .ok_or_else(|| Error::from("no JSON object found in response".to_string()))?;
let parsed: ApiResponse = serde_json::from_str(json) let root: serde_json::Value = serde_json::from_str(json)
.map_err(|e| Error::from(format!("media/list JSON decode failed: {e}")))?; .map_err(|e| Error::from(format!("browse JSON decode failed: {e}")))?;
let mut items = Vec::with_capacity(parsed.data.len()); let users = root
let mut seen: HashSet<i64> = HashSet::new(); .get("userList")
.and_then(|v| v.as_array())
.ok_or_else(|| Error::from("missing userList array".to_string()))?;
for media in parsed.data { let mut items = Vec::with_capacity(users.len());
if matches!(media.is_video, Some(false)) { for user in users {
let Some(username) = user.get("username").and_then(|v| v.as_str()) else {
continue;
};
if username.is_empty() {
continue; continue;
} }
if !seen.insert(media.id) {
continue;
}
let username = media.username.unwrap_or_default();
let slug = media.slug.unwrap_or_default();
if username.is_empty() || slug.is_empty() {
continue;
}
let video_id = media.id.to_string();
let title = media // Skip offline performers; everything else in the browse feed is live.
.name let status = user.get("status").and_then(|v| v.as_str()).unwrap_or("");
.as_deref() if status.eq_ignore_ascii_case("offline") {
continue;
}
let display = user
.get("displayName")
.and_then(|v| v.as_str())
.filter(|s| !s.is_empty())
.unwrap_or(username);
let subject = user
.get("subjectText")
.and_then(|v| v.as_str())
.map(Self::clean_text) .map(Self::clean_text)
.filter(|t| !t.is_empty()) .filter(|s| !s.is_empty());
.unwrap_or_else(|| format!("CamSoda video {video_id}")); let title = subject.unwrap_or_else(|| display.to_string());
let duration = media let thumb = user
.duration .get("thumbUrl")
.and_then(|d| u32::try_from(d).ok()) .and_then(|v| v.as_str())
.filter(|s| !s.is_empty())
.or_else(|| user.get("offlinePictureUrl").and_then(|v| v.as_str()))
.unwrap_or("")
.to_string();
// connectionCount is usually a string ("34") but tolerate a number.
let views = user
.get("connectionCount")
.map(|v| match v {
serde_json::Value::String(s) => s.parse::<u32>().unwrap_or(0),
serde_json::Value::Number(n) => n.as_u64().unwrap_or(0) as u32,
_ => 0,
})
.unwrap_or(0); .unwrap_or(0);
let thumb = media.thumbnail_url.unwrap_or_default(); let room_url = format!("{BASE_URL}/{username}");
let page_url = format!("{BASE_URL}/{username}/media/{slug}/{video_id}");
let mut tags = Vec::new();
if status.eq_ignore_ascii_case("private") {
tags.push("Private Show".to_string());
}
if user
.get("vr")
.and_then(|v| v.as_bool())
.unwrap_or(false)
{
tags.push("VR".to_string());
}
let mut item = VideoItem::new( let mut item = VideoItem::new(
video_id, username.to_string(),
title, title,
page_url, room_url.clone(),
CHANNEL_ID.to_string(), CHANNEL_ID.to_string(),
thumb, thumb,
duration, 0,
); )
.is_live(true)
let uploader = media .views(views)
.user_display_name .uploader(display.to_string())
.as_deref() .uploader_url(room_url);
.map(Self::clean_text)
.filter(|u| !u.is_empty())
.unwrap_or_else(|| username.clone());
item.uploader = Some(uploader);
item.uploaderUrl = Some(format!("{BASE_URL}/{username}/media"));
item.uploaderId = Some(format!("{CHANNEL_ID}:{username}")); item.uploaderId = Some(format!("{CHANNEL_ID}:{username}"));
if !tags.is_empty() {
if let Some(ts) = media.created_at.as_deref().and_then(Self::parse_created_at) { item.tags = Some(tags);
item.uploadedAt = Some(ts);
}
if let Some(tag) = tag {
if !tag.is_empty() && tag != "all" {
item.tags = Some(vec![Self::pretty_tag(tag)]);
}
} }
items.push(item); items.push(item);
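The `extract_json` helper in this hunk slices from the first `{` to the last `}` so that the same parser handles both a raw JSON body and one wrapped by the Jina mirror in `<pre>`. A standalone sketch of that slice, written as a free function for illustration, with the sample bodies being made-up inputs rather than captured responses:

```rust
/// Return the substring from the first `{` to the last `}` (inclusive), or
/// `None` when the body holds no such pair in that order.
fn extract_json(body: &str) -> Option<&str> {
    let start = body.find('{')?;
    let end = body.rfind('}')?;
    // `{` and `}` are single-byte ASCII, so both indices are valid
    // char boundaries and the inclusive slice cannot panic.
    (end > start).then(|| &body[start..=end])
}

/// Hypothetical inputs showing the three cases the slice distinguishes.
fn demo_bodies() -> [(&'static str, Option<&'static str>); 3] {
    [
        // Mirror-wrapped JSON: the wrapper tags are dropped.
        ("<pre>{\"userList\":[]}</pre>", Some("{\"userList\":[]}")),
        // Raw JSON passes through unchanged, nested braces included.
        ("{\"a\":{\"b\":1}}", Some("{\"a\":{\"b\":1}}")),
        // A challenge or error page without braces yields `None`.
        ("<html>403</html>", None),
    ]
}
```

The slice is a heuristic: if the surrounding HTML itself contained a `{` before the payload or a `}` after it, the result would not be valid JSON, and the subsequent `serde_json` decode would report the error.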
@@ -429,178 +272,57 @@ impl CamsodaProvider {
         Ok(items)
     }
-    /// Parse video cards from the HTML of a CamSoda model media page.
-    ///
-    /// Each card is an anchor linking to `/{username}/media/{slug}/{id}` with a
-    /// `media-item-module__title` span and a `media-item-module__subtitle` span
-    /// holding `by UPLOADER (MM:SS)`.
-    fn parse_html_items(html: &str) -> Vec<VideoItem> {
-        let document = Html::parse_document(html);
-        let anchor_sel = match Selector::parse(r#"a[href]"#) {
-            Ok(s) => s,
-            Err(_) => return vec![],
-        };
-        let title_sel = match Selector::parse(r#"[class*="media-item-module__title"]"#) {
-            Ok(s) => s,
-            Err(_) => return vec![],
-        };
-        let subtitle_sel = match Selector::parse(r#"[class*="media-item-module__subtitle"]"#) {
-            Ok(s) => s,
-            Err(_) => return vec![],
-        };
-        let img_sel = match Selector::parse(r#"img[src]"#) {
-            Ok(s) => s,
-            Err(_) => return vec![],
-        };
-        let sub_re =
-            match regex::Regex::new(r"(?i)^by\s+(.+?)\s+\((\d{1,2}:\d{2}(?::\d{2})?)\)\s*$") {
-                Ok(r) => r,
-                Err(_) => return vec![],
-            };
-        let href_re = match regex::Regex::new(r"^/([^/]+)/media/([^/]+)/(\d+)$") {
-            Ok(r) => r,
-            Err(_) => return vec![],
-        };
-        let mut items: Vec<VideoItem> = Vec::new();
-        let mut seen_ids: HashSet<String> = HashSet::new();
-        for anchor in document.select(&anchor_sel) {
-            let href = match anchor.value().attr("href") {
-                Some(h) => h,
-                None => continue,
-            };
-            let caps = match href_re.captures(href) {
-                Some(c) => c,
-                None => continue,
-            };
-            let username = caps.get(1).map(|m| m.as_str()).unwrap_or("").to_string();
-            let slug = caps.get(2).map(|m| m.as_str()).unwrap_or("").to_string();
-            let video_id = caps.get(3).map(|m| m.as_str()).unwrap_or("").to_string();
-            if video_id.is_empty() || username.is_empty() {
-                continue;
-            }
-            if !seen_ids.insert(video_id.clone()) {
-                continue;
-            }
-            let title = anchor
-                .select(&title_sel)
-                .next()
-                .map(|el| el.text().collect::<String>().trim().to_string())
-                .unwrap_or_default();
-            let title = if title.is_empty() {
-                format!("CamSoda video {video_id}")
-            } else {
-                title
-            };
-            let subtitle = anchor
-                .select(&subtitle_sel)
-                .next()
-                .map(|el| el.text().collect::<String>().trim().to_string())
-                .unwrap_or_default();
-            let (uploader, duration) = if let Some(sc) = sub_re.captures(&subtitle) {
-                let u = sc
-                    .get(1)
-                    .map(|m| m.as_str().trim().to_string())
-                    .unwrap_or_default();
-                let d = sc
-                    .get(2)
-                    .and_then(|m| parse_time_to_seconds(m.as_str()))
-                    .and_then(|s| u32::try_from(s).ok())
-                    .unwrap_or(0);
-                (if u.is_empty() { None } else { Some(u) }, d)
-            } else {
-                (None, 0)
-            };
-            let thumb = anchor
-                .select(&img_sel)
-                .filter_map(|img| img.value().attr("src"))
-                .find(|src| src.contains("media-secure.camsoda.com"))
-                .or_else(|| {
-                    anchor
-                        .select(&img_sel)
-                        .filter_map(|img| img.value().attr("src"))
-                        .find(|src| src.contains("livemediahost.com"))
-                })
-                .unwrap_or("")
-                .to_string();
-            let page_url = format!("{BASE_URL}/{username}/media/{slug}/{video_id}");
-            let mut item = VideoItem::new(
-                video_id,
-                title,
-                page_url,
-                CHANNEL_ID.to_string(),
-                thumb,
-                duration,
-            );
-            item.uploader = uploader;
-            item.uploaderUrl = Some(format!("{BASE_URL}/{username}/media"));
-            item.uploaderId = Some(format!("{CHANNEL_ID}:{username}"));
-            items.push(item);
-        }
-        items
-    }
-    async fn fetch_listing(
+    async fn fetch(
         &self,
-        tag: Option<&str>,
-        sort: &str,
+        target: &Target,
         page: u16,
+        cache: &VideoCache,
         options: &ServerOptions,
     ) -> Result<Vec<VideoItem>> {
-        let url = Self::build_api_url(tag, sort, page);
-        let mut requester = requester_or_default(options, CHANNEL_ID, "fetch_listing");
-        let text = requester
-            .get(&url, None)
-            .await
-            .map_err(|e| Error::from(format!("fetch failed for {url}: {e}")))?;
+        let url = self.build_url(target, page);
+        // Short cache (the feed changes constantly and Jina rate-limits hard).
+        let stale = match cache.get(&url) {
+            Some((time, items)) => {
+                if time.elapsed().unwrap_or_default().as_secs() < 60 {
+                    return Ok(items.clone());
+                }
+                items.clone()
+            }
+            None => vec![],
+        };
+        let mut requester = requester_or_default(options, CHANNEL_ID, "fetch");
+        let text = match requester.get(&url, None).await {
+            Ok(t) => t,
+            Err(e) => {
+                report_provider_error(CHANNEL_ID, "fetch.request", &format!("url={url}; error={e}"))
+                    .await;
+                return Ok(stale);
+            }
+        };
         if text.contains("cf-browser-verification")
             || text.contains("cf-chl")
             || text.contains("Just a moment")
         {
-            return Err(Error::from(
-                "cloudflare challenge page returned".to_string(),
-            ));
+            report_provider_error(CHANNEL_ID, "fetch.cloudflare", &format!("url={url}")).await;
+            return Ok(stale);
         }
-        Self::parse_api_items(&text, tag)
+        match Self::parse_items(&text) {
+            Ok(items) if !items.is_empty() => {
+                cache.remove(&url);
+                cache.insert(url, items.clone());
+                Ok(items)
+            }
+            Ok(_) => Ok(stale),
+            Err(e) => {
+                report_provider_error(CHANNEL_ID, "fetch.parse", &format!("url={url}; error={e}"))
+                    .await;
+                Ok(stale)
             }
-    async fn fetch_model(
-        &self,
-        username: &str,
-        page: u16,
-        options: &ServerOptions,
-    ) -> Result<Vec<VideoItem>> {
-        let url = self.build_model_url(username, page);
-        let mut requester = requester_or_default(options, CHANNEL_ID, "fetch_model");
-        let text = requester
-            .get(&url, None)
-            .await
-            .map_err(|e| Error::from(format!("fetch failed for {url}: {e}")))?;
-        if text.contains("cf-browser-verification")
-            || text.contains("cf-chl")
-            || text.contains("Just a moment")
-        {
-            return Err(Error::from(
-                "cloudflare challenge page returned".to_string(),
-            ));
         }
-        Ok(Self::parse_html_items(&text))
     }
 }
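The caching behavior in the new `fetch` (return a fresh entry directly, otherwise try the network and fall back to the stale copy when the request or parse fails or yields nothing) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the project's code: `TtlCache` and `get_or_refresh` are hypothetical names, items are plain strings instead of `VideoItem`, and it uses `Instant` where the cache in the diff appears to store a `SystemTime`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the project's `VideoCache`; not the real type.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns a fresh entry directly. Otherwise runs `load` and falls back
    /// to the stale copy (or an empty list) when `load` fails or is empty.
    fn get_or_refresh<F>(&mut self, key: &str, load: F) -> Vec<String>
    where
        F: FnOnce() -> Result<Vec<String>, String>,
    {
        let stale = match self.entries.get(key) {
            // Fresh hit: skip the loader entirely.
            Some((stored_at, items)) if stored_at.elapsed() < self.ttl => {
                return items.clone();
            }
            // Expired hit: remember it as the fallback.
            Some((_, items)) => items.clone(),
            None => Vec::new(),
        };
        match load() {
            Ok(items) if !items.is_empty() => {
                // Only non-empty results replace the cached entry.
                self.entries
                    .insert(key.to_string(), (Instant::now(), items.clone()));
                items
            }
            // Empty result or error: keep serving what we had before.
            _ => stale,
        }
    }
}
```

With a 60 second TTL this mirrors the threshold used in the diff. The trade-off is that callers never see an error from this path, only possibly outdated items, so failures have to be surfaced through separate reporting.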
@@ -610,13 +332,12 @@ impl Provider for CamsodaProvider {
         &self,
         cache: VideoCache,
         pool: DbPool,
-        sort: String,
+        _sort: String,
         query: Option<String>,
         page: String,
         per_page: String,
         options: ServerOptions,
     ) -> Vec<VideoItem> {
-        let _ = cache;
         let _ = pool;
         let _ = per_page;
@@ -633,17 +354,8 @@ impl Provider for CamsodaProvider {
             .or(options.category.as_deref());
         let target = Self::pick_target(normalized_query.as_deref(), category);
-        let sort_value = Self::map_sort(&sort);
-        let result = match &target {
-            Target::Listing { tag } => {
-                self.fetch_listing(tag.as_deref(), sort_value, page, &options)
-                    .await
-            }
-            Target::Model { username } => self.fetch_model(username, page, &options).await,
-        };
-        match result {
+        match self.fetch(&target, page, &cache, &options).await {
             Ok(items) => items,
             Err(error) => {
                 report_provider_error(CHANNEL_ID, "get_videos", &error.to_string()).await;
@@ -661,128 +373,81 @@ impl Provider for CamsodaProvider {
 mod tests {
     use super::*;
-    fn sample_api_body() -> String {
+    fn sample_browse() -> String {
         // Mimics the Jina mirror response: JSON wrapped in a <pre> block.
-        r#"<html><head></head><body><pre>{"result":true,"data":[
-        {"id":15032118,"user_id":18777219,"type_id":2,"name":"Extreme Fuck &amp; Squirt","slug":"fuck-machine-squirt-surprise","token_price":555,"created_at":"2025-06-07T10:13:35","duration":2711,"is_video":true,"thumbnail_url":"https://media-secure.camsoda.com/user/videos/15032118/15032118_1749294653.thumb.jpg","type_name":"Video","user_display_name":"Lola Bunniii","username":"lolabunniii"},
-        {"id":17009049,"user_id":1,"type_id":2,"name":"FIRST IR BG SHOW","slug":"first-ir-bg-show","token_price":0,"created_at":"2026-02-12T08:00:00","duration":2654,"is_video":true,"thumbnail_url":"https://media-secure.camsoda.com/user/videos/17009049/17009049.thumb.webp","type_name":"Video","user_display_name":"Coco Dethick","username":"coco-dethick"},
-        {"id":99,"user_id":2,"type_id":1,"name":"a picture","slug":"pic","is_video":false,"username":"someone"}
-        ]}</pre></body></html>"#.to_string()
+        r#"<html><body><pre>{"perPageCount":60,"totalCount":98,"userList":[
+        {"id":1,"username":"theowonder","displayName":"Theo Wonder","connectionCount":"69","status":"online","subjectText":"hey guys &amp; girls","thumbUrl":"https://media.livemediahost.com/thumbs/199/theowonder.jpg?cb=1","streamName":"cam_obs/theowonder-flu","vr":false},
+        {"id":2,"username":"miavr","displayName":"Mia","connectionCount":139,"status":"private","subjectText":"","thumbUrl":"https://media.livemediahost.com/thumbs/197/miavr.jpg","vr":true},
+        {"id":3,"username":"gone","displayName":"Gone","connectionCount":"0","status":"offline","subjectText":"bye","thumbUrl":""}
+        ]}</pre></body></html>"#
+        .to_string()
     }
     #[test]
-    fn parses_api_items() {
-        let items = CamsodaProvider::parse_api_items(&sample_api_body(), None).unwrap();
-        assert_eq!(items.len(), 2, "non-video item should be skipped");
+    fn parses_live_items_and_skips_offline() {
+        let items = CamsodaProvider::parse_items(&sample_browse()).unwrap();
+        assert_eq!(items.len(), 2, "offline performer should be skipped");
         let a = &items[0];
-        assert_eq!(a.id, "15032118");
-        assert_eq!(a.title, "Extreme Fuck & Squirt", "html entity should decode");
-        assert_eq!(a.duration, 2711);
-        assert_eq!(
-            a.url,
-            "https://www.camsoda.com/lolabunniii/media/fuck-machine-squirt-surprise/15032118"
-        );
-        assert_eq!(a.uploader.as_deref(), Some("Lola Bunniii"));
-        assert_eq!(a.uploaderId.as_deref(), Some("camsoda:lolabunniii"));
-        assert!(a.thumb.contains("media-secure.camsoda.com"));
-        assert!(a.uploadedAt.is_some(), "created_at should parse");
+        assert_eq!(a.id, "theowonder");
+        assert_eq!(a.title, "hey guys & girls", "subject html-decoded");
+        assert_eq!(a.url, "https://www.camsoda.com/theowonder");
+        assert!(a.isLive);
+        assert_eq!(a.views, Some(69));
+        assert_eq!(a.uploader.as_deref(), Some("Theo Wonder"));
+        assert_eq!(a.uploaderId.as_deref(), Some("camsoda:theowonder"));
+        assert!(a.thumb.contains("media.livemediahost.com"));
         let b = &items[1];
-        assert_eq!(b.id, "17009049");
-        assert_eq!(b.uploader.as_deref(), Some("Coco Dethick"));
+        assert_eq!(b.id, "miavr");
+        // numeric connectionCount tolerated
+        assert_eq!(b.views, Some(139));
+        // empty subject falls back to display name
+        assert_eq!(b.title, "Mia");
+        // private + vr surface as tags
+        let tags = b.tags.clone().unwrap_or_default();
+        assert!(tags.contains(&"Private Show".to_string()));
+        assert!(tags.contains(&"VR".to_string()));
     }
     #[test]
-    fn tags_attached_when_filtering() {
-        let items =
-            CamsodaProvider::parse_api_items(&sample_api_body(), Some("big-tits")).unwrap();
-        assert_eq!(items[0].tags.as_deref(), Some(["Big Tits".to_string()].as_slice()));
-    }
-    #[test]
-    fn maps_sort_values() {
-        assert_eq!(CamsodaProvider::map_sort("new"), "date_added");
-        assert_eq!(CamsodaProvider::map_sort(""), "date_added");
-        assert_eq!(CamsodaProvider::map_sort("popular"), "popular");
-        assert_eq!(CamsodaProvider::map_sort("top"), "popular_all_time");
-        assert_eq!(CamsodaProvider::map_sort("rated"), "popular_all_time");
-    }
-    #[test]
-    fn picks_target_correctly() {
+    fn picks_target() {
         assert_eq!(
             CamsodaProvider::pick_target(None, None),
-            Target::Listing { tag: None }
+            Target::Browse { route: String::new() }
         );
-        // bare non-tag keyword -> default listing (server substring-filters)
         assert_eq!(
-            CamsodaProvider::pick_target(Some("some random phrase"), None),
-            Target::Listing { tag: None }
+            CamsodaProvider::pick_target(None, Some("girls")),
+            Target::Browse { route: "/girls".to_string() }
         );
-        // bare keyword matching a known tag -> tag archive
         assert_eq!(
-            CamsodaProvider::pick_target(Some("blowjob"), None),
-            Target::Listing {
-                tag: Some("blowjob".to_string())
-            }
+            CamsodaProvider::pick_target(Some("cat:trans"), None),
+            Target::Browse { route: "/trans".to_string() }
         );
-        // tag: prefix
         assert_eq!(
-            CamsodaProvider::pick_target(Some("tag:big tits"), None),
-            Target::Listing {
-                tag: Some("big-tits".to_string())
-            }
+            CamsodaProvider::pick_target(Some("voyeur-cams"), None),
+            Target::Browse { route: "/voyeur-cams".to_string() }
         );
-        // category option selected
         assert_eq!(
-            CamsodaProvider::pick_target(None, Some("latina")),
-            Target::Listing {
-                tag: Some("latina".to_string())
-            }
-        );
-        // model shortcut
-        assert_eq!(
-            CamsodaProvider::pick_target(Some("model:katt-leya"), None),
-            Target::Model {
-                username: "katt-leya".to_string()
-            }
+            CamsodaProvider::pick_target(Some("blonde teen"), None),
+            Target::Search { query: "blonde teen".to_string() }
         );
     }
     #[test]
-    fn builds_api_urls() {
+    fn builds_urls() {
+        let p = CamsodaProvider::new();
         assert_eq!(
-            CamsodaProvider::build_api_url(None, "date_added", 1),
-            "https://www.camsoda.com/api/v1/media/list/video?page=1&sort_by=date_added"
+            p.build_url(&Target::Browse { route: String::new() }, 1),
+            "https://www.camsoda.com/api/v1/browse/react?p=1"
         );
         assert_eq!(
-            CamsodaProvider::build_api_url(None, "date_added", 3),
-            "https://www.camsoda.com/api/v1/media/list/video?page=3&sort_by=date_added"
+            p.build_url(&Target::Browse { route: "/girls".to_string() }, 3),
+            "https://www.camsoda.com/api/v1/browse/react/girls?p=3"
         );
         assert_eq!(
-            CamsodaProvider::build_api_url(Some("big-tits"), "popular", 2),
-            "https://www.camsoda.com/api/v1/media/list/video?page=2&sort_by=popular&tag=big-tits"
+            p.build_url(&Target::Search { query: "big boobs".to_string() }, 1),
+            "https://www.camsoda.com/api/v1/browse/react/search/big-boobs?sortByConnection=1"
         );
     }
-    #[test]
-    fn pretty_tag_titles() {
-        assert_eq!(CamsodaProvider::pretty_tag("big-tits"), "Big Tits");
-        assert_eq!(CamsodaProvider::pretty_tag("pov"), "POV");
-        assert_eq!(CamsodaProvider::pretty_tag("teen-18"), "Teen 18");
-    }
-    #[test]
-    fn parses_model_html() {
-        let html = r#"<a href="/lil-asian-jaz/media/torso-ride/16984249">
-        <span><span class="media-item-module__title--x">Torso ride</span><span class="media-item-module__subtitle--y">by jazzyj (24:35)</span></span>
-        <img src="https://media-secure.camsoda.com/user/videos/16984249/16984249.thumb.webp">
-        </a>"#;
-        let items = CamsodaProvider::parse_html_items(html);
-        assert_eq!(items.len(), 1);
-        assert_eq!(items[0].id, "16984249");
-        assert_eq!(items[0].uploader.as_deref(), Some("jazzyj"));
-        assert_eq!(items[0].duration, 24 * 60 + 35);
-    }
 }
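The body of `build_url` is not part of the lines shown here, so the URL shapes can only be read off the `builds_urls` test. The following is a hedged sketch of one function that would satisfy those three assertions. It is not the commit's implementation: the free-function form, the local `Target` enum, and the lowercase-and-hyphenate slug rule are assumptions, and ignoring `page` for search is a guess because the test only covers page 1.

```rust
/// Hypothetical mirror of the `Target` enum exercised by the tests.
#[derive(Debug, Clone, PartialEq)]
enum Target {
    Browse { route: String },
    Search { query: String },
}

const BASE_URL: &str = "https://www.camsoda.com";

/// Builds a browse or search URL in the shapes the `builds_urls` test expects.
fn build_url(target: &Target, page: u16) -> String {
    match target {
        // `route` is either empty or starts with a slash, e.g. "/girls".
        Target::Browse { route } => {
            format!("{BASE_URL}/api/v1/browse/react{route}?p={page}")
        }
        // Assumption: whitespace-separated words become a hyphenated slug,
        // and the page number is not sent for searches.
        Target::Search { query } => {
            let slug = query
                .split_whitespace()
                .collect::<Vec<_>>()
                .join("-")
                .to_lowercase();
            format!("{BASE_URL}/api/v1/browse/react/search/{slug}?sortByConnection=1")
        }
    }
}
```

Keeping the leading slash inside `route` lets the empty route map to the bare `/browse/react` endpoint without a special case.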

View File

@@ -194,7 +194,11 @@ impl VideoItem {
         self
     }
-    #[cfg(any(not(hottub_single_provider), hottub_provider = "chaturbate"))]
+    #[cfg(any(
+        not(hottub_single_provider),
+        hottub_provider = "chaturbate",
+        hottub_provider = "camsoda"
+    ))]
     pub fn is_live(mut self, is_live: bool) -> Self {
         self.isLive = is_live;
         self