camsoda update

This commit is contained in:
Simon
2026-06-22 13:22:37 +00:00
parent 570a77c90b
commit 342c7dc098
3 changed files with 245 additions and 576 deletions

View File

@@ -73,7 +73,7 @@ This is the current implementation inventory as of this snapshot of the repo. Us
| `eporner` | `mainstream-tube` | no | no | HTML scraper for eporner.com (5M+ videos); card selector `div.mb[data-id]` with inline duration/rating/views/uploader; thumbnails at `static-eu-cdn.eporner.com` (no proxy needed); pagination uses `/{N}/` suffix (page 1 = no suffix, page 2 = `/2/`); search queries map to `/tag/{slug}/` (eporner redirects all keyword searches to tag pages — 404 tag pages still return related content); supports sort: new/popular/rated/best; 65 hardcoded categories via `cat:`, `tag:`, `pornstar:`, `uploader:` query shortcuts; background-loads pornstar name→URL map from `/pornstar-list/`; yt-dlp resolves `video.url` natively (Eporner extractor); no proxy needed. |
| `xnxx` | `mainstream-tube` | no | no | HTML scraper for xnxx.com (10M+ videos); unified card parser handles two formats: `div.thumb-block[data-eid]` (search) and `div.thumb-block.video[data-video='{"id":...}']` (hits); eid extracted from `/video-{eid}/{slug}` URL path; thumbnails at `thumb-cdn77.xnxx-cdn.com` and `thumbs-gcore.xnxx-cdn.com` (no proxy, no Referer needed); 0-indexed pagination (page 1 = `/hits`, page N = `/hits/{N-1}`); default feed is `/hits` (most-viewed — xnxx has no chronological listing); search via `/search/{slug}` (works for keywords and tags); supports `tag:`, `cat:`, `category:` query shortcuts; yt-dlp resolves `video.url` natively (XNXX extractor, returns 4-7 HLS formats); no proxy needed. |
| `xhamster` | `mainstream-tube` | no | no | HTML scraper for xhamster.com; card selector `div[data-video-type="video"]` with `data-video-id`; thumbnails via `img[data-role="thumb-preview-img"]` at `ic-vt-nss.xhcdn.com` (no proxy, no Referer needed); pagination via `?page=N` query param (browse feeds use infinite-scroll so only search reliably returns different content per page); feeds: `/newest` (default), `/most-viewed`, `/best`; categories via `/categories/{slug}`; channels via `/channels/{slug}`; 43 hardcoded categories as `categories` option; uploader type inferred from URL path (`/channels/` → channel, `/creators/` → creator, `/pornstars/` → pornstar); supports `cat:`/`category:` and `channel:` query shortcuts, plus static category name matching; preview mp4 clips from `data-previewvideo` attribute; yt-dlp resolves `video.url` natively (xHamster extractor, 28 formats); no proxy needed. |
| `camsoda` | `live-cams` | no | no | JSON-API provider for camsoda.com recorded model clips. camsoda.com is hard Cloudflare-protected: direct requests and yt-dlp both get HTTP 403, and FlareSolverr was unreliable during development, so the only reliable path through CF is the shared requester's Jina mirror fallback (`r.jina.ai/http://...`, `X-Return-Format: html`) — note Jina rate-limits per IP, so multi-provider builds that burst many concurrent fetches see 429s; validate with a single-provider build (`HOT_TUB_PROVIDER=camsoda`) which makes one fetch at a time. The `/media` page is a CF-protected SPA whose SSR ignores `?page`/`?p`/`sort` (it always renders the same default 60 items); pagination/sort/tag are client-only XHR to a JSON API discovered in the (non-CF-protected) static `pages-media-MediaMainPage` bundle: `GET https://www.camsoda.com/api/v1/media/list/video?page=N&sort_by=<sort>&tag=<slug>` returning `{"result":true,"data":[...]}` — Jina returns that JSON wrapped in a `<pre>`, so the provider extracts the `{...}` slice and parses it (this gives real pagination across ~166 pages, plus sort and tag filtering — fixing the old HTML-scrape that couldn't paginate or search). Rich item fields come straight from the JSON: `name`→title, `username`→uploader slug, `user_display_name`→uploader, `duration` (seconds), `created_at``uploadedAt`, `thumbnail_url` (direct `media-secure.camsoda.com`, no proxy/referer needed). `sort_by` values: `date_added` (default/new), `popular`, `popular_all_time` (top). 49-tag catalog (extracted from the bundle) is exposed via the `categories` option (sanitized out of `/api/status` like other big catalogs, but honored in `/api/videos`) and routed by `tag:`/`cat:`/`category:` prefixes or a bare keyword that exactly matches a tag slug; there is no keyword media-search endpoint, so other bare queries fall back to the default listing for the server's client-side substring filter. `model:`/`uploader:`/`user:`/`performer:` prefixes browse a performer's SSR `/{username}/media` page, parsed via anchor selectors (`[class*="media-item-module__title"]` / `[class*="media-item-module__subtitle"]`). `video.url` is the page URL (`/{username}/media/{slug}/{id}`); recorded clips are token-gated (`token_price`>0, `is_free_no_auth` effectively always false) and CF-blocks both browser and yt-dlp, so no `formats` are populated and playback is not resolvable in this environment (`check.py` reports these as expected CF warnings `www.camsoda.com` is in its CF allowlist not errors). No proxy needed. |
| `camsoda` | `live-cams` | no | no | Live-cam provider for camsoda.com (chaturbate-style — `live` performers streaming now, `video.url` = the room page, `is_live=true`, no `formats`). camsoda.com is hard Cloudflare-protected: direct requests and yt-dlp both get HTTP 403, and FlareSolverr was unreliable during development, so the live-browse API is reached through the shared requester's Jina mirror fallback (`r.jina.ai/http://...`, `X-Return-Format: html`); Jina rate-limits per IP, so the provider caches each fetched feed URL for 60s (and serves stale items on a 429 rather than emptying the feed), and a single-provider build (`HOT_TUB_PROVIDER=camsoda`) validates most cleanly (one fetch at a time). Endpoint (found in the non-CF static `main.js` bundle): `GET https://www.camsoda.com/api/v1/browse/react{route}?p=N` returning a body with a top-level `userList` array (Jina wraps it in `<pre>`, so the provider slices out the `{...}` and parses it with `serde_json::Value`, like the chaturbate provider). Per-cam fields: `username`→id + room URL (`/{username}`), `subjectText`→title (html-decoded, falls back to `displayName`), `displayName`→uploader, `connectionCount`→views (string or number tolerated), `thumbUrl`→thumb (direct `media.livemediahost.com` CDN, no proxy/referer), `status` (skip `offline`), `vr`/`private` surfaced as tags. Category option `category` uses verified `browse/react` routes — `all`(featured)/`girls`/`trans`/`couples`/`voyeur-cams`/`new` (`/male` is NOT a path route, camsoda gates male via `gender-hide`); `cat:`/`category:` prefixes and a bare keyword matching a category id route there too. Search: `GET browse/react/search/{dashed-query}?sortByConnection=1` (single connection-sorted result set, no real paging). Playback: `video.url` is the live room page; the room and the token-gated edge HLS (`*.livemediahost.com`) are both Cloudflare-protected, so HLS can't be resolved server-side and no `formats` are populated — yt-dlp has a `Camsoda` live extractor that resolves the room on a non-CF-blocked client, and `check.py` reports the sandbox's CF 403s as expected warnings (`www.camsoda.com` is in its CF allowlist), not errors. The earlier recorded-`/media` JSON scrape was replaced because clips were token-gated/non-playable; live cams are the site's actual product. No proxy needed. |
| `xvideos` | `mainstream-tube` | no | no | HTML scraper for xvideos.com; handles two card formats: homepage (`div.thumb-block[data-id][data-eid]`) uses `p.title a[title]` + `data-pvv` on img, best-of-month page uses `div.thumb-block.video[data-video=JSON]` with `div.title a` text + `previewVideo` JSON key; thumbnails at `thumb-cdn77.xvideos-cdn.com` / `thumbs-gcore.xvideos-cdn.com` (no proxy needed); latest: `/` (page 1) / `/new/{N-1}` (page N≥2); best-of-month: `/best/{YYYY-MM}` (previous calendar month), page N: `/best/{YYYY-MM}/{N-1}`; search: `/?k={query}` / `/?k={query}&p={N-1}` (0-indexed); tag shortcuts: `/tags/{slug}/{N-1}`; category shortcuts: `/c/{Name}-{ID}/{N-1}` (38 hardcoded categories); `cat:`, `tag:`, `uploader:` query prefix routing; yt-dlp resolves `video.url` natively (XVideos extractor → HLS formats); CDN preview mp4 in `preview` field; no proxy needed. |
| `wowxxx` | `studio-network` | no | no | HTML scraper for wow.xxx premium aggregator; default feed `/latest-updates/`, page 2 `/{N}/` suffix (for example `/latest-updates/2/`), search `/search/{query}/relevance/` with the same page suffix; supports `site:`/`studio:`/`network:`/`model:`/`pornstar:`/`tag:`/`cat:` query shortcuts to direct archive routes; list cards expose preview clips (`cast.wow.xxx/preview/*.mp4`), thumbnails (`img.wow.xxx/.../medium@2x/1.jpg`), duration, rating, views, site (as uploader), and model tags; `video.url` is the detail page URL and yt-dlp resolves HTML5 MP4 formats dynamically; no proxy needed. |