Compare commits
34 Commits

9499356d8b
45af69c5c1
7478b27078
ad51ad7637
86c96315e1
a11f53571b
590bfed134
fcac444ca5
af6109ed45
ebab95aeb6
fba82f10d6
e20022463a
5b53115e28
9e98255608
3ecdbbc787
a719e01e86
31e507288a
d5b615253a
4277724af7
c8412d651a
e81f43adca
dd79b7633c
47ae8dee22
f8ffcaba3e
fc39a6055f
5bf2b74b69
20ffd65657
2a89a60723
d414031d9c
cc1c95feb8
d659d33b60
83be858d61
cbe576f7a8
a4497de282
CHANGELOG.md (36 additions)
@@ -1,5 +1,41 @@
 # Changelog

+## 1.31.6 - 2026-02-08
+### Extractors
+#### Additions
+- [fikfap] add `hashtag` extractor ([#9018](https://github.com/mikf/gallery-dl/issues/9018))
+#### Fixes
+- [8chan] fail downloads of `POW` images ([#8975](https://github.com/mikf/gallery-dl/issues/8975))
+- [artstation] fix embedded videos ([#8972](https://github.com/mikf/gallery-dl/issues/8972) [#9003](https://github.com/mikf/gallery-dl/issues/9003))
+- [artstation] fix `challenge` extractor
+- [imagefap:user] support multiple pages ([#9016](https://github.com/mikf/gallery-dl/issues/9016))
+- [imhentai] use alternate strategy for galleries without image data ([#8951](https://github.com/mikf/gallery-dl/issues/8951))
+- [instagram] use `/topsearch/` to fetch user information ([#8978](https://github.com/mikf/gallery-dl/issues/8978))
+- [pixiv] fix errors when using metadata options for avatar/background
+- [simpcity] extract `tiktok` & `reddit` media embeds ([#8994](https://github.com/mikf/gallery-dl/issues/8994) [#8996](https://github.com/mikf/gallery-dl/issues/8996))
+- [tiktok] always try to resolve JS challenges ([#8993](https://github.com/mikf/gallery-dl/issues/8993))
+- [tiktok] use time cursor for story requests ([#8991](https://github.com/mikf/gallery-dl/issues/8991))
+- [tiktok] identify when user accounts do not exist ([#8977](https://github.com/mikf/gallery-dl/issues/8977))
+- [tiktok] do not exit early when rolling back cursor ([#8968](https://github.com/mikf/gallery-dl/issues/8968))
+- [xhamster] fix user profile extraction ([#8974](https://github.com/mikf/gallery-dl/issues/8974))
+#### Improvements
+- [8chan] skip `TOS` cookie name lookup if already present
+- [artstation] download `/8k/` images ([#9003](https://github.com/mikf/gallery-dl/issues/9003))
+- [discord:server-search] use `max_id` for pagination
+- [fikfap] allow for dashes in usernames ([#9019](https://github.com/mikf/gallery-dl/issues/9019))
+- [instagram] cache user profile results on disk ([#8978](https://github.com/mikf/gallery-dl/issues/8978))
+- [reddit:user] implement `only` option ([#8228](https://github.com/mikf/gallery-dl/issues/8228))
+- [reddit:user] provide `user` metadata ([#8228](https://github.com/mikf/gallery-dl/issues/8228))
+- [tiktok] fix outdated error message ([#8979](https://github.com/mikf/gallery-dl/issues/8979))
+- [twitter] support `article` media ([#8995](https://github.com/mikf/gallery-dl/issues/8995))
+- [xenforo] implement `"order-posts": "reaction"` ([#8997](https://github.com/mikf/gallery-dl/issues/8997))
+### Cookies
+- add support for `Floorp` ([#9005](https://github.com/mikf/gallery-dl/issues/9005))
+- support `Firefox` 147+ profile paths ([#8803](https://github.com/mikf/gallery-dl/issues/8803))
+### Miscellaneous
+- [job] fix overwriting `_extractor` fields ([#8958](https://github.com/mikf/gallery-dl/issues/8958))
+- use tempfile when updating input files ([#8981](https://github.com/mikf/gallery-dl/issues/8981))
+
 ## 1.31.5 - 2026-01-31
 ### Extractors
 #### Additions
@@ -79,9 +79,9 @@ Standalone Executable

 Prebuilt executable files with a Python interpreter and
 required Python packages included are available for

-- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.31.5/gallery-dl.exe>`__
+- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.31.6/gallery-dl.exe>`__
   (Requires `Microsoft Visual C++ Redistributable Package (x86) <https://aka.ms/vs/17/release/vc_redist.x86.exe>`__)
-- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.31.5/gallery-dl.bin>`__
+- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.31.6/gallery-dl.bin>`__


 Nightly Builds
@@ -5350,6 +5350,16 @@ Note
     but it will not always get the best video quality available.


+extractor.reddit.user.only
+--------------------------
+Type
+    ``bool``
+Default
+    ``true``
+Description
+    Only process and return posts from the user specified in the input URL.
+
+
 extractor.redgifs.format
 ------------------------
 Type
@@ -6284,6 +6294,16 @@ Description
     Fetch media from promoted Tweets.


+extractor.twitter.articles
+--------------------------
+Type
+    ``bool``
+Default
+    ``true``
+Description
+    Download media embedded in articles.
+
+
 extractor.twitter.cards
 -----------------------
 Type
@@ -7196,15 +7216,20 @@ extractor.[xenforo].order-posts
 Type
     ``string``
 Default
-    ``"desc"``
+    ``thread``
+        ``"desc"``
+    otherwise
+        ``"asc"``
 Description
     Controls the order in which
-    posts of a ``thread`` are processed.
+    posts of a ``thread`` or ``media`` files are processed.

     ``"asc"``
         Ascending order (oldest first)
     ``"desc"`` | ``"reverse"``
         Descending order (newest first)
+    ``"reaction"`` | ``"score"``
+        Reaction Score order (``threads`` only)


 extractor.ytdl.cmdline-args
@@ -704,7 +704,11 @@
         "previews" : true,
         "recursion" : 0,
         "selftext" : null,
-        "videos" : "dash"
+        "videos" : "dash",
+
+        "user": {
+            "only": true
+        }
     },
     "redgifs":
     {
@@ -876,6 +880,7 @@
         "cookies" : null,

         "ads" : false,
+        "articles" : true,
         "cards" : false,
         "cards-blacklist": [],
         "csrf" : "cookies",
@@ -346,7 +346,7 @@ Consider all listed sites to potentially be NSFW.
 <tr id="fikfap" title="fikfap">
     <td>FikFap</td>
     <td>https://fikfap.com/</td>
-    <td>Posts, User Profiles</td>
+    <td>Hashtags, Posts, User Profiles</td>
     <td></td>
 </tr>
 <tr id="fitnakedgirls" title="fitnakedgirls">
@@ -1,11 +1,12 @@
 # -*- coding: utf-8 -*-

-# Copyright 2014-2025 Mike Fährmann
+# Copyright 2014-2026 Mike Fährmann
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 2 as
 # published by the Free Software Foundation.

+import os
 import sys
 import logging
 from . import version, config, option, output, extractor, job, util, exception
@@ -542,11 +543,14 @@ class InputManager():

     def _rewrite(self):
         url, path, action, indicies = self._item
+        path_tmp = path + ".tmp"
         lines = self.files[path]
         action(lines, indicies)

         try:
-            with open(path, "w", encoding="utf-8") as fp:
+            with open(path_tmp, "w", encoding="utf-8") as fp:
                 fp.writelines(lines)
+            os.replace(path_tmp, path)
         except Exception as exc:
             self.log.warning(
                 "Unable to update '%s' (%s: %s)",
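The `_rewrite` hunk above switches input-file updates to a write-to-temp-then-replace pattern. A minimal standalone sketch of the same idea (function name is illustrative, not gallery-dl's API):

```python
import os

def rewrite_atomic(path, lines):
    # Write the updated contents to a temporary sibling file first,
    # then atomically swap it into place with os.replace(), so an
    # interrupted write can no longer truncate the original file.
    path_tmp = path + ".tmp"
    with open(path_tmp, "w", encoding="utf-8") as fp:
        fp.writelines(lines)
    os.replace(path_tmp, path)
```

The temp file is created next to the target rather than in a system temp directory because `os.replace` is only atomic when source and destination live on the same filesystem.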
@@ -26,7 +26,7 @@ from . import aes, text, util

 SUPPORTED_BROWSERS_CHROMIUM = {
     "brave", "chrome", "chromium", "edge", "opera", "thorium", "vivaldi"}
-SUPPORTED_BROWSERS_FIREFOX = {"firefox", "librewolf", "zen"}
+SUPPORTED_BROWSERS_FIREFOX = {"firefox", "librewolf", "zen", "floorp"}
 SUPPORTED_BROWSERS_WEBKIT = {"safari", "orion"}
 SUPPORTED_BROWSERS = \
     SUPPORTED_BROWSERS_CHROMIUM \
@@ -51,8 +51,8 @@ def load_cookies(browser_specification):

 def load_cookies_firefox(browser_name, profile=None,
                          container=None, domain=None):
-    path, container_id = _firefox_cookies_database(browser_name,
-                                                   profile, container)
+    path, container_id = _firefox_cookies_database(
+        browser_name, profile, container)

     sql = ("SELECT name, value, host, path, isSecure, expiry "
            "FROM moz_cookies")
@@ -219,13 +219,16 @@ def _firefox_cookies_database(browser_name, profile=None, container=None):
     elif _is_path(profile):
         search_root = profile
     else:
-        search_root = os.path.join(
-            _firefox_browser_directory(browser_name), profile)
+        search_root = _firefox_browser_directory(browser_name)
+        if isinstance(search_root, str):
+            search_root = os.path.join(search_root, profile)
+        else:
+            search_root = [os.path.join(dir, profile) for dir in search_root]

     path = _find_most_recently_used_file(search_root, "cookies.sqlite")
     if path is None:
         raise FileNotFoundError(f"Unable to find {browser_name.capitalize()} "
-                                f"cookies database in {search_root}")
+                                f"cookies database")

     _log_debug("Extracting cookies from %s", path)

@@ -272,6 +275,7 @@ def _firefox_browser_directory(browser_name):
             "firefox" : join(appdata, R"Mozilla\Firefox\Profiles"),
             "librewolf": join(appdata, R"librewolf\Profiles"),
             "zen"      : join(appdata, R"zen\Profiles"),
+            "floorp"   : join(appdata, R"Floorp\Profiles"),
         }[browser_name]
     elif sys.platform == "darwin":
         appdata = os.path.expanduser("~/Library/Application Support")
@@ -279,14 +283,25 @@ def _firefox_browser_directory(browser_name):
             "firefox" : join(appdata, R"Firefox/Profiles"),
             "librewolf": join(appdata, R"librewolf/Profiles"),
             "zen"      : join(appdata, R"zen/Profiles"),
+            "floorp"   : join(appdata, R"Floorp/Profiles"),
         }[browser_name]
     else:
         home = os.path.expanduser("~")
-        return {
-            "firefox" : join(home, R".mozilla/firefox"),
-            "librewolf": join(home, R".librewolf"),
-            "zen"      : join(home, R".zen"),
-        }[browser_name]
+        if browser_name == "firefox":
+            config = (os.environ.get("XDG_CONFIG_HOME") or
+                      os.path.expanduser("~/.config"))
+            return (
+                # versions >= 147
+                join(config, "mozilla/firefox"),
+                # versions <= 146
+                home + "/.mozilla/firefox",
+                # Flatpak
+                home + "/.var/app/org.mozilla.firefox/config/mozilla/firefox",
+                home + "/.var/app/org.mozilla.firefox/.mozilla/firefox",
+                # Snap
+                home + "/snap/firefox/common/.mozilla/firefox",
+            )
+        return f"{home}/.{browser_name}"


 # --------------------------------------------------------------------
@@ -1095,19 +1110,24 @@ def _decrypt_windows_dpapi(ciphertext):
     return result


-def _find_most_recently_used_file(root, filename):
+def _find_most_recently_used_file(roots, filename):
+    if isinstance(roots, str):
+        roots = (roots,)
+
     # if the provided root points to an exact profile path
     # check if it contains the wanted filename
-    first_choice = os.path.join(root, filename)
-    if os.path.exists(first_choice):
-        return first_choice
+    for root in roots:
+        first_choice = os.path.join(root, filename)
+        if os.path.exists(first_choice):
+            return first_choice

     # if there are multiple browser profiles, take the most recently used one
     paths = []
-    for curr_root, dirs, files in os.walk(root):
-        for file in files:
-            if file == filename:
-                paths.append(os.path.join(curr_root, file))
+    for root in roots:
+        for curr_root, dirs, files in os.walk(root):
+            for file in files:
+                if file == filename:
+                    paths.append(os.path.join(curr_root, file))
     if not paths:
        return None
     return max(paths, key=lambda path: os.lstat(path).st_mtime)
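The `_find_most_recently_used_file` hunk generalizes a single search root to a sequence of candidate roots, which is what makes the Firefox 147+ multi-path lookup work. The same logic, condensed into a self-contained sketch:

```python
import os

def find_most_recently_used_file(roots, filename):
    # Accept either a single root directory or a sequence of them
    if isinstance(roots, str):
        roots = (roots,)

    # A root may already be an exact profile path that contains
    # the wanted file directly
    for root in roots:
        first_choice = os.path.join(root, filename)
        if os.path.exists(first_choice):
            return first_choice

    # Otherwise walk every root and keep the newest match,
    # so the most recently used browser profile wins
    paths = []
    for root in roots:
        for curr_root, dirs, files in os.walk(root):
            if filename in files:
                paths.append(os.path.join(curr_root, filename))
    if not paths:
        return None
    return max(paths, key=lambda path: os.lstat(path).st_mtime)
```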
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-

-# Copyright 2022-2025 Mike Fährmann
+# Copyright 2022-2026 Mike Fährmann
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 2 as
@@ -27,6 +27,13 @@ class _8chanExtractor(Extractor):

     @memcache()
     def cookies_tos_name(self):
+        domain = "8chan." + self.groups[0]
+        for cookie in self.cookies:
+            if cookie.domain == domain and \
+                    cookie.name.lower().startswith("tos"):
+                self.log.debug("TOS cookie name: %s", cookie.name)
+                return cookie.name
+
         url = self.root + "/.static/pages/confirmed.html"
         headers = {"Referer": self.root + "/.static/pages/disclaimer.html"}
         response = self.request(url, headers=headers, allow_redirects=False)
@@ -37,7 +44,7 @@ class _8chanExtractor(Extractor):
             return cookie.name

         self.log.error("Unable to determine TOS cookie name")
-        return "TOS20241009"
+        return "TOS20250418"

     @memcache()
     def cookies_prepare(self):
@@ -100,6 +107,7 @@ class _8chanThreadExtractor(_8chanExtractor):
         for num, file in enumerate(files):
             file.update(thread)
             file["num"] = num
+            file["_http_validate"] = _validate
             text.nameext_from_url(file["originalName"], file)
             yield Message.Url, self.root + file["path"], file

@@ -130,3 +138,12 @@ class _8chanBoardExtractor(_8chanExtractor):
             return
         url = f"{self.root}/{board}/{pnum}.json"
         threads = self.request_json(url)["threads"]
+
+
+def _validate(response):
+    hget = response.headers.get
+    return not (
+        hget("expires") == "0" and
+        hget("content-length") == "166" and
+        hget("content-type") == "image/png"
+    )
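The new `_validate` hook rejects 8chan's proof-of-work placeholder image by its response headers alone, before any body is written to disk. The check reduces to a small predicate, sketched here over a plain dict instead of a `requests` response object:

```python
def is_pow_placeholder(headers):
    # While a proof-of-work challenge is pending, 8chan serves a
    # fixed 166-byte PNG; this header combination identifies it
    # without inspecting the response body.
    hget = headers.get
    return (
        hget("expires") == "0" and
        hget("content-length") == "166" and
        hget("content-type") == "image/png"
    )
```

In the extractor the result is negated (`_validate` returns `True` for downloadable files), which is what makes those downloads fail instead of saving the placeholder.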
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-

-# Copyright 2018-2025 Mike Fährmann
+# Copyright 2018-2026 Mike Fährmann
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 2 as
@@ -64,7 +64,7 @@ class ArtstationExtractor(Extractor):
         if "/images/images/" in url:
             lhs, _, rhs = url.partition("/large/")
             if rhs:
-                url = f"{lhs}/4k/{rhs}"
+                url = f"{lhs}/8k/{rhs}"
                 asset["_fallback"] = self._image_fallback(lhs, rhs)

         yield Message.Url, url, asset
@@ -88,7 +88,8 @@ class ArtstationExtractor(Extractor):
         if not self.videos:
             return
         page = self.request(url).text
-        return text.extr(page, ' src="', '"')
+        return text.extract(
+            page, ' src="', '"', page.find('id="video"')+1)[0]

         if url:
             # external URL
@@ -102,6 +103,7 @@ class ArtstationExtractor(Extractor):
                 adict.get("id"))

     def _image_fallback(self, lhs, rhs):
+        yield f"{lhs}/4k/{rhs}"
         yield f"{lhs}/large/{rhs}"
         yield f"{lhs}/medium/{rhs}"
         yield f"{lhs}/small/{rhs}"
@@ -317,9 +319,9 @@ class ArtstationChallengeExtractor(ArtstationExtractor):
                      "{challenge[id]} - {challenge[title]}")
     archive_fmt = "c_{challenge[id]}_{asset_id}"
     pattern = (r"(?:https?://)?(?:www\.)?artstation\.com"
-               r"/contests/[^/?#]+/challenges/(\d+)"
+               r"/c(?:hallenges|ontests)/[^/?#]+/c(?:ategori|halleng)es/(\d+)"
                r"/?(?:\?sorting=([a-z]+))?")
-    example = "https://www.artstation.com/contests/NAME/challenges/12345"
+    example = "https://www.artstation.com/challenges/NAME/categories/12345"

     def __init__(self, match):
         ArtstationExtractor.__init__(self, match)
@@ -327,24 +329,28 @@ class ArtstationChallengeExtractor(ArtstationExtractor):
         self.sorting = match[2] or "popular"

     def items(self):
-        base = f"{self.root}/contests/_/challenges/{self.challenge_id}"
-        challenge_url = base + ".json"
-        submission_url = base + "/submissions.json"
-        update_url = self.root + "/contests/submission_updates.json"
+        base = self.root + "/api/v2/competition/"
+        challenge_url = f"{base}challenges/{self.challenge_id}.json"
+        submission_url = base + "submissions.json"

         challenge = self.request_json(challenge_url)
         yield Message.Directory, "", {"challenge": challenge}

-        params = {"sorting": self.sorting}
+        params = {
+            "page"        : 1,
+            "per_page"    : 50,
+            "challenge_id": self.challenge_id,
+            "sort_by"     : self.sorting,
+        }
         for submission in self._pagination(submission_url, params):

-            params = {"submission_id": submission["id"]}
+            update_url = (f"{base}submissions/{submission['id']}"
+                          f"/submission_updates.json")
+            params = {"page": 1, "per_page": 50}
             for update in self._pagination(update_url, params=params):

                 del update["replies"]
                 update["challenge"] = challenge
-                for url in text.extract_iter(
-                        update["body_presentation_html"], ' href="', '"'):
+                for url in util.unique_sequence(text.extract_iter(
+                        update["body"], ' href="', '"')):
                     update["asset_id"] = self._id_from_url(url)
                     text.nameext_from_url(url, update)
                     yield Message.Url, self._no_cache(url), update
@@ -447,8 +447,17 @@ class DiscordAPI():
         MESSAGES_BATCH = 25

         def _method(offset):
-            params["offset"] = offset
-            return self._call(url, params)["messages"]
+            messages = self._call(url, params)["messages"]
+
+            max_id = 0
+            for msgs in messages:
+                for msg in msgs:
+                    mid = int(msg["id"])
+                    if max_id > mid or not max_id:
+                        max_id = mid
+            params["max_id"] = max_id
+
+            return messages

         url = f"/guilds/{server_id}/messages/search"
         return self._pagination(_method, MESSAGES_BATCH)
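The Discord change replaces offset pagination with a `max_id` cursor: each batch records the smallest message id it saw and feeds it back so the next request resumes below that point. A self-contained sketch of that loop (the `call` parameter stands in for `DiscordAPI._call`; stopping on an empty batch is a simplification of the real batch-size check):

```python
def paginate_by_max_id(call, params):
    # Fetch message groups repeatedly; track the smallest id seen
    # and send it back as `max_id` so each request continues where
    # the previous one stopped. Stops on an empty batch.
    while True:
        groups = call(params)
        if not groups:
            return
        max_id = 0
        for msgs in groups:
            for msg in msgs:
                mid = int(msg["id"])
                if max_id > mid or not max_id:
                    max_id = mid
            yield msgs
        params["max_id"] = max_id
```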
@@ -86,7 +86,7 @@ class FikfapPostExtractor(FikfapExtractor):

 class FikfapUserExtractor(FikfapExtractor):
     subcategory = "user"
-    pattern = BASE_PATTERN + r"/user/(\w+)"
+    pattern = BASE_PATTERN + r"/user/([\w-]+)"
     example = "https://fikfap.com/user/USER"

     def posts(self):
@@ -103,3 +103,25 @@ class FikfapUserExtractor(FikfapExtractor):
             if len(data) < 21:
                 return
             params["afterId"] = data[-1]["postId"]
+
+
+class FikfapHashtagExtractor(FikfapExtractor):
+    subcategory = "hashtag"
+    directory_fmt = ("{category}", "{hashtag}")
+    pattern = BASE_PATTERN + r"/hash/([\w-]+)"
+    example = "https://fikfap.com/hash/HASH"
+
+    def posts(self):
+        self.kwdict["hashtag"] = hashtag = self.groups[0]
+
+        url = f"{self.root_api}/hashtags/label/{hashtag}/posts"
+        params = {"amount": "21"}
+
+        while True:
+            data = self.request_api(url, params)
+
+            yield from data
+
+            if len(data) < 21:
+                return
+            params["afterId"] = data[-1]["postId"]
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-

-# Copyright 2016-2025 Mike Fährmann
+# Copyright 2016-2026 Mike Fährmann
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 2 as
@@ -28,10 +28,10 @@ class ImagefapExtractor(Extractor):
         response = Extractor.request(self, url, **kwargs)

         if response.history and response.url.endswith("/human-verification"):
-            self.log.warning("HTTP redirect to '%s'", response.url)
             if msg := text.extr(response.text, '<div class="mt-4', '<'):
                 msg = " ".join(msg.partition(">")[2].split())
                 raise exception.AbortExtraction(f"'{msg}'")
+            self.log.warning("HTTP redirect to %s", response.url)

         return response

@@ -42,12 +42,8 @@ class ImagefapGalleryExtractor(ImagefapExtractor):
     pattern = BASE_PATTERN + r"/(?:gallery\.php\?gid=|gallery/|pictures/)(\d+)"
     example = "https://www.imagefap.com/gallery/12345"

-    def __init__(self, match):
-        ImagefapExtractor.__init__(self, match)
-        self.gid = match[1]
-        self.image_id = ""
-
     def items(self):
+        self.gid = self.groups[0]
         url = f"{self.root}/gallery/{self.gid}"
         page = self.request(url).text
         data = self.get_job_metadata(page)
@@ -113,17 +109,13 @@ class ImagefapImageExtractor(ImagefapExtractor):
     pattern = BASE_PATTERN + r"/photo/(\d+)"
     example = "https://www.imagefap.com/photo/12345"

-    def __init__(self, match):
-        ImagefapExtractor.__init__(self, match)
-        self.image_id = match[1]
-
     def items(self):
         url, data = self.get_image()
         yield Message.Directory, "", data
         yield Message.Url, url, data

     def get_image(self):
-        url = f"{self.root}/photo/{self.image_id}/"
+        url = f"{self.root}/photo/{self.groups[0]}/"
         page = self.request(url).text

         url, pos = text.extract(
@@ -153,13 +145,8 @@ class ImagefapFolderExtractor(ImagefapExtractor):
               r"|profile/([^/?#]+)/galleries\?)folderid=)(\d+|-1)")
     example = "https://www.imagefap.com/organizer/12345"

-    def __init__(self, match):
-        ImagefapExtractor.__init__(self, match)
-        self._id, user, profile, self.folder_id = match.groups()
-        self.user = user or profile
-
     def items(self):
-        for gallery_id, name, folder in self.galleries(self.folder_id):
+        for gallery_id, name, folder in self.galleries():
             url = f"{self.root}/gallery/{gallery_id}"
             data = {
                 "gallery_id": gallery_id,
@@ -169,22 +156,25 @@ class ImagefapFolderExtractor(ImagefapExtractor):
             }
             yield Message.Queue, url, data

-    def galleries(self, folder_id):
+    def galleries(self):
         """Yield gallery IDs and titles of a folder"""
+        _id, user, profile, folder_id = self.groups
+
         if folder_id == "-1":
             folder_name = "Uncategorized"
-            if self._id:
+            if _id:
                 url = (f"{self.root}/usergallery.php"
-                       f"?userid={self.user}&folderid=-1")
+                       f"?userid={user}&folderid=-1")
             else:
-                url = f"{self.root}/profile/{self.user}/galleries?folderid=-1"
+                url = (f"{self.root}/profile/"
+                       f"{user or profile}/galleries?folderid=-1")
         else:
             folder_name = None
             url = f"{self.root}/organizer/{folder_id}/"

         params = {"page": 0}
         extr = text.extract_from(self.request(url, params=params).text)
-        if not folder_name:
+        if folder_name is None:
             folder_name = extr("class'blk_galleries'><b>", "</b>")

         while True:
@@ -211,28 +201,44 @@ class ImagefapUserExtractor(ImagefapExtractor):
               r"|usergallery\.php\?userid=(\d+))(?:$|#)")
     example = "https://www.imagefap.com/profile/USER"

-    def __init__(self, match):
-        ImagefapExtractor.__init__(self, match)
-        self.user, self.user_id = match.groups()
-
     def items(self):
         data = {"_extractor": ImagefapFolderExtractor}

         for folder_id in self.folders():
             if folder_id == "-1":
-                url = f"{self.root}/profile/{self.user}/galleries?folderid=-1"
+                url = (f"{self.root}/profile/{self.user}/galleries"
+                       f"?folderid=-1")
             else:
                 url = f"{self.root}/organizer/{folder_id}/"
             yield Message.Queue, url, data

     def folders(self):
         """Return a list of folder IDs of a user"""
-        if self.user:
-            url = f"{self.root}/profile/{self.user}/galleries"
+        user, user_id = self.groups
+        if user:
+            url = f"{self.root}/profile/{user}/galleries"
         else:
-            url = f"{self.root}/usergallery.php?userid={self.user_id}"
+            url = f"{self.root}/usergallery.php?userid={user_id}"
+        params = {"page": 0}
+        pnum = 0

-        response = self.request(url)
-        self.user = response.url.split("/")[-2]
-        folders = text.extr(response.text, ' id="tgl_all" value="', '"')
-        return folders.rstrip("|").split("|")
+        self.user = None
+        while True:
+            response = self.request(url, params=params)
+
+            if self.user is None:
+                url = response.url.partition("?")[0]
+                self.user = url.rsplit("/", 2)[1]
+
+            page = response.text
+            folders = text.extr(
+                page, ' id="tgl_all" value="', '"').rstrip("|").split("|")
+            if folders and folders[-1] == "-1":
+                last = folders.pop()
+                if not pnum:
+                    folders.insert(0, last)
+            yield from folders
+
+            params["page"] = pnum = pnum + 1
+            if f'href="?page={pnum}">{pnum+1}</a>' not in page:
+                return
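The rewritten `folders` method now pages through results and stops when the HTML no longer links to the next page number. That termination test in isolation (the `get_page` callable is a stand-in for the extractor's HTTP request):

```python
def iter_numbered_pages(get_page):
    # Yield page bodies until the current page no longer contains
    # a link pointing at the next page number.
    pnum = 0
    while True:
        page = get_page(pnum)
        yield page
        pnum += 1
        if f'href="?page={pnum}">{pnum+1}</a>' not in page:
            return
```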
@@ -123,18 +123,29 @@ class ImhentaiGalleryExtractor(ImhentaiExtractor, GalleryExtractor):
         return results

     def images(self, page):
-        base = text.extr(page, 'data-src="', '"').rpartition("/")[0] + "/"
-        exts = {"j": "jpg", "p": "png", "g": "gif", "w": "webp", "a": "avif"}
-
         try:
             data = util.json_loads(text.extr(page, "$.parseJSON('", "'"))
         except Exception:
             data = None

+        base = text.extr(page, 'data-src="', '"').rpartition("/")[0] + "/"
+        exts = {"j": "jpg", "p": "png", "g": "gif", "w": "webp", "a": "avif"}
+
         if data is None:
             self.log.warning("%s: Missing image data", self.gallery_id)
-            return ()
+
+            def _fallback_exts(i):
+                for ext in util.advance(exts.values(), 1):
+                    yield f"{base}{i}.{ext}"
+
+            cnt = text.parse_int(text.extr(
+                page, 'id="load_pages" value="', '"'))
+            return [(f"{base}{i}.jpg", {"_fallback": _fallback_exts(i)})
+                    for i in range(1, cnt+1)]

         results = []
         for i in map(str, range(1, len(data)+1)):
             ext, width, height = data[i].split(",")
-            url = base + i + "." + exts[ext]
+            url = f"{base}{i}.{exts[ext]}"
             results.append((url, {
                 "width" : text.parse_int(width),
                 "height": text.parse_int(height),
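The imhentai fallback strategy above tries `jpg` first and then every other known extension in order. The `_fallback_exts` helper reduces to the following (using `itertools.islice` in place of gallery-dl's `util.advance`):

```python
from itertools import islice

EXTS = {"j": "jpg", "p": "png", "g": "gif", "w": "webp", "a": "avif"}

def fallback_urls(base, i):
    # Skip the first extension (jpg, already tried as the primary
    # URL) and yield a candidate URL for each remaining one.
    for ext in islice(EXTS.values(), 1, None):
        yield f"{base}{i}.{ext}"
```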
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-

 # Copyright 2018-2020 Leonardo Taccari
-# Copyright 2018-2025 Mike Fährmann
+# Copyright 2018-2026 Mike Fährmann
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 2 as
@@ -11,7 +11,7 @@

 from .common import Extractor, Message, Dispatch
 from .. import text, util, exception
-from ..cache import cache, memcache
+from ..cache import cache
 import itertools
 import binascii

@@ -570,7 +570,7 @@ class InstagramTaggedExtractor(InstagramExtractor):
             user = self.api.user_by_id(self.user_id)
         else:
             self.user_id = self.api.user_id(self.item)
-            user = self.api.user_by_name(self.item)
+            user = self.api.user_by_screen_name(self.item)

         return {
             "tagged_owner_id" : user["id"],
@@ -757,7 +757,7 @@ class InstagramInfoExtractor(InstagramExtractor):
         if screen_name.startswith("id:"):
             user = self.api.user_by_id(screen_name[3:])
         else:
-            user = self.api.user_by_name(screen_name)
+            user = self.api.user_by_screen_name(screen_name)

         return iter(((Message.Directory, "", user),))

@@ -779,7 +779,7 @@ class InstagramAvatarExtractor(InstagramExtractor):
         if user.startswith("id:"):
             user = self.api.user_by_id(user[3:])
         else:
-            user = self.api.user_by_name(user)
+            user = self.api.user_by_screen_name(user)
         user["pk"] = user["id"]
         url = user.get("profile_pic_url_hd") or user["profile_pic_url"]
         avatar = {"url": url, "width": 0, "height": 0}
@@ -869,36 +869,37 @@ class InstagramRestAPI():
         }
         return self._pagination_sections(endpoint, data)

-    @memcache(keyarg=1)
-    def user_by_name(self, screen_name):
-        endpoint = "/v1/users/web_profile_info/"
-        params = {"username": screen_name}
-        try:
-            return self._call(
-                endpoint, params=params, notfound="user")["data"]["user"]
-        except KeyError:
-            raise exception.NotFoundError("user")
-
-    @memcache(keyarg=1)
+    @cache(maxage=36500*86400, keyarg=1)
     def user_by_id(self, user_id):
         endpoint = f"/v1/users/{user_id}/info/"
         return self._call(endpoint)["user"]

+    def user_by_screen_name(self, screen_name):
+        user = user_by_search(self, screen_name)
+        if user is None:
+            user_by_search.invalidate(screen_name)
+            self.extractor.log.warning(
+                "Failed to find profile '%s' via search. "
+                "Trying 'web_profile_info' fallback", screen_name)
+            user = user_by_name(self, screen_name)
+            if user is None:
+                user_by_name.invalidate(screen_name)
+                raise exception.NotFoundError("user")
+        return user
+
     def user_id(self, screen_name, check_private=True):
         if screen_name.startswith("id:"):
             if self.extractor.config("metadata"):
                 self.extractor._user = self.user_by_id(screen_name[3:])
             return screen_name[3:]

-        user = self.user_by_name(screen_name)
-        if user is None:
-            raise exception.AuthorizationError(
-                "Login required to access this profile")
-        if check_private and user["is_private"] and \
-                not user["followed_by_viewer"]:
+        user = self.user_by_screen_name(screen_name)
+        if check_private and user.get("is_private") and \
+                not user.get("followed_by_viewer", True):
             name = user["username"]
             s = "" if name.endswith("s") else "s"
             self.extractor.log.warning("%s'%s posts are private", name, s)

         self.extractor._assign_user(user)
         return user["id"]

@@ -945,7 +946,10 @@ class InstagramRestAPI():
     def _call(self, endpoint, **kwargs):
         extr = self.extractor

-        url = "https://www.instagram.com/api" + endpoint
+        if endpoint[0] == "/":
+            url = "https://www.instagram.com/api" + endpoint
+        else:
+            url = endpoint
         kwargs["headers"] = {
             "Accept"      : "*/*",
             "X-CSRFToken" : extr.csrf_token,
@@ -1049,7 +1053,7 @@ class InstagramGraphqlAPI():
         self._json_dumps = util.json_dumps

         api = InstagramRestAPI(extractor)
-        self.user_by_name = api.user_by_name
+        self.user_by_screen_name = api.user_by_screen_name
         self.user_by_id = api.user_by_id
         self.user_id = api.user_id

@@ -1155,6 +1159,29 @@ def _login_impl(extr, username, password):
     return {}


+@cache(maxage=36500*86400, keyarg=1)
+def user_by_name(self, screen_name):
+    endpoint = "/v1/users/web_profile_info/"
+    params = {"username": screen_name}
+    try:
+        return self._call(
+            endpoint, params=params, notfound="user")["data"]["user"]
+    except KeyError:
+        raise exception.NotFoundError("user")
+
+
+@cache(maxage=36500*86400, keyarg=1)
+def user_by_search(self, screen_name):
+    url = "https://www.instagram.com/web/search/topsearch/"
+    params = {"query": screen_name}
+
+    name = screen_name.lower()
+    for result in self._call(url, params=params)["users"]:
+        user = result["user"]
+        if user["username"].lower() == name:
+            return user
+
+
 def id_from_shortcode(shortcode):
     return util.bdecode(shortcode, _ALPHABET)
@@ -376,6 +376,8 @@ class PixivExtractor(Extractor):
|
||||
"meta_single_page": {"original_image_url": url},
|
||||
"page_count" : 1,
|
||||
"sanity_level" : 0,
|
||||
"total_comments" : 0,
|
||||
"is_bookmarked" : False,
|
||||
"tags" : (),
|
||||
"title" : kind,
|
||||
"type" : kind,
|
||||
@@ -520,7 +522,10 @@ class PixivAvatarExtractor(PixivExtractor):
|
||||
|
||||
def _init(self):
|
||||
PixivExtractor._init(self)
|
||||
self.sanity_workaround = self.meta_comments = False
|
||||
self.sanity_workaround = \
|
||||
self.meta_bookmark = \
|
||||
self.meta_comments = \
|
||||
self.meta_captions = False
|
||||
|
||||
def works(self):
|
||||
user = self.api.user_detail(self.groups[0])["user"]
|
||||
@@ -536,9 +541,7 @@ class PixivBackgroundExtractor(PixivExtractor):
|
||||
pattern = USER_PATTERN + "/background"
|
||||
example = "https://www.pixiv.net/en/users/12345/background"
|
||||
|
||||
def _init(self):
|
||||
PixivExtractor._init(self)
|
||||
self.sanity_workaround = self.meta_comments = False
|
||||
_init = PixivAvatarExtractor._init
|
||||
|
||||
def works(self):
|
||||
detail = self.api.user_detail(self.groups[0])
|
||||
|
||||
@@ -55,7 +55,7 @@ class RedditExtractor(Extractor):
|
||||
for submission, comments in submissions:
|
||||
urls = []
|
||||
|
||||
if submission:
|
||||
if submission and submission.get("_media", True):
|
||||
submission["comment"] = None
|
||||
submission["date"] = self.parse_timestamp(
|
||||
submission["created_utc"])
|
||||
@@ -126,7 +126,7 @@ class RedditExtractor(Extractor):
|
||||
|
||||
data = submission.copy()
|
||||
data["comment"] = comment
|
||||
comment["date"] = self.parse_timestamp(
|
||||
comment["date"] = data["date"] = self.parse_timestamp(
|
||||
comment["created_utc"])
|
||||
|
||||
if media:
|
||||
@@ -300,19 +300,39 @@ class RedditHomeExtractor(RedditSubredditExtractor):
|
||||
class RedditUserExtractor(RedditExtractor):
|
||||
"""Extractor for URLs from posts by a reddit user"""
|
||||
subcategory = "user"
|
||||
directory_fmt = ("{category}", "Users", "{user[name]}")
|
||||
pattern = (r"(?:https?://)?(?:\w+\.)?reddit\.com/u(?:ser)?/"
|
||||
r"([^/?#]+(?:/([a-z]+))?)/?(?:\?([^#]*))?$")
|
||||
example = "https://www.reddit.com/user/USER/"
|
||||
|
||||
def __init__(self, match):
|
||||
self.user, sub, params = match.groups()
|
||||
self.params = text.parse_query(params)
|
||||
if sub:
|
||||
if sub := match[2]:
|
||||
self.subcategory += "-" + sub
|
||||
RedditExtractor.__init__(self, match)
|
||||
|
||||
def submissions(self):
|
||||
return self.api.submissions_user(self.user, self.params)
|
||||
username, sub, qs = self.groups
|
||||
username = text.unquote(username)
|
||||
self.kwdict["user"] = user = self.api.user_about(username)
|
||||
|
||||
submissions = self.api.submissions_user(
|
||||
user["name"], text.parse_query(qs))
|
||||
if self.config("only", True):
|
||||
submissions = self._only(submissions, user)
|
||||
return submissions
|
||||
|
||||
def _only(self, submissions, user):
|
||||
uid = "t2_" + user["id"]
|
||||
for submission, comments in submissions:
|
||||
if submission and submission.get("author_fullname") != uid:
|
||||
submission["_media"] = False
|
||||
comments = [
|
||||
comment
|
||||
for comment in (comments or ())
|
||||
if comment.get("author_fullname") == uid
|
||||
]
|
||||
if submission or comments:
|
||||
yield submission, comments
|
||||
|
||||
|
||||
class RedditSubmissionExtractor(RedditExtractor):
|
||||
@@ -323,12 +343,8 @@ class RedditSubmissionExtractor(RedditExtractor):
|
||||
r"comments|gallery)|redd\.it)/([a-z0-9]+)")
|
||||
example = "https://www.reddit.com/r/SUBREDDIT/comments/id/"
|
||||
|
||||
def __init__(self, match):
|
||||
RedditExtractor.__init__(self, match)
|
||||
self.submission_id = match[1]
|
||||
|
||||
def submissions(self):
|
||||
return (self.api.submission(self.submission_id),)
|
||||
return (self.api.submission(self.groups[0]),)
|
||||
|
||||
|
||||
class RedditImageExtractor(Extractor):
|
||||
@@ -449,9 +465,9 @@ class RedditAPI():
|
||||
endpoint = subreddit + "/.json"
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def submissions_user(self, user, params):
|
||||
def submissions_user(self, username, params):
|
||||
"""Collect all (submission, comments)-tuples posted by a user"""
|
||||
endpoint = "/user/" + user + "/.json"
|
||||
endpoint = f"/user/{username}/.json"
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def morechildren(self, link_id, children):
|
||||
@@ -472,6 +488,10 @@ class RedditAPI():
|
||||
else:
|
||||
yield thing["data"]
|
||||
|
||||
def user_about(self, username):
|
||||
endpoint = f"/user/{username}/about.json"
|
||||
return self._call(endpoint, {})["data"]
|
||||
|
||||
def authenticate(self):
|
||||
"""Authenticate the application by requesting an access token"""
|
||||
self.headers["Authorization"] = \
|
||||
|
||||
@@ -181,21 +181,16 @@ class TiktokExtractor(Extractor):
|
||||
raise KeyError(assert_key)
|
||||
return data
|
||||
except (ValueError, KeyError):
|
||||
# We failed to retrieve rehydration data. This happens
|
||||
# relatively frequently when making many requests, so retry.
|
||||
if tries >= self._retries:
|
||||
raise
|
||||
tries += 1
|
||||
self.log.warning("%s: Failed to retrieve rehydration data "
|
||||
"(%s/%s)", url.rpartition("/")[2], tries,
|
||||
self._retries)
|
||||
if challenge_attempt:
|
||||
self.sleep(self._timeout, "retry")
|
||||
challenge_attempt = False
|
||||
else:
|
||||
# Even if the retries option has been set to 0, we should
|
||||
# always at least try to solve the JS challenge and go again
|
||||
# immediately.
|
||||
if not challenge_attempt:
|
||||
challenge_attempt = True
|
||||
self.log.info("Solving JavaScript challenge")
|
||||
try:
|
||||
self._solve_challenge(html)
|
||||
html = None
|
||||
continue
|
||||
except Exception as exc:
|
||||
self.log.traceback(exc)
|
||||
self.log.warning(
|
||||
@@ -204,9 +199,19 @@ class TiktokExtractor(Extractor):
|
||||
"with the --write-pages option and include the "
|
||||
"resulting page in your bug report",
|
||||
url.rpartition("/")[2])
|
||||
self.sleep(self._timeout, "retry")
|
||||
html = None
|
||||
challenge_attempt = True
|
||||
|
||||
# We've already tried resolving the challenge, and either
|
||||
# resolving it failed, or resolving it didn't get us the
|
||||
# rehydration data, so fail this attempt.
|
||||
self.log.warning("%s: Failed to retrieve rehydration data "
|
||||
"(%s/%s)", url.rpartition("/")[2], tries + 1,
|
||||
self._retries)
|
||||
if tries >= self._retries:
|
||||
raise
|
||||
tries += 1
|
||||
self.sleep(self._timeout, "retry")
|
||||
challenge_attempt = False
|
||||
html = None
|
||||
|
||||
def _extract_rehydration_data_user(self, profile_url, additional_keys=()):
|
||||
if profile_url in self.rehydration_data_cache:
|
||||
@@ -223,7 +228,7 @@ class TiktokExtractor(Extractor):
|
||||
data = data["webapp.user-detail"]
|
||||
if not self._check_status_code(data, profile_url, "profile"):
|
||||
raise exception.ExtractionError(
|
||||
"%s: could not extract rehydration data", profile_url)
|
||||
f"{profile_url}: could not extract rehydration data")
|
||||
try:
|
||||
for key in additional_keys:
|
||||
data = data[key]
|
||||
@@ -475,6 +480,8 @@ class TiktokExtractor(Extractor):
|
||||
self.log.error("%s: Login required to access this %s, or this "
|
||||
"profile has no videos posted", url,
|
||||
type_of_url)
|
||||
elif status == 10221:
|
||||
self.log.error("%s: User account could not be found", url)
|
||||
elif status == 10204:
|
||||
self.log.error("%s: Requested %s not available", url, type_of_url)
|
||||
elif status == 10231:
|
||||
@@ -605,8 +612,7 @@ class TiktokPostsExtractor(TiktokExtractor):
|
||||
f"{user_name}"
|
||||
if not ytdl:
|
||||
message += ", try extracting post information using " \
|
||||
"yt-dlp with the -o " \
|
||||
"tiktok-user-extractor=ytdl argument"
|
||||
"yt-dlp with the -o ytdl=true argument"
|
||||
self.log.warning(message)
|
||||
return ()
|
||||
|
||||
@@ -913,19 +919,22 @@ class TiktokPaginationCursor:
|
||||
|
||||
|
||||
class TiktokTimeCursor(TiktokPaginationCursor):
|
||||
def __init__(self, *, reverse=True):
|
||||
def __init__(self, *, reverse=True, has_more_attribute="hasMore",
|
||||
cursor_attribute="cursor"):
|
||||
super().__init__()
|
||||
self.cursor = 0
|
||||
# If we expect the cursor to go up or down as we go to the next page.
|
||||
# True for down, False for up.
|
||||
self.reverse = reverse
|
||||
self.has_more_key = has_more_attribute
|
||||
self.cursor_key = cursor_attribute
|
||||
|
||||
def current_page(self):
|
||||
return self.cursor
|
||||
|
||||
def next_page(self, data, query_parameters):
|
||||
skip_fallback_logic = self.cursor == 0
|
||||
new_cursor = int(data.get("cursor", 0))
|
||||
new_cursor = int(data.get(self.cursor_key, 0))
|
||||
no_cursor = not new_cursor
|
||||
if not skip_fallback_logic:
|
||||
# If the new cursor doesn't go in the direction we expect, use the
|
||||
@@ -937,7 +946,7 @@ class TiktokTimeCursor(TiktokPaginationCursor):
|
||||
elif no_cursor:
|
||||
raise exception.ExtractionError("Could not extract next cursor")
|
||||
self.cursor = new_cursor
|
||||
return not data.get("hasMore", False)
|
||||
return not data.get(self.has_more_key, False)
|
||||
|
||||
def fallback_cursor(self, data):
|
||||
try:
|
||||
@@ -967,6 +976,12 @@ class TiktokPopularTimeCursor(TiktokTimeCursor):
|
||||
return -50_000
|
||||
|
||||
|
||||
class TiktokStoryTimeCursor(TiktokTimeCursor):
|
||||
def __init__(self):
|
||||
super().__init__(reverse=False, has_more_attribute="HasMoreAfter",
|
||||
cursor_attribute="MaxCursor")
|
||||
|
||||
|
||||
class TiktokLegacyTimeCursor(TiktokPaginationCursor):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
@@ -1049,8 +1064,10 @@ class TiktokPaginationRequest:
|
||||
cursor_type = self.cursor_type(query_parameters)
|
||||
cursor = cursor_type() if cursor_type else None
|
||||
for page in itertools.count(start=1):
|
||||
extractor.log.info("%s: retrieving %s page %d", url, self.endpoint,
|
||||
page)
|
||||
item_count = len(self.items)
|
||||
extractor.log.info("%s: retrieving %s page %d (%d item%s)", url,
|
||||
self.endpoint, page, item_count,
|
||||
"" if item_count == 1 else "s")
|
||||
tries = 0
|
||||
while True:
|
||||
try:
|
||||
@@ -1269,7 +1286,8 @@ class TiktokItemListRequest(TiktokPaginationRequest):
|
||||
|
||||
def extract_items(self, data):
|
||||
if "itemList" not in data:
|
||||
self.exit_early_due_to_no_items = True
|
||||
if not data.get("hasMorePrevious", data.get("hasMore", False)):
|
||||
self.exit_early_due_to_no_items = True
|
||||
return {}
|
||||
return {item["id"]: item for item in data["itemList"]}
|
||||
|
||||
@@ -1468,7 +1486,7 @@ class TiktokStoryItemListRequest(TiktokItemListRequest):
|
||||
assert query_parameters["loadBackward"] in ["true", "false"]
|
||||
|
||||
def cursor_type(self, query_parameters):
|
||||
return TiktokItemCursor
|
||||
return TiktokStoryTimeCursor
|
||||
|
||||
|
||||
class TiktokStoryBatchItemListRequest(TiktokItemListRequest):
|
||||
|
||||
@@ -37,6 +37,7 @@ class TwitterExtractor(Extractor):
|
||||
def _init(self):
|
||||
self.unavailable = self.config("unavailable", False)
|
||||
self.textonly = self.config("text-tweets", False)
|
||||
self.articles = self.config("articles", True)
|
||||
self.retweets = self.config("retweets", False)
|
||||
self.replies = self.config("replies", True)
|
||||
self.twitpic = self.config("twitpic", False)
|
||||
@@ -159,6 +160,15 @@ class TwitterExtractor(Extractor):
|
||||
"%s: Error while extracting Card files (%s: %s)",
|
||||
data["id_str"], exc.__class__.__name__, exc)
|
||||
|
||||
if self.articles and "article" in tweet:
|
||||
try:
|
||||
self._extract_article(tweet, files)
|
||||
except Exception as exc:
|
||||
self.log.traceback(exc)
|
||||
self.log.warning(
|
||||
"%s: Error while extracting article files (%s: %s)",
|
||||
data["id_str"], exc.__class__.__name__, exc)
|
||||
|
||||
if self.twitpic:
|
||||
try:
|
||||
self._extract_twitpic(data, files)
|
||||
@@ -319,6 +329,31 @@ class TwitterExtractor(Extractor):
|
||||
url = f"ytdl:{self.root}/i/web/status/{tweet_id}"
|
||||
files.append({"url": url})
|
||||
|
||||
def _extract_article(self, tweet, files):
|
||||
article = tweet["article"]["article_results"]["result"]
|
||||
|
||||
if media := article.get("cover_media"):
|
||||
info = media["media_info"]
|
||||
files.append({
|
||||
"media_id" : media["media_id"],
|
||||
"media_key": media["media_key"],
|
||||
"url" : info["original_img_url"],
|
||||
"width" : info["original_img_width"],
|
||||
"height" : info["original_img_height"],
|
||||
"type" : "article:cover",
|
||||
})
|
||||
|
||||
for media in article["media_entities"]:
|
||||
info = media["media_info"]
|
||||
files.append({
|
||||
"media_id" : media["media_id"],
|
||||
"media_key": media["media_key"],
|
||||
"url" : info["original_img_url"],
|
||||
"width" : info["original_img_width"],
|
||||
"height" : info["original_img_height"],
|
||||
"type" : "article:cover",
|
||||
})
|
||||
|
||||
def _extract_twitpic(self, tweet, files):
|
||||
urls = {}
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
"""Extractors for XenForo forums"""
|
||||
|
||||
from .common import BaseExtractor, Message
|
||||
from .. import text, exception
|
||||
from .. import text, util, exception
|
||||
from ..cache import cache
|
||||
import binascii
|
||||
|
||||
@@ -46,10 +46,10 @@ class XenforoExtractor(BaseExtractor):
|
||||
base = root if (pos := root.find("/", 8)) < 0 else root[:pos]
|
||||
for post in self.posts():
|
||||
urls = extract_urls(post["content"])
|
||||
if "data-s9e-mediaembed-iframe=" in post["content"]:
|
||||
self._extract_embeds(urls, post)
|
||||
if post["attachments"]:
|
||||
for att in text.extract_iter(
|
||||
post["attachments"], "<li", "</li>"):
|
||||
urls.append((None, att[att.find('href="')+6:], None, None))
|
||||
self._extract_attachments(urls, post)
|
||||
|
||||
data = {"post": post}
|
||||
post["count"] = data["count"] = len(urls)
|
||||
@@ -195,14 +195,14 @@ class XenforoExtractor(BaseExtractor):
|
||||
if cookie.domain.endswith(self.cookies_domain)
|
||||
}
|
||||
|
||||
def _pagination(self, base, pnum=None, callback=None):
|
||||
def _pagination(self, base, pnum=None, callback=None, params=""):
|
||||
base = self.root + base
|
||||
|
||||
if pnum is None:
|
||||
url = base + "/"
|
||||
url = f"{base}/{params}"
|
||||
pnum = 1
|
||||
else:
|
||||
url = f"{base}/page-{pnum}"
|
||||
url = f"{base}/page-{pnum}{params}"
|
||||
pnum = None
|
||||
|
||||
page = self.request_page(url).text
|
||||
@@ -214,7 +214,7 @@ class XenforoExtractor(BaseExtractor):
|
||||
if pnum is None or "pageNav-jump--next" not in page:
|
||||
return
|
||||
pnum += 1
|
||||
page = self.request_page(f"{base}/page-{pnum}").text
|
||||
page = self.request_page(f"{base}/page-{pnum}{params}").text
|
||||
|
||||
def _pagination_reverse(self, base, pnum=None, callback=None):
|
||||
base = self.root + base
|
||||
@@ -340,6 +340,39 @@ class XenforoExtractor(BaseExtractor):
|
||||
data["author_id"] = data["author"][15:]
|
||||
return data
|
||||
|
||||
def _extract_attachments(self, urls, post):
|
||||
for att in text.extract_iter(post["attachments"], "<li", "</li>"):
|
||||
urls.append((None, att[att.find('href="')+6:], None, None))
|
||||
|
||||
def _extract_embeds(self, urls, post):
|
||||
for embed in text.extract_iter(
|
||||
post["content"], "data-s9e-mediaembed-iframe='", "'"):
|
||||
data = {}
|
||||
key = None
|
||||
for value in util.json_loads(embed):
|
||||
if key is None:
|
||||
key = value
|
||||
else:
|
||||
data[key] = value
|
||||
key = None
|
||||
|
||||
src = data.get("src")
|
||||
if not src:
|
||||
self.log.debug(data)
|
||||
continue
|
||||
|
||||
type = data.get("data-s9e-mediaembed")
|
||||
frag = src[src.find("#")+1:]
|
||||
if type == "tiktok":
|
||||
url = "https://www.tiktok.com/@/video/" + frag
|
||||
elif type == "reddit":
|
||||
url = "https://embed.reddit.com/r/" + frag
|
||||
else:
|
||||
self.log.warning("%s: Unsupported media embed type '%s'",
|
||||
post["id"], type)
|
||||
continue
|
||||
urls.append((None, None, None, url))
|
||||
|
||||
def _extract_media(self, url, file):
|
||||
media = {}
|
||||
name, _, media["id"] = file.rpartition(".")
|
||||
@@ -449,7 +482,12 @@ class XenforoThreadExtractor(XenforoExtractor):
|
||||
|
||||
if (order := self.config("order-posts")) and \
|
||||
order[0] not in ("d", "r"):
|
||||
pages = self._pagination(path, pnum)
|
||||
params = "?order=reaction_score" if order[0] == "s" else ""
|
||||
pages = self._pagination(path, pnum, params=params)
|
||||
reverse = False
|
||||
elif order == "reaction":
|
||||
pages = self._pagination(
|
||||
path, pnum, params="?order=reaction_score")
|
||||
reverse = False
|
||||
else:
|
||||
pages = self._pagination_reverse(path, pnum)
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2019-2025 Mike Fährmann
|
||||
# Copyright 2019-2026 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@@ -112,7 +112,7 @@ class XhamsterUserExtractor(XhamsterExtractor):
|
||||
while url:
|
||||
extr = text.extract_from(self.request(url).text)
|
||||
while True:
|
||||
url = extr('thumb-image-container role-pop" href="', '"')
|
||||
url = extr(' role-pop" href="', '"')
|
||||
if not url:
|
||||
break
|
||||
yield Message.Queue, url, data
|
||||
|
||||
@@ -459,6 +459,8 @@ class DownloadJob(Job):
|
||||
job.kwdict.update(self.kwdict)
|
||||
if kwdict:
|
||||
job.kwdict.update(kwdict)
|
||||
if "_extractor" in kwdict:
|
||||
del job.kwdict["_extractor"]
|
||||
|
||||
if pextr.config("parent-session", parent):
|
||||
extr.session = pextr.session
|
||||
|
||||
@@ -6,5 +6,5 @@
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
__version__ = "1.31.5"
|
||||
__version__ = "1.31.6"
|
||||
__variant__ = None
|
||||
|
||||
@@ -5,68 +5,62 @@
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
from gallery_dl.extractor import artstation
|
||||
from gallery_dl import exception
|
||||
|
||||
|
||||
__tests__ = (
|
||||
{
|
||||
"#url" : "https://www.artstation.com/sungchoi/",
|
||||
"#category": ("", "artstation", "user"),
|
||||
"#class" : artstation.ArtstationUserExtractor,
|
||||
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+",
|
||||
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/8k/[^/]+",
|
||||
"#range" : "1-10",
|
||||
"#count" : ">= 10",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/sungchoi/albums/all/",
|
||||
"#category": ("", "artstation", "user"),
|
||||
"#class" : artstation.ArtstationUserExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://sungchoi.artstation.com/",
|
||||
"#category": ("", "artstation", "user"),
|
||||
"#class" : artstation.ArtstationUserExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://sungchoi.artstation.com/projects/",
|
||||
"#category": ("", "artstation", "user"),
|
||||
"#comment" : "alternate user URL format",
|
||||
"#class" : artstation.ArtstationUserExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/huimeiye/albums/770899",
|
||||
"#category": ("", "artstation", "album"),
|
||||
"#comment" : "'Hellboy' album",
|
||||
"#class" : artstation.ArtstationAlbumExtractor,
|
||||
"#count" : 2,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/huimeiye/albums/770898",
|
||||
"#category": ("", "artstation", "album"),
|
||||
"#comment" : "non-existent album",
|
||||
"#class" : artstation.ArtstationAlbumExtractor,
|
||||
"#exception": exception.NotFoundError,
|
||||
"#exception": "NotFoundError",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://huimeiye.artstation.com/albums/770899",
|
||||
"#category": ("", "artstation", "album"),
|
||||
"#comment" : "alternate user URL format",
|
||||
"#class" : artstation.ArtstationAlbumExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/mikf/likes",
|
||||
"#category": ("", "artstation", "likes"),
|
||||
"#class" : artstation.ArtstationLikesExtractor,
|
||||
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+",
|
||||
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/8k/[^/]+",
|
||||
"#count" : 6,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/mikf/collections/2647023",
|
||||
"#category": ("", "artstation", "collection"),
|
||||
"#class" : artstation.ArtstationCollectionExtractor,
|
||||
"#count" : 10,
|
||||
|
||||
@@ -85,7 +79,6 @@ __tests__ = (
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/mikf/collections",
|
||||
"#category": ("", "artstation", "collections"),
|
||||
"#class" : artstation.ArtstationCollectionsExtractor,
|
||||
"#results" : (
|
||||
"https://www.artstation.com/mikf/collections/2647023",
|
||||
@@ -105,28 +98,42 @@ __tests__ = (
|
||||
{
|
||||
"#url" : "https://www.artstation.com/sungchoi/likes",
|
||||
"#comment" : "no likes",
|
||||
"#category": ("", "artstation", "likes"),
|
||||
"#class" : artstation.ArtstationLikesExtractor,
|
||||
"#count" : 0,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/contests/thu-2017/challenges/20",
|
||||
"#category": ("", "artstation", "challenge"),
|
||||
"#class" : artstation.ArtstationChallengeExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/contests/beyond-human/challenges/23?sorting=winners",
|
||||
"#category": ("", "artstation", "challenge"),
|
||||
"#url" : "https://www.artstation.com/challenges/beyond-human/categories/23/submissions",
|
||||
"#class" : artstation.ArtstationChallengeExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/contests/beyond-human/challenges/23?sorting=popular",
|
||||
"#class" : artstation.ArtstationChallengeExtractor,
|
||||
"#range" : "1-30",
|
||||
"#count" : 30,
|
||||
|
||||
"challenge": {
|
||||
"id" : 23,
|
||||
"headline" : "Imagining Where Future Humans Live",
|
||||
"created_at": "2017-06-26T14:45:43+00:00",
|
||||
"contest" : {
|
||||
"archived" : True,
|
||||
"published": True,
|
||||
"slug" : "beyond-human",
|
||||
"title" : "Beyond Human",
|
||||
"submissions_count": 4258,
|
||||
},
|
||||
},
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/search?query=ancient&sort_by=rank",
|
||||
"#category": ("", "artstation", "search"),
|
||||
"#class" : artstation.ArtstationSearchExtractor,
|
||||
"#range" : "1-20",
|
||||
"#count" : 20,
|
||||
@@ -134,7 +141,6 @@ __tests__ = (
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/artwork?sorting=latest",
|
||||
"#category": ("", "artstation", "artwork"),
|
||||
"#class" : artstation.ArtstationArtworkExtractor,
|
||||
"#range" : "1-20",
|
||||
"#count" : 20,
|
||||
@@ -142,16 +148,14 @@ __tests__ = (
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/artwork/LQVJr",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
"#pattern" : r"https?://\w+\.artstation\.com/p/assets/images/images/008/760/279/4k/.+",
|
||||
"#pattern" : r"https?://\w+\.artstation\.com/p/assets/images/images/008/760/279/8k/.+",
|
||||
"#sha1_content": "3f211ce0d6ecdb502db2cdf7bbeceb11d8421170",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/artwork/Db3dy",
|
||||
"#comment" : "multiple images per project",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
"#count" : 4,
|
||||
},
|
||||
@@ -159,7 +163,6 @@ __tests__ = (
|
||||
{
|
||||
"#url" : "https://www.artstation.com/artwork/lR8b5k",
|
||||
"#comment" : "artstation video clips (#2566)",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
"#options" : {"videos": True},
|
||||
"#range" : "2-3",
|
||||
@@ -185,7 +188,6 @@ __tests__ = (
|
||||
{
|
||||
"#url" : "https://www.artstation.com/artwork/g4WPK",
|
||||
"#comment" : "embedded youtube video",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
"#options" : {"external": True},
|
||||
"#pattern" : r"ytdl:https://www\.youtube(-nocookie)?\.com/embed/JNFfJtwwrU0",
|
||||
@@ -195,27 +197,23 @@ __tests__ = (
|
||||
{
|
||||
"#url" : "https://www.artstation.com/artwork/3q3mXB",
|
||||
"#comment" : "404 (#3016)",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
"#count" : 0,
|
||||
"#exception": "NotFoundError",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://sungchoi.artstation.com/projects/LQVJr",
|
||||
"#comment" : "alternate URL patterns",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://artstn.co/p/LQVJr",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.artstation.com/sungchoi/following",
|
||||
"#category": ("", "artstation", "following"),
|
||||
"#class" : artstation.ArtstationFollowingExtractor,
|
||||
"#pattern" : artstation.ArtstationUserExtractor.pattern,
|
||||
"#count" : ">= 40",
|
||||
@@ -224,21 +222,18 @@ __tests__ = (
|
||||
{
|
||||
"#url" : "https://fede-x-rojas.artstation.com/projects/WBdaZy",
|
||||
"#comment" : "dash in username",
|
||||
"#category": ("", "artstation", "image"),
|
||||
"#class" : artstation.ArtstationImageExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://fede-x-rojas.artstation.com/albums/8533110",
|
||||
"#comment" : "dash in username",
|
||||
"#category": ("", "artstation", "album"),
|
||||
"#class" : artstation.ArtstationAlbumExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://fede-x-rojas.artstation.com/",
|
||||
"#comment" : "dash in username",
|
||||
"#category": ("", "artstation", "user"),
|
||||
"#class" : artstation.ArtstationUserExtractor,
|
||||
},
|
||||
|
||||
|
||||
@@ -18,7 +18,7 @@ __tests__ = (
|
||||
"commentsCount" : int,
|
||||
"createdAt" : "2025-10-21T00:49:00.306Z",
|
||||
"date" : "dt:2025-10-21 00:49:00",
|
||||
"date_updated" : "dt:2025-12-10 01:09:26",
|
||||
"date_updated" : "dt:2026-02-07 01:07:45",
|
||||
"deletedAt" : None,
|
||||
"duration" : None,
|
||||
"explicitnessRating": None,
|
||||
@@ -27,15 +27,15 @@ __tests__ = (
|
||||
"inCollectionsCount": range(20, 50),
|
||||
"isBunnyVideoReady": True,
|
||||
"label" : "⬇️check my FREE VIP OF ⬇️",
|
||||
"likesCount" : range(500, 2000),
|
||||
"likesCount" : range(300, 2000),
|
||||
"mediaId" : "b821619e-96a1-49a3-a3f8-a8a3e8432a51",
|
||||
"postId" : 1429486,
|
||||
"publishedAt" : "2025-10-21T00:50:37.143Z",
|
||||
"score" : range(500, 2000),
|
||||
"score" : range(300, 2000),
|
||||
"sexualOrientation": "STRAIGHT",
|
||||
"tags" : ["lesbian"],
|
||||
"thumbnailStreamUrl": str,
|
||||
"updatedAt" : "2025-12-10T01:09:26.902Z",
|
||||
"updatedAt" : "iso:dt",
|
||||
"uploadMethod" : "USER_FILE",
|
||||
"userId" : "32f4c8d6-2409-4db8-9e66-d3b5ff0c1a98",
|
||||
"videoStreamUrl" : str,
|
||||
@@ -49,7 +49,7 @@ __tests__ = (
|
||||
"labelLower" : "lesbian",
|
||||
"lastCountUpdatedAt": "iso:dt",
|
||||
"searchTags" : [],
|
||||
"thumbnailPostId": 301300,
|
||||
"thumbnailPostId": 311180,
|
||||
"updatedAt" : "iso:dt",
|
||||
"sexualOrientations": [
|
||||
"STRAIGHT",
|
||||
@@ -119,4 +119,20 @@ __tests__ = (
|
||||
"linkSidebar" : dict,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://fikfap.com/user/Hot-sauce-34",
|
||||
"#comment" : "'-' in username",
|
||||
"#class" : fikfap.FikfapUserExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://fikfap.com/hash/outercourse",
|
||||
"#class" : fikfap.FikfapHashtagExtractor,
|
||||
"#pattern" : r"ytdl:https://[^/]+\.b\-cdn\.net/bcdn_token=.+/playlist\.m3u8$",
|
||||
"#count" : range(50, 100),
|
||||
|
||||
"algorithm": "hashtag-posts",
|
||||
"hashtag" : "outercourse",
|
||||
},
|
||||
|
||||
)
|
||||
|
||||
@@ -208,4 +208,13 @@ __tests__ = (
|
||||
"#class" : imagefap.ImagefapUserExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.imagefap.com/profile/brookdale",
|
||||
"#comment" : "multiple pagea (#9016)",
|
||||
"#class" : imagefap.ImagefapUserExtractor,
|
||||
"#pattern" : imagefap.ImagefapFolderExtractor.pattern,
|
||||
"#range" : "1-100",
|
||||
"#count" : 100,
|
||||
},
|
||||
|
||||
)
|
||||
|
||||
@@ -211,12 +211,24 @@ __tests__ = (
|
||||
{
|
||||
"#url" : "https://www.pixiv.net/en/users/173530/avatar",
|
||||
"#class" : pixiv.PixivAvatarExtractor,
|
||||
"#options" : {
|
||||
"metadata" : True,
|
||||
"metadata-bookmark": True,
|
||||
"captions" : True,
|
||||
"comments" : True,
|
||||
},
|
||||
"#sha1_content": "4e57544480cc2036ea9608103e8f024fa737fe66",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://www.pixiv.net/en/users/194921/background",
|
||||
"#class" : pixiv.PixivBackgroundExtractor,
|
||||
"#options" : {
|
||||
"metadata" : True,
|
||||
"metadata-bookmark": True,
|
||||
"captions" : True,
|
||||
"comments" : True,
|
||||
},
|
||||
"#pattern" : r"https://i\.pximg\.net/background/img/2021/01/30/16/12/02/194921_af1f71e557a42f499213d4b9eaccc0f8\.jpg",
|
||||
},
|
||||
|
||||
|
||||
@@ -240,6 +240,24 @@ __tests__ = (
|
||||
),
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://simpcity.cr/threads/arianaskyeshelby-itsarianaskyebaby-busty.1237895/post-40205575",
|
||||
"#comment" : "tiktok s9e media embed iframe (#8994)",
|
||||
"#category": ("xenforo", "simpcity", "post"),
|
||||
"#class" : xenforo.XenforoPostExtractor,
|
||||
"#auth" : True,
|
||||
"#results" : "https://www.tiktok.com/@/video/7556556034794425631",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://simpcity.cr/threads/alrightsierra.70601/post-571509",
|
||||
"#comment" : "reddit s9e media embed iframe (#8996)",
|
||||
"#category": ("xenforo", "simpcity", "post"),
|
||||
"#class" : xenforo.XenforoPostExtractor,
|
||||
"#auth" : True,
|
||||
"#results" : "https://embed.reddit.com/r/TikTokFeet/comments/rtzwnz#theme=auto",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://simpcity.cr/threads/alua-tatakai.89490/",
|
||||
"#category": ("xenforo", "simpcity", "thread"),
|
||||
@@ -284,6 +302,19 @@ __tests__ = (
|
||||
"#class" : xenforo.XenforoThreadExtractor,
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://simpcity.cr/threads/ririkana-rr_loveit.10731/",
|
||||
"#comment" : "post order by reaction score (#8997)",
|
||||
"#category": ("xenforo", "simpcity", "thread"),
|
||||
"#class" : xenforo.XenforoThreadExtractor,
|
||||
"#auth" : True,
|
||||
"#options" : {
|
||||
"post-range" : 1,
|
||||
"order-posts": "reaction",
|
||||
},
|
||||
"#results" : "https://bunkr.cr/v/BKLYkkr9KK6dg",
|
||||
},
|
||||
|
||||
{
|
||||
"#url" : "https://simpcity.cr/forums/asians.48/",
|
||||
"#category": ("xenforo", "simpcity", "forum"),
|
||||
|
||||
Reference in New Issue
Block a user