Compare commits

...

34 Commits

Author SHA1 Message Date
Mike Fährmann
9499356d8b release version 1.31.6 2026-02-08 19:12:27 +01:00
Mike Fährmann
45af69c5c1 [instagram] fix errors for missing user profiles 2026-02-08 18:48:22 +01:00
Mike Fährmann
7478b27078 [cookies] support Firefox 147+ profile paths (#8803)
https://github.com/mikf/gallery-dl/issues/8803#issuecomment-3866682047
2026-02-08 18:02:33 +01:00
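The lookup order this commit introduces can be sketched as follows (a hedged illustration for Linux paths only; the helper names `firefox_profile_roots` and `find_cookies_database` are illustrative, not gallery-dl's actual functions):

```python
import os

def firefox_profile_roots():
    # Firefox 147+ stores profiles under $XDG_CONFIG_HOME, while
    # older versions used ~/.mozilla/firefox; list the newer
    # location first and fall back to the legacy one.
    home = os.path.expanduser("~")
    config = os.environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return (
        os.path.join(config, "mozilla", "firefox"),  # versions >= 147
        os.path.join(home, ".mozilla", "firefox"),   # versions <= 146
    )

def find_cookies_database(roots, filename="cookies.sqlite"):
    # Try each candidate root in order and return the first
    # existing database file, or None if no root contains it.
    for root in roots:
        path = os.path.join(root, filename)
        if os.path.exists(path):
            return path
    return None
```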
Mike Fährmann
ad51ad7637 [instagram] fix 'avatar' & 'info' extractors (#8978)
export user lookup logic into 'user_by_screen_name' method
2026-02-08 17:58:05 +01:00
Mike Fährmann
86c96315e1 [imagefap:user] support multiple pages (#9016) 2026-02-08 17:58:05 +01:00
wise-immersion
a11f53571b [fikfap] add 'hashtag' extractor (#9018)
Added functionality to extract by hashtag and save to directory named after the hashtag.
2026-02-08 17:58:05 +01:00
wise-immersion
590bfed134 [fikfap] allow for dash in usernames (#9019) 2026-02-08 17:58:05 +01:00
Mike Fährmann
fcac444ca5 [reddit] try to improve comment metadata (#8228)
* provide toplevel 'date'
* preserve 'submission' data
2026-02-08 17:58:05 +01:00
Mike Fährmann
af6109ed45 [reddit:user] implement 'only' option (#8228) 2026-02-08 17:58:05 +01:00
Mike Fährmann
ebab95aeb6 [reddit:user] provide 'user' metadata field (#8228) 2026-02-08 17:58:05 +01:00
Mike Fährmann
fba82f10d6 [twitter] support 'article' media (#8995) 2026-02-08 17:58:05 +01:00
Mike Fährmann
e20022463a [instagram] use '/topsearch/' to fetch user information (#8978) 2026-02-08 17:58:05 +01:00
dimomarg
5b53115e28 [cookies] add support for Floorp (#9005) 2026-02-08 17:58:04 +01:00
Mike Fährmann
9e98255608 [imhentai] use alternate strategy for galleries without image data (#8951) 2026-02-08 17:58:04 +01:00
Mike Fährmann
3ecdbbc787 [artstation] fix & update 'challenge' extractor 2026-02-08 17:58:04 +01:00
Mike Fährmann
a719e01e86 [artstation] download '/8k/' images (#9003) 2026-02-08 17:58:04 +01:00
Mike Fährmann
31e507288a [pixiv] fix errors when using metadata options for avatar/background
(#9002)
2026-02-08 17:58:04 +01:00
Mike Fährmann
d5b615253a [instagram] cache '/users/web_profile_info' results on disk (#8978)
In the rare case this endpoint returns results and not a 429 error,
store them locally so they can be re-used the next time this user
is downloaded from.
2026-02-08 17:58:04 +01:00
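The disk-caching idea described in this commit message can be sketched as a small decorator (a hedged illustration; `cached_on_disk`, the JSON file layout, and the `maxage` parameter are assumptions, not gallery-dl's actual cache implementation):

```python
import json
import os
import time

def cached_on_disk(path, maxage=86400):
    # Decorator sketch: persist a function's JSON-serializable
    # result on disk, keyed by its first argument, and reuse it
    # while it is younger than 'maxage' seconds. Useful when an
    # endpoint rarely succeeds (e.g. mostly returns 429) and a
    # successful response is worth keeping around.
    def decorator(func):
        def wrapper(key):
            try:
                with open(path, encoding="utf-8") as fp:
                    cache = json.load(fp)
            except (OSError, ValueError):
                cache = {}
            entry = cache.get(key)
            if entry and time.time() - entry["time"] < maxage:
                return entry["value"]
            value = func(key)
            cache[key] = {"time": time.time(), "value": value}
            with open(path, "w", encoding="utf-8") as fp:
                json.dump(cache, fp)
            return value
        return wrapper
    return decorator
```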
Mike Fährmann
4277724af7 [imagefap] use self.groups, remove __init__ 2026-02-08 17:58:04 +01:00
Mike Fährmann
c8412d651a [xenforo] implement '"order-posts": "reaction"' (#8997) 2026-02-08 17:58:04 +01:00
Mike Fährmann
e81f43adca [simpcity] extract 'reddit' media embeds (#8994) 2026-02-08 17:58:04 +01:00
Mike Fährmann
dd79b7633c [simpcity] extract 'tiktok' media embeds (#8994) 2026-02-08 17:58:04 +01:00
Mike Fährmann
47ae8dee22 [job] fix overwriting '_extractor' (#8958) 2026-02-08 17:58:04 +01:00
CasualYouTuber31
f8ffcaba3e [tiktok] always try to resolve JS challenges even if retries is set to 0 (#8993)
* [tiktok] always try to resolve JS challenges even if retries is set to 0

* add 1 to tries counter when logging to retain existing logging behavior

* clear html data in the case where resolving the challenge worked but extracting the rehydration data afterward did not
2026-02-08 17:58:04 +01:00
CasualYouTuber31
fc39a6055f [tiktok] use time cursor for story requests (#8991) 2026-02-08 17:58:04 +01:00
CasualYouTuber31
5bf2b74b69 [tiktok] identify when user accounts do not exist (#8977) 2026-02-08 17:58:03 +01:00
CasualYouTuber31
20ffd65657 [tiktok] do not exit early when rolling back cursor (#8968)
* [tiktok] do not exit account extraction early when we need to manually roll back the cursor
* [tiktok] fix rehydration data error string formatting
2026-02-08 17:58:03 +01:00
CasualYouTuber31
2a89a60723 [tiktok] fix outdated error message (#8979) 2026-02-08 17:58:03 +01:00
Mike Fährmann
d414031d9c use tempfile when updating input files (#8981)
0d72789aa3
2026-02-08 17:58:03 +01:00
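The tempfile pattern from this commit can be sketched as a standalone helper (the function name is illustrative): write to a sibling temporary file, then swap it into place, so a crash mid-write cannot leave the original input file truncated.

```python
import os

def rewrite_atomic(path, lines):
    # Write the new content to a temporary sibling file first,
    # then atomically replace the original. os.replace() is
    # atomic on both POSIX and Windows when source and target
    # are on the same filesystem.
    path_tmp = path + ".tmp"
    with open(path_tmp, "w", encoding="utf-8") as fp:
        fp.writelines(lines)
    os.replace(path_tmp, path)
```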
Mike Fährmann
cc1c95feb8 [8chan] fail downloads of 'POW' images (#8975) 2026-02-08 17:58:03 +01:00
Mike Fährmann
d659d33b60 [8chan] skip 'TOS' cookie name lookup if already present 2026-02-08 17:58:03 +01:00
Mike Fährmann
83be858d61 [xhamster] fix user profile extraction (#8974) 2026-02-08 17:58:03 +01:00
Mike Fährmann
cbe576f7a8 [discord:server-search] use 'max_id' for pagination
'offset' is limited to 10_000
'max_id' is hopefully not
2026-02-08 17:58:03 +01:00
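The switch described here is from offset-based to keyset pagination. A minimal sketch of the idea (the `fetch` callable and `paginate_by_max_id` name are illustrative, not the Discord API client itself):

```python
def paginate_by_max_id(fetch, batch_size=25):
    # Keyset pagination: an 'offset' parameter is often capped
    # (here: 10_000 results), but passing the smallest ID seen
    # so far as 'max_id' and requesting everything older lets
    # the walk continue past that cap.
    params = {}
    while True:
        messages = fetch(params)
        if not messages:
            return
        yield from messages
        if len(messages) < batch_size:
            return
        # everything older than the oldest message of this batch
        params["max_id"] = min(int(m["id"]) for m in messages)
```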
Mike Fährmann
a4497de282 [artstation] fix embedded videos (#8972) 2026-02-08 17:58:03 +01:00
27 changed files with 570 additions and 203 deletions

View File

@@ -1,5 +1,41 @@
# Changelog
## 1.31.6 - 2026-02-08
### Extractors
#### Additions
- [fikfap] add `hashtag` extractor ([#9018](https://github.com/mikf/gallery-dl/issues/9018))
#### Fixes
- [8chan] fail downloads of `POW` images ([#8975](https://github.com/mikf/gallery-dl/issues/8975))
- [artstation] fix embedded videos ([#8972](https://github.com/mikf/gallery-dl/issues/8972) [#9003](https://github.com/mikf/gallery-dl/issues/9003))
- [artstation] fix `challenge` extractor
- [imagefap:user] support multiple pages ([#9016](https://github.com/mikf/gallery-dl/issues/9016))
- [imhentai] use alternate strategy for galleries without image data ([#8951](https://github.com/mikf/gallery-dl/issues/8951))
- [instagram] use `/topsearch/` to fetch user information ([#8978](https://github.com/mikf/gallery-dl/issues/8978))
- [pixiv] fix errors when using metadata options for avatar/background
- [simpcity] extract `tiktok` & `reddit` media embeds ([#8994](https://github.com/mikf/gallery-dl/issues/8994) [#8996](https://github.com/mikf/gallery-dl/issues/8996))
- [tiktok] always try to resolve JS challenges ([#8993](https://github.com/mikf/gallery-dl/issues/8993))
- [tiktok] use time cursor for story requests ([#8991](https://github.com/mikf/gallery-dl/issues/8991))
- [tiktok] identify when user accounts do not exist ([#8977](https://github.com/mikf/gallery-dl/issues/8977))
- [tiktok] do not exit early when rolling back cursor ([#8968](https://github.com/mikf/gallery-dl/issues/8968))
- [xhamster] fix user profile extraction ([#8974](https://github.com/mikf/gallery-dl/issues/8974))
#### Improvements
- [8chan] skip `TOS` cookie name lookup if already present
- [artstation] download `/8k/` images ([#9003](https://github.com/mikf/gallery-dl/issues/9003))
- [discord:server-search] use `max_id` for pagination
- [fikfap] allow for dashes in usernames ([#9019](https://github.com/mikf/gallery-dl/issues/9019))
- [instagram] cache user profile results on disk ([#8978](https://github.com/mikf/gallery-dl/issues/8978))
- [reddit:user] implement `only` option ([#8228](https://github.com/mikf/gallery-dl/issues/8228))
- [reddit:user] provide `user` metadata ([#8228](https://github.com/mikf/gallery-dl/issues/8228))
- [tiktok] fix outdated error message ([#8979](https://github.com/mikf/gallery-dl/issues/8979))
- [twitter] support `article` media ([#8995](https://github.com/mikf/gallery-dl/issues/8995))
- [xenforo] implement `"order-posts": "reaction"` ([#8997](https://github.com/mikf/gallery-dl/issues/8997))
### Cookies
- add support for `Floorp` ([#9005](https://github.com/mikf/gallery-dl/issues/9005))
- support `Firefox` 147+ profile paths ([#8803](https://github.com/mikf/gallery-dl/issues/8803))
### Miscellaneous
- [job] fix overwriting `_extractor` fields ([#8958](https://github.com/mikf/gallery-dl/issues/8958))
- use tempfile when updating input files ([#8981](https://github.com/mikf/gallery-dl/issues/8981))
## 1.31.5 - 2026-01-31
### Extractors
#### Additions

View File

@@ -79,9 +79,9 @@ Standalone Executable
Prebuilt executable files with a Python interpreter and
required Python packages included are available for
- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.31.5/gallery-dl.exe>`__
- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.31.6/gallery-dl.exe>`__
(Requires `Microsoft Visual C++ Redistributable Package (x86) <https://aka.ms/vs/17/release/vc_redist.x86.exe>`__)
- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.31.5/gallery-dl.bin>`__
- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.31.6/gallery-dl.bin>`__
Nightly Builds

View File

@@ -5350,6 +5350,16 @@ Note
but it will not always get the best video quality available.
extractor.reddit.user.only
--------------------------
Type
``bool``
Default
``true``
Description
Only process and return posts from the user specified in the input URL.
extractor.redgifs.format
------------------------
Type
@@ -6284,6 +6294,16 @@ Description
Fetch media from promoted Tweets.
extractor.twitter.articles
--------------------------
Type
``bool``
Default
``true``
Description
Download media embedded in articles.
extractor.twitter.cards
-----------------------
Type
@@ -7196,15 +7216,20 @@ extractor.[xenforo].order-posts
Type
``string``
Default
``"desc"``
``thread``
``"desc"``
otherwise
``"asc"``
Description
Controls the order in which
posts of a ``thread`` are processed.
posts of a ``thread`` or ``media`` files are processed.
``"asc"``
Ascending order (oldest first)
``"desc"`` | ``"reverse"``
Descending order (newest first)
``"reaction"`` | ``"score"``
Reaction Score order (``threads`` only)
extractor.ytdl.cmdline-args

View File

@@ -704,7 +704,11 @@
"previews" : true,
"recursion" : 0,
"selftext" : null,
"videos" : "dash"
"videos" : "dash",
"user": {
"only": true
}
},
"redgifs":
{
@@ -876,6 +880,7 @@
"cookies" : null,
"ads" : false,
"articles" : true,
"cards" : false,
"cards-blacklist": [],
"csrf" : "cookies",

View File

@@ -346,7 +346,7 @@ Consider all listed sites to potentially be NSFW.
<tr id="fikfap" title="fikfap">
<td>FikFap</td>
<td>https://fikfap.com/</td>
<td>Posts, User Profiles</td>
<td>Hashtags, Posts, User Profiles</td>
<td></td>
</tr>
<tr id="fitnakedgirls" title="fitnakedgirls">

View File

@@ -1,11 +1,12 @@
# -*- coding: utf-8 -*-
# Copyright 2014-2025 Mike Fährmann
# Copyright 2014-2026 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
import os
import sys
import logging
from . import version, config, option, output, extractor, job, util, exception
@@ -542,11 +543,14 @@ class InputManager():
def _rewrite(self):
url, path, action, indicies = self._item
path_tmp = path + ".tmp"
lines = self.files[path]
action(lines, indicies)
try:
with open(path, "w", encoding="utf-8") as fp:
with open(path_tmp, "w", encoding="utf-8") as fp:
fp.writelines(lines)
os.replace(path_tmp, path)
except Exception as exc:
self.log.warning(
"Unable to update '%s' (%s: %s)",

View File

@@ -26,7 +26,7 @@ from . import aes, text, util
SUPPORTED_BROWSERS_CHROMIUM = {
"brave", "chrome", "chromium", "edge", "opera", "thorium", "vivaldi"}
SUPPORTED_BROWSERS_FIREFOX = {"firefox", "librewolf", "zen"}
SUPPORTED_BROWSERS_FIREFOX = {"firefox", "librewolf", "zen", "floorp"}
SUPPORTED_BROWSERS_WEBKIT = {"safari", "orion"}
SUPPORTED_BROWSERS = \
SUPPORTED_BROWSERS_CHROMIUM \
@@ -51,8 +51,8 @@ def load_cookies(browser_specification):
def load_cookies_firefox(browser_name, profile=None,
container=None, domain=None):
path, container_id = _firefox_cookies_database(browser_name,
profile, container)
path, container_id = _firefox_cookies_database(
browser_name, profile, container)
sql = ("SELECT name, value, host, path, isSecure, expiry "
"FROM moz_cookies")
@@ -219,13 +219,16 @@ def _firefox_cookies_database(browser_name, profile=None, container=None):
elif _is_path(profile):
search_root = profile
else:
search_root = os.path.join(
_firefox_browser_directory(browser_name), profile)
search_root = _firefox_browser_directory(browser_name)
if isinstance(search_root, str):
search_root = os.path.join(search_root, profile)
else:
search_root = [os.path.join(dir, profile) for dir in search_root]
path = _find_most_recently_used_file(search_root, "cookies.sqlite")
if path is None:
raise FileNotFoundError(f"Unable to find {browser_name.capitalize()} "
f"cookies database in {search_root}")
f"cookies database")
_log_debug("Extracting cookies from %s", path)
@@ -272,6 +275,7 @@ def _firefox_browser_directory(browser_name):
"firefox" : join(appdata, R"Mozilla\Firefox\Profiles"),
"librewolf": join(appdata, R"librewolf\Profiles"),
"zen" : join(appdata, R"zen\Profiles"),
"floorp" : join(appdata, R"Floorp\Profiles")
}[browser_name]
elif sys.platform == "darwin":
appdata = os.path.expanduser("~/Library/Application Support")
@@ -279,14 +283,25 @@ def _firefox_browser_directory(browser_name):
"firefox" : join(appdata, R"Firefox/Profiles"),
"librewolf": join(appdata, R"librewolf/Profiles"),
"zen" : join(appdata, R"zen/Profiles"),
"floorp" : join(appdata, R"Floorp/Profiles")
}[browser_name]
else:
home = os.path.expanduser("~")
return {
"firefox" : join(home, R".mozilla/firefox"),
"librewolf": join(home, R".librewolf"),
"zen" : join(home, R".zen"),
}[browser_name]
if browser_name == "firefox":
config = (os.environ.get("XDG_CONFIG_HOME") or
os.path.expanduser("~/.config"))
return (
# versions >= 147
join(config, "mozilla/firefox"),
# versions <= 146
home + "/.mozilla/firefox",
# Flatpak
home + "/.var/app/org.mozilla.firefox/config/mozilla/firefox",
home + "/.var/app/org.mozilla.firefox/.mozilla/firefox",
# Snap
home + "/snap/firefox/common/.mozilla/firefox",
)
return f"{home}/.{browser_name}"
# --------------------------------------------------------------------
@@ -1095,19 +1110,24 @@ def _decrypt_windows_dpapi(ciphertext):
return result
def _find_most_recently_used_file(root, filename):
def _find_most_recently_used_file(roots, filename):
if isinstance(roots, str):
roots = (roots,)
# if the provided root points to an exact profile path
# check if it contains the wanted filename
first_choice = os.path.join(root, filename)
if os.path.exists(first_choice):
return first_choice
for root in roots:
first_choice = os.path.join(root, filename)
if os.path.exists(first_choice):
return first_choice
# if there are multiple browser profiles, take the most recently used one
paths = []
for curr_root, dirs, files in os.walk(root):
for file in files:
if file == filename:
paths.append(os.path.join(curr_root, file))
for root in roots:
for curr_root, dirs, files in os.walk(root):
for file in files:
if file == filename:
paths.append(os.path.join(curr_root, file))
if not paths:
return None
return max(paths, key=lambda path: os.lstat(path).st_mtime)

View File

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
# Copyright 2022-2025 Mike Fährmann
# Copyright 2022-2026 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@@ -27,6 +27,13 @@ class _8chanExtractor(Extractor):
@memcache()
def cookies_tos_name(self):
domain = "8chan." + self.groups[0]
for cookie in self.cookies:
if cookie.domain == domain and \
cookie.name.lower().startswith("tos"):
self.log.debug("TOS cookie name: %s", cookie.name)
return cookie.name
url = self.root + "/.static/pages/confirmed.html"
headers = {"Referer": self.root + "/.static/pages/disclaimer.html"}
response = self.request(url, headers=headers, allow_redirects=False)
@@ -37,7 +44,7 @@ class _8chanExtractor(Extractor):
return cookie.name
self.log.error("Unable to determine TOS cookie name")
return "TOS20241009"
return "TOS20250418"
@memcache()
def cookies_prepare(self):
@@ -100,6 +107,7 @@ class _8chanThreadExtractor(_8chanExtractor):
for num, file in enumerate(files):
file.update(thread)
file["num"] = num
file["_http_validate"] = _validate
text.nameext_from_url(file["originalName"], file)
yield Message.Url, self.root + file["path"], file
@@ -130,3 +138,12 @@ class _8chanBoardExtractor(_8chanExtractor):
return
url = f"{self.root}/{board}/{pnum}.json"
threads = self.request_json(url)["threads"]
def _validate(response):
hget = response.headers.get
return not (
hget("expires") == "0" and
hget("content-length") == "166" and
hget("content-type") == "image/png"
)

View File

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
# Copyright 2018-2025 Mike Fährmann
# Copyright 2018-2026 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@@ -64,7 +64,7 @@ class ArtstationExtractor(Extractor):
if "/images/images/" in url:
lhs, _, rhs = url.partition("/large/")
if rhs:
url = f"{lhs}/4k/{rhs}"
url = f"{lhs}/8k/{rhs}"
asset["_fallback"] = self._image_fallback(lhs, rhs)
yield Message.Url, url, asset
@@ -88,7 +88,8 @@ class ArtstationExtractor(Extractor):
if not self.videos:
return
page = self.request(url).text
return text.extr(page, ' src="', '"')
return text.extract(
page, ' src="', '"', page.find('id="video"')+1)[0]
if url:
# external URL
@@ -102,6 +103,7 @@ class ArtstationExtractor(Extractor):
adict.get("id"))
def _image_fallback(self, lhs, rhs):
yield f"{lhs}/4k/{rhs}"
yield f"{lhs}/large/{rhs}"
yield f"{lhs}/medium/{rhs}"
yield f"{lhs}/small/{rhs}"
@@ -317,9 +319,9 @@ class ArtstationChallengeExtractor(ArtstationExtractor):
"{challenge[id]} - {challenge[title]}")
archive_fmt = "c_{challenge[id]}_{asset_id}"
pattern = (r"(?:https?://)?(?:www\.)?artstation\.com"
r"/contests/[^/?#]+/challenges/(\d+)"
r"/c(?:hallenges|ontests)/[^/?#]+/c(?:ategori|halleng)es/(\d+)"
r"/?(?:\?sorting=([a-z]+))?")
example = "https://www.artstation.com/contests/NAME/challenges/12345"
example = "https://www.artstation.com/challenges/NAME/categories/12345"
def __init__(self, match):
ArtstationExtractor.__init__(self, match)
@@ -327,24 +329,28 @@ class ArtstationChallengeExtractor(ArtstationExtractor):
self.sorting = match[2] or "popular"
def items(self):
base = f"{self.root}/contests/_/challenges/{self.challenge_id}"
challenge_url = base + ".json"
submission_url = base + "/submissions.json"
update_url = self.root + "/contests/submission_updates.json"
base = self.root + "/api/v2/competition/"
challenge_url = f"{base}challenges/{self.challenge_id}.json"
submission_url = base + "submissions.json"
challenge = self.request_json(challenge_url)
yield Message.Directory, "", {"challenge": challenge}
params = {"sorting": self.sorting}
params = {
"page" : 1,
"per_page" : 50,
"challenge_id": self.challenge_id,
"sort_by" : self.sorting,
}
for submission in self._pagination(submission_url, params):
params = {"submission_id": submission["id"]}
update_url = (f"{base}submissions/{submission['id']}"
f"/submission_updates.json")
params = {"page": 1, "per_page": 50}
for update in self._pagination(update_url, params=params):
del update["replies"]
update["challenge"] = challenge
for url in text.extract_iter(
update["body_presentation_html"], ' href="', '"'):
for url in util.unique_sequence(text.extract_iter(
update["body"], ' href="', '"')):
update["asset_id"] = self._id_from_url(url)
text.nameext_from_url(url, update)
yield Message.Url, self._no_cache(url), update

View File

@@ -447,8 +447,17 @@ class DiscordAPI():
MESSAGES_BATCH = 25
def _method(offset):
params["offset"] = offset
return self._call(url, params)["messages"]
messages = self._call(url, params)["messages"]
max_id = 0
for msgs in messages:
for msg in msgs:
mid = int(msg["id"])
if max_id > mid or not max_id:
max_id = mid
params["max_id"] = max_id
return messages
url = f"/guilds/{server_id}/messages/search"
return self._pagination(_method, MESSAGES_BATCH)

View File

@@ -86,7 +86,7 @@ class FikfapPostExtractor(FikfapExtractor):
class FikfapUserExtractor(FikfapExtractor):
subcategory = "user"
pattern = BASE_PATTERN + r"/user/(\w+)"
pattern = BASE_PATTERN + r"/user/([\w-]+)"
example = "https://fikfap.com/user/USER"
def posts(self):
@@ -103,3 +103,25 @@ class FikfapUserExtractor(FikfapExtractor):
if len(data) < 21:
return
params["afterId"] = data[-1]["postId"]
class FikfapHashtagExtractor(FikfapExtractor):
subcategory = "hashtag"
directory_fmt = ("{category}", "{hashtag}")
pattern = BASE_PATTERN + r"/hash/([\w-]+)"
example = "https://fikfap.com/hash/HASH"
def posts(self):
self.kwdict["hashtag"] = hashtag = self.groups[0]
url = f"{self.root_api}/hashtags/label/{hashtag}/posts"
params = {"amount": "21"}
while True:
data = self.request_api(url, params)
yield from data
if len(data) < 21:
return
params["afterId"] = data[-1]["postId"]

View File

@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
# Copyright 2016-2025 Mike Fährmann
# Copyright 2016-2026 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@@ -28,10 +28,10 @@ class ImagefapExtractor(Extractor):
response = Extractor.request(self, url, **kwargs)
if response.history and response.url.endswith("/human-verification"):
self.log.warning("HTTP redirect to '%s'", response.url)
if msg := text.extr(response.text, '<div class="mt-4', '<'):
msg = " ".join(msg.partition(">")[2].split())
raise exception.AbortExtraction(f"'{msg}'")
self.log.warning("HTTP redirect to %s", response.url)
return response
@@ -42,12 +42,8 @@ class ImagefapGalleryExtractor(ImagefapExtractor):
pattern = BASE_PATTERN + r"/(?:gallery\.php\?gid=|gallery/|pictures/)(\d+)"
example = "https://www.imagefap.com/gallery/12345"
def __init__(self, match):
ImagefapExtractor.__init__(self, match)
self.gid = match[1]
self.image_id = ""
def items(self):
self.gid = self.groups[0]
url = f"{self.root}/gallery/{self.gid}"
page = self.request(url).text
data = self.get_job_metadata(page)
@@ -113,17 +109,13 @@ class ImagefapImageExtractor(ImagefapExtractor):
pattern = BASE_PATTERN + r"/photo/(\d+)"
example = "https://www.imagefap.com/photo/12345"
def __init__(self, match):
ImagefapExtractor.__init__(self, match)
self.image_id = match[1]
def items(self):
url, data = self.get_image()
yield Message.Directory, "", data
yield Message.Url, url, data
def get_image(self):
url = f"{self.root}/photo/{self.image_id}/"
url = f"{self.root}/photo/{self.groups[0]}/"
page = self.request(url).text
url, pos = text.extract(
@@ -153,13 +145,8 @@ class ImagefapFolderExtractor(ImagefapExtractor):
r"|profile/([^/?#]+)/galleries\?)folderid=)(\d+|-1)")
example = "https://www.imagefap.com/organizer/12345"
def __init__(self, match):
ImagefapExtractor.__init__(self, match)
self._id, user, profile, self.folder_id = match.groups()
self.user = user or profile
def items(self):
for gallery_id, name, folder in self.galleries(self.folder_id):
for gallery_id, name, folder in self.galleries():
url = f"{self.root}/gallery/{gallery_id}"
data = {
"gallery_id": gallery_id,
@@ -169,22 +156,25 @@ class ImagefapFolderExtractor(ImagefapExtractor):
}
yield Message.Queue, url, data
def galleries(self, folder_id):
def galleries(self):
"""Yield gallery IDs and titles of a folder"""
_id, user, profile, folder_id = self.groups
if folder_id == "-1":
folder_name = "Uncategorized"
if self._id:
if _id:
url = (f"{self.root}/usergallery.php"
f"?userid={self.user}&folderid=-1")
f"?userid={user}&folderid=-1")
else:
url = f"{self.root}/profile/{self.user}/galleries?folderid=-1"
url = (f"{self.root}/profile/"
f"{user or profile}/galleries?folderid=-1")
else:
folder_name = None
url = f"{self.root}/organizer/{folder_id}/"
params = {"page": 0}
extr = text.extract_from(self.request(url, params=params).text)
if not folder_name:
if folder_name is None:
folder_name = extr("class'blk_galleries'><b>", "</b>")
while True:
@@ -211,28 +201,44 @@ class ImagefapUserExtractor(ImagefapExtractor):
r"|usergallery\.php\?userid=(\d+))(?:$|#)")
example = "https://www.imagefap.com/profile/USER"
def __init__(self, match):
ImagefapExtractor.__init__(self, match)
self.user, self.user_id = match.groups()
def items(self):
data = {"_extractor": ImagefapFolderExtractor}
for folder_id in self.folders():
if folder_id == "-1":
url = f"{self.root}/profile/{self.user}/galleries?folderid=-1"
url = (f"{self.root}/profile/{self.user}/galleries"
f"?folderid=-1")
else:
url = f"{self.root}/organizer/{folder_id}/"
yield Message.Queue, url, data
def folders(self):
"""Return a list of folder IDs of a user"""
if self.user:
url = f"{self.root}/profile/{self.user}/galleries"
user, user_id = self.groups
if user:
url = f"{self.root}/profile/{user}/galleries"
else:
url = f"{self.root}/usergallery.php?userid={self.user_id}"
url = f"{self.root}/usergallery.php?userid={user_id}"
params = {"page": 0}
pnum = 0
response = self.request(url)
self.user = response.url.split("/")[-2]
folders = text.extr(response.text, ' id="tgl_all" value="', '"')
return folders.rstrip("|").split("|")
self.user = None
while True:
response = self.request(url, params=params)
if self.user is None:
url = response.url.partition("?")[0]
self.user = url.rsplit("/", 2)[1]
page = response.text
folders = text.extr(
page, ' id="tgl_all" value="', '"').rstrip("|").split("|")
if folders and folders[-1] == "-1":
last = folders.pop()
if not pnum:
folders.insert(0, last)
yield from folders
params["page"] = pnum = pnum + 1
if f'href="?page={pnum}">{pnum+1}</a>' not in page:
return

View File

@@ -123,18 +123,29 @@ class ImhentaiGalleryExtractor(ImhentaiExtractor, GalleryExtractor):
return results
def images(self, page):
base = text.extr(page, 'data-src="', '"').rpartition("/")[0] + "/"
exts = {"j": "jpg", "p": "png", "g": "gif", "w": "webp", "a": "avif"}
try:
data = util.json_loads(text.extr(page, "$.parseJSON('", "'"))
except Exception:
data = None
if data is None:
self.log.warning("%s: Missing image data", self.gallery_id)
return ()
base = text.extr(page, 'data-src="', '"').rpartition("/")[0] + "/"
exts = {"j": "jpg", "p": "png", "g": "gif", "w": "webp", "a": "avif"}
def _fallback_exts(i):
for ext in util.advance(exts.values(), 1):
yield f"{base}{i}.{ext}"
cnt = text.parse_int(text.extr(
page, 'id="load_pages" value="', '"'))
return [(f"{base}{i}.jpg", {"_fallback": _fallback_exts(i)})
for i in range(1, cnt+1)]
results = []
for i in map(str, range(1, len(data)+1)):
ext, width, height = data[i].split(",")
url = base + i + "." + exts[ext]
url = f"{base}{i}.{exts[ext]}"
results.append((url, {
"width" : text.parse_int(width),
"height": text.parse_int(height),

View File

@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-
# Copyright 2018-2020 Leonardo Taccari
# Copyright 2018-2025 Mike Fährmann
# Copyright 2018-2026 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@@ -11,7 +11,7 @@
from .common import Extractor, Message, Dispatch
from .. import text, util, exception
from ..cache import cache, memcache
from ..cache import cache
import itertools
import binascii
@@ -570,7 +570,7 @@ class InstagramTaggedExtractor(InstagramExtractor):
user = self.api.user_by_id(self.user_id)
else:
self.user_id = self.api.user_id(self.item)
user = self.api.user_by_name(self.item)
user = self.api.user_by_screen_name(self.item)
return {
"tagged_owner_id" : user["id"],
@@ -757,7 +757,7 @@ class InstagramInfoExtractor(InstagramExtractor):
if screen_name.startswith("id:"):
user = self.api.user_by_id(screen_name[3:])
else:
user = self.api.user_by_name(screen_name)
user = self.api.user_by_screen_name(screen_name)
return iter(((Message.Directory, "", user),))
@@ -779,7 +779,7 @@ class InstagramAvatarExtractor(InstagramExtractor):
if user.startswith("id:"):
user = self.api.user_by_id(user[3:])
else:
user = self.api.user_by_name(user)
user = self.api.user_by_screen_name(user)
user["pk"] = user["id"]
url = user.get("profile_pic_url_hd") or user["profile_pic_url"]
avatar = {"url": url, "width": 0, "height": 0}
@@ -869,36 +869,37 @@ class InstagramRestAPI():
}
return self._pagination_sections(endpoint, data)
@memcache(keyarg=1)
def user_by_name(self, screen_name):
endpoint = "/v1/users/web_profile_info/"
params = {"username": screen_name}
try:
return self._call(
endpoint, params=params, notfound="user")["data"]["user"]
except KeyError:
raise exception.NotFoundError("user")
@memcache(keyarg=1)
@cache(maxage=36500*86400, keyarg=1)
def user_by_id(self, user_id):
endpoint = f"/v1/users/{user_id}/info/"
return self._call(endpoint)["user"]
def user_by_screen_name(self, screen_name):
user = user_by_search(self, screen_name)
if user is None:
user_by_search.invalidate(screen_name)
self.extractor.log.warning(
"Failed to find profile '%s' via search. "
"Trying 'web_profile_info' fallback", screen_name)
user = user_by_name(self, screen_name)
if user is None:
user_by_name.invalidate(screen_name)
raise exception.NotFoundError("user")
return user
def user_id(self, screen_name, check_private=True):
if screen_name.startswith("id:"):
if self.extractor.config("metadata"):
self.extractor._user = self.user_by_id(screen_name[3:])
return screen_name[3:]
user = self.user_by_name(screen_name)
if user is None:
raise exception.AuthorizationError(
"Login required to access this profile")
if check_private and user["is_private"] and \
not user["followed_by_viewer"]:
user = self.user_by_screen_name(screen_name)
if check_private and user.get("is_private") and \
not user.get("followed_by_viewer", True):
name = user["username"]
s = "" if name.endswith("s") else "s"
self.extractor.log.warning("%s'%s posts are private", name, s)
self.extractor._assign_user(user)
return user["id"]
@@ -945,7 +946,10 @@ class InstagramRestAPI():
def _call(self, endpoint, **kwargs):
extr = self.extractor
url = "https://www.instagram.com/api" + endpoint
if endpoint[0] == "/":
url = "https://www.instagram.com/api" + endpoint
else:
url = endpoint
kwargs["headers"] = {
"Accept" : "*/*",
"X-CSRFToken" : extr.csrf_token,
@@ -1049,7 +1053,7 @@ class InstagramGraphqlAPI():
self._json_dumps = util.json_dumps
api = InstagramRestAPI(extractor)
self.user_by_name = api.user_by_name
self.user_by_screen_name = api.user_by_screen_name
self.user_by_id = api.user_by_id
self.user_id = api.user_id
@@ -1155,6 +1159,29 @@ def _login_impl(extr, username, password):
return {}
@cache(maxage=36500*86400, keyarg=1)
def user_by_name(self, screen_name):
endpoint = "/v1/users/web_profile_info/"
params = {"username": screen_name}
try:
return self._call(
endpoint, params=params, notfound="user")["data"]["user"]
except KeyError:
raise exception.NotFoundError("user")
@cache(maxage=36500*86400, keyarg=1)
def user_by_search(self, screen_name):
url = "https://www.instagram.com/web/search/topsearch/"
params = {"query": screen_name}
name = screen_name.lower()
for result in self._call(url, params=params)["users"]:
user = result["user"]
if user["username"].lower() == name:
return user
def id_from_shortcode(shortcode):
return util.bdecode(shortcode, _ALPHABET)
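The `util.bdecode` call above is a base conversion over a custom alphabet. A self-contained sketch of the same idea (the alphabet shown is the URL-safe base64 ordering commonly attributed to Instagram shortcodes — an assumption, since `_ALPHABET` itself is not shown in this diff):

```python
# Assumed alphabet: URL-safe base64 ordering (not confirmed
# by this diff; gallery-dl's actual _ALPHABET may differ).
_ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789-_")

def id_from_shortcode(shortcode):
    # Interpret the shortcode as a base-64 number whose digit
    # values are the character positions in _ALPHABET.
    num = 0
    for char in shortcode:
        num = num * 64 + _ALPHABET.index(char)
    return num
```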

View File

@@ -376,6 +376,8 @@ class PixivExtractor(Extractor):
"meta_single_page": {"original_image_url": url},
"page_count" : 1,
"sanity_level" : 0,
"total_comments" : 0,
"is_bookmarked" : False,
"tags" : (),
"title" : kind,
"type" : kind,
@@ -520,7 +522,10 @@ class PixivAvatarExtractor(PixivExtractor):
def _init(self):
PixivExtractor._init(self)
self.sanity_workaround = self.meta_comments = False
self.sanity_workaround = \
self.meta_bookmark = \
self.meta_comments = \
self.meta_captions = False
def works(self):
user = self.api.user_detail(self.groups[0])["user"]
@@ -536,9 +541,7 @@ class PixivBackgroundExtractor(PixivExtractor):
pattern = USER_PATTERN + "/background"
example = "https://www.pixiv.net/en/users/12345/background"
def _init(self):
PixivExtractor._init(self)
self.sanity_workaround = self.meta_comments = False
_init = PixivAvatarExtractor._init
def works(self):
detail = self.api.user_detail(self.groups[0])

View File

@@ -55,7 +55,7 @@ class RedditExtractor(Extractor):
for submission, comments in submissions:
urls = []
if submission:
if submission and submission.get("_media", True):
submission["comment"] = None
submission["date"] = self.parse_timestamp(
submission["created_utc"])
@@ -126,7 +126,7 @@ class RedditExtractor(Extractor):
data = submission.copy()
data["comment"] = comment
comment["date"] = self.parse_timestamp(
comment["date"] = data["date"] = self.parse_timestamp(
comment["created_utc"])
if media:
@@ -300,19 +300,39 @@ class RedditHomeExtractor(RedditSubredditExtractor):
class RedditUserExtractor(RedditExtractor):
"""Extractor for URLs from posts by a reddit user"""
subcategory = "user"
directory_fmt = ("{category}", "Users", "{user[name]}")
pattern = (r"(?:https?://)?(?:\w+\.)?reddit\.com/u(?:ser)?/"
r"([^/?#]+(?:/([a-z]+))?)/?(?:\?([^#]*))?$")
example = "https://www.reddit.com/user/USER/"
def __init__(self, match):
self.user, sub, params = match.groups()
self.params = text.parse_query(params)
if sub:
if sub := match[2]:
self.subcategory += "-" + sub
RedditExtractor.__init__(self, match)
def submissions(self):
return self.api.submissions_user(self.user, self.params)
username, sub, qs = self.groups
username = text.unquote(username)
self.kwdict["user"] = user = self.api.user_about(username)
submissions = self.api.submissions_user(
user["name"], text.parse_query(qs))
if self.config("only", True):
submissions = self._only(submissions, user)
return submissions
def _only(self, submissions, user):
uid = "t2_" + user["id"]
for submission, comments in submissions:
if submission and submission.get("author_fullname") != uid:
submission["_media"] = False
comments = [
comment
for comment in (comments or ())
if comment.get("author_fullname") == uid
]
if submission or comments:
yield submission, comments
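The `_only` filter above relies on Reddit "fullnames": account fullnames carry the `t2_` type prefix, so a submission or comment belongs to the user exactly when its `author_fullname` equals `"t2_" + user["id"]`. A simplified standalone variant (it drops foreign submissions outright instead of setting the extractor's `_media` flag; the function name is illustrative):

```python
def keep_only(submissions, user_id):
    """Filter (submission, comments) pairs down to content authored by
    the account whose Reddit fullname is 't2_' + user_id."""
    uid = "t2_" + user_id  # account fullnames use the 't2_' type prefix
    for submission, comments in submissions:
        if submission and submission.get("author_fullname") != uid:
            submission = None  # drop submissions by other authors
        comments = [c for c in (comments or ())
                    if c.get("author_fullname") == uid]
        if submission or comments:
            yield submission, comments
```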
class RedditSubmissionExtractor(RedditExtractor):
@@ -323,12 +343,8 @@ class RedditSubmissionExtractor(RedditExtractor):
r"comments|gallery)|redd\.it)/([a-z0-9]+)")
example = "https://www.reddit.com/r/SUBREDDIT/comments/id/"
def __init__(self, match):
RedditExtractor.__init__(self, match)
self.submission_id = match[1]
def submissions(self):
return (self.api.submission(self.submission_id),)
return (self.api.submission(self.groups[0]),)
class RedditImageExtractor(Extractor):
@@ -449,9 +465,9 @@ class RedditAPI():
endpoint = subreddit + "/.json"
return self._pagination(endpoint, params)
def submissions_user(self, user, params):
def submissions_user(self, username, params):
"""Collect all (submission, comments)-tuples posted by a user"""
endpoint = "/user/" + user + "/.json"
endpoint = f"/user/{username}/.json"
return self._pagination(endpoint, params)
def morechildren(self, link_id, children):
@@ -472,6 +488,10 @@ class RedditAPI():
else:
yield thing["data"]
def user_about(self, username):
endpoint = f"/user/{username}/about.json"
return self._call(endpoint, {})["data"]
def authenticate(self):
"""Authenticate the application by requesting an access token"""
self.headers["Authorization"] = \


@@ -181,21 +181,16 @@ class TiktokExtractor(Extractor):
raise KeyError(assert_key)
return data
except (ValueError, KeyError):
# We failed to retrieve rehydration data. This happens
# relatively frequently when making many requests, so retry.
if tries >= self._retries:
raise
tries += 1
self.log.warning("%s: Failed to retrieve rehydration data "
"(%s/%s)", url.rpartition("/")[2], tries,
self._retries)
if challenge_attempt:
self.sleep(self._timeout, "retry")
challenge_attempt = False
else:
# Even if the retries option has been set to 0, we should
# always at least try to solve the JS challenge and go again
# immediately.
if not challenge_attempt:
challenge_attempt = True
self.log.info("Solving JavaScript challenge")
try:
self._solve_challenge(html)
html = None
continue
except Exception as exc:
self.log.traceback(exc)
self.log.warning(
@@ -204,9 +199,19 @@ class TiktokExtractor(Extractor):
"with the --write-pages option and include the "
"resulting page in your bug report",
url.rpartition("/")[2])
self.sleep(self._timeout, "retry")
html = None
challenge_attempt = True
# We've already tried resolving the challenge, and either
# resolving it failed, or resolving it didn't get us the
# rehydration data, so fail this attempt.
self.log.warning("%s: Failed to retrieve rehydration data "
"(%s/%s)", url.rpartition("/")[2], tries + 1,
self._retries)
if tries >= self._retries:
raise
tries += 1
self.sleep(self._timeout, "retry")
challenge_attempt = False
html = None
def _extract_rehydration_data_user(self, profile_url, additional_keys=()):
if profile_url in self.rehydration_data_cache:
@@ -223,7 +228,7 @@ class TiktokExtractor(Extractor):
data = data["webapp.user-detail"]
if not self._check_status_code(data, profile_url, "profile"):
raise exception.ExtractionError(
"%s: could not extract rehydration data", profile_url)
f"{profile_url}: could not extract rehydration data")
try:
for key in additional_keys:
data = data[key]
@@ -475,6 +480,8 @@ class TiktokExtractor(Extractor):
self.log.error("%s: Login required to access this %s, or this "
"profile has no videos posted", url,
type_of_url)
elif status == 10221:
self.log.error("%s: User account could not be found", url)
elif status == 10204:
self.log.error("%s: Requested %s not available", url, type_of_url)
elif status == 10231:
@@ -605,8 +612,7 @@ class TiktokPostsExtractor(TiktokExtractor):
f"{user_name}"
if not ytdl:
message += ", try extracting post information using " \
"yt-dlp with the -o " \
"tiktok-user-extractor=ytdl argument"
"yt-dlp with the -o ytdl=true argument"
self.log.warning(message)
return ()
@@ -913,19 +919,22 @@ class TiktokPaginationCursor:
class TiktokTimeCursor(TiktokPaginationCursor):
def __init__(self, *, reverse=True):
def __init__(self, *, reverse=True, has_more_attribute="hasMore",
cursor_attribute="cursor"):
super().__init__()
self.cursor = 0
# Direction the cursor is expected to move from page to page:
# True for decreasing ("down"), False for increasing ("up").
self.reverse = reverse
self.has_more_key = has_more_attribute
self.cursor_key = cursor_attribute
def current_page(self):
return self.cursor
def next_page(self, data, query_parameters):
skip_fallback_logic = self.cursor == 0
new_cursor = int(data.get("cursor", 0))
new_cursor = int(data.get(self.cursor_key, 0))
no_cursor = not new_cursor
if not skip_fallback_logic:
# If the new cursor doesn't go in the direction we expect, use the
@@ -937,7 +946,7 @@ class TiktokTimeCursor(TiktokPaginationCursor):
elif no_cursor:
raise exception.ExtractionError("Could not extract next cursor")
self.cursor = new_cursor
return not data.get("hasMore", False)
return not data.get(self.has_more_key, False)
def fallback_cursor(self, data):
try:
@@ -967,6 +976,12 @@ class TiktokPopularTimeCursor(TiktokTimeCursor):
return -50_000
class TiktokStoryTimeCursor(TiktokTimeCursor):
def __init__(self):
super().__init__(reverse=False, has_more_attribute="HasMoreAfter",
cursor_attribute="MaxCursor")
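The new `has_more_attribute`/`cursor_attribute` parameters let one time-based cursor class drive both the regular endpoints (`hasMore`/`cursor`) and the story endpoint (`HasMoreAfter`/`MaxCursor`). A minimal standalone sketch of that idea, stripped of the fallback logic (class and variable names here are illustrative, not the extractor's API):

```python
class TimeCursor:
    """Track a time-based pagination cursor with configurable JSON keys."""

    def __init__(self, has_more_key="hasMore", cursor_key="cursor"):
        self.cursor = 0
        self.has_more_key = has_more_key
        self.cursor_key = cursor_key

    def next_page(self, data):
        # Advance to the cursor reported in the API response;
        # return True when pagination is exhausted.
        self.cursor = int(data.get(self.cursor_key, 0))
        return not data.get(self.has_more_key, False)


# Regular endpoint response shape
regular = TimeCursor()
done = regular.next_page({"cursor": "1700000000000", "hasMore": True})

# Story endpoint response shape: same logic, different key names
story = TimeCursor(has_more_key="HasMoreAfter", cursor_key="MaxCursor")
done_story = story.next_page({"MaxCursor": "42", "HasMoreAfter": False})
```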
class TiktokLegacyTimeCursor(TiktokPaginationCursor):
def __init__(self):
super().__init__()
@@ -1049,8 +1064,10 @@ class TiktokPaginationRequest:
cursor_type = self.cursor_type(query_parameters)
cursor = cursor_type() if cursor_type else None
for page in itertools.count(start=1):
extractor.log.info("%s: retrieving %s page %d", url, self.endpoint,
page)
item_count = len(self.items)
extractor.log.info("%s: retrieving %s page %d (%d item%s)", url,
self.endpoint, page, item_count,
"" if item_count == 1 else "s")
tries = 0
while True:
try:
@@ -1269,7 +1286,8 @@ class TiktokItemListRequest(TiktokPaginationRequest):
def extract_items(self, data):
if "itemList" not in data:
self.exit_early_due_to_no_items = True
if not data.get("hasMorePrevious", data.get("hasMore", False)):
self.exit_early_due_to_no_items = True
return {}
return {item["id"]: item for item in data["itemList"]}
@@ -1468,7 +1486,7 @@ class TiktokStoryItemListRequest(TiktokItemListRequest):
assert query_parameters["loadBackward"] in ["true", "false"]
def cursor_type(self, query_parameters):
return TiktokItemCursor
return TiktokStoryTimeCursor
class TiktokStoryBatchItemListRequest(TiktokItemListRequest):


@@ -37,6 +37,7 @@ class TwitterExtractor(Extractor):
def _init(self):
self.unavailable = self.config("unavailable", False)
self.textonly = self.config("text-tweets", False)
self.articles = self.config("articles", True)
self.retweets = self.config("retweets", False)
self.replies = self.config("replies", True)
self.twitpic = self.config("twitpic", False)
@@ -159,6 +160,15 @@ class TwitterExtractor(Extractor):
"%s: Error while extracting Card files (%s: %s)",
data["id_str"], exc.__class__.__name__, exc)
if self.articles and "article" in tweet:
try:
self._extract_article(tweet, files)
except Exception as exc:
self.log.traceback(exc)
self.log.warning(
"%s: Error while extracting article files (%s: %s)",
data["id_str"], exc.__class__.__name__, exc)
if self.twitpic:
try:
self._extract_twitpic(data, files)
@@ -319,6 +329,31 @@ class TwitterExtractor(Extractor):
url = f"ytdl:{self.root}/i/web/status/{tweet_id}"
files.append({"url": url})
def _extract_article(self, tweet, files):
article = tweet["article"]["article_results"]["result"]
if media := article.get("cover_media"):
info = media["media_info"]
files.append({
"media_id" : media["media_id"],
"media_key": media["media_key"],
"url" : info["original_img_url"],
"width" : info["original_img_width"],
"height" : info["original_img_height"],
"type" : "article:cover",
})
for media in article["media_entities"]:
info = media["media_info"]
files.append({
"media_id" : media["media_id"],
"media_key": media["media_key"],
"url" : info["original_img_url"],
"width" : info["original_img_width"],
"height" : info["original_img_height"],
"type"     : "article",
})
def _extract_twitpic(self, tweet, files):
urls = {}


@@ -9,7 +9,7 @@
"""Extractors for XenForo forums"""
from .common import BaseExtractor, Message
from .. import text, exception
from .. import text, util, exception
from ..cache import cache
import binascii
@@ -46,10 +46,10 @@ class XenforoExtractor(BaseExtractor):
base = root if (pos := root.find("/", 8)) < 0 else root[:pos]
for post in self.posts():
urls = extract_urls(post["content"])
if "data-s9e-mediaembed-iframe=" in post["content"]:
self._extract_embeds(urls, post)
if post["attachments"]:
for att in text.extract_iter(
post["attachments"], "<li", "</li>"):
urls.append((None, att[att.find('href="')+6:], None, None))
self._extract_attachments(urls, post)
data = {"post": post}
post["count"] = data["count"] = len(urls)
@@ -195,14 +195,14 @@ class XenforoExtractor(BaseExtractor):
if cookie.domain.endswith(self.cookies_domain)
}
def _pagination(self, base, pnum=None, callback=None):
def _pagination(self, base, pnum=None, callback=None, params=""):
base = self.root + base
if pnum is None:
url = base + "/"
url = f"{base}/{params}"
pnum = 1
else:
url = f"{base}/page-{pnum}"
url = f"{base}/page-{pnum}{params}"
pnum = None
page = self.request_page(url).text
@@ -214,7 +214,7 @@ class XenforoExtractor(BaseExtractor):
if pnum is None or "pageNav-jump--next" not in page:
return
pnum += 1
page = self.request_page(f"{base}/page-{pnum}").text
page = self.request_page(f"{base}/page-{pnum}{params}").text
def _pagination_reverse(self, base, pnum=None, callback=None):
base = self.root + base
@@ -340,6 +340,39 @@ class XenforoExtractor(BaseExtractor):
data["author_id"] = data["author"][15:]
return data
def _extract_attachments(self, urls, post):
for att in text.extract_iter(post["attachments"], "<li", "</li>"):
urls.append((None, att[att.find('href="')+6:], None, None))
def _extract_embeds(self, urls, post):
for embed in text.extract_iter(
post["content"], "data-s9e-mediaembed-iframe='", "'"):
data = {}
key = None
for value in util.json_loads(embed):
if key is None:
key = value
else:
data[key] = value
key = None
src = data.get("src")
if not src:
self.log.debug(data)
continue
type = data.get("data-s9e-mediaembed")
frag = src[src.find("#")+1:]
if type == "tiktok":
url = "https://www.tiktok.com/@/video/" + frag
elif type == "reddit":
url = "https://embed.reddit.com/r/" + frag
else:
self.log.warning("%s: Unsupported media embed type '%s'",
post["id"], type)
continue
urls.append((None, None, None, url))
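The `data-s9e-mediaembed-iframe` attribute stores its iframe parameters as one flat JSON array of alternating keys and values; `_extract_embeds` rebuilds a dict from it one pair at a time. The same pairing can be done with slicing, shown here against a simplified (unescaped) sample attribute value; the iframe `src` is illustrative:

```python
import json

# Simplified sample of a data-s9e-mediaembed-iframe attribute value:
# a flat JSON array alternating keys and values.
embed = ('["data-s9e-mediaembed","tiktok",'
         '"src","//s9e.github.io/iframe/2/tiktok.min.html#7556556034794425631",'
         '"style.height","575px"]')

values = json.loads(embed)
# Pair up alternating key/value entries into a dict
data = dict(zip(values[0::2], values[1::2]))

kind = data.get("data-s9e-mediaembed")
src = data.get("src", "")
# The media ID is carried in the URL fragment after '#'
url = "https://www.tiktok.com/@/video/" + src[src.find("#") + 1:]
```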
def _extract_media(self, url, file):
media = {}
name, _, media["id"] = file.rpartition(".")
@@ -449,7 +482,12 @@ class XenforoThreadExtractor(XenforoExtractor):
if (order := self.config("order-posts")) and \
order[0] not in ("d", "r"):
pages = self._pagination(path, pnum)
params = "?order=reaction_score" if order[0] == "s" else ""
pages = self._pagination(path, pnum, params=params)
reverse = False
elif order == "reaction":
pages = self._pagination(
path, pnum, params="?order=reaction_score")
reverse = False
else:
pages = self._pagination_reverse(path, pnum)


@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
# Copyright 2019-2025 Mike Fährmann
# Copyright 2019-2026 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@@ -112,7 +112,7 @@ class XhamsterUserExtractor(XhamsterExtractor):
while url:
extr = text.extract_from(self.request(url).text)
while True:
url = extr('thumb-image-container role-pop" href="', '"')
url = extr(' role-pop" href="', '"')
if not url:
break
yield Message.Queue, url, data


@@ -459,6 +459,8 @@ class DownloadJob(Job):
job.kwdict.update(self.kwdict)
if kwdict:
job.kwdict.update(kwdict)
if "_extractor" in kwdict:
del job.kwdict["_extractor"]
if pextr.config("parent-session", parent):
extr.session = pextr.session


@@ -6,5 +6,5 @@
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
__version__ = "1.31.5"
__version__ = "1.31.6"
__variant__ = None


@@ -5,68 +5,62 @@
# published by the Free Software Foundation.
from gallery_dl.extractor import artstation
from gallery_dl import exception
__tests__ = (
{
"#url" : "https://www.artstation.com/sungchoi/",
"#category": ("", "artstation", "user"),
"#class" : artstation.ArtstationUserExtractor,
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+",
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/8k/[^/]+",
"#range" : "1-10",
"#count" : ">= 10",
},
{
"#url" : "https://www.artstation.com/sungchoi/albums/all/",
"#category": ("", "artstation", "user"),
"#class" : artstation.ArtstationUserExtractor,
},
{
"#url" : "https://sungchoi.artstation.com/",
"#category": ("", "artstation", "user"),
"#class" : artstation.ArtstationUserExtractor,
},
{
"#url" : "https://sungchoi.artstation.com/projects/",
"#category": ("", "artstation", "user"),
"#comment" : "alternate user URL format",
"#class" : artstation.ArtstationUserExtractor,
},
{
"#url" : "https://www.artstation.com/huimeiye/albums/770899",
"#category": ("", "artstation", "album"),
"#comment" : "'Hellboy' album",
"#class" : artstation.ArtstationAlbumExtractor,
"#count" : 2,
},
{
"#url" : "https://www.artstation.com/huimeiye/albums/770898",
"#category": ("", "artstation", "album"),
"#comment" : "non-existent album",
"#class" : artstation.ArtstationAlbumExtractor,
"#exception": exception.NotFoundError,
"#exception": "NotFoundError",
},
{
"#url" : "https://huimeiye.artstation.com/albums/770899",
"#category": ("", "artstation", "album"),
"#comment" : "alternate user URL format",
"#class" : artstation.ArtstationAlbumExtractor,
},
{
"#url" : "https://www.artstation.com/mikf/likes",
"#category": ("", "artstation", "likes"),
"#class" : artstation.ArtstationLikesExtractor,
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/(4k|large|medium|small)/[^/]+",
"#pattern" : r"https://\w+\.artstation\.com/p/assets/images/images/\d+/\d+/\d+/8k/[^/]+",
"#count" : 6,
},
{
"#url" : "https://www.artstation.com/mikf/collections/2647023",
"#category": ("", "artstation", "collection"),
"#class" : artstation.ArtstationCollectionExtractor,
"#count" : 10,
@@ -85,7 +79,6 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/mikf/collections",
"#category": ("", "artstation", "collections"),
"#class" : artstation.ArtstationCollectionsExtractor,
"#results" : (
"https://www.artstation.com/mikf/collections/2647023",
@@ -105,28 +98,42 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/sungchoi/likes",
"#comment" : "no likes",
"#category": ("", "artstation", "likes"),
"#class" : artstation.ArtstationLikesExtractor,
"#count" : 0,
},
{
"#url" : "https://www.artstation.com/contests/thu-2017/challenges/20",
"#category": ("", "artstation", "challenge"),
"#class" : artstation.ArtstationChallengeExtractor,
},
{
"#url" : "https://www.artstation.com/contests/beyond-human/challenges/23?sorting=winners",
"#category": ("", "artstation", "challenge"),
"#url" : "https://www.artstation.com/challenges/beyond-human/categories/23/submissions",
"#class" : artstation.ArtstationChallengeExtractor,
},
{
"#url" : "https://www.artstation.com/contests/beyond-human/challenges/23?sorting=popular",
"#class" : artstation.ArtstationChallengeExtractor,
"#range" : "1-30",
"#count" : 30,
"challenge": {
"id" : 23,
"headline" : "Imagining Where Future Humans Live",
"created_at": "2017-06-26T14:45:43+00:00",
"contest" : {
"archived" : True,
"published": True,
"slug" : "beyond-human",
"title" : "Beyond Human",
"submissions_count": 4258,
},
},
},
{
"#url" : "https://www.artstation.com/search?query=ancient&sort_by=rank",
"#category": ("", "artstation", "search"),
"#class" : artstation.ArtstationSearchExtractor,
"#range" : "1-20",
"#count" : 20,
@@ -134,7 +141,6 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/artwork?sorting=latest",
"#category": ("", "artstation", "artwork"),
"#class" : artstation.ArtstationArtworkExtractor,
"#range" : "1-20",
"#count" : 20,
@@ -142,16 +148,14 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/artwork/LQVJr",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
"#pattern" : r"https?://\w+\.artstation\.com/p/assets/images/images/008/760/279/4k/.+",
"#pattern" : r"https?://\w+\.artstation\.com/p/assets/images/images/008/760/279/8k/.+",
"#sha1_content": "3f211ce0d6ecdb502db2cdf7bbeceb11d8421170",
},
{
"#url" : "https://www.artstation.com/artwork/Db3dy",
"#comment" : "multiple images per project",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
"#count" : 4,
},
@@ -159,7 +163,6 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/artwork/lR8b5k",
"#comment" : "artstation video clips (#2566)",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
"#options" : {"videos": True},
"#range" : "2-3",
@@ -185,7 +188,6 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/artwork/g4WPK",
"#comment" : "embedded youtube video",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
"#options" : {"external": True},
"#pattern" : r"ytdl:https://www\.youtube(-nocookie)?\.com/embed/JNFfJtwwrU0",
@@ -195,27 +197,23 @@ __tests__ = (
{
"#url" : "https://www.artstation.com/artwork/3q3mXB",
"#comment" : "404 (#3016)",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
"#count" : 0,
"#exception": "NotFoundError",
},
{
"#url" : "https://sungchoi.artstation.com/projects/LQVJr",
"#comment" : "alternate URL patterns",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
},
{
"#url" : "https://artstn.co/p/LQVJr",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
},
{
"#url" : "https://www.artstation.com/sungchoi/following",
"#category": ("", "artstation", "following"),
"#class" : artstation.ArtstationFollowingExtractor,
"#pattern" : artstation.ArtstationUserExtractor.pattern,
"#count" : ">= 40",
@@ -224,21 +222,18 @@ __tests__ = (
{
"#url" : "https://fede-x-rojas.artstation.com/projects/WBdaZy",
"#comment" : "dash in username",
"#category": ("", "artstation", "image"),
"#class" : artstation.ArtstationImageExtractor,
},
{
"#url" : "https://fede-x-rojas.artstation.com/albums/8533110",
"#comment" : "dash in username",
"#category": ("", "artstation", "album"),
"#class" : artstation.ArtstationAlbumExtractor,
},
{
"#url" : "https://fede-x-rojas.artstation.com/",
"#comment" : "dash in username",
"#category": ("", "artstation", "user"),
"#class" : artstation.ArtstationUserExtractor,
},


@@ -18,7 +18,7 @@ __tests__ = (
"commentsCount" : int,
"createdAt" : "2025-10-21T00:49:00.306Z",
"date" : "dt:2025-10-21 00:49:00",
"date_updated" : "dt:2025-12-10 01:09:26",
"date_updated" : "dt:2026-02-07 01:07:45",
"deletedAt" : None,
"duration" : None,
"explicitnessRating": None,
@@ -27,15 +27,15 @@ __tests__ = (
"inCollectionsCount": range(20, 50),
"isBunnyVideoReady": True,
"label" : "check my FREE VIP OF ⬇️",
"likesCount" : range(500, 2000),
"likesCount" : range(300, 2000),
"mediaId" : "b821619e-96a1-49a3-a3f8-a8a3e8432a51",
"postId" : 1429486,
"publishedAt" : "2025-10-21T00:50:37.143Z",
"score" : range(500, 2000),
"score" : range(300, 2000),
"sexualOrientation": "STRAIGHT",
"tags" : ["lesbian"],
"thumbnailStreamUrl": str,
"updatedAt" : "2025-12-10T01:09:26.902Z",
"updatedAt" : "iso:dt",
"uploadMethod" : "USER_FILE",
"userId" : "32f4c8d6-2409-4db8-9e66-d3b5ff0c1a98",
"videoStreamUrl" : str,
@@ -49,7 +49,7 @@ __tests__ = (
"labelLower" : "lesbian",
"lastCountUpdatedAt": "iso:dt",
"searchTags" : [],
"thumbnailPostId": 301300,
"thumbnailPostId": 311180,
"updatedAt" : "iso:dt",
"sexualOrientations": [
"STRAIGHT",
@@ -119,4 +119,20 @@ __tests__ = (
"linkSidebar" : dict,
},
{
"#url" : "https://fikfap.com/user/Hot-sauce-34",
"#comment" : "'-' in username",
"#class" : fikfap.FikfapUserExtractor,
},
{
"#url" : "https://fikfap.com/hash/outercourse",
"#class" : fikfap.FikfapHashtagExtractor,
"#pattern" : r"ytdl:https://[^/]+\.b\-cdn\.net/bcdn_token=.+/playlist\.m3u8$",
"#count" : range(50, 100),
"algorithm": "hashtag-posts",
"hashtag" : "outercourse",
},
)


@@ -208,4 +208,13 @@ __tests__ = (
"#class" : imagefap.ImagefapUserExtractor,
},
{
"#url" : "https://www.imagefap.com/profile/brookdale",
"#comment" : "multiple pages (#9016)",
"#class" : imagefap.ImagefapUserExtractor,
"#pattern" : imagefap.ImagefapFolderExtractor.pattern,
"#range" : "1-100",
"#count" : 100,
},
)


@@ -211,12 +211,24 @@ __tests__ = (
{
"#url" : "https://www.pixiv.net/en/users/173530/avatar",
"#class" : pixiv.PixivAvatarExtractor,
"#options" : {
"metadata" : True,
"metadata-bookmark": True,
"captions" : True,
"comments" : True,
},
"#sha1_content": "4e57544480cc2036ea9608103e8f024fa737fe66",
},
{
"#url" : "https://www.pixiv.net/en/users/194921/background",
"#class" : pixiv.PixivBackgroundExtractor,
"#options" : {
"metadata" : True,
"metadata-bookmark": True,
"captions" : True,
"comments" : True,
},
"#pattern" : r"https://i\.pximg\.net/background/img/2021/01/30/16/12/02/194921_af1f71e557a42f499213d4b9eaccc0f8\.jpg",
},


@@ -240,6 +240,24 @@ __tests__ = (
),
},
{
"#url" : "https://simpcity.cr/threads/arianaskyeshelby-itsarianaskyebaby-busty.1237895/post-40205575",
"#comment" : "tiktok s9e media embed iframe (#8994)",
"#category": ("xenforo", "simpcity", "post"),
"#class" : xenforo.XenforoPostExtractor,
"#auth" : True,
"#results" : "https://www.tiktok.com/@/video/7556556034794425631",
},
{
"#url" : "https://simpcity.cr/threads/alrightsierra.70601/post-571509",
"#comment" : "reddit s9e media embed iframe (#8996)",
"#category": ("xenforo", "simpcity", "post"),
"#class" : xenforo.XenforoPostExtractor,
"#auth" : True,
"#results" : "https://embed.reddit.com/r/TikTokFeet/comments/rtzwnz#theme=auto",
},
{
"#url" : "https://simpcity.cr/threads/alua-tatakai.89490/",
"#category": ("xenforo", "simpcity", "thread"),
@@ -284,6 +302,19 @@ __tests__ = (
"#class" : xenforo.XenforoThreadExtractor,
},
{
"#url" : "https://simpcity.cr/threads/ririkana-rr_loveit.10731/",
"#comment" : "post order by reaction score (#8997)",
"#category": ("xenforo", "simpcity", "thread"),
"#class" : xenforo.XenforoThreadExtractor,
"#auth" : True,
"#options" : {
"post-range" : 1,
"order-posts": "reaction",
},
"#results" : "https://bunkr.cr/v/BKLYkkr9KK6dg",
},
{
"#url" : "https://simpcity.cr/forums/asians.48/",
"#category": ("xenforo", "simpcity", "forum"),