- Plan 06-01: Cookie file support (--cookies CLI + Netscape parser)
- Plan 06-02: Browser cookie extraction (Firefox, Chrome)
- Plan 06-03: Main integration (wire cookies, OAuth, simulate, input-file, destination)
Phase 6: Authentication & CLI Features - Research
Researched: 2026-02-16
Domain: Authentication (cookies, OAuth, browser extraction) and CLI usability features
Confidence: HIGH
Summary
Phase 6 implements user-facing authentication and CLI usability features. Most CLI arguments already exist in the codebase (--input-file, --simulate, -v, --destination), but the underlying implementation for cookie parsing, browser extraction, and OAuth flow needs completion. The existing extractor implementations (Twitter, Instagram, Pixiv) have authentication structures but aren't connected to CLI arguments.
Primary recommendation: Implement cookie file parsing first, then browser extraction, and finally OAuth flow integration. Use the Python gallery-dl implementation as the reference implementation since it's battle-tested.
User Constraints
<user_constraints>
User Constraints (from CONTEXT.md)
Locked Decisions
- None explicitly specified for Phase 6
Claude's Discretion
- Authentication implementation approach
- CLI argument naming conventions
- Browser support priority
Deferred Ideas (OUT OF SCOPE)
- Proxy support
- Multi-account handling
- Advanced rate limiting per-domain
</user_constraints>
Standard Stack
Core
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| reqwest | 0.13 | HTTP client with cookie support | Already in use |
| rusqlite | 0.38 | SQLite database access for browser cookies | Already in use |
Supporting (New)
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| cookie | 0.18 | HTTP cookie parsing | Parse Set-Cookie headers |
| aes | - | AES block cipher | Chromium cookie decryption (prefer the crate over a hand-written cipher) |
| ring | 0.17 | Cryptographic operations | Deriving the Chromium decryption key (PBKDF2) from the Linux keyring password |
Alternative Considered
| Instead of | Could Use | Tradeoff |
|---|---|---|
| Custom Netscape parser | netscape-cookie crate | Manual parsing is simple (7 tab-separated fields), no crate needed |
| Browser extraction | External tool (cookies.txt) | Less code to maintain, but users must export cookies with a separate tool |
| Full OAuth library | Individual implementations | OAuth flows vary significantly between sites |
Installation:

    # New dependencies to add to Cargo.toml
    cookie = "0.18"
    ring = "0.17"
Architecture Patterns
Recommended Project Structure
src/
├── cli.rs # Add --cookies, --cookies-from-browser arguments
├── auth/
│ ├── mod.rs # Authentication module
│ ├── cookies.rs # Cookie file parsing (Netscape format)
│ ├── browser.rs # Browser cookie extraction
│ └── oauth.rs # OAuth flow implementations
├── extractor/
│ ├── extractors/
│ │ ├── twitter.rs # Already has cookie support, wire to CLI
│ │ ├── instagram.rs # Already has cookie support, wire to CLI
│ │ └── pixiv.rs # Already has OAuth structure, wire to CLI
Pattern 1: Cookie File Loading
What: Load cookies from Netscape format file
When to use: User provides --cookies argument with path to cookie file
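Example:

    // Source: Python gallery-dl util.py cookiestxt_load()
    // Netscape format: domain\tflag\tpath\tsecure\texpire\tname\tvalue
    use std::collections::HashMap;

    // `Error` stands for the crate's own error type.
    pub fn parse_netscape_cookies(content: &str) -> Result<HashMap<String, String>, Error> {
        let mut cookies = HashMap::new();
        for line in content.lines() {
            // "#HttpOnly_" marks an HttpOnly cookie, not a comment: strip it
            let line = line.trim_start_matches(' ');
            let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
            // Skip comments ('#', '$') and empty lines
            if line.is_empty() || line.starts_with('#') || line.starts_with('$') {
                continue;
            }
            let parts: Vec<&str> = line.split('\t').collect();
            if parts.len() >= 7 {
                // Fields 5 and 6 hold the name and value (field 4 is the expiry)
                cookies.insert(parts[5].to_string(), parts[6].to_string());
            }
        }
        Ok(cookies)
    }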
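A map keyed only by cookie name drops the domain, so cookies with the same name from different sites overwrite each other. The following is a minimal sketch, not the project's actual design, of a parser that keeps all seven fields and filters by host. The `NetscapeCookie` struct, `parse_cookie_line`, and `cookies_for_host` are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;

/// One cookies.txt record. Field names are illustrative, not from the codebase.
#[derive(Debug, Clone, PartialEq)]
pub struct NetscapeCookie {
    pub domain: String,
    pub include_subdomains: bool,
    pub path: String,
    pub secure: bool,
    pub expires: u64, // unparsable values fall back to 0
    pub name: String,
    pub value: String,
}

/// Parse one line; returns None for comments, blank lines, and malformed records.
pub fn parse_cookie_line(line: &str) -> Option<NetscapeCookie> {
    let line = line.trim_start_matches(' ');
    // "#HttpOnly_" prefixes a real cookie line, so strip it before the comment check.
    let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
    if line.is_empty() || line.starts_with('#') || line.starts_with('$') {
        return None;
    }
    let f: Vec<&str> = line.split('\t').collect();
    if f.len() != 7 {
        return None;
    }
    Some(NetscapeCookie {
        domain: f[0].to_string(),
        include_subdomains: f[1].eq_ignore_ascii_case("TRUE"),
        path: f[2].to_string(),
        secure: f[3].eq_ignore_ascii_case("TRUE"),
        expires: f[4].parse().unwrap_or(0),
        name: f[5].to_string(),
        value: f[6].to_string(),
    })
}

/// Collect name/value pairs whose domain matches `host`.
pub fn cookies_for_host(content: &str, host: &str) -> HashMap<String, String> {
    content
        .lines()
        .filter_map(parse_cookie_line)
        .filter(|c| {
            let d = c.domain.trim_start_matches('.');
            host == d || (c.include_subdomains && host.ends_with(&format!(".{d}")))
        })
        .map(|c| (c.name, c.value))
        .collect()
}
```

An extractor for `www.example.com` could then call `cookies_for_host(&content, "www.example.com")` and receive only the cookies that apply to it.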
Pattern 2: Browser Cookie Extraction
What: Extract cookies directly from browser SQLite databases
When to use: User provides --cookies-from-browser firefox or --cookies-from-browser chrome
Implementation approach (from Python gallery-dl):
- Firefox: Read cookies.sqlite from profile directory
  - Path: ~/.mozilla/firefox/*.default/cookies.sqlite
  - Query: SELECT name, value, host, path, isSecure, expiry FROM moz_cookies
- Chrome/Chromium: Read Cookies SQLite database
  - Path: ~/.config/google-chrome/Default/Cookies
  - May need decryption for encrypted values (v10/v11)
- Safari: Read Cookies.binarycookies binary format
  - Complex binary parsing, consider optional feature
Pattern 3: OAuth Flow for Pixiv
What: Implement OAuth2 authorization code flow for Pixiv
When to use: User configures Pixiv API credentials
Flow:
- User registers app at https://www.pixiv.net/developers
- Get client_id and client_secret
- Direct user to authorization URL
- Receive authorization code
- Exchange code for access_token and refresh_token
- Store tokens securely (config file)
Anti-Patterns to Avoid
- Don't store tokens in plain text: Use OS keyring or at minimum warn users
- Don't hardcode OAuth credentials: Always require user to provide their own
- Don't skip SSL verification for "simplicity": Security risk
- Don't implement custom crypto: Use ring or aes-gcm crates
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| HTTP cookie parsing | Custom parser | cookie crate | Handles Set-Cookie, edge cases |
| SQLite for browser cookies | Custom SQLite wrapper | rusqlite | Already in use, handles cross-platform |
| AES decryption | Custom AES | Vetted crates (aes, ring), following the Python gallery-dl decryption logic | Python gallery-dl's approach is well-tested |
| Keyring access | Custom keyring integration | DBus calls for KDE/GNOME | Platform-specific, well-documented |
Key insight: The Python gallery-dl cookie extraction is the reference to follow for browser cookie extraction. It is widely used and covers encryption, different browser versions, and keyrings. For Rust, we can implement simplified versions focusing on the most common use cases.
Common Pitfalls
Pitfall 1: Chrome Cookie Encryption
What goes wrong: Chrome stores cookies encrypted since v80, using OS-level protection
Why it happens: Linux uses keyring (KDE/GNOME), macOS uses Keychain, Windows uses DPAPI
How to avoid:
- Linux: Detect desktop environment, use appropriate keyring
- For simple cases: Try fixed key "peanuts" (older Chrome versions)
- Provide clear error message when decryption fails
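A clear error message starts with knowing how a value was encrypted. The sketch below only classifies a Chromium `encrypted_value` blob by its version prefix and performs no decryption. The type and function names are hypothetical, and the meaning given to each prefix (on Linux, `v10` tied to the fixed "peanuts" password and `v11` to a keyring password) is an assumption based on the Chromium os_crypt sources listed under Sources that should be verified against them.

```rust
/// How a Chromium `encrypted_value` blob appears to be protected.
#[derive(Debug, PartialEq, Eq)]
pub enum CookieEncryption {
    /// Blob starts with "v10".
    V10,
    /// Blob starts with "v11".
    V11,
    /// No recognized version prefix.
    Unknown,
}

/// Split a blob into its version tag and the remaining ciphertext bytes.
pub fn classify_encrypted_value(blob: &[u8]) -> (CookieEncryption, &[u8]) {
    match blob {
        [b'v', b'1', b'0', rest @ ..] => (CookieEncryption::V10, rest),
        [b'v', b'1', b'1', rest @ ..] => (CookieEncryption::V11, rest),
        _ => (CookieEncryption::Unknown, blob),
    }
}

/// Text for a user-facing error when decryption fails (wording is illustrative).
pub fn decryption_hint(kind: &CookieEncryption) -> &'static str {
    match kind {
        CookieEncryption::V10 => "v10 cookie: try the fixed-password key before giving up",
        CookieEncryption::V11 => "v11 cookie: a keyring password is required (KDE/GNOME)",
        CookieEncryption::Unknown => "unrecognized cookie encryption; export a cookies.txt file instead",
    }
}
```

Keeping classification separate from decryption lets the caller report which path failed instead of returning an empty cookie map.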
Pitfall 2: Cookie File Format Confusion
What goes wrong: Users provide curl-style cookie headers instead of Netscape format
Why it happens: Both are called "cookies", but formats differ
How to avoid: Detect format automatically or provide clear error message
Warning signs: Parser returns empty cookie map, check format detection
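One way to detect the format is to inspect the first non-comment line: a Netscape record has seven tab-separated fields, while a header-style string has no tabs and contains `name=value` pairs. This is a heuristic sketch under that assumption; `CookieInputFormat` and `detect_cookie_format` are hypothetical names.

```rust
/// Best-guess format of a user-supplied cookie file.
#[derive(Debug, PartialEq, Eq)]
pub enum CookieInputFormat {
    /// Seven tab-separated fields per line (cookies.txt).
    Netscape,
    /// A "name=value; name2=value2" string, optionally prefixed with "Cookie:".
    HeaderString,
    /// Neither pattern matched.
    Unknown,
}

/// Guess the format from the first line that is not blank or a comment.
pub fn detect_cookie_format(content: &str) -> CookieInputFormat {
    let mut data_lines = content
        .lines()
        // "#HttpOnly_" lines are real cookie records, so unwrap them first.
        .map(|l| l.strip_prefix("#HttpOnly_").unwrap_or(l))
        .filter(|l| !l.trim().is_empty() && !l.starts_with('#'));

    match data_lines.next() {
        Some(l) if l.split('\t').count() == 7 => CookieInputFormat::Netscape,
        Some(l) if !l.contains('\t') && l.contains('=') => CookieInputFormat::HeaderString,
        _ => CookieInputFormat::Unknown,
    }
}
```

When the result is `HeaderString` or `Unknown`, the CLI can fail early with a message naming the expected format instead of silently sending no cookies.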
Pitfall 3: Browser Database Locked
What goes wrong: Can't open browser cookie database because browser is running
Why it happens: SQLite database locked by browser process
How to avoid:
- Copy database to temp location before reading (like Python version does)
- Or warn user to close browser
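The copy-to-temp approach could look like the sketch below, which uses only the standard library. It is one possible implementation of a `copy_to_temp` helper, not the Python version's logic. The temp-file naming scheme is an assumption, and only the main database file is copied, so writes still sitting in a write-ahead log sidecar would not be visible in the copy.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Copy a browser database into the system temp directory and return the new path.
/// The caller is responsible for deleting the copy once it has been read.
pub fn copy_to_temp(db_path: &Path) -> io::Result<PathBuf> {
    // Process id plus a timestamp keeps concurrent runs from colliding.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let file_name = db_path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("cookies.sqlite");
    let temp_path =
        std::env::temp_dir().join(format!("{}-{}-{}", process::id(), nanos, file_name));

    // Reading the file does not need the SQLite lock held by the browser.
    fs::copy(db_path, &temp_path)?;
    Ok(temp_path)
}
```

Returning `io::Result` keeps the helper independent of the crate's error type; callers using `?` would need a conversion from `std::io::Error`.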
Pitfall 4: OAuth Token Expiration
What goes wrong: OAuth access token expires, requests fail silently
Why it happens: Tokens have limited lifetime (typically 1 hour for Pixiv)
How to avoid:
- Implement refresh token flow
- Store refresh token and automatically refresh
- Cache tokens in config
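The refresh decision can be made explicit by storing the expiry time alongside the tokens. The sketch below shows only that bookkeeping; the HTTP refresh call is omitted. `OAuthToken`, its fields, and the safety margin are hypothetical and not taken from the existing Pixiv extractor.

```rust
use std::time::{Duration, SystemTime};

/// Cached OAuth tokens with an absolute expiry time.
#[derive(Debug, Clone)]
pub struct OAuthToken {
    pub access_token: String,
    pub refresh_token: String,
    pub expires_at: SystemTime,
}

impl OAuthToken {
    /// Build from a token response that reports a relative lifetime in seconds.
    pub fn from_response(
        access_token: String,
        refresh_token: String,
        expires_in_secs: u64,
        now: SystemTime,
    ) -> Self {
        Self {
            access_token,
            refresh_token,
            expires_at: now + Duration::from_secs(expires_in_secs),
        }
    }

    /// True when the token is expired or will expire within `margin`.
    pub fn needs_refresh(&self, now: SystemTime, margin: Duration) -> bool {
        now + margin >= self.expires_at
    }
}
```

Checking `needs_refresh(SystemTime::now(), Duration::from_secs(60))` before each request turns a silent failure into a proactive refresh.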
Code Examples
Common Operation 1: Adding --cookies CLI argument

    // Add to cli.rs Args struct
    /// Path to Netscape-format cookies file
    #[arg(long = "cookies", value_name = "FILE")]
    pub cookies: Option<PathBuf>,

    /// Extract cookies from browser (firefox, chrome, etc.)
    #[arg(long = "cookies-from-browser", value_name = "BROWSER[+PROFILE]")]
    pub cookies_from_browser: Option<String>,
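Because `cookies_from_browser` arrives as a plain string, it needs to be split into a browser and an optional profile before use. The sketch below follows the `BROWSER[+PROFILE]` shape of the value name above. The `Browser` and `BrowserSpec` types and the list of accepted browser names are assumptions for illustration.

```rust
/// Browsers this sketch recognizes; the real list is a project decision.
#[derive(Debug, PartialEq, Eq)]
pub enum Browser {
    Firefox,
    Chrome,
    Chromium,
}

/// Parsed form of a `BROWSER[+PROFILE]` argument.
#[derive(Debug, PartialEq, Eq)]
pub struct BrowserSpec {
    pub browser: Browser,
    pub profile: Option<String>,
}

/// Split on the first '+' and validate the browser name (case-insensitive).
pub fn parse_browser_spec(spec: &str) -> Result<BrowserSpec, String> {
    let (name, profile) = match spec.split_once('+') {
        Some((n, p)) => (n, Some(p)),
        None => (spec, None),
    };
    let browser = match name.trim().to_ascii_lowercase().as_str() {
        "firefox" => Browser::Firefox,
        "chrome" => Browser::Chrome,
        "chromium" => Browser::Chromium,
        other => return Err(format!("unsupported browser: {other:?}")),
    };
    // An empty profile ("firefox+") is treated the same as no profile.
    let profile = profile
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from);
    Ok(BrowserSpec { browser, profile })
}
```

For example, `parse_browser_spec("chrome+Profile 1")` yields the Chrome variant with profile `Profile 1`, while an unknown name produces an error that can be shown to the user directly.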
Common Operation 2: Parse cookies from file

    // Simple Netscape format parser
    // `Result` is assumed to be the crate's single-parameter alias.
    use std::collections::HashMap;
    use std::path::Path;

    pub fn load_cookies_from_file(path: &Path) -> Result<HashMap<String, String>> {
        let content = std::fs::read_to_string(path)?;
        let mut cookies = HashMap::new();
        for line in content.lines() {
            // Keep "#HttpOnly_" cookies; only true comments are skipped
            let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
            if line.trim().is_empty() || line.starts_with('#') || line.starts_with('$') {
                continue;
            }
            let parts: Vec<&str> = line.split('\t').collect();
            if parts.len() >= 7 {
                // domain, flag, path, secure, expiration, name, value
                cookies.insert(parts[5].to_string(), parts[6].to_string());
            }
        }
        Ok(cookies)
    }
Common Operation 3: Firefox cookie extraction

    use std::collections::HashMap;

    // `find_firefox_profile` and `copy_to_temp` are helpers still to be written;
    // `Result` is assumed to be the crate's single-parameter alias.
    pub fn extract_firefox_cookies(domain: Option<&str>) -> Result<HashMap<String, String>> {
        // Find Firefox profile directory
        let profile_dir = find_firefox_profile()?;
        let db_path = profile_dir.join("cookies.sqlite");
        // Copy to temp to avoid locking
        let temp_path = copy_to_temp(&db_path)?;
        let conn = rusqlite::Connection::open(&temp_path)?;
        // Bind the domain as a parameter instead of formatting it into the SQL,
        // so quotes in user input cannot alter the query. NULL means "no filter".
        let mut stmt = conn.prepare(
            "SELECT name, value FROM moz_cookies \
             WHERE ?1 IS NULL OR host LIKE '%' || ?1 || '%'",
        )?;
        let rows = stmt.query_map(rusqlite::params![domain], |row| {
            Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
        })?;
        let mut cookies = HashMap::new();
        for row in rows {
            let (name, value) = row?;
            cookies.insert(name, value);
        }
        Ok(cookies)
    }
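The profile lookup used above is not defined in this document. One possible shape, as a sketch only, scans a base directory for profile folders that contain `cookies.sqlite` and prefers names containing `.default`. The function name `find_firefox_profile_in`, the explicit `base` parameter, and the preference rule are assumptions. A real implementation would likely also consult `profiles.ini`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return a profile directory under `base` that contains cookies.sqlite.
/// `base` would typically be ~/.mozilla/firefox on Linux.
pub fn find_firefox_profile_in(base: &Path) -> io::Result<PathBuf> {
    let mut candidates: Vec<PathBuf> = fs::read_dir(base)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|p| p.join("cookies.sqlite").is_file())
        .collect();

    // Sort so that names containing ".default" come first, then alphabetically.
    candidates.sort_by_key(|p| {
        let name = p.file_name().and_then(|n| n.to_str()).unwrap_or("");
        (!name.contains(".default"), name.to_string())
    });

    candidates.into_iter().next().ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::NotFound,
            "no Firefox profile with cookies.sqlite found",
        )
    })
}
```

Taking the base directory as a parameter keeps the function testable and leaves platform-specific path discovery to the caller.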
Common Operation 4: Connect cookies to extractor

    // In main.rs when processing URLs
    let cookies = if let Some(cookies_file) = &args.cookies {
        Some(auth::load_cookies_from_file(cookies_file)?)
    } else if let Some(browser_spec) = &args.cookies_from_browser {
        Some(auth::extract_browser_cookies(browser_spec)?)
    } else {
        None
    };
    // Pass to extractor
    if let Some(ref c) = cookies {
        extractor = extractor.with_cookies(c.clone());
    }
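How `with_cookies` forwards the map to the HTTP client is not shown in this document. If the extractor sends cookies as a single request header, the map could be serialized as in the sketch below. `cookie_header_value` is a hypothetical helper, and sorting the pairs is only there to make the output deterministic.

```rust
use std::collections::HashMap;

/// Serialize cookies into the value of a `Cookie` request header.
pub fn cookie_header_value(cookies: &HashMap<String, String>) -> String {
    let mut pairs: Vec<String> = cookies
        .iter()
        .map(|(name, value)| format!("{name}={value}"))
        .collect();
    // HashMap iteration order is unspecified, so sort for stable output.
    pairs.sort();
    pairs.join("; ")
}
```

The resulting string can be attached to a request as the `Cookie` header value.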
State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| Manual cookie entry | Browser extraction | ~2020 | Much better UX |
| OAuth1 | OAuth2 (Pixiv) | ~2020 | Better security, longer tokens |
| Plain text tokens | Refresh tokens | ~2020 | No re-authentication needed |
| Session cookies | Persistent tokens | - | User convenience |
Deprecated/outdated:
- sessionStorage cookies (not persisted) - Not supported
- OAuth1.0a (except Twitter which still uses it) - OAuth2 preferred
- Netscape format comments with $ prefix - Rare, can skip
Open Questions
- Browser support priority
  - What: Which browsers to support first?
  - What's unclear: Firefox and Chrome cover 90%+ of users, but Safari/WebKit has unique format
  - Recommendation: Start with Firefox + Chrome, add Safari as optional
- Token storage
  - What: Where to store OAuth tokens securely?
  - What's unclear: Simple file storage vs OS keyring integration
  - Recommendation: Start with file storage with clear warnings, add keyring later
- CLI integration vs config file
  - What: Should auth be primarily CLI args or config file?
  - What's unclear: OAuth tokens are long-lived, better in config; cookies can be CLI
  - Recommendation: CLI for cookies, config for OAuth tokens
- Dry-run implementation detail
  - What: Is --simulate already implemented the same as --dry-run?
  - What's unclear: Need to verify simulate actually skips downloads
  - Recommendation: Verify current behavior, add alias --dry-run if needed
Sources
Primary (HIGH confidence)
- /mnt/Data/Projects/gallery-dl/gallery_dl/cookies.py - Browser cookie extraction (1167 lines, comprehensive)
- /mnt/Data/Projects/gallery-dl/gallery_dl/util.py - cookiestxt_load() function (lines 402-438)
- /mnt/Data/Projects/gallery-dl/src/cli.rs - Existing CLI implementation
Secondary (MEDIUM confidence)
- https://docs.rs/cookie/0.18/cookie/ - Cookie parsing crate
- Chromium cookie encryption: https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/
Tertiary (LOW confidence)
- Web search for Rust browser cookie extraction crates (no mature crates found)
Metadata
Confidence breakdown:
- Standard stack: HIGH - Uses existing reqwest/rusqlite, simple cookie parsing
- Architecture: HIGH - Based on working Python implementation
- Pitfalls: HIGH - Python implementation covers edge cases
Research date: 2026-02-16
Valid until: 2026-03-16 (30 days - stable domain)