gallery-dl/.planning/phases/06-auth-cli/06-RESEARCH.md
eliott 4c7d3eb2e2 docs(06-auth-cli): create phase plans for authentication and CLI features
- Plan 06-01: Cookie file support (--cookies CLI + Netscape parser)
- Plan 06-02: Browser cookie extraction (Firefox, Chrome)
- Plan 06-03: Main integration (wire cookies, OAuth, simulate, input-file, destination)
2026-02-16 09:47:01 +01:00


Phase 6: Authentication & CLI Features - Research

Researched: 2026-02-16
Domain: Authentication (cookies, OAuth, browser extraction) and CLI usability features
Confidence: HIGH

Summary

Phase 6 implements user-facing authentication and CLI usability features. Most CLI arguments already exist in the codebase (--input-file, --simulate, -v, --destination), but the underlying implementation for cookie parsing, browser extraction, and OAuth flow needs completion. The existing extractor implementations (Twitter, Instagram, Pixiv) have authentication structures but aren't connected to CLI arguments.

Primary recommendation: Implement cookie file parsing first, then browser extraction, and finally OAuth flow integration. Use the Python gallery-dl implementation as the reference, since it is battle-tested.

User Constraints

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • None explicitly specified for Phase 6

Claude's Discretion

  • Authentication implementation approach
  • CLI argument naming conventions
  • Browser support priority

Deferred Ideas (OUT OF SCOPE)

  • Proxy support
  • Multi-account handling
  • Advanced rate limiting per-domain

</user_constraints>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| reqwest | 0.13 | HTTP client with cookie support | Already in use |
| rusqlite | 0.38 | SQLite database access for browser cookies | Already in use |

Supporting (New)

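| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| cookie | 0.18 | HTTP cookie parsing | Parse Set-Cookie headers |
| aes | - | AES decryption | Chromium cookie decryption (prefer the crate over a hand-written cipher) |
| ring | 0.17 | Cryptographic operations | Deriving the cookie key from the Linux keyring password (retrieval itself goes through DBus) |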

Alternative Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Custom Netscape parser | netscape-cookie crate | Manual parsing is simple (7 tab-separated fields), no crate needed |
| Browser extraction | External tool (cookies.txt) | Fewer crate dependencies, but users must export cookies with an external tool |
| Full OAuth library | Individual implementations | OAuth flows vary significantly between sites |

Installation:

# New dependencies to add to Cargo.toml
cookie = "0.18"
ring = "0.17"

Architecture Patterns

Recommended Project Structure

src/
├── cli.rs              # Add --cookies, --cookies-from-browser arguments
├── auth/
│   ├── mod.rs          # Authentication module
│   ├── cookies.rs      # Cookie file parsing (Netscape format)
│   ├── browser.rs      # Browser cookie extraction
│   └── oauth.rs        # OAuth flow implementations
├── extractor/
│   ├── extractors/
│   │   ├── twitter.rs  # Already has cookie support, wire to CLI
│   │   ├── instagram.rs # Already has cookie support, wire to CLI
│   │   └── pixiv.rs   # Already has OAuth structure, wire to CLI

Pattern 1: Cookie File Parsing

What: Load cookies from Netscape format file
When to use: User provides --cookies argument with path to cookie file
Example:

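// Source: Python gallery-dl util.py cookiestxt_load()
// Netscape format: domain\tflag\tpath\tsecure\texpire\tname\tvalue
use std::collections::HashMap;

// `Error` is the crate's own error type.
pub fn parse_netscape_cookies(content: &str) -> Result<HashMap<String, String>, Error> {
    let mut cookies = HashMap::new();
    for line in content.lines() {
        // Strip leading spaces only; trim() would also eat the tab before an empty value
        let line = line.trim_start_matches(' ');
        // "#HttpOnly_" marks a real cookie line, not a comment
        let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
        // Skip comments and empty lines
        if line.starts_with('#') || line.is_empty() {
            continue;
        }
        let parts: Vec<&str> = line.split('\t').collect();
        if parts.len() >= 7 {
            // Zero-based fields 5 and 6 hold the name and value
            let name = parts[5].to_string();
            let value = parts[6].to_string();
            cookies.insert(name, value);
        }
    }
    Ok(cookies)
}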
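
A name-to-value map drops the domain, path, and expiry that a later step would need to decide which cookies apply to which request. The following is a minimal sketch, not the Python implementation, of parsing one line into a struct that keeps all seven fields. The `FileCookie` type and `parse_cookie_line` function are hypothetical names, and treating an expiry of `0` as a session cookie is an assumption of this sketch.

```rust
/// One row of a Netscape cookies.txt file (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct FileCookie {
    pub domain: String,
    pub include_subdomains: bool,
    pub path: String,
    pub secure: bool,
    /// Unix timestamp; None for session cookies (assumed: 0 or unparsable).
    pub expires: Option<u64>,
    pub http_only: bool,
    pub name: String,
    pub value: String,
}

/// Parse a single line; returns None for comments, blank lines, and malformed rows.
pub fn parse_cookie_line(line: &str) -> Option<FileCookie> {
    let line = line.trim_start_matches(' ');
    // "#HttpOnly_" prefixes a real cookie line, so detect it before the comment check.
    let (line, http_only) = match line.strip_prefix("#HttpOnly_") {
        Some(rest) => (rest, true),
        None => (line, false),
    };
    if line.is_empty() || line.starts_with('#') || line.starts_with('$') {
        return None;
    }
    let f: Vec<&str> = line.split('\t').collect();
    if f.len() != 7 {
        return None;
    }
    Some(FileCookie {
        domain: f[0].to_string(),
        include_subdomains: f[1].eq_ignore_ascii_case("TRUE"),
        path: f[2].to_string(),
        secure: f[3].eq_ignore_ascii_case("TRUE"),
        expires: f[4].parse::<u64>().ok().filter(|&t| t != 0),
        http_only,
        name: f[5].to_string(),
        value: f[6].to_string(),
    })
}
```

Keeping the full record costs little and leaves the choice of a flat map versus domain-aware matching to the caller.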

Pattern 2: Browser Cookie Extraction

What: Extract cookies directly from browser SQLite databases
When to use: User provides --cookies-from-browser firefox or --cookies-from-browser chrome
Implementation approach (from Python gallery-dl):

  1. Firefox: Read cookies.sqlite from profile directory

    • Path: ~/.mozilla/firefox/*.default/cookies.sqlite (newer profiles are typically named *.default-release)
    • Query: SELECT name, value, host, path, isSecure, expiry FROM moz_cookies
  2. Chrome/Chromium: Read Cookies SQLite database

    • Path: ~/.config/google-chrome/Default/Cookies
    • May need decryption for encrypted values (v10/v11)
  3. Safari: Read Cookies.binarycookies binary format

    • Complex binary parsing, consider optional feature
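
Before any decryption is attempted, the extractor has to tell which scheme a Chromium value uses. The sketch below only classifies the raw bytes by the `v10`/`v11` prefix mentioned above and strips it. The enum and function names are hypothetical, and the decryption itself is deliberately left out.

```rust
/// Classification of a raw Chromium cookie value (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum ChromiumValue<'a> {
    /// Prefixed with "v10"; payload follows the prefix.
    V10(&'a [u8]),
    /// Prefixed with "v11"; payload follows the prefix.
    V11(&'a [u8]),
    /// No known prefix; the caller decides whether to skip or report it.
    Unknown(&'a [u8]),
}

/// Split the version prefix from the encrypted payload.
pub fn classify_encrypted_value(raw: &[u8]) -> ChromiumValue<'_> {
    match raw {
        [b'v', b'1', b'0', rest @ ..] => ChromiumValue::V10(rest),
        [b'v', b'1', b'1', rest @ ..] => ChromiumValue::V11(rest),
        other => ChromiumValue::Unknown(other),
    }
}
```

Returning `Unknown` instead of an error lets the caller count undecryptable cookies and print one summary message rather than failing on the first row.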

Pattern 3: OAuth Flow for Pixiv

What: Implement OAuth2 authorization code flow for Pixiv
When to use: User configures Pixiv API credentials
Flow:

  1. User registers app at https://www.pixiv.net/developers
  2. Get client_id and client_secret
  3. Direct user to authorization URL
  4. Receive authorization code
  5. Exchange code for access_token and refresh_token
  6. Store tokens securely (config file)
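
Step 3 of the flow above comes down to building a URL with the standard OAuth2 authorization-code parameters. The following is a rough sketch using only the standard library. The endpoint passed in is a placeholder, since the source does not give the real Pixiv endpoint or any extra parameters it may require, and `percent_encode` and `authorization_url` are hypothetical helper names.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build the URL the user is sent to in order to approve access.
/// `endpoint` is supplied by the caller; no real provider URL is assumed here.
pub fn authorization_url(
    endpoint: &str,
    client_id: &str,
    redirect_uri: &str,
    state: &str,
) -> String {
    let params = [
        ("response_type", "code"),
        ("client_id", client_id),
        ("redirect_uri", redirect_uri),
        ("state", state),
    ];
    let query: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}={}", k, percent_encode(v)))
        .collect();
    format!("{}?{}", endpoint, query.join("&"))
}
```

The `state` value should be random per attempt so that the code received in step 4 can be tied back to the request that started it.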

Anti-Patterns to Avoid

  • Don't store tokens in plain text: Use OS keyring or at minimum warn users
  • Don't hardcode OAuth credentials: Always require user to provide their own
  • Don't skip SSL verification for "simplicity": Security risk
  • Don't implement custom crypto: Use ring or aes-gcm crates

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| HTTP cookie parsing | Custom parser | cookie crate | Handles Set-Cookie, edge cases |
| SQLite for browser cookies | Custom SQLite wrapper | rusqlite | Already in use, handles cross-platform |
| AES decryption | Custom AES | ring, with glue code modeled on Python gallery-dl | The Python implementation is well-tested |
| Keyring access | Custom keyring integration | DBus calls for KDE/GNOME | Platform-specific, well-documented |

Key insight: The Python gallery-dl cookie extraction is the gold standard for browser cookie extraction. It's been battle-tested and handles all the edge cases (encryption, different browser versions, keyrings). For Rust, we can implement simplified versions focusing on the most common use cases.

Common Pitfalls

Pitfall 1: Chrome Cookie Encryption

What goes wrong: Chrome stores cookies encrypted since v80, using OS-level protection
Why it happens: Linux uses keyring (KDE/GNOME), macOS uses Keychain, Windows uses DPAPI
How to avoid:

  • Linux: Detect desktop environment, use appropriate keyring
  • For simple cases: Try the fixed password "peanuts", which Chromium on Linux uses for v10-prefixed values when no keyring is available
  • Provide clear error message when decryption fails
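
Desktop detection on Linux can be sketched as a pure function over the value of `XDG_CURRENT_DESKTOP`, which keeps it easy to test. The mapping from desktop names to keyrings below is an assumption of this sketch, not a copy of the Python logic, and the `LinuxKeyring` and `choose_keyring` names are hypothetical.

```rust
/// Which secret store to query for the Chromium cookie password (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum LinuxKeyring {
    KWallet,
    GnomeKeyring,
    /// No keyring detected; fall back to the fixed password.
    BasicText,
}

/// Pick a keyring from the value of XDG_CURRENT_DESKTOP.
/// The variable may hold several colon-separated names, e.g. "ubuntu:GNOME".
pub fn choose_keyring(xdg_current_desktop: Option<&str>) -> LinuxKeyring {
    let desktop = xdg_current_desktop.unwrap_or("").to_ascii_uppercase();
    for part in desktop.split(':') {
        match part {
            "KDE" => return LinuxKeyring::KWallet,
            // Assumed mapping: these desktops commonly ship GNOME Keyring.
            "GNOME" | "UNITY" | "X-CINNAMON" | "XFCE" | "PANTHEON" => {
                return LinuxKeyring::GnomeKeyring
            }
            _ => {}
        }
    }
    LinuxKeyring::BasicText
}
```

A caller would pass `std::env::var("XDG_CURRENT_DESKTOP").ok().as_deref()`, and could let a user-supplied override bypass detection entirely.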

Pitfall 2: Cookie Format Confusion

What goes wrong: Users provide curl-style cookie headers instead of Netscape format
Why it happens: Both are called "cookies", but formats differ
How to avoid: Detect format automatically or provide clear error message
Warning signs: Parser returns an empty cookie map; check format detection
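
One possible shape for the automatic detection is to look at the first non-comment line and check whether it has seven tab-separated fields or looks like a `name=value; ...` header string. This is a heuristic sketch with hypothetical names (`CookieInput`, `detect_cookie_format`), not a complete validator.

```rust
/// Guess at what kind of cookie input the user supplied (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum CookieInput {
    Netscape,
    HeaderString,
    Unknown,
}

/// Inspect the first data line to decide which parser to use.
pub fn detect_cookie_format(content: &str) -> CookieInput {
    let mut data_lines = content
        .lines()
        // "#HttpOnly_" lines are cookies, so unwrap them before dropping comments.
        .map(|l| l.strip_prefix("#HttpOnly_").unwrap_or(l))
        .filter(|l| !l.trim().is_empty() && !l.starts_with('#'));
    match data_lines.next() {
        Some(l) if l.split('\t').count() == 7 => CookieInput::Netscape,
        Some(l) if !l.contains('\t') && l.trim_start_matches("Cookie:").contains('=') => {
            CookieInput::HeaderString
        }
        _ => CookieInput::Unknown,
    }
}
```

With this in place, the `Unknown` and `HeaderString` cases can each produce a specific error message instead of a silently empty cookie map.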

Pitfall 3: Browser Database Locked

What goes wrong: Can't open browser cookie database because browser is running
Why it happens: SQLite database locked by browser process
How to avoid:

  • Copy database to temp location before reading (like Python version does)
  • Or warn user to close browser
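
The copy-to-temp approach could look roughly like the sketch below, which uses only the standard library. The `copy_to_temp` name matches the helper the Firefox example calls but its body here is an assumption, and so is the `gdl-` file name prefix. Note that a browser running SQLite in WAL mode may keep recent writes in a sidecar `-wal` file, which this sketch does not copy.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Copy a database file into the system temp directory and return the new path.
/// The caller is responsible for deleting the copy afterwards.
pub fn copy_to_temp(db_path: &Path) -> io::Result<PathBuf> {
    // Process id plus a timestamp keeps concurrent runs from colliding.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let file_name = db_path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("cookies.sqlite");
    let target = std::env::temp_dir().join(format!(
        "gdl-{}-{}-{}",
        std::process::id(),
        nanos,
        file_name
    ));
    fs::copy(db_path, &target)?;
    Ok(target)
}
```

Because the copy holds session secrets, removing it as soon as the query finishes is worth the extra line at the call site.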

Pitfall 4: OAuth Token Expiration

What goes wrong: OAuth access token expires, requests fail silently
Why it happens: Tokens have limited lifetime (typically 1 hour for Pixiv)
How to avoid:

  • Implement refresh token flow
  • Store refresh token and automatically refresh
  • Cache tokens in config
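
To avoid silent failures, the cached token can carry its own expiry and be checked before every request. The sketch below shows only that bookkeeping. The `TokenSet` type and its methods are hypothetical, and the network call that performs the refresh is out of scope here.

```rust
use std::time::{Duration, SystemTime};

/// Tokens returned by the provider plus the computed expiry (hypothetical type).
#[derive(Debug, Clone)]
pub struct TokenSet {
    pub access_token: String,
    pub refresh_token: String,
    pub expires_at: SystemTime,
}

impl TokenSet {
    /// Build from a token response that reports its lifetime in seconds.
    pub fn from_response(
        access_token: &str,
        refresh_token: &str,
        expires_in_secs: u64,
        now: SystemTime,
    ) -> Self {
        TokenSet {
            access_token: access_token.to_string(),
            refresh_token: refresh_token.to_string(),
            expires_at: now + Duration::from_secs(expires_in_secs),
        }
    }

    /// True when the token is expired or will expire within `margin`.
    pub fn needs_refresh(&self, now: SystemTime, margin: Duration) -> bool {
        now + margin >= self.expires_at
    }
}
```

Passing `now` in explicitly, rather than calling `SystemTime::now()` inside, keeps the expiry logic deterministic under test.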

Code Examples

Common Operation 1: Adding --cookies CLI argument

// Add to cli.rs Args struct
/// Path to Netscape-format cookies file
#[arg(long = "cookies", value_name = "FILE")]
pub cookies: Option<PathBuf>,

/// Extract cookies from browser (firefox, chrome, etc.)
#[arg(long = "cookies-from-browser", value_name = "BROWSER[+PROFILE]")]
pub cookies_from_browser: Option<String>,
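
The `BROWSER[+PROFILE]` value still has to be split into its parts before extraction can start. A minimal sketch of that parsing follows, assuming the syntax is exactly what the `value_name` above shows. The `Browser` enum and `parse_browser_spec` function are hypothetical names.

```rust
/// Browsers this sketch recognizes (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Browser {
    Firefox,
    Chrome,
    Chromium,
}

/// Split "BROWSER[+PROFILE]" into a browser and an optional profile name.
pub fn parse_browser_spec(spec: &str) -> Result<(Browser, Option<String>), String> {
    let (name, profile) = match spec.split_once('+') {
        Some((n, p)) if !p.is_empty() => (n, Some(p.to_string())),
        Some((n, _)) => (n, None),
        None => (spec, None),
    };
    let browser = match name.to_ascii_lowercase().as_str() {
        "firefox" => Browser::Firefox,
        "chrome" => Browser::Chrome,
        "chromium" => Browser::Chromium,
        other => return Err(format!("unsupported browser: {other}")),
    };
    Ok((browser, profile))
}
```

Rejecting unknown names at parse time gives the user an error before any filesystem access is attempted.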

Common Operation 2: Parse cookies from file

// Simple Netscape format parser
// `Result` is assumed to be the crate's single-parameter alias.
use std::collections::HashMap;
use std::path::Path;

pub fn load_cookies_from_file(path: &Path) -> Result<HashMap<String, String>> {
    let content = std::fs::read_to_string(path)?;
    let mut cookies = HashMap::new();

    for line in content.lines() {
        let line = line.trim_start_matches(' ');
        // "#HttpOnly_" prefixes a real cookie line, so strip it before the comment check
        let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
        if line.is_empty() || line.starts_with('#') {
            continue;
        }

        let parts: Vec<&str> = line.split('\t').collect();
        if parts.len() >= 7 {
            // domain, flag, path, secure, expiration, name, value
            cookies.insert(parts[5].to_string(), parts[6].to_string());
        }
    }

    Ok(cookies)
}

Common Operation 3: Extract cookies from Firefox

pub fn extract_firefox_cookies(domain: Option<&str>) -> Result<HashMap<String, String>> {
    // Find Firefox profile directory
    let profile_dir = find_firefox_profile()?;
    let db_path = profile_dir.join("cookies.sqlite");

    // Copy to temp to avoid locking
    let temp_path = copy_to_temp(&db_path)?;

    let conn = rusqlite::Connection::open(&temp_path)?;
    let mut cookies = HashMap::new();

    // Bind the domain as a parameter; formatting it into the SQL invites injection.
    // With no domain the pattern is "%%", which matches every host.
    let pattern = format!("%{}%", domain.unwrap_or(""));
    let mut stmt = conn.prepare("SELECT name, value FROM moz_cookies WHERE host LIKE ?1")?;
    let rows = stmt.query_map(rusqlite::params![pattern], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
    })?;

    for row in rows {
        let (name, value) = row?;
        cookies.insert(name, value);
    }

    Ok(cookies)
}
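
A substring match on `host` also returns cookies for unrelated hosts that merely contain the search string. If tighter filtering is wanted, a suffix check in the spirit of cookie domain matching can be applied to each row after the query. The sketch below is a simplification with a hypothetical function name, and it ignores the host-only versus domain cookie distinction.

```rust
/// True when `request_host` equals the cookie domain or is a subdomain of it.
/// A leading dot on the cookie domain is ignored.
pub fn domain_matches(cookie_domain: &str, request_host: &str) -> bool {
    let cookie_domain = cookie_domain.trim_start_matches('.').to_ascii_lowercase();
    let host = request_host.to_ascii_lowercase();
    if cookie_domain.is_empty() {
        return false;
    }
    // Require a dot boundary so "notexample.com" does not match "example.com".
    host == cookie_domain || host.ends_with(&format!(".{}", cookie_domain))
}
```

Applying this per row would mean selecting the `host` column as well, which the query above currently omits.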

Common Operation 4: Connect cookies to extractor

// In main.rs when processing URLs
let cookies = if let Some(cookies_file) = &args.cookies {
    Some(auth::load_cookies_from_file(cookies_file)?)
} else if let Some(browser_spec) = &args.cookies_from_browser {
    Some(auth::extract_browser_cookies(browser_spec)?)
} else {
    None
};

// Pass to extractor
if let Some(ref c) = cookies {
    extractor = extractor.with_cookies(c.clone());
}
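
How `with_cookies` consumes the map is not shown in the source. One plausible approach, sketched here as an assumption, is to serialize the map into a single `Cookie` request header value. The `cookie_header_value` name is hypothetical, and sorting is added only to make the output deterministic.

```rust
use std::collections::HashMap;

/// Join cookies into the value of a `Cookie:` request header.
/// Pairs are sorted by name so the result does not depend on hash order.
pub fn cookie_header_value(cookies: &HashMap<String, String>) -> String {
    let mut pairs: Vec<(&String, &String)> = cookies.iter().collect();
    pairs.sort();
    pairs
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect::<Vec<_>>()
        .join("; ")
}
```

An empty map yields an empty string, so the caller can skip setting the header in that case.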

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual cookie entry | Browser extraction | ~2020 | Much better UX |
| OAuth1 | OAuth2 (Pixiv) | ~2020 | Better security, longer tokens |
| Plain text tokens | Refresh tokens | ~2020 | No re-authentication needed |
| Session cookies | Persistent tokens | - | User convenience |

Deprecated/outdated:

  • Session-only cookies (not persisted to disk) - Not supported
  • OAuth1.0a (except Twitter which still uses it) - OAuth2 preferred
  • Netscape format comments with $ prefix - Rare, can skip

Open Questions

  1. Browser support priority

    • What: Which browsers to support first?
    • What's unclear: Firefox and Chrome cover 90%+ of users, but Safari/WebKit has unique format
    • Recommendation: Start with Firefox + Chrome, add Safari as optional
  2. Token storage

    • What: Where to store OAuth tokens securely?
    • What's unclear: Simple file storage vs OS keyring integration
    • Recommendation: Start with file storage with clear warnings, add keyring later
  3. CLI integration vs config file

    • What: Should auth be primarily CLI args or config file?
    • What's unclear: OAuth tokens are long-lived, better in config; cookies can be CLI
    • Recommendation: CLI for cookies, config for OAuth tokens
  4. Dry-run implementation detail

    • What: Is --simulate already implemented the same as --dry-run?
    • What's unclear: Need to verify simulate actually skips downloads
    • Recommendation: Verify current behavior, add alias --dry-run if needed

Sources

Primary (HIGH confidence)

  • /mnt/Data/Projects/gallery-dl/gallery_dl/cookies.py - Browser cookie extraction (1167 lines, comprehensive)
  • /mnt/Data/Projects/gallery-dl/gallery_dl/util.py - cookiestxt_load() function (lines 402-438)
  • /mnt/Data/Projects/gallery-dl/src/cli.rs - Existing CLI implementation

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • Web search for Rust browser cookie extraction crates (no mature crates found)

Metadata

Confidence breakdown:

  • Standard stack: HIGH - Uses existing reqwest/rusqlite, simple cookie parsing
  • Architecture: HIGH - Based on working Python implementation
  • Pitfalls: HIGH - Python implementation covers edge cases

Research date: 2026-02-16
Valid until: 2026-03-16 (30 days - stable domain)