eliott 4c7d3eb2e2 docs(06-auth-cli): create phase plans for authentication and CLI features

- Plan 06-01: Cookie file support (--cookies CLI + Netscape parser)
- Plan 06-02: Browser cookie extraction (Firefox, Chrome)
- Plan 06-03: Main integration (wire cookies, OAuth, simulate, input-file, destination)

2026-02-16 09:47:01 +01:00

Phase 6: Authentication & CLI Features - Research

Researched: 2026-02-16
Domain: Authentication (cookies, OAuth, browser extraction) and CLI usability features
Confidence: HIGH

Summary

Phase 6 implements user-facing authentication and CLI usability features. Most CLI arguments already exist in the codebase (--input-file, --simulate, -v, --destination), but the underlying implementation for cookie parsing, browser extraction, and OAuth flow needs completion. The existing extractor implementations (Twitter, Instagram, Pixiv) have authentication structures but aren't connected to CLI arguments.

Primary recommendation: Implement cookie file parsing first, then browser extraction, and finally OAuth flow integration. Use the Python gallery-dl implementation as the reference implementation since it's battle-tested.

User Constraints

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

None explicitly specified for Phase 6

Claude's Discretion

Authentication implementation approach
CLI argument naming conventions
Browser support priority

Deferred Ideas (OUT OF SCOPE)

Proxy support
Multi-account handling
Advanced rate limiting per-domain

</user_constraints>

Standard Stack

Core

Library	Version	Purpose	Why Standard
reqwest	0.13	HTTP client with cookie support	Already in use
rusqlite	0.38	SQLite database access for browser cookies	Already in use

Supporting (New)

Library	Version	Purpose	When to Use
cookie	0.18	HTTP cookie parsing	Parse Set-Cookie headers
aes	-	AES decryption	Chromium cookie decryption (use a vetted crate rather than a hand-written cipher)
ring	0.17	Cryptographic operations	Key derivation for Chromium cookie decryption (the password itself comes from the OS keyring)

Alternative Considered

Instead of	Could Use	Tradeoff
Custom Netscape parser	`netscape-cookie` crate	Manual parsing is simple (7 tab-separated fields), no crate needed
Browser extraction	External tool (cookies.txt)	Fewer crate dependencies, but users must install and run a separate export tool
Full OAuth library	Individual implementations	OAuth flows vary significantly between sites

Installation:

# New dependencies to add to Cargo.toml
cookie = "0.18"
ring = "0.17"

Architecture Patterns

Recommended Project Structure

src/
├── cli.rs              # Add --cookies, --cookies-from-browser arguments
├── auth/
│   ├── mod.rs          # Authentication module
│   ├── cookies.rs      # Cookie file parsing (Netscape format)
│   ├── browser.rs      # Browser cookie extraction
│   └── oauth.rs        # OAuth flow implementations
├── extractor/
│   ├── extractors/
│   │   ├── twitter.rs  # Already has cookie support, wire to CLI
│   │   ├── instagram.rs # Already has cookie support, wire to CLI
│   │   └── pixiv.rs   # Already has OAuth structure, wire to CLI

Pattern 1: Cookie File Parsing

What: Load cookies from a Netscape-format file
When to use: User provides the --cookies argument with a path to a cookie file
Example:

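// Source: Python gallery-dl util.py cookiestxt_load()
// Netscape format (7 tab-separated fields):
// domain\tinclude_subdomains\tpath\tsecure\texpiry\tname\tvalue
use std::collections::HashMap;

pub fn parse_netscape_cookies(content: &str) -> Result<HashMap<String, String>, Error> {
    let mut cookies = HashMap::new();
    for line in content.lines() {
        // "#HttpOnly_" marks an HttpOnly cookie line, not a comment
        let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
        // Skip comments and empty lines
        if line.starts_with('#') || line.trim().is_empty() {
            continue;
        }
        // Do not trim the line itself: an empty value leaves a trailing tab
        let parts: Vec<&str> = line.split('\t').collect();
        if parts.len() >= 7 {
            // Indices 5 and 6 hold the cookie name and value
            let name = parts[5].to_string();
            let value = parts[6].to_string();
            cookies.insert(name, value);
        }
    }
    Ok(cookies)
}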
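
The parser above keeps only the name and value, so two cookies with the same name on different domains overwrite each other and the expiry and secure flags are lost. If that turns out to matter, a minimal sketch of a richer record could look like the following. `CookieRecord` and `parse_cookie_line` are hypothetical names for illustration, not part of the existing codebase.

```rust
/// One line of a Netscape cookie file, with all seven fields preserved.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CookieRecord {
    pub domain: String,
    pub include_subdomains: bool,
    pub path: String,
    pub secure: bool,
    /// Unix timestamp; 0 is treated here as "session cookie".
    pub expires: u64,
    pub name: String,
    pub value: String,
    pub http_only: bool,
}

/// Parses a single line; returns None for comments, blank lines,
/// and lines with fewer than seven tab-separated fields.
pub fn parse_cookie_line(line: &str) -> Option<CookieRecord> {
    // The "#HttpOnly_" prefix flags an HttpOnly cookie, it is not a comment.
    let (line, http_only) = match line.strip_prefix("#HttpOnly_") {
        Some(rest) => (rest, true),
        None => (line, false),
    };
    if line.starts_with('#') || line.trim().is_empty() {
        return None;
    }
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 7 {
        return None;
    }
    Some(CookieRecord {
        domain: fields[0].to_string(),
        include_subdomains: fields[1].eq_ignore_ascii_case("TRUE"),
        path: fields[2].to_string(),
        secure: fields[3].eq_ignore_ascii_case("TRUE"),
        // An unparsable expiry falls back to 0 (an assumption of this sketch).
        expires: fields[4].parse().unwrap_or(0),
        name: fields[5].to_string(),
        value: fields[6].to_string(),
        http_only,
    })
}
```

Keeping the domain on each record would let the caller pick only the cookies that apply to the URL being fetched.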

Pattern 2: Browser Cookie Extraction

What: Extract cookies directly from browser SQLite databases
When to use: User provides --cookies-from-browser firefox or --cookies-from-browser chrome
Implementation approach (from Python gallery-dl):

Firefox: Read cookies.sqlite from profile directory
- Path: ~/.mozilla/firefox/*.default/cookies.sqlite (Linux; newer profiles are often named *.default-release)
- Query: SELECT name, value, host, path, isSecure, expiry FROM moz_cookies
Chrome/Chromium: Read Cookies SQLite database
- Path: ~/.config/google-chrome/Default/Cookies
- May need decryption for encrypted values (v10/v11)
Safari: Read Cookies.binarycookies binary format
- Complex binary parsing, consider optional feature
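
Locating the Firefox database can be sketched without relying on the exact profile directory name. The function below is a hypothetical helper that scans a Firefox root directory (for example ~/.mozilla/firefox on Linux) and returns every profile that contains a cookies.sqlite. Choosing between several profiles, for instance by reading profiles.ini, is left out of this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the cookies.sqlite path of every profile directory under
/// `firefox_root`, sorted for a stable order.
pub fn find_firefox_cookie_dbs(firefox_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(firefox_root)? {
        let dir = entry?.path();
        let db = dir.join("cookies.sqlite");
        // Only directories that actually hold a cookie database qualify.
        if dir.is_dir() && db.is_file() {
            found.push(db);
        }
    }
    found.sort();
    Ok(found)
}
```

Returning all candidates rather than the first match leaves room for a +PROFILE selector to pick one by name.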

Pattern 3: OAuth Flow for Pixiv

What: Implement OAuth2 authorization code flow for Pixiv
When to use: User configures Pixiv API credentials
Flow:

User registers app at https://www.pixiv.net/developers
Get client_id and client_secret
Direct user to authorization URL
Receive authorization code
Exchange code for access_token and refresh_token
Store tokens securely (config file)
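
The "direct user to authorization URL" step can be sketched as plain string building. The parameter names (`response_type`, `client_id`, `redirect_uri`, `state`) are the standard OAuth 2.0 authorization-code parameters. The endpoint in the usage example is a placeholder, since Pixiv's real endpoint and any extra parameters it requires are not covered by this research. `authorization_url` and `percent_encode` are hypothetical helpers.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds the URL the user opens in a browser to grant access.
/// `state` should be a random value that is checked again on the redirect.
pub fn authorization_url(
    endpoint: &str,
    client_id: &str,
    redirect_uri: &str,
    state: &str,
) -> String {
    format!(
        "{}?response_type=code&client_id={}&redirect_uri={}&state={}",
        endpoint,
        percent_encode(client_id),
        percent_encode(redirect_uri),
        percent_encode(state)
    )
}
```

For example, `authorization_url("https://auth.example.invalid/authorize", "my id", "http://localhost:8080/cb", "xyz")` yields a URL whose query string is `response_type=code&client_id=my%20id&redirect_uri=http%3A%2F%2Flocalhost%3A8080%2Fcb&state=xyz`.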

Anti-Patterns to Avoid

Don't store tokens in plain text: Use OS keyring or at minimum warn users
Don't hardcode OAuth credentials: Always require user to provide their own
Don't skip SSL verification for "simplicity": Security risk
Don't implement custom crypto: Use ring or aes-gcm crates

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
HTTP cookie parsing	Custom parser	cookie crate	Handles Set-Cookie, edge cases
SQLite for browser cookies	Custom SQLite wrapper	rusqlite	Already in use, handles cross-platform
AES decryption	Custom AES	ring or aes-gcm	Vetted implementations; follow the decryption flow of Python gallery-dl, which is well-tested
Keyring access	Custom keyring integration	DBus calls for KDE/GNOME	Platform-specific, well-documented

Key insight: The Python gallery-dl cookie extraction is the most mature reference available for this work. It is battle-tested and handles the main edge cases (encryption, different browser versions, keyrings). For Rust, we can implement simplified versions focusing on the most common use cases.

Common Pitfalls

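Pitfall 1: Chrome Cookie Encryption

What goes wrong: Chrome encrypts cookie values using OS-level protection (the current Windows scheme dates from v80), so reading the SQLite database alone is not enough
Why it happens: Linux uses a keyring (KDE/GNOME), macOS uses Keychain, Windows uses DPAPI
How to avoid:

Linux: Detect desktop environment, use appropriate keyring
For simple cases: Try the fixed password "peanuts", which Chromium on Linux uses for v10-prefixed values when no keyring is available
Provide clear error message when decryption fails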
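
Before attempting any decryption, it helps to look at the version prefix on the stored value (the v10/v11 prefixes mentioned earlier). The sketch below only classifies the raw bytes; it does not decrypt anything. `ChromeValue` and `classify_chrome_value` are hypothetical names, and which key belongs to which prefix depends on the platform.

```rust
/// Result of inspecting a raw value from Chrome's cookie table.
#[derive(Debug, PartialEq, Eq)]
pub enum ChromeValue<'a> {
    /// No known prefix: treat the bytes as an unencrypted value.
    Plain(&'a [u8]),
    /// "v10" prefix, followed by the ciphertext.
    V10(&'a [u8]),
    /// "v11" prefix, followed by the ciphertext.
    V11(&'a [u8]),
}

/// Splits the version prefix from the payload so the caller can pick
/// a decryption path or report a precise error.
pub fn classify_chrome_value(raw: &[u8]) -> ChromeValue<'_> {
    if let Some(rest) = raw.strip_prefix(b"v10") {
        ChromeValue::V10(rest)
    } else if let Some(rest) = raw.strip_prefix(b"v11") {
        ChromeValue::V11(rest)
    } else {
        ChromeValue::Plain(raw)
    }
}
```

Reporting the detected prefix in the error message ("found v11 value but no keyring password available") gives users something actionable when decryption fails.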

Pitfall 2: Wrong Cookie File Format

What goes wrong: Users provide curl-style cookie headers instead of Netscape format
Why it happens: Both are called "cookies", but the formats differ
How to avoid: Detect the format automatically or provide a clear error message
Warning signs: The parser returns an empty cookie map for a non-empty file
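
Format detection can be sketched by looking at the first meaningful line: seven or more tab-separated fields suggest Netscape format, while `name=value` pairs separated by semicolons suggest a header string. `CookieFormat` and `detect_cookie_format` are hypothetical names, and the heuristic is an assumption of this sketch rather than something taken from gallery-dl.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CookieFormat {
    Netscape,
    HeaderString,
    Unknown,
}

/// Guesses the format from the first line that is neither blank nor a comment.
pub fn detect_cookie_format(content: &str) -> CookieFormat {
    for raw in content.lines() {
        // "#HttpOnly_" lines are cookies, not comments.
        let line = raw.strip_prefix("#HttpOnly_").unwrap_or(raw);
        if line.trim().is_empty() || line.starts_with('#') {
            continue;
        }
        if line.split('\t').count() >= 7 {
            return CookieFormat::Netscape;
        }
        // Accept an optional "Cookie:" header name in front of the pairs.
        let header = line.trim();
        let header = header
            .strip_prefix("Cookie:")
            .or_else(|| header.strip_prefix("cookie:"))
            .unwrap_or(header);
        let looks_like_pairs = header.contains('=')
            && header.split(';').all(|pair| {
                let pair = pair.trim();
                pair.is_empty()
                    || matches!(pair.split_once('='), Some((name, _)) if !name.trim().is_empty())
            });
        return if looks_like_pairs {
            CookieFormat::HeaderString
        } else {
            CookieFormat::Unknown
        };
    }
    CookieFormat::Unknown
}
```

With this in place, an `Unknown` or `HeaderString` result can produce a targeted error instead of silently returning an empty cookie map.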

Pitfall 3: Browser Database Locked

What goes wrong: Can't open browser cookie database because browser is running
Why it happens: SQLite database locked by browser process
How to avoid:

Copy database to temp location before reading (like Python version does)
Or warn user to close browser
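
The copy-to-temp approach can be sketched with the standard library alone. The `copy_to_temp` name matches the helper the Firefox example calls, but this body is an assumption, not existing code. Copying the `-wal` and `-shm` sidecar files is also an assumption: it only matters when the browser keeps the database in write-ahead-log mode.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Copies a database file into a fresh directory under the system temp dir
/// and returns the path of the copy. The caller is responsible for cleanup.
pub fn copy_to_temp(db_path: &Path) -> io::Result<PathBuf> {
    // Process id plus a timestamp keeps concurrent runs from colliding.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let dir = std::env::temp_dir().join(format!("cookie-copy-{}-{}", std::process::id(), nanos));
    fs::create_dir_all(&dir)?;

    let file_name = db_path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("cookies.sqlite");
    let target = dir.join(file_name);
    fs::copy(db_path, &target)?;

    // Copy write-ahead-log sidecar files next to the copy when they exist.
    for suffix in ["-wal", "-shm"] {
        let mut side = db_path.as_os_str().to_os_string();
        side.push(suffix);
        let side = PathBuf::from(side);
        if side.exists() {
            let mut dst = target.as_os_str().to_os_string();
            dst.push(suffix);
            fs::copy(&side, PathBuf::from(dst))?;
        }
    }
    Ok(target)
}
```

Because the copy contains session cookies, the temporary directory should be removed as soon as the cookies have been read.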

Pitfall 4: OAuth Token Expiration

What goes wrong: OAuth access token expires, requests fail silently
Why it happens: Tokens have limited lifetime (typically 1 hour for Pixiv)
How to avoid:

Implement refresh token flow
Store refresh token and automatically refresh
Cache tokens in config
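
Deciding when to refresh can be sketched as a small expiry check with a safety margin, so a token is renewed shortly before it lapses rather than after a request has already failed. `OAuthToken` and its methods are hypothetical; the field names follow the usual OAuth2 token response (`access_token`, `refresh_token`, `expires_in`).

```rust
use std::time::{Duration, SystemTime};

/// Cached token pair plus the absolute time at which the access token expires.
#[derive(Debug, Clone)]
pub struct OAuthToken {
    pub access_token: String,
    pub refresh_token: String,
    pub expires_at: SystemTime,
}

impl OAuthToken {
    /// Converts the relative `expires_in` from a token response into an
    /// absolute timestamp. `now` is passed in to keep the logic testable.
    pub fn from_response(
        access_token: String,
        refresh_token: String,
        expires_in_secs: u64,
        now: SystemTime,
    ) -> Self {
        Self {
            access_token,
            refresh_token,
            expires_at: now + Duration::from_secs(expires_in_secs),
        }
    }

    /// True when the token has expired or will expire within `margin`.
    pub fn needs_refresh(&self, now: SystemTime, margin: Duration) -> bool {
        now + margin >= self.expires_at
    }
}
```

Calling `needs_refresh` before each batch of requests, with a margin of a minute or so, avoids the silent failures described above.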

Code Examples

Common Operation 1: Adding --cookies CLI argument

// Add to cli.rs Args struct
/// Path to Netscape-format cookies file
#[arg(long = "cookies", value_name = "FILE")]
pub cookies: Option<PathBuf>,

/// Extract cookies from browser (firefox, chrome, etc.)
#[arg(long = "cookies-from-browser", value_name = "BROWSER[+PROFILE]")]
pub cookies_from_browser: Option<String>,
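
The `BROWSER[+PROFILE]` value still has to be split into its parts after clap hands it over as a string. A minimal sketch, assuming only Firefox and Chrome are recognized at first; `Browser`, `BrowserSpec`, and `parse_browser_spec` are hypothetical names.

```rust
/// Browsers this sketch recognizes.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Browser {
    Firefox,
    Chrome,
}

/// Parsed form of a `BROWSER[+PROFILE]` argument.
#[derive(Debug, PartialEq, Eq)]
pub struct BrowserSpec {
    pub browser: Browser,
    pub profile: Option<String>,
}

/// Splits at the first '+'; the browser name is matched case-insensitively,
/// and an empty profile part is treated as "no profile given".
pub fn parse_browser_spec(spec: &str) -> Result<BrowserSpec, String> {
    let (name, profile) = match spec.split_once('+') {
        Some((name, profile)) => (name, Some(profile)),
        None => (spec, None),
    };
    let browser = match name.trim().to_ascii_lowercase().as_str() {
        "firefox" => Browser::Firefox,
        "chrome" => Browser::Chrome,
        other => return Err(format!("unsupported browser: {:?}", other)),
    };
    let profile = profile
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from);
    Ok(BrowserSpec { browser, profile })
}
```

Returning an error for unknown names lets the CLI list the supported browsers instead of failing later during extraction.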

Common Operation 2: Parse cookies from file

// Simple Netscape format parser
use std::collections::HashMap;
use std::path::Path;

pub fn load_cookies_from_file(path: &Path) -> Result<HashMap<String, String>> {
    let content = std::fs::read_to_string(path)?;
    let mut cookies = HashMap::new();

    for line in content.lines() {
        // "#HttpOnly_" marks an HttpOnly cookie line, so strip it instead of skipping
        let line = line.strip_prefix("#HttpOnly_").unwrap_or(line);
        if line.trim().is_empty() || line.starts_with('#') {
            continue;
        }

        let parts: Vec<&str> = line.split('\t').collect();
        if parts.len() >= 7 {
            // Indices 0..=6: domain, flag, path, secure, expiration, name, value
            cookies.insert(parts[5].to_string(), parts[6].to_string());
        }
    }

    Ok(cookies)
}

Common Operation 3: Extract cookies from Firefox

// find_firefox_profile and copy_to_temp are helpers assumed to live in the auth module
pub fn extract_firefox_cookies(domain: Option<&str>) -> Result<HashMap<String, String>> {
    // Find Firefox profile directory
    let profile_dir = find_firefox_profile()?;
    let db_path = profile_dir.join("cookies.sqlite");
    
    // Copy to temp to avoid locking
    let temp_path = copy_to_temp(&db_path)?;
    
    let conn = rusqlite::Connection::open(&temp_path)?;
    let mut cookies = HashMap::new();
    
    // Bind the domain as a parameter instead of formatting it into the SQL,
    // so a quote in the argument cannot alter the query
    let mut stmt = conn.prepare(
        "SELECT name, value FROM moz_cookies \
         WHERE ?1 IS NULL OR host LIKE '%' || ?1 || '%'",
    )?;
    let rows = stmt.query_map(rusqlite::params![domain], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
    })?;
    
    for row in rows {
        let (name, value) = row?;
        cookies.insert(name, value);
    }
    
    Ok(cookies)
}
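
A substring match on the host column is broad: a filter for twitter.com also matches nottwitter.com. If stricter filtering is wanted, the rows can be checked in Rust after the query. The helper below is a hypothetical sketch of a suffix match on label boundaries.

```rust
/// True when `host` is `domain` itself or a subdomain of it.
/// Leading dots (as stored for domain cookies) and case are ignored.
pub fn host_matches_domain(host: &str, domain: &str) -> bool {
    let host = host.trim_start_matches('.').to_ascii_lowercase();
    let domain = domain.trim_start_matches('.').to_ascii_lowercase();
    if domain.is_empty() {
        return false;
    }
    // Require a dot before the suffix so "nottwitter.com" does not match.
    host == domain || host.ends_with(&format!(".{}", domain))
}
```

Selecting the host column alongside name and value would make this check possible without a second query.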

Common Operation 4: Connect cookies to extractor

// In main.rs when processing URLs
let cookies = if let Some(cookies_file) = &args.cookies {
    Some(auth::load_cookies_from_file(cookies_file)?)
} else if let Some(browser_spec) = &args.cookies_from_browser {
    Some(auth::extract_browser_cookies(browser_spec)?)
} else {
    None
};

// Pass to extractor
if let Some(ref c) = cookies {
    extractor = extractor.with_cookies(c.clone());
}
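
How `with_cookies` applies the map is up to each extractor. One simple option is to serialize it into a single Cookie request header value. The function below is a hypothetical sketch of that serialization; pairs are sorted by name so the output is deterministic, and values are emitted as-is without quoting or validation.

```rust
use std::collections::HashMap;

/// Joins cookies into the `name=value; name2=value2` form used by the
/// Cookie request header.
pub fn cookie_header_value(cookies: &HashMap<String, String>) -> String {
    let mut pairs: Vec<(&String, &String)> = cookies.iter().collect();
    // HashMap iteration order is arbitrary, so sort for stable output.
    pairs.sort();
    pairs
        .iter()
        .map(|(name, value)| format!("{}={}", name, value))
        .collect::<Vec<_>>()
        .join("; ")
}
```

An empty map produces an empty string, in which case the header is best omitted entirely.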

State of the Art

Old Approach	Current Approach	When Changed	Impact
Manual cookie entry	Browser extraction	~2020	Much better UX
OAuth1	OAuth2 (Pixiv)	~2020	Better security, longer tokens
Plain text tokens	Refresh tokens	~2020	No re-authentication needed
Session cookies	Persistent tokens	-	User convenience

Deprecated/outdated:

sessionStorage cookies (not persisted) - Not supported
OAuth1.0a (except Twitter which still uses it) - OAuth2 preferred
Netscape format comments with $ prefix - Rare, can skip

Open Questions

Browser support priority
- What: Which browsers to support first?
- What's unclear: Firefox and Chrome likely cover the large majority of users, but Safari/WebKit has a unique format
- Recommendation: Start with Firefox + Chrome, add Safari as optional
Token storage
- What: Where to store OAuth tokens securely?
- What's unclear: Simple file storage vs OS keyring integration
- Recommendation: Start with file storage with clear warnings, add keyring later
CLI integration vs config file
- What: Should auth be primarily CLI args or config file?
- What's unclear: OAuth tokens are long-lived, better in config; cookies can be CLI
- Recommendation: CLI for cookies, config for OAuth tokens
Dry-run implementation detail
- What: Is --simulate already implemented the same as --dry-run?
- What's unclear: Need to verify simulate actually skips downloads
- Recommendation: Verify current behavior, add alias --dry-run if needed

Sources

Primary (HIGH confidence)

/mnt/Data/Projects/gallery-dl/gallery_dl/cookies.py - Browser cookie extraction (1167 lines, comprehensive)
/mnt/Data/Projects/gallery-dl/gallery_dl/util.py - cookiestxt_load() function (lines 402-438)
/mnt/Data/Projects/gallery-dl/src/cli.rs - Existing CLI implementation

Secondary (MEDIUM confidence)

https://docs.rs/cookie/0.18/cookie/ - Cookie parsing crate
Chromium cookie encryption: https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/os_crypt/

Tertiary (LOW confidence)

Web search for Rust browser cookie extraction crates (no mature crates found)

Metadata

Confidence breakdown:

Standard stack: HIGH - Uses existing reqwest/rusqlite, simple cookie parsing
Architecture: HIGH - Based on working Python implementation
Pitfalls: HIGH - Python implementation covers edge cases

Research date: 2026-02-16
Valid until: 2026-03-16 (30 days - stable domain)