browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-04-22 17:45:09 +02:00

Author	SHA1	Message	Date
Laith Weinberger	4476f6e16e	fix input clear fallbacks and clarify clear-then-type behavior	2026-04-15 17:31:04 -04:00
Shawn Pana	d0fbf4c580	improve connect failure UX: fix chrome://inspect link and add fallback guidance When `browser-use connect` fails to discover a running Chrome, the error now points to the correct `chrome://inspect/#remote-debugging` URL. The SKILL.md also guides agents to prompt users with two options: enable remote debugging or use managed Chromium with a Chrome profile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 11:23:50 -07:00
Laith Weinberger	c1151715d3	improve model docs	2026-04-05 21:59:33 -04:00
ShawnPana	4cd2f456ef	fix: SKILL.md references doctor instead of cloud login for status check	2026-04-01 19:27:36 -07:00
ShawnPana	870776fa9e	docs: restructure SKILL.md — headless-first, push advanced to references Core workflow now starts with bare headless (no setup step). connect and cloud connect mentioned prominently but not required. Python/CDP and multi-session pushed to reference files. Added session recovery tip. 219 lines (down from 272).	2026-04-01 18:20:02 -07:00
ShawnPana	ca05f46352	refactor: remove --agent/register/tab-ownership, sessions-as-agents model Multi-agent isolation is now achieved through separate sessions (--session NAME), each with its own browser. Removed: - register command and agents.json - --agent flag and agent_id plumbing - TabOwnershipManager and all tab locking logic - dispatch lock and focus swapping between agents - tab_ownership.py (deleted) - test_tab_ownership.py (deleted) Simplified tab commands: no lock checks, no _tab_list injection, no _resolved_target_id params. agent_focus_target_id stays for single-agent tab tracking. Tested: 3 concurrent subagents on separate cloud sessions, 3 concurrent subagents on separate headless Chromium sessions.	2026-04-01 17:34:46 -07:00
ShawnPana	73a926caa6	feat: add connect command + --agent flag, decouple multi-agent from Chrome - browser-use connect: one-time command to discover and connect to local Chrome (like cloud connect but for local) - --agent INDEX: per-command flag for multi-agent tab isolation, works with any browser mode (cloud, profile, cdp-url, headless) - register is now per-session ({session}.agents.json) - --connect deprecated with migration message - SKILL.md updated for new connect/--agent workflow - Tested: 3 concurrent agents on shared cloud browser session	2026-04-01 16:52:56 -07:00
ShawnPana	c4d5a0eace	docs: add multi-session and config sections to browser-use skill SKILL.md: added Multiple Browser Sessions section (with pointer to references/multi-session.md) and Configuration section (config set/get/list/unset, doctor, setup). New references/multi-session.md: detailed guide for running multiple browser sessions simultaneously with --session flag.	2026-04-01 16:04:44 -07:00
ShawnPana	c2c9aa8556	feat: cloud signup command, unified base URL, auth isolation - browser-use cloud signup: challenge-response agent self-registration - browser-use cloud signup --verify: verify and save API key - browser-use cloud signup --claim: generate account claim URL - Base URL convention unified: BROWSER_USE_CLOUD_BASE_URL is host-only (e.g. https://api.browser-use.com), CLI appends /api/{version} - CLI daemon blocked from falling back to library's cloud_auth.json - cloud connect validates API key before spawning daemon - No-key error message mentions cloud signup	2026-03-31 21:14:20 -07:00
ShawnPana	b4b2a4c18d	feat: zero-config cloud connect with auto-managed profile cloud connect now works with no flags. On first use, creates a "Browser Use CLI" profile via the Cloud API and saves the ID to config.json. Subsequent connects reuse it (validates on each call, recreates if deleted). Removed --timeout, --proxy-country, --profile-id flags and their plumbing through daemon/sessions. Power users who need custom browser settings use cloud v2 POST /browsers directly.	2026-03-31 12:28:42 -07:00
ShawnPana	04ca40c3af	docs: add CDP/Python reference for browser-use skill New references/cdp-python.md with tested recipes for raw CDP access via the Python session: activating tabs, listing targets, running JS, device emulation, cookies. SKILL.md updated with pointer to the reference file explaining when to reach for CDP vs CLI commands.	2026-03-31 10:35:03 -07:00
ShawnPana	f819d3e3a6	feat: add `tab` command (list, new, switch, close) and agent registration - `browser-use register` assigns numeric agent index for --connect mode - `--connect <index>` requires explicit agent index (no more bare --connect) - `tab list` shows all tabs with lock status per agent - `tab new [url]` creates a new tab without visually switching - `tab switch <index>` changes agent focus without activating Chrome tab - `tab close <index> [index...]` closes multiple tabs in one command - Agent registry in ~/.browser-use/agents.json with 5min expiry - Improved error messages guide agents to register or use their own tab - Session lock prevents double BrowserSession creation on simultaneous connect - Updated SKILL.md with register workflow and tab commands	2026-03-24 17:27:38 -07:00
ShawnPana	31694df283	feat: multi-agent tab isolation for --connect mode Multiple agents can share one browser via --connect without interfering with each other. Each agent registers with `browser-use register` to get a numeric index, then passes it with `--connect <index>` on every command. - Tab locking: mutating commands (click, type, open) lock the tab to the agent. Other agents get an error if they try to mutate the same tab. Read-only commands (state, screenshot) work on any tab. - Agent registry: agents.json tracks registered agents with timestamps. Expired agents (5min inactive) get cleaned up automatically. - Session lock: prevents double BrowserSession creation when two agents connect simultaneously. - Focus swap: daemon swaps agent_focus_target_id and cached_selector_map per-agent before each command, so element indices are isolated.	2026-03-24 16:38:53 -07:00
ShawnPana	318eb3b41e	fix Selenium wss:// handling and webhook JSON parse guard	2026-03-21 19:04:16 -07:00
ShawnPana	709853c307	fix accuracy issues found by cross-referencing SDK source Open-source: - agent.md: max_steps default 100 → 500 (matches agent/service.py) - agent.md: NavigateToUrlEvent timeout 15.0 → 30.0 (matches events.py) Cloud: - quickstart.md: v3 pricing corrected — BU Mini output $2.88 → $4.20, BU Max output $14.40 → $18.00 (matches pricing.mdx) - browser-api.md: Selenium debugger_address now strips /devtools/browser/ suffix (matches browser-api.mdx) - features.md: webhook payload uses "type"/"payload" not "event"/"data", signature verification now includes 5-min replay protection and sorted JSON keys (matches webhooks.mdx)	2026-03-21 18:59:26 -07:00
ShawnPana	884f5d20d6	fix chat-ui.md: remove duplicate await asyncio.sleep(2) after asyncio.run()	2026-03-21 18:45:08 -07:00
ShawnPana	00d9a9694f	fix chat-ui.md: wrap Python example in async def main() + asyncio.run()	2026-03-21 18:39:55 -07:00
ShawnPana	71e8415764	fix cubic review: TOC anchor, relative link, full v3 base URL - subagent.md: fix TOC anchor to match renamed heading (#python-agents-cloud-sdk) - subagent.md: fix relative link to ../features.md from guides/ subdir - SKILL.md: use full v3 base URL instead of .../api/v3	2026-03-21 18:35:13 -07:00
ShawnPana	74275eff3c	split browser-use-docs into separate open-source and cloud skills Two independent skills so Claude only loads what's relevant: skills/open-source/ — Python library docs (Agent, Browser, Tools, Actor, MCP, monitoring, models, examples). 9 reference files. skills/cloud/ — Cloud API, SDK, and integration patterns. 7 reference files + 3 guides (chat-ui, subagent, tools-integration). Guides edited to remove open-source Python imports (Actor API, Tools Registry, MCPClient sections removed from tools-integration; local Agent wrapper replaced with Cloud SDK in subagent; local-only error/cost patterns removed from cross-cutting concerns). Delete skills/browser-use-docs/ entirely.	2026-03-21 18:29:23 -07:00
ShawnPana	4416cf18c2	add integration guides: chat UI, subagent, tools integration Three guides organized by agent type (CLI sandbox agents, Python frameworks, TypeScript, MCP clients, HTTP/workflow engines, existing Playwright/Puppeteer): - chat-ui.md: Build conversational browser UI with Cloud SDK v3, liveUrl iframe embedding, message polling, follow-ups, optimistic updates - subagent.md: Delegate entire web tasks (black box). CLI cloud passthrough for sandbox agents, Python Agent wrapper, Cloud SDK, MCP browser_task, REST API for workflow engines. Plus structured output, error handling, cost control patterns - tools-integration.md: Add individual browser actions to existing agent. CLI commands for sandbox agents, Actor API (Page/Element/Mouse), Tools Registry (execute_action), MCPClient (auto-discovery), CDP+Playwright for TypeScript, local MCP server for Claude/Cursor. Decision matrix mapping agent type → best integration	2026-03-21 18:16:45 -07:00
ShawnPana	5ee7104fcc	fix 3 remaining cubic review issues - agent.md: fix llm_timeout docs to match auto-detection logic (Groq 30s, Gemini 75s, o3/Claude/DeepSeek 90s, default 75s) - examples.md: guard pw.stop() with None check to prevent UnboundLocalError - api-v2.md: show all S3 presigned form fields in upload example	2026-03-21 17:20:42 -07:00
ShawnPana	8dfb3cc882	fix review issues from cubic bot - models.md: fix ChatVercel provider_options and env var (AI_GATEWAY_API_KEY) - examples.md: add try/finally for browser cleanup in parallel and playwright examples - agent.md: fix defaults to match source (max_actions_per_step=5, max_failures=5, use_vision=True, llm_timeout=60, step_timeout=180) - features.md: fix undefined session_id in workspace example - monitoring.md: add missing await on get_usage_summary() - api-v2.md: fix upload flow to use multipart POST with fields (S3-style) - SKILL.md: use uv pip for cloud SDK install - integrations.md: use distinct MCP server name for docs endpoint	2026-03-21 17:15:18 -07:00
ShawnPana	607c6c814e	split cloud API reference into v2 and v3, add REST-first coverage Replace cloud/api.md with api-v2.md (321 lines, 39 endpoints with cURL examples) and api-v3.md (257 lines, 16 endpoints with full REST details). Key improvements: - cURL examples for all major operations (create task, poll, get session, create browser/CDP, upload file, stop session) - All 15 skills + marketplace endpoints (previously missing) - Session purge and task status polling endpoints - v3 REST endpoints (was SDK-only, now has full HTTP paths/params/responses) - v3 cost fields, polling defaults, stop strategies, file upload flow - CDP discovery endpoint and auto-lifecycle in browser-api.md - Updated SKILL.md routing table for v2/v3 split	2026-03-21 17:08:25 -07:00
ShawnPana	7c765aaa53	restructure browser-use-docs skill with 15 granular reference files Replace two monolithic reference files with open-source/ and cloud/ subdirs, each containing focused reference files that Claude loads on demand. Open-source (9 files): quickstart, models (15+ providers), agent (params, hooks, timeouts), browser (params, auth, real/remote), tools (custom, built-in, ActionResult), actor (Page/Element/Mouse API), integrations (MCP, skills, docs-mcp), monitoring (Laminar, OpenLIT, costs), examples (fast agent, parallel, playwright, sensitive data). Cloud (6 files): quickstart (setup, pricing, FAQ), api (v2 REST + v3 SDK), sessions (profiles, auth, 1Password, social media), browser-api (CDP direct, Playwright/Puppeteer/Selenium), features (proxies, webhooks, workspaces, skills, MCP, live view), patterns (parallel, streaming, geo-scraping, tutorials). Content sourced from full SDK docs (89 files) rather than just AGENTS.md and CLOUD.md.	2026-03-21 16:24:41 -07:00
ShawnPana	5642d069a3	add browser-use-docs skill with cloud API and open-source library references New skill providing documentation reference for writing Python code against the browser-use library and making Cloud REST API calls. Condensed from CLOUD.md (2700 lines of OpenAPI YAML → 530 lines of tables) and AGENTS.md (1021 lines → 762 lines with markup stripped and sections deduplicated).	2026-03-21 15:34:12 -07:00
ShawnPana	5a77129d1b	fix docs: use --profile "Default" instead of bare --profile before subcommands Bare --profile before a subcommand like 'open' causes argparse to consume the subcommand as the profile value. Always use explicit profile name.	2026-03-19 22:38:37 -07:00
ShawnPana	aa2ac2e7f1	condense SKILL.md files: merge redundant sections, cut 45% of lines Merge duplicate Essential Commands / Commands sections into one consolidated block. Collapse verbose flag permutations and workflow details into compact one-liners. Down from 653 total lines to 362.	2026-03-19 20:27:40 -07:00
ShawnPana	69902b713d	fix docs/code discrepancies: add click coordinates, annotate extract, add --session - Document coordinate clicking (click <x> <y>) in all interaction sections - Annotate extract command as not yet implemented in README and SKILL.md - Remove browser.extract() from Python wrapper docs (raises NotImplementedError) - Add --session flag to global options tables in both SKILL.md files	2026-03-19 20:21:09 -07:00
ShawnPana	f850fe34ba	add command chaining section to SKILL.md files	2026-03-19 18:54:14 -07:00
ShawnPana	694a111fad	add upload command to CLI, extract find_file_input_near_element to BrowserSession - Add `browser-use upload <index> <path>` command for uploading files to file input elements via the CLI - Extract find_file_input_near_element from nested closures in tools/service.py to a reusable method on BrowserSession, deduplicating two copies - Add BrowserWrapper.upload() for the Python REPL - Resolve file paths to absolute on the client side before sending to daemon - Update SKILL.md files and README with upload command docs	2026-03-19 17:02:34 -07:00
ShawnPana	43e007238c	update docs for profile-use integration and ~/.browser-use/ directory layout Document profile subcommand as passthrough to profile-use Go binary, add file layout section showing ~/.browser-use/ structure, update authenticated browsing workflow examples for new profile list output.	2026-03-18 16:30:56 -07:00
ShawnPana	d3e1818496	consolidate CLI files under ~/.browser-use/, add profile-use integration and doctor checks Unify all CLI-managed files under ~/.browser-use/ (config, sockets, PIDs, binaries, tunnels) instead of scattering across ~/.config/browser-use/ and ~/.browser-use/run/. Add profile-use Go binary as managed subcommand via browser-use profile, with auto-download fallback and install.sh integration. Wire cloudflared and profile-use availability checks into browser-use doctor.	2026-03-18 16:16:28 -07:00
ShawnPana	bff2918558	add --connect flag for Chrome auto-discovery and fix daemon shutdown for external browsers Adds `--connect` to auto-discover running Chrome instances via DevToolsActivePort files and well-known port probing, eliminating manual CDP URL construction. Fixes daemon process hanging on `close` when connected to external browsers (--connect, --cdp-url, cloud) by calling stop() (disconnect) instead of kill() (terminate).	2026-03-18 09:41:58 -07:00
ShawnPana	2778c41c3b	update skill docs for simplified CLI: remove agent/task/session management, add cloud connect and CDP workflows Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 10:12:57 -07:00
ShawnPana	0dc2fc5a6f	remove `run` command and agent infrastructure from CLI The `run` command pulled in heavy SDK dependencies (openai, anthropic, google), had a bug (await on sync get_llm), and is superseded by `browser-use cloud` for agent execution. CLI is now purely a browser automation interface. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 11:44:13 -07:00
ShawnPana	91e987acdc	add `browser-use cloud` command: generic REST passthrough to Cloud API Login/logout with API key persistence, versioned REST calls (v2/v3), task polling, and OpenAPI-driven help. Stdlib only, no daemon needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 11:13:43 -07:00
ShawnPana	4be221386b	changes	2026-03-11 12:57:50 -07:00
ShawnPana	859cb97063	simplify daemon architecture: single session, socket-as-liveness, no PID/lock files Replace the multi-session server (server.py, SessionRegistry, portalocker locking, PID files, orphan detection) with a minimal daemon (daemon.py) that holds one BrowserSession in memory. Socket file existence = alive. Auto-exits when browser dies via CDP watchdog. -2277 lines, +142 lines across 20 files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 19:05:44 -08:00
ShawnPana	39698e58e4	strip cloud/remote commands from skill_cli Remove all cloud API paths from the CLI while leaving core library cloud support (BrowserSession(use_cloud=True), browser/cloud/) untouched. Deleted: api_key.py, cloud_task.py, cloud_session.py Removed: --browser remote, cloud-only run flags, task/session subcommands, cloud profile ops (create/update/delete/sync), remote mode validation Kept: tunnel (just Cloudflare), all local commands, install_config.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 15:56:31 -08:00
ShawnPana	f9694b6af3	added README link	2026-02-18 20:44:13 -08:00
ShawnPana	624cef10a9	update SKILL.md files	2026-02-18 20:37:14 -08:00
ShawnPana	ada8b77d0b	docs(cli): add missing commands and flags to skill documentation - Add missing commands: init, setup, session create, session share, profile create - Add missing agent run flags: --wait, --start-url, --metadata, --secret, --allowed-domain, --skill-id, --structured-output, --judge, --judge-ground-truth - Add missing task status flags: --step, --reverse, --session filter - Add missing cookie flags: --same-site, --expires - Add scroll --amount, get bbox, profile pagination flags - Add Python execution section to remote-browser/SKILL.md - Add Cloud Profile Management section to remote-browser/SKILL.md - Add missing browser wrapper methods to Python docs - Fix incorrect profile list-local command references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 00:46:35 -08:00
ShawnPana	79e91e6adf	fix(cli): update install script URL to /cli/install.sh Update all references from browser-use.com/install.sh to browser-use.com/cli/install.sh across documentation and code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 00:24:48 -08:00
ShawnPana	6bb2301e8f	docs(skill): add missing sections from remote-browser to browser-use browser-use/SKILL.md should be the superset containing everything. Added the following sections that were only in remote-browser: - Setup section with install.sh one-liner and modes table - Exposing Local Dev Servers section with tunnel commands - browser-use doctor in troubleshooting - Enhanced Monitoring Subagents section with token-efficient table - Common Patterns section (dev server + screenshot loop) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 16:44:57 -08:00
ShawnPana	6be83f6467	fix: remove hardcoded test repo URLs for production deployment - install.sh: Change default BROWSER_USE_REPO from ShawnPana/browser-use to browser-use/browser-use - install.sh: Generalize branch name in dev testing comment - SKILL.md: Remove dev testing section with fork URLs, keep clean pip install instruction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-10 16:05:21 -08:00
ShawnPana	723f3335b2	feat(cli): add token-efficient task monitoring with duration display - Default task status shows only latest step (token efficient for agents) - Add --compact/-c flag to show all steps with reasoning - Add --verbose/-v flag to show all steps with URLs + actions - Add --last/-n, --reverse/-r, --step/-s flags for navigating long tasks - Display duration for tasks (e.g., "18s", "5m 13s") and sessions - Format proxy cost to 2 decimal places in session output - Update SKILL.md docs with recommended monitoring workflow	2026-02-10 14:39:20 -08:00
ShawnPana	a568f45979	feat(cli): add cloud task/session management and subagent support Cloud API Integration: - Add cloud_task.py with full Cloud API client (tasks, sessions) - Support for task creation, status polling, stopping, and logs - Session management: create, list, get, stop, delete Agent Task Options (remote mode): - --llm: specify model (gpt-4o, claude-sonnet-4-20250514, gemini-2.0-flash) - --proxy-country: proxy location (default: us) - --session-id: reuse existing cloud session - --keep-alive: preserve session after task - --no-wait: async execution, return task_id immediately - --stream: stream status updates - --flash: fast execution mode - --thinking: extended reasoning - --vision/--no-vision: control vision capability CLI Commands: - task list [--limit N] [--status running|finished] [--json] - task status <task-id> [--json] - task stop <task-id> - task logs <task-id> - session list [--limit N] [--status active|stopped] [--json] - session get <session-id> [--json] - session stop <session-id> | --all Documentation: - Add comprehensive subagent examples to SKILL.md files - Document session=agent, task=work mental model - Examples for parallel agents, session reuse, agent management	2026-02-10 12:33:05 -08:00
ShawnPana	c0dd7875b8	feat(install): mode-based installation with interactive TUI Add installation mode selection that determines which browser modes are available at runtime. This enables lightweight remote-only installs for sandboxed agents (skipping ~300MB Chromium download). ## Install Modes - `--remote-only`: Only remote mode, installs cloudflared, skips Chromium - `--local-only`: Only local modes (chromium/real), skips cloudflared - `--full`: All modes available (default interactive selection) ## Runtime Behavior - Single installed mode becomes the default (no --browser flag needed) - Requesting unavailable mode shows clear error with reinstall instructions - No config file = all modes available (backward compat for pip users) ## Changes New: browser_use/skill_cli/install_config.py - Manages ~/.browser-use/install-config.json - get_config(), save_config(), is_mode_available(), get_default_mode() - get_mode_unavailable_error() for helpful reinstall guidance Modified: browser_use/skill_cli/main.py - build_parser() uses config for --browser choices and default - Validates mode availability before starting session server Modified: browser_use/skill_cli/sessions.py - create_browser_session() validates mode before creation Modified: install.sh - Command-line flags: --full, --remote-only, --local-only, --api-key - Interactive TUI with fallback: gum → whiptail → pure bash menu - Conditional dependency installation based on selected modes - Writes install-config.json with installed modes Updated: skills/remote-browser/SKILL.md - Simplified examples (no --browser remote needed with --remote-only) - Updated setup instructions for mode-based installation New: tests/ci/test_install_config.py - 13 tests covering config module	2026-02-09 20:36:08 -08:00
ShawnPana	94c336407d	fix(cli): make tunnel command independent of browser sessions Previously, `browser-use tunnel 3000` would start a session server with the default browser mode (chromium), which poisoned subsequent `--browser remote` commands with a mode mismatch error. Now tunnel commands are handled directly in main.py without starting a browser session: npm run dev & browser-use tunnel 3000 # No session created browser-use --browser remote open <tunnel-url> # Works! Changes: - Move tunnel logic to standalone functions in tunnel_manager.py - Handle tunnel commands in main.py before ensure_server() - Tunnels persist across browser-use close (independent lifecycle) - Add `browser-use tunnel stop --all` command Breaking behavior change: - `browser-use close` no longer stops tunnels - Use `browser-use tunnel stop <port>` or `tunnel stop --all` explicitly Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 18:38:39 -08:00
ShawnPana	9f51461bbd	fix(cli): error when requesting remote browser on local session + improve docs When a session is started without --browser remote and a subsequent command uses --browser remote, the CLI now errors with guidance instead of silently using the local browser (which loses cloud features like live_url). Code changes: - Add session metadata file to track browser_mode per session - Error only when remote requested but session is local - Allow local-on-remote (user gets more features than requested) - Clean up metadata on session close SKILL.md improvements: - Clarify that --browser remote only needed on FIRST command - Add table showing session modes - Add "What happens if you forget" section with error example - Add troubleshooting entry for mode mismatch error Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-09 18:20:06 -08:00


69 Commits