69 Commits

Author SHA1 Message Date
Laith Weinberger
4476f6e16e fix input clear fallbacks and clarify clear-then-type behavior 2026-04-15 17:31:04 -04:00
Shawn Pana
d0fbf4c580 improve connect failure UX: fix chrome://inspect link and add fallback guidance
When `browser-use connect` fails to discover a running Chrome, the error
now points to the correct `chrome://inspect/#remote-debugging` URL. The
SKILL.md also guides agents to prompt users with two options: enable
remote debugging or use managed Chromium with a Chrome profile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:23:50 -07:00
Laith Weinberger
c1151715d3 improve model docs 2026-04-05 21:59:33 -04:00
ShawnPana
4cd2f456ef fix: SKILL.md references doctor instead of cloud login for status check 2026-04-01 19:27:36 -07:00
ShawnPana
870776fa9e docs: restructure SKILL.md — headless-first, push advanced to references
Core workflow now starts with bare headless (no setup step). `connect` and
`cloud connect` are mentioned prominently but not required. Python/CDP and
multi-session pushed to reference files. Added session recovery tip.
219 lines (down from 272).
2026-04-01 18:20:02 -07:00
ShawnPana
ca05f46352 refactor: remove --agent/register/tab-ownership, sessions-as-agents model
Multi-agent isolation is now achieved through separate sessions
(--session NAME), each with its own browser. Removed:
- register command and agents.json
- --agent flag and agent_id plumbing
- TabOwnershipManager and all tab locking logic
- dispatch lock and focus swapping between agents
- tab_ownership.py (deleted)
- test_tab_ownership.py (deleted)

Simplified tab commands: no lock checks, no _tab_list injection,
no _resolved_target_id params. agent_focus_target_id stays for
single-agent tab tracking.

Tested: 3 concurrent subagents on separate cloud sessions,
3 concurrent subagents on separate headless Chromium sessions.
2026-04-01 17:34:46 -07:00
ShawnPana
73a926caa6 feat: add connect command + --agent flag, decouple multi-agent from Chrome
- browser-use connect: one-time command to discover and connect to local
  Chrome (like cloud connect but for local)
- --agent INDEX: per-command flag for multi-agent tab isolation, works
  with any browser mode (cloud, profile, cdp-url, headless)
- register is now per-session ({session}.agents.json)
- --connect deprecated with migration message
- SKILL.md updated for new connect/--agent workflow
- Tested: 3 concurrent agents on shared cloud browser session
2026-04-01 16:52:56 -07:00
ShawnPana
c4d5a0eace docs: add multi-session and config sections to browser-use skill
SKILL.md: added Multiple Browser Sessions section (with pointer to
references/multi-session.md) and Configuration section (config set/get/
list/unset, doctor, setup).

New references/multi-session.md: detailed guide for running multiple
browser sessions simultaneously with --session flag.
2026-04-01 16:04:44 -07:00
ShawnPana
c2c9aa8556 feat: cloud signup command, unified base URL, auth isolation
- browser-use cloud signup: challenge-response agent self-registration
- browser-use cloud signup --verify: verify and save API key
- browser-use cloud signup --claim: generate account claim URL
- Base URL convention unified: BROWSER_USE_CLOUD_BASE_URL is host-only
  (e.g. https://api.browser-use.com), CLI appends /api/{version}
- CLI daemon blocked from falling back to library's cloud_auth.json
- cloud connect validates API key before spawning daemon
- No-key error message mentions cloud signup
2026-03-31 21:14:20 -07:00
ShawnPana
b4b2a4c18d feat: zero-config cloud connect with auto-managed profile
cloud connect now works with no flags. On first use, creates a
"Browser Use CLI" profile via the Cloud API and saves the ID to
config.json. Subsequent connects reuse it (validates on each call,
recreates if deleted).

Removed --timeout, --proxy-country, --profile-id flags and their
plumbing through daemon/sessions. Power users who need custom browser
settings can call the Cloud v2 POST /browsers endpoint directly.
2026-03-31 12:28:42 -07:00
ShawnPana
04ca40c3af docs: add CDP/Python reference for browser-use skill
New references/cdp-python.md with tested recipes for raw CDP access
via the Python session: activating tabs, listing targets, running JS,
device emulation, cookies. SKILL.md updated with pointer to the
reference file explaining when to reach for CDP vs CLI commands.
2026-03-31 10:35:03 -07:00
ShawnPana
f819d3e3a6 feat: add tab command (list, new, switch, close) and agent registration
- `browser-use register` assigns numeric agent index for --connect mode
- `--connect <index>` requires explicit agent index (no more bare --connect)
- `tab list` shows all tabs with lock status per agent
- `tab new [url]` creates a new tab without visually switching
- `tab switch <index>` changes agent focus without activating Chrome tab
- `tab close <index> [index...]` closes multiple tabs in one command
- Agent registry in ~/.browser-use/agents.json with 5min expiry
- Improved error messages guide agents to register or use their own tab
- Session lock prevents double BrowserSession creation on simultaneous connect
- Updated SKILL.md with register workflow and tab commands
2026-03-24 17:27:38 -07:00
ShawnPana
31694df283 feat: multi-agent tab isolation for --connect mode
Multiple agents can share one browser via --connect without interfering
with each other. Each agent registers with `browser-use register` to get
a numeric index, then passes it with `--connect <index>` on every command.

- Tab locking: mutating commands (click, type, open) lock the tab to the
  agent. Other agents get an error if they try to mutate the same tab.
  Read-only commands (state, screenshot) work on any tab.
- Agent registry: agents.json tracks registered agents with timestamps.
  Expired agents (5min inactive) get cleaned up automatically.
- Session lock: prevents double BrowserSession creation when two agents
  connect simultaneously.
- Focus swap: daemon swaps agent_focus_target_id and cached_selector_map
  per-agent before each command, so element indices are isolated.
2026-03-24 16:38:53 -07:00
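The 5-minute expiry of the agent registry can be sketched as a prune step over a mapping of agent index to last-seen timestamp. The in-memory dict layout and the name `prune_expired_agents` are assumptions; the real agents.json schema is not shown here, and this registry was later removed in ca05f46352.

```python
import time
from typing import Dict, Optional

AGENT_TTL_SECONDS = 5 * 60  # "5min inactive" expiry from the commit message

def prune_expired_agents(registry: Dict[int, float], now: Optional[float] = None,
                         ttl: float = AGENT_TTL_SECONDS) -> Dict[int, float]:
    """Hypothetical cleanup: return a copy without agents idle longer than ttl seconds."""
    current = time.time() if now is None else now
    # Keep an agent only if its last activity falls within the TTL window.
    return {idx: seen for idx, seen in registry.items() if current - seen <= ttl}
```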
ShawnPana
318eb3b41e fix Selenium wss:// handling and webhook JSON parse guard 2026-03-21 19:04:16 -07:00
ShawnPana
709853c307 fix accuracy issues found by cross-referencing SDK source
Open-source:
- agent.md: max_steps default 100 → 500 (matches agent/service.py)
- agent.md: NavigateToUrlEvent timeout 15.0 → 30.0 (matches events.py)

Cloud:
- quickstart.md: v3 pricing corrected — BU Mini output $2.88 → $4.20,
  BU Max output $14.40 → $18.00 (matches pricing.mdx)
- browser-api.md: Selenium debugger_address now strips /devtools/browser/
  suffix (matches browser-api.mdx)
- features.md: webhook payload uses "type"/"payload" not "event"/"data",
  signature verification now includes 5-min replay protection and sorted
  JSON keys (matches webhooks.mdx)
2026-03-21 18:59:26 -07:00
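The webhook fix mentions replay protection and sorted JSON keys. A generic HMAC-SHA256 sketch of that pattern follows. The signed-string layout (`{timestamp}.{json}`), the compact separators, and the function names are assumptions for illustration; features.md and the Cloud webhook docs define the actual scheme and header names.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 5 * 60  # 5-minute replay window, per the commit message

def _signed_message(timestamp: str, payload: dict) -> bytes:
    # Sorting keys makes the serialization deterministic regardless of dict order.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return f"{timestamp}.{body}".encode("utf-8")

def sign_webhook(secret: str, timestamp: str, payload: dict) -> str:
    return hmac.new(secret.encode("utf-8"), _signed_message(timestamp, payload),
                    hashlib.sha256).hexdigest()

def verify_webhook(secret: str, timestamp: str, payload: dict, signature: str,
                   now: Optional[float] = None) -> bool:
    current = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except ValueError:
        return False
    if abs(current - ts) > REPLAY_WINDOW_SECONDS:
        return False  # outside the window: treat as a possible replay
    expected = sign_webhook(secret, timestamp, payload)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

The payload here follows the "type"/"payload" shape named in the commit; the event type string is a placeholder.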
ShawnPana
884f5d20d6 fix chat-ui.md: remove duplicate await asyncio.sleep(2) after asyncio.run() 2026-03-21 18:45:08 -07:00
ShawnPana
00d9a9694f fix chat-ui.md: wrap Python example in async def main() + asyncio.run() 2026-03-21 18:39:55 -07:00
ShawnPana
71e8415764 fix cubic review: TOC anchor, relative link, full v3 base URL
- subagent.md: fix TOC anchor to match renamed heading (#python-agents-cloud-sdk)
- subagent.md: fix relative link to ../features.md from guides/ subdir
- SKILL.md: use full v3 base URL instead of .../api/v3
2026-03-21 18:35:13 -07:00
ShawnPana
74275eff3c split browser-use-docs into separate open-source and cloud skills
Two independent skills so Claude only loads what's relevant:

skills/open-source/ — Python library docs (Agent, Browser, Tools, Actor,
  MCP, monitoring, models, examples). 9 reference files.

skills/cloud/ — Cloud API, SDK, and integration patterns. 7 reference
  files + 3 guides (chat-ui, subagent, tools-integration). Guides edited
  to remove open-source Python imports (Actor API, Tools Registry,
  MCPClient sections removed from tools-integration; local Agent wrapper
  replaced with Cloud SDK in subagent; local-only error/cost patterns
  removed from cross-cutting concerns).

Delete skills/browser-use-docs/ entirely.
2026-03-21 18:29:23 -07:00
ShawnPana
4416cf18c2 add integration guides: chat UI, subagent, tools integration
Three guides organized by agent type (CLI sandbox agents, Python frameworks,
TypeScript, MCP clients, HTTP/workflow engines, existing Playwright/Puppeteer):

- chat-ui.md: Build conversational browser UI with Cloud SDK v3, liveUrl
  iframe embedding, message polling, follow-ups, optimistic updates
- subagent.md: Delegate entire web tasks (black box). CLI cloud passthrough
  for sandbox agents, Python Agent wrapper, Cloud SDK, MCP browser_task,
  REST API for workflow engines. Plus structured output, error handling,
  cost control patterns
- tools-integration.md: Add individual browser actions to existing agent.
  CLI commands for sandbox agents, Actor API (Page/Element/Mouse), Tools
  Registry (execute_action), MCPClient (auto-discovery), CDP+Playwright
  for TypeScript, local MCP server for Claude/Cursor. Decision matrix
  mapping agent type → best integration
2026-03-21 18:16:45 -07:00
ShawnPana
5ee7104fcc fix 3 remaining cubic review issues
- agent.md: fix llm_timeout docs to match auto-detection logic (Groq 30s,
  Gemini 75s, o3/Claude/DeepSeek 90s, default 75s)
- examples.md: guard pw.stop() with None check to prevent UnboundLocalError
- api-v2.md: show all S3 presigned form fields in upload example
2026-03-21 17:20:42 -07:00
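The auto-detected `llm_timeout` values listed above can be expressed as a lookup on provider and model name. The function name `default_llm_timeout`, the `(provider, model)` inputs, and the prefix/substring matching rules are assumptions; the library's own detection logic may key on provider classes instead.

```python
def default_llm_timeout(provider: str, model: str) -> int:
    """Hypothetical mapping to LLM timeouts (seconds) from the commit message."""
    if provider.lower() == "groq":
        return 30
    name = model.lower()
    if "gemini" in name:
        return 75
    # o3 matched as a prefix so unrelated names containing "o3" are not caught.
    if name.startswith("o3") or "claude" in name or "deepseek" in name:
        return 90
    return 75  # default
```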
ShawnPana
8dfb3cc882 fix review issues from cubic bot
- models.md: fix ChatVercel provider_options and env var (AI_GATEWAY_API_KEY)
- examples.md: add try/finally for browser cleanup in parallel and playwright examples
- agent.md: fix defaults to match source (max_actions_per_step=5, max_failures=5,
  use_vision=True, llm_timeout=60, step_timeout=180)
- features.md: fix undefined session_id in workspace example
- monitoring.md: add missing await on get_usage_summary()
- api-v2.md: fix upload flow to use multipart POST with fields (S3-style)
- SKILL.md: use uv pip for cloud SDK install
- integrations.md: use distinct MCP server name for docs endpoint
2026-03-21 17:15:18 -07:00
ShawnPana
607c6c814e split cloud API reference into v2 and v3, add REST-first coverage
Replace cloud/api.md with api-v2.md (321 lines, 39 endpoints with cURL
examples) and api-v3.md (257 lines, 16 endpoints with full REST details).

Key improvements:
- cURL examples for all major operations (create task, poll, get session,
  create browser/CDP, upload file, stop session)
- All 15 skills + marketplace endpoints (previously missing)
- Session purge and task status polling endpoints
- v3 REST endpoints (was SDK-only, now has full HTTP paths/params/responses)
- v3 cost fields, polling defaults, stop strategies, file upload flow
- CDP discovery endpoint and auto-lifecycle in browser-api.md
- Updated SKILL.md routing table for v2/v3 split
2026-03-21 17:08:25 -07:00
ShawnPana
7c765aaa53 restructure browser-use-docs skill with 15 granular reference files
Replace two monolithic reference files with open-source/ and cloud/ subdirs,
each containing focused reference files that Claude loads on demand.

Open-source (9 files): quickstart, models (15+ providers), agent (params,
hooks, timeouts), browser (params, auth, real/remote), tools (custom,
built-in, ActionResult), actor (Page/Element/Mouse API), integrations
(MCP, skills, docs-mcp), monitoring (Laminar, OpenLIT, costs), examples
(fast agent, parallel, playwright, sensitive data).

Cloud (6 files): quickstart (setup, pricing, FAQ), api (v2 REST + v3 SDK),
sessions (profiles, auth, 1Password, social media), browser-api (CDP direct,
Playwright/Puppeteer/Selenium), features (proxies, webhooks, workspaces,
skills, MCP, live view), patterns (parallel, streaming, geo-scraping,
tutorials).

Content sourced from full SDK docs (89 files) rather than just AGENTS.md
and CLOUD.md.
2026-03-21 16:24:41 -07:00
ShawnPana
5642d069a3 add browser-use-docs skill with cloud API and open-source library references
New skill providing documentation reference for writing Python code against
the browser-use library and making Cloud REST API calls. Condensed from
CLOUD.md (2700 lines of OpenAPI YAML → 530 lines of tables) and AGENTS.md
(1021 lines → 762 lines with markup stripped and sections deduplicated).
2026-03-21 15:34:12 -07:00
ShawnPana
5a77129d1b fix docs: use --profile "Default" instead of bare --profile before subcommands
Bare --profile before a subcommand like 'open' causes argparse to consume
the subcommand as the profile value. Always use an explicit profile name.
2026-03-19 22:38:37 -07:00
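The argparse pitfall above can be reproduced with a minimal parser in which `--profile` takes an optional value (`nargs="?"`). This parser is an assumed stand-in for the CLI's real one; it only demonstrates why a bare `--profile` swallows the following subcommand.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Assumed stand-in: --profile with an optional value ("Default" when given bare).
    parser = argparse.ArgumentParser(prog="browser-use")
    parser.add_argument("--profile", nargs="?", const="Default", default=None)
    sub = parser.add_subparsers(dest="command")
    open_cmd = sub.add_parser("open")
    open_cmd.add_argument("url", nargs="?")
    return parser
```

With `--profile open`, argparse greedily takes `open` as the option value and no subcommand is left; `--profile "Default" open <url>` parses as intended.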
ShawnPana
aa2ac2e7f1 condense SKILL.md files: merge redundant sections, cut 45% of lines
Merge duplicate Essential Commands / Commands sections into one consolidated
block. Collapse verbose flag permutations and workflow details into compact
one-liners. Down from 653 total lines to 362.
2026-03-19 20:27:40 -07:00
ShawnPana
69902b713d fix docs/code discrepancies: add click coordinates, annotate extract, add --session
- Document coordinate clicking (click <x> <y>) in all interaction sections
- Annotate extract command as not yet implemented in README and SKILL.md
- Remove browser.extract() from Python wrapper docs (raises NotImplementedError)
- Add --session flag to global options tables in both SKILL.md files
2026-03-19 20:21:09 -07:00
ShawnPana
f850fe34ba add command chaining section to SKILL.md files 2026-03-19 18:54:14 -07:00
ShawnPana
694a111fad add upload command to CLI, extract find_file_input_near_element to BrowserSession
- Add `browser-use upload <index> <path>` command for uploading files to
  file input elements via the CLI
- Extract find_file_input_near_element from nested closures in tools/service.py
  to a reusable method on BrowserSession, deduplicating two copies
- Add BrowserWrapper.upload() for the Python REPL
- Resolve file paths to absolute on the client side before sending to daemon
- Update SKILL.md files and README with upload command docs
2026-03-19 17:02:34 -07:00
ShawnPana
43e007238c update docs for profile-use integration and ~/.browser-use/ directory layout
Document profile subcommand as passthrough to profile-use Go binary,
add file layout section showing ~/.browser-use/ structure, update
authenticated browsing workflow examples for new profile list output.
2026-03-18 16:30:56 -07:00
ShawnPana
d3e1818496 consolidate CLI files under ~/.browser-use/, add profile-use integration and doctor checks
Unify all CLI-managed files under ~/.browser-use/ (config, sockets, PIDs,
binaries, tunnels) instead of scattering across ~/.config/browser-use/ and
~/.browser-use/run/. Add profile-use Go binary as managed subcommand via
browser-use profile, with auto-download fallback and install.sh integration.
Wire cloudflared and profile-use availability checks into browser-use doctor.
2026-03-18 16:16:28 -07:00
ShawnPana
bff2918558 add --connect flag for Chrome auto-discovery and fix daemon shutdown for external browsers
Adds `--connect` to auto-discover running Chrome instances via DevToolsActivePort
files and well-known port probing, eliminating manual CDP URL construction. Fixes
daemon process hanging on `close` when connected to external browsers (--connect,
--cdp-url, cloud) by calling stop() (disconnect) instead of kill() (terminate).
2026-03-18 09:41:58 -07:00
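Chrome started with remote debugging writes a `DevToolsActivePort` file into its user data directory: the first line is the port, the second the browser WebSocket path. A sketch of turning that file into a CDP URL follows; the function name and error handling are assumptions, and the CLI's actual discovery code (including its port probing) is not reproduced here.

```python
from pathlib import Path
from typing import Optional

def read_devtools_active_port(user_data_dir: Path) -> Optional[str]:
    """Hypothetical helper: build a browser-level CDP WebSocket URL, or None."""
    port_file = Path(user_data_dir) / "DevToolsActivePort"
    try:
        lines = port_file.read_text(encoding="utf-8").splitlines()
    except OSError:
        return None  # no file: Chrome not running with remote debugging in this profile
    if len(lines) < 2 or not lines[0].strip().isdigit():
        return None  # malformed file
    port = int(lines[0].strip())
    ws_path = lines[1].strip()  # e.g. /devtools/browser/<id>
    return f"ws://127.0.0.1:{port}{ws_path}"
```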
ShawnPana
2778c41c3b update skill docs for simplified CLI: remove agent/task/session management, add cloud connect and CDP workflows
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:12:57 -07:00
ShawnPana
0dc2fc5a6f remove run command and agent infrastructure from CLI
The `run` command pulled in heavy SDK dependencies (openai, anthropic,
google), had a bug (await on sync get_llm), and is superseded by
`browser-use cloud` for agent execution. CLI is now purely a browser
automation interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:44:13 -07:00
ShawnPana
91e987acdc add browser-use cloud command: generic REST passthrough to Cloud API
Login/logout with API key persistence, versioned REST calls (v2/v3),
task polling, and OpenAPI-driven help. Stdlib only, no daemon needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:13:43 -07:00
ShawnPana
4be221386b changes 2026-03-11 12:57:50 -07:00
ShawnPana
859cb97063 simplify daemon architecture: single session, socket-as-liveness, no PID/lock files
Replace the multi-session server (server.py, SessionRegistry, portalocker locking,
PID files, orphan detection) with a minimal daemon (daemon.py) that holds one
BrowserSession in memory. Socket file existence = alive. Auto-exits when browser
dies via CDP watchdog.

-2277 lines, +142 lines across 20 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:05:44 -08:00
ShawnPana
39698e58e4 strip cloud/remote commands from skill_cli
Remove all cloud API paths from the CLI while leaving core library
cloud support (BrowserSession(use_cloud=True), browser/cloud/) untouched.

Deleted: api_key.py, cloud_task.py, cloud_session.py
Removed: --browser remote, cloud-only run flags, task/session subcommands,
cloud profile ops (create/update/delete/sync), remote mode validation
Kept: tunnel (just Cloudflare), all local commands, install_config.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 15:56:31 -08:00
ShawnPana
f9694b6af3 added README link 2026-02-18 20:44:13 -08:00
ShawnPana
624cef10a9 update SKILL.md files 2026-02-18 20:37:14 -08:00
ShawnPana
ada8b77d0b docs(cli): add missing commands and flags to skill documentation
- Add missing commands: init, setup, session create, session share, profile create
- Add missing agent run flags: --wait, --start-url, --metadata, --secret,
  --allowed-domain, --skill-id, --structured-output, --judge, --judge-ground-truth
- Add missing task status flags: --step, --reverse, --session filter
- Add missing cookie flags: --same-site, --expires
- Add scroll --amount, get bbox, profile pagination flags
- Add Python execution section to remote-browser/SKILL.md
- Add Cloud Profile Management section to remote-browser/SKILL.md
- Add missing browser wrapper methods to Python docs
- Fix incorrect profile list-local command references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 00:46:35 -08:00
ShawnPana
79e91e6adf fix(cli): update install script URL to /cli/install.sh
Update all references from browser-use.com/install.sh to
browser-use.com/cli/install.sh across documentation and code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 00:24:48 -08:00
ShawnPana
6bb2301e8f docs(skill): add missing sections from remote-browser to browser-use
browser-use/SKILL.md should be the superset containing everything.
Added the following sections that were only in remote-browser:

- Setup section with install.sh one-liner and modes table
- Exposing Local Dev Servers section with tunnel commands
- browser-use doctor in troubleshooting
- Enhanced Monitoring Subagents section with token-efficient table
- Common Patterns section (dev server + screenshot loop)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 16:44:57 -08:00
ShawnPana
6be83f6467 fix: remove hardcoded test repo URLs for production deployment
- install.sh: Change default BROWSER_USE_REPO from ShawnPana/browser-use
  to browser-use/browser-use
- install.sh: Generalize branch name in dev testing comment
- SKILL.md: Remove dev testing section with fork URLs, keep clean
  pip install instruction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-10 16:05:21 -08:00
ShawnPana
723f3335b2 feat(cli): add token-efficient task monitoring with duration display
- Default task status shows only latest step (token efficient for agents)
- Add --compact/-c flag to show all steps with reasoning
- Add --verbose/-v flag to show all steps with URLs + actions
- Add --last/-n, --reverse/-r, --step/-s flags for navigating long tasks
- Display duration for tasks (e.g., "18s", "5m 13s") and sessions
- Format proxy cost to 2 decimal places in session output
- Update SKILL.md docs with recommended monitoring workflow
2026-02-10 14:39:20 -08:00
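The duration display above ("18s", "5m 13s") can be sketched as a small formatter. The name `format_duration` is hypothetical, and the hour format is an assumption since the commit only shows seconds and minutes.

```python
def format_duration(seconds: float) -> str:
    """Hypothetical formatter: 18 -> "18s", 313 -> "5m 13s"."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"  # assumed format for long tasks
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"
```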
ShawnPana
a568f45979 feat(cli): add cloud task/session management and subagent support
Cloud API Integration:
- Add cloud_task.py with full Cloud API client (tasks, sessions)
- Support for task creation, status polling, stopping, and logs
- Session management: create, list, get, stop, delete

Agent Task Options (remote mode):
- --llm: specify model (gpt-4o, claude-sonnet-4-20250514, gemini-2.0-flash)
- --proxy-country: proxy location (default: us)
- --session-id: reuse existing cloud session
- --keep-alive: preserve session after task
- --no-wait: async execution, return task_id immediately
- --stream: stream status updates
- --flash: fast execution mode
- --thinking: extended reasoning
- --vision/--no-vision: control vision capability

CLI Commands:
- task list [--limit N] [--status running|finished] [--json]
- task status <task-id> [--json]
- task stop <task-id>
- task logs <task-id>
- session list [--limit N] [--status active|stopped] [--json]
- session get <session-id> [--json]
- session stop <session-id> | --all

Documentation:
- Add comprehensive subagent examples to SKILL.md files
- Document session=agent, task=work mental model
- Examples for parallel agents, session reuse, agent management
2026-02-10 12:33:05 -08:00
ShawnPana
c0dd7875b8 feat(install): mode-based installation with interactive TUI
Add installation mode selection that determines which browser modes are
available at runtime. This enables lightweight remote-only installs for
sandboxed agents (skipping ~300MB Chromium download).

## Install Modes

- `--remote-only`: Only remote mode, installs cloudflared, skips Chromium
- `--local-only`: Only local modes (chromium/real), skips cloudflared
- `--full`: All modes available (default interactive selection)

## Runtime Behavior

- Single installed mode becomes the default (no --browser flag needed)
- Requesting unavailable mode shows clear error with reinstall instructions
- No config file = all modes available (backward compat for pip users)

## Changes

**New: browser_use/skill_cli/install_config.py**
- Manages ~/.browser-use/install-config.json
- get_config(), save_config(), is_mode_available(), get_default_mode()
- get_mode_unavailable_error() for helpful reinstall guidance

**Modified: browser_use/skill_cli/main.py**
- build_parser() uses config for --browser choices and default
- Validates mode availability before starting session server

**Modified: browser_use/skill_cli/sessions.py**
- create_browser_session() validates mode before creation

**Modified: install.sh**
- Command-line flags: --full, --remote-only, --local-only, --api-key
- Interactive TUI with fallback: gum → whiptail → pure bash menu
- Conditional dependency installation based on selected modes
- Writes install-config.json with installed modes

**Updated: skills/remote-browser/SKILL.md**
- Simplified examples (no --browser remote needed with --remote-only)
- Updated setup instructions for mode-based installation

**New: tests/ci/test_install_config.py**
- 13 tests covering config module
2026-02-09 20:36:08 -08:00
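The runtime rules above (no config file means all modes; a single installed mode becomes the default) can be sketched as follows. This is not install_config.py: the `installed_modes` key, the helper names, and falling back to `chromium` when several modes are installed are all assumptions.

```python
import json
from pathlib import Path
from typing import Optional, Tuple

ALL_MODES: Tuple[str, ...] = ("chromium", "real", "remote")  # modes named in these commits
FALLBACK_DEFAULT = "chromium"  # assumed default when several modes are installed

def load_install_config(path: Path) -> Optional[dict]:
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or unreadable config is treated as "no config"

def available_modes(config: Optional[dict]) -> Tuple[str, ...]:
    if not config:
        return ALL_MODES  # no config file = all modes (backward compat for pip users)
    return tuple(config.get("installed_modes", ALL_MODES))

def default_mode(config: Optional[dict]) -> str:
    modes = available_modes(config)
    if len(modes) == 1:
        return modes[0]  # a single installed mode needs no --browser flag
    return FALLBACK_DEFAULT if FALLBACK_DEFAULT in modes else modes[0]
```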
ShawnPana
94c336407d fix(cli): make tunnel command independent of browser sessions
Previously, `browser-use tunnel 3000` would start a session server with
the default browser mode (chromium), which poisoned subsequent
`--browser remote` commands with a mode mismatch error.

Now tunnel commands are handled directly in main.py without starting a
browser session:

  npm run dev &
  browser-use tunnel 3000                          # No session created
  browser-use --browser remote open <tunnel-url>   # Works!

Changes:
- Move tunnel logic to standalone functions in tunnel_manager.py
- Handle tunnel commands in main.py before ensure_server()
- Tunnels persist across browser-use close (independent lifecycle)
- Add `browser-use tunnel stop --all` command

Breaking behavior change:
- `browser-use close` no longer stops tunnels
- Use `browser-use tunnel stop <port>` or `tunnel stop --all` explicitly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 18:38:39 -08:00
ShawnPana
9f51461bbd fix(cli): error when requesting remote browser on local session + improve docs
When a session is started without --browser remote and a subsequent command
uses --browser remote, the CLI now errors with guidance instead of silently
using the local browser (which loses cloud features like live_url).

Code changes:
- Add session metadata file to track browser_mode per session
- Error only when remote requested but session is local
- Allow local-on-remote (user gets more features than requested)
- Clean up metadata on session close

SKILL.md improvements:
- Clarify that --browser remote is only needed on the FIRST command
- Add table showing session modes
- Add "What happens if you forget" section with error example
- Add troubleshooting entry for mode mismatch error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 18:20:06 -08:00
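The asymmetric check above (error on remote-on-local, allow local-on-remote) can be sketched as a small guard. The exception class, function name, and error wording are hypothetical; the real CLI reads the mode from its session metadata file.

```python
from typing import Optional

class ModeMismatchError(RuntimeError):
    """Hypothetical error for a --browser remote request against a local session."""

def check_session_mode(session_mode: str, requested_mode: Optional[str]) -> str:
    """Return the mode the existing session will keep using, or raise on remote-on-local."""
    if requested_mode == "remote" and session_mode != "remote":
        # Silently using the local browser would lose cloud features such as live_url.
        raise ModeMismatchError(
            f"Session is running in '{session_mode}' mode but --browser remote was requested."
        )
    return session_mode  # matching mode, no flag, or local-on-remote: reuse the session
```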