browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-04-22 17:45:09 +02:00

Author	SHA1	Message	Date
MagMueller	30419e82dc	bench: add threshold curve + JS boundary tests for heavy page strategy	2026-04-01 14:47:39 -07:00
MagMueller	c1bfe4f328	bench: add per-method timing benchmark across scales (10k→1M)	2026-04-01 13:19:46 -07:00
MagMueller	715e0bbf02	chore: gitignore generated test HTML files	2026-04-01 12:25:18 -07:00
MagMueller	5ec5c8d43a	perf: hoist CDP session lookup + cap paint order rect explosion Two additional performance fixes for heavy pages: 1. Hoist get_or_create_cdp_session() outside _construct_enhanced_node Previously called once PER DOM NODE inside the recursive tree construction. On a 100k-element page, this was 100k+ async operations. Now resolved once before recursion starts. 2. Add _MAX_RECTS=5000 safety cap to RectUnionPure The paint order rect union can fragment exponentially with many overlapping translucent layers (each add() splits up to 4 rects). Cap prevents memory/CPU explosion on complex pages. Also: expanded stress test suite to 15 pages (up to 132k elements) including shadow DOM + iframe combos, overlapping layers, cross-origin iframes, and a 100k flat element test. All 15 pass.	2026-04-01 12:25:10 -07:00
MagMueller	dee1af29cc	test: add heavy page DOM capture stress tests 10 progressively heavier test pages (1k to 50k+ elements): - Flat divs, nested tables, shadow DOM, iframes, deep nesting - Mega forms, SVG heavy, event listeners, cross-origin iframes - Ultimate stress test combining everything Two test modes: - --dom-only: tests DOM capture without LLM (fast, no API key needed) - --agent: tests full agent loop with real LLM on subset of pages	2026-04-01 11:44:31 -07:00
MagMueller	683994da8e	fix: prevent DOM capture timeout on heavy pages (20k+ elements) Pages with very large DOMs (e.g. Stimulsoft designer with 20,000+ elements) cause the browser state capture to time out, making the agent unable to interact with the browser. Three targeted fixes: 1. Skip JS listener detection on heavy pages (>10k elements) The querySelectorAll('*') + getEventListeners() loop followed by individual DOM.describeNode CDP calls for each listener element is O(n) and can take 10s+ alone on heavy pages. 2. Batch DOM.describeNode calls (chunks of 50) Previously all calls fired at once via asyncio.gather, flooding the CDP WebSocket and causing timeouts on concurrent operations. 3. Adaptive CDP timeouts based on page complexity - >15k elements: 25s initial / 10s retry (was 10s/2s) - >5k elements: 15s initial / 5s retry - Normal pages: unchanged 10s/2s	2026-04-01 11:25:03 -07:00
ShawnPana	c2c9aa8556	feat: cloud signup command, unified base URL, auth isolation - browser-use cloud signup: challenge-response agent self-registration - browser-use cloud signup --verify: verify and save API key - browser-use cloud signup --claim: generate account claim URL - Base URL convention unified: BROWSER_USE_CLOUD_BASE_URL is host-only (e.g. https://api.browser-use.com), CLI appends /api/{version} - CLI daemon blocked from falling back to library's cloud_auth.json - cloud connect validates API key before spawning daemon - No-key error message mentions cloud signup	2026-03-31 21:14:20 -07:00
ShawnPana	9fd3d81f8b	fix: default cloud connect proxy to US, validate country codes _get_cloud_connect_proxy defaults to 'us' when not in config. Invalid codes (including null, false, 'none') return None (no proxy).	2026-03-31 14:17:42 -07:00
ShawnPana	032e73eec7	feat: config-driven proxy/timeout for cloud connect, enable recording cloud connect reads cloud_connect_proxy and cloud_connect_timeout from ~/.browser-use/config.json. Recording always enabled via enableRecording default on CreateBrowserRequest. No CLI flags — edit config for custom settings, use cloud v2 REST for full control.	2026-03-31 13:44:27 -07:00
ShawnPana	2b9ee57975	fix: CloudBrowserClient respects BROWSER_USE_CLOUD_BASE_URL in CLI CLIBrowserSession._provision_cloud_browser now reads the env var and overrides CloudBrowserClient.api_base_url before provisioning. This keeps the change CLI-only without modifying library code.	2026-03-31 12:51:44 -07:00
ShawnPana	584b1a0854	feat: add BROWSER_USE_CLOUD_BASE_URL env var override Single env var to override the cloud API base URL for all versions. Per-version overrides (BROWSER_USE_CLOUD_BASE_URL_V2/V3) still take precedence for backward compatibility.	2026-03-31 12:38:01 -07:00
ShawnPana	b4b2a4c18d	feat: zero-config cloud connect with auto-managed profile cloud connect now works with no flags. On first use, creates a "Browser Use CLI" profile via the Cloud API and saves the ID to config.json. Subsequent connects reuse it (validates on each call, recreates if deleted). Removed --timeout, --proxy-country, --profile-id flags and their plumbing through daemon/sessions. Power users who need custom browser settings use cloud v2 POST /browsers directly.	2026-03-31 12:28:42 -07:00
Saurav Panda	e026a51fd3	fix: close httpx client pool on session stop to prevent OOM on Lambda (#4577) ## Summary - Close `CloudBrowserClient`'s httpx connection pool in `on_BrowserStopEvent` after `stop_browser()` completes - Make `CloudBrowserClient.close()` idempotent (safe to call multiple times) ## Problem `BrowserSession.on_BrowserStopEvent` calls `stop_browser()` to end the cloud session but never calls `_cloud_browser_client.close()`. The underlying `httpx.AsyncClient` connection pool stays alive in memory. On AWS Lambda with provisioned concurrency, the same container handles many invocations sequentially. Each invocation creates a new `BrowserSession` → new `CloudBrowserClient` → new `httpx.AsyncClient`, but the old ones are never cleaned up. Memory climbs from ~1.3GB to the 3GB ceiling over hours, triggering OOM kills. Observed in production: 21 `Runtime.ExitError` crashes in 6 hours, `aws.lambda.enhanced.max_memory_used` saturated at 3,009 MB (the Lambda limit), 3 confirmed `out_of_memory` events. ## Test plan - [ ] Run `tests/ci` suite to verify no regressions - [ ] Verify cloud browser sessions still clean up properly (`stop_browser` + `close`) - [ ] Verify calling `close()` twice doesn't raise 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- ## Summary by cubic Close the `httpx` client pool when a cloud browser session stops to prevent memory leaks and OOM on AWS Lambda. Also makes `CloudBrowserClient.close()` safe to call multiple times. - Bug Fixes - Always call `_cloud_browser_client.close()` in `BrowserSession.on_BrowserStopEvent` after `stop_browser()` (in finally) to free the `httpx.AsyncClient` pool. - Make `CloudBrowserClient.close()` idempotent by checking `client.is_closed` before `aclose()`. Written for commit `5e644981e8`.	2026-03-31 11:39:04 -07:00
LarsenCundric	5e644981e8	fix: close CloudBrowserClient httpx pool on session stop to prevent memory leak BrowserSession.on_BrowserStopEvent calls stop_browser() but never calls _cloud_browser_client.close(), leaving the httpx connection pool alive. On Lambda provisioned concurrency, these pools accumulate across invocations — memory climbs from ~1.3GB to the 3GB ceiling over hours, triggering OOM kills (21 Runtime.ExitError crashes in 6 hours observed in production). Changes: - Call _cloud_browser_client.close() in on_BrowserStopEvent after stop_browser completes (in a finally block so it runs even if stop_browser fails) - Make CloudBrowserClient.close() idempotent (check is_closed before calling aclose) so it's safe to call multiple times Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 11:19:43 -07:00
ShawnPana	04ca40c3af	docs: add CDP/Python reference for browser-use skill New references/cdp-python.md with tested recipes for raw CDP access via the Python session: activating tabs, listing targets, running JS, device emulation, cookies. SKILL.md updated with pointer to the reference file explaining when to reach for CDP vs CLI commands.	2026-03-31 10:35:03 -07:00
ShawnPana	8d557b3de7	docs: update README tabs section and file layout for recent changes Replace standalone switch/close-tab with tab subcommands. Add state.json to file layout.	2026-03-30 20:11:39 -07:00
ShawnPana	526bde4d0f	refactor: remove standalone switch and close-tab command aliases These were duplicates of tab switch and tab close with separate lock-checking and focus-resolution code paths. Having one way to do each thing reduces maintenance surface and avoids isolation bugs.	2026-03-30 20:09:50 -07:00
ShawnPana	a6744f4ade	fix: tab close uses caller's logical focus, not Chrome's global focus Default close-tab and tab-close paths used session_manager.get_focused_target() which returns Chrome's globally focused tab. In multi-agent mode this lets one agent close another's tab. Now uses agent_focus_target_id first, falling back to global focus only in single-agent mode. Also: fresh daemon spawn now uses phase-aware state file waiting (15s) instead of fixed 5s socket polling, consistent with ensure_daemon's probe logic.	2026-03-30 20:04:01 -07:00
ShawnPana	d59c0e6984	fix: CLI reconciliation gaps found by codex review - ensure_daemon now phase-aware: waits for initializing/starting/ shutting_down with staleness timeouts, errors on unhealthy sessions - _close_session only cleans files after confirmed PID death - _handle_sessions won't delete live daemon's files on stale terminal state - _probe_session uses socket_pid for split-brain PID resolution - _is_daemon_process works on Windows (wmic) - Updated + added robustness tests (codex-authored)	2026-03-30 19:49:57 -07:00
ShawnPana	7c7fd37220	test: add daemon lifecycle tests with real subprocess daemons 15 tests covering state file transitions, _probe_session branches (live daemon, dead PID, no files, corrupt state), close via socket, close orphaned daemon, close --all with mixed sessions, sessions listing with phase column, and stale/terminal state cleanup. Uses real daemon subprocesses with BROWSER_USE_HOME overridden to temp dirs. No mocking.	2026-03-30 19:09:00 -07:00
ShawnPana	f222c98bbf	feat: unified session probe and cross-platform lifecycle helpers Replace scattered PID/socket/process checks with _probe_session() that reads the state file, reconciles PIDs, checks liveness, and probes the socket without deleting anything. Callers decide cleanup policy. Adds _is_pid_alive, _is_daemon_process, _terminate_pid (cross-platform with SIGKILL escalation), _close_session (shared by close and close-all). sessions command now shows phase column. close and close-all both handle orphaned daemons via SIGTERM fallback. close polls for PID disappearance up to 15s before giving up.	2026-03-30 18:47:34 -07:00
ShawnPana	8ade5748e2	feat: add daemon lifecycle state file and shutdown re-entrancy guard Daemon now writes <session>.state.json at each lifecycle transition (initializing → ready → starting → running → shutting_down → stopped/failed). All shutdown triggers funneled through _request_shutdown() to prevent double cleanup. Startup rollback on failure cleans up browser resources. Also fixes cloud browser leak: CLIBrowserSession.stop() now explicitly stops the remote browser via API instead of just disconnecting the websocket. Daemon ping response now includes PID for split-brain resolution.	2026-03-30 18:43:21 -07:00
ShawnPana	ac48ebecaf	fix: SIGTERM fallback for orphaned daemons in close command When the daemon's socket is unreachable but the PID file references a live process, close now sends SIGTERM directly instead of printing "No active browser session" and leaving the daemon running forever.	2026-03-30 18:39:06 -07:00
ShawnPana	e989797541	fix: sweep orphaned sockets in sessions command Sockets without a corresponding live PID file were never cleaned up because cleanup was only triggered per-session on next use. Add a glob pass in _handle_sessions to remove any .sock file that doesn't match a live session.	2026-03-29 18:05:56 -07:00
ShawnPana	18c2199e22	fix: serialize multi-agent focus swaps with dispatch lock Concurrent agents could interleave focus swaps on the shared BrowserSession, corrupting each other's state. Wrap the entire swap-execute-restore cycle in _dispatch_lock and separate the tab-ownership path from single-agent mode.	2026-03-28 15:58:29 -07:00
ShawnPana	4be3525d81	fix: use JS scrollBy instead of CDP gesture for CLI scroll command CDP input gesture simulation doesn't work in --connect mode (external Chrome). Switch to window.scrollBy via Runtime.evaluate which works regardless of connection mode.	2026-03-28 15:58:24 -07:00
ShawnPana	ae93cee5c2	fix: unify CLI cloud auth so cloud connect uses ~/.browser-use/config.json cloud connect was silently reading credentials from the library's ~/.config/browseruse/cloud_auth.json, bypassing cloud login/logout entirely. Now ensure_daemon injects the CLI config's API key into the daemon subprocess env, so all cloud commands share a single auth source.	2026-03-28 15:56:56 -07:00
ShawnPana	1f8b4a5346	fix: rewrite register tests to use real code path via subprocess	2026-03-27 11:49:47 -07:00
ShawnPana	6bb73ab274	merge upstream/main, resolve install_lite.sh conflict (take upstream)	2026-03-26 11:40:33 -07:00
ShawnPana	464bd167c3	fix: daemon zombie on close, add idle timeout, clean up PID file before exit	2026-03-26 11:31:15 -07:00
shawn pana	eee98ff272	feat(cli): add install-lite.sh script for the CLI (#4534) ## Summary by cubic Adds a lightweight installer (`install_lite.sh`) that sets up the `browser-use` CLI with minimal deps and Chromium, avoiding the full library. Also extracts CLI deps into `requirements-cli.txt`, adds a CI test, and hardens the installer with a `curl` pre-check and non-fatal validate. - New Features - `install_lite.sh` installs Python 3.11+, `uv`, the CLI (`--no-deps`) with minimal deps from `requirements-cli.txt`, and Chromium via `uvx playwright` in `~/.browser-use-env`, and configures PATH (Linux, macOS, Windows). - Installs the `profile-use` helper into `~/.browser-use/bin` for profile management. - CI test (`tests/ci/test_cli_lite_deps.py`) boots the CLI in a clean venv with only minimal deps and checks key imports. - Minimal CLI deps pinned in `browser_use/skill_cli/requirements-cli.txt`. - Bug Fixes - Validation is non-fatal; next steps are shown even if checks warn. - Checks for `curl` before installing `uv` to prevent confusing failures. Written for commit `7e92031593`.	2026-03-26 09:47:18 -07:00
shawn pana	7e92031593	Merge branch 'main' into install-lite	2026-03-26 09:43:49 -07:00
ShawnPana	e3a994a39d	fix: non-fatal validate + curl check before use	2026-03-26 09:40:35 -07:00
ShawnPana	2f41a42eab	extract CLI deps to requirements-cli.txt + add CI test	2026-03-26 09:27:18 -07:00
ShawnPana	822a4d948b	add aiohttp to lite install deps (needed by LocalBrowserWatchdog._wait_for_cdp_url)	2026-03-26 09:15:29 -07:00
ShawnPana	a2cbe678b7	add lightweight CLI install script	2026-03-26 09:15:29 -07:00
laithrw	76cc49089b	fix(mcp): remove oneOf from browser_click schema that breaks Claude API clients (#4212) ## Problem Closes #4211 `browser_click` in the MCP server uses `oneOf` at the top level of `inputSchema` to express "provide either `index` OR `coordinate_x`+`coordinate_y`": ```json { "type": "object", "properties": { ... }, "oneOf": [ {"required": ["index"]}, {"required": ["coordinate_x", "coordinate_y"]} ] } ``` Claude's API (and other strict MCP clients) rejects `oneOf`/`allOf`/`anyOf` at the top level of a tool input schema: ``` 400 tools.N.custom.input_schema: input_schema does not support oneOf, allOf, or anyOf at the top level ``` This cascades — it breaks all MCP tools in the session, not just `browser_click`. ## Fix Remove the `oneOf` block. The constraint is now expressed in the property descriptions instead. No behaviour change: the runtime guard in `_click()` already returns a clear error string when neither `index` nor coordinates are provided. ## Diff summary - `-4 lines`: remove the `oneOf: [...]` block entirely - `+3 lines`: update the three property descriptions to say "Provide this OR ..." The `_click()` handler is untouched. --- ## Summary by cubic Remove the top-level `oneOf` from the `browser_click` tool input schema so Claude and other strict MCP clients accept it and the session doesn’t break. No behavior change; `_click()` still validates, and field descriptions now clarify using `index` or `coordinate_x`+`coordinate_y`. Written for commit `6094cddd90`.	2026-03-26 11:08:20 -04:00
laithrw	6094cddd90	Merge branch 'main' into fix/mcp-browser-click-oneof-schema	2026-03-26 10:59:46 -04:00
ShawnPana	24a0df7e72	add aiohttp to lite install deps (needed by LocalBrowserWatchdog._wait_for_cdp_url)	2026-03-25 22:27:43 -07:00
ShawnPana	d8aeaa9ab4	add lightweight CLI install script	2026-03-25 22:15:45 -07:00
laithrw	8f201decd3	fix: stop cloud sessions when reconnecting via cdp_url (#4096) ## Summary - derive cloud browser session UUID from Browser Use CDP host (e.g. `<uuid>.cdp0.browser-use.com`) - use that UUID as fallback in `BrowserSession.on_BrowserStopEvent` when `_cloud_browser_client.current_session_id` is empty - ensure reconnected sessions created via `BrowserSession(cdp_url=...)` are properly stopped in cloud, not just reset locally ## Why Reconnected cloud sessions (common MFA resume pattern) can leak browser instances because stop logic only used `current_session_id`, which is populated when the same process created the cloud browser. In reconnect flows, that field is often unset. --- ## Summary by cubic Ensure cloud browser sessions are stopped when reconnecting via cdp_url, preventing leaked instances during MFA/resume flows. We derive the session UUID from the CDP URL host and pass it to `stop_browser` with the explicit ID. - Bug Fixes - Added a helper to extract the session UUID from the CDP host (e.g., `<uuid>.cdpN.browser-use.com`). - Updated stop logic to use `current_session_id` or the derived UUID and call `stop_browser(id)`; logs now include the session ID. Written for commit `4c883feb7c`.	2026-03-25 19:39:19 -04:00
ShawnPana	e09ba11ef1	feat: remove event bus dependency, add dialog handling - Remove event bus tab listeners from daemon — track tabs directly - Remove dead event bus fallback branches from commands/browser.py - Replace SwitchTabEvent/CloseTabEvent dispatches with direct CDP calls - Update python_session.py to use ActionHandler instead of event bus - Add JS dialog handler (alert/confirm/prompt) to CLIBrowserSession - Surface auto-dismissed popup messages in state output - Only dummy EventBus() for watchdog constructors remains (unavoidable)	2026-03-25 16:37:26 -07:00
laithrw	4c883feb7c	Merge branch 'main' into fix/cloud-session-stop-cdp-reconnect	2026-03-25 19:34:25 -04:00
laithrw	6ce58afbc3	fix tunnel process kill and daemon spawn on windows (#4530) Resolves #4352 --- ## Summary by cubic Fixes Windows tunnel lifecycle bugs by spawning `cloudflared` as a detached daemon and reliably terminating it to prevent orphaned processes. Resolves #4352. - Bug Fixes - Windows spawn uses `CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW`; Unix uses `start_new_session=True`. - Windows kill uses Win32 `OpenProcess/TerminateProcess` with brief polling; Unix uses `SIGTERM` with `SIGKILL` fallback. Written for commit `4950c3baa1`.	2026-03-25 19:34:17 -04:00
laithrw	4950c3baa1	Merge branch 'main' into fix-4352	2026-03-25 19:32:05 -04:00
Laith Weinberger	9b401cf04c	fix tunnel process kill and daemon spawn on windows	2026-03-25 19:28:47 -04:00
laithrw	86d33635c5	fix(agent): prevent stale history and stuck step counter on timeout (#4481) ## Summary - Clear `last_model_output` and `last_result` at the start of `step()` to prevent stale data from previous steps being recorded in history on timeout - Increment `n_steps` in `_execute_step`'s timeout handler to prevent the main loop from retrying the same step number Fixes #4480 ## What changed `step()` — clear stale state at entry: ```diff self.step_start_time = time.time() + + # Clear previous step state to prevent stale data from being recorded + self.state.last_model_output = None + self.state.last_result = None + browser_state_summary = None ``` `_execute_step()` — ensure counter advances on timeout: ```diff self.state.last_result = [ActionResult(error=error_msg)] + # Ensure step counter advances on timeout + if self.state.n_steps == step + 1: + self.state.n_steps += 1 ``` The guard `if self.state.n_steps == step + 1` prevents double-increment when `_finalize()` has already incremented on the normal path. --- ## Summary by cubic Fixes stale history entries and a stuck step counter when a step times out. Clears per-step state after `_prepare_context` and ensures `n_steps` advances on timeout. - Bug Fixes - Clear `last_model_output` and `last_result` right after `_prepare_context` (before LLM/action calls) so prompts keep previous output and timeouts don't record stale data. - In `_execute_step()`, increment `n_steps` after a timeout (with a guard) so the loop moves forward. Written for commit `c0a11dc61e`.	2026-03-25 19:28:20 -04:00
laithrw	8fcbe9f2fa	Merge branch 'main' into fix/step-timeout-counter-and-stale-history	2026-03-25 19:22:19 -04:00
laithrw	34597151bc	fix anthropic action field double-serialization (#4529) Resolves #4510 --- ## Summary by cubic Fixes double-serialized action fields in Anthropic tool responses to prevent schema validation errors and failed tool calls. Preserves original tracebacks in the fallback path. - Bug Fixes - Parse nested JSON strings in tool inputs; normalize newline/carriage return/tab escapes when needed. - Applied in `browser_use/llm/anthropic/chat.py` and `browser_use/llm/aws/chat_anthropic.py`. Written for commit `f74a4435b1`.	2026-03-25 19:20:51 -04:00
laithrw	f74a4435b1	Merge branch 'main' into fix-4510	2026-03-25 19:18:32 -04:00
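
Note on commit 683994da8e: its message describes two of its fixes in enough detail to sketch them. Calls are batched in chunks of 50 instead of being fired all at once through asyncio.gather, and timeouts are chosen from the element count. The sketch below is only an illustration of that idea under stated assumptions. `gather_in_chunks` and `pick_timeouts` are hypothetical names, not functions from browser-use, and the real code may be structured differently.

```python
import asyncio


async def gather_in_chunks(items, worker, chunk_size=50):
    """Await worker(item) for every item, at most chunk_size at a time.

    Hypothetical helper: each chunk is gathered concurrently, and the next
    chunk starts only after the previous one has finished.
    """
    items = list(items)
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        results.extend(await asyncio.gather(*(worker(item) for item in chunk)))
    return results


def pick_timeouts(element_count):
    """Return (initial, retry) timeouts in seconds.

    Thresholds and values are taken from the commit message; the function
    itself is an assumption made for illustration.
    """
    if element_count > 15_000:
        return 25.0, 10.0
    if element_count > 5_000:
        return 15.0, 5.0
    return 10.0, 2.0
```

Results come back in input order because asyncio.gather preserves the order of its awaitables within each chunk and the chunks are processed sequentially.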


9182 Commits
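
Note on commit 8f201decd3: the message says the cloud session UUID is derived from a CDP host of the form `<uuid>.cdpN.browser-use.com` and used as a fallback when `current_session_id` is empty. A minimal sketch of that parsing follows. The function names and the exact validation rules are assumptions for illustration, not the helper added in the commit.

```python
import re
from urllib.parse import urlparse

# Canonical 8-4-4-4-12 hexadecimal UUID layout.
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def session_id_from_cdp_url(cdp_url):
    """Return the UUID label of <uuid>.cdpN.browser-use.com, else None."""
    host = urlparse(cdp_url).hostname or ""  # hostname is already lowercased
    labels = host.split(".")
    if len(labels) != 4 or labels[2:] != ["browser-use", "com"]:
        return None
    if not re.fullmatch(r"cdp\d+", labels[1]):
        return None
    if not _UUID_RE.fullmatch(labels[0]):
        return None
    return labels[0]


def resolve_session_id(current_session_id, cdp_url):
    """Prefer the ID recorded at creation time; fall back to the CDP host."""
    return current_session_id or session_id_from_cdp_url(cdp_url)
```

A URL that does not match the expected host pattern, such as a local `ws://localhost:9222/...` endpoint, yields None, so a caller can skip the cloud stop request in that case.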