9182 Commits

Author SHA1 Message Date
MagMueller
30419e82dc bench: add threshold curve + JS boundary tests for heavy page strategy 2026-04-01 14:47:39 -07:00
MagMueller
c1bfe4f328 bench: add per-method timing benchmark across scales (10k→1M) 2026-04-01 13:19:46 -07:00
MagMueller
715e0bbf02 chore: gitignore generated test HTML files 2026-04-01 12:25:18 -07:00
MagMueller
5ec5c8d43a perf: hoist CDP session lookup + cap paint order rect explosion
Two additional performance fixes for heavy pages:

1. Hoist get_or_create_cdp_session() outside _construct_enhanced_node
   Previously called once PER DOM NODE inside the recursive tree
   construction. On a 100k-element page, this was 100k+ async
   operations. Now resolved once before recursion starts.

2. Add _MAX_RECTS=5000 safety cap to RectUnionPure
   The paint order rect union can fragment exponentially with many
   overlapping translucent layers (each add() splits up to 4 rects).
   Cap prevents memory/CPU explosion on complex pages.
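A minimal sketch of the capped union idea follows. It assumes axis-aligned rectangles, where add() subtracts every stored rect from the incoming one (each subtraction yields at most 4 pieces) and growth stops at a hard limit. `Rect` and `CappedRectUnion` are hypothetical names, not the actual RectUnionPure code, and truncating leftover pieces at the cap is an assumed policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other: Rect) -> bool:
        # Touching edges do not count as overlap.
        return not (
            other.x2 <= self.x1 or other.x1 >= self.x2
            or other.y2 <= self.y1 or other.y1 >= self.y2
        )

    def subtract(self, other: Rect) -> list[Rect]:
        """Return the parts of self not covered by other (at most 4 pieces)."""
        if not self.intersects(other):
            return [self]
        pieces = []
        if other.y1 > self.y1:  # band above the overlap
            pieces.append(Rect(self.x1, self.y1, self.x2, other.y1))
        if other.y2 < self.y2:  # band below the overlap
            pieces.append(Rect(self.x1, other.y2, self.x2, self.y2))
        y_lo, y_hi = max(self.y1, other.y1), min(self.y2, other.y2)
        if other.x1 > self.x1:  # strip left of the overlap
            pieces.append(Rect(self.x1, y_lo, other.x1, y_hi))
        if other.x2 < self.x2:  # strip right of the overlap
            pieces.append(Rect(other.x2, y_lo, self.x2, y_hi))
        return pieces


class CappedRectUnion:
    def __init__(self, max_rects: int = 5000) -> None:
        self.rects: list[Rect] = []
        self.max_rects = max_rects

    def _uncovered(self, r: Rect) -> list[Rect]:
        remaining = [r]
        for stored in self.rects:
            remaining = [p for rem in remaining for p in rem.subtract(stored)]
            if not remaining:
                break
        return remaining

    def contains(self, r: Rect) -> bool:
        return not self._uncovered(r)

    def add(self, r: Rect) -> bool:
        """Store the uncovered parts of r; return False if nothing was added."""
        if len(self.rects) >= self.max_rects:
            return False  # cap reached: stop fragmenting further
        remaining = self._uncovered(r)
        if not remaining:
            return False  # already fully covered
        room = self.max_rects - len(self.rects)
        self.rects.extend(remaining[:room])  # never exceed the cap
        return True
```

The cap trades accuracy for bounded cost: once it is hit, later rects are ignored rather than split further.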

Also: expanded stress test suite to 15 pages (up to 132k elements)
including shadow DOM + iframe combos, overlapping layers, cross-origin
iframes, and a 100k flat element test. All 15 pass.
2026-04-01 12:25:10 -07:00
MagMueller
dee1af29cc test: add heavy page DOM capture stress tests
10 progressively heavier test pages (1k to 50k+ elements):
- Flat divs, nested tables, shadow DOM, iframes, deep nesting
- Mega forms, SVG heavy, event listeners, cross-origin iframes
- Ultimate stress test combining everything

Two test modes:
- --dom-only: tests DOM capture without LLM (fast, no API key needed)
- --agent: tests full agent loop with real LLM on subset of pages
2026-04-01 11:44:31 -07:00
MagMueller
683994da8e fix: prevent DOM capture timeout on heavy pages (20k+ elements)
Pages with very large DOMs (e.g. Stimulsoft designer with 20,000+
elements) cause the browser state capture to time out, making the
agent unable to interact with the browser.

Three targeted fixes:

1. Skip JS listener detection on heavy pages (>10k elements)
   The querySelectorAll('*') + getEventListeners() loop, followed by an
   individual DOM.describeNode CDP call for each listener element, is
   O(n) and on its own can take 10s+ on heavy pages.

2. Batch DOM.describeNode calls (chunks of 50)
   Previously all calls fired at once via asyncio.gather, flooding the
   CDP WebSocket and causing timeouts on concurrent operations.

3. Adaptive CDP timeouts based on page complexity
   - >15k elements: 25s initial / 10s retry (was 10s/2s)
   - >5k elements: 15s initial / 5s retry
   - Normal pages: unchanged 10s/2s
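The three fixes can be sketched with plain asyncio, assuming a caller-supplied coroutine function stands in for the real DOM.describeNode call. The helper names are hypothetical. The thresholds and timeout pairs are the ones listed in this commit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

LISTENER_DETECTION_MAX_ELEMENTS = 10_000  # fix 1: skip detection above this


def should_detect_js_listeners(element_count: int) -> bool:
    return element_count <= LISTENER_DETECTION_MAX_ELEMENTS


def pick_cdp_timeouts(element_count: int) -> Tuple[float, float]:
    """Return (initial, retry) timeouts in seconds (fix 3)."""
    if element_count > 15_000:
        return 25.0, 10.0
    if element_count > 5_000:
        return 15.0, 5.0
    return 10.0, 2.0


async def gather_in_chunks(
    items: Sequence[T],
    fn: Callable[[T], Awaitable[R]],
    chunk_size: int = 50,
) -> List[R]:
    """Run fn over items with at most chunk_size calls in flight (fix 2)."""
    results: List[R] = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Each chunk finishes before the next one starts, so the
        # WebSocket never sees more than chunk_size pending requests.
        results.extend(await asyncio.gather(*(fn(item) for item in chunk)))
    return results
```

asyncio.gather preserves input order, so results line up with the node list even though calls within a chunk run concurrently.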
2026-04-01 11:25:03 -07:00
ShawnPana
c2c9aa8556 feat: cloud signup command, unified base URL, auth isolation
- browser-use cloud signup: challenge-response agent self-registration
- browser-use cloud signup --verify: verify and save API key
- browser-use cloud signup --claim: generate account claim URL
- Base URL convention unified: BROWSER_USE_CLOUD_BASE_URL is host-only
  (e.g. https://api.browser-use.com), CLI appends /api/{version}
- CLI daemon blocked from falling back to library's cloud_auth.json
- cloud connect validates API key before spawning daemon
- No-key error message mentions cloud signup
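The host-only base URL convention amounts to a single join. In this sketch, `cloud_api_base` is a hypothetical helper. Falling back to `https://api.browser-use.com` when the variable is unset is an assumption based on the example above.

```python
import os
from typing import Mapping, Optional

# Assumed fallback; the commit only gives this host as an example.
DEFAULT_CLOUD_HOST = "https://api.browser-use.com"


def cloud_api_base(version: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Join the host-only BROWSER_USE_CLOUD_BASE_URL with /api/{version}."""
    env = os.environ if env is None else env
    host = env.get("BROWSER_USE_CLOUD_BASE_URL") or DEFAULT_CLOUD_HOST
    # Strip a trailing slash so "host/" and "host" produce the same URL.
    return f"{host.rstrip('/')}/api/{version}"
```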
2026-03-31 21:14:20 -07:00
ShawnPana
9fd3d81f8b fix: default cloud connect proxy to US, validate country codes
_get_cloud_connect_proxy defaults to 'us' when not in config.
Invalid codes (including null, false, 'none') return None (no proxy).
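A minimal sketch of that rule, reading the `cloud_connect_proxy` config key named in 032e73eec7. Treating a valid code as exactly two ASCII letters (lowercased) is an assumption; the commit does not say how codes are validated.

```python
import re
from typing import Any, Mapping, Optional


def get_cloud_connect_proxy(config: Mapping[str, Any]) -> Optional[str]:
    """Return a proxy country code, 'us' by default, or None for no proxy."""
    if "cloud_connect_proxy" not in config:
        return "us"  # key absent: default to US
    value = config["cloud_connect_proxy"]
    if not isinstance(value, str):
        return None  # null, false, numbers, ...
    code = value.strip().lower()
    # 'none' and anything that is not two letters disables the proxy.
    if code == "none" or not re.fullmatch(r"[a-z]{2}", code):
        return None
    return code
```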
2026-03-31 14:17:42 -07:00
ShawnPana
032e73eec7 feat: config-driven proxy/timeout for cloud connect, enable recording
cloud connect reads cloud_connect_proxy and cloud_connect_timeout from
~/.browser-use/config.json. Recording always enabled via enableRecording
default on CreateBrowserRequest. No CLI flags — edit config for custom
settings, use cloud v2 REST for full control.
2026-03-31 13:44:27 -07:00
ShawnPana
2b9ee57975 fix: CloudBrowserClient respects BROWSER_USE_CLOUD_BASE_URL in CLI
CLIBrowserSession._provision_cloud_browser now reads the env var and
overrides CloudBrowserClient.api_base_url before provisioning. This
keeps the change CLI-only without modifying library code.
2026-03-31 12:51:44 -07:00
ShawnPana
584b1a0854 feat: add BROWSER_USE_CLOUD_BASE_URL env var override
Single env var to override the cloud API base URL for all versions.
Per-version overrides (BROWSER_USE_CLOUD_BASE_URL_V2/V3) still take
precedence for backward compatibility.
2026-03-31 12:38:01 -07:00
ShawnPana
b4b2a4c18d feat: zero-config cloud connect with auto-managed profile
cloud connect now works with no flags. On first use, creates a
"Browser Use CLI" profile via the Cloud API and saves the ID to
config.json. Subsequent connects reuse it (validates on each call,
recreates if deleted).
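The reuse-or-recreate flow can be sketched as follows. This assumes the Cloud API calls are injected as two callables so the logic stays testable. `ensure_cli_profile` and the `cloud_connect_profile_id` config key are hypothetical names; only the "Browser Use CLI" profile name and config.json come from the commit.

```python
import json
from pathlib import Path
from typing import Callable

PROFILE_KEY = "cloud_connect_profile_id"  # hypothetical key name
PROFILE_NAME = "Browser Use CLI"


def ensure_cli_profile(
    config_path: Path,
    profile_exists: Callable[[str], bool],
    create_profile: Callable[[str], str],
) -> str:
    """Reuse the saved profile if it still exists, otherwise create and save one."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    profile_id = config.get(PROFILE_KEY)
    if profile_id and profile_exists(profile_id):
        return profile_id  # validated on every call
    profile_id = create_profile(PROFILE_NAME)  # first use, or deleted remotely
    config[PROFILE_KEY] = profile_id
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return profile_id
```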

Removed --timeout, --proxy-country, --profile-id flags and their
plumbing through daemon/sessions. Power users who need custom browser
settings use cloud v2 POST /browsers directly.
2026-03-31 12:28:42 -07:00
Saurav Panda
e026a51fd3 fix: close httpx client pool on session stop to prevent OOM on Lambda (#4577)
## Summary
- Close `CloudBrowserClient`'s httpx connection pool in
`on_BrowserStopEvent` after `stop_browser()` completes
- Make `CloudBrowserClient.close()` idempotent (safe to call multiple
times)

## Problem
`BrowserSession.on_BrowserStopEvent` calls `stop_browser()` to end the
cloud session but never calls `_cloud_browser_client.close()`. The
underlying `httpx.AsyncClient` connection pool stays alive in memory.

On AWS Lambda with provisioned concurrency, the same container handles
many invocations sequentially. Each invocation creates a new
`BrowserSession` → new `CloudBrowserClient` → new `httpx.AsyncClient`,
but the old ones are never cleaned up. Memory climbs from ~1.3GB to the
3GB ceiling over hours, triggering OOM kills.

**Observed in production**: 21 `Runtime.ExitError` crashes in 6 hours,
`aws.lambda.enhanced.max_memory_used` saturated at 3,009 MB (the Lambda
limit), 3 confirmed `out_of_memory` events.

## Test plan
- [ ] Run `tests/ci` suite to verify no regressions
- [ ] Verify cloud browser sessions still clean up properly
(`stop_browser` + `close`)
- [ ] Verify calling `close()` twice doesn't raise


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---
## Summary by cubic
Close the `httpx` client pool when a cloud browser session stops to
prevent memory leaks and OOM on AWS Lambda. Also makes
`CloudBrowserClient.close()` safe to call multiple times.

- **Bug Fixes**
  - Always call `_cloud_browser_client.close()` in
    `BrowserSession.on_BrowserStopEvent` after `stop_browser()` (in a
    `finally` block) to free the `httpx.AsyncClient` pool.
  - Make `CloudBrowserClient.close()` idempotent by checking
    `client.is_closed` before `aclose()`.

<sup>Written for commit 5e644981e8.</sup>
2026-03-31 11:39:04 -07:00
LarsenCundric
5e644981e8 fix: close CloudBrowserClient httpx pool on session stop to prevent memory leak
BrowserSession.on_BrowserStopEvent calls stop_browser() but never calls
_cloud_browser_client.close(), leaving the httpx connection pool alive.
On Lambda provisioned concurrency, these pools accumulate across
invocations — memory climbs from ~1.3GB to the 3GB ceiling over hours,
triggering OOM kills (21 Runtime.ExitError crashes in 6 hours observed
in production).

Changes:
- Call _cloud_browser_client.close() in on_BrowserStopEvent after
  stop_browser completes (in a finally block so it runs even if
  stop_browser fails)
- Make CloudBrowserClient.close() idempotent (check is_closed before
  calling aclose) so it's safe to call multiple times

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 11:19:43 -07:00
ShawnPana
04ca40c3af docs: add CDP/Python reference for browser-use skill
New references/cdp-python.md with tested recipes for raw CDP access
via the Python session: activating tabs, listing targets, running JS,
device emulation, cookies. SKILL.md updated with pointer to the
reference file explaining when to reach for CDP vs CLI commands.
2026-03-31 10:35:03 -07:00
ShawnPana
8d557b3de7 docs: update README tabs section and file layout for recent changes
Replace standalone switch/close-tab with tab subcommands.
Add state.json to file layout.
2026-03-30 20:11:39 -07:00
ShawnPana
526bde4d0f refactor: remove standalone switch and close-tab command aliases
These were duplicates of tab switch and tab close with separate
lock-checking and focus-resolution code paths. Having one way to
do each thing reduces maintenance surface and avoids isolation bugs.
2026-03-30 20:09:50 -07:00
ShawnPana
a6744f4ade fix: tab close uses caller's logical focus, not Chrome's global focus
Default close-tab and tab-close paths used session_manager.get_focused_target(),
which returns Chrome's globally focused tab. In multi-agent mode this let
one agent close another's tab. Now uses agent_focus_target_id first, falling
back to global focus only in single-agent mode.
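Written as a hypothetical helper, the resolution order looks like this. Returning None (refusing to close) when an agent has no focus in multi-agent mode is an assumption; the commit only says global focus is not used there.

```python
from typing import Optional


def resolve_close_target(
    agent_focus_target_id: Optional[str],
    global_focus_target_id: Optional[str],
    multi_agent: bool,
) -> Optional[str]:
    """Pick which tab a default tab-close should act on."""
    if agent_focus_target_id:
        return agent_focus_target_id  # the caller's own logical focus wins
    if multi_agent:
        return None  # never borrow Chrome's global focus from another agent
    return global_focus_target_id  # single-agent mode: global focus is safe
```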

Also: fresh daemon spawn now uses phase-aware state file waiting (15s) instead
of fixed 5s socket polling, consistent with ensure_daemon's probe logic.
2026-03-30 20:04:01 -07:00
ShawnPana
d59c0e6984 fix: CLI reconciliation gaps found by codex review
- ensure_daemon now phase-aware: waits for initializing/starting/
  shutting_down with staleness timeouts, errors on unhealthy sessions
- _close_session only cleans files after confirmed PID death
- _handle_sessions won't delete live daemon's files on stale terminal state
- _probe_session uses socket_pid for split-brain PID resolution
- _is_daemon_process works on Windows (wmic)
- Updated + added robustness tests (codex-authored)
2026-03-30 19:49:57 -07:00
ShawnPana
7c7fd37220 test: add daemon lifecycle tests with real subprocess daemons
15 tests covering state file transitions, _probe_session branches (live
daemon, dead PID, no files, corrupt state), close via socket, close
orphaned daemon, close --all with mixed sessions, sessions listing with
phase column, and stale/terminal state cleanup.

Uses real daemon subprocesses with BROWSER_USE_HOME overridden to temp
dirs. No mocking.
2026-03-30 19:09:00 -07:00
ShawnPana
f222c98bbf feat: unified session probe and cross-platform lifecycle helpers
Replace scattered PID/socket/process checks with _probe_session() that
reads the state file, reconciles PIDs, checks liveness, and probes the
socket without deleting anything. Callers decide cleanup policy.
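A read-only probe in this spirit might look like the sketch below. It is POSIX-only (`os.kill(pid, 0)` and `AF_UNIX`). The state-file keys (`phase`, `pid`), the `SessionProbe` type, and the "corrupt" marker are assumptions, and the PID reconciliation against the socket's reported PID is left out.

```python
import json
import os
import socket
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SessionProbe:
    phase: Optional[str]
    pid: Optional[int]
    pid_alive: bool
    socket_reachable: bool


def is_pid_alive(pid: int) -> bool:
    """POSIX liveness check: signal 0 tests existence without sending anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def probe_session(state_file: Path, sock_path: Path, timeout: float = 0.5) -> SessionProbe:
    """Inspect a session's state file, PID and socket; never deletes anything."""
    phase: Optional[str] = None
    pid: Optional[int] = None
    if state_file.exists():
        try:
            state = json.loads(state_file.read_text())
        except ValueError:  # invalid JSON or undecodable bytes
            state = None
        if isinstance(state, dict):
            phase, pid = state.get("phase"), state.get("pid")
        else:
            phase = "corrupt"  # report it; the caller decides on cleanup
    alive = isinstance(pid, int) and pid > 0 and is_pid_alive(pid)
    reachable = False
    if sock_path.exists():
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.settimeout(timeout)
        try:
            client.connect(str(sock_path))
            reachable = True
        except OSError:
            pass  # stale socket file: nobody is listening
        finally:
            client.close()
    return SessionProbe(phase, pid, alive, reachable)
```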

Adds _is_pid_alive, _is_daemon_process, _terminate_pid (cross-platform
with SIGKILL escalation), _close_session (shared by close and close-all).

sessions command now shows phase column. close and close-all both handle
orphaned daemons via SIGTERM fallback. close polls for PID disappearance
up to 15s before giving up.
2026-03-30 18:47:34 -07:00
ShawnPana
8ade5748e2 feat: add daemon lifecycle state file and shutdown re-entrancy guard
Daemon now writes <session>.state.json at each lifecycle transition
(initializing → ready → starting → running → shutting_down → stopped/failed).
All shutdown triggers funneled through _request_shutdown() to prevent
double cleanup. Startup rollback on failure cleans up browser resources.
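A minimal sketch of a phase-writing daemon with a single shutdown entry point follows, assuming asyncio and a JSON state file. Writing through a temp file plus `os.replace` is an assumed detail so readers never see a half-written file; `DaemonLifecycle` and the JSON keys are hypothetical.

```python
import asyncio
import json
import os
import time
from pathlib import Path
from typing import Optional

PHASES = ("initializing", "ready", "starting", "running",
          "shutting_down", "stopped", "failed")


class DaemonLifecycle:
    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path  # e.g. <session>.state.json
        self._shutdown_task: Optional[asyncio.Task] = None
        self.cleanup_runs = 0

    def write_phase(self, phase: str) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        record = {"phase": phase, "pid": os.getpid(), "updated_at": time.time()}
        tmp = self.state_path.with_name(self.state_path.name + ".tmp")
        tmp.write_text(json.dumps(record))
        os.replace(tmp, self.state_path)  # single rename; atomic on POSIX

    def request_shutdown(self) -> "asyncio.Task[None]":
        """Every shutdown trigger (e.g. signal, idle timeout) calls this."""
        if self._shutdown_task is None:
            loop = asyncio.get_running_loop()
            self._shutdown_task = loop.create_task(self._shutdown())
        return self._shutdown_task  # later callers share the first task

    async def _shutdown(self) -> None:
        self.write_phase("shutting_down")
        self.cleanup_runs += 1
        await asyncio.sleep(0)  # placeholder for closing browser and socket
        self.write_phase("stopped")
```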

Also fixes cloud browser leak: CLIBrowserSession.stop() now explicitly
stops the remote browser via API instead of just disconnecting the
websocket.

Daemon ping response now includes PID for split-brain resolution.
2026-03-30 18:43:21 -07:00
ShawnPana
ac48ebecaf fix: SIGTERM fallback for orphaned daemons in close command
When the daemon's socket is unreachable but the PID file references a
live process, close now sends SIGTERM directly instead of printing
"No active browser session" and leaving the daemon running forever.
2026-03-30 18:39:06 -07:00
ShawnPana
e989797541 fix: sweep orphaned sockets in sessions command
Sockets without a corresponding live PID file were never cleaned up
because cleanup was only triggered per-session on next use. Add a
glob pass in _handle_sessions to remove any .sock file that doesn't
match a live session.
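The sweep is a glob plus a membership check. This sketch assumes sockets are named `<session>.sock` in a single directory and that the caller already knows which sessions are live; `sweep_orphaned_sockets` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def sweep_orphaned_sockets(home: Path, live_sessions: Iterable[str]) -> list[str]:
    """Delete <session>.sock files with no live session; return removed names."""
    live = set(live_sessions)
    removed = []
    for sock in home.glob("*.sock"):
        if sock.stem not in live:
            sock.unlink(missing_ok=True)  # tolerate a concurrent cleanup
            removed.append(sock.name)
    return sorted(removed)
```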
2026-03-29 18:05:56 -07:00
ShawnPana
18c2199e22 fix: serialize multi-agent focus swaps with dispatch lock
Concurrent agents could interleave focus swaps on the shared
BrowserSession, corrupting each other's state. Wrap the entire
swap-execute-restore cycle in _dispatch_lock and separate the
tab-ownership path from single-agent mode.
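The fix boils down to holding one asyncio.Lock across the whole swap-execute-restore cycle. In the sketch below, `SharedSessionSketch` and its `focused` attribute are hypothetical stand-ins for the shared BrowserSession; only the name `_dispatch_lock` comes from the commit.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

R = TypeVar("R")


class SharedSessionSketch:
    def __init__(self) -> None:
        self.focused = "tab-0"
        self._dispatch_lock = asyncio.Lock()

    async def run_with_focus(
        self, tab: str, action: Callable[["SharedSessionSketch"], Awaitable[R]]
    ) -> R:
        # Swap, execute and restore under one lock so two agents can never
        # interleave their focus changes.
        async with self._dispatch_lock:
            previous = self.focused
            self.focused = tab
            try:
                return await action(self)
            finally:
                self.focused = previous  # restore even if the action raises
```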
2026-03-28 15:58:29 -07:00
ShawnPana
4be3525d81 fix: use JS scrollBy instead of CDP gesture for CLI scroll command
CDP input gesture simulation doesn't work in --connect mode (external
Chrome). Switch to window.scrollBy via Runtime.evaluate which works
regardless of connection mode.
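For reference, a Runtime.evaluate request that scrolls the page is just a CDP JSON message like the one built below. How the CLI's CDP client actually sends it is not shown in the commit, and `scroll_by_message` is a hypothetical helper; the `int()` casts keep the expression from carrying arbitrary JavaScript.

```python
import json


def scroll_by_message(msg_id: int, dx: int, dy: int) -> str:
    """Build a CDP Runtime.evaluate message that calls window.scrollBy."""
    params = {
        "expression": f"window.scrollBy({int(dx)}, {int(dy)})",
        "returnByValue": True,
    }
    return json.dumps({"id": msg_id, "method": "Runtime.evaluate", "params": params})
```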
2026-03-28 15:58:24 -07:00
ShawnPana
ae93cee5c2 fix: unify CLI cloud auth so cloud connect uses ~/.browser-use/config.json
cloud connect was silently reading credentials from the library's
~/.config/browseruse/cloud_auth.json, bypassing cloud login/logout
entirely. Now ensure_daemon injects the CLI config's API key into the
daemon subprocess env, so all cloud commands share a single auth source.
2026-03-28 15:56:56 -07:00
ShawnPana
1f8b4a5346 fix: rewrite register tests to use real code path via subprocess 2026-03-27 11:49:47 -07:00
ShawnPana
6bb73ab274 merge upstream/main, resolve install_lite.sh conflict (take upstream) 2026-03-26 11:40:33 -07:00
ShawnPana
464bd167c3 fix: daemon zombie on close, add idle timeout, clean up PID file before exit 2026-03-26 11:31:15 -07:00
shawn pana
eee98ff272 feat(cli): add install-lite.sh script for the CLI (#4534)
## Summary by cubic
Adds a lightweight installer (`install_lite.sh`) that sets up the
`browser-use` CLI with minimal deps and Chromium, avoiding the full
library. Also extracts CLI deps into `requirements-cli.txt`, adds a CI
test, and hardens the installer with a `curl` pre-check and a non-fatal
validation step.

- New Features
  - `install_lite.sh` installs Python 3.11+, `uv`, the CLI (`--no-deps`)
    with minimal deps from `requirements-cli.txt`, and Chromium via
    `uvx playwright` in `~/.browser-use-env`, then configures PATH
    (Linux, macOS, Windows).
  - Installs the `profile-use` helper into `~/.browser-use/bin` for
    profile management.
  - CI test (`tests/ci/test_cli_lite_deps.py`) boots the CLI in a clean
    venv with only minimal deps and checks key imports.
  - Minimal CLI deps pinned in
    `browser_use/skill_cli/requirements-cli.txt`.

- Bug Fixes
  - Validation is non-fatal; next steps are shown even if checks warn.
  - Checks for `curl` before installing `uv` to prevent confusing
    failures.

<sup>Written for commit 7e92031593.</sup>
2026-03-26 09:47:18 -07:00
shawn pana
7e92031593 Merge branch 'main' into install-lite 2026-03-26 09:43:49 -07:00
ShawnPana
e3a994a39d fix: non-fatal validate + curl check before use 2026-03-26 09:40:35 -07:00
ShawnPana
2f41a42eab extract CLI deps to requirements-cli.txt + add CI test 2026-03-26 09:27:18 -07:00
ShawnPana
822a4d948b add aiohttp to lite install deps (needed by LocalBrowserWatchdog._wait_for_cdp_url) 2026-03-26 09:15:29 -07:00
ShawnPana
a2cbe678b7 add lightweight CLI install script 2026-03-26 09:15:29 -07:00
laithrw
76cc49089b fix(mcp): remove oneOf from browser_click schema that breaks Claude API clients (#4212)
## Problem

Closes #4211

`browser_click` in the MCP server uses `oneOf` at the top level of
`inputSchema` to express "provide either `index` OR
`coordinate_x`+`coordinate_y`":

```json
{
  "type": "object",
  "properties": { ... },
  "oneOf": [
    {"required": ["index"]},
    {"required": ["coordinate_x", "coordinate_y"]}
  ]
}
```

Claude's API (and other strict MCP clients) rejects
`oneOf`/`allOf`/`anyOf` at the top level of a tool input schema:

```
400 tools.N.custom.input_schema: input_schema does not support oneOf, allOf, or anyOf at the top level
```

This cascades — it breaks **all** MCP tools in the session, not just
`browser_click`.

## Fix

Remove the `oneOf` block. The constraint is now expressed in the
property descriptions instead. No behaviour change: the runtime guard in
`_click()` already returns a clear error string when neither `index` nor
coordinates are provided.
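The schema and guard then look roughly like this: a flat object schema with the either/or rule moved into descriptions, and the check performed at call time. Property types, description wording, and the error text are assumptions, and `check_click_args` is a hypothetical stand-in for the guard inside `_click()`.

```python
from typing import Any, Dict, Optional

BROWSER_CLICK_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "index": {
            "type": "integer",
            "description": "Element index to click. Provide this OR coordinate_x and coordinate_y.",
        },
        "coordinate_x": {
            "type": "number",
            "description": "X coordinate. Provide this with coordinate_y, OR provide index.",
        },
        "coordinate_y": {
            "type": "number",
            "description": "Y coordinate. Provide this with coordinate_x, OR provide index.",
        },
    },
    # No top-level oneOf/allOf/anyOf: strict clients reject them.
}


def check_click_args(
    index: Optional[int] = None,
    coordinate_x: Optional[float] = None,
    coordinate_y: Optional[float] = None,
) -> Optional[str]:
    """Return an error string if neither targeting mode is complete, else None."""
    if index is not None:
        return None
    if coordinate_x is not None and coordinate_y is not None:
        return None
    return "Error: provide either index, or both coordinate_x and coordinate_y."
```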

## Diff summary

- `-4 lines`: remove the `oneOf: [...]` block entirely
- `+3 lines`: update the three property descriptions to say "Provide
this OR ..."

The `_click()` handler is untouched.

---
## Summary by cubic
Remove the top-level oneOf from the `browser_click` tool input schema so
Claude and other strict MCP clients accept it and the session doesn’t
break. No behavior change; `_click()` still validates, and field
descriptions now clarify using `index` or `coordinate_x`+`coordinate_y`.

<sup>Written for commit 6094cddd90.</sup>
2026-03-26 11:08:20 -04:00
laithrw
6094cddd90 Merge branch 'main' into fix/mcp-browser-click-oneof-schema 2026-03-26 10:59:46 -04:00
ShawnPana
24a0df7e72 add aiohttp to lite install deps (needed by LocalBrowserWatchdog._wait_for_cdp_url) 2026-03-25 22:27:43 -07:00
ShawnPana
d8aeaa9ab4 add lightweight CLI install script 2026-03-25 22:15:45 -07:00
laithrw
8f201decd3 fix: stop cloud sessions when reconnecting via cdp_url (#4096)
## Summary
- derive cloud browser session UUID from Browser Use CDP host (e.g.
`<uuid>.cdp0.browser-use.com`)
- use that UUID as fallback in `BrowserSession.on_BrowserStopEvent` when
`_cloud_browser_client.current_session_id` is empty
- ensure reconnected sessions created via `BrowserSession(cdp_url=...)`
are properly stopped in cloud, not just reset locally

## Why
Reconnected cloud sessions (a common MFA resume pattern) can leak browser
instances because the stop logic only used `current_session_id`, which is
populated only when the same process created the cloud browser. In
reconnect flows, that field is often unset.
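Deriving the ID from the host is a small parsing step. This sketch assumes the host is exactly `<uuid>.cdpN.browser-use.com` and validates the UUID with the standard library; `session_id_from_cdp_url` and `resolve_stop_id` are hypothetical names, not the helper added in the PR.

```python
import re
import uuid
from typing import Optional
from urllib.parse import urlparse

_CDP_HOST = re.compile(r"(?P<id>[0-9a-f-]{36})\.cdp\d+\.browser-use\.com")


def session_id_from_cdp_url(cdp_url: str) -> Optional[str]:
    """Return the session UUID embedded in a Browser Use CDP host, if any."""
    host = urlparse(cdp_url).hostname or ""  # hostname is lowercased
    match = _CDP_HOST.fullmatch(host)
    if match is None:
        return None
    try:
        return str(uuid.UUID(match.group("id")))
    except ValueError:
        return None  # right length but not a real UUID


def resolve_stop_id(current_session_id: Optional[str], cdp_url: Optional[str]) -> Optional[str]:
    # Prefer the ID this process created; fall back to the one in the URL.
    return current_session_id or (session_id_from_cdp_url(cdp_url) if cdp_url else None)
```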



---
## Summary by cubic
Ensure cloud browser sessions are stopped when reconnecting via `cdp_url`,
preventing leaked instances during MFA/resume flows. We derive the
session UUID from the CDP URL host and pass it to `stop_browser` as the
explicit ID.

- **Bug Fixes**
  - Added a helper to extract the session UUID from the CDP host (e.g.,
    `<uuid>.cdpN.browser-use.com`).
  - Updated stop logic to use `current_session_id` or the derived UUID and
    call `stop_browser(id)`; logs now include the session ID.

<sup>Written for commit 4c883feb7c.</sup>
2026-03-25 19:39:19 -04:00
ShawnPana
e09ba11ef1 feat: remove event bus dependency, add dialog handling
- Remove event bus tab listeners from daemon — track tabs directly
- Remove dead event bus fallback branches from commands/browser.py
- Replace SwitchTabEvent/CloseTabEvent dispatches with direct CDP calls
- Update python_session.py to use ActionHandler instead of event bus
- Add JS dialog handler (alert/confirm/prompt) to CLIBrowserSession
- Surface auto-dismissed popup messages in state output
- Only dummy EventBus() for watchdog constructors remains (unavoidable)
2026-03-25 16:37:26 -07:00
laithrw
4c883feb7c Merge branch 'main' into fix/cloud-session-stop-cdp-reconnect 2026-03-25 19:34:25 -04:00
laithrw
6ce58afbc3 fix tunnel process kill and daemon spawn on windows (#4530)
Resolves #4352

---
## Summary by cubic
Fixes Windows tunnel lifecycle bugs by spawning `cloudflared` as a
detached daemon and reliably terminating it to prevent orphaned
processes. Resolves #4352.

- **Bug Fixes**
  - Windows spawn uses `CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW`; Unix
    uses `start_new_session=True`.
  - Windows kill uses Win32 `OpenProcess/TerminateProcess` with brief
    polling; Unix uses `SIGTERM` with `SIGKILL` fallback.
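The platform split can be sketched with the standard `subprocess` flags. This version terminates through the `Popen` handle (`terminate()` sends SIGTERM on POSIX and calls TerminateProcess on Windows) rather than reopening the process by PID with OpenProcess as the fix does, so it only applies while the caller still holds the handle. All three helper names are hypothetical.

```python
import subprocess
import sys
from typing import Any, Dict, List


def detached_popen_kwargs() -> Dict[str, Any]:
    """Popen kwargs that detach a child (e.g. cloudflared) from the caller."""
    if sys.platform == "win32":
        return {
            "creationflags": subprocess.CREATE_NEW_PROCESS_GROUP
            | subprocess.CREATE_NO_WINDOW
        }
    return {"start_new_session": True}  # setsid(): own session, no shared signals


def spawn_detached(args: List[str]) -> subprocess.Popen:
    return subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        **detached_popen_kwargs(),
    )


def stop_detached(proc: subprocess.Popen, grace: float = 3.0) -> int:
    """Ask politely, then force; returns the exit code."""
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL fallback on POSIX
        return proc.wait()
```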

<sup>Written for commit 4950c3baa1.</sup>
2026-03-25 19:34:17 -04:00
laithrw
4950c3baa1 Merge branch 'main' into fix-4352 2026-03-25 19:32:05 -04:00
Laith Weinberger
9b401cf04c fix tunnel process kill and daemon spawn on windows 2026-03-25 19:28:47 -04:00
laithrw
86d33635c5 fix(agent): prevent stale history and stuck step counter on timeout (#4481)
## Summary

- Clear `last_model_output` and `last_result` at the start of `step()`
to prevent stale data from previous steps being recorded in history on
timeout
- Increment `n_steps` in `_execute_step`'s timeout handler to prevent
the main loop from retrying the same step number

Fixes #4480

## What changed

**`step()` — clear stale state at entry:**
```diff
  self.step_start_time = time.time()
+
+ # Clear previous step state to prevent stale data from being recorded
+ self.state.last_model_output = None
+ self.state.last_result = None
+
  browser_state_summary = None
```

**`_execute_step()` — ensure counter advances on timeout:**
```diff
  self.state.last_result = [ActionResult(error=error_msg)]
+ # Ensure step counter advances on timeout
+ if self.state.n_steps == step + 1:
+     self.state.n_steps += 1
```

The guard `if self.state.n_steps == step + 1` prevents double-increment
when `_finalize()` has already incremented on the normal path.

---
## Summary by cubic
Fixes stale history entries and a stuck step counter when a step times
out. Clears per-step state after `_prepare_context` and ensures
`n_steps` advances on timeout.

- **Bug Fixes**
  - Clear `last_model_output` and `last_result` right after
    `_prepare_context` (before LLM/action calls), so prompts still include
    the previous step's output while timeouts no longer record stale data.
  - In `_execute_step()`, increment `n_steps` after a timeout (with a
    guard) so the loop moves forward.

<sup>Written for commit c0a11dc61e.</sup>
2026-03-25 19:28:20 -04:00
laithrw
8fcbe9f2fa Merge branch 'main' into fix/step-timeout-counter-and-stale-history 2026-03-25 19:22:19 -04:00
laithrw
34597151bc fix anthropic action field double-serialization (#4529)
Resolves #4510

---
## Summary by cubic
Fixes double-serialized action fields in Anthropic tool responses to
prevent schema validation errors and failed tool calls. Preserves
original tracebacks in the fallback path.

- **Bug Fixes**
  - Parse nested JSON strings in tool inputs; normalize newline/carriage
    return/tab escapes when needed.
  - Applied in `browser_use/llm/anthropic/chat.py` and
    `browser_use/llm/aws/chat_anthropic.py`.
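One way to undo the double encoding is to walk the tool input and decode any string that looks like a JSON object or array. This sketch swaps in `json.loads(..., strict=False)`, which accepts raw newlines and tabs inside JSON strings, instead of the escape normalization the fix describes; `unwrap_json_strings` is a hypothetical name. It will also decode a plain-text field that happens to be valid JSON, which the real fix may guard against differently.

```python
import json
from typing import Any


def unwrap_json_strings(value: Any) -> Any:
    """Recursively decode string values that are themselves JSON objects/arrays."""
    if isinstance(value, dict):
        return {k: unwrap_json_strings(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_json_strings(v) for v in value]
    if isinstance(value, str):
        stripped = value.strip()
        if stripped[:1] in ("{", "["):
            try:
                # strict=False tolerates raw control characters inside strings.
                return unwrap_json_strings(json.loads(stripped, strict=False))
            except json.JSONDecodeError:
                return value  # looked like JSON but was not; keep as text
    return value
```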

<sup>Written for commit f74a4435b1.</sup>
2026-03-25 19:20:51 -04:00
laithrw
f74a4435b1 Merge branch 'main' into fix-4510 2026-03-25 19:18:32 -04:00