Commit Graph

30 Commits

Author SHA1 Message Date
Saurav Panda
44f7ead5cd Drop timeout on recording finalize in daemon shutdown
`asyncio.wait_for(stop_recording(), timeout=5.0)` could expire while the
ffmpeg encoder was still flushing, leading the daemon's subsequent
`os._exit(0)` to kill the executor thread mid-write and leave the exact
truncated MP4 this hook was meant to prevent. `stop_recording()` already
offloads the blocking close to an executor, so awaiting it directly is
safe — and if it genuinely hangs, a stuck daemon is a clearer failure
signal than silent video corruption.

Verified end-to-end: start recording → `open` → `close` (no explicit
`record stop`) now produces a decodable MP4 with the captured frames.
2026-04-20 15:38:34 -07:00
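The failure mode described in 44f7ead5cd can be reproduced in miniature. The sketch below is illustrative only: `blocking_finalize` is a hypothetical stand-in for the encoder flush, and the `stop_recording` shown here is a simplified assumption, not the project's implementation. It shows that `asyncio.wait_for` returning on timeout does not mean the executor work has finished, whereas awaiting directly does.

```python
import asyncio
import time
from typing import List


def blocking_finalize(log: List[str]) -> None:
    # Hypothetical stand-in for a blocking encoder flush.
    time.sleep(0.3)
    log.append("finalized")


async def stop_recording(log: List[str]) -> None:
    # Offload the blocking close to the default executor.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_finalize, log)


async def shutdown_with_timeout(log: List[str], timeout: float) -> bool:
    # Old behavior: give up waiting after `timeout` seconds.
    try:
        await asyncio.wait_for(stop_recording(log), timeout=timeout)
    except asyncio.TimeoutError:
        pass
    # Report whether the file was finalized at the moment shutdown proceeds.
    return "finalized" in log


async def shutdown_direct(log: List[str]) -> bool:
    # New behavior: wait for the finalize to complete, however long it takes.
    await stop_recording(log)
    return "finalized" in log
```

With a timeout shorter than the flush, `shutdown_with_timeout` returns while the worker thread is still writing. In the real daemon this is the point where `os._exit(0)` would cut the write short.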
Saurav Panda
132756dabb Address PR review feedback for record start/stop
- `on_BrowserConnectedEvent` now catches `RuntimeError` from
  `start_recording()` so sessions with `record_video_dir` configured but
  missing `[video]` extras (or a viewport that can't be sized) keep
  starting — prior graceful-degradation behavior is restored.
- Lazy `RecordingWatchdog` in the CLI handler now calls
  `attach_to_session()`, so `AgentFocusChangedEvent` / `BrowserStopEvent`
  handlers are wired correctly if the session dispatches them.
- Daemon shutdown finalizes any in-progress recording before tearing the
  browser down, preventing truncated MP4s on `close`, idle timeout, or
  signal-driven exit.
- Added regression test that monkeypatches `start_recording` to raise and
  asserts `on_BrowserConnectedEvent` swallows it without breaking startup.
2026-04-20 15:11:49 -07:00
sauravpanda
ca2185ba61 fix: create token temp file with 0o600 at open() time; raise on failure
- Use os.open() with mode 0o600 instead of write-then-chmod to eliminate
  the permission race window where the temp file is briefly world-readable.
- Raise instead of warn when token file write fails: a daemon that cannot
  persist its auth token is permanently unauthorized for all clients, so
  failing fast is correct (identified by cubic).
2026-04-02 17:58:12 -07:00
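A minimal sketch of the create-with-mode approach from ca2185ba61. The function name `write_token_file` and the `.tmp` suffix are assumptions for illustration. The mode is passed to `os.open()` so the file never exists with broader permissions, and any `OSError` propagates to the caller instead of being logged as a warning.

```python
import os


def write_token_file(path: str, token: str) -> None:
    tmp = f"{path}.tmp"  # hypothetical temp-file naming
    # Remove a stale temp file so O_EXCL below cannot trip over it.
    try:
        os.unlink(tmp)
    except FileNotFoundError:
        pass
    # O_EXCL guarantees we create the file, so the 0o600 mode is applied
    # at creation time rather than by a later chmod.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    # Atomically move the fully written file into place.
    os.replace(tmp, path)
```

On POSIX systems the process umask can only remove permission bits from the requested mode, so the file cannot end up more permissive than `0o600`.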
sauravpanda
a05a053da6 fix: add per-session auth token to daemon socket to prevent unauthorized code execution
Generate a secrets.token_hex(32) on daemon startup, write it atomically
to ~/.browser-use/{session}.token (chmod 0o600), and validate it on every
incoming request via hmac.compare_digest. The client reads the token file
and includes it in each send_command() call.

This closes the arbitrary-code-execution vector where any local process
could connect to the deterministic Windows TCP port (or a world-readable
Unix socket) and dispatch the 'python' action to run eval()/exec() as the
daemon owner.
2026-04-02 17:41:15 -07:00
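The token check in a05a053da6 can be sketched as follows. The request shape (a dict with a `"token"` key) and the function names are assumptions made for this example; only `secrets.token_hex(32)` and `hmac.compare_digest` come from the commit message.

```python
import hmac
import secrets


def generate_token() -> str:
    # 32 random bytes, rendered as 64 hexadecimal characters.
    return secrets.token_hex(32)


def is_authorized(request: dict, expected_token: str) -> bool:
    # "token" is a hypothetical field name for the client-supplied value.
    supplied = request.get("token")
    if not isinstance(supplied, str):
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

Encoding both values to bytes before comparing keeps `compare_digest` from raising on non-ASCII input from an untrusted client.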
ShawnPana
b1522b5e23 fix: sessions shows CDP URL for cloud sessions too
Ping response now returns live CDP URL from the browser session
(not just the constructor arg). Cloud sessions show their
provisioned CDP URL.
2026-04-01 22:37:23 -07:00
ShawnPana
0bf1f02d97 fix: CI failures — ruff formatting, type errors, test_setup_command
- Ruff format all skill_cli and test files
- Fix type: get_config_value returns str|int|None, callers cast properly
- Fix type: BrowserWrapper.actions is non-optional (always provided)
- Fix type: config comparison uses 'is' not '=='
- Rewrite test_setup_command for new setup.handle(yes=True) API
- Add None guard in test_cli_lifecycle for state file
2026-04-01 19:58:33 -07:00
ShawnPana
ca05f46352 refactor: remove --agent/register/tab-ownership, sessions-as-agents model
Multi-agent isolation is now achieved through separate sessions
(--session NAME), each with its own browser. Removed:
- register command and agents.json
- --agent flag and agent_id plumbing
- TabOwnershipManager and all tab locking logic
- dispatch lock and focus swapping between agents
- tab_ownership.py (deleted)
- test_tab_ownership.py (deleted)

Simplified tab commands: no lock checks, no _tab_list injection,
no _resolved_target_id params. agent_focus_target_id stays for
single-agent tab tracking.

Tested: 3 concurrent subagents on separate cloud sessions,
3 concurrent subagents on separate headless Chromium sessions.
2026-04-01 17:34:46 -07:00
ShawnPana
73a926caa6 feat: add connect command + --agent flag, decouple multi-agent from Chrome
- browser-use connect: one-time command to discover and connect to local
  Chrome (like cloud connect but for local)
- --agent INDEX: per-command flag for multi-agent tab isolation, works
  with any browser mode (cloud, profile, cdp-url, headless)
- register is now per-session ({session}.agents.json)
- --connect deprecated with migration message
- SKILL.md updated for new connect/--agent workflow
- Tested: 3 concurrent agents on shared cloud browser session
2026-04-01 16:52:56 -07:00
ShawnPana
032e73eec7 feat: config-driven proxy/timeout for cloud connect, enable recording
cloud connect reads cloud_connect_proxy and cloud_connect_timeout from
~/.browser-use/config.json. Recording always enabled via enableRecording
default on CreateBrowserRequest. No CLI flags — edit config for custom
settings, use cloud v2 REST for full control.
2026-03-31 13:44:27 -07:00
ShawnPana
b4b2a4c18d feat: zero-config cloud connect with auto-managed profile
cloud connect now works with no flags. On first use, creates a
"Browser Use CLI" profile via the Cloud API and saves the ID to
config.json. Subsequent connects reuse it (validates on each call,
recreates if deleted).

Removed --timeout, --proxy-country, --profile-id flags and their
plumbing through daemon/sessions. Power users who need custom browser
settings use cloud v2 POST /browsers directly.
2026-03-31 12:28:42 -07:00
ShawnPana
526bde4d0f refactor: remove standalone switch and close-tab command aliases
These were duplicates of tab switch and tab close with separate
lock-checking and focus-resolution code paths. Having one way to
do each thing reduces maintenance surface and avoids isolation bugs.
2026-03-30 20:09:50 -07:00
ShawnPana
8ade5748e2 feat: add daemon lifecycle state file and shutdown re-entrancy guard
Daemon now writes <session>.state.json at each lifecycle transition
(initializing → ready → starting → running → shutting_down → stopped/failed).
All shutdown triggers funneled through _request_shutdown() to prevent
double cleanup. Startup rollback on failure cleans up browser resources.

Also fixes cloud browser leak: CLIBrowserSession.stop() now explicitly
stops the remote browser via API instead of just disconnecting the
websocket.

Daemon ping response now includes PID for split-brain resolution.
2026-03-30 18:43:21 -07:00
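One way to picture the state file and the re-entrancy guard from 8ade5748e2 is the sketch below. The class name `DaemonSketch`, the JSON fields, and the cleanup body are all assumptions. The point is only that every trigger goes through one method, which creates the shutdown task at most once.

```python
import asyncio
import json
import os
from pathlib import Path
from typing import Optional


class DaemonSketch:
    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path
        self._shutdown_task: Optional[asyncio.Task] = None
        self.cleanup_runs = 0

    def write_state(self, phase: str) -> None:
        # Hypothetical schema: the current phase plus the daemon PID.
        payload = {"phase": phase, "pid": os.getpid()}
        tmp = self.state_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(payload))
        os.replace(tmp, self.state_path)  # atomic swap into place

    def request_shutdown(self) -> asyncio.Task:
        # Re-entrancy guard: later callers get the task that already exists.
        if self._shutdown_task is None:
            loop = asyncio.get_running_loop()
            self._shutdown_task = loop.create_task(self._shutdown())
        return self._shutdown_task

    async def _shutdown(self) -> None:
        self.write_state("shutting_down")
        self.cleanup_runs += 1  # stands in for browser cleanup
        await asyncio.sleep(0)
        self.write_state("stopped")
```

Returning the stored task also lets the caller await it, so cleanup can complete before the event loop is torn down.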
ShawnPana
ac48ebecaf fix: SIGTERM fallback for orphaned daemons in close command
When the daemon's socket is unreachable but the PID file references a
live process, close now sends SIGTERM directly instead of printing
"No active browser session" and leaving the daemon running forever.
2026-03-30 18:39:06 -07:00
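The fallback in ac48ebecaf amounts to "check whether the PID is alive, then signal it". A POSIX-only sketch with hypothetical helper names follows. Signal `0` performs the existence check without delivering anything. This is not safe on Windows, where `os.kill` treats its arguments differently.

```python
import os
import signal


def pid_is_alive(pid: int) -> bool:
    # POSIX only: signal 0 checks for existence without sending a signal.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def terminate_orphan(pid: int) -> bool:
    # Returns True if a live process was found and sent SIGTERM.
    if not pid_is_alive(pid):
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

A real implementation would also need to confirm that the PID still belongs to the daemon, since PIDs can be reused after a process exits.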
ShawnPana
18c2199e22 fix: serialize multi-agent focus swaps with dispatch lock
Concurrent agents could interleave focus swaps on the shared
BrowserSession, corrupting each other's state. Wrap the entire
swap-execute-restore cycle in _dispatch_lock and separate the
tab-ownership path from single-agent mode.
2026-03-28 15:58:29 -07:00
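The swap-execute-restore cycle from 18c2199e22 can be sketched with an `asyncio.Lock`. The `SharedSession` and `Dispatcher` classes are hypothetical, and the `await asyncio.sleep(0)` stands in for real command execution. Without the lock, that yield point is where a second agent could overwrite the focus.

```python
import asyncio
from typing import Optional


class SharedSession:
    def __init__(self) -> None:
        self.focus: Optional[str] = None


class Dispatcher:
    def __init__(self, session: SharedSession) -> None:
        self.session = session
        self._dispatch_lock = asyncio.Lock()

    async def run_for_agent(self, agent_focus: str) -> str:
        # The whole cycle runs under the lock, so swaps cannot interleave.
        async with self._dispatch_lock:
            previous = self.session.focus
            self.session.focus = agent_focus  # swap
            try:
                await asyncio.sleep(0)  # execute (yields to other tasks)
                return self.session.focus  # focus seen by the command
            finally:
                self.session.focus = previous  # restore
```

Each concurrent call observes its own focus value because the lock holds from the swap through the restore.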
ShawnPana
6bb73ab274 merge upstream/main, resolve install_lite.sh conflict (take upstream)
2026-03-26 11:40:33 -07:00
ShawnPana
464bd167c3 fix: daemon zombie on close, add idle timeout, clean up PID file before exit
2026-03-26 11:31:15 -07:00
ShawnPana
e09ba11ef1 feat: remove event bus dependency, add dialog handling
- Remove event bus tab listeners from daemon — track tabs directly
- Remove dead event bus fallback branches from commands/browser.py
- Replace SwitchTabEvent/CloseTabEvent dispatches with direct CDP calls
- Update python_session.py to use ActionHandler instead of event bus
- Add JS dialog handler (alert/confirm/prompt) to CLIBrowserSession
- Surface auto-dismissed popup messages in state output
- Only a dummy EventBus() for watchdog constructors remains (unavoidable)
2026-03-25 16:37:26 -07:00
Laith Weinberger
a31809b83d preserve error exit code
2026-03-25 17:15:34 -04:00
Laith Weinberger
c9ea2fb1cf fix daemon orphaning when external browser dies on --connect/--cdp-url
2026-03-25 17:07:03 -04:00
ShawnPana
7bb2741292 feat: lightweight CLIBrowserSession — no watchdogs, no event bus
Subclass BrowserSession as CLIBrowserSession that calls connect()
directly instead of start(). Skips all 13 watchdogs and event bus
handler registration. Actions execute via ActionHandler which calls
DefaultActionWatchdog methods directly and DomService for DOM snapshots.

- CLIBrowserSession.start() → connect() only (CDP + SessionManager)
- CLIBrowserSession.stop() → close websocket directly (no BrowserStopEvent)
- CLIBrowserSession.kill() → Browser.close + disconnect
- ActionHandler wraps DefaultActionWatchdog for click/type/scroll/keys/etc
- DomService called directly for state (no DOMWatchdog)
- Monkey-patches _enable_page_monitoring to no-op after initial connect
- Disables auto-reconnect (_intentional_stop = True)
- Falls back to event bus path if ActionHandler is not available
2026-03-25 13:28:12 -07:00
ShawnPana
f819d3e3a6 feat: add tab command (list, new, switch, close) and agent registration
- `browser-use register` assigns numeric agent index for --connect mode
- `--connect <index>` requires explicit agent index (no more bare --connect)
- `tab list` shows all tabs with lock status per agent
- `tab new [url]` creates a new tab without visually switching
- `tab switch <index>` changes agent focus without activating Chrome tab
- `tab close <index> [index...]` closes multiple tabs in one command
- Agent registry in ~/.browser-use/agents.json with 5min expiry
- Improved error messages guide agents to register or use their own tab
- Session lock prevents double BrowserSession creation on simultaneous connect
- Updated SKILL.md with register workflow and tab commands
2026-03-24 17:27:38 -07:00
ShawnPana
31694df283 feat: multi-agent tab isolation for --connect mode
Multiple agents can share one browser via --connect without interfering
with each other. Each agent registers with `browser-use register` to get
a numeric index, then passes it with `--connect <index>` on every command.

- Tab locking: mutating commands (click, type, open) lock the tab to the
  agent. Other agents get an error if they try to mutate the same tab.
  Read-only commands (state, screenshot) work on any tab.
- Agent registry: agents.json tracks registered agents with timestamps.
  Expired agents (5min inactive) get cleaned up automatically.
- Session lock: prevents double BrowserSession creation when two agents
  connect simultaneously.
- Focus swap: daemon swaps agent_focus_target_id and cached_selector_map
  per-agent before each command, so element indices are isolated.
2026-03-24 16:38:53 -07:00
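The tab-locking rule from 31694df283 (mutating commands claim a tab, read-only commands pass through) might look roughly like this. The class and exception names are invented for the example, and the command set lists only the commands named in the commit message.

```python
from typing import Dict

# Commands named in the commit message as mutating.
MUTATING = {"click", "type", "open"}


class TabLockError(Exception):
    pass


class TabLocks:
    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}  # tab id -> agent index

    def check(self, agent: int, tab: str, command: str) -> None:
        # Read-only commands (e.g. state, screenshot) work on any tab.
        if command not in MUTATING:
            return
        # The first mutating agent becomes the owner of the tab.
        owner = self._owner.setdefault(tab, agent)
        if owner != agent:
            raise TabLockError(f"tab {tab} is locked by agent {owner}")
```

This sketch omits the 5-minute expiry of inactive agents described in the same commit.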
ShawnPana
54f3febdfa address PR review comments: fix cloud v2 --help, guard double signal, error to stderr, stale port
- Narrow cloud --help intercept to only fire when --help is immediately
  after 'cloud', so 'cloud v2 --help' still shows OpenAPI endpoints
- Guard signal handler against concurrent shutdown tasks on repeated signals
- Route error response bodies to stderr in cloud REST commands
- Replace stale port 49200 in README Windows troubleshooting
2026-03-19 22:04:34 -07:00
ShawnPana
c5885e9e06 store shutdown task reference to prevent browser orphaning on signal exit
await the shutdown task in run() so browser cleanup (kill/stop) completes
before the event loop tears down. Previously, asyncio.run() could cancel
the shutdown mid-cleanup, leaving Chrome processes orphaned.
2026-03-19 21:55:14 -07:00
ShawnPana
9980a9ea4f remove redundant string quotes on type annotations in daemon.py
2026-03-19 21:21:18 -07:00
ShawnPana
bff2918558 add --connect flag for Chrome auto-discovery and fix daemon shutdown for external browsers
Adds `--connect` to auto-discover running Chrome instances via DevToolsActivePort
files and well-known port probing, eliminating manual CDP URL construction. Fixes
daemon process hanging on `close` when connected to external browsers (--connect,
--cdp-url, cloud) by calling stop() (disconnect) instead of kill() (terminate).
2026-03-18 09:41:58 -07:00
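As a rough illustration of the `DevToolsActivePort` discovery mentioned in bff2918558, the sketch below parses the file contents into a WebSocket URL. It assumes the common layout of that file (the port on the first line, the browser target path on the second) and a loopback host. The function name is hypothetical and this is not the project's discovery code.

```python
from typing import Optional


def parse_devtools_active_port(text: str) -> Optional[str]:
    # Assumed layout: line 1 is the port, line 2 is the browser target path.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].isdigit():
        return None
    port = int(lines[0])
    path = lines[1]
    if not path.startswith("/"):
        return None
    return f"ws://127.0.0.1:{port}{path}"
```

A caller would read the file from each candidate Chrome profile directory and fall back to probing well-known ports when no file parses.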
ShawnPana
3318f56318 simplify CLI infrastructure: single-session daemon, remove install modes, streamline setup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:12:41 -07:00
ShawnPana
0dc2fc5a6f remove run command and agent infrastructure from CLI
The `run` command pulled in heavy SDK dependencies (openai, anthropic,
google), had a bug (await on sync get_llm), and is superseded by
`browser-use cloud` for agent execution. CLI is now purely a browser
automation interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:44:13 -07:00
ShawnPana
4be221386b changes
2026-03-11 12:57:50 -07:00
ShawnPana
859cb97063 simplify daemon architecture: single session, socket-as-liveness, no PID/lock files
Replace the multi-session server (server.py, SessionRegistry, portalocker locking,
PID files, orphan detection) with a minimal daemon (daemon.py) that holds one
BrowserSession in memory. Socket file existence = alive. Auto-exits when browser
dies via CDP watchdog.

-2277 lines, +142 lines across 20 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:05:44 -08:00