Wrapping `stop_recording()` in `asyncio.wait_for(..., timeout=5.0)` meant
the timeout could expire while the ffmpeg encoder was still flushing,
after which the daemon's subsequent `os._exit(0)` would kill the executor
thread mid-write and leave exactly the truncated MP4 this hook was meant
to prevent. `stop_recording()` already offloads the blocking close to an
executor, so awaiting it directly is safe. If it genuinely hangs, a stuck
daemon is a clearer failure signal than silent video corruption.
Verified end-to-end: start recording → `open` → `close` (no explicit
`record stop`) now produces a decodable MP4 with the captured frames.
- `on_BrowserConnectedEvent` now catches `RuntimeError` from
`start_recording()`, so sessions with `record_video_dir` configured but
missing `[video]` extras (or with a viewport that can't be sized) still
start. This restores the previous graceful-degradation behavior.
- Lazy `RecordingWatchdog` in the CLI handler now calls
`attach_to_session()`, so `AgentFocusChangedEvent` / `BrowserStopEvent`
handlers are wired correctly if the session dispatches them.
- Daemon shutdown finalizes any in-progress recording before tearing the
browser down, preventing truncated MP4s on `close`, idle timeout, or
signal-driven exit.
- Added regression test that monkeypatches `start_recording` to raise and
asserts `on_BrowserConnectedEvent` swallows it without breaking startup.
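A minimal sketch of the shutdown ordering described above, assuming a recorder whose `stop()` runs its blocking close in the default executor. `Recorder`, `Browser`, and `shutdown` are hypothetical stand-ins, not the daemon's actual classes; the point is that the recording is awaited to completion, with no `wait_for` timeout, before the browser is torn down.

```python
import asyncio
import time


class Recorder:
    """Hypothetical recorder whose stop() runs a blocking flush in an executor."""

    def __init__(self) -> None:
        self.active = True
        self.finalized = False

    def _blocking_close(self) -> None:
        time.sleep(0.05)  # stands in for the encoder flushing its last frames
        self.finalized = True

    async def stop(self) -> None:
        loop = asyncio.get_running_loop()
        # Runs off the event loop but is awaited to completion (no wait_for timeout)
        await loop.run_in_executor(None, self._blocking_close)
        self.active = False


class Browser:
    """Hypothetical browser handle that just records when it was torn down."""

    def __init__(self, log: list) -> None:
        self.log = log

    async def kill(self) -> None:
        self.log.append("browser killed")


async def shutdown(recorder: Recorder, browser: Browser, log: list) -> None:
    # Finalize any in-progress recording before the browser goes away
    if recorder.active:
        await recorder.stop()
        log.append("recording finalized")
    await browser.kill()
    # Only after this point would the daemon call os._exit(0)


log: list = []
recorder = Recorder()
asyncio.run(shutdown(recorder, Browser(log), log))
print(log)  # ['recording finalized', 'browser killed']
```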
- Use os.open() with mode 0o600 instead of write-then-chmod to eliminate
the permission race window where the temp file is briefly world-readable.
- Raise instead of warn when token file write fails: a daemon that cannot
persist its auth token is permanently unauthorized for all clients, so
failing fast is correct (identified by cubic).
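The permission fix can be sketched as follows, assuming a POSIX system (Windows largely ignores these mode bits). `write_private_file` is a hypothetical helper name: the temp file gets mode 0o600 in the same `os.open()` call that creates it, is written and fsynced, and is then atomically renamed into place with `os.replace()`; any failure re-raises rather than warning.

```python
import os
import tempfile


def write_private_file(path: str, data: str) -> None:
    """Create a 0o600 temp file, write data, then atomically rename it into place."""
    directory = os.path.dirname(path) or "."
    tmp_path = os.path.join(directory, f".{os.path.basename(path)}.{os.getpid()}.tmp")
    # Permissions are set at creation time (no write-then-chmod window);
    # O_EXCL refuses to open a file that already exists at tmp_path.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        # Remove the partial temp file, then re-raise so the caller fails fast
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


demo_dir = tempfile.mkdtemp()
token_path = os.path.join(demo_dir, "default.token")
write_private_file(token_path, "example-token")
print(oct(os.stat(token_path).st_mode & 0o777))  # 0o600 on POSIX with a typical umask
```

The umask can only clear bits from 0o600, never add group or world access, so the file is private from the moment it exists.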
Generate a secrets.token_hex(32) on daemon startup, write it atomically
to ~/.browser-use/{session}.token (chmod 0o600), and validate it on every
incoming request via hmac.compare_digest. The client reads the token file
and includes it in each send_command() call.
This closes the arbitrary-code-execution vector where any local process
could connect to the deterministic Windows TCP port (or a world-readable
Unix socket) and dispatch the 'python' action to run eval()/exec() as the
daemon owner.
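A minimal sketch of the token check using only the standard library. The request shape (a dict with a `token` key) and the helper names are assumptions for illustration; `secrets.token_hex(32)` and `hmac.compare_digest` are the calls named above.

```python
import hmac
import secrets


def generate_token() -> str:
    # 32 random bytes rendered as 64 hex characters
    return secrets.token_hex(32)


def is_authorized(request: dict, expected_token: str) -> bool:
    """Constant-time check of the token a client sent with its request."""
    supplied = request.get("token")
    if not isinstance(supplied, str):
        return False
    # compare_digest does not leak, via timing, how many leading characters matched
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


token = generate_token()
request = {"action": "python", "token": token}  # hypothetical request shape
print(is_authorized(request, token))                # True
print(is_authorized({"action": "python"}, token))   # False
```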
- Ruff format all skill_cli and test files
- Fix type: get_config_value returns str|int|None, callers cast properly
- Fix type: BrowserWrapper.actions is non-optional (always provided)
- Fix type: config comparison uses 'is' not '=='
- Rewrite test_setup_command for new setup.handle(yes=True) API
- Add None guard in test_cli_lifecycle for state file
Multi-agent isolation is now achieved through separate sessions
(--session NAME), each with its own browser. Removed:
- register command and agents.json
- --agent flag and agent_id plumbing
- TabOwnershipManager and all tab locking logic
- dispatch lock and focus swapping between agents
- tab_ownership.py (deleted)
- test_tab_ownership.py (deleted)
Simplified tab commands: no lock checks, no _tab_list injection,
no _resolved_target_id params. agent_focus_target_id stays for
single-agent tab tracking.
Tested: 3 concurrent subagents on separate cloud sessions,
3 concurrent subagents on separate headless Chromium sessions.
- browser-use connect: one-time command to discover and connect to local
Chrome (like cloud connect but for local)
- --agent INDEX: per-command flag for multi-agent tab isolation, works
with any browser mode (cloud, profile, cdp-url, headless)
- register is now per-session ({session}.agents.json)
- --connect deprecated with migration message
- SKILL.md updated for new connect/--agent workflow
- Tested: 3 concurrent agents on shared cloud browser session
cloud connect reads cloud_connect_proxy and cloud_connect_timeout from
~/.browser-use/config.json. Recording is always enabled via the
enableRecording default on CreateBrowserRequest. There are no CLI flags:
edit the config for custom settings, or use the cloud v2 REST API for full
control.
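Reading those two keys might look like this. It is a sketch assuming config.json is a flat JSON object; `load_cloud_connect_settings` is a hypothetical helper, and `None` simply means "not set, use the built-in default".

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_cloud_connect_settings(config_path: Path) -> Dict[str, Any]:
    """Read optional cloud connect overrides; a missing file or key means 'use the default'."""
    try:
        config = json.loads(config_path.read_text())
    except FileNotFoundError:
        config = {}
    return {
        "proxy": config.get("cloud_connect_proxy"),
        "timeout": config.get("cloud_connect_timeout"),
    }


# Illustrative config; 30 is an arbitrary example value, not a documented default
demo_path = Path(tempfile.mkdtemp()) / "config.json"
demo_path.write_text(json.dumps({"cloud_connect_timeout": 30}))
print(load_cloud_connect_settings(demo_path))  # {'proxy': None, 'timeout': 30}
```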
cloud connect now works with no flags. On first use, creates a
"Browser Use CLI" profile via the Cloud API and saves the ID to
config.json. Subsequent connects reuse it (validates on each call,
recreates if deleted).
Removed --timeout, --proxy-country, --profile-id flags and their
plumbing through daemon/sessions. Power users who need custom browser
settings use cloud v2 POST /browsers directly.
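The reuse-or-recreate profile logic described above can be sketched with the Cloud API calls injected as plain callables. The config key `cloud_profile_id`, the function name, and the fake API are all hypothetical; only the profile name and the validate-then-recreate behavior come from the description.

```python
from itertools import count
from typing import Callable, Dict

PROFILE_NAME = "Browser Use CLI"


def ensure_profile_id(
    config: dict,
    profile_exists: Callable[[str], bool],
    create_profile: Callable[[str], str],
    save_config: Callable[[dict], None],
) -> str:
    """Return a usable profile ID, creating or recreating the CLI profile if needed."""
    saved = config.get("cloud_profile_id")  # hypothetical config key
    # Validate on every call: the profile may have been deleted remotely
    if isinstance(saved, str) and profile_exists(saved):
        return saved
    new_id = create_profile(PROFILE_NAME)
    config["cloud_profile_id"] = new_id
    save_config(config)
    return new_id


# Fake Cloud API for illustration: profile ID -> profile name
remote: Dict[str, str] = {}
ids = count(1)


def fake_create(name: str) -> str:
    profile_id = f"prof-{next(ids)}"
    remote[profile_id] = name
    return profile_id


saves = []
cfg: dict = {}
args = (lambda pid: pid in remote, fake_create, lambda c: saves.append(dict(c)))
first = ensure_profile_id(cfg, *args)   # creates prof-1
second = ensure_profile_id(cfg, *args)  # reuses prof-1
remote.clear()                          # profile deleted out from under us
third = ensure_profile_id(cfg, *args)   # recreates as prof-2
print(first, second, third, len(saves))  # prof-1 prof-1 prof-2 2
```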
These were duplicates of tab switch and tab close with separate
lock-checking and focus-resolution code paths. Having one way to
do each thing reduces maintenance surface and avoids isolation bugs.
Daemon now writes <session>.state.json at each lifecycle transition
(initializing → ready → starting → running → shutting_down → stopped/failed).
All shutdown triggers funneled through _request_shutdown() to prevent
double cleanup. Startup rollback on failure cleans up browser resources.
Also fixes cloud browser leak: CLIBrowserSession.stop() now explicitly
stops the remote browser via API instead of just disconnecting the
websocket.
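The lifecycle mechanics above (atomic state-file writes and a single shutdown funnel) can be sketched as follows. `Daemon`, `request_shutdown`, and the JSON fields (`state`, `pid`, `updated_at`) are illustrative assumptions, not the daemon's real schema.

```python
import asyncio
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

STATES = {"initializing", "ready", "starting", "running", "shutting_down", "stopped", "failed"}


class Daemon:
    """Hypothetical daemon skeleton: state file plus a single shutdown funnel."""

    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path
        self._shutdown_task: Optional[asyncio.Task] = None
        self.cleanup_calls = 0

    def write_state(self, state: str) -> None:
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        payload = {"state": state, "pid": os.getpid(), "updated_at": time.time()}
        tmp = self.state_path.with_name(self.state_path.name + ".tmp")
        tmp.write_text(json.dumps(payload))
        os.replace(tmp, self.state_path)  # readers never see a half-written file

    def request_shutdown(self) -> "asyncio.Task":
        # Close command, idle timeout, and signals all land here; only the
        # first call schedules cleanup, later calls get the same task back.
        if self._shutdown_task is None:
            self._shutdown_task = asyncio.ensure_future(self._shutdown())
        return self._shutdown_task

    async def _shutdown(self) -> None:
        self.write_state("shutting_down")
        self.cleanup_calls += 1
        await asyncio.sleep(0)  # browser cleanup would run here
        self.write_state("stopped")


async def main(state_path: Path) -> bool:
    daemon = Daemon(state_path)
    daemon.write_state("running")
    first = daemon.request_shutdown()
    second = daemon.request_shutdown()  # e.g. a repeated SIGTERM
    await first
    return first is second and daemon.cleanup_calls == 1


state_path = Path(tempfile.mkdtemp()) / "default.state.json"
print(asyncio.run(main(state_path)))                # True
print(json.loads(state_path.read_text())["state"])  # stopped
```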
Daemon ping response now includes PID for split-brain resolution.
When the daemon's socket is unreachable but the PID file references a
live process, close now sends SIGTERM directly instead of printing
"No active browser session" and leaving the daemon running forever.
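The close-path decision can be sketched as below. This assumes POSIX semantics for `os.kill(pid, 0)` (on Windows, signal 0 is `CTRL_C_EVENT` and must not be used this way); `close_session` and its injected callables are hypothetical names.

```python
import os
import signal
from typing import Callable, Optional


def pid_is_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def close_session(
    ping: Callable[[], Optional[dict]],
    read_pid_file: Callable[[], Optional[int]],
    send_close: Callable[[], None],
    kill: Callable[[int, int], None] = os.kill,
    is_alive: Callable[[int], bool] = pid_is_alive,
) -> str:
    if ping() is not None:  # socket reachable: normal close over the socket
        send_close()
        return "closed via socket"
    pid = read_pid_file()
    if pid is not None and is_alive(pid):
        # Split brain: socket unreachable but the daemon process is still running
        kill(pid, signal.SIGTERM)
        return f"sent SIGTERM to {pid}"
    return "No active browser session"


sent = []
print(close_session(lambda: None, lambda: 4242, lambda: None,
                    kill=lambda pid, sig: sent.append((pid, sig)),
                    is_alive=lambda pid: True))  # sent SIGTERM to 4242
```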
Concurrent agents could interleave focus swaps on the shared
BrowserSession, corrupting each other's state. Wrap the entire
swap-execute-restore cycle in _dispatch_lock and separate the
tab-ownership path from single-agent mode.
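A minimal asyncio sketch of this fix, assuming a single `agent_focus_target_id` slot on the shared session. `SharedSession` and `Dispatcher` are hypothetical; the lock is held across the whole swap-execute-restore cycle, so a command that yields mid-execution still sees its own agent's focus.

```python
import asyncio
from typing import Awaitable, Callable, List


class SharedSession:
    """Hypothetical stand-in for the shared BrowserSession: one focus slot for everyone."""

    def __init__(self) -> None:
        self.agent_focus_target_id = "tab-0"


class Dispatcher:
    def __init__(self, session: SharedSession) -> None:
        self.session = session
        self._dispatch_lock = asyncio.Lock()

    async def dispatch(self, target: str, command: Callable[[SharedSession], Awaitable[str]]) -> str:
        # Hold the lock across swap -> execute -> restore so no other agent's
        # swap can land between our swap and the end of our command.
        async with self._dispatch_lock:
            previous = self.session.agent_focus_target_id
            self.session.agent_focus_target_id = target
            try:
                return await command(self.session)
            finally:
                self.session.agent_focus_target_id = previous


async def read_focus(session: SharedSession) -> str:
    await asyncio.sleep(0.01)  # yield mid-command, where interleaving used to happen
    return session.agent_focus_target_id


async def main() -> List[str]:
    dispatcher = Dispatcher(SharedSession())
    return await asyncio.gather(*(dispatcher.dispatch(f"tab-{i}", read_focus) for i in range(1, 4)))


results = asyncio.run(main())
print(results)  # ['tab-1', 'tab-2', 'tab-3']
```

Without the lock, the three concurrent commands would read each other's focus instead of their own.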
- Remove event bus tab listeners from daemon — track tabs directly
- Remove dead event bus fallback branches from commands/browser.py
- Replace SwitchTabEvent/CloseTabEvent dispatches with direct CDP calls
- Update python_session.py to use ActionHandler instead of event bus
- Add JS dialog handler (alert/confirm/prompt) to CLIBrowserSession
- Surface auto-dismissed popup messages in state output
- Only a dummy EventBus() for watchdog constructors remains (unavoidable)
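The dialog handler can be sketched against the CDP `Page.javascriptDialogOpening` event and `Page.handleJavaScriptDialog` command (the Page domain must be enabled to receive the event). The `send` transport and the accept-everything policy are assumptions, not necessarily what CLIBrowserSession does.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

# send(method, params) is a hypothetical CDP transport coroutine
SendCDP = Callable[[str, dict], Awaitable[Any]]


class DialogHandler:
    """Auto-dismiss JS dialogs and keep their text so `state` output can show it."""

    def __init__(self, send: SendCDP) -> None:
        self._send = send
        self.popup_messages: List[str] = []

    async def on_dialog_opening(self, params: dict) -> None:
        # params follow the CDP Page.javascriptDialogOpening event
        dialog_type = params.get("type", "alert")
        self.popup_messages.append(f"[{dialog_type}] {params.get('message', '')}")
        reply: dict = {"accept": True}  # assumed policy: accept every dialog
        if dialog_type == "prompt":
            reply["promptText"] = params.get("defaultPrompt", "")
        # An unanswered dialog blocks the page, so respond immediately
        await self._send("Page.handleJavaScriptDialog", reply)


sent = []


async def fake_send(method: str, params: dict) -> None:
    sent.append((method, params))


handler = DialogHandler(fake_send)
asyncio.run(handler.on_dialog_opening({"type": "prompt", "message": "Name?", "defaultPrompt": "bob"}))
print(sent)                    # [('Page.handleJavaScriptDialog', {'accept': True, 'promptText': 'bob'})]
print(handler.popup_messages)  # ['[prompt] Name?']
```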
Subclass BrowserSession as CLIBrowserSession that calls connect()
directly instead of start(). Skips all 13 watchdogs and event bus
handler registration. Actions execute via ActionHandler which calls
DefaultActionWatchdog methods directly and DomService for DOM snapshots.
- CLIBrowserSession.start() → connect() only (CDP + SessionManager)
- CLIBrowserSession.stop() → close websocket directly (no BrowserStopEvent)
- CLIBrowserSession.kill() → Browser.close + disconnect
- ActionHandler wraps DefaultActionWatchdog for click/type/scroll/keys/etc
- DomService called directly for state (no DOMWatchdog)
- Monkey-patches _enable_page_monitoring to no-op after initial connect
- Disables auto-reconnect (_intentional_stop = True)
- Falls back to event bus path if ActionHandler is not available
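The subclassing pattern can be sketched with a hypothetical `FullSession` base standing in for BrowserSession (its methods here are illustrative, not BrowserSession's real API). The override keeps only the CDP connect, then replaces `_enable_page_monitoring` with a no-op on the instance and sets `_intentional_stop`.

```python
import asyncio
from typing import List


class FullSession:
    """Hypothetical stand-in for BrowserSession; not its real API."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._intentional_stop = False

    async def connect(self) -> None:
        self.log.append("cdp connected")
        await self._enable_page_monitoring()

    async def _enable_page_monitoring(self) -> None:
        self.log.append("page monitoring enabled")

    async def start(self) -> None:
        self.log.append("watchdogs attached")  # the heavy path the CLI skips
        await self.connect()


class CLIBrowserSession(FullSession):
    async def start(self) -> None:
        await self.connect()  # CDP connection only: no watchdogs, no event bus handlers

        async def _noop() -> None:
            return None

        # Any later call to page monitoring on this instance becomes a no-op
        self._enable_page_monitoring = _noop  # type: ignore[method-assign]
        self._intentional_stop = True  # disables auto-reconnect


async def demo() -> CLIBrowserSession:
    session = CLIBrowserSession()
    await session.start()
    await session._enable_page_monitoring()  # patched: logs nothing
    return session


session = asyncio.run(demo())
print(session.log)  # ['cdp connected', 'page monitoring enabled']
```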
- `browser-use register` assigns numeric agent index for --connect mode
- `--connect <index>` requires explicit agent index (no more bare --connect)
- `tab list` shows all tabs with lock status per agent
- `tab new [url]` creates a new tab without visually switching
- `tab switch <index>` changes agent focus without activating Chrome tab
- `tab close <index> [index...]` closes multiple tabs in one command
- Agent registry in ~/.browser-use/agents.json with 5min expiry
- Improved error messages guide agents to register or use their own tab
- Session lock prevents double BrowserSession creation on simultaneous connect
- Updated SKILL.md with register workflow and tab commands
Multiple agents can share one browser via --connect without interfering
with each other. Each agent registers with `browser-use register` to get
a numeric index, then passes it with `--connect <index>` on every command.
- Tab locking: mutating commands (click, type, open) lock the tab to the
agent. Other agents get an error if they try to mutate the same tab.
Read-only commands (state, screenshot) work on any tab.
- Agent registry: agents.json tracks registered agents with timestamps.
Expired agents (5min inactive) get cleaned up automatically.
- Session lock: prevents double BrowserSession creation when two agents
connect simultaneously.
- Focus swap: daemon swaps agent_focus_target_id and cached_selector_map
per-agent before each command, so element indices are isolated.
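A minimal in-memory sketch of the registry expiry and tab-locking rules above. `TabLocks`, its timestamps (passed in explicitly to keep the example deterministic), and the error strings are hypothetical; the real registry persisted to agents.json.

```python
from typing import Dict, Optional

AGENT_TTL_SECONDS = 5 * 60  # agents inactive longer than this expire
MUTATING_COMMANDS = {"click", "type", "open"}


class TabLocks:
    """Hypothetical in-memory registry plus per-tab locks; times are plain floats."""

    def __init__(self) -> None:
        self.agents: Dict[int, float] = {}  # agent index -> last-seen time
        self.owners: Dict[str, int] = {}  # tab id -> owning agent index

    def register(self, now: float) -> int:
        self.prune(now)
        index = max(self.agents, default=0) + 1
        self.agents[index] = now
        return index

    def prune(self, now: float) -> None:
        expired = {a for a, seen in self.agents.items() if now - seen > AGENT_TTL_SECONDS}
        for agent in expired:
            del self.agents[agent]
        # Tabs held by expired agents become free again
        self.owners = {tab: a for tab, a in self.owners.items() if a not in expired}

    def check(self, agent: int, command: str, tab: str, now: float) -> Optional[str]:
        """Return an error message, or None if the command may run."""
        self.prune(now)
        if agent not in self.agents:
            return f"Agent {agent} is not registered; run `browser-use register`"
        self.agents[agent] = now  # any command counts as activity
        if command not in MUTATING_COMMANDS:
            return None  # read-only commands such as state/screenshot work on any tab
        owner = self.owners.setdefault(tab, agent)  # first mutation claims the tab
        if owner != agent:
            return f"Tab {tab} is locked by agent {owner}"
        return None


locks = TabLocks()
a1, a2 = locks.register(now=0), locks.register(now=0)
print(locks.check(a1, "click", "tab-A", now=1))    # None
print(locks.check(a2, "type", "tab-A", now=2))     # Tab tab-A is locked by agent 1
print(locks.check(a2, "state", "tab-A", now=250))  # None (read-only)
print(locks.check(a2, "click", "tab-A", now=320))  # None (agent 1 expired, lock released)
```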
- Narrow cloud --help intercept to only fire when --help is immediately
after 'cloud', so 'cloud v2 --help' still shows OpenAPI endpoints
- Guard signal handler against concurrent shutdown tasks on repeated signals
- Route error response bodies to stderr in cloud REST commands
- Replace stale port 49200 in README Windows troubleshooting
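The narrowed help intercept can be sketched as a check on the token immediately following `cloud`. The function name and argv handling are assumptions; a real parser may also need to skip global flags.

```python
from typing import List


def should_intercept_cloud_help(argv: List[str]) -> bool:
    """True for `cloud --help` only; `cloud v2 --help` falls through to the OpenAPI help."""
    if "cloud" not in argv:
        return False
    i = argv.index("cloud")
    return i + 1 < len(argv) and argv[i + 1] == "--help"


print(should_intercept_cloud_help(["cloud", "--help"]))        # True
print(should_intercept_cloud_help(["cloud", "v2", "--help"]))  # False
```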
Await the shutdown task in run() so browser cleanup (kill/stop) completes
before the event loop tears down. Previously, asyncio.run() could cancel
the shutdown mid-cleanup, leaving Chrome processes orphaned.
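The failure mode can be reproduced with plain asyncio: `asyncio.run()` cancels any tasks still pending when the main coroutine returns, so a fire-and-forget cleanup task never finishes. `cleanup` is a hypothetical stand-in for the browser kill/stop.

```python
import asyncio

events = []


async def cleanup() -> None:
    await asyncio.sleep(0.01)  # stands in for killing/stopping the browser
    events.append("browser cleaned up")


async def run_without_await() -> None:
    asyncio.ensure_future(cleanup())  # fire-and-forget: cancelled when asyncio.run exits


async def run_with_await() -> None:
    task = asyncio.ensure_future(cleanup())
    await task  # the loop stays up until cleanup has actually finished


asyncio.run(run_without_await())
print(events)  # []
asyncio.run(run_with_await())
print(events)  # ['browser cleaned up']
```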
Adds `--connect` to auto-discover running Chrome instances via DevToolsActivePort
files and well-known port probing, eliminating manual CDP URL construction. Fixes
daemon process hanging on `close` when connected to external browsers (--connect,
--cdp-url, cloud) by calling stop() (disconnect) instead of kill() (terminate).
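Discovery can be sketched in two steps, assuming the usual Chrome layout: the DevToolsActivePort file in a profile's user data directory holds the debugging port on its first line and the browser websocket path on its second, and a probed port answers `GET /json/version` with a `webSocketDebuggerUrl`. The helper names and the ports to probe (9222 is the conventional default) are assumptions.

```python
import json
import tempfile
import urllib.request
from pathlib import Path
from typing import Iterable, List, Optional


def read_devtools_active_port(user_data_dir: Path) -> Optional[str]:
    """Build a CDP websocket URL from Chrome's DevToolsActivePort file, if usable."""
    try:
        lines = (user_data_dir / "DevToolsActivePort").read_text().splitlines()
    except OSError:
        return None  # no file, or not readable
    if len(lines) < 2 or not lines[0].strip().isdigit():
        return None  # partially written or unexpected contents
    return f"ws://127.0.0.1:{lines[0].strip()}{lines[1].strip()}"


def probe_port(port: int, timeout: float = 0.5) -> Optional[str]:
    """Ask a DevTools HTTP endpoint for its browser websocket URL."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/version", timeout=timeout) as resp:
            return json.load(resp).get("webSocketDebuggerUrl")
    except (OSError, ValueError, AttributeError):
        return None  # nothing listening, timeout, or not a DevTools endpoint


def discover(user_data_dirs: Iterable[Path], ports: Iterable[int] = (9222,)) -> List[str]:
    # Files first (no network needed), then well-known ports
    found = [url for d in user_data_dirs if (url := read_devtools_active_port(d))]
    found += [url for p in ports if (url := probe_port(p))]
    return found


demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "DevToolsActivePort").write_text("9222\n/devtools/browser/abc123\n")
print(read_devtools_active_port(demo_dir))  # ws://127.0.0.1:9222/devtools/browser/abc123
```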
The `run` command pulled in heavy SDK dependencies (openai, anthropic,
google), had a bug (await on sync get_llm), and is superseded by
`browser-use cloud` for agent execution. CLI is now purely a browser
automation interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the multi-session server (server.py, SessionRegistry, portalocker locking,
PID files, orphan detection) with a minimal daemon (daemon.py) that holds one
BrowserSession in memory. Socket file existence = alive. Auto-exits when browser
dies via CDP watchdog.
-2277 lines, +142 lines across 20 files.