Commit Graph

25 Commits

Author SHA1 Message Date
ShawnPana
92181538d3 fix: type-narrow actions instead of assert (fixes pyright + test compat) 2026-04-01 21:43:56 -07:00
ShawnPana
080eeae62a fix: CI type errors and test compatibility
- Add `# type: ignore` to each parameter line in sessions.py (pyright applies ignores per line)
- Remove the ActionHandler assert in browser.py (it broke pre-existing tests)
- Apply ruff formatting
2026-04-01 21:39:51 -07:00
ShawnPana
094b699eb7 fix: add 'new' to tab command error message 2026-04-01 19:01:25 -07:00
ShawnPana
ca05f46352 refactor: remove --agent, register, and tab ownership; adopt sessions-as-agents model
Multi-agent isolation is now achieved through separate sessions
(--session NAME), each with its own browser. Removed:
- register command and agents.json
- --agent flag and agent_id plumbing
- TabOwnershipManager and all tab locking logic
- dispatch lock and focus swapping between agents
- tab_ownership.py (deleted)
- test_tab_ownership.py (deleted)

Simplified tab commands: no lock checks, no _tab_list injection,
no _resolved_target_id params. agent_focus_target_id stays for
single-agent tab tracking.

Tested: 3 concurrent subagents on separate cloud sessions,
3 concurrent subagents on separate headless Chromium sessions.
2026-04-01 17:34:46 -07:00
ShawnPana
526bde4d0f refactor: remove standalone switch and close-tab command aliases
These duplicated `tab switch` and `tab close` but had their own
lock-checking and focus-resolution code paths. Having one way to
do each thing reduces the maintenance surface and avoids isolation bugs.
2026-03-30 20:09:50 -07:00
ShawnPana
a6744f4ade fix: tab close uses caller's logical focus, not Chrome's global focus
Default close-tab and tab close paths used session_manager.get_focused_target(),
which returns Chrome's globally focused tab. In multi-agent mode this let
one agent close another's tab. Now uses agent_focus_target_id first, falling
back to global focus only in single-agent mode.

Also: fresh daemon spawn now uses phase-aware state file waiting (15s) instead
of fixed 5s socket polling, consistent with ensure_daemon's probe logic.
2026-03-30 20:04:01 -07:00
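The focus-resolution rule in this commit can be sketched as a small pure function. This is a minimal illustration, not the CLI's code: `resolve_close_target`, `NoFocusedTabError`, and the `multi_agent` flag are hypothetical names, and raising an error when a multi-agent caller has no logical focus is an assumption (the commit only says global focus is used in single-agent mode).

```python
from typing import Callable, Optional


class NoFocusedTabError(RuntimeError):
    """Hypothetical error: no tab can be chosen without risking another agent's tab."""


def resolve_close_target(
    agent_focus_target_id: Optional[str],
    get_global_focus: Callable[[], Optional[str]],
    multi_agent: bool,
) -> str:
    """Pick the target id that a bare `tab close` should act on."""
    # 1. The caller's own logical focus always wins.
    if agent_focus_target_id is not None:
        return agent_focus_target_id
    # 2. Chrome's global focus is only safe when a single agent drives the browser.
    if not multi_agent:
        target = get_global_focus()
        if target is not None:
            return target
    # 3. Assumed behavior: refuse rather than guess.
    raise NoFocusedTabError("no focused tab for this caller; pass an explicit tab index")
```

Passing the global-focus lookup as a callable keeps the function testable without a live CDP session.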
ShawnPana
e09ba11ef1 feat: remove event bus dependency, add dialog handling
- Remove event bus tab listeners from daemon — track tabs directly
- Remove dead event bus fallback branches from commands/browser.py
- Replace SwitchTabEvent/CloseTabEvent dispatches with direct CDP calls
- Update python_session.py to use ActionHandler instead of event bus
- Add JS dialog handler (alert/confirm/prompt) to CLIBrowserSession
- Surface auto-dismissed popup messages in state output
- Only a dummy EventBus() for watchdog constructors remains (unavoidable)
2026-03-25 16:37:26 -07:00
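A JS dialog handler of this kind reacts to the Chrome DevTools Protocol event `Page.javascriptDialogOpening` and answers it with `Page.handleJavaScriptDialog`. The sketch below is hypothetical (`DialogRecorder` is not a browser-use class) and assumes a policy of acknowledging `alert` and `beforeunload` while dismissing `confirm` and `prompt`; it records each message so it could be surfaced in `state` output.

```python
import json
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DialogRecorder:
    """Hypothetical helper: answers JS dialogs and remembers their text."""

    # Messages collected here could be appended to `state` output.
    popup_messages: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def on_dialog_opening(self, params: dict) -> str:
        """Handle a Page.javascriptDialogOpening event; return the CDP reply as JSON."""
        dialog_type = params.get("type", "alert")
        self.popup_messages.append(f"[{dialog_type}] {params.get('message', '')}")
        # Assumed policy: acknowledge alert/beforeunload, dismiss confirm/prompt.
        accept = dialog_type in ("alert", "beforeunload")
        command = {
            "id": next(self._ids),
            "method": "Page.handleJavaScriptDialog",
            "params": {"accept": accept},
        }
        return json.dumps(command)
```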
ShawnPana
7bb2741292 feat: lightweight CLIBrowserSession — no watchdogs, no event bus
Subclass BrowserSession as CLIBrowserSession, which calls connect()
directly instead of start(). Skips all 13 watchdogs and event bus
handler registration. Actions execute via ActionHandler which calls
DefaultActionWatchdog methods directly and DomService for DOM snapshots.

- CLIBrowserSession.start() → connect() only (CDP + SessionManager)
- CLIBrowserSession.stop() → close websocket directly (no BrowserStopEvent)
- CLIBrowserSession.kill() → Browser.close + disconnect
- ActionHandler wraps DefaultActionWatchdog for click/type/scroll/keys/etc
- DomService called directly for state (no DOMWatchdog)
- Monkey-patches _enable_page_monitoring to no-op after initial connect
- Disables auto-reconnect (_intentional_stop = True)
- Falls back to event bus path if ActionHandler is not available
2026-03-25 13:28:12 -07:00
ShawnPana
f819d3e3a6 feat: add tab command (list, new, switch, close) and agent registration
- `browser-use register` assigns a numeric agent index for --connect mode
- `--connect <index>` requires explicit agent index (no more bare --connect)
- `tab list` shows all tabs with lock status per agent
- `tab new [url]` creates a new tab without visually switching
- `tab switch <index>` changes agent focus without activating Chrome tab
- `tab close <index> [index...]` closes multiple tabs in one command
- Agent registry in ~/.browser-use/agents.json with 5min expiry
- Improved error messages guide agents to register or use their own tab
- Session lock prevents double BrowserSession creation on simultaneous connect
- Updated SKILL.md with register workflow and tab commands
2026-03-24 17:27:38 -07:00
ShawnPana
31694df283 feat: multi-agent tab isolation for --connect mode
Multiple agents can share one browser via --connect without interfering
with each other. Each agent registers with `browser-use register` to get
a numeric index, then passes it with `--connect <index>` on every command.

- Tab locking: mutating commands (click, type, open) lock the tab to the
  agent. Other agents get an error if they try to mutate the same tab.
  Read-only commands (state, screenshot) work on any tab.
- Agent registry: agents.json tracks registered agents with timestamps.
  Expired agents (5min inactive) get cleaned up automatically.
- Session lock: prevents double BrowserSession creation when two agents
  connect simultaneously.
- Focus swap: daemon swaps agent_focus_target_id and cached_selector_map
  per-agent before each command, so element indices are isolated.
2026-03-24 16:38:53 -07:00
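The locking and expiry rules in this commit (later removed by ca05f46352) can be modeled in memory as below. This is an assumption-laden sketch: the real registry lived in agents.json, while `TabLockRegistry`, its method names, and sequential index assignment are illustrative only. The injectable clock makes the 5-minute expiry testable.

```python
import time
from typing import Callable, Dict

EXPIRY_SECONDS = 5 * 60  # "5min inactive" from the commit message


class TabLockRegistry:
    """Hypothetical in-memory model of agent registration plus tab locks."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_seen: Dict[int, float] = {}  # agent index -> last activity
        self._locks: Dict[str, int] = {}  # tab target id -> owning agent index
        self._next_index = 1

    def register(self) -> int:
        index = self._next_index
        self._next_index += 1
        self._last_seen[index] = self._clock()
        return index

    def _expire(self) -> None:
        now = self._clock()
        expired = {a for a, t in self._last_seen.items() if now - t > EXPIRY_SECONDS}
        for agent in expired:
            del self._last_seen[agent]
        # Release every tab held by an expired agent.
        self._locks = {tab: a for tab, a in self._locks.items() if a not in expired}

    def check(self, agent: int, tab: str, mutating: bool) -> None:
        """Raise PermissionError if `agent` may not run this command on `tab`."""
        self._expire()
        if agent not in self._last_seen:
            raise PermissionError(f"agent {agent} is not registered; run `browser-use register`")
        self._last_seen[agent] = self._clock()
        if not mutating:
            return  # read-only commands (state, screenshot) work on any tab
        owner = self._locks.setdefault(tab, agent)  # first mutator takes the lock
        if owner != agent:
            raise PermissionError(f"tab {tab} is locked by agent {owner}")
```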
ShawnPana
694a111fad add upload command to CLI, extract find_file_input_near_element to BrowserSession
- Add `browser-use upload <index> <path>` command for uploading files to
  file input elements via the CLI
- Extract find_file_input_near_element from nested closures in tools/service.py
  to a reusable method on BrowserSession, deduplicating two copies
- Add BrowserWrapper.upload() for the Python REPL
- Resolve file paths to absolute on the client side before sending to daemon
- Update SKILL.md files and README with upload command docs
2026-03-19 17:02:34 -07:00
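Resolving paths on the client matters because the daemon's working directory generally differs from the shell where the command was typed. A minimal sketch, assuming `~` expansion is desired and leaving existence checks to the daemon; `resolve_upload_path` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def resolve_upload_path(raw: str, cwd: Optional[Path] = None) -> str:
    """Turn a user-typed path into an absolute one before it is sent to the daemon."""
    path = Path(raw).expanduser()  # assumption: "~/file.pdf" should work
    if not path.is_absolute():
        # Anchor relative paths at the client's working directory, not the daemon's.
        path = (cwd if cwd is not None else Path.cwd()) / path
    # resolve() normalizes ".." and symlinks; it does not require the file to exist.
    return str(path.resolve())
```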
ShawnPana
0dc2fc5a6f remove run command and agent infrastructure from CLI
The `run` command pulled in heavy SDK dependencies (openai, anthropic,
google), had a bug (it awaited the synchronous get_llm), and is superseded by
`browser-use cloud` for agent execution. CLI is now purely a browser
automation interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:44:13 -07:00
Laith Weinberger
8ebe4385ed fix: preserve non-ASCII chars in all remaining JSON file outputs 2026-03-01 14:19:38 -05:00
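In Python, the usual way to keep non-ASCII characters in JSON files is `json.dumps(..., ensure_ascii=False)` combined with an explicit UTF-8 file encoding. The commit does not show its diff, so this is a sketch of the standard technique rather than the exact change; `write_json` and `roundtrip_demo` are illustrative names.

```python
import json
import tempfile
from pathlib import Path


def write_json(path: Path, data: object) -> None:
    # ensure_ascii=False writes "Ž" as-is instead of the escape \u017d;
    # an explicit UTF-8 encoding avoids depending on the platform default.
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def roundtrip_demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "out.json"
        write_json(target, {"author": "Gregor Žunič", "title": "日本語のページ"})
        return target.read_text(encoding="utf-8")
```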
Saurav Panda
3f63677756 new direct cli mode 2026-02-19 10:18:12 -08:00
ShawnPana
92de59c6cc fix(cli): fix pyright type errors in CLI
- Remove invalid cdp_url parameter from create_browser_session calls
- Replace deleted GetCookiesEvent with direct _cdp_get_cookies() call
- Add type assertion for cookies_file after error check
- Use unique variable names for CDP results to avoid type conflicts
- Add type ignores for CDP typed return values used as dicts
2026-02-02 13:05:11 -08:00
ShawnPana
ad47a97518 style: apply ruff formatting 2026-02-02 10:51:08 -08:00
ShawnPana
a961021670 refactor(cli): move CDP logic into CLI, remove events and watchdogs
Move all CLI-specific CDP logic directly into browser.py command
handlers instead of using events and watchdogs. This keeps the CLI
self-contained and avoids adding CLI-specific code to the core library.

- Remove 14 CLI-specific events from events.py
- Remove 3 watchdog files (cookies, interactions, wait)
- Remove watchdog registrations from session.py
- Implement direct CDP calls in browser.py for cookies, wait, hover,
  dblclick, rightclick, and get commands
2026-02-02 10:44:30 -08:00
ShawnPana
2d992c3726 feat(cli): add cloud profile management and cookie sync
- Add cloud profile commands: list, create, get, update, delete
- Add profile sync command to sync local Chrome cookies to cloud
- Add profile cookies command to list cookies by domain
- Add cookies export/import commands for bulk transfer
- Optimize cookie import using Network.setCookies (bulk API)
- Real browser now runs headless by default (use --headed for a visible window)
- Support --domain flag for syncing specific domain cookies only
- Update SKILL.md with cloud profile workflow documentation
2026-01-31 20:02:50 -08:00
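`Network.setCookies` is the CDP method that accepts an array of `CookieParam` objects, so an import can set a whole batch in one round trip instead of one `Network.setCookie` call per cookie. The sketch only builds the JSON message; `build_set_cookies_command` is hypothetical, and filtering exported cookies down to a conservative subset of `CookieParam` fields (dropping `expires` for session cookies) is an assumption.

```python
import json
from typing import Iterable

# Conservative subset of CDP Network.CookieParam fields (assumed sufficient here).
COOKIE_PARAM_FIELDS = {"name", "value", "url", "domain", "path", "secure", "httpOnly", "sameSite", "expires"}


def build_set_cookies_command(cookies: Iterable[dict], command_id: int = 1) -> str:
    """Build a single Network.setCookies message for a whole batch of cookies."""
    params = []
    for cookie in cookies:
        # Drop export-only fields such as size or session.
        param = {k: v for k, v in cookie.items() if k in COOKIE_PARAM_FIELDS}
        # Exported session cookies typically report expires = -1 and session = True;
        # omitting expires keeps them session cookies on import.
        if cookie.get("session") or cookie.get("expires", 0) < 0:
            param.pop("expires", None)
        params.append(param)
    message = {"id": command_id, "method": "Network.setCookies", "params": {"cookies": params}}
    return json.dumps(message)
```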
ShawnPana
9abfc327cc feat(cli): add get command for info retrieval (title, html, text, value, attributes, bbox)
Adds information retrieval commands to the browser-use CLI:
- get title: page title
- get html: full page or element HTML via selector
- get text: text content of element by index
- get value: input/textarea value by index
- get attributes: all attributes of element by index
- get bbox: bounding box (x, y, width, height) of element
2026-01-31 17:34:33 -08:00
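For `get bbox`, one way to derive `(x, y, width, height)` is from CDP `DOM.getBoxModel`, whose quads are eight numbers (four x/y corner pairs). Which quad the CLI uses is not stated; this sketch assumes the border quad and takes the axis-aligned box around it. `quad_to_bbox` and `bbox_from_box_model` are hypothetical names.

```python
from typing import Sequence, TypedDict


class BBox(TypedDict):
    x: float
    y: float
    width: float
    height: float


def quad_to_bbox(quad: Sequence[float]) -> BBox:
    """Axis-aligned box around a CDP quad [x1, y1, x2, y2, x3, y3, x4, y4]."""
    if len(quad) != 8:
        raise ValueError(f"expected 8 numbers, got {len(quad)}")
    xs, ys = quad[0::2], quad[1::2]
    return BBox(x=min(xs), y=min(ys), width=max(xs) - min(xs), height=max(ys) - min(ys))


def bbox_from_box_model(result: dict) -> BBox:
    # Assumption: use the border quad, which is closest to getBoundingClientRect().
    return quad_to_bbox(result["model"]["border"])
```

Taking min/max over all four corners also gives a sensible box for transformed (rotated) elements, whose quads are not axis-aligned.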
ShawnPana
3ca625b75e feat(cli): add hover, dblclick, rightclick interaction commands
Adds mouse interaction commands to the browser-use CLI:
- hover: move mouse over element (triggers CSS :hover effects)
- dblclick: double-click element
- rightclick: right-click element (context menu)
2026-01-31 17:28:20 -08:00
ShawnPana
33e03b7b0f feat(cli): add wait command for selector and text conditions
Adds wait conditions to the browser-use CLI with two subcommands:
- wait selector: wait for CSS selector with state options (visible, hidden, attached, detached)
- wait text: wait for text to appear on the page
Both support custom timeout in milliseconds.
2026-01-31 17:17:35 -08:00
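The four selector states can be read as predicates over two facts, whether the element is attached to the DOM and whether it is visible, checked in a polling loop bounded by a millisecond timeout. In this sketch the `probe` callback, the 100 ms poll interval, and the 30 s default timeout are assumptions, and treating "hidden" as "detached or not visible" follows Playwright's convention rather than anything stated in the commit.

```python
import time
from typing import Callable, Dict, Tuple

Probe = Callable[[], Tuple[bool, bool]]  # returns (attached, visible)

STATE_CHECKS: Dict[str, Callable[[bool, bool], bool]] = {
    "visible": lambda attached, visible: attached and visible,
    "hidden": lambda attached, visible: not attached or not visible,
    "attached": lambda attached, visible: attached,
    "detached": lambda attached, visible: not attached,
}


def wait_for_selector_state(probe: Probe, state: str = "visible", timeout_ms: int = 30_000, poll_ms: int = 100) -> float:
    """Poll `probe` until `state` holds; return elapsed ms or raise TimeoutError."""
    if state not in STATE_CHECKS:
        raise ValueError(f"unknown state {state!r}; expected one of {sorted(STATE_CHECKS)}")
    check = STATE_CHECKS[state]
    start = time.monotonic()
    deadline = start + timeout_ms / 1000
    while True:
        if check(*probe()):
            return (time.monotonic() - start) * 1000
        if time.monotonic() >= deadline:
            raise TimeoutError(f"selector did not become {state} within {timeout_ms} ms")
        time.sleep(poll_ms / 1000)
```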
ShawnPana
45d244e01c feat(cli): add cookies command for get/set/clear operations
Adds cookie management to the browser-use CLI with three subcommands:
- cookies get: retrieve all cookies or filter by URL
- cookies set: create cookies with domain, path, secure, httpOnly options
- cookies clear: remove all cookies or filter by URL
2026-01-31 17:09:35 -08:00
Gregor Žunič
c7c5cae20e remote browser fixes + readme 2026-01-21 21:49:26 -08:00
Gregor Žunič
4716ae50ce fix: State command now outputs LLM representation, add CLI aliases
- State command outputs the same format that browser-use agents see
- Added aliases: bu, browser (all work the same)
- Headed mode now correctly shows browser window
2026-01-19 09:36:20 -08:00
Gregor Žunič
c4ffaca6e7 feat: Add fast CLI for browser automation (bu command)
Implements a fast, persistent browser automation CLI per CLI_SPEC.md:

- Fast CLI layer using stdlib only (<50ms startup)
- Session server with Unix socket IPC (TCP on Windows)
- Browser modes: chromium, real, remote
- Commands: open, click, type, input, scroll, back, screenshot,
  state, switch, close-tab, keys, select, eval, extract
- Python execution with persistent namespace (Jupyter-like REPL)
- Agent task execution (requires API key)
- JSON output mode

The CLI maintains browser sessions across commands, enabling complex
multi-step workflows. Includes a Claude skill description for AI-assisted
browser automation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 01:05:35 -08:00
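A stdlib-only transport along these lines picks a Unix domain socket where available and falls back to loopback TCP on Windows, exchanging one newline-terminated JSON object per request. Everything here is illustrative: the socket file name, the `{"command", "args"}` message shape, and the helper names are assumptions, and `socket.socketpair()` stands in for a real listening daemon in the demo.

```python
import json
import socket
import sys
import tempfile
import threading
from pathlib import Path
from typing import Tuple, Union

Address = Union[str, Tuple[str, int]]


def session_address(session: str, tcp_port: int = 0) -> Tuple[int, Address]:
    """Pick the IPC transport: a Unix socket file, or loopback TCP on Windows."""
    if sys.platform == "win32" or not hasattr(socket, "AF_UNIX"):
        return socket.AF_INET, ("127.0.0.1", tcp_port)
    # Hypothetical file name; one socket per named session.
    return socket.AF_UNIX, str(Path(tempfile.gettempdir()) / f"browser-use-{session}.sock")


def _send_line(sock: socket.socket, payload: dict) -> None:
    sock.sendall((json.dumps(payload) + "\n").encode("utf-8"))


def _read_line(sock: socket.socket) -> dict:
    # One request/response per connection, so buffered over-reads are harmless.
    with sock.makefile("r", encoding="utf-8") as reader:
        return json.loads(reader.readline())


def send_request(sock: socket.socket, command: str, **args: object) -> dict:
    """Client side: send one JSON request line and wait for one JSON reply line."""
    _send_line(sock, {"command": command, "args": args})
    return _read_line(sock)


def serve_one(sock: socket.socket) -> None:
    """Toy daemon side: answer a single request by echoing what it received."""
    request = _read_line(sock)
    _send_line(sock, {"ok": True, "echo": request["command"], "args": request["args"]})


def demo() -> dict:
    # socketpair() stands in for connecting to a listening daemon.
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    try:
        return send_request(client, "open", url="https://example.com")
    finally:
        worker.join()
        client.close()
        server.close()
```

Keeping the client to `json`, `socket`, and `pathlib` avoids importing the heavier browser stack on every invocation, which is the point of the stdlib-only CLI layer.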