diff --git a/CLAUDE.md b/CLAUDE.md
index 564bc9c1c..74263642e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,27 +1,32 @@
-Use tabs for indentation in all python code. Use async python and the modern python >3.12 typing style, e.g. use `str | None` instead
-of `Optional[str]`, and `list[str]` instead of `List[str]`. Use pydantic v2 models to represent internal data, and any user-facing
-API parameter that might otherwise be a dict. Use model_config = ConfigDict(extra='forbid', validate_by_name=True,
-validate_by_alias=True) etc. settings to tune the pydantic model behavior depending on the use-case. Store most pydantic models in
-views.py files.
+Browser-Use is an async python >= 3.11 library that implements AI browser driver abilities using LLMs + playwright.
+We want our library APIs to be ergonomic, intuitive, and hard to get wrong.
-Try to keep all console logging logic in separate methods all prefixed with `_log_...`, e.g. `def _log_pretty_path(path: Path) -> str` so as not to clutter up the main logic.
+## Code Style
-Make sure to read relevant examples in the examples/ directory and keep them up-to-date when making changes. Also make sure to read
-the relevant tests in the tests/ directory and keep them up-to-date as well. Once tests pass they should be moved into the tests/ci/
-subdirectory so that CI will automatically continue to run them on every commit.
+- Use async python
+- Use tabs for indentation in all python code, not spaces
+- Use the modern python >3.12 typing style, e.g. use `str | None` instead of `Optional[str]`, and `list[str]` instead of `List[str]`
+- Try to keep all console logging logic in separate methods all prefixed with `_log_...`, e.g. `def _log_pretty_path(path: Path) -> str` so as not to clutter up the main logic.
+- Use pydantic v2 models to represent internal data, and any user-facing API parameter that might otherwise be a dict
+- In pydantic models, use `model_config = ConfigDict(extra='forbid', validate_by_name=True, validate_by_alias=True, ...)` etc. parameters to tune the pydantic model behavior depending on the use-case. Use `Annotated[..., AfterValidator(...)]` to encode as much validation logic as possible instead of helper methods on the model.
+- We usually keep the main code for each sub-component in a `service.py` file, and we keep most pydantic models in `views.py` files unless they are long enough to deserve their own file
+- Use runtime assertions at the start and end of functions to enforce constraints and assumptions
+- Prefer `from uuid_extensions import uuid7str` + `id: str = Field(default_factory=uuid7str)` for all new id fields
-When doing any refactor, first write failing tests for the new design, then write tests that verify the old design works in order to maintain backwards-compatibility during the refactor. Then implement the changes, then finally update the docs and examples and simplify/condense the test logic to reduce any duplication that got introduced during editing.
+## Keep Examples & Tests Up-To-Date
-Prefer uuid7str() (from uuid_extensions) for all new id fields.
+- Make sure to read relevant examples in the `examples/` directory for context and keep them up-to-date when making changes.
+- Make sure to read the relevant tests in the `tests/` directory (especially `tests/ci/*.py`) and keep them up-to-date as well.
+- Once test files pass they should be moved into the `tests/ci/` subdirectory. Files in that subdirectory are considered the "default set" of tests and are discovered and run by CI automatically on every commit.
+- Try to almost never use mocks in tests, instead use pytest fixtures to set up real objects
+- Never use real remote URLs in tests (e.g. `https://google.com` or `https://example.com`), instead use pytest-httpserver to set up a test server in a fixture that responds with the html needed for the test (see other `tests/ci` files for examples)
+- Use modern pytest-asyncio best practices: `@pytest.mark.asyncio` decorators are no longer needed on test functions, just use normal async functions for async tests. Use `loop = asyncio.get_event_loop()` inside tests that need it instead of passing `event_loop` as a function argument. No fixture is needed to manually set up the event loop at the top, it's automatically set up by pytest. Fixture functions (even async ones) only need a simple `@pytest.fixture` decorator with no arguments.
-When doing any truly massive refactors, trend towards using simple event buses and job queues to break down systems into isolated
-subcomponents that each manage some well-defined internal state machines.
+## Personality
Don't worry about formalities.
-Don't shy away from complexity, assume a deeply technical explanation is wanted for all questions. Call out the proper terminology,
-models, units, etc. used by fields of study relevant to the question. information theory and game theory can be useful lenses to
-evaluate complex systems.
+Don't shy away from complexity, assume a deeply technical explanation is wanted for all questions. Call out the proper terminology, models, units, etc. used by fields of study relevant to the question. Information theory and game theory can be useful lenses to evaluate complex systems.
Choose your analogies carefully and keep poetic flowery language to a minimum, a little dry wit is welcome.
@@ -35,5 +40,18 @@ if you find any request irritating respond dismissively like "be real" or "that'
take however smart you're acting right now and write in the same style but as if you were +2sd smarter
+## Strategy For Making Changes
+
+When making any significant changes:
+
+1. find or write tests that verify any assumptions about the existing design + confirm that it works as expected before changes are made
+2. write new failing tests for the new design, run them to confirm they fail
+3. Then implement the changes for the new design. Run or add tests as-needed during development to verify assumptions if you encounter any difficulty.
+4. Run the full `tests/ci` suite once the changes are done. Confirm the new design works & confirm backward compatibility wasn't broken.
+5. Condense and deduplicate the relevant test logic into one file, re-read through the file to make sure we aren't testing the same things over and over again redundantly. Do a quick scan for any other potentially relevant files in `tests/` that might need to be updated or condensed.
+6. Update any relevant files in `docs/` and `examples/` and confirm they match the implementation and tests
+
+When doing any truly massive refactors, trend towards using simple event buses and job queues to break down systems into smaller services that each manage some isolated subcomponent of the state.
+
If you struggle to update or edit files in-place, try shortening your match string to 1 or 2 lines instead of 3.
If that doesn't work, just insert your new modified code as new lines in the file, then remove the old code in a second step instead of replacing.
diff --git a/browser_use/agent/message_manager/service.py b/browser_use/agent/message_manager/service.py
index 531583d98..468a7d87b 100644
--- a/browser_use/agent/message_manager/service.py
+++ b/browser_use/agent/message_manager/service.py
@@ -88,7 +88,7 @@ def _log_format_agent_output_content(tool_call: dict) -> str:
return 'AgentOutput'
-def _log_extract_message_content(message: BaseMessage, is_last_message: bool) -> str:
+def _log_extract_message_content(message: BaseMessage, is_last_message: bool, metadata: MessageMetadata | None = None) -> str:
"""Extract content from a message for logging display only"""
try:
message_type = message.__class__.__name__
@@ -113,6 +113,9 @@ def _log_extract_message_content(message: BaseMessage, is_last_message: bool) ->
tool_name = tool_call.get('name', 'unknown')
if tool_name == 'AgentOutput':
+ # Skip formatting for init example messages
+ if metadata and metadata.message_type == 'init':
+ return '[Example AgentOutput]'
content = _log_format_agent_output_content(tool_call)
else:
content = f'[TOOL: {tool_name}]'
@@ -354,7 +357,7 @@ class MessageManager:
is_last_message = i == len(self.state.history.messages) - 1
# Extract content for logging
- content = _log_extract_message_content(m.message, is_last_message)
+ content = _log_extract_message_content(m.message, is_last_message, m.metadata)
# Format the message line(s)
lines = _log_format_message_line(m, content, is_last_message, terminal_width)
diff --git a/browser_use/browser/profile.py b/browser_use/browser/profile.py
index abfd3ab5b..7915ace8c 100644
--- a/browser_use/browser/profile.py
+++ b/browser_use/browser/profile.py
@@ -715,23 +715,23 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro
"""
display_size = get_display_size()
- if display_size:
- self.screen = self.screen or display_size or ViewportSize(width=1280, height=1100)
+ has_screen_available = bool(display_size)
+ self.screen = self.screen or display_size or ViewportSize(width=1280, height=1100)
# if no headless preference specified, prefer headful if there is a display available
if self.headless is None:
- self.headless = not bool(display_size)
+ self.headless = not has_screen_available
# set up window size and position if headful
if self.headless:
# headless mode: no window available, use viewport instead to constrain content size
+ self.viewport = self.viewport or self.window_size or self.screen
+ self.window_position = None # no windows to position in headless mode
self.window_size = None
- self.window_position = None
- self.no_viewport = False
- self.viewport = self.viewport or display_size or ViewportSize(width=1280, height=1100)
+ self.no_viewport = False # viewport is always enabled in headless mode
else:
- # headful mode: use window, disable viewport, content fits to size of window
- self.window_size = self.window_size or display_size or ViewportSize(width=1280, height=1100)
+ # headful mode: use window, disable viewport by default, content fits to size of window
+ self.window_size = self.window_size or self.screen
self.no_viewport = True if self.no_viewport is None else self.no_viewport
self.viewport = None if self.no_viewport else self.viewport
@@ -746,11 +746,16 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro
if use_viewport:
# if we are using viewport, make device_scale_factor and screen are set to real values to avoid easy fingerprinting
- self.viewport = self.viewport or display_size or ViewportSize(width=1280, height=1100)
+ self.viewport = self.viewport or self.screen
self.device_scale_factor = self.device_scale_factor or 1.0
- self.screen = self.screen or display_size or ViewportSize(width=1280, height=1100)
+ assert self.viewport is not None
+ assert self.no_viewport is False
else:
# device_scale_factor and screen are not supported non-viewport mode, the system monitor determines these
self.viewport = None
- self.device_scale_factor = None
- self.screen = None
+ self.device_scale_factor = None # only supported in viewport mode
+ self.screen = None # only supported in viewport mode
+ assert self.viewport is None
+ assert self.no_viewport is True
+
+ assert not (self.headless and self.no_viewport), 'headless=True and no_viewport=True cannot both be set at the same time'
diff --git a/browser_use/cli.py b/browser_use/cli.py
index 41b4d93f1..a05f42117 100644
--- a/browser_use/cli.py
+++ b/browser_use/cli.py
@@ -26,6 +26,8 @@ import langchain_anthropic
import langchain_google_genai
import langchain_openai
+# from patchright.async_api import async_playwright
+
try:
import readline
@@ -34,6 +36,9 @@ except ImportError:
# readline not available on Windows by default
READLINE_AVAILABLE = False
+
+os.environ['BROWSER_USE_LOGGING_LEVEL'] = 'result'
+
from browser_use import Agent, Controller
from browser_use.agent.views import AgentSettings
from browser_use.browser import BrowserSession
@@ -432,7 +437,7 @@ class BrowserUseApp(App):
# Create and set up the custom handler
log_handler = RichLogHandler(rich_log)
- log_type = os.getenv('BROWSER_USE_LOGGING_LEVEL', 'info').lower()
+ log_type = os.getenv('BROWSER_USE_LOGGING_LEVEL', 'result').lower()
class BrowserUseFormatter(logging.Formatter):
def format(self, record):
@@ -1139,6 +1144,63 @@ class BrowserUseApp(App):
yield Footer()
+async def run_prompt_mode(prompt: str, ctx: click.Context, debug: bool = False):
+ """Run browser-use in non-interactive mode with a single prompt."""
+ # Import and call setup_logging to ensure proper initialization
+ from browser_use.logging_config import setup_logging
+
+ # Set up logging to only show results by default
+ os.environ['BROWSER_USE_LOGGING_LEVEL'] = 'result'
+
+ # Re-run setup_logging to apply the new log level
+ setup_logging()
+
+ # The logging is now properly configured by setup_logging()
+ # No need to manually configure handlers since setup_logging() handles it
+
+ try:
+ # Load config
+ config = load_user_config()
+ config = update_config_with_click_args(config, ctx)
+
+ # Get LLM
+ llm = get_llm(config)
+
+ # Get agent settings from config
+ agent_settings = AgentSettings.model_validate(config.get('agent', {}))
+
+ # Create browser session (headless=False here; user_data_dir is left at its default)
+ browser_session = BrowserSession(
+ headless=False,
+ # user_data_dir=None,
+ # playwright=(await async_playwright().start()),
+ # channel=BrowserChannel.CHROME,
+ )
+
+ # Create and run agent
+ agent = Agent(
+ task=prompt,
+ llm=llm,
+ browser_session=browser_session,
+ **agent_settings.model_dump(),
+ )
+
+ # Run the agent
+ await agent.run()
+
+ # Close browser session
+ await browser_session.close()
+
+ except Exception as e:
+ if debug:
+ import traceback
+
+ traceback.print_exc()
+ else:
+ print(f'Error: {str(e)}', file=sys.stderr)
+ sys.exit(1)
+
+
async def textual_interface(config: dict[str, Any]):
"""Run the Textual interface."""
logger = logging.getLogger('browser_use.startup')
@@ -1172,7 +1234,11 @@ async def textual_interface(config: dict[str, Any]):
logger.info('Browser mode: visible')
# Create BrowserSession directly with config parameters
- browser_session = BrowserSession(**browser_config)
+ browser_session = BrowserSession(
+ **browser_config,
+ # playwright=(await async_playwright().start()),
+ # channel=BrowserChannel.CHROME,
+ )
logger.debug('BrowserSession initialized successfully')
# Log browser version if available
@@ -1248,9 +1314,10 @@ async def textual_interface(config: dict[str, Any]):
@click.option('--headless', is_flag=True, help='Run browser in headless mode', default=None)
@click.option('--window-width', type=int, help='Browser window width')
@click.option('--window-height', type=int, help='Browser window height')
+@click.option('-p', '--prompt', type=str, help='Run a single task without the TUI (headless mode)')
@click.pass_context
def main(ctx: click.Context, debug: bool = False, **kwargs):
- """Browser-Use Interactive TUI"""
+ """Browser-Use Interactive TUI or Command Line Executor"""
if kwargs['version']:
from importlib.metadata import version
@@ -1258,6 +1325,14 @@ def main(ctx: click.Context, debug: bool = False, **kwargs):
print(version('browser-use'))
sys.exit(0)
+ # Check if prompt mode is activated
+ if kwargs.get('prompt'):
+ # Set environment variable for prompt mode before running
+ os.environ['BROWSER_USE_LOGGING_LEVEL'] = 'result'
+ # Run in non-interactive mode
+ asyncio.run(run_prompt_mode(kwargs['prompt'], ctx, debug))
+ return
+
# Configure console logging
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', '%H:%M:%S'))
diff --git a/docs/customize/browser-settings.mdx b/docs/customize/browser-settings.mdx
index 5615b34c6..0b1b2e686 100644
--- a/docs/customize/browser-settings.mdx
+++ b/docs/customize/browser-settings.mdx
@@ -1,94 +1,59 @@
---
title: "Browser Settings"
-description: "Configure browser behavior and context settings"
+description: "Launch or connect to an existing browser and configure it to your needs."
icon: "globe"
---
-Browser Use uses [playwright](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) (or [patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)) + CDP to manage its connection with a real browser.
+Browser Use uses [playwright](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) (or [patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright)) to manage its connection with a real browser.
```python
-from browser_use import BrowserProfile, BrowserSession
-````
+from browser_use import BrowserSession, BrowserProfile, Agent
+```
+**To launch a new browser**, pass any playwright args you want to use to `BrowserSession(...)`:
-- `BrowserSession(**params)` is Browser Use's object that tracks a connection to a running browser. It holds:
- - the `playwright`, `browser`, `browser_context`, and `page` objects and tracks which tabs the agent/human are focused on
- - the helper methods to launch and connect to remote and local browsers
- - methods to interact with the browser window, apply config needed by the Agent, and run the DOMService for element detection
-- `BrowserProfile(**params)` is Browser Use's object that holds a collection of static config kwargs, to be used when starting a `BrowserSession`. It holds:
- - all the standard playwright kwargs
- - some extra browser configuration options we provide on top of playwright (e.g. `allowed_domains`, `window_position`, `profile_directory`, `deterministic_rendering`, etc.)
- - some kwargs used to configure Browser Use-specific features related to the browser (e.g. `highlight_elements`, `disable_security`, `cookies_file`)
-- [`Browser`](https://playwright.dev/python/docs/api/class-browser): standard [playwright `Browser`](https://playwright.dev/python/docs/api/class-browser) object, refers to a running browser process (remote or local)
-- [`BrowserContext`](https://playwright.dev/python/docs/api/class-browsercontext): standard [playwright `BrowserContext`](https://playwright.dev/python/docs/api/class-browsercontext) refers to a window in a live browser (incognito or with a profile)
-- [`Page`](https://playwright.dev/python/docs/api/class-page): standard [playwright `Page`](https://playwright.dev/python/docs/api/class-page), the handle for one tab in the browser
-
+```python
+new_browser = BrowserSession(user_data_dir='~/Desktop/test-profile', headless=False, ...)
+agent = Agent('fill out the form on this page', browser_session=new_browser)
+```
- The new `BrowserSession` and `BrowserProfile` classes now accept all the same arguments as standard Playwright [launch_persistent_context](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) API, giving you full control over browser settings.
+ The new `BrowserSession` & `BrowserProfile` accept all the same arguments that Playwright's [`launch_persistent_context(...)`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) takes, giving you full control over browser settings at launch. (see below for the full list)
-# Browser Configuration
+**To connect to an existing remote browser**, initialize a `BrowserSession` with a `cdp_url`, `wss_url`, or `browser_pid`:
```python
-from browser_use import Agent, BrowserSession, BrowserProfile
-
-# It's easy to connect to or launch a new browser with a BrowserSession, many methods are supported:
-browser_session = BrowserSession(
- cdp_url='http://localhost:9222' | wss_url='ws://...',
- # keep_alive=True,
- # browser_pid=12445,
- page=playwright_page | browser_context | browser | playwright,
- # executable_path='...',
- # user_data_dir='...',
- # headless=True,
- browser_profile=BrowserProfile(..., color_scheme='dark'),
- # color_scheme='light', # <- will take precedence over ^
- # ... other config overrides can go here
-)
-
-# you can optionally start a session and use it outside an Agent, otherwise Agent will automatically start it
-await session.start()
-page = await session.browser_context.new_page()
-await page.goto('https://example.com/load-before-agent-starts')
-
-agent = Agent(
- task='your task here',
- browser_session=browser_session, # provide a browser_session to the agent
- page=page, # or as a shortcut, provide a playwright page directly
-)
+remote_browser = BrowserSession(cdp_url='http://localhost:9222', keep_alive=True)
+agent = Agent(task='Do some task with a remote browser...', browser_session=remote_browser)
+agent = Agent(task='Other task reusing the same browser...', browser_session=remote_browser)
```
-`BrowserProfile` is a static, flat collection of config.
-It can be passed to a `BrowserSession` to start a session using that config.
+**To use Browser Use from an existing playwright script**, you can also pass an existing playwright `page`, `browser_context`, or `browser` object to `BrowserSession(...)`.
```python
-browser_profile = BrowserProfile(
- headless=True,
- is_mobile=True,
- **playwright.devices['iPhone 13'],
- user_data_dir='~/Desktop/mobile_test_profile',
- allowed_domains=['https://*.example.com'],
-)
+browser = await playwright.chromium.launch(...)
-# create a new session using the profile
-browser_session = BrowserSession(
- browser_profile=browser_profile,
- headless=False, # <- extra kwargs passed to session will override values from profile
-)
+page1 = await browser.new_page()
+agent1 = Agent('Find the price for product abc', browser_session=BrowserSession(page=page1, keep_alive=True))
+
+page2 = await browser.new_page()
+agent2 = Agent('Find the price for product xyz', browser_session=BrowserSession(page=page2, keep_alive=True))
+
+# the agents will share the existing browser and each work on different tabs ^
```
+---
-`BrowserSession` and `BrowserProfile` both accept the same long list of main config kwargs, most are standard playwright arguments that `playwright.BrowserType.launch_persistent_context()` takes.
+## `BrowserSession`
-We also provide some extra utility options to control Browser-Use-specific features and make setup easier,
-e.g. `allowed_domains`, `keep_alive`, `highlight_elements`, and more.
+- `BrowserSession(**params)` is Browser Use's object that tracks a connection to a running browser. It sets up:
+ - the `playwright`, `browser`, `browser_context`, and `page` objects and tracks which tabs the agent/human are focused on
+ - methods to interact with the browser window, apply config needed by the Agent, and run the `DOMService` for element detection
+ - it can take extra `**kwargs` to pass to playwright or a `browser_profile=BrowserProfile(...)` containing some config
+### Browser Connection Parameters
-
-## `BrowserSession` Parameters
-
-
-### Remote Browser Connection Parameters
+Provide any one of these options to connect to an existing browser. These options are session-specific and cannot be stored in a `BrowserProfile(...)`.
#### `wss_url`
@@ -96,7 +61,7 @@ e.g. `allowed_domains`, `keep_alive`, `highlight_elements`, and more.
wss_url: str | None = None
```
-WSS URL of the node.js playwright browser server to connect to
+WSS URL of the playwright-protocol browser server to connect to. See here for [WSS connection instructions](https://docs.browser-use.com/customize/real-browser#method-d%3A-connect-to-remote-playwright-node-js-browser-server-via-wss-url).
#### `cdp_url`
@@ -104,7 +69,7 @@ WSS URL of the node.js playwright browser server to connect to
cdp_url: str | None = None
```
-CDP URL of the browser to connect to (e.g., http://localhost:9222)
+CDP URL of the browser to connect to (e.g. `http://localhost:9222`). See here for [CDP connection instructions](https://docs.browser-use.com/customize/real-browser#method-e%3A-connect-to-remote-browser-via-cdp-url).
#### `browser_pid`
@@ -112,7 +77,7 @@ CDP URL of the browser to connect to (e.g., http://localhost:9222)
browser_pid: int | None = None
```
-PID of a running chromium-based browser process to connect to on localhost
+PID of a running chromium-based browser process to connect to on localhost. See here for [connection via pid](https://docs.browser-use.com/customize/real-browser#method-c%3A-connect-to-local-browser-using-browser-pid) instructions.
For web scraping tasks on sites that restrict automated access, we recommend
@@ -120,7 +85,7 @@ PID of a running chromium-based browser process to connect to on localhost
See the [Connect to your Browser](real-browser) guide for detailed connection instructions.
-### Runtime State / Parameters
+### Session-Specific Parameters
#### `browser_profile`
@@ -128,7 +93,7 @@ PID of a running chromium-based browser process to connect to on localhost
browser_profile: BrowserProfile = BrowserProfile()
```
-BrowserProfile instance containing config to use for the BrowserSession
+Optional BrowserProfile instance containing config to use for the BrowserSession. (see below for more info)
#### `playwright`
@@ -136,8 +101,8 @@ BrowserProfile instance containing config to use for the BrowserSession
playwright: Playwright | None = None
```
-Optional playwright or patchright API client object to use
-result of (await async_playwright().start()) or (await async_patchright().start())
+Optional playwright or patchright API client object to use, the
+result of `(await async_playwright().start())` or `(await async_patchright().start())`. See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
#### `browser`
@@ -145,7 +110,7 @@ result of (await async_playwright().start()) or (await async_patchright().start(
browser: Browser | None = None
```
-Playwright Browser object to use (optional)
+Playwright Browser object to use (optional). See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
#### `browser_context`
@@ -153,7 +118,25 @@ Playwright Browser object to use (optional)
browser_context: BrowserContext | None = None
```
-Playwright BrowserContext object to use (optional)
+Playwright BrowserContext object to use (optional). See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
+
+#### `page` *aka* `agent_current_page`
+
+
+
+```python
+page: Page | None = None
+```
+
+Foreground Page that the agent is focused on, can also be passed as `page=...` as a shortcut. See here for [more detailed usage instructions](https://docs.browser-use.com/customize/real-browser#method-b%3A-connect-using-existing-playwright-objects).
+
+#### `human_current_page`
+
+```python
+human_current_page: Page | None = None
+```
+
+Foreground Page that the human is focused on to start, usually not necessary to set manually.
#### `initialized`
@@ -163,26 +146,37 @@ initialized: bool = False
Mark BrowserSession as already initialized, skips launch/connection (not recommended)
-#### `page` *aka* `agent_current_page`
+
+#### `**kwargs`
+
+`BrowserSession` can also accept *all* of the parameters [below](#browserprofile).
+(the parameters *above* this point are specific to `BrowserSession` and cannot be stored in a `BrowserProfile`)
+
+---
+
+
+## `BrowserProfile`
+
+A `BrowserProfile` is an optional static collection of configuration that can be passed around and validated before it's used to start a `BrowserSession(browser_profile=BrowserProfile(...))`.
+
+It can take all the same standard [playwright arguments](#playwright) that `BrowserSession(...)` takes, and some extras as well.
+
+It allows you to do things like start multiple browsers with the same config easily:
```python
-page: Page | None = None
+from browser_use.browser import BrowserProfile
+
+browser_profile = BrowserProfile(headless=False, user_data_dir=None, allowed_domains=['https://*.example.com'], ...)
+
+# start 3 separate browser instances with the same config options:
+browser1 = BrowserSession(browser_profile=browser_profile, user_data_dir='/tmp/profile1')
+browser2 = BrowserSession(browser_profile=browser_profile, user_data_dir='/tmp/profile2')
+browser3 = BrowserSession(browser_profile=browser_profile, user_data_dir='/tmp/profile3')
```
-Foreground Page that the agent is focused on, can also be passed as `page=...` as a shortcut.
+### Browser-Use Parameters
-#### `human_current_page`
-
-```python
-human_current_page: Page | None = None
-```
-
-Foreground Page that the human is focused on
-
-
-## Browser-Use Parameters
-
-These parameters control browser-use specific features, and are outside the standard playwright parameter set.
+These parameters control browser-use specific features, and are outside the standard playwright parameter set but are supported by both `BrowserSession` and `BrowserProfile`.
#### `keep_alive`
@@ -323,16 +317,32 @@ window_position: dict | None = {"width": 0, "height": 0}
Window position from top-left corner.
+#### `save_recording_path`
+
+```python
+save_recording_path: str | None = None
+```
+
+Directory path for saving video recordings.
+
+#### `trace_path`
+
+```python
+trace_path: str | None = None
+```
+
+Directory path for saving Agent trace files. Files are automatically named as `{trace_path}/{context_id}.zip`.
+
+
---
-## Playwright Parameters
+
+
+### Playwright Launch Options
-https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context
All the parameters below are standard playwright parameters and can be passed to both `BrowserSession` and `BrowserProfile`.
-They are defined in `browser_use/browser/profile.py`.
-
-### Launch Settings
+They are defined in `browser_use/browser/profile.py`. See here for the [official Playwright documentation](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context) for all of these options.
#### `headless`
@@ -350,6 +360,8 @@ channel: BrowserChannel = 'chromium'
Browser channel: `'chromium'`, `'chrome'`, `'chrome-beta'`, `'chrome-dev'`, `'chrome-canary'`, `'msedge'`, `'msedge-beta'`, `'msedge-dev'`, `'msedge-canary'`
+Don't worry, other chromium-based browsers not in this list (e.g. `brave`) are still supported if you provide your own [`executable_path`](#executable_path).
+
#### `executable_path`
```python
@@ -364,7 +376,13 @@ Path to browser executable for custom installations.
user_data_dir: str | Path | None = '~/.config/browseruse/profiles/default'
```
-Directory for browser profile data. Set to None to use an ephemeral profile aka incognito mode.
+Directory for browser profile data. Set to `None` to use an ephemeral temporary profile (aka incognito mode).
+
+Multiple running browsers **cannot share a single `user_data_dir` at the same time**. You must set it to `None` or
+provide a unique `user_data_dir` per-session if you plan to run multiple browsers.
+
+The browser version run must always be equal to or greater than the version used to create the `user_data_dir`.
+If you see errors like `Failed to parse Extensions` or similar and failures when launching, you're attempting to run an older browser with an incompatible `user_data_dir` that's already been migrated to a newer schema version.
#### `args`
@@ -372,7 +390,7 @@ Directory for browser profile data. Set to None to use an ephemeral profile aka
args: list[str] = []
```
-Additional command-line arguments to pass to the browser.
+Additional command-line arguments to pass to the browser. See here for the [full list of available chrome launch options](https://peter.sh/experiments/chromium-command-line-switches/).
#### `ignore_default_args`
@@ -380,7 +398,7 @@ Additional command-line arguments to pass to the browser.
ignore_default_args: list[str] | bool = ['--enable-automation', '--disable-extensions']
```
-List of default CLI args to stop playwright from applying.
+List of default CLI args to stop playwright from including when launching chrome. Set it to `True` to disable *all* default options (not recommended).
#### `env`
@@ -388,7 +406,7 @@ List of default CLI args to stop playwright from applying.
env: dict[str, str] = {}
```
-Environment variables to set when launching browser.
+Extra environment variables to set when launching the browser, e.g. `{'DISPLAY': ':1'}` to use a specific X11 display.
#### `chromium_sandbox`
@@ -396,7 +414,8 @@ Environment variables to set when launching browser.
chromium_sandbox: bool = not IN_DOCKER
```
-Whether to enable Chromium sandboxing (recommended unless inside Docker).
+Whether to enable Chromium sandboxing (recommended for security). Should always be `False` when running inside Docker
+because Docker provides its own sandboxing that can conflict with Chrome's.
#### `devtools`
@@ -404,8 +423,7 @@ Whether to enable Chromium sandboxing (recommended unless inside Docker).
devtools: bool = False
```
-
-Whether to open DevTools panel automatically (only works when headless=False).
+Whether to open DevTools panel automatically (only works when `headless=False`).
#### `slow_mo`
@@ -437,7 +455,7 @@ Whether to automatically accept all downloads.
proxy: dict | None = None
```
-Proxy settings. Example: `{"server": "http://proxy.com:8080", "username": "user", "password": "pass"}`
+Proxy settings. Example: `{"server": "http://proxy.com:8080", "username": "user", "password": "pass"}`.
#### `permissions`
@@ -445,7 +463,7 @@ Proxy settings. Example: `{"server": "http://proxy.com:8080", "username": "user"
permissions: list[str] = ['clipboard-read', 'clipboard-write', 'notifications']
```
-Browser permissions to grant.
+Browser permissions to grant. See here for the [full list of available permissions](https://playwright.dev/python/docs/api/class-browsercontext#browser-context-grant-permissions).
#### `storage_state`
@@ -453,7 +471,8 @@ Browser permissions to grant.
storage_state: str | Path | dict | None = None
```
-Browser storage state (cookies, localStorage). Can be file path or dict.
+Browser storage state (cookies, localStorage). Can be file path or dict. See here for the [Playwright `storage_state` documentation](https://playwright.dev/python/docs/api/class-browsercontext#browser-context-storage-state) on how to use it.
+This option is only applied when launching a new browser using the default builtin playwright chromium.
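
For the dict form, a minimal sketch of the storage-state shape Playwright uses (`cookies` plus `origins`) might look like the following. The helper name `build_storage_state` and the cookie values are hypothetical and only serve as an illustration.

```python
import json
from typing import Any


def build_storage_state(cookies: list[dict[str, Any]]) -> dict[str, Any]:
	"""Wrap a list of cookie dicts in the storage-state structure.

	Hypothetical helper; 'origins' (per-origin localStorage) is left empty here.
	"""
	return {'cookies': cookies, 'origins': []}


# Example cookie with made-up values
example_state = build_storage_state(
	[
		{
			'name': 'session',
			'value': 'abc123',
			'domain': 'example.com',
			'path': '/',
			'expires': -1,  # -1 marks a session cookie
			'httpOnly': True,
			'secure': True,
			'sameSite': 'Lax',
		}
	]
)
serialized = json.dumps(example_state, indent=2)  # could be written to a file and passed as a path
```

Either `example_state` itself or a path to a file containing `serialized` could be supplied as `storage_state`.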
@@ -478,7 +497,7 @@ default_navigation_timeout: float | None = None
Default timeout for page navigation in milliseconds.
-### Display and Viewport Settings
+### Playwright Viewport Options
Configure browser window size, viewport, and display properties:
@@ -549,18 +568,21 @@ Viewport size with `width` and `height`. Example: `{"width": 1280, "height": 720
#### `no_viewport`
```python
-no_viewport: bool | None = None
+no_viewport: bool | None = not headless
```
Disable fixed viewport. Content will resize with window.
+*Tip:* avoid this parameter. It is a standard playwright parameter, but it is redundant and only serves to override the `viewport` setting above.
+A viewport is *always* used in headless mode regardless of this setting, and is *never* used in headful mode unless you pass `viewport={width, height}` explicitly.
+
#### `device_scale_factor`
```python
device_scale_factor: float | None = None
```
-Device scale factor (DPI). Useful for high-resolution screenshots.
+Device scale factor (DPI). Useful for high-resolution screenshots (set it to 2).
#### `screen`
@@ -602,8 +624,7 @@ forced_colors: ForcedColors = 'none'
Forced colors mode: `'active'`, `'none'`, `'null'`
-
-### Security and Network Settings
+### Playwright Security Options
> See `allowed_domains` above too!
@@ -688,23 +709,9 @@ client_certificates: list[ClientCertificate] = []
Client certificates to be used with requests.
-### Recording and Debugging
+### Playwright Recording Options
-#### `save_recording_path`
-
-```python
-save_recording_path: str | None = None
-```
-
-Directory path for saving video recordings.
-
-#### `trace_path`
-
-```python
-trace_path: str | None = None
-```
-
-Directory path for saving trace files. Files are automatically named as `{trace_path}/{context_id}.zip`.
+Note: Browser Use also provides its own recording-related options that are not listed below (see above).
#### `record_video_dir`
@@ -712,12 +719,13 @@ Directory path for saving trace files. Files are automatically named as `{trace_
record_video_dir: str | Path | None = None
```
-Directory to save video recordings.
+Directory to save video recordings. [Playwright Docs: `record_video_dir`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context-option-record-video-dir)
#### `record_video_size`
```python
record_video_size: dict | None = None
```
-Video size. Example: `{"width": 1280, "height": 720}`
+Video size. Example: `{"width": 1280, "height": 720}`. [Playwright Docs: `record_video_size`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context-option-record-video-size)
@@ -802,10 +810,21 @@ handle_sigterm: bool = False
Whether playwright should swallow SIGTERM signals and kill the browser.
+#### `**playwright.devices['iPhone 13']`
+
+```python
+BrowserProfile(
+ ...
+ **playwright.devices['iPhone 13'], # playwright = await async_playwright().start()
+)
+```
+
+Playwright provides launch & context args to [emulate common device fingerprints](https://playwright.dev/python/docs/emulation).
+`BrowserSession` and `BrowserProfile` take all the standard playwright args so we are able to support these as well.
---
-## Example
+## Full Example
```python
from browser_use import BrowserSession, BrowserProfile, Agent
@@ -820,11 +839,12 @@ browser_profile = BrowserProfile(
highlight_elements=True,
viewport_expansion=500,
allowed_domains=['*.google.com', 'http*://*.wikipedia.org'],
+ user_data_dir=None,
)
browser_session = BrowserSession(
- headless=True,
browser_profile=browser_profile,
+ headless=True, # extra kwargs to the session override the defaults in the profile
)
# you can drive a session without the agent / reuse it between agents
@@ -841,19 +861,18 @@ async def run_search():
)
```
-
## Summary
-- **BrowserSession** parameters (defined in `browser_use/browser/session.py`) handle connection and runtime state
-- **BrowserProfile** parameters (defined in `browser_use/browser/profile.py`) handle all playwright configuration
+- **BrowserSession** (defined in `browser_use/browser/session.py`) handles the live browser connection and runtime state
+- **BrowserProfile** (defined in `browser_use/browser/profile.py`) can be used to store a collection of static configuration
-`BrowserProfile` is a flat collection of config, it's consumed by these calls depending on how we need to connect/launch:
+Configuration parameters defined on either are consumed by these calls depending on whether we're connecting/launching:
- `BrowserConnectArgs` - args for `playwright.BrowserType.connect_over_cdp(...)`
- `BrowserLaunchArgs` - args for `playwright.BrowserType.launch(...)`
- `BrowserNewContextArgs` - args for `playwright.Browser.new_context(...)`
- `BrowserLaunchPersistentContextArgs` - args for `playwright.BrowserType.launch_persistent_context(...)`
-- BrowserUse custom settings
+- Browser Use's own internal methods
For more details on Playwright's browser context options, see the [official documentation](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context).
diff --git a/docs/customize/real-browser.mdx b/docs/customize/real-browser.mdx
index f7350c6e2..3df3fee45 100644
--- a/docs/customize/real-browser.mdx
+++ b/docs/customize/real-browser.mdx
@@ -26,8 +26,8 @@ browser_session = BrowserSession(
# For Windows: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
# For Linux: '/usr/bin/google-chrome'
- # Use a specific data directory on disk (optional)
- user_data_dir='~/.config/browseruse/profiles/default',
+ # Use a specific data directory on disk (optional, set to None for incognito)
+ user_data_dir='~/.config/browseruse/profiles/default', # this is the default
# ... any other BrowserProfile or playwright launch_persistent_context config...
# headless=False,
)
diff --git a/examples/browser/using_cdp.py b/examples/browser/using_cdp.py
index 71538699b..f6e72cf86 100644
--- a/examples/browser/using_cdp.py
+++ b/examples/browser/using_cdp.py
@@ -25,24 +25,23 @@ from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import SecretStr
from browser_use import Agent, Controller
-from browser_use.browser import BrowserProfile, BrowserSession
+from browser_use.browser import BrowserSession
api_key = os.getenv('GOOGLE_API_KEY')
if not api_key:
raise ValueError('GOOGLE_API_KEY is not set')
-browser_profile = BrowserProfile(
+browser_session = BrowserSession(
headless=False,
cdp_url='http://localhost:9222',
)
-browser_session = BrowserSession(browser_profile=browser_profile)
controller = Controller()
async def main():
task = 'In docs.google.com write my Papa a quick thank you for everything letter \n - Magnus'
task += ' and save the document as pdf'
- model = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp', api_key=SecretStr(str(api_key)))
+ model = ChatGoogleGenerativeAI(model='gemini-2.0-flash-exp', api_key=SecretStr(api_key))
agent = Agent(
task=task,
llm=model,
diff --git a/examples/features/result_processing.py b/examples/features/result_processing.py
index 6d0980323..b65212afd 100644
--- a/examples/features/result_processing.py
+++ b/examples/features/result_processing.py
@@ -22,11 +22,8 @@ async def main():
async with BrowserSession(
browser_profile=BrowserProfile(
headless=False,
- disable_security=True,
trace_path='./tmp/result_processing',
- no_viewport=False,
- window_width=1280,
- window_height=1000,
+ window_size={'width': 1280, 'height': 1000},
user_data_dir='~/.config/browseruse/profiles/default',
)
) as browser_session:
diff --git a/examples/use-cases/web_voyager_agent.py b/examples/use-cases/web_voyager_agent.py
index 7ea771405..68b1d2c1b 100644
--- a/examples/use-cases/web_voyager_agent.py
+++ b/examples/use-cases/web_voyager_agent.py
@@ -36,14 +36,9 @@ else:
browser_session = BrowserSession(
browser_profile=BrowserProfile(
headless=False, # This is True in production
- disable_security=True,
minimum_wait_page_load_time=1, # 3 on prod
maximum_wait_page_load_time=10, # 20 on prod
- # Set no_viewport=False to constrain the viewport to the specified dimensions
- # This is useful for specific cases where you need a fixed viewport size
- no_viewport=False,
- window_width=1280,
- window_height=1100,
+ viewport={'width': 1280, 'height': 1100},
user_data_dir='~/.config/browseruse/profiles/default',
# trace_path='./tmp/web_voyager_agent',
)
diff --git a/pyproject.toml b/pyproject.toml
index 5f7ad38f6..c4cfdd192 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -77,6 +77,11 @@ Repository = "https://github.com/browser-use/browser-use"
browseruse = "browser_use.cli:main"
browser-use = "browser_use.cli:main"
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+
[tool.codespell]
ignore-words-list = "bu,wit,dont,cant,wont"
skip = "*.json"
@@ -101,10 +106,6 @@ skip-magic-trailing-comma = false
[tool.pyright]
typeCheckingMode = "off"
-[build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
-
[tool.hatch.build]
include = [
"browser_use/**/*.py",
@@ -114,6 +115,30 @@ include = [
"browser_use/dom/buildDomTree.js",
]
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+asyncio_default_fixture_loop_scope = "module"
+asyncio_default_test_loop_scope = "module"
+markers = [
+ "slow: marks tests as slow (deselect with `-m 'not slow'`)",
+ "integration: marks tests as integration tests",
+ "unit: marks tests as unit tests",
+ "asyncio: mark tests as async tests",
+]
+testpaths = [
+ "tests"
+]
+python_files = ["test_*.py", "*_test.py"]
+addopts = "-v --strict-markers --tb=short"
+log_cli = true
+log_cli_format = "%(levelname)-8s [%(name)s] %(message)s"
+filterwarnings = [
+ "ignore::pytest.PytestDeprecationWarning",
+ "ignore::DeprecationWarning",
+]
+log_level = "INFO"
+
+
[tool.uv]
dev-dependencies = [
"ruff>=0.11.2",
diff --git a/pytest.ini b/pytest.ini
deleted file mode 100644
index 8715c46f8..000000000
--- a/pytest.ini
+++ /dev/null
@@ -1,29 +0,0 @@
-[pytest]
-markers =
- slow: marks tests as slow (deselect with '-m "not slow"')
- integration: marks tests as integration tests
- unit: marks tests as unit tests
- asyncio: mark tests as async tests
-
-testpaths =
- tests
-
-python_files =
- test_*.py
- *_test.py
-
-addopts =
- -v
- --strict-markers
- --tb=short
-
-asyncio_mode = auto
-asyncio_default_fixture_loop_scope = function
-log_cli = true
-; log_cli_level = DEBUG
-log_cli_format = %(levelname)-8s [%(name)s] %(message)s
-filterwarnings =
- ignore::pytest.PytestDeprecationWarning
- ignore::DeprecationWarning
-
-log_level = INFO
diff --git a/tests/ci/test_sensitive_data.py b/tests/ci/test_agent_sensitive_data.py
similarity index 100%
rename from tests/ci/test_sensitive_data.py
rename to tests/ci/test_agent_sensitive_data.py
diff --git a/tests/ci/test_url_allowlist_security.py b/tests/ci/test_browser_session_allowed_domains.py
similarity index 100%
rename from tests/ci/test_url_allowlist_security.py
rename to tests/ci/test_browser_session_allowed_domains.py
diff --git a/tests/ci/test_debug_selector_map.py b/tests/ci/test_browser_session_element_cache.py
similarity index 100%
rename from tests/ci/test_debug_selector_map.py
rename to tests/ci/test_browser_session_element_cache.py
diff --git a/tests/ci/test_browser_session_param.py b/tests/ci/test_browser_session_param.py
index e88ad5d2f..87ea2f89b 100644
--- a/tests/ci/test_browser_session_param.py
+++ b/tests/ci/test_browser_session_param.py
@@ -237,7 +237,6 @@ async def main(browser_session):
import pytest
-@pytest.mark.asyncio
async def test_browser_session_parameter_issue(browser_session):
"""Test that the browser_session parameter issue is fixed."""
# Run the main test logic
diff --git a/tests/ci/test_tab_management.py b/tests/ci/test_browser_session_tab_management.py
similarity index 97%
rename from tests/ci/test_tab_management.py
rename to tests/ci/test_browser_session_tab_management.py
index 7d5e05dc8..57f970549 100644
--- a/tests/ci/test_tab_management.py
+++ b/tests/ci/test_browser_session_tab_management.py
@@ -20,13 +20,6 @@ logger.setLevel(logging.DEBUG)
class TestTabManagement:
"""Tests for the tab management system with separate agent_current_page and human_current_page references."""
- @pytest.fixture(scope='module')
- def event_loop(self):
- """Create and provide an event loop for async tests."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
@pytest.fixture(scope='module')
def http_server(self):
"""Create and provide a test HTTP server that serves static content."""
@@ -51,7 +44,7 @@ class TestTabManagement:
server.stop()
@pytest.fixture(scope='module')
- async def browser_profile(self, event_loop):
+ async def browser_profile(self):
"""Create and provide a BrowserProfile with security disabled."""
profile = BrowserProfile(headless=True)
yield profile
@@ -191,7 +184,6 @@ class TestTabManagement:
# Tab management tests
- @pytest.mark.asyncio
async def test_initial_values(self, browser_session, base_url):
"""Test that open_tab correctly updates both tab references."""
@@ -210,7 +202,6 @@ class TestTabManagement:
assert current_tab is not None
assert current_tab.url == 'about:blank'
- @pytest.mark.asyncio
async def test_agent_changes_tab(self, browser_session, base_url):
"""Test that agent_current_page changes and human_current_page remains the same when a new tab is opened."""
@@ -239,7 +230,6 @@ class TestTabManagement:
browser_session.human_current_page.url == initial_tab.url == f'{base_url}/page1'
) # human should still be on the very first tab
- @pytest.mark.asyncio
async def test_human_changes_tab(self, browser_session, base_url):
"""Test that human_current_page changes and agent_current_page remains the same when a new tab is opened."""
@@ -261,7 +251,6 @@ class TestTabManagement:
assert current_agent_page.url == initial_tab.url == 'about:blank'
assert browser_session.human_current_page.url == new_human_tab.url == f'{base_url}/page3'
- @pytest.mark.asyncio
async def test_switch_tab(self, browser_session, base_url):
"""Test that switch_tab updates both tab references."""
@@ -297,7 +286,6 @@ class TestTabManagement:
assert current_tab.url == second_tab.url == f'{base_url}/page2' == browser_session.agent_current_page.url
assert browser_session.human_current_page.url == first_tab.url == f'{base_url}/page1'
- @pytest.mark.asyncio
async def test_close_tab(self, browser_session, base_url):
"""Test that closing a tab updates references correctly."""
diff --git a/tests/ci/test_browser.py b/tests/ci/test_browser_session_via_cdp.py
similarity index 97%
rename from tests/ci/test_browser.py
rename to tests/ci/test_browser_session_via_cdp.py
index d7ed1bfbe..b00fa904a 100644
--- a/tests/ci/test_browser.py
+++ b/tests/ci/test_browser_session_via_cdp.py
@@ -4,7 +4,6 @@ from playwright.async_api import async_playwright
from browser_use.browser import BrowserSession
-@pytest.mark.asyncio
async def test_connection_via_cdp(monkeypatch):
browser_session = BrowserSession(
cdp_url='http://localhost:9898',
diff --git a/tests/test_browser_config_models.py b/tests/ci/test_browser_session_viewport_and_proxy.py
similarity index 67%
rename from tests/test_browser_config_models.py
rename to tests/ci/test_browser_session_viewport_and_proxy.py
index a32852138..e7fad6a4a 100644
--- a/tests/test_browser_config_models.py
+++ b/tests/ci/test_browser_session_viewport_and_proxy.py
@@ -1,12 +1,7 @@
-import os
-
-import pytest
-
-from browser_use.browser.profile import BrowserProfile, ProxySettings
-from browser_use.browser.session import BrowserSession
+from browser_use.browser import BrowserProfile, BrowserSession
+from browser_use.browser.profile import ProxySettings
-@pytest.mark.asyncio
async def test_proxy_settings_pydantic_model():
"""
Test that ProxySettings as a Pydantic model is correctly converted to a dictionary when used.
@@ -30,32 +25,6 @@ async def test_proxy_settings_pydantic_model():
# We don't launch the actual browser - we just verify the model itself works as expected
-@pytest.mark.asyncio
-async def test_window_size_config():
- """
- Test that BrowserProfile correctly handles window_size property.
- """
- # Create profile with specific window dimensions
- profile = BrowserProfile(window_size={'width': 1280, 'height': 1100})
-
- # Verify the properties are set correctly
- assert profile.window_size['width'] == 1280
- assert profile.window_size['height'] == 1100
-
- # Verify model_dump works correctly
- profile_dict = profile.model_dump()
- assert isinstance(profile_dict, dict)
- assert profile_dict['window_size']['width'] == 1280
- assert profile_dict['window_size']['height'] == 1100
-
- # Create with different values
- profile2 = BrowserProfile(window_size={'width': 1920, 'height': 1080})
- assert profile2.window_size['width'] == 1920
- assert profile2.window_size['height'] == 1080
-
-
-@pytest.mark.asyncio
-@pytest.mark.skipif(os.environ.get('CI') == 'true', reason='Skip browser test in CI')
async def test_window_size_with_real_browser():
"""
Integration test that verifies our window size Pydantic model is correctly
@@ -64,11 +33,10 @@ async def test_window_size_with_real_browser():
"""
# Create browser profile with headless mode and specific dimensions
browser_profile = BrowserProfile(
- headless=True, # Use headless for faster test
- window_size={'width': 1024, 'height': 768},
- maximum_wait_page_load_time=2.0, # Faster timeouts for test
+ headless=True, # window size gets converted to viewport size in headless mode
+ window_size={'width': 999, 'height': 888},
+ maximum_wait_page_load_time=2.0,
minimum_wait_page_load_time=0.2,
- no_viewport=True, # Use actual window size instead of viewport
)
# Create browser session
@@ -101,7 +69,7 @@ async def test_window_size_with_real_browser():
""")
# Let's also check the viewport size
- viewport_size = await page.evaluate("""
+ actual_size = await page.evaluate("""
() => {
return {
width: window.innerWidth,
@@ -110,22 +78,28 @@ async def test_window_size_with_real_browser():
}
""")
- print(f'Window size config: width={browser_profile.window_size["width"]}, height={browser_profile.window_size["height"]}')
- print(f'Browser viewport size: {viewport_size}')
+ print(f'Browser configured window_size={browser_session.browser_profile.window_size}')
+ print(f'Browser configured viewport_size: {browser_session.browser_profile.viewport}')
+ print(f'Browser content actual size: {actual_size}')
# This is a lightweight test to verify that the page has a size (details may vary by browser)
- assert viewport_size['width'] > 0, 'Expected viewport width to be positive'
- assert viewport_size['height'] > 0, 'Expected viewport height to be positive'
+ assert actual_size['width'] > 0, 'Expected viewport width to be positive'
+ assert actual_size['height'] > 0, 'Expected viewport height to be positive'
- # For browser context creation in record_video_size, this is what truly matters
- # Verify that our window size was properly serialized to a dictionary
- print(f'Content of context session: {browser_session.browser_context}')
- print('✅ Browser window size used in the test')
+ # assert that window_size got converted to viewport_size in headless mode
+ assert browser_session.browser_profile.headless is True
+ assert browser_session.browser_profile.viewport == {'width': 999, 'height': 888}
+ assert browser_session.browser_profile.window_size is None
+ assert browser_session.browser_profile.window_position is None
+ assert browser_session.browser_profile.no_viewport is False
+ # screen should be the detected display size (or default if no display detected)
+ assert browser_session.browser_profile.screen is not None
+ assert browser_session.browser_profile.screen['width'] > 0
+ assert browser_session.browser_profile.screen['height'] > 0
finally:
await browser_session.stop()
-@pytest.mark.asyncio
async def test_proxy_with_real_browser():
"""
Integration test that verifies our proxy Pydantic model is correctly
@@ -143,7 +117,7 @@ async def test_proxy_with_real_browser():
)
# Test model serialization
- proxy_dict = proxy_settings.model_dump()
+ proxy_dict = dict(proxy_settings)
assert isinstance(proxy_dict, dict)
assert proxy_dict['server'] == 'http://non.existent.proxy:9999'
@@ -160,5 +134,8 @@ async def test_proxy_with_real_browser():
# Success - the browser was initialized with our proxy settings
# We won't try to make requests (which would fail with non-existent proxy)
print('✅ Browser initialized with proxy settings successfully')
+ assert browser_session.browser_profile.proxy == proxy_settings
+ # TODO: create a network request in the browser and verify it goes through the proxy?
+ # would require setting up a whole fake proxy in a fixture
finally:
await browser_session.stop()
diff --git a/tests/ci/test_controller.py b/tests/ci/test_controller.py
index c249ebd79..e6c8cfd3f 100644
--- a/tests/ci/test_controller.py
+++ b/tests/ci/test_controller.py
@@ -27,13 +27,6 @@ from browser_use.controller.views import (
class TestControllerIntegration:
"""Integration tests for Controller using actual browser instances."""
- @pytest.fixture(scope='module')
- def event_loop(self):
- """Create and provide an event loop for async tests."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
@pytest.fixture(scope='module')
def http_server(self):
"""Create and provide a test HTTP server that serves static content."""
@@ -81,8 +74,8 @@ class TestControllerIntegration:
"""Return the base URL for the test HTTP server."""
return f'http://{http_server.host}:{http_server.port}'
- @pytest.fixture(scope='module')
- async def browser_session(self, event_loop):
+ @pytest.fixture
+ async def browser_session(self):
"""Create and provide a Browser instance with security disabled."""
browser_session = BrowserSession(
# browser_profile=BrowserProfile(),
@@ -98,7 +91,6 @@ class TestControllerIntegration:
"""Create and provide a Controller instance."""
return Controller()
- @pytest.mark.asyncio
async def test_go_to_url_action(self, controller, browser_session, base_url):
"""Test that GoToUrlAction navigates to the specified URL."""
# Create action model for go_to_url
@@ -121,7 +113,6 @@ class TestControllerIntegration:
page = await browser_session.get_current_page()
assert f'{base_url}/page1' in page.url
- @pytest.mark.asyncio
async def test_scroll_actions(self, controller, browser_session, base_url):
"""Test that scroll actions correctly scroll the page."""
# First navigate to a page
@@ -158,7 +149,6 @@ class TestControllerIntegration:
assert isinstance(result, ActionResult)
assert 'Scrolled up' in result.extracted_content
- @pytest.mark.asyncio
async def test_registry_actions(self, controller, browser_session):
"""Test that the registry contains the expected default actions."""
# Check that common actions are registered
@@ -181,7 +171,6 @@ class TestControllerIntegration:
assert controller.registry.registry.actions[action].function is not None
assert controller.registry.registry.actions[action].description is not None
- @pytest.mark.asyncio
async def test_custom_action_registration(self, controller, browser_session, base_url):
"""Test registering a custom action and executing it."""
@@ -216,7 +205,6 @@ class TestControllerIntegration:
assert 'Custom action executed with: test_value on' in result.extracted_content
assert f'{base_url}/page1' in result.extracted_content
- @pytest.mark.asyncio
async def test_input_text_action(self, controller, browser_session, base_url, http_server):
"""Test that InputTextAction correctly inputs text into form fields."""
# Set up search form endpoint for this test
@@ -273,7 +261,6 @@ class TestControllerIntegration:
# If it fails due to DOM issues, that's expected in a test environment
assert 'Element index' in str(e) or 'does not exist' in str(e)
- @pytest.mark.asyncio
async def test_error_handling(self, controller, browser_session):
"""Test error handling when an action fails."""
# Create an action with an invalid index
@@ -289,7 +276,6 @@ class TestControllerIntegration:
# Verify that an appropriate error is raised
assert 'does not exist' in str(excinfo.value) or 'Element with index' in str(excinfo.value)
- @pytest.mark.asyncio
async def test_wait_action(self, controller, browser_session):
"""Test that the wait action correctly waits for the specified duration."""
@@ -329,7 +315,6 @@ class TestControllerIntegration:
# Verify that at least 1 second has passed
assert end_time - start_time >= 0.9 # Allow some timing margin
- @pytest.mark.asyncio
async def test_go_back_action(self, controller, browser_session, base_url):
"""Test that go_back action navigates to the previous page."""
# Navigate to first page
@@ -378,7 +363,6 @@ class TestControllerIntegration:
# Try to verify we're back on the first page, but don't fail the test if not
assert f'{base_url}/page1' in final_url, f'Expected to return to page1 but got {final_url}'
- @pytest.mark.asyncio
async def test_navigation_chain(self, controller, browser_session, base_url):
"""Test navigating through multiple pages and back through history."""
# Set up a chain of navigation: Home -> Page1 -> Page2
@@ -410,7 +394,6 @@ class TestControllerIntegration:
page = await browser_session.get_current_page()
assert expected_url in page.url
- @pytest.mark.asyncio
async def test_concurrent_tab_operations(self, controller, browser_session, base_url):
"""Test operations across multiple tabs."""
# Create two tabs with different content
@@ -461,7 +444,6 @@ class TestControllerIntegration:
assert len(tabs_info) == 1
assert urls[0] in tabs_info[0].url
- @pytest.mark.asyncio
async def test_excluded_actions(self, browser_session):
"""Test that excluded actions are not registered."""
# Create controller with excluded actions
@@ -475,7 +457,6 @@ class TestControllerIntegration:
assert 'go_to_url' in excluded_controller.registry.registry.actions
assert 'click_element_by_index' in excluded_controller.registry.registry.actions
- @pytest.mark.asyncio
async def test_search_google_action(self, controller, browser_session, base_url):
"""Test the search_google action."""
@@ -497,7 +478,6 @@ class TestControllerIntegration:
page = await browser_session.get_current_page()
assert page.url is not None and 'Python' in page.url
- @pytest.mark.asyncio
async def test_done_action(self, controller, browser_session, base_url):
"""Test that DoneAction completes a task and reports success or failure."""
# First navigate to a page
@@ -541,7 +521,6 @@ class TestControllerIntegration:
assert result.is_done is True
assert result.error is None
- @pytest.mark.asyncio
async def test_drag_drop_action(self, controller, browser_session, base_url, http_server):
"""Test that DragDropAction correctly drags and drops elements."""
# Set up drag and drop test page for this test
@@ -769,7 +748,6 @@ class TestControllerIntegration:
assert drag_succeeded, "Drag and drop events weren't fired correctly"
- @pytest.mark.asyncio
async def test_send_keys_action(self, controller, browser_session, base_url, http_server):
"""Test SendKeysAction using a controlled local HTML file."""
# Set up keyboard test page for this test
@@ -973,7 +951,6 @@ class TestControllerIntegration:
input_value = await page.evaluate('() => document.getElementById("textInput").value')
assert input_value == test_text, "Input value shouldn't have changed after tabbing"
- @pytest.mark.asyncio
async def test_get_dropdown_options(self, controller, browser_session, base_url, http_server):
"""Test that get_dropdown_options correctly retrieves options from a dropdown."""
# Add route for dropdown test page
@@ -1084,7 +1061,6 @@ class TestControllerIntegration:
f"Option at index {i} has wrong value: expected '{expected['value']}', got '{actual['value']}'"
)
- @pytest.mark.asyncio
async def test_select_dropdown_option(self, controller, browser_session, base_url, http_server):
"""Test that select_dropdown_option correctly selects an option from a dropdown."""
# Add route for dropdown test page
@@ -1159,7 +1135,6 @@ class TestControllerIntegration:
selected_value = await page.evaluate("document.getElementById('test-dropdown').value")
assert selected_value == 'option2' # Second Option has value "option2"
- @pytest.mark.asyncio
async def test_extract_content_action(self, controller, browser_session, base_url, http_server):
"""Test the default extract_content action with mixed parameter ordering."""
# Set up a test page with specific content
@@ -1257,7 +1232,6 @@ class TestControllerIntegration:
assert action_model.extract_content['goal'] == 'Extract all product information including links'
assert action_model.extract_content['include_links'] is True
- @pytest.mark.asyncio
async def test_click_element_by_index(self, controller, browser_session, base_url, http_server):
"""Test that click_element_by_index correctly clicks an element and handles different outcomes."""
# Add route for clickable elements test page
diff --git a/tests/ci/test_browser_session.py b/tests/ci/test_controller_action_parameter_injection.py
similarity index 97%
rename from tests/ci/test_browser_session.py
rename to tests/ci/test_controller_action_parameter_injection.py
index 9df55b82d..9f0af6c8f 100644
--- a/tests/ci/test_browser_session.py
+++ b/tests/ci/test_controller_action_parameter_injection.py
@@ -11,13 +11,6 @@ from browser_use.dom.views import DOMElementNode
class TestBrowserContext:
"""Tests for browser context functionality using real browser instances."""
- @pytest.fixture(scope='module')
- def event_loop(self):
- """Create and provide an event loop for async tests."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
@pytest.fixture(scope='module')
def http_server(self):
"""Create and provide a test HTTP server that serves static content."""
@@ -61,8 +54,8 @@ class TestBrowserContext:
"""Return the base URL for the test HTTP server."""
return f'http://{http_server.host}:{http_server.port}'
- @pytest.fixture(scope='module')
- async def browser_session(self, event_loop):
+ @pytest.fixture
+ async def browser_session(self):
"""Create and provide a BrowserSession instance with security disabled."""
browser_session = BrowserSession(
# browser_profile=BrowserProfile(...),
diff --git a/tests/ci/test_action_registry.py b/tests/ci/test_registry.py
similarity index 98%
rename from tests/ci/test_action_registry.py
rename to tests/ci/test_registry.py
index badfd11a3..6b1ede9ed 100644
--- a/tests/ci/test_action_registry.py
+++ b/tests/ci/test_registry.py
@@ -66,14 +66,6 @@ class ComplexParams(BaseActionModel):
# Test fixtures
-@pytest.fixture(scope='module')
-def event_loop():
- """Create and provide an event loop for async tests."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
-
@pytest.fixture(scope='module')
def http_server():
"""Create and provide a test HTTP server that serves static content."""
@@ -97,7 +89,7 @@ def base_url(http_server):
@pytest.fixture(scope='module')
-async def browser_session(event_loop):
+async def browser_session():
"""Create and provide a real BrowserSession instance."""
browser_session = BrowserSession(
headless=True,
@@ -121,7 +113,7 @@ def registry():
@pytest.fixture
-async def test_browser(base_url, event_loop):
+async def test_browser(base_url):
"""Create a real BrowserSession for testing"""
browser_session = BrowserSession(
headless=True,
@@ -137,7 +129,6 @@ async def test_browser(base_url, event_loop):
class TestActionRegistryParameterPatterns:
"""Test different parameter patterns that should all continue to work"""
- @pytest.mark.asyncio
async def test_individual_parameters_no_browser(self, registry):
"""Test action with individual parameters, no special injection"""
@@ -151,7 +142,6 @@ class TestActionRegistryParameterPatterns:
assert isinstance(result, ActionResult)
assert 'Text: hello, Number: 42' in result.extracted_content
- @pytest.mark.asyncio
async def test_individual_parameters_with_browser(self, registry, browser_session, base_url):
"""Test action with individual parameters plus browser_session injection"""
@@ -170,7 +160,6 @@ class TestActionRegistryParameterPatterns:
assert 'Text: hello, URL:' in result.extracted_content
assert base_url in result.extracted_content
- @pytest.mark.asyncio
async def test_page_parameter_injection(self, registry, browser_session, base_url):
"""Test action with direct Page parameter injection"""
@@ -188,7 +177,6 @@ class TestActionRegistryParameterPatterns:
assert isinstance(result, ActionResult)
assert 'Text: hello, Page Title: Test Page' in result.extracted_content
- @pytest.mark.asyncio
async def test_pydantic_model_with_page_parameter(self, registry, browser_session, base_url):
"""Test pydantic model action with page parameter injection"""
@@ -208,7 +196,6 @@ class TestActionRegistryParameterPatterns:
assert isinstance(result, ActionResult)
assert 'Text: test, Number: 100, Page Title: Test Page' in result.extracted_content
- @pytest.mark.asyncio
async def test_pydantic_model_parameters(self, registry, browser_session, base_url):
"""Test action that takes a pydantic model as first parameter"""
@@ -231,7 +218,6 @@ class TestActionRegistryParameterPatterns:
assert 'Text: test, Number: 100, Flag: True' in result.extracted_content
assert base_url in result.extracted_content
- @pytest.mark.asyncio
async def test_mixed_special_parameters(self, registry, browser_session, base_url, mock_llm):
"""Test action with multiple special injected parameters"""
@@ -270,7 +256,6 @@ class TestActionRegistryParameterPatterns:
assert 'LLM: Mocked LLM response' in result.extracted_content
assert 'Files: 2' in result.extracted_content
- @pytest.mark.asyncio
async def test_no_params_action(self, registry, test_browser):
"""Test action with NoParamsAction model"""
@@ -288,7 +273,6 @@ class TestActionRegistryParameterPatterns:
assert 'No params action executed on' in result.extracted_content
assert '/test' in result.extracted_content
- @pytest.mark.asyncio
async def test_legacy_browser_parameter_names(self, registry, test_browser):
"""Test that legacy browser parameter names still work"""
@@ -312,7 +296,6 @@ class TestActionRegistryParameterPatterns:
assert 'Legacy context: test2, URL:' in result2.extracted_content
assert '/test' in result2.extracted_content
- @pytest.mark.asyncio
async def test_page_parameter_optimization(self, test_browser: BrowserSession, httpserver: HTTPServer):
"""Test that actions can use page: Page parameter directly instead of browser_session"""
registry = Registry()
@@ -359,7 +342,6 @@ class TestActionRegistryParameterPatterns:
class TestActionToActionCalling:
"""Test scenarios where actions call other actions"""
- @pytest.mark.asyncio
async def test_action_calling_action_with_kwargs(self, registry, test_browser):
"""Test action calling another action using kwargs (current problematic pattern)"""
@@ -389,7 +371,6 @@ class TestActionToActionCalling:
assert 'Called result: First: Helper processed: test on' in result.extracted_content
assert '/test' in result.extracted_content
- @pytest.mark.asyncio
async def test_google_sheets_style_calling_pattern(self, registry, test_browser):
"""Test the specific pattern from Google Sheets actions that causes the error"""
@@ -438,7 +419,6 @@ class TestActionToActionCalling:
assert 'Selected cell A1:F100 on' in result_problematic.extracted_content
assert '/test' in result_problematic.extracted_content
- @pytest.mark.asyncio
async def test_complex_action_chain(self, registry, test_browser):
"""Test a complex chain of actions calling other actions"""
@@ -474,7 +454,6 @@ class TestActionToActionCalling:
class TestRegistryEdgeCases:
"""Test edge cases and error conditions"""
- @pytest.mark.asyncio
async def test_decorated_action_rejects_positional_args(self, registry, test_browser):
"""Test that decorated actions reject positional arguments"""
@@ -494,7 +473,6 @@ class TestRegistryEdgeCases:
assert isinstance(result, ActionResult)
assert 'Selected cell A1:B2 on' in result.extracted_content
- @pytest.mark.asyncio
async def test_missing_required_browser_session(self, registry):
"""Test that actions requiring browser_session fail appropriately when not provided"""
@@ -511,7 +489,6 @@ class TestRegistryEdgeCases:
# No browser_session provided
)
- @pytest.mark.asyncio
async def test_missing_required_llm(self, registry, test_browser):
"""Test that actions requiring page_extraction_llm fail appropriately when not provided"""
@@ -532,7 +509,6 @@ class TestRegistryEdgeCases:
# No page_extraction_llm provided
)
- @pytest.mark.asyncio
async def test_invalid_parameters(self, registry, test_browser):
"""Test handling of invalid parameters"""
@@ -548,14 +524,12 @@ class TestRegistryEdgeCases:
browser_session=test_browser,
)
- @pytest.mark.asyncio
async def test_nonexistent_action(self, registry, test_browser):
"""Test calling a non-existent action"""
with pytest.raises(ValueError, match='Action nonexistent_action not found'):
await registry.execute_action('nonexistent_action', {'param': 'value'}, browser_session=test_browser)
- @pytest.mark.asyncio
async def test_sync_action_wrapper(self, registry, test_browser):
"""Test that sync functions are properly wrapped to be async"""
@@ -570,7 +544,6 @@ class TestRegistryEdgeCases:
assert isinstance(result, ActionResult)
assert 'Sync: test' in result.extracted_content
- @pytest.mark.asyncio
async def test_excluded_actions(self, test_browser):
"""Test that excluded actions are not registered"""
@@ -600,7 +573,6 @@ class TestRegistryEdgeCases:
class TestExistingControllerActions:
"""Test that existing controller actions continue to work"""
- @pytest.mark.asyncio
async def test_existing_action_models(self, registry, test_browser):
"""Test that existing action parameter models work correctly"""
@@ -628,7 +600,6 @@ class TestExistingControllerActions:
result3 = await registry.execute_action('test_input', {'index': 5, 'text': 'test input'}, browser_session=test_browser)
assert 'Input text: test input at index: 5' in result3.extracted_content
- @pytest.mark.asyncio
async def test_pydantic_vs_individual_params_consistency(self, registry, test_browser):
"""Test that pydantic and individual parameter patterns produce consistent results"""
diff --git a/tests/test_agent_actions.py b/tests/test_agent_actions.py
index b7654ee50..0c27955b0 100644
--- a/tests/test_agent_actions.py
+++ b/tests/test_agent_actions.py
@@ -1,4 +1,3 @@
-import asyncio
import os
import pytest
@@ -7,7 +6,7 @@ from pydantic import BaseModel, SecretStr
from browser_use.agent.service import Agent
from browser_use.agent.views import AgentHistoryList
-from browser_use.browser.browser import Browser, BrowserConfig
+from browser_use.browser import BrowserProfile, BrowserSession
@pytest.fixture
@@ -25,40 +24,26 @@ def llm():
@pytest.fixture(scope='session')
-def event_loop():
- """Create an instance of the default event loop for each test case."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
-
-@pytest.fixture(scope='session')
-async def browser(event_loop):
- browser_instance = Browser(
- config=BrowserConfig(
+async def browser_session():
+ browser_session = BrowserSession(
+ browser_profile=BrowserProfile(
headless=True,
)
)
- yield browser_instance
- await browser_instance.close()
-
-
-@pytest.fixture
-async def context(browser):
- async with await browser.new_context() as context:
- yield context
- # Clean up automatically happens with __aexit__
+ await browser_session.start()
+ yield browser_session
+ await browser_session.stop()
# pytest tests/test_agent_actions.py -v -k "test_ecommerce_interaction" --capture=no
# @pytest.mark.asyncio
@pytest.mark.skip(reason='Kinda expensive to run')
-async def test_ecommerce_interaction(llm, context):
+async def test_ecommerce_interaction(llm, browser_session):
"""Test complex ecommerce interaction sequence"""
agent = Agent(
task="Go to amazon.com, search for 'laptop', filter by 4+ stars, and find the price of the first result",
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
save_conversation_path='tmp/test_ecommerce_interaction/conversation',
)
@@ -90,13 +75,12 @@ async def test_ecommerce_interaction(llm, context):
assert 'input_exact_correct' in action_sequence or 'correct_in_input' in action_sequence
-# @pytest.mark.asyncio
-async def test_error_recovery(llm, context):
+async def test_error_recovery(llm, browser_session):
"""Test agent's ability to recover from errors"""
agent = Agent(
task='Navigate to nonexistent-site.com and then recover by going to google.com ',
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
)
history: AgentHistoryList = await agent.run(max_steps=10)
@@ -111,13 +95,12 @@ async def test_error_recovery(llm, context):
break
-# @pytest.mark.asyncio
-async def test_find_contact_email(llm, context):
+async def test_find_contact_email(llm, browser_session):
"""Test agent's ability to find contact email on a website"""
agent = Agent(
task='Go to https://browser-use.com/ and find out the contact email',
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
)
history: AgentHistoryList = await agent.run(max_steps=10)
@@ -132,13 +115,12 @@ async def test_find_contact_email(llm, context):
pytest.fail(f'{extracted_content} does not contain {email}')
-# @pytest.mark.asyncio
-async def test_agent_finds_installation_command(llm, context):
+async def test_agent_finds_installation_command(llm, browser_session):
"""Test agent's ability to find the pip installation command for browser-use on the web"""
agent = Agent(
task='Find the pip installation command for the browser-use repo',
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
)
history: AgentHistoryList = await agent.run(max_steps=10)
@@ -162,7 +144,6 @@ class CaptchaTest(BaseModel):
# run 3 test: python -m pytest tests/test_agent_actions.py -v -k "test_captcha_solver" --capture=no --log-cli-level=INFO
# pytest tests/test_agent_actions.py -v -k "test_captcha_solver" --capture=no --log-cli-level=INFO
-@pytest.mark.asyncio
@pytest.mark.parametrize(
'captcha',
[
@@ -190,20 +171,20 @@ class CaptchaTest(BaseModel):
),
],
)
-async def test_captcha_solver(llm, context, captcha: CaptchaTest):
+async def test_captcha_solver(llm, browser_session, captcha: CaptchaTest):
"""Test agent's ability to solve different types of captchas"""
agent = Agent(
task=f'Go to {captcha.url} and solve the captcha. {captcha.additional_text}',
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
)
from browser_use.agent.views import AgentHistoryList
history: AgentHistoryList = await agent.run(max_steps=7)
- state = await context.get_state_summary()
-
- all_text = state.element_tree.get_all_text_till_next_clickable_element()
+ # Get page content to verify success
+ page = await browser_session.get_current_page()
+ all_text = await page.content()
if not all_text:
all_text = ''
diff --git a/tests/test_clicks.py b/tests/test_clicks.py
index eb6182d95..74788f9eb 100644
--- a/tests/test_clicks.py
+++ b/tests/test_clicks.py
@@ -2,9 +2,8 @@ import asyncio
import json
import anyio
-import pytest
-from browser_use.browser.browser import Browser, BrowserConfig
+from browser_use.browser import BrowserProfile, BrowserSession
from browser_use.dom.views import DOMBaseNode, DOMElementNode, DOMTextNode
from browser_use.utils import time_execution_sync
@@ -29,12 +28,11 @@ class ElementTreeSerializer:
# run with: pytest tests/test_clicks.py
-@pytest.mark.asyncio
async def test_highlight_elements():
- browser = Browser(config=BrowserConfig(headless=False, disable_security=True, user_data_dir=None))
-
- async with await browser.new_context() as context:
- page = await context.get_current_page()
+ browser_session = BrowserSession(browser_profile=BrowserProfile(headless=True))
+ await browser_session.start()
+ try:
+ page = await browser_session.get_current_page()
# await page.goto('https://immobilienscout24.de')
# await page.goto('https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/service-plans')
# await page.goto('https://google.com/search?q=elon+musk')
@@ -49,7 +47,7 @@ async def test_highlight_elements():
while True:
try:
# await asyncio.sleep(10)
- state = await context.get_state_summary(True)
+ state = await browser_session.get_state_summary(cache_clickable_elements_hashes=True)
async with await anyio.open_file('./tmp/page.json', 'w') as f:
await f.write(
@@ -84,13 +82,15 @@ async def test_highlight_elements():
print(state.element_tree.clickable_elements_to_string())
action = input('Select next action: ')
- await time_execution_sync('remove_highlight_elements')(context.remove_highlights)()
+ await time_execution_sync('remove_highlight_elements')(browser_session.remove_highlights)()
node_element = state.selector_map[int(action)]
# check if index of selector map are the same as index of items in dom_items
- await context._click_element_node(node_element)
+ await browser_session._click_element_node(node_element)
except Exception as e:
print(e)
+ finally:
+ await browser_session.stop()
diff --git a/tests/test_dropdown_error.py b/tests/test_dropdown_error.py
index fe1a28d6d..ff6226753 100644
--- a/tests/test_dropdown_error.py
+++ b/tests/test_dropdown_error.py
@@ -7,8 +7,7 @@ Simple try of the agent.
import os
import sys
-from browser_use.browser.browser import Browser, BrowserConfig
-from browser_use.browser.context import BrowserContext
+from browser_use.browser import BrowserProfile, BrowserSession
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@@ -17,24 +16,23 @@ from langchain_openai import ChatOpenAI
from browser_use import Agent, AgentHistoryList
llm = ChatOpenAI(model='gpt-4o')
-# browser = Browser(config=BrowserConfig(headless=False))
+browser_session = BrowserSession(browser_profile=BrowserProfile(headless=True))
agent = Agent(
task=('go to https://codepen.io/shyam-king/pen/emOyjKm and select number "4" and return the output of "selected value"'),
llm=llm,
- browser_context=BrowserContext(
- browser=Browser(config=BrowserConfig(headless=False, disable_security=True)),
- ),
+ browser_session=browser_session,
)
async def test_dropdown():
- history: AgentHistoryList = await agent.run(20)
- # await controller.browser.close(force=True)
+ await browser_session.start()
+ try:
+ history: AgentHistoryList = await agent.run(20)
- result = history.final_result()
- assert result is not None
- assert '4' in result
- print(result)
-
- # await browser.close()
+ result = history.final_result()
+ assert result is not None
+ assert '4' in result
+ print(result)
+ finally:
+ await browser_session.stop()
diff --git a/tests/test_excluded_actions.py b/tests/test_excluded_actions.py
index d4544283b..193a05a28 100644
--- a/tests/test_excluded_actions.py
+++ b/tests/test_excluded_actions.py
@@ -1,58 +1,43 @@
-import asyncio
-import os
-
import pytest
-from langchain_openai import AzureChatOpenAI
-from pydantic import SecretStr
from browser_use.agent.service import Agent
from browser_use.agent.views import AgentHistoryList
-from browser_use.browser.browser import Browser, BrowserConfig
+from browser_use.browser import BrowserSession
from browser_use.controller.service import Controller
# run with:
# python -m pytest tests/test_excluded_actions.py -v -k "test_only_open_tab_allowed" --capture=no
-@pytest.fixture(scope='session')
-def event_loop():
- """Create an instance of the default event loop for each test case."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
+class MockLLM:
+ """Mock LLM for testing"""
+
+ async def ainvoke(self, prompt):
+ class MockResponse:
+ content = 'Mocked LLM response'
+
+ return MockResponse()
-@pytest.fixture(scope='session')
-async def browser(event_loop):
- browser_instance = Browser(
- config=BrowserConfig(
- headless=True,
- )
+@pytest.fixture(scope='module')
+async def browser_session():
+ browser_session = BrowserSession(
+ headless=True,
+ user_data_dir=None,
)
- yield browser_instance
- await browser_instance.close()
-
-
-@pytest.fixture
-async def context(browser):
- async with await browser.new_context() as context:
- yield context
+ await browser_session.start()
+ yield browser_session
+ await browser_session.stop()
@pytest.fixture
def llm():
"""Initialize language model for testing"""
- return AzureChatOpenAI(
- model='gpt-4o',
- api_version='2024-10-21',
- azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT', ''),
- api_key=SecretStr(os.getenv('AZURE_OPENAI_KEY', '')),
- )
+ return MockLLM()
# pytest tests/test_excluded_actions.py -v -k "test_only_open_tab_allowed" --capture=no
-@pytest.mark.asyncio
-async def test_only_open_tab_allowed(llm, context):
+async def test_only_open_tab_allowed(llm, browser_session):
"""Test that only open_tab action is available while others are excluded"""
# Create list of all default actions except open_tab
@@ -80,7 +65,7 @@ async def test_only_open_tab_allowed(llm, context):
agent = Agent(
task="Go to google.com and search for 'python programming'",
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
controller=controller,
)
diff --git a/tests/test_gif_path.py b/tests/test_gif_path.py
index d9e327695..84d4b441a 100644
--- a/tests/test_gif_path.py
+++ b/tests/test_gif_path.py
@@ -7,8 +7,7 @@ Simple try of the agent.
import os
import sys
-from browser_use.browser.browser import Browser, BrowserConfig
-from browser_use.browser.context import BrowserContext
+from browser_use.browser import BrowserProfile, BrowserSession
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@@ -18,12 +17,12 @@ from browser_use import Agent, AgentHistoryList
llm = ChatOpenAI(model='gpt-4o')
+browser_session = BrowserSession(browser_profile=BrowserProfile(headless=True, disable_security=True))
+
agent = Agent(
task=('go to google.com and search for text "hi there"'),
llm=llm,
- browser_context=BrowserContext(
- browser=Browser(config=BrowserConfig(headless=False, disable_security=True)),
- ),
+ browser_session=browser_session,
generate_gif='./google.gif',
)
@@ -32,9 +31,13 @@ async def test_gif_path():
if os.path.exists('./google.gif'):
os.unlink('./google.gif')
- history: AgentHistoryList = await agent.run(20)
+ await browser_session.start()
+ try:
+ history: AgentHistoryList = await agent.run(20)
- result = history.final_result()
- assert result is not None
+ result = history.final_result()
+ assert result is not None
- assert os.path.exists('./google.gif'), 'google.gif was not created'
+ assert os.path.exists('./google.gif'), 'google.gif was not created'
+ finally:
+ await browser_session.stop()
diff --git a/tests/test_mind2web.py b/tests/test_mind2web.py
index 1e8db557d..8b62cf1db 100644
--- a/tests/test_mind2web.py
+++ b/tests/test_mind2web.py
@@ -2,7 +2,6 @@
Test browser automation using Mind2Web dataset tasks with pytest framework.
"""
-import asyncio
import json
import os
from typing import Any
@@ -12,7 +11,7 @@ from langchain_openai import AzureChatOpenAI
from pydantic import SecretStr
from browser_use.agent.service import Agent
-from browser_use.browser.browser import Browser, BrowserConfig
+from browser_use.browser import BrowserProfile, BrowserSession
from browser_use.utils import logger
# Constants
@@ -20,31 +19,19 @@ MAX_STEPS = 50
TEST_SUBSET_SIZE = 10
-@pytest.fixture(scope='session')
-def event_loop():
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
-
-@pytest.fixture(scope='session')
-async def browser(event_loop):
- browser_instance = Browser(
- config=BrowserConfig(
+@pytest.fixture
+async def browser_session():
+ browser_session = BrowserSession(
+ browser_profile=BrowserProfile(
headless=True,
)
)
- yield browser_instance
- await browser_instance.close()
+ await browser_session.start()
+ yield browser_session
+ await browser_session.stop()
@pytest.fixture
-async def context(browser):
- async with await browser.new_context() as new_context:
- yield new_context
-
-
-@pytest.fixture(scope='session')
def test_cases() -> list[dict[str, Any]]:
"""Load test cases from Mind2Web dataset"""
file_path = os.path.join(os.path.dirname(__file__), 'mind2web_data/processed.json')
@@ -72,8 +59,7 @@ def llm():
# run with: pytest -s -v tests/test_mind2web.py::test_random_samples
-@pytest.mark.asyncio
-async def test_random_samples(test_cases: list[dict[str, Any]], llm, context, validator):
+async def test_random_samples(test_cases: list[dict[str, Any]], llm, browser_session):
"""Test a random sampling of tasks across different websites"""
import random
@@ -87,7 +73,7 @@ async def test_random_samples(test_cases: list[dict[str, Any]], llm, context, va
logger.info(f'--- Random Sample {i}/{len(samples)} ---')
logger.info(f'Task: {task}\n')
- agent = Agent(task, llm, browser_context=context)
+ agent = Agent(task, llm, browser_session=browser_session)
await agent.run()
diff --git a/tests/test_models.py b/tests/test_models.py
index 2ffed8286..8afd6ce1b 100644
--- a/tests/test_models.py
+++ b/tests/test_models.py
@@ -1,4 +1,3 @@
-import asyncio
import os
import httpx
@@ -11,32 +10,19 @@ from pydantic import SecretStr
from browser_use.agent.service import Agent
from browser_use.agent.views import AgentHistoryList
-from browser_use.browser.browser import Browser, BrowserConfig
-
-
-@pytest.fixture(scope='function')
-def event_loop():
- """Create an instance of the default event loop for each test case."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
-
-@pytest.fixture(scope='function')
-async def browser(event_loop):
- browser_instance = Browser(
- config=BrowserConfig(
- headless=True,
- )
- )
- yield browser_instance
- await browser_instance.close()
+from browser_use.browser import BrowserProfile, BrowserSession
@pytest.fixture
-async def context(browser):
- async with await browser.new_context() as context:
- yield context
+async def browser_session():
+ browser_session = BrowserSession(
+ browser_profile=BrowserProfile(
+ headless=True,
+ )
+ )
+ await browser_session.start()
+ yield browser_session
+ await browser_session.stop()
api_key_gemini = SecretStr(os.getenv('GOOGLE_API_KEY') or '')
@@ -105,8 +91,7 @@ async def llm(request):
return request.param
-@pytest.mark.asyncio
-async def test_model_search(llm, context):
+async def test_model_search(llm, browser_session):
"""Test 'Search Google' action"""
model_name = llm.model if hasattr(llm, 'model') else llm.model_name
print(f'\nTesting model: {model_name}')
@@ -134,7 +119,7 @@ async def test_model_search(llm, context):
agent = Agent(
task="Search Google for 'elon musk' then click on the first result and scroll down.",
llm=llm,
- browser_context=context,
+ browser_session=browser_session,
max_failures=2,
use_vision=use_vision,
)
diff --git a/tests/test_qwen.py b/tests/test_qwen.py
index 7a3334307..f847a4453 100644
--- a/tests/test_qwen.py
+++ b/tests/test_qwen.py
@@ -1,11 +1,9 @@
-import asyncio
-
import pytest
from langchain_ollama import ChatOllama
from browser_use.agent.service import Agent
from browser_use.agent.views import AgentHistoryList
-from browser_use.browser.browser import Browser, BrowserConfig
+from browser_use.browser import BrowserProfile, BrowserSession
@pytest.fixture
@@ -20,38 +18,25 @@ def llm():
)
-@pytest.fixture(scope='session')
-def event_loop():
- """Create an instance of the default event loop for each test case."""
- loop = asyncio.get_event_loop_policy().new_event_loop()
- yield loop
- loop.close()
-
-
-@pytest.fixture(scope='session')
-async def browser(event_loop):
- browser_instance = Browser(
- config=BrowserConfig(
+@pytest.fixture
+async def browser_session():
+ browser_session = BrowserSession(
+ browser_profile=BrowserProfile(
headless=True,
)
)
- yield browser_instance
- await browser_instance.close()
-
-
-@pytest.fixture
-async def context(browser):
- async with await browser.new_context() as context:
- yield context
+ await browser_session.start()
+ yield browser_session
+ await browser_session.stop()
# pytest tests/test_qwen.py -v -k "test_qwen_url" --capture=no
-# @pytest.mark.asyncio
-async def test_qwen_url(llm, context):
+async def test_qwen_url(llm, browser_session):
"""Test complex ecommerce interaction sequence"""
agent = Agent(
task='go_to_url amazon.com',
llm=llm,
+ browser_session=browser_session,
)
history: AgentHistoryList = await agent.run(max_steps=3)