mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
Issue with relative link caught in https://github.com/browser-use/browser-use/issues/3387
925 lines
33 KiB
Markdown
<guidelines>
|
|
Browser-Use is an AI agent that autonomously interacts with the web. It takes a user-defined task, navigates web pages using Chromium via CDP, processes HTML, and repeatedly queries a language model (like `gpt-4.1-mini`) to decide the next action—until the task is completed.
|
|
|
|
# Development Rules
|
|
- Always use [`uv`](https://github.com/astral-sh/uv) instead of `pip`

```bash
uv venv --python 3.11
source .venv/bin/activate
uv sync
```
|
|
|
|
- Do not replace model names. Users try new models which you will not know about yet.
|
|
|
|
- Type-safe coding: Use Pydantic v2 models for all internal action schemas, task inputs/outputs, and tools I/O. This ensures robust validation and LLM-call integrity.
|
|
|
|
- Pre-commit formatting: ALWAYS make sure to run pre-commit before making PRs.
|
|
|
|
- Use descriptive names and docstrings for each action.
|
|
|
|
- Prefer returning `ActionResult` with structured content to help the agent reason better.
|
|
|
|
- Follow the documentation in <browser_use_docs/>. Some docs have been truncated; if unsure, visit or request the specific docs page before proceeding.
|
|
|
|
- Never create random examples: when asked to implement a feature, never create new files that show off that feature, because the code just gets messy. If you need to test something, use inline code in the terminal instead.
|
|
|
|
</guidelines>
|
|
|
|
<browser_use_docs>
|
|
# Human Quickstart
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/quickstart
|
|
|
|
## 1. Fast setup
|
|
|
|
<Tabs>
<Tab title="uv">
```bash create environment theme={null}
uv venv --python 3.12
```
</Tab>

<Tab title="pip">
```bash create environment with python >= 3.11 theme={null}
python3.12 -m venv .venv
```
</Tab>
</Tabs>

<Tabs>
<Tab title="Mac/Linux">
```bash activate environment theme={null}
source .venv/bin/activate
```
</Tab>

<Tab title="Windows">
```bash activate environment theme={null}
.venv\Scripts\activate
```
</Tab>
</Tabs>

<Tabs>
<Tab title="uv">
```bash install browser-use & chromium theme={null}
uv pip install browser-use
uvx playwright install chromium --with-deps
```
</Tab>

<Tab title="pip">
```bash install browser-use & chromium theme={null}
pip install browser-use
pip install playwright && playwright install chromium --with-deps
```
</Tab>
</Tabs>
|
|
|
|
## 2. Choose your favorite LLM
|
|
|
|
Create a `.env` file and add your API key. Don't have one? Start with a [free Gemini key](https://aistudio.google.com/app/u/1/apikey?pli=1).
|
|
|
|
<Tabs>
<Tab title="Mac/Linux">
```bash create .env file theme={null}
touch .env
```
</Tab>

<Tab title="Windows">
```cmd create .env file theme={null}
echo. > .env
```
</Tab>
</Tabs>

<Tabs>
<Tab title="Google">
```bash add your key to .env file theme={null}
GEMINI_API_KEY=
```
</Tab>

<Tab title="OpenAI">
```bash add your key to .env file theme={null}
OPENAI_API_KEY=
```
</Tab>

<Tab title="Anthropic">
```bash add your key to .env file theme={null}
ANTHROPIC_API_KEY=
```
</Tab>
</Tabs>
|
|
|
|
See [Supported Models](/customize/supported-models) for more.
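
Before running an agent, it can help to confirm that at least one of the keys above is actually set. The following is a minimal sketch using only the standard library. `PROVIDER_KEYS` and `configured_providers` are hypothetical helper names, not part of Browser Use, and the sketch assumes the variables are already in the process environment (for example after calling `load_dotenv()`).

```python
import os

# Environment variable names from the tabs above, keyed by provider.
PROVIDER_KEYS = {
    "Google": "GEMINI_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```

Calling `configured_providers()` with no arguments inspects the real environment; passing a dict makes the check easy to try out without touching your `.env` file.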
|
|
|
|
## 3. Run your first agent
|
|
|
|
<Tabs>
<Tab title="Google">
```python agent.py theme={null}
from browser_use import Agent, ChatGoogle
from dotenv import load_dotenv
import asyncio

load_dotenv()

async def main():
    llm = ChatGoogle(model="gemini-flash-latest")
    task = "Find the number 1 post on Show HN"
    agent = Agent(task=task, llm=llm)
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```
</Tab>

<Tab title="OpenAI">
```python agent.py theme={null}
from browser_use import Agent, ChatOpenAI
from dotenv import load_dotenv
import asyncio

load_dotenv()

async def main():
    llm = ChatOpenAI(model="gpt-4.1-mini")
    task = "Find the number 1 post on Show HN"
    agent = Agent(task=task, llm=llm)
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```
</Tab>

<Tab title="Anthropic">
```python agent.py theme={null}
from browser_use import Agent, ChatAnthropic
from dotenv import load_dotenv
import asyncio

load_dotenv()

async def main():
    llm = ChatAnthropic(model='claude-sonnet-4-0', temperature=0.0)
    task = "Find the number 1 post on Show HN"
    agent = Agent(task=task, llm=llm)
    await agent.run()

if __name__ == "__main__":
    asyncio.run(main())
```
</Tab>
</Tabs>
|
|
|
|
|
|
# Actor All Parameters
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/actor/all-parameters
|
|
|
|
Complete API reference for Browser Actor classes, methods, and parameters including BrowserSession, Page, Element, and Mouse
|
|
|
|
|
|
# Actor Basics
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/actor/basics
|
|
Low-level Playwright-like browser automation with direct and full CDP control and precise element interactions
|
|
|
|
|
|
# Actor Examples
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/actor/examples
|
|
Comprehensive examples for Browser Actor automation tasks including forms, JavaScript, mouse operations, and AI features
|
|
|
|
|
|
# Agent All Parameters
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/agent/all-parameters
|
|
|
|
Complete reference for all agent configuration options
|
|
|
|
## Available Parameters
|
|
|
|
### Core Settings
|
|
|
|
* `tools`: Registry of [our tools](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) the agent can call. [Example for custom tools](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions)
|
|
* `browser`: Browser object where you can specify the browser settings.
|
|
* `output_model_schema`: Pydantic model class for structured output validation. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py)
|
|
|
|
### Vision & Processing
|
|
|
|
* `use_vision` (default: `"auto"`): Vision mode - `"auto"` includes screenshot tool but only uses vision when requested, `True` always includes screenshots, `False` never includes screenshots and excludes screenshot tool
|
|
* `vision_detail_level` (default: `'auto'`): Screenshot detail level - `'low'`, `'high'`, or `'auto'`
|
|
* `page_extraction_llm`: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as `llm`)
|
|
|
|
### Actions & Behavior
|
|
|
|
* `initial_actions`: List of actions to run before the main task without LLM. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py)
|
|
* `max_actions_per_step` (default: `10`): Maximum actions per step, e.g. for form filling the agent can output 10 fields at once. We execute the actions until the page changes.
|
|
* `max_failures` (default: `3`): Maximum retries for steps with errors
|
|
* `final_response_after_failure` (default: `True`): If True, attempt to force one final model call with intermediate output after max\_failures is reached
|
|
* `use_thinking` (default: `True`): Controls whether the agent uses its internal "thinking" field for explicit reasoning steps.
|
|
* `flash_mode` (default: `False`): Fast mode that skips evaluation, next goal and thinking and only uses memory. If `flash_mode` is enabled, it overrides `use_thinking` and disables the thinking process entirely. [Example](https://github.com/browser-use/browser-use/blob/main/examples/getting_started/05_fast_agent.py)
|
|
|
|
### System Messages
|
|
|
|
* `override_system_message`: Completely replace the default system prompt.
|
|
* `extend_system_message`: Add additional instructions to the default system prompt. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_system_prompt.py)
|
|
|
|
### File & Data Management
|
|
|
|
* `save_conversation_path`: Path to save complete conversation history
|
|
* `save_conversation_path_encoding` (default: `'utf-8'`): Encoding for saved conversations
|
|
* `available_file_paths`: List of file paths the agent can access
|
|
* `sensitive_data`: Dictionary of sensitive data to handle carefully. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/sensitive_data.py)
|
|
|
|
### Visual Output
|
|
|
|
* `generate_gif` (default: `False`): Generate GIF of agent actions. Set to `True` or string path
|
|
* `include_attributes`: List of HTML attributes to include in page analysis
|
|
|
|
### Performance & Limits
|
|
|
|
* `max_history_items`: Maximum number of last steps to keep in the LLM memory. If `None`, we keep all steps.
|
|
* `llm_timeout` (default: `90`): Timeout in seconds for LLM calls
|
|
* `step_timeout` (default: `120`): Timeout in seconds for each step
|
|
* `directly_open_url` (default: `True`): If we detect a url in the task, we directly open it.
|
|
|
|
### Advanced Options
|
|
|
|
* `calculate_cost` (default: `False`): Calculate and track API costs
|
|
* `display_files_in_done_text` (default: `True`): Show file information in completion messages
|
|
|
|
### Backwards Compatibility
|
|
|
|
* `controller`: Alias for `tools` for backwards compatibility.
|
|
* `browser_session`: Alias for `browser` for backwards compatibility.
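
To show how several of the parameters above combine, here is a hedged sketch that collects a few of them in a dict and passes them to `Agent`. The parameter names and defaults come from the list above; the chosen values, the `AGENT_OPTIONS` name, and the `max_steps=20` limit are illustrative assumptions rather than recommendations.

```python
# Options taken from the parameter list above; the values are examples only.
AGENT_OPTIONS = {
    "use_vision": False,        # never include screenshots
    "max_actions_per_step": 5,  # documented default is 10
    "max_failures": 3,          # documented default
    "flash_mode": False,        # True would override use_thinking
    "llm_timeout": 90,          # seconds, documented default
    "step_timeout": 120,        # seconds, documented default
    "calculate_cost": True,     # documented default is False
}


async def main() -> None:
    # Imported lazily so the options can be inspected without the package installed.
    from browser_use import Agent, ChatOpenAI

    agent = Agent(
        task="Find the number 1 post on Show HN",
        llm=ChatOpenAI(model="gpt-4.1-mini"),
        **AGENT_OPTIONS,
    )
    history = await agent.run(max_steps=20)
    print(history.final_result())
```

Run it with `asyncio.run(main())`. Keeping the options in one dict makes it easy to reuse the same configuration across several agents.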
|
|
|
|
|
|
# Agent Basics
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/agent/basics
|
|
|
|
|
|
|
|
```python theme={null}
from browser_use import Agent, ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4.1-mini"),
)

async def main():
    history = await agent.run(max_steps=100)
```
|
|
|
|
* `task`: The task you want to automate.
|
|
* `llm`: Your favorite LLM. See <a href="/customize/supported-models">Supported Models</a>.
|
|
|
|
The agent is executed using the async `run()` method:
|
|
|
|
* `max_steps` (default: `100`): Maximum number of steps an agent can take.
|
|
|
|
|
|
# Agent Output Format
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/agent/output-format
|
|
|
|
## Agent History
|
|
|
|
The `run()` method returns an `AgentHistoryList` object with the complete execution history:
|
|
|
|
```python theme={null}
history = await agent.run()

# Access useful information
history.urls()                    # List of visited URLs
history.screenshot_paths()        # List of screenshot paths
history.screenshots()             # List of screenshots as base64 strings
history.action_names()            # Names of executed actions
history.extracted_content()       # List of extracted content from all actions
history.errors()                  # List of errors (with None for steps without errors)
history.model_actions()           # All actions with their parameters
history.model_outputs()           # All model outputs from history
history.last_action()             # Last action in history

# Analysis methods
history.final_result()            # Get the final extracted content (last step)
history.is_done()                 # Check if the agent finished the task
history.is_successful()           # Check if agent completed successfully (returns None if not done)
history.has_errors()              # Check if any errors occurred
history.model_thoughts()          # Get the agent's reasoning process (AgentBrain objects)
history.action_results()          # Get all ActionResult objects from history
history.action_history()          # Get truncated action history with essential fields
history.number_of_steps()         # Get the number of steps in the history
history.total_duration_seconds()  # Get total duration of all steps in seconds

# Structured output (when using output_model_schema)
history.structured_output         # Property that returns parsed structured output
```
|
|
|
|
See all helper methods in the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L301).
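
The helper methods listed above can be combined into a compact run summary. This is a minimal sketch, not part of the library: `summarize_run` is a hypothetical helper, and it relies only on the methods named in the listing, so it works with any object that provides them.

```python
from typing import Any


def summarize_run(history: Any) -> dict:
    """Collect a few headline facts from an AgentHistoryList-like object."""
    # errors() yields None for steps that succeeded, so filter those out.
    errors = [e for e in history.errors() if e is not None]
    return {
        "done": history.is_done(),
        "successful": history.is_successful(),  # None if the agent is not done
        "steps": history.number_of_steps(),
        "last_url": (history.urls() or [None])[-1],
        "errors": errors,
        "result": history.final_result(),
    }
```

A typical use would be `print(summarize_run(await agent.run()))` at the end of `main()`.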
|
|
|
|
## Structured Output
|
|
|
|
For structured output, use the `output_model_schema` parameter with a Pydantic model. [Example](https://github.com/browser-use/browser-use/blob/main/examples/features/custom_output.py).
|
|
|
|
|
|
# Agent Prompting Guide
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/agent/prompting-guide
|
|
|
|
Tips and tricks
|
|
|
|
Prompting can drastically improve performance and solve existing limitations of the library.
|
|
|
|
### 1. Be Specific vs Open-Ended
|
|
|
|
✅ Specific (Recommended)
|
|
|
|
```python theme={null}
task = """
1. Go to https://quotes.toscrape.com/
2. Use extract action with the query "first 3 quotes with their authors"
3. Save results to quotes.csv using write_file action
4. Do a google search for the first quote and find when it was written
"""
```
|
|
|
|
❌ Open-Ended
|
|
|
|
```python theme={null}
task = "Go to web and make money"
```
|
|
|
|
### 2. Name Actions Directly
|
|
|
|
When you know exactly what the agent should do, reference actions by name:
|
|
|
|
```python theme={null}
task = """
1. Use search action to find "Python tutorials"
2. Use click to open first result in a new tab
3. Use scroll action to scroll down 2 pages
4. Use extract to extract the names of the first 5 items
5. Wait for 2 seconds if the page is not loaded, refresh it and wait 10 sec
6. Use send_keys action with "Tab Tab ArrowDown Enter"
"""
```
|
|
|
|
See [Available Tools](https://docs.browser-use.com/customize/tools/available) for the complete list of actions.
|
|
|
|
### 3. Handle interaction problems via keyboard navigation
|
|
|
|
Sometimes buttons can't be clicked. That is a bug in the library, so please open an issue.
The good news is that you can often work around it with keyboard navigation.
|
|
|
|
```python theme={null}
task = """
If the submit button cannot be clicked:
1. Use send_keys action with "Tab Tab Enter" to navigate and activate
2. Or use send_keys with "ArrowDown ArrowDown Enter" for form submission
"""
```
|
|
|
|
### 4. Custom Actions Integration
|
|
|
|
```python theme={null}
# When you have custom actions
@controller.action("Get 2FA code from authenticator app")
async def get_2fa_code():
    # Your implementation
    pass

task = """
Login with 2FA:
1. Enter username/password
2. When prompted for 2FA, use get_2fa_code action
3. NEVER try to extract 2FA codes from the page manually
4. ALWAYS use the get_2fa_code action for authentication codes
"""
```
|
|
|
|
### 5. Error Recovery
|
|
|
|
```python theme={null}
task = """
Robust data extraction:
1. Go to openai.com to find their CEO
2. If navigation fails due to anti-bot protection:
   - Use google search to find the CEO
3. If page times out, use go_back and try alternative approach
"""
```
|
|
|
|
The key to effective prompting is being specific about actions.
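
When tasks are assembled programmatically, a small formatter keeps them in the explicit, numbered style recommended above. This is only a sketch: `build_task` is a hypothetical helper, not a Browser Use function, and the steps are borrowed from the first example in this guide.

```python
def build_task(goal: str, steps: list[str]) -> str:
    """Format a goal and explicit steps as a numbered task prompt."""
    lines = [f"{goal}:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


task = build_task(
    "Collect quotes",
    [
        "Go to https://quotes.toscrape.com/",
        'Use extract action with the query "first 3 quotes with their authors"',
        "Save results to quotes.csv using write_file action",
    ],
)
```

The resulting string can be passed directly as the `task` argument of `Agent`.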
|
|
|
|
|
|
# Agent Supported Models
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/agent/supported-models
|
|
LLMs supported (changes frequently, check the documentation when needed)
|
|
|
|
|
|
# Browser All Parameters
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/browser/all-parameters
|
|
|
|
Complete reference for all browser configuration options
|
|
|
|
<Note>
|
|
The `Browser` instance also provides all [Actor](/customize/actor/all-parameters) methods for direct browser control (page management, element interactions, etc.).
|
|
</Note>
|
|
|
|
## Core Settings
|
|
|
|
* `cdp_url`: CDP URL for connecting to existing browser instance (e.g., `"http://localhost:9222"`)
|
|
|
|
## Display & Appearance
|
|
|
|
* `headless` (default: `None`): Run browser without UI. Auto-detects based on display availability (`True`/`False`/`None`)
|
|
* `window_size`: Browser window size for headful mode. Use dict `{'width': 1920, 'height': 1080}` or `ViewportSize` object
|
|
* `window_position` (default: `{'width': 0, 'height': 0}`): Window position from top-left corner in pixels
|
|
* `viewport`: Content area size, same format as `window_size`. Use `{'width': 1280, 'height': 720}` or `ViewportSize` object
|
|
* `no_viewport` (default: `None`): Disable viewport emulation, content fits to window size
|
|
* `device_scale_factor`: Device scale factor (DPI). Set to `2.0` or `3.0` for high-resolution screenshots
|
|
|
|
## Browser Behavior
|
|
|
|
* `keep_alive` (default: `None`): Keep browser running after agent completes
|
|
* `allowed_domains`: Restrict navigation to specific domains. Domain pattern formats:
|
|
* `'example.com'` - Matches only `https://example.com/*`
|
|
* `'*.example.com'` - Matches `https://example.com/*` and any subdomain `https://*.example.com/*`
|
|
* `'http*://example.com'` - Matches both `http://` and `https://` protocols
|
|
* `'chrome-extension://*'` - Matches any Chrome extension URL
|
|
* Security: Wildcards in TLD (e.g., `example.*`) are not allowed for security
|
|
* Use list like `['*.google.com', 'https://example.com', 'chrome-extension://*']`
|
|
* Performance: Lists with 100+ domains are automatically optimized to sets for O(1) lookup. Pattern matching is disabled for optimized lists. Both `www.example.com` and `example.com` variants are checked automatically.
|
|
* `prohibited_domains`: Block navigation to specific domains. Uses same pattern formats as `allowed_domains`. When both `allowed_domains` and `prohibited_domains` are set, `allowed_domains` takes precedence. Examples:
|
|
* `['nsfw.com', '*.gambling-site.net']` - Block specific sites and all subdomains
|
|
* `['https://explicit-content.org']` - Block specific protocol/domain combination
|
|
* Performance: Lists with 100+ domains are automatically optimized to sets for O(1) lookup (same as `allowed_domains`)
|
|
* `enable_default_extensions` (default: `True`): Load automation extensions (uBlock Origin, cookie handlers, ClearURLs)
|
|
* `cross_origin_iframes` (default: `False`): Enable cross-origin iframe support (may cause complexity)
|
|
* `is_local` (default: `True`): Whether this is a local browser instance. Set to `False` for remote browsers. If an `executable_path` is set, it will be automatically set to `True`. This can affect your download behavior.
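
The domain pattern rules described for `allowed_domains` and `prohibited_domains` can be illustrated with a small standalone matcher. This is a sketch of the documented behavior using only the standard library, not the library's actual implementation. `matches_pattern` and `is_navigation_allowed` are hypothetical names, and reading "takes precedence" as "only the allow-list is consulted when it is set" is an assumption.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def matches_pattern(url: str, pattern: str) -> bool:
    """Return True if url matches one documented domain pattern (illustrative)."""
    # An optional scheme glob precedes '://'; a bare domain means https only.
    if "://" in pattern:
        scheme_glob, host_glob = pattern.split("://", 1)
    else:
        scheme_glob, host_glob = "https", pattern
    # Wildcards in the TLD position (e.g. 'example.*') are rejected.
    if host_glob.endswith(".*"):
        raise ValueError(f"TLD wildcards are not allowed: {pattern!r}")

    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not fnmatch(parsed.scheme, scheme_glob):
        return False
    if host_glob == "*":  # e.g. 'chrome-extension://*'
        return True
    if host_glob.startswith("*."):  # bare domain and any subdomain
        bare = host_glob[2:]
        return host == bare or host.endswith("." + bare)
    return host == host_glob


def is_navigation_allowed(url, allowed=None, prohibited=None) -> bool:
    """Assumed precedence: if an allow-list is set, it alone decides."""
    if allowed:
        return any(matches_pattern(url, p) for p in allowed)
    if prohibited:
        return not any(matches_pattern(url, p) for p in prohibited)
    return True
```

For example, `matches_pattern("https://sub.example.com/", "example.com")` is false under these rules, while the same URL matches `"*.example.com"`.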
|
|
|
|
## User Data & Profiles
|
|
|
|
* `user_data_dir` (default: auto-generated temp): Directory for browser profile data. Use `None` for incognito mode
|
|
* `profile_directory` (default: `'Default'`): Chrome profile subdirectory name (`'Profile 1'`, `'Work Profile'`, etc.)
|
|
* `storage_state`: Browser storage state (cookies, localStorage). Can be file path string or dict object
|
|
|
|
## Network & Security
|
|
|
|
* `proxy`: Proxy configuration using `ProxySettings(server='http://host:8080', bypass='localhost,127.0.0.1', username='user', password='pass')`
|
|
* `permissions` (default: `['clipboardReadWrite', 'notifications']`): Browser permissions to grant. Use list like `['camera', 'microphone', 'geolocation']`
|
|
* `headers`: Additional HTTP headers for connect requests (remote browsers only)
|
|
|
|
## Browser Launch
|
|
|
|
* `executable_path`: Path to browser executable for custom installations. Platform examples:
|
|
* macOS: `'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'`
|
|
* Windows: `'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'`
|
|
* Linux: `'/usr/bin/google-chrome'`
|
|
* `channel`: Browser channel (`'chromium'`, `'chrome'`, `'chrome-beta'`, `'msedge'`, etc.)
|
|
* `args`: Additional command-line arguments for the browser. Use list format: `['--disable-gpu', '--custom-flag=value', '--another-flag']`
|
|
* `env`: Environment variables for browser process. Use dict like `{'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'CUSTOM_VAR': 'test'}`
|
|
* `chromium_sandbox` (default: `True` except in Docker): Enable Chromium sandboxing for security
|
|
* `devtools` (default: `False`): Open DevTools panel automatically (requires `headless=False`)
|
|
* `ignore_default_args`: List of default args to disable, or `True` to disable all. Use list like `['--enable-automation', '--disable-extensions']`
|
|
|
|
## Timing & Performance
|
|
|
|
* `minimum_wait_page_load_time` (default: `0.25`): Minimum time to wait before capturing page state in seconds
|
|
* `wait_for_network_idle_page_load_time` (default: `0.5`): Time to wait for network activity to cease in seconds
|
|
* `wait_between_actions` (default: `0.5`): Time to wait between agent actions in seconds
|
|
|
|
## AI Integration
|
|
|
|
* `highlight_elements` (default: `True`): Highlight interactive elements for AI vision
|
|
* `paint_order_filtering` (default: `True`): Enable paint order filtering to optimize DOM tree by removing elements hidden behind others. Slightly experimental
|
|
|
|
## Downloads & Files
|
|
|
|
* `accept_downloads` (default: `True`): Automatically accept all downloads
|
|
* `downloads_path`: Directory for downloaded files. Use string like `'./downloads'` or `Path` object
|
|
* `auto_download_pdfs` (default: `True`): Automatically download PDFs instead of viewing in browser
|
|
|
|
## Device Emulation
|
|
|
|
* `user_agent`: Custom user agent string. Example: `'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)'`
|
|
* `screen`: Screen size information, same format as `window_size`
|
|
|
|
## Recording & Debugging
|
|
|
|
* `record_video_dir`: Directory to save video recordings as `.mp4` files
|
|
* `record_video_size` (default: `ViewportSize`): The frame size (width, height) of the video recording.
|
|
* `record_video_framerate` (default: `30`): The framerate to use for the video recording.
|
|
* `record_har_path`: Path to save network trace files as `.har` format
|
|
* `traces_dir`: Directory to save complete trace files for debugging
|
|
* `record_har_content` (default: `'embed'`): HAR content mode (`'omit'`, `'embed'`, `'attach'`)
|
|
* `record_har_mode` (default: `'full'`): HAR recording mode (`'full'`, `'minimal'`)
|
|
|
|
## Advanced Options
|
|
|
|
* `disable_security` (default: `False`): ⚠️ NOT RECOMMENDED - Disables all browser security features
|
|
* `deterministic_rendering` (default: `False`): ⚠️ NOT RECOMMENDED - Forces consistent rendering but reduces performance
|
|
|
|
|
|
|
|
## Browser vs BrowserSession
|
|
|
|
`Browser` is an alias for `BrowserSession`; they are exactly the same class.
|
|
Use `Browser` for cleaner, more intuitive code.
|
|
|
|
|
|
# Browser Basics
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/browser/basics
|
|
|
|
|
|
|
|
```python theme={null}
from browser_use import Agent, Browser, ChatOpenAI

browser = Browser(
    headless=False,  # Show browser window
    window_size={'width': 1000, 'height': 700},  # Set window size
)

agent = Agent(
    task='Search for Browser Use',
    browser=browser,
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)


async def main():
    await agent.run()
```
|
|
|
|
|
|
# Browser: Real Browser
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/browser/real-browser
|
|
Connect your existing Chrome browser to preserve authentication.
|
|
|
|
# Browser: Remote Browser
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/browser/remote
|
|
The easiest way to use a cloud browser is with the built-in Browser-Use cloud service:
|
|
|
|
|
|
# Lifecycle Hooks
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/hooks
|
|
Customize agent behavior with lifecycle hooks
|
|
|
|
|
|
# MCP Server
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/mcp-server
|
|
Expose browser-use capabilities via Model Context Protocol for AI assistants like Claude Desktop
|
|
|
|
|
|
# Tools: Add Tools
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/tools/add
|
|
|
|
Examples:
|
|
* deterministic clicks
|
|
* file handling
|
|
* calling APIs
|
|
* human-in-the-loop
|
|
* browser interactions
|
|
* calling LLMs
|
|
* get 2fa codes
|
|
* send emails
|
|
* Playwright integration (see [GitHub example](https://github.com/browser-use/browser-use/blob/main/examples/browser/playwright_integration.py))
|
|
* ...
|
|
|
|
Simply add `@tools.action(...)` to your function.
|
|
|
|
```python theme={null}
from browser_use import Tools, Agent, ActionResult

tools = Tools()

@tools.action(description='Ask human for help with a question')
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    # Wrap the answer so the return value matches the ActionResult annotation
    return ActionResult(extracted_content=f'The human responded with: {answer}')
```
|
|
|
|
```python theme={null}
agent = Agent(task='...', llm=llm, tools=tools)
```
|
|
|
|
* `description` *(required)* - What the tool does, the LLM uses this to decide when to call it.
|
|
* `allowed_domains` - List of domains where tool can run (e.g. `['*.example.com']`), defaults to all domains
|
|
|
|
The Agent fills your function parameters based on their names, type hints, & defaults.
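
To make the idea of "names, type hints, and defaults" concrete, the sketch below uses the standard `inspect` module to read exactly that information from a function signature. It is an illustration of the concept only and makes no claim about how Browser Use builds its action schemas. `send_email` and `describe_parameters` are hypothetical names.

```python
import inspect
from typing import Any, Callable


def send_email(to: str, subject: str, body: str = "", urgent: bool = False) -> str:
    """Hypothetical tool function; it only formats a confirmation string."""
    flag = " [urgent]" if urgent else ""
    return f"Sent '{subject}' to {to}{flag}"


def describe_parameters(func: Callable[..., Any]) -> dict:
    """Map each parameter name to its type name, required flag, and default."""
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        required = param.default is inspect.Parameter.empty
        described[name] = {
            "type": getattr(param.annotation, "__name__", str(param.annotation)),
            "required": required,
            "default": None if required else param.default,
        }
    return described
```

Parameters without a default show up as required, which is why descriptive names and precise type hints on a tool function matter: they are the only signal available for filling in the arguments.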
|
|
|
|
|
|
# Tools: Available Tools
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/tools/available
|
|
Here is the [source code](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/service.py) for the default tools:
|
|
|
|
### Navigation & Browser Control
|
|
|
|
* `search` - Search queries (DuckDuckGo, Google, Bing)
|
|
* `navigate` - Navigate to URLs
|
|
* `go_back` - Go back in browser history
|
|
* `wait` - Wait for specified seconds
|
|
|
|
### Page Interaction
|
|
|
|
* `click` - Click elements by their index
|
|
* `input` - Input text into form fields
|
|
* `upload_file` - Upload files to file inputs
|
|
* `scroll` - Scroll the page up/down
|
|
* `find_text` - Scroll to specific text on page
|
|
* `send_keys` - Send special keys (Enter, Escape, etc.)
|
|
|
|
### JavaScript Execution
|
|
|
|
* `evaluate` - Execute custom JavaScript code on the page (for advanced interactions, shadow DOM, custom selectors, data extraction)
|
|
|
|
### Tab Management
|
|
|
|
* `switch` - Switch between browser tabs
|
|
* `close` - Close browser tabs
|
|
|
|
### Content Extraction
|
|
|
|
* `extract` - Extract data from webpages using LLM
|
|
|
|
### Visual Analysis
|
|
|
|
* `screenshot` - Request a screenshot in your next browser state for visual confirmation
|
|
|
|
### Form Controls
|
|
|
|
* `dropdown_options` - Get dropdown option values
|
|
* `select_dropdown` - Select dropdown options
|
|
|
|
### File Operations
|
|
|
|
* `write_file` - Write content to files
|
|
* `read_file` - Read file contents
|
|
* `replace_file` - Replace text in files
|
|
|
|
### Task Completion
|
|
|
|
* `done` - Complete the task (always available)
|
|
|
|
|
|
|
|
# Tools: Basics
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/tools/basics
|
|
Tools are the functions that the agent has to interact with the world.
|
|
|
|
## Quick Example
|
|
|
|
```python theme={null}
from browser_use import Agent, ActionResult, Browser, ChatOpenAI, Tools

tools = Tools()
llm = ChatOpenAI(model='gpt-4.1-mini')

@tools.action('Ask human for help with a question')
def ask_human(question: str, browser: Browser) -> ActionResult:
    answer = input(f'{question} > ')
    # Wrap the answer so the return value matches the ActionResult annotation
    return ActionResult(extracted_content=f'The human responded with: {answer}')

agent = Agent(
    task='Ask human for help',
    llm=llm,
    tools=tools,
)
```
|
|
|
|
<Note>
|
|
Use `browser` parameter in tools for deterministic [Actor](/customize/actor/basics) actions.
|
|
</Note>
|
|
|
|
|
|
# Tools: Remove Tools
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/tools/remove
|
|
|
|
You can exclude default tools:
|
|
|
|
```python theme={null}
from browser_use import Agent, Tools

tools = Tools(exclude_actions=['search', 'wait'])
agent = Agent(task='...', llm=llm, tools=tools)
```
|
|
|
|
|
|
# Tools: Tool Response
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/customize/tools/response
|
|
Tools return results using `ActionResult` or simple strings.
|
|
|
|
## Return Types
|
|
|
|
```python theme={null}
from browser_use import ActionResult, Tools

tools = Tools()

@tools.action('My tool')
def my_tool() -> str:
    return "Task completed successfully"

@tools.action('Advanced tool')
def advanced_tool() -> ActionResult:
    # Shows the available fields; a real result sets only those that apply
    return ActionResult(
        extracted_content="Main result",
        long_term_memory="Remember this info",
        error="Something went wrong",
        is_done=True,
        success=True,
        attachments=["file.pdf"],
    )
```
|
|
|
|
# Get Help
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/development/get-help
|
|
|
|
More than 20k developers help each other
|
|
|
|
1. Check our [GitHub Issues](https://github.com/browser-use/browser-use/issues)
|
|
2. Ask in our [Discord community](https://link.browser-use.com/discord)
|
|
3. Get support for your enterprise with [support@browser-use.com](mailto:support@browser-use.com)
|
|
|
|
|
|
# Costs
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/development/monitoring/costs
|
|
Track token usage and API costs for your browser automation tasks
|
|
|
|
## Cost Tracking
|
|
|
|
To track token usage and costs, enable cost calculation:
|
|
|
|
```python theme={null}
import asyncio

from browser_use import Agent, ChatOpenAI


async def main():
    agent = Agent(
        task="Search for latest news about AI",
        llm=ChatOpenAI(model="gpt-4.1-mini"),
        calculate_cost=True,  # Enable cost tracking
    )

    history = await agent.run()

    # Get usage from history
    print(f"Token usage: {history.usage}")

    # Or get from usage summary
    usage_summary = await agent.token_cost_service.get_usage_summary()
    print(f"Usage summary: {usage_summary}")


if __name__ == "__main__":
    asyncio.run(main())
```
|
|
|
|
|
|
# Observability
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/development/monitoring/observability
|
|
Trace Browser Use's agent execution steps and browser sessions
|
|
Browser Use has a native integration with [Laminar](https://lmnr.ai) - open-source platform for tracing, evals and labeling of AI agents.
|
|
Read more about Laminar in the [Laminar docs](https://docs.lmnr.ai).
|
|
|
|
|
|
# Telemetry
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/development/monitoring/telemetry
|
|
|
|
Understanding Browser Use's telemetry
|
|
|
|
## Overview
|
|
|
|
Browser Use is free under the MIT license. To help us continue improving the library, we collect anonymous usage data with [PostHog](https://posthog.com). This information helps us understand how the library is used, fix bugs more quickly, and prioritize new features.
|
|
|
|
## Opting Out
|
|
|
|
You can disable telemetry by setting the environment variable:
|
|
|
|
```bash .env theme={null}
ANONYMIZED_TELEMETRY=false
```
|
|
|
|
Or in your Python code:
|
|
|
|
```python theme={null}
import os
os.environ["ANONYMIZED_TELEMETRY"] = "false"
```
|
|
|
|
<Note>
|
|
Even when enabled, telemetry has zero impact on the library's performance. Code is available in [Telemetry Service](https://github.com/browser-use/browser-use/tree/main/browser_use/telemetry).
|
|
</Note>
|
|
|
|
|
|
# Contribution Guide
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/development/setup/contribution-guide
|
|
|
|
## Mission
|
|
|
|
* Make developers happy
|
|
* Do more clicks than a human
|
|
* Tell your computer what to do, and it gets it done.
|
|
* Make agents faster and more reliable.
|
|
|
|
## What to work on?
|
|
|
|
* This space is moving fast. We have 10 ideas daily. Let's exchange some.
|
|
* Browse our [GitHub Issues](https://github.com/browser-use/browser-use/issues)
|
|
* Check out our most active issues on [Discord](https://discord.gg/zXJJHtJf3k)
|
|
* Get inspiration in the [`#showcase-your-work`](https://discord.com/channels/1303749220842340412/1305549200678850642) channel
|
|
|
|
## What makes a great PR?
|
|
|
|
1. Why do we need this PR?
|
|
2. Include a demo screenshot/gif
|
|
3. Make sure the PR passes all CI tests
|
|
4. Keep your PR focused on a single feature
|
|
|
|
## How?
|
|
|
|
1. Fork the repository
|
|
2. Create a new branch for your feature
|
|
3. Submit a PR
|
|
|
|
We are overwhelmed with Issues. Feel free to bump your issues/PRs with comments periodically if you need faster feedback.
|
|
|
|
|
|
# Local Setup
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/development/setup/local-setup
|
|
|
|
We're excited to have you join our community of contributors.
|
|
## Welcome to Browser Use Development!
|
|
|
|
```bash theme={null}
|
|
git clone https://github.com/browser-use/browser-use
|
|
cd browser-use
|
|
uv sync --all-extras --dev
|
|
# or pip install -U git+https://github.com/browser-use/browser-use.git@main
|
|
```
|
|
|
|
## Configuration
|
|
Set up your environment variables:
|
|
|
|
```bash theme={null}
|
|
# Copy the example environment file
|
|
cp .env.example .env
|
|
|
|
# set logging level
|
|
# BROWSER_USE_LOGGING_LEVEL=debug
|
|
```
|
|
|
|
## Helper Scripts
|
|
|
|
Scripts for common development tasks:
|
|
|
|
```bash theme={null}
|
|
# Complete setup script - installs uv, creates a venv, and installs dependencies
|
|
./bin/setup.sh
|
|
|
|
# Run all pre-commit hooks (formatting, linting, type checking)
|
|
./bin/lint.sh
|
|
|
|
# Run the core test suite that's executed in CI
|
|
./bin/test.sh
|
|
```
|
|
|
|
## Run examples
|
|
|
|
```bash theme={null}
|
|
uv run examples/simple.py
|
|
```
|
|
|
|
|
|
|
|
# Example Code: News-Use (News Monitor)
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/apps/news-use
|
|
Monitor news websites and extract articles with sentiment analysis using browser agents and Google Gemini.
|
|
|
|
|
|
# Example Code: Vibetest-Use (Automated QA)
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/apps/vibetest-use
|
|
Run multi-agent Browser-Use tests to catch UI bugs, broken links, and accessibility issues before they ship.
|
|
|
|
|
|
# Fast Agent
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/templates/fast-agent
|
|
Optimize agent performance for maximum speed and efficiency.
|
|
|
|
|
|
# Follow up tasks
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/templates/follow-up-tasks
|
|
Run follow-up tasks in the same browser session.
|
|
|
|
|
|
# Parallel Agents
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/templates/parallel-browser
|
|
Run multiple agents in parallel with separate browser instances
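
The concurrency pattern behind running agents in parallel can be sketched with the standard library alone. This is a minimal illustration, not the template from the linked page: `run_task` is a hypothetical stand-in that only sleeps, where a real version would presumably create its own browser instance and await an agent's `run()`.

```python
import asyncio


async def run_task(task: str) -> str:
    """Hypothetical stand-in for one agent run on its own browser instance."""
    await asyncio.sleep(0.01)  # simulates browser and LLM latency
    return f"done: {task}"


async def run_all(tasks: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(run_task(task) for task in tasks))


results = asyncio.run(run_all(["check weather", "find news", "compare prices"]))
print(results)
```

Giving each task its own coroutine (and, in real use, its own browser) keeps the runs from sharing page state, while `gather()` preserves the order of results regardless of which task finishes first.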
|
|
|
|
|
|
# Playwright Integration
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/templates/playwright-integration
|
|
Advanced example showing Playwright and Browser-Use working together
|
|
|
|
|
|
# Guide: Secure Setup
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/templates/secure
|
|
|
|
|
|
# Guide: Sensitive Data
|
|
Source: (go to or request this content to learn more) https://docs.browser-use.com/examples/templates/sensitive-data
|
|
Handle secret information securely and avoid sending PII & passwords to the LLM.
|
|
</browser_use_docs>
|