mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---

## Overview

The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.

## Basic Settings

```python
from browser_use import Agent
from browser_use.llm import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
```

### Required Parameters

- `task`: The instruction for the agent to execute
- `llm`: A chat model instance. See <a href="/customize/supported-models">Supported Models</a> for supported models.

## Agent Behavior

Control how the agent operates:

```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,  # For custom tool calling
    use_vision=True,  # Enable vision capabilities
    save_conversation_path="logs/conversation",  # Save chat logs
)
```

### Behavior Parameters

- `controller`: Registry of functions the agent can call. Defaults to base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages
  - Disable to reduce costs or use models without vision support
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image (but this depends on the defined screen size)
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `override_system_message`: Completely replace the default system prompt with a custom one.
- `extend_system_message`: Add additional instructions to the default system prompt.

<Note>
  Vision capabilities are recommended for better web interaction understanding,
  but can be disabled to reduce costs or when using models without vision
  support.
</Note>

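To get a feel for what vision adds to a run, the per-image figures quoted above can be turned into a rough estimate. This is only a back-of-the-envelope sketch: the function name is made up for illustration, the one-screenshot-per-step assumption is hypothetical, and the real numbers depend on screen size and current model pricing.

```python
# Rough per-image figures quoted above for GPT-4o. Treat them as ballpark
# values: they depend on the configured screen size and on current pricing.
TOKENS_PER_IMAGE_RANGE = (800, 1000)
USD_PER_IMAGE = 0.002


def estimate_vision_overhead(num_screenshots: int) -> dict:
    """Estimate the extra tokens and cost of sending screenshots to the model."""
    low, high = TOKENS_PER_IMAGE_RANGE
    return {
        "min_tokens": num_screenshots * low,
        "max_tokens": num_screenshots * high,
        "approx_usd": round(num_screenshots * USD_PER_IMAGE, 4),
    }


# Hypothetical run: one screenshot per step for 50 steps.
overhead = estimate_vision_overhead(50)
print(overhead)
```

If an estimate like this is too high for your workload, setting `use_vision=False` removes the overhead at the cost of the model no longer seeing the page visually.
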
### Reuse Existing Browser Context

By default, browser-use launches its own built-in browser using Playwright Chromium.
You can also connect to a remote browser or pass any of the following
existing Playwright objects to the `Agent`: `page`, `browser_context`, `browser`, `browser_session`, or `browser_profile`.

These all get passed down to create a `BrowserSession` for the `Agent`:

```python
agent = Agent(
    task='book a flight to fiji',
    llm=llm,
    browser_profile=browser_profile,  # use this profile to create a BrowserSession
    browser_session=BrowserSession(   # use an existing BrowserSession
        cdp_url=...,                      # remote CDP browser to connect to
        # or
        wss_url=...,                      # remote wss playwright server provider
        # or
        browser_pid=...,                  # pid of a locally running browser process to attach to
        # or
        executable_path=...,              # provide a custom chrome binary path
        # or
        channel=...,                      # specify chrome, chromium, ms-edge, etc.
        # or
        page=page,                        # use an existing playwright Page object
        # or
        browser_context=browser_context,  # use an existing playwright BrowserContext object
        # or
        browser=browser,                  # use an existing playwright Browser object
    ),
)
```

For example, to connect to an existing browser over CDP:

```python
agent = Agent(
    # ... task, llm, and any other settings
    browser_session=BrowserSession(cdp_url='http://localhost:9222'),
)
```

To attach to a locally running Chrome instance by its process ID:

```python
agent = Agent(
    # ... task, llm, and any other settings
    browser_session=BrowserSession(browser_pid=1234),
)
```

See <a href="/customize/real-browser">Connect to your Browser</a> for more info.

<Note>
  You can reuse the same `BrowserSession` after an agent has finished running.
  By default, the browser is closed automatically when `run()` completes only
  if browser-use launched it.
</Note>

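As a minimal sketch of session reuse, the function below runs two agents one after the other against the same `BrowserSession`. It assumes a browser is already listening for CDP connections at `http://localhost:9222` (so browser-use did not launch it and should leave it open between runs), that `BrowserSession` is importable from the top-level `browser_use` package, and that an OpenAI API key is configured. The function name and the first task string are made up for illustration.

```python
async def run_two_tasks(cdp_url: str = "http://localhost:9222"):
    # Imported lazily so this module can be loaded without browser-use installed.
    # Assumption: BrowserSession is exported from the top-level package.
    from browser_use import Agent, BrowserSession
    from browser_use.llm import ChatOpenAI

    llm = ChatOpenAI(model="gpt-4o")
    # One session, connected to an already-running browser over CDP.
    session = BrowserSession(cdp_url=cdp_url)

    # First agent uses the shared session.
    first = Agent(task="Open the flight search page", llm=llm, browser_session=session)
    await first.run()

    # Second agent reuses the same session, picking up the existing browser state.
    second = Agent(task="book a flight to fiji", llm=llm, browser_session=session)
    return await second.run()
```

Run it with `asyncio.run(run_two_tasks())` from a script. The agents run sequentially here, so they never drive the browser at the same time.
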
## Running the Agent

The agent is executed using the async `run()` method, which accepts the following parameter:

- `max_steps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.

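Because `run()` is a coroutine, it has to be awaited from async code or driven by an event loop. The sketch below shows one way to do that with a lower step limit. The `main` and `run_sync` names and the limit of 20 are arbitrary choices for illustration, and actually executing it requires browser-use, a browser, and an OpenAI API key.

```python
import asyncio


async def main(max_steps: int = 20):
    # Imported lazily so this module can be loaded without browser-use installed.
    from browser_use import Agent
    from browser_use.llm import ChatOpenAI

    agent = Agent(
        task="Search for latest news about AI",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    # Stop after at most `max_steps` steps instead of the default 100.
    return await agent.run(max_steps=max_steps)


def run_sync(max_steps: int = 20):
    """Convenience wrapper for calling the agent from synchronous code."""
    return asyncio.run(main(max_steps))
```

Calling `run_sync()` from a regular script blocks until the agent finishes or the step limit is reached.
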
## Agent History

The `run()` method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.

```python
# Example of accessing history
history = await agent.run()

# Access (some) useful information
history.urls()               # List of visited URLs
history.screenshots()        # List of screenshot paths
history.action_names()       # Names of executed actions
history.extracted_content()  # Content extracted during execution
history.errors()             # Any errors that occurred
history.model_actions()      # All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions

<Note>
  For a complete list of helper methods and detailed history analysis
  capabilities, refer to the [AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>

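The helper methods listed above can be combined into a small post-run report. The following is a sketch, not part of the library: `summarize_history` is a hypothetical helper, and `StubHistory` is a stand-in object (with made-up return values) so the example runs without a browser. It also assumes that `errors()` may contain `None` entries for steps that succeeded, which is why those are filtered out.

```python
def summarize_history(history) -> dict:
    """Collect a few fields from an AgentHistoryList-like object."""
    # Assumption: errors() may contain None for steps that did not fail.
    errors = [e for e in history.errors() if e]
    return {
        "done": history.is_done(),
        "has_errors": history.has_errors(),
        "final_result": history.final_result(),
        "visited": history.urls(),
        "errors": errors,
    }


class StubHistory:
    """Hypothetical stand-in so the sketch runs without a browser."""

    def is_done(self):
        return True

    def has_errors(self):
        return True

    def final_result(self):
        return "example result"

    def urls(self):
        return ["https://example.com"]

    def errors(self):
        return [None, "Timeout on step 2"]


summary = summarize_history(StubHistory())
print(summary)
```

In real use you would pass the object returned by `await agent.run()` instead of the stub.
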
## Run initial actions without LLM

You can run a fixed list of actions before the LLM takes over, as shown in [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py).
Specify each action as a dictionary whose key is the action name and whose value is the action's parameters. All available actions are defined in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.

```python
initial_actions = [
    {'go_to_url': {'url': 'https://www.google.com', 'new_tab': True}},
    {'go_to_url': {'url': 'https://en.wikipedia.org/wiki/Randomness', 'new_tab': True}},
    {'scroll_down': {'amount': 1000}},
]
agent = Agent(
    task='What theories are displayed on the page?',
    initial_actions=initial_actions,
    llm=llm,
)
```

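Since the action list is plain data, it can be sanity-checked before it reaches the agent. The sketch below is a hypothetical helper, not part of browser-use. It assumes each entry holds exactly one action name mapped to a parameter dictionary, as in the example above, and it does not check whether the action name actually exists in the Controller.

```python
def validate_initial_actions(actions: list) -> list:
    """Return (action_name, params) pairs, raising ValueError on malformed entries."""
    parsed = []
    for i, action in enumerate(actions):
        # Assumption: one action per dictionary, keyed by the action name.
        if not isinstance(action, dict) or len(action) != 1:
            raise ValueError(f"action {i} must be a dict with exactly one key")
        name, params = next(iter(action.items()))
        if not isinstance(params, dict):
            raise ValueError(f"parameters for {name!r} must be a dict")
        parsed.append((name, params))
    return parsed


initial_actions = [
    {'go_to_url': {'url': 'https://www.google.com', 'new_tab': True}},
    {'go_to_url': {'url': 'https://en.wikipedia.org/wiki/Randomness', 'new_tab': True}},
    {'scroll_down': {'amount': 1000}},
]

parsed = validate_initial_actions(initial_actions)
print([name for name, _ in parsed])
```

A check like this catches shape mistakes early, such as passing the parameters directly instead of nesting them under the action name.
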
## Run with message context

Pass `message_context` to give the LLM additional information that helps it understand the task:

```python
from browser_use import Agent
from browser_use.llm import ChatOpenAI

agent = Agent(
    task="your task",
    message_context="Additional information about the task",
    llm=ChatOpenAI(model='gpt-4o'),
)
```

## Run with planner model

You can configure the agent to use a separate planner model for high-level task planning:

```python
from browser_use import Agent
from browser_use.llm import ChatOpenAI

# Initialize models
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='o3-mini')

agent = Agent(
    task="your task",
    llm=llm,
    planner_llm=planner_llm,       # Separate model for planning
    use_vision_for_planner=False,  # Disable vision for planner
    planner_interval=4,            # Plan every 4 steps
)
```

### Planner Parameters

- `planner_llm`: A chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- `use_vision_for_planner`: Enable/disable vision capabilities for the planner model. Defaults to `True`.
- `planner_interval`: Number of steps between planning phases. Defaults to `1`.

Using a separate planner model can help:

- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Better handle complex, multi-step tasks

<Note>
  The planner model is optional. If `planner_llm` is not specified, the agent
  runs without a separate planning phase.
</Note>

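To see how `planner_interval` affects the number of planner calls, the sketch below lists the steps at which a planning phase would run. It rests on an assumption that is not confirmed here: that steps are counted from 1 and the planner fires whenever the step number is a multiple of the interval. The exact trigger is an implementation detail of the library, so treat the output as an approximation for cost planning only. The function name is hypothetical.

```python
def planning_steps(total_steps: int, planner_interval: int = 1) -> list[int]:
    """Steps (1-based) at which a planning phase is assumed to run.

    Assumption: the planner fires when step % planner_interval == 0.
    """
    if planner_interval < 1:
        raise ValueError("planner_interval must be at least 1")
    return [s for s in range(1, total_steps + 1) if s % planner_interval == 0]


every_fourth = planning_steps(12, planner_interval=4)
every_step = planning_steps(3)
print(every_fourth, every_step)
```

Under this assumption, raising the interval from the default of `1` to `4` cuts the number of planner calls over a 12-step run from 12 to 3.
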
### Optional Parameters

- `message_context`: Additional information about the task to help the LLM understand the task better.
- `initial_actions`: List of initial actions to run before the main task.
- `max_actions_per_step`: Maximum number of actions to run in a step. Defaults to `10`.
- `max_failures`: Maximum number of failures before giving up. Defaults to `3`.
- `retry_delay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
- `generate_gif`: Enable/disable GIF generation. Defaults to `False`. Set to `True` or a string path to save the GIF.

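One way to keep these settings in a single place is to start from the documented defaults and override only what a given run needs. The helper below is a hypothetical sketch, not part of browser-use. It covers only the parameters that list a default above, and the GIF path `run.gif` is an arbitrary example value.

```python
# Defaults as documented in the list above.
DOCUMENTED_DEFAULTS = {
    "max_actions_per_step": 10,
    "max_failures": 3,
    "retry_delay": 10,
    "generate_gif": False,
}


def build_agent_kwargs(**overrides) -> dict:
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown settings: {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


# Tolerate more failures and save a GIF of the run to a custom path.
agent_kwargs = build_agent_kwargs(max_failures=5, generate_gif="run.gif")
print(agent_kwargs)
```

The resulting dictionary can then be unpacked into the constructor, for example `Agent(task="your task", llm=llm, **agent_kwargs)`.
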
## Memory

Memory management in browser-use has been significantly improved since version 0.3.2. The agent's context handling and state management are now robust enough that the previous memory system (`mem0`) is no longer needed or supported.

The agent maintains its context and task progress through:

- Detailed history tracking of actions and results
- Structured state management
- Clear goal setting and evaluation at each step

The `enable_memory` parameter has been removed as the new system provides better context management by default.

<Note>
  If you're upgrading from an older version that used `enable_memory`, simply remove this parameter. The agent will automatically use the improved context management system.
</Note>