---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---

## Overview

The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.

## Basic Settings

```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
```

### Required Parameters

- `task`: The instruction for the agent to execute
- `llm`: A LangChain chat model instance. See <a href="/customize/supported-models">LangChain Models</a> for supported models.

## Agent Behavior

Control how the agent operates:

```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,  # For custom tool calling
    use_vision=True,               # Enable vision capabilities
    save_conversation_path="logs/conversation"  # Save chat logs
)
```

### Behavior Parameters

- `controller`: Registry of functions the agent can call. Defaults to the base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages
  - Disable to reduce costs or to use models without vision support
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image, depending on the configured screen size
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `override_system_message`: Completely replace the default system prompt with a custom one.
- `extend_system_message`: Add additional instructions to the default system prompt (see the sketch after this list).
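
For instance, a minimal sketch of extending the default system prompt; the instruction text here is purely illustrative (use `override_system_message` instead if you want to replace the prompt entirely):

```python
agent = Agent(
    task="your task",
    llm=llm,
    # Appended to the default system prompt; wording is illustrative only
    extend_system_message="Always dismiss cookie banners before interacting with a page.",
)
```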

<Note>
Vision capabilities are recommended for better web interaction understanding,
but can be disabled to reduce costs or when using models without vision support.
</Note>

### Reuse Existing Browser Context

By default, Browser Use launches its own built-in browser using Playwright Chromium.
You can also connect to a remote browser or pass any of the following existing Playwright objects to the Agent: `page`, `browser_context`, `browser`, `browser_session`, or `browser_profile`.

These all get passed down to create a `BrowserSession` for the `Agent`:

```python
agent = Agent(
    task='book a flight to fiji',
    llm=llm,
    browser_profile=browser_profile,  # use this profile to create a BrowserSession
    browser_session=BrowserSession(   # use an existing BrowserSession
        cdp_url=...,                  # remote CDP browser to connect to
        # or
        wss_url=...,                  # remote wss playwright server provider
        # or
        browser_pid=...,              # pid of a locally running browser process to attach to
        # or
        executable_path=...,          # provide a custom chrome binary path
        # or
        channel=...,                  # specify chrome, chromium, ms-edge, etc.
        # or
        page=page,                    # use an existing playwright Page object
        # or
        browser_context=browser_context,  # use an existing playwright BrowserContext object
        # or
        browser=browser,              # use an existing playwright Browser object
    ),
)
```

For example, to connect to an existing browser over CDP you could do:

```python
agent = Agent(
    ...
    browser_session=BrowserSession(cdp_url='http://localhost:9222'),
)
```

For example, to attach to a locally running Chrome instance by its process ID you could do:

```python
agent = Agent(
    ...
    browser_session=BrowserSession(browser_pid=1234),
)
```

See <a href="/customize/real-browser">Connect to your Browser</a> for more info.

<Note>
You can reuse the same `BrowserSession` after an agent has completed running. If you do nothing, the
browser will be automatically closed when `run()` completes, but only if it was launched by Browser Use.
</Note>
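
For instance, a minimal sketch of reusing one session across two agents. It assumes the session connects to an already-running browser over CDP (the URL is illustrative), so it is not closed automatically when each `run()` finishes:

```python
browser_session = BrowserSession(cdp_url='http://localhost:9222')

agent_1 = Agent(task='first task', llm=llm, browser_session=browser_session)
await agent_1.run()

# The browser was connected to, not launched by Browser Use, so it stays open
# and a second agent can reuse the same session afterwards.
agent_2 = Agent(task='second task', llm=llm, browser_session=browser_session)
await agent_2.run()
```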

## Running the Agent

The agent is executed using the async `run()` method:

- `max_steps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
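
A minimal sketch of calling `run()` with an explicit step limit (the value `50` is illustrative):

```python
history = await agent.run(max_steps=50)  # stop after at most 50 steps
```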

## Agent History

The method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.

```python
# Example of accessing history
history = await agent.run()

# Access (some) useful information
history.urls()               # List of visited URLs
history.screenshots()        # List of screenshot paths
history.action_names()       # Names of executed actions
history.extracted_content()  # Content extracted during execution
history.errors()             # Any errors that occurred
history.model_actions()      # All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions
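
For example, a short sketch of checking the outcome of a run using the helper methods listed above:

```python
history = await agent.run()

if history.is_done() and not history.has_errors():
    print(history.final_result())  # final extracted content
else:
    print(history.errors())        # inspect what went wrong
```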

<Note>
For a complete list of helper methods and detailed history analysis capabilities, refer to the
[AgentHistoryList source code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>

## Run initial actions without LLM

With [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py) you can run initial actions without the LLM.
Specify each action as a dictionary where the key is the action name and the value is the action parameters. You can find all our actions in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.

```python
initial_actions = [
    {'open_tab': {'url': 'https://www.google.com'}},
    {'open_tab': {'url': 'https://en.wikipedia.org/wiki/Randomness'}},
    {'scroll_down': {'amount': 1000}},
]
agent = Agent(
    task='What theories are displayed on the page?',
    initial_actions=initial_actions,
    llm=llm,
)
```

## Run with message context

You can configure the agent and provide a separate message to help the LLM understand the task better.

```python
from langchain_openai import ChatOpenAI

agent = Agent(
    task="your task",
    message_context="Additional information about the task",
    llm=ChatOpenAI(model='gpt-4o'),
)
```

## Run with planner model

You can configure the agent to use a separate planner model for high-level task planning:

```python
from langchain_openai import ChatOpenAI

# Initialize models
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='o3-mini')

agent = Agent(
    task="your task",
    llm=llm,
    planner_llm=planner_llm,       # Separate model for planning
    use_vision_for_planner=False,  # Disable vision for planner
    planner_interval=4             # Plan every 4 steps
)
```

### Planner Parameters

- `planner_llm`: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- `use_vision_for_planner`: Enable/disable vision capabilities for the planner model. Defaults to `True`.
- `planner_interval`: Number of steps between planning phases. Defaults to `1`.

Using a separate planner model can help:

- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Better handle complex, multi-step tasks

<Note>
The planner model is optional. If not specified, the agent will not use a separate planner model.
</Note>

### Optional Parameters

- `message_context`: Additional information about the task to help the LLM understand the task better.
- `initial_actions`: List of initial actions to run before the main task.
- `max_actions_per_step`: Maximum number of actions to run in a step. Defaults to `10`.
- `max_failures`: Maximum number of failures before giving up. Defaults to `3`.
- `retry_delay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
- `generate_gif`: Enable/disable GIF generation. Defaults to `False`. Set to `True` or a string path to save the GIF.
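
A minimal sketch combining several of these optional parameters (all values below are illustrative, not recommendations):

```python
agent = Agent(
    task="your task",
    llm=llm,
    max_actions_per_step=5,             # at most 5 actions per step
    max_failures=2,                     # give up after 2 consecutive failures
    retry_delay=20,                     # wait 20 s between retries when rate limited
    generate_gif="logs/agent_run.gif",  # save a GIF of the run to this path
)
```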

## Memory Management

Browser Use includes a procedural memory system using [Mem0](https://mem0.ai) that automatically summarizes the agent's conversation history at regular intervals to optimize context window usage during long tasks.

```python
from browser_use.agent.memory import MemoryConfig

agent = Agent(
    task="your task",
    llm=llm,
    enable_memory=True,
    memory_config=MemoryConfig(  # Ensure llm_instance is passed if not using default LLM config
        llm_instance=llm,        # Important: Pass the agent's LLM instance here
        agent_id="my_custom_agent",
        memory_interval=15
    )
)
```

### Memory Parameters

- `enable_memory`: Enable/disable the procedural memory system. Defaults to `True`.
- `memory_config`: A `MemoryConfig` Pydantic model instance (required if `enable_memory` is `True`). Dictionary format is not supported.

### Using MemoryConfig

You must configure the memory system using the `MemoryConfig` Pydantic model for a type-safe approach:

```python
from browser_use.agent.memory import MemoryConfig
from langchain_openai import ChatOpenAI  # Assuming llm is an instance of ChatOpenAI

llm_for_agent = ChatOpenAI(model="gpt-4o")

agent = Agent(
    task=task_description,
    llm=llm_for_agent,
    enable_memory=True,  # This is True by default
    memory_config=MemoryConfig(
        llm_instance=llm_for_agent,      # Pass the LLM instance for Mem0
        agent_id="my_agent",
        memory_interval=15,              # Summarize every 15 steps
        embedder_provider="openai",
        embedder_model="text-embedding-3-large",
        embedder_dims=1536,

        # --- Vector Store Customization ---
        vector_store_provider="qdrant",  # e.g., Qdrant, Pinecone, Chroma, etc.
        vector_store_collection_name="my_browser_use_memories",  # Optional: custom collection name
        vector_store_config_override={   # Provider-specific config
            "host": "localhost",
            "port": 6333,
            # Add other Qdrant-specific configs here if needed, e.g., api_key for cloud
        }
    )
)
```

The `MemoryConfig` model provides these configuration options:

#### Memory Settings

- `agent_id`: Unique identifier for the agent (default: `"browser_use_agent"`). Essential for persistent memory sessions if using a persistent vector store.
- `memory_interval`: Number of steps between memory summarization (default: `10`)

#### LLM Settings (for Mem0's internal operations)

- `llm_instance`: The LangChain `BaseChatModel` instance that Mem0 will use for its internal summarization and processing. You must pass the same LLM instance used by the main agent, or another compatible one, here.

#### Embedder Settings

- `embedder_provider`: Provider for embeddings (`'openai'`, `'gemini'`, `'ollama'`, or `'huggingface'`)
- `embedder_model`: Model name for the embedder
- `embedder_dims`: Dimensions for the embeddings

#### Vector Store Settings

- `vector_store_provider`: Choose the vector store backend. Supported options include:
  `'faiss'` (default), `'qdrant'`, `'pinecone'`, `'supabase'`, `'elasticsearch'`, `'chroma'`, `'weaviate'`, `'milvus'`, `'pgvector'`, `'upstash_vector'`, `'vertex_ai_vector_search'`, `'azure_ai_search'`, `'lancedb'`, `'mongodb'`, `'redis'`, `'memory'` (in-memory, non-persistent).
- `vector_store_collection_name`: (Optional) Specify a custom name for the collection or index in your vector store. If not provided, a default name is generated (especially for local stores like FAISS/Chroma) or used by Mem0.
- `vector_store_base_path`: Path for local vector stores like FAISS or Chroma (e.g., `/tmp/mem0`). Default is `/tmp/mem0`.
- `vector_store_config_override`: (Optional) A dictionary to provide or override specific configuration parameters required by Mem0 for the chosen `vector_store_provider`. This is where you'd put connection details like `host`, `port`, `api_key`, `url`, `environment`, etc., for cloud-based or server-based vector stores.

The model automatically sets appropriate defaults based on the LLM being used:

- For `ChatOpenAI`: Uses OpenAI's `text-embedding-3-small` embeddings
- For `ChatGoogleGenerativeAI`: Uses Gemini's `models/text-embedding-004` embeddings
- For `ChatOllama`: Uses Ollama's `nomic-embed-text` embeddings
- Default: Uses Hugging Face's `all-MiniLM-L6-v2` embeddings

<Note>
**Important:**
- Always pass a properly constructed `MemoryConfig` object to the `memory_config` parameter.
- Ensure the `llm_instance` is provided to `MemoryConfig` so Mem0 can perform its operations.
- For persistent memory across agent runs or for shared memory, choose a scalable vector store provider (like Qdrant, Pinecone, etc.) and configure it correctly using `vector_store_provider` and `vector_store_config_override`. The default `'faiss'` provider stores data locally in `vector_store_base_path`.
</Note>

### How Memory Works

When enabled, the agent periodically compresses its conversation history into concise summaries:

1. Every `memory_interval` steps, the agent reviews its recent interactions.
2. It uses Mem0 (configured with your chosen LLM and vector store) to create a procedural memory summary.
3. The original messages in the agent's active context are replaced with this summary, reducing token usage.
4. This process helps maintain important context while freeing up the context window for new information.

### Disabling Memory

If you want to disable the memory system (for debugging or for shorter tasks), set `enable_memory` to `False`:

```python
agent = Agent(
    task="your task",
    llm=llm,
    enable_memory=False
)
```

<Note>
Disabling memory may be useful for debugging or short tasks, but for longer
tasks, it can lead to context window overflow as the conversation history
grows. The memory system helps maintain performance during extended sessions.
</Note>