browser-use/docs/customize/agent-settings.mdx
2025-06-05 02:04:41 -07:00


---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---
## Overview
The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.
## Basic Settings
```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
```
### Required Parameters
- `task`: The instruction for the agent to execute
- `llm`: A LangChain chat model instance. See <a href="/customize/supported-models">LangChain Models</a> for supported models.
## Agent Behavior
Control how the agent operates:
```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,  # For custom tool calling
    use_vision=True,  # Enable vision capabilities
    save_conversation_path="logs/conversation",  # Save chat logs
)
```
### Behavior Parameters
- `controller`: Registry of functions the agent can call. Defaults to base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages.
  - Disable it to reduce costs or to use models without vision support.
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image, depending on the configured screen size.
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `override_system_message`: Completely replace the default system prompt with a custom one.
- `extend_system_message`: Add additional instructions to the default system prompt.
<Note>
Vision capabilities are recommended for better web interaction understanding,
but can be disabled to reduce costs or when using models without vision
support.
</Note>
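As a rough, back-of-the-envelope illustration of the cost figures above, the sketch below estimates the vision overhead of a run. It assumes one screenshot per step, which is an assumption made for this example and not something this page states. `estimate_vision_overhead` is a hypothetical helper, not part of Browser Use.

```python
def estimate_vision_overhead(steps, tokens_per_image=(800, 1000), usd_per_image=0.002):
    """Rough vision overhead for a run, assuming one screenshot per step.

    The per-image figures are the approximate GPT-4o numbers quoted above;
    real costs depend on the configured screen size and current pricing.
    """
    low, high = tokens_per_image
    return {
        "min_tokens": steps * low,
        "max_tokens": steps * high,
        "approx_usd": round(steps * usd_per_image, 4),
    }


# Estimate the overhead of a 50-step run.
print(estimate_vision_overhead(steps=50))
```

If the estimate is too high for your use case, setting `use_vision=False` removes this overhead entirely.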
### Reuse Existing Browser Context
By default, Browser Use launches its own built-in browser using Playwright Chromium.
You can also connect to a remote browser or pass any of the following
existing objects to the `Agent`: `page`, `browser_context`, `browser`, `browser_session`, or `browser_profile`.
These are all passed down to create a `BrowserSession` for the `Agent`:
```python
agent = Agent(
    task='book a flight to fiji',
    llm=llm,
    browser_profile=browser_profile,  # use this profile to create a BrowserSession
    browser_session=BrowserSession(  # use an existing BrowserSession
        cdp_url=...,  # remote CDP browser to connect to
        # or
        wss_url=...,  # remote wss playwright server provider
        # or
        browser_pid=...,  # pid of a locally running browser process to attach to
        # or
        executable_path=...,  # provide a custom chrome binary path
        # or
        channel=...,  # specify chrome, chromium, ms-edge, etc.
        # or
        page=page,  # use an existing playwright Page object
        # or
        browser_context=browser_context,  # use an existing playwright BrowserContext object
        # or
        browser=browser,  # use an existing playwright Browser object
    ),
)
```
For example, to connect to an existing browser over CDP you could do:
```python
agent = Agent(
    ...
    browser_session=BrowserSession(cdp_url='http://localhost:9222'),
)
```
Similarly, to attach to a locally running Chrome instance by process ID:
```python
agent = Agent(
    ...
    browser_session=BrowserSession(browser_pid=1234),
)
```
See <a href="/customize/real-browser">Connect to your Browser</a> for more info.
<Note>
You can reuse the same `BrowserSession` after an agent has finished running. If you do nothing, the
browser is closed automatically when `run()` completes, but only if Browser Use launched it.
</Note>
## Running the Agent
The agent is executed using the async `run()` method, which accepts:
- `max_steps` (default: `100`): Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.
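Because `run()` is async, it has to be awaited from a coroutine. A minimal sketch of a wrapper that applies an explicit step cap follows; `run_with_step_cap` is a hypothetical helper name, and it only assumes that the object passed in exposes the async `run(max_steps=...)` method described above.

```python
import asyncio


async def run_with_step_cap(agent, max_steps=20):
    """Run an agent with an explicit step cap and return whatever run() returns."""
    # run() is a coroutine, so it must be awaited.
    return await agent.run(max_steps=max_steps)


# Usage (assumes `agent` was built as in the earlier examples):
# history = asyncio.run(run_with_step_cap(agent, max_steps=20))
```

A lower cap than the default of `100` can be useful while iterating on a task prompt, since a misbehaving run stops sooner.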
## Agent History
The `run()` method returns an `AgentHistoryList` object containing the complete execution history. This history is useful for debugging, analysis, and creating reproducible scripts.
```python
# Example of accessing history
history = await agent.run()
# Access (some) useful information
history.urls() # List of visited URLs
history.screenshots() # List of screenshot paths
history.action_names() # Names of executed actions
history.extracted_content() # Content extracted during execution
history.errors() # Any errors that occurred
history.model_actions() # All actions with their parameters
```
The `AgentHistoryList` provides many helper methods to analyze the execution:
- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions
<Note>
For a complete list of helper methods and detailed history analysis
capabilities, refer to the [AgentHistoryList source
code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>
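The helper methods listed above can be combined into a quick post-run report. The sketch below is one possible shape for such a report; `summarize_history` is a hypothetical helper, and it relies only on the `AgentHistoryList` methods named on this page. Filtering out empty entries from `errors()` is a defensive assumption, in case steps without an error are reported as `None`.

```python
def summarize_history(history):
    """Collect a few headline facts from an AgentHistoryList-like object."""
    return {
        "done": history.is_done(),
        "has_errors": history.has_errors(),
        "final_result": history.final_result(),
        "visited_urls": history.urls(),
        # Assumption: steps without an error may appear as None, so keep truthy entries only.
        "errors": [error for error in history.errors() if error],
    }


# Usage (assumes `history = await agent.run()` as shown above):
# report = summarize_history(history)
# print(report["final_result"])
```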
## Run initial actions without LLM
You can run a fixed list of actions before the LLM takes over, as shown in [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py).
Specify the action as a dictionary where the key is the action name and the value is the action parameters. You can find all our actions in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.
```python
initial_actions = [
    {'open_tab': {'url': 'https://www.google.com'}},
    {'open_tab': {'url': 'https://en.wikipedia.org/wiki/Randomness'}},
    {'scroll_down': {'amount': 1000}},
]

agent = Agent(
    task='What theories are displayed on the page?',
    initial_actions=initial_actions,
    llm=llm,
)
```
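Since `initial_actions` is plain data, it can help to check its shape before handing it to the agent. The sketch below is a hypothetical validator, not part of Browser Use. It assumes one action per dictionary, as in the example above, and it does not check that the action names actually exist in the Controller.

```python
def validate_initial_actions(actions):
    """Check that each entry looks like {action_name: {param: value}}."""
    for index, action in enumerate(actions):
        if not isinstance(action, dict) or len(action) != 1:
            raise ValueError(f"action #{index} must be a dict with exactly one key")
        name, params = next(iter(action.items()))
        if not isinstance(name, str) or not isinstance(params, dict):
            raise ValueError(f"action #{index} must map a string name to a dict of parameters")
    return actions


initial_actions = validate_initial_actions([
    {'open_tab': {'url': 'https://www.google.com'}},
    {'scroll_down': {'amount': 1000}},
])
```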
## Run with message context
Use `message_context` to pass additional information, separate from the task itself, that helps the LLM understand the task better.
```python
from langchain_openai import ChatOpenAI

agent = Agent(
    task="your task",
    message_context="Additional information about the task",
    llm=ChatOpenAI(model='gpt-4o'),
)
```
## Run with planner model
You can configure the agent to use a separate planner model for high-level task planning:
```python
from langchain_openai import ChatOpenAI

# Initialize models
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='o3-mini')

agent = Agent(
    task="your task",
    llm=llm,
    planner_llm=planner_llm,  # Separate model for planning
    use_vision_for_planner=False,  # Disable vision for planner
    planner_interval=4,  # Plan every 4 steps
)
```
### Planner Parameters
- `planner_llm`: A LangChain chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- `use_vision_for_planner`: Enable/disable vision capabilities for the planner model. Defaults to `True`.
- `planner_interval`: Number of steps between planning phases. Defaults to `1`.
Using a separate planner model can help:
- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Better handle complex, multi-step tasks
<Note>
The planner model is optional. If `planner_llm` is not specified, the agent runs without a separate planning phase.
</Note>
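To get a feel for how `planner_interval` spaces out planning phases, the sketch below lists the steps on which a planner would run. It is an illustration only: the exact step on which Browser Use first invokes the planner is not stated on this page, so the zero-based alignment used here is an assumption, and `planning_steps` is a hypothetical helper.

```python
def planning_steps(total_steps, planner_interval=1):
    """Return the zero-based step indices at which a planner would run."""
    if planner_interval < 1:
        raise ValueError("planner_interval must be at least 1")
    # Assumption: planning happens on steps whose index is a multiple of the interval.
    return [step for step in range(total_steps) if step % planner_interval == 0]


# With planner_interval=4, a 10-step run plans on three of its steps.
print(planning_steps(total_steps=10, planner_interval=4))
```

A larger interval means fewer planner calls and lower cost, at the price of plans that may lag behind the page state.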
### Optional Parameters
- `message_context`: Additional information about the task to help the LLM understand the task better.
- `initial_actions`: List of initial actions to run before the main task.
- `max_actions_per_step`: Maximum number of actions to run in a step. Defaults to `10`.
- `max_failures`: Maximum number of failures before giving up. Defaults to `3`.
- `retry_delay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
- `generate_gif`: Enable/disable GIF generation. Defaults to `False`. Set to `True` or a string path to save the GIF.
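The interplay of `max_failures` and `retry_delay` can be pictured with a small retry loop. This is a generic sketch of the idea, not Browser Use's actual implementation: `call_with_retries` and the `RateLimited` exception are hypothetical names introduced for the example, and counting every exception as a failure is an assumption.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def call_with_retries(call, max_failures=3, retry_delay=10, sleep=time.sleep):
    """Call `call()` until it succeeds or has failed `max_failures` times."""
    failures = 0
    while True:
        try:
            return call()
        except Exception as exc:
            failures += 1
            if failures >= max_failures:
                raise RuntimeError(f"giving up after {failures} failures") from exc
            if isinstance(exc, RateLimited):
                # Back off only when the failure was a rate limit.
                sleep(retry_delay)
```

Passing `sleep` as a parameter keeps the sketch easy to test without actually waiting.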
## Memory Management
Browser Use includes a procedural memory system using [Mem0](https://mem0.ai) that automatically summarizes the agent's conversation history at regular intervals to optimize context window usage during long tasks.
```python
from browser_use.agent.memory import MemoryConfig

agent = Agent(
    task="your task",
    llm=llm,
    enable_memory=True,
    memory_config=MemoryConfig(  # Ensure llm_instance is passed if not using default LLM config
        llm_instance=llm,  # Important: Pass the agent's LLM instance here
        agent_id="my_custom_agent",
        memory_interval=15,
    ),
)
```
### Memory Parameters
- `enable_memory`: Enable/disable the procedural memory system. Defaults to `True`.
- `memory_config`: A `MemoryConfig` Pydantic model instance (required if `enable_memory` is `True`). Dictionary format is not supported.
### Using MemoryConfig
You must configure the memory system using the `MemoryConfig` Pydantic model for a type-safe approach:
```python
from browser_use.agent.memory import MemoryConfig
from langchain_openai import ChatOpenAI  # Assuming llm is an instance of ChatOpenAI

llm_for_agent = ChatOpenAI(model="gpt-4o")

agent = Agent(
    task="your task",
    llm=llm_for_agent,
    enable_memory=True,  # This is True by default
    memory_config=MemoryConfig(
        llm_instance=llm_for_agent,  # Pass the LLM instance for Mem0
        agent_id="my_agent",
        memory_interval=15,  # Summarize every 15 steps
        embedder_provider="openai",
        embedder_model="text-embedding-3-large",
        embedder_dims=1536,
        # --- Vector Store Customization ---
        vector_store_provider="qdrant",  # e.g., Qdrant, Pinecone, Chroma, etc.
        vector_store_collection_name="my_browser_use_memories",  # Optional: custom collection name
        vector_store_config_override={  # Provider-specific config
            "host": "localhost",
            "port": 6333,
            # Add other Qdrant specific configs here if needed, e.g., api_key for cloud
        },
    ),
)
```
The `MemoryConfig` model provides these configuration options:
#### Memory Settings
- `agent_id`: Unique identifier for the agent (default: `"browser_use_agent"`). Essential for persistent memory sessions if using a persistent vector store.
- `memory_interval`: Number of steps between memory summarization (default: `10`)
#### LLM Settings (for Mem0's internal operations)
- `llm_instance`: The LangChain `BaseChatModel` instance that Mem0 will use for its internal summarization and processing. You must pass the same LLM instance used by the main agent, or another compatible one, here.
#### Embedder Settings
- `embedder_provider`: Provider for embeddings (`'openai'`, `'gemini'`, `'ollama'`, or `'huggingface'`)
- `embedder_model`: Model name for the embedder
- `embedder_dims`: Dimensions for the embeddings
#### Vector Store Settings
- `vector_store_provider`: Choose the vector store backend. Supported options include:
`'faiss'` (default), `'qdrant'`, `'pinecone'`, `'supabase'`, `'elasticsearch'`, `'chroma'`, `'weaviate'`, `'milvus'`, `'pgvector'`, `'upstash_vector'`, `'vertex_ai_vector_search'`, `'azure_ai_search'`, `'lancedb'`, `'mongodb'`, `'redis'`, `'memory'` (in-memory, non-persistent).
- `vector_store_collection_name`: (Optional) Specify a custom name for the collection or index in your vector store. If not provided, a default name is generated (notably for local stores like FAISS/Chroma) or Mem0's own default is used.
- `vector_store_base_path`: Path for local vector stores like FAISS or Chroma (e.g., `/tmp/mem0`). Default is `/tmp/mem0`.
- `vector_store_config_override`: (Optional) A dictionary to provide or override specific configuration parameters required by Mem0 for the chosen `vector_store_provider`. This is where you'd put connection details like `host`, `port`, `api_key`, `url`, `environment`, etc., for cloud-based or server-based vector stores.
The `MemoryConfig` model automatically selects embedder defaults based on the LLM being used:
- For `ChatOpenAI`: Uses OpenAI's `text-embedding-3-small` embeddings
- For `ChatGoogleGenerativeAI`: Uses Gemini's `models/text-embedding-004` embeddings
- For `ChatOllama`: Uses Ollama's `nomic-embed-text` embeddings
- Default: Uses Hugging Face's `all-MiniLM-L6-v2` embeddings
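The default selection above amounts to a lookup keyed on the LLM type. The sketch below mirrors that table in plain Python. Matching on the class name is an assumption made for illustration, and `default_embedder_for` is a hypothetical helper, not the logic `MemoryConfig` actually runs.

```python
# (provider, model) pairs taken from the list above.
DEFAULT_EMBEDDERS = {
    "ChatOpenAI": ("openai", "text-embedding-3-small"),
    "ChatGoogleGenerativeAI": ("gemini", "models/text-embedding-004"),
    "ChatOllama": ("ollama", "nomic-embed-text"),
}
FALLBACK_EMBEDDER = ("huggingface", "all-MiniLM-L6-v2")


def default_embedder_for(llm):
    """Pick (provider, model) from the LLM's class name, falling back to Hugging Face."""
    # Assumption: the LLM type is identified by its class name.
    return DEFAULT_EMBEDDERS.get(type(llm).__name__, FALLBACK_EMBEDDER)
```

Setting `embedder_provider`, `embedder_model`, and `embedder_dims` explicitly in `MemoryConfig` takes the guesswork out of this selection.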
<Note>
**Important:**
- Always pass a properly constructed `MemoryConfig` object to the `memory_config` parameter.
- Ensure the `llm_instance` is provided to `MemoryConfig` so Mem0 can perform its operations.
- For persistent memory across agent runs or for shared memory, choose a scalable vector store provider (like Qdrant, Pinecone, etc.) and configure it correctly using `vector_store_provider` and `vector_store_config_override`. The default 'faiss' provider stores data locally in `vector_store_base_path`.
</Note>
### How Memory Works
When enabled, the agent periodically compresses its conversation history into concise summaries:
1. Every `memory_interval` steps, the agent reviews its recent interactions.
2. It uses Mem0 (configured with your chosen LLM and vector store) to create a procedural memory summary.
3. The original messages in the agent's active context are replaced with this summary, reducing token usage.
4. This process helps maintain important context while freeing up the context window for new information.
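The steps above can be sketched as a small simulation. This is a simplified model of the idea, not the Mem0-backed implementation: `run_with_memory` is a hypothetical function, the summarizer is a stand-in for the LLM call, and collapsing the entire accumulated context into one summary is an assumption.

```python
def run_with_memory(step_messages, memory_interval=10, summarize=None):
    """Simulate an agent context that is compacted every `memory_interval` steps."""
    if summarize is None:
        def summarize(messages):
            # Stand-in summarizer; the real system delegates this to Mem0 and an LLM.
            return f"<summary of {len(messages)} messages>"

    context = []
    for step, message in enumerate(step_messages, start=1):
        context.append(message)
        if step % memory_interval == 0:
            # Replace everything accumulated so far with a single summary message.
            context = [summarize(context)]
    return context


# 25 steps with memory_interval=10: one summary plus the 5 most recent messages remain.
context = run_with_memory([f"step {i}" for i in range(1, 26)], memory_interval=10)
print(len(context))
```

The context stays bounded by roughly `memory_interval` messages instead of growing with every step, which is the token saving described above.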
### Disabling Memory
If you want to disable the memory system (for debugging or for shorter tasks), set `enable_memory` to `False`:
```python
agent = Agent(
    task="your task",
    llm=llm,
    enable_memory=False,
)
```
<Note>
Disabling memory may be useful for debugging or short tasks, but for longer
tasks, it can lead to context window overflow as the conversation history
grows. The memory system helps maintain performance during extended sessions.
</Note>