mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
164 lines
5.2 KiB
Plaintext
---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---

## Overview

The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.

## Basic Settings

```python
from browser_use import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4"),
)
```

### Required Parameters

- `task`: The instruction for the agent to execute
- `llm`: A LangChain chat model instance. See <a href="/customize/langchain-models">LangChain Models</a> for supported models.

## Agent Behavior

Control how the agent operates:

```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,  # Custom function registry
    use_vision=True,  # Enable vision capabilities
    save_conversation_path="logs/conversation.json",  # Save chat logs
)
```

### Behavior Parameters

- `controller`: Registry of functions the agent can call. Defaults to the base `Controller`. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable or disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages.
  - Disable it to reduce costs or to use models without vision support.
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image, depending on the configured screen size.
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `system_prompt_class`: Custom system prompt class. See <a href="/customize/system-prompt">System Prompt</a> for customization options.

<Note>
  Vision capabilities are recommended for better web interaction understanding,
  but can be disabled to reduce costs or when using models without vision
  support.
</Note>

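The per-image figure quoted for GPT-4o gives a rough way to budget vision costs for a run. The sketch below is back-of-the-envelope arithmetic, not part of Browser Use: `estimate_vision_cost` is a hypothetical helper, and it assumes one screenshot is sent to the model per step, which this page does not state. Actual costs depend on the model, its pricing, and the screen size.

```python
# Rough vision-cost estimate based on the ~$0.002 per image figure for GPT-4o.
# Assumption (not from the docs): one screenshot is sent to the model per step.
COST_PER_IMAGE_USD = 0.002


def estimate_vision_cost(
    steps: int,
    images_per_step: int = 1,
    cost_per_image: float = COST_PER_IMAGE_USD,
) -> float:
    """Return an approximate image cost in USD for a run of `steps` steps."""
    if steps < 0 or images_per_step < 0:
        raise ValueError("steps and images_per_step must be non-negative")
    return steps * images_per_step * cost_per_image
```

Under these assumptions, a run that uses 100 steps comes to roughly $0.20 in image tokens (`estimate_vision_cost(100)`), which may help when deciding whether to set `use_vision=False`.
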
## (Reuse) Browser Configuration

You can configure how the agent interacts with the browser. For more `Browser` options, refer to the <a href="/customize/browser-settings">Browser Settings</a> documentation.

### Reuse Existing Browser

`browser`: A Browser Use `Browser` instance. When provided, the agent reuses this browser instance and automatically creates a new context for each `run()`.

```python
from browser_use import Agent, Browser
from playwright.async_api import BrowserContext

# Reuse existing browser
browser = Browser()
agent = Agent(
    task=task1,
    llm=llm,
    browser=browser  # Browser instance will be reused
)

await agent.run()

# Manually close the browser
await browser.close()
```

<Note>
  Remember: in this scenario the `Browser` will not be closed automatically.
</Note>

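Because a shared `Browser` is not closed automatically, it can help to guarantee cleanup even when a run raises. This is a minimal sketch that assumes only what this section shows, namely that `agent.run()` and `browser.close()` are awaitable. `run_then_close` is a hypothetical helper name, not part of Browser Use.

```python
async def run_then_close(agent, browser):
    """Run an agent created with `browser=browser`, then close the browser.

    `agent` must expose an async `run()` and `browser` an async `close()`.
    The `finally` block closes the browser even if `run()` raises.
    """
    try:
        return await agent.run()
    finally:
        await browser.close()
```

From async code this would be called as `history = await run_then_close(agent, browser)`. If several agents share one browser, close it only after the last one has finished.
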
### Reuse Existing Browser Context

`browser_context`: A Playwright browser context. Useful for maintaining persistent sessions. See <a href="/customize/persistent-browser">Persistent Browser</a> for more details.

```python
from browser_use import Agent, Browser
from playwright.async_api import BrowserContext

# Create the browser whose context will be shared
browser = Browser()

# Use specific browser context (preferred method)
async with await browser.new_context() as context:
    agent = Agent(
        task=task2,
        llm=llm,
        browser_context=context  # Use persistent context
    )

    # Run the agent
    await agent.run()

    # Pass the context to the next agent
    next_agent = Agent(
        task=task2,
        llm=llm,
        browser_context=context
    )

    ...

await browser.close()
```

For more information about how browser contexts work, refer to the [Playwright
documentation](https://playwright.dev/docs/api/class-browsercontext).

<Note>
  You can reuse the same context for multiple agents. If you provide neither a
  browser nor a context, a browser is created automatically and closed when
  `run()` completes.
</Note>

## Running the Agent

The agent is executed using the async `run()` method, which accepts the following parameter:

- `max_steps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.

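The step limit is passed to `run()` at call time. This is a minimal sketch, assuming `run()` accepts `max_steps` as a keyword argument as listed above. `run_capped` is a hypothetical wrapper, and the value `20` is an arbitrary example.

```python
async def run_capped(agent, max_steps: int = 20):
    """Run `agent` with a tighter step limit than the default of 100."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    # Forward the cap to the agent; execution stops once the limit is reached.
    return await agent.run(max_steps=max_steps)
```

A lower cap is a simple way to bound run time and cost while a task prompt is still being tuned.
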
### Agent History

The `run()` method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.

```python
# Example of accessing history
history = await agent.run()

# Access (some) useful information
history.urls()               # List of visited URLs
history.screenshots()        # List of screenshot paths
history.action_names()       # Names of executed actions
history.extracted_content()  # Content extracted during execution
history.errors()             # Any errors that occurred
history.model_actions()      # All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions

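These helpers can be combined into a short post-run report. The sketch below calls only methods named in this section and assumes they behave as described. `summarize_history` is a hypothetical helper, and the dictionary keys are illustrative.

```python
def summarize_history(history) -> dict:
    """Collect a few headline facts from an AgentHistoryList-like object."""
    return {
        "done": history.is_done(),
        "has_errors": history.has_errors(),
        "final_result": history.final_result(),
        "visited_urls": list(history.urls()),
        "actions": list(history.action_names()),
        # Drop empty entries in case some steps recorded no error.
        "errors": [e for e in history.errors() if e],
    }
```

After `history = await agent.run()`, calling `summarize_history(history)` yields a plain dictionary that is easy to log or compare across runs.
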
<Note>
  For a complete list of helper methods and detailed history analysis
  capabilities, refer to the [AgentHistoryList source
  code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>