browser-use/docs/customize/agent-settings.mdx

---
title: "Agent Settings"
description: "Learn how to configure the agent"
icon: "gear"
---

## Overview

The `Agent` class is the core component of Browser Use that handles browser automation. Here are the main configuration options you can use when initializing an agent.

## Basic Settings

```python
from browser_use import Agent
from browser_use.llm import ChatOpenAI

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatOpenAI(model="gpt-4o"),
)
```

### Required Parameters

- `task`: The instruction for the agent to execute
- `llm`: A chat model instance. See <a href="/customize/supported-models">Supported Models</a> for supported models.

## Agent Behavior

Control how the agent operates:

```python
agent = Agent(
    task="your task",
    llm=llm,
    controller=custom_controller,  # For custom tool calling
    use_vision=True,              # Enable vision capabilities
    save_conversation_path="logs/conversation"  # Save chat logs
)
```

### Behavior Parameters

- `controller`: Registry of functions the agent can call. Defaults to base Controller. See <a href="/customize/custom-functions">Custom Functions</a> for details.
- `use_vision`: Enable/disable vision capabilities. Defaults to `True`.
  - When enabled, the model processes visual information from web pages
  - Disable to reduce costs or use models without vision support
  - For GPT-4o, image processing costs approximately 800-1000 tokens (~$0.002 USD) per image (but this depends on the defined screen size)
- `save_conversation_path`: Path to save the complete conversation history. Useful for debugging.
- `override_system_message`: Completely replace the default system prompt with a custom one.
- `extend_system_message`: Add additional instructions to the default system prompt.

<Note>
  Vision capabilities are recommended for better web interaction understanding,
  but can be disabled to reduce costs or when using models without vision
  support.
</Note>

### Reuse Existing Browser Context

By default browser-use launches its own builtin browser using playwright chromium.
You can also connect to a remote browser or pass any of the following
existing playwright objects to the Agent: `page`, `browser_context`, `browser`, `browser_session`, or `browser_profile`.

These all get passed down to create a `BrowserSession` for the `Agent`:

```python
agent = Agent(
    task='book a flight to fiji',
    llm=llm,
    browser_profile=browser_profile,  # use this profile to create a BrowserSession
    browser_session=BrowserSession(   # use an existing BrowserSession
      cdp_url=...,                      # remote CDP browser to connect to
      # or
      wss_url=...,                      # remote wss playwright server provider
      # or
      browser_pid=...                   # pid of a locally running browser process to attach to
      # or
      executable_path=...               # provide a custom chrome binary path
      # or
      channel=...                       # specify chrome, chromium, ms-edge, etc.
      # or
      page=page,                        # use an existing playwright Page object
      # or
      browser_context=browser_context,  # use an existing playwright BrowserContext object
      # or
      browser=browser,                  # use an existing playwright Browser object
    ),
)
```

For example, to connect to an existing browser over CDP you could do:

```python
agent = Agent(
    ...
    browser_session=BrowserSession(cdp_url='http://localhost:9222'),
)
```

For example, to connect to a local running chrome instance you can do:

```python
agent = Agent(
    ...
    browser_session=BrowserSession(browser_pid=1234),
)
```

See <a href="/customize/real-browser">Connect to your Browser</a> for more info.

<Note>
  You can reuse the same `BrowserSession` after an agent has completed running.
  If you do nothing, the browser will be automatically closed on `run()`
  completion only if it was launched by us.
</Note>

## Running the Agent

The agent is executed using the async `run()` method:

- `max_steps` (default: `100`)
  Maximum number of steps the agent can take during execution. This prevents infinite loops and helps control execution time.

## Agent History

The method returns an `AgentHistoryList` object containing the complete execution history. This history is invaluable for debugging, analysis, and creating reproducible scripts.

```python
# Example of accessing history
history = await agent.run()

# Access (some) useful information
history.urls()              # List of visited URLs
history.screenshots()       # List of screenshot paths
history.action_names()      # Names of executed actions
history.extracted_content() # Content extracted during execution
history.errors()           # Any errors that occurred
history.model_actions()     # All actions with their parameters
```

The `AgentHistoryList` provides many helper methods to analyze the execution:

- `final_result()`: Get the final extracted content
- `is_done()`: Check if the agent completed successfully
- `has_errors()`: Check if any errors occurred
- `model_thoughts()`: Get the agent's reasoning process
- `action_results()`: Get results of all actions

<Note>
  For a complete list of helper methods and detailed history analysis
  capabilities, refer to the [AgentHistoryList source
  code](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L111).
</Note>

## Run initial actions without LLM

With [this example](https://github.com/browser-use/browser-use/blob/main/examples/features/initial_actions.py) you can run initial actions without the LLM.
Specify the action as a dictionary where the key is the action name and the value is the action parameters. You can find all our actions in the [Controller](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) source code.

```python

initial_actions = [
	{'go_to_url': {'url': 'https://www.google.com', 'new_tab': True}},
	{'go_to_url': {'url': 'https://en.wikipedia.org/wiki/Randomness', 'new_tab': True}},
	{'scroll_down': {'amount': 1000}},
]
agent = Agent(
	task='What theories are displayed on the page?',
	initial_actions=initial_actions,
	llm=llm,
)
```

## Run with message context

You can configure the agent and provide a separate message to help the LLM understand the task better.

```python
from browser_use.llm import ChatOpenAI

agent = Agent(
    task="your task",
    message_context="Additional information about the task",
    llm = ChatOpenAI(model='gpt-4o')
)
```

## Run with planner model

You can configure the agent to use a separate planner model for high-level task planning:

```python
from browser_use.llm import ChatOpenAI

# Initialize models
llm = ChatOpenAI(model='gpt-4o')
planner_llm = ChatOpenAI(model='o3-mini')

agent = Agent(
    task="your task",
    llm=llm,
    planner_llm=planner_llm,           # Separate model for planning
    use_vision_for_planner=False,      # Disable vision for planner
    planner_interval=4                 # Plan every 4 steps
)
```

### Planner Parameters

- `planner_llm`: A chat model instance used for high-level task planning. Can be a smaller/cheaper model than the main LLM.
- `use_vision_for_planner`: Enable/disable vision capabilities for the planner model. Defaults to `True`.
- `planner_interval`: Number of steps between planning phases. Defaults to `1`.

Using a separate planner model can help:

- Reduce costs by using a smaller model for high-level planning
- Improve task decomposition and strategic thinking
- Better handle complex, multi-step tasks

<Note>
  The planner model is optional. If not specified, the agent will not use the
  planner model.
</Note>

### Optional Parameters

- `message_context`: Additional information about the task to help the LLM understand the task better.
- `initial_actions`: List of initial actions to run before the main task.
- `max_actions_per_step`: Maximum number of actions to run in a step. Defaults to `10`.
- `max_failures`: Maximum number of failures before giving up. Defaults to `3`.
- `retry_delay`: Time to wait between retries in seconds when rate limited. Defaults to `10`.
- `generate_gif`: Enable/disable GIF generation. Defaults to `False`. Set to `True` or a string path to save the GIF.

## Memory

Memory management in browser-use has been significantly improved since version 0.3.2. The agent's context handling and state management are now robust enough that the previous memory system (`mem0`) is no longer needed or supported.

The agent maintains its context and task progress through:

- Detailed history tracking of actions and results
- Structured state management
- Clear goal setting and evaluation at each step

The `enable_memory` parameter has been removed as the new system provides better context management by default.

<Note>
  If you're upgrading from an older version that used `enable_memory`, simply remove this parameter. The agent will automatically use the improved context management system.
</Note>