mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
251 lines
11 KiB
Plaintext
---
title: "Custom Functions"
description: "Extend default agent and write custom action functions to do certain tasks"
icon: "function"
---

Custom actions are functions *you* provide that are added to our [default actions](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py) the agent can use to accomplish tasks.
Action functions can request [arbitrary parameters](#action-parameters) that the LLM has to fill in, plus a fixed set of [framework-provided arguments](#framework-provided-parameters) for browser APIs, `Agent(context=...)`, and so on.

<Note>
Our default set of actions is already quite powerful: the built-in `Controller` provides basics like `go_to_url`, `scroll_down`, `extract_content`, [and more](https://github.com/browser-use/browser-use/blob/main/browser_use/controller/service.py).
</Note>

It's easy to add your own actions to implement additional custom behaviors, integrations with other apps, or performance optimizations.

For examples of custom actions (e.g. uploading files, asking a human-in-the-loop for help, drawing a polygon with the mouse, and more), see [examples/custom-functions](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions).


## Action Function Registration

To register your own custom functions (which can be `sync` or `async`), decorate them with the `@controller.action(...)` decorator. This saves them into the `controller.registry`.

```python
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Ask human for help with a question', domains=['example.com'])  # pass allowed_domains= or page_filter= to limit actions to certain pages
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}', include_in_memory=True)
```

```python
from browser_use import Agent

# Then pass your controller to the agent to use it
# (`llm` is your chat model instance, created elsewhere)
agent = Agent(
    task='...',
    llm=llm,
    controller=controller,
)
```

<Note>
Keep your action function names and descriptions short and concise:
- The LLM chooses between actions to run solely based on the function name and description
- The LLM decides how to fill action params based on their names, type hints, & defaults
</Note>

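To make the registration idea concrete, here is a minimal sketch of a decorator-based registry using only the standard library. `ToyRegistry`, `describe`, and the two sample actions are hypothetical names invented for this illustration; this is not how the browser-use `Controller` is implemented, only the general pattern of storing a function under its name next to a description.

```python
from typing import Callable


class ToyRegistry:
    """Hypothetical stand-in for a controller registry (not browser-use code)."""

    def __init__(self) -> None:
        self.actions: dict[str, tuple[str, Callable]] = {}

    def action(self, description: str) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            # Store the function under its own name, next to its description
            self.actions[func.__name__] = (description, func)
            return func  # return the function unchanged so it stays callable
        return decorator

    def describe(self) -> str:
        # One "name: description" line per registered action
        return '\n'.join(f'{name}: {desc}' for name, (desc, _) in self.actions.items())


registry = ToyRegistry()


@registry.action('Add two numbers')
def add(a: int, b: int) -> int:
    return a + b


@registry.action('Greet a person by name')
def greet(name: str) -> str:
    return f'Hello, {name}!'


print(registry.describe())
```

The listing printed by `describe()` contains only names and descriptions, which is why short, distinct wording for both matters when a model has to choose between actions.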
---

## Action Parameters

Browser Use supports two patterns for defining action parameters: normal function arguments, or a Pydantic model.

### Function Arguments

For simple actions that don't need default values, you can define the action parameters directly as arguments to the function. The example below takes a single string argument, `css_selector`.
When the LLM calls an action, it sees the action's argument names and types, and provides values that fit.

```python
from browser_use.browser.types import Page

@controller.action('Click element')
async def click_element(css_selector: str, page: Page) -> ActionResult:  # async, because the body awaits
    # css_selector is an action param the LLM must provide when calling
    # page is a special framework-provided param to access the browser APIs (see below)
    await page.locator(css_selector).click()
    return ActionResult(extracted_content=f"Clicked element {css_selector}")
```

### Pydantic Model

You can define a pydantic model for the parameters your action expects by passing `param_model=MyParams` to the decorator: `@controller.action(..., param_model=MyParams)`.
This allows you to use optional parameters, default values, `Annotated[...]` types with custom validation, field descriptions, and other features offered by pydantic.

When the agent calls your action function, an instance of your model with the values filled in by the LLM is passed as the argument named `params`.

Using a pydantic model is helpful because it gives you more flexibility and power to enforce the schema of the values the LLM should provide.
The LLM gets the entire pydantic JSON schema for your `param_model`: it sees the function name and description, plus the individual field names, types, descriptions, and default values.


```python
from typing import Annotated
from pydantic import AfterValidator, BaseModel, Field
from browser_use import ActionResult
from browser_use.browser.types import Page

class MyParams(BaseModel):
    field1: int
    field2: str = 'default value'
    field3: Annotated[str, AfterValidator(lambda s: s.lower())]  # example: enforce always lowercase
    field4: str = Field(default='abc', description='Detailed description for the LLM')

@controller.action('My action', param_model=MyParams)
async def my_action(params: MyParams, page: Page) -> ActionResult:  # async, because the body awaits
    await page.keyboard.type(params.field2)
    return ActionResult(extracted_content=f"Inputted {params} on {page.url}")
```

Any special framework-provided arguments (e.g. `page`) are passed as separate arguments alongside `params`; declare them after `params` in the signature.

<Important>
To use a `BaseModel`, the argument *must* be called `params`. Action function arguments are matched and filled by name, like keyword arguments: their order doesn't matter, but their names and types do.
</Important>

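The idea of filling arguments by name rather than by position can be sketched with the standard library's `inspect` module. The helper `call_by_name` and the sample function `describe` are hypothetical and exist only for this illustration; the actual matching logic inside browser-use may differ.

```python
import inspect
from typing import Any, Callable


def call_by_name(func: Callable[..., Any], available: dict[str, Any]) -> Any:
    """Call func with only the entries of `available` that its signature names."""
    wanted = inspect.signature(func).parameters
    missing = [
        name for name, param in wanted.items()
        if name not in available and param.default is inspect.Parameter.empty
    ]
    if missing:
        raise TypeError(f'missing values for: {missing}')
    # Pass everything as keyword arguments, so declaration order is irrelevant
    kwargs = {name: available[name] for name in wanted if name in available}
    return func(**kwargs)


# `page` is declared first here on purpose: order in the signature does not matter
def describe(page: str, css_selector: str) -> str:
    return f'click {css_selector} on {page}'


values = {'css_selector': '#submit', 'page': 'https://example.com', 'unused': 42}
print(call_by_name(describe, values))
```

Because every value is passed as a keyword argument, a parameter that is misnamed in the signature is simply never filled, which is why the names have to match exactly.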
### Framework-Provided Parameters

These special action parameters are injected by the `Controller` and are passed as extra args to any actions that expect them.

For example, actions that need to run Playwright code to interact with the browser should take the argument `page` or `browser_session`.

- `page: Page` - The current Playwright page (shortcut for `browser_session.get_current_page()`)
- `browser_session: BrowserSession` - The current browser session (and playwright context via `browser_session.browser_context`)
- `context: AgentContext` - Any optional top-level context object passed to the Agent, e.g. `Agent(context=user_provided_obj)`
- `page_extraction_llm: BaseChatModel` - LLM instance used for page content extraction
- `available_file_paths: list[str]` - List of available file paths for upload / processing
- `has_sensitive_data: bool` - Whether the action content contains sensitive data markers (check this to avoid logging sensitive data to terminal by accident)

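As one way to act on a flag like `has_sensitive_data`, the sketch below builds a log line that hides the text whenever the flag is set. The helper `summarize_for_log`, the logger name, and the `<redacted>` placeholder are assumptions made for this illustration, not part of browser-use.

```python
import logging

logger = logging.getLogger('custom_actions')  # hypothetical logger name


def summarize_for_log(action_name: str, text: str, has_sensitive_data: bool) -> str:
    """Build a log line, hiding the text when it may contain secrets."""
    shown = '<redacted>' if has_sensitive_data else text
    return f'{action_name}: {shown}'


logger.info(summarize_for_log('input_text', 'hello world', has_sensitive_data=False))
logger.info(summarize_for_log('input_text', 'hunter2', has_sensitive_data=True))
```

Inside a real action, the boolean would come from the framework-provided argument of the same name instead of being passed by hand.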
#### Example: Action uses the current `page`

```python
from browser_use.browser.types import Page
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Type keyboard input into a page')
async def input_text_into_page(text: str, page: Page) -> ActionResult:
    await page.keyboard.type(text)
    return ActionResult(extracted_content='Typed text into the page')
```

#### Example: Action uses the `browser_context`

```python
from browser_use import BrowserSession, Controller, ActionResult

controller = Controller()

@controller.action('Open website')
async def open_website(url: str, browser_session: BrowserSession) -> ActionResult:
    # find matching existing tab by looking through all pages in playwright browser_context
    all_tabs = browser_session.browser_context.pages  # `pages` is a property, so it is not awaited
    for tab in all_tabs:
        if tab.url == url:
            await tab.bring_to_front()  # Playwright's method for focusing a tab
            return ActionResult(extracted_content=f'Switched to tab with url {url}')
    # otherwise, create a new tab
    new_tab = await browser_session.browser_context.new_page()
    await new_tab.goto(url)
    return ActionResult(extracted_content=f'Opened new tab with url {url}')
```


---


## Important Rules

1. **Return an [`ActionResult`](https://github.com/search?q=repo%3Abrowser-use%2Fbrowser-use+%22class+ActionResult%28BaseModel%29%22&type=code)**: All actions should return an `ActionResult | str | None`. The stringified version of the result is passed back to the LLM, and is optionally persisted in long-term memory when you set `ActionResult(..., include_in_memory=True)`.
2. **Type hints on arguments are required**: They are used to verify that action params don't conflict with special arguments injected by the controller (e.g. `page`).
3. **Action functions called directly must be passed kwargs**: When calling actions from other actions or Python code, you must **pass all parameters as kwargs only**, even though the actions are usually defined using positional args (for the same reasons as [pluggy](https://pluggy.readthedocs.io/en/stable/index.html#calling-hooks)).
   Action arguments are always matched by name and type, **not** positional order, so this helps prevent ambiguity / reordering issues while keeping action signatures short.
   ```python
   @controller.action('Fill in the country form field')
   async def input_country_field(country: str, page: Page) -> ActionResult:
       await some_action(123, page=page)  # ❌ not allowed: positional args, use kwarg syntax when calling
       await some_action(abc=123, page=page)  # ✅ allowed: action params & special kwargs
       await some_other_action(params=OtherAction(abc=123), page=page)  # ✅ allowed: params=model & special kwargs
   ```

```python
# Using Pydantic Model to define action params (recommended)
class PinCodeParams(BaseModel):
    code: int
    retries: int = 3  # ✅ supports optional/defaults

@controller.action('...', param_model=PinCodeParams)
async def input_pin_code(params: PinCodeParams, page: Page): ...  # ✅ special params at the end

# Using function arguments to define action params
async def input_pin_code(code: int, retries: int, page: Page): ...  # ✅ params first, special params second, no defaults
async def input_pin_code(code: int, retries: int=3): ...  # ✅ defaults ok only if no special params needed
# The next signature is left commented out so that this block still parses:
# async def input_pin_code(code: int, retries: int=3, page: Page): ...  # ❌ Python SyntaxError! not allowed
```

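The `SyntaxError` mentioned above comes from Python itself, not from the framework: a parameter without a default cannot follow one with a default. The following sketch checks this with the built-in `compile()`. The helper `is_valid_signature` and the variable names are made up for this illustration.

```python
def is_valid_signature(source: str) -> bool:
    """Return True if `source` compiles, False if Python rejects it with a SyntaxError."""
    try:
        compile(source, '<signature-check>', 'exec')
        return True
    except SyntaxError:
        return False


# Parameter names mirror the examples above. The `page` annotation is a plain
# string here so that the snippet needs no third-party imports.
ok_no_defaults = "async def f(code: int, retries: int, page: 'Page'): ..."
ok_defaults_last = "async def f(code: int, retries: int = 3): ..."
bad_default_before_required = "async def f(code: int, retries: int = 3, page: 'Page'): ..."

for src in (ok_no_defaults, ok_defaults_last, bad_default_before_required):
    print(is_valid_signature(src), '<-', src)
```

This is the practical reason a Pydantic `param_model` is the more convenient choice once an action needs both default values and a framework-provided argument.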

---


## Reusing Custom Actions Across Agents

You can use the same controller for multiple agents.

```python
controller = Controller()

# ... register actions to the controller

agent = Agent(
    task="Go to website X and find the latest news",
    llm=llm,
    controller=controller
)

# Run the agent
await agent.run()

agent2 = Agent(
    task="Go to website Y and find the latest news",
    llm=llm,
    controller=controller
)

await agent2.run()
```

<Note>
The controller is stateless, so you can register multiple actions on it and share it across multiple agents.
</Note>



## Exclude functions

If you want to exclude some registered actions and make them unavailable to the agent, you can do:
```python
controller = Controller(exclude_actions=['search_google'])
agent = Agent(task='...', llm=llm, controller=controller)
```


If you want actions to only be available on certain pages, and to not tell the LLM about them on other pages, you can use the `allowed_domains` and `page_filter` arguments:

```python
import logging

from browser_use import Controller, ActionResult
from browser_use.browser.types import Page

logger = logging.getLogger(__name__)
controller = Controller()

async def is_ai_allowed(page: Page):
    # `api.some_service` is a placeholder for your own URL-checking service
    if api.some_service.check_url(page.url):
        logger.warning('Allowing AI agent to visit url: %s', page.url)
        return True
    return False

@controller.action('Fill out secret_form', allowed_domains=['https://*.example.com'], page_filter=is_ai_allowed)
def fill_out_form(...) -> ActionResult:
    # will only be runnable by LLM on pages that match https://*.example.com *AND* where is_ai_allowed(page) returns True
    ...
```
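
To illustrate the "pattern *AND* filter" rule, here is a standard-library sketch that combines a glob check with a predicate. It assumes glob-style matching against the full URL via `fnmatchcase`; the function `action_is_available`, the sample pattern, and the `no_admin` filter are hypothetical, and browser-use's own domain matching may behave differently.

```python
from fnmatch import fnmatchcase
from typing import Callable, Optional


def action_is_available(
    url: str,
    allowed_domains: Optional[list[str]] = None,
    page_filter: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Toy check: the URL must match a pattern AND pass the filter, when each is given."""
    if allowed_domains is not None and not any(fnmatchcase(url, p) for p in allowed_domains):
        return False
    if page_filter is not None and not page_filter(url):
        return False
    return True


# Hypothetical pattern: fnmatchcase matches the whole string, hence the trailing /*
patterns = ['https://*.example.com/*']


def no_admin(url: str) -> bool:
    # Hypothetical filter: hide the action on admin pages
    return '/admin' not in url


print(action_is_available('https://shop.example.com/cart', patterns, no_admin))
print(action_is_available('https://shop.example.com/admin', patterns, no_admin))
print(action_is_available('https://other.org/cart', patterns, no_admin))
```

When neither restriction is given, the toy check returns `True` for every URL, mirroring an action that is registered without any page limits.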