mirror of
https://github.com/browser-use/browser-use
synced 2026-05-13 17:56:35 +02:00
---
title: "Custom Functions"
description: "Extend default agent and write custom action functions to do certain tasks"
icon: "function"
---

## Basic Function Registration

Custom action functions can be either `sync` or `async`. Keep them focused and single-purpose.

```python
from pydantic import BaseModel
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Ask human for help with a question', domains=['https://difficult.example.com'])  # pass allowed_domains= or page_filter= to limit actions to certain pages
def ask_human(question: str) -> ActionResult:
    answer = input(f'\n{question}\nInput: ')
    return ActionResult(extracted_content=answer, include_in_memory=True)
```

<Note>
  The default `Controller` already implements the basic functionality you need to
  interact with the browser. It provides a default set of common actions, such as `new_tab` and `input_text_into_element`.
</Note>

```python
from browser_use import Agent

# pass the controller to the agent to use it
# (`task` and `llm` are defined elsewhere)
agent = Agent(
    task=task,
    llm=llm,
    controller=controller,
)
```

<Note>
  Keep the function name and description short and concise. The LLM chooses which action
  to run based solely on the function name and description, and it decides which params to pass
  to the action based on their names and type signatures. The stringified version of the returned [`ActionResult`](https://github.com/search?q=repo%3Abrowser-use%2Fbrowser-use+%22class+ActionResult%28BaseModel%29%22&type=code)
  is passed back to the LLM, and is optionally persisted in long-term memory when you set `ActionResult(..., include_in_memory=True)`.
</Note>

## Browser-Aware Functions

For actions that need browser access, add special parameters after your action parameters. The framework will automatically inject these if you request them:

### Available Special Parameters

- `page: Page` - The current Playwright page (shortcut for `browser_session.get_current_page()`)
- `browser_session: BrowserSession` - The current browser session (and Playwright context via `browser_session.browser_context`)
- `context: AgentContext` - Any optional top-level context object passed to the Agent, e.g. `Agent(context=user_provided_obj)`
- `page_extraction_llm: BaseChatModel` - LLM instance used for page content extraction
- `available_file_paths: list[str]` - List of available file paths for upload / processing
- `has_sensitive_data: bool` - Whether the action content contains sensitive data markers (check this to avoid logging sensitive data to terminal by accident)

```python
from playwright.async_api import Page
from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Type keyboard input into a page')
async def input_text_into_page(text: str, page: Page) -> ActionResult:
    await page.keyboard.type(text)
    # report what was actually done so the LLM gets accurate feedback
    return ActionResult(extracted_content=f'Typed "{text}" into the page')
```

```python
from browser_use import BrowserSession, Controller, ActionResult

controller = Controller()

@controller.action('Open website')
async def open_website(url: str, browser_session: BrowserSession) -> ActionResult:
    # `pages` is a plain property on the Playwright context, so it is not awaited
    all_tabs = browser_session.browser_context.pages
    for tab in all_tabs:
        if tab.url == url:
            await tab.bring_to_front()
            return ActionResult(extracted_content=f'Switched to tab with url {url}')
    # Create a new tab if URL not found
    new_tab = await browser_session.browser_context.new_page()
    await new_tab.goto(url)
    return ActionResult(extracted_content=f'Opened new tab with url {url}')
```

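The injection described above can be pictured as a lookup by parameter name. The following is a minimal sketch of that idea using only the standard library; it is not browser-use's actual implementation. `SPECIAL_PARAMS`, `call_action`, `type_text`, and the placeholder string values are hypothetical names made up for illustration.

```python
import inspect

# Hypothetical set of names a framework could treat as injected parameters.
SPECIAL_PARAMS = {'page', 'browser_session', 'context'}


def call_action(func, llm_params: dict, special_values: dict):
    """Call func with LLM-chosen params plus any special params it asks for."""
    kwargs = {}
    for name in inspect.signature(func).parameters:
        if name in SPECIAL_PARAMS:
            # Injected only because the function signature requested it.
            kwargs[name] = special_values[name]
        else:
            # Everything else must come from the arguments chosen by the LLM.
            kwargs[name] = llm_params[name]
    return func(**kwargs)


def type_text(text: str, page: str) -> str:
    # A real action would receive a Playwright Page; a string stands in here.
    return f'typed {text!r} on {page}'


result = call_action(
    type_text,
    llm_params={'text': 'hello'},
    special_values={'page': '<fake page>', 'browser_session': '<fake session>'},
)
print(result)  # typed 'hello' on <fake page>
```

Because `type_text` never asks for `browser_session`, that value is simply not passed, which mirrors the "injected if you request them" behaviour described above.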
## Structured Parameters with Pydantic

For complex actions, define parameter schemas using Pydantic models. When using `param_model`, your function should accept the model as its first parameter:

```python
from pydantic import BaseModel
from typing import Optional
from playwright.async_api import Page
from browser_use import Controller, ActionResult, BrowserSession

controller = Controller()

class JobDetails(BaseModel):
    title: str
    company: str
    job_link: str
    salary: str | None = None  # optional parameters are allowed

@controller.action(
    'Save job details which you found on page',
    param_model=JobDetails
)
async def save_job(params: JobDetails, page: Page) -> ActionResult:
    print(f"Saving job: {params.title} at {params.company}")

    await page.goto(params.job_link)

    return ActionResult(extracted_content=f"Saved job: {params.title}")
```

## Action Definition Patterns

Browser Use supports two patterns for defining actions:

### Pattern 1: Pydantic Model First (Recommended for complex parameters)

When using `param_model`, your function receives the model instance as the first parameter:

```python
class MyParams(BaseModel):
    field1: str
    field2: int

@controller.action('My action', param_model=MyParams)
def my_action(params: MyParams, page: Page) -> ActionResult:
    # params contains all action parameters
    return ActionResult(extracted_content=f"Processed {params.field1}")
```

### Pattern 2: Loose Parameters (Recommended for simple actions)

Define parameters directly as function arguments:

```python
@controller.action('Click element')
def click_element(index: int, browser_session: BrowserSession) -> ActionResult:
    # index is the action parameter
    # browser_session is a special injected parameter
    return ActionResult(extracted_content=f"Clicked element {index}")
```

### Important Rules

1. **Return an [`ActionResult`](https://github.com/search?q=repo%3Abrowser-use%2Fbrowser-use+%22class+ActionResult%28BaseModel%29%22&type=code)**: All actions should return an `ActionResult | str | None`.
2. **Type hints on arguments are required**: They are used to verify that action params don't conflict with special arguments injected by the controller (e.g. `page`).
3. **Actions called directly must be passed kwargs**: When calling actions from other actions, you must **call using kwargs only**, even though the actions are usually defined using positional args (for the same reasons as [pluggy](https://pluggy.readthedocs.io/en/stable/index.html#calling-hooks)).
   Action arguments are always matched by name and type, **not** positional order, so this helps prevent ambiguity / reordering issues while keeping action signatures short.

   ```python
   @controller.action('Fill in the country form field')
   async def input_country_field(country: str, page: Page) -> ActionResult:  # must be async to await other actions
       await some_action(123, page=page)  # ❌ not allowed: positional args, use kwarg syntax when calling
       await some_action(abc=123, page=page)  # ✅ allowed: action params & special kwargs
       await some_other_action(params=OtherAction(abc=123), page=page)  # ✅ allowed: params=model & special kwargs
   ```

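To illustrate the kwargs-only calling convention from rule 3, here is a small sketch of a wrapper that rejects positional calls. It uses only the standard library and is an illustration of the idea, not how browser-use enforces the rule; `kwargs_only` and the toy `some_action` are hypothetical.

```python
import functools


def kwargs_only(func):
    """Wrap func so that it can only be called with keyword arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if args:
            raise TypeError(f'{func.__name__}() must be called with keyword arguments only')
        return func(**kwargs)
    return wrapper


@kwargs_only
def some_action(abc: int, page: str) -> str:
    # A string stands in for the Playwright page in this toy example.
    return f'abc={abc} on {page}'


print(some_action(abc=123, page='<fake page>'))  # abc=123 on <fake page>

try:
    some_action(123, page='<fake page>')  # positional argument is rejected
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Matching by name means the definition order of `abc` and `page` can change without breaking any caller.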
```python3
# Pattern 2: Simple actions without defaults
async def input_pin_code(code: int, retries: int, page: Page): ...  # ✅ all params required

# Pattern 2: Actions with defaults CANNOT use special params
async def wait(seconds: int = 3): ...  # ✅ no special params needed

# This is a Python SyntaxError - non-default after default
async def input_pin_code(code: int, retries: int=3, page: Page): ...  # ❌ Python SyntaxError!

# Pattern 1: Use Pydantic model for defaults + special params
class PinCodeParams(BaseModel):
    code: int
    retries: int = 3

@controller.action('...', param_model=PinCodeParams)
async def input_pin_code(params: PinCodeParams, page: Page): ...  # ✅ best for defaults
```

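The `SyntaxError` mentioned above comes from Python itself, not from Browser Use, and can be checked without importing anything. The sketch below feeds both signatures to the built-in `compile()`. The variable names are illustrative, and the exact error message varies between Python versions, so it is printed rather than asserted here.

```python
# Source text of the rejected signature. `Page` is never evaluated because
# the parser rejects the code before anything runs.
bad_source = 'async def input_pin_code(code: int, retries: int = 3, page: Page): ...'

try:
    compile(bad_source, '<example>', 'exec')
    compiled_ok = True
except SyntaxError as exc:
    compiled_ok = False
    print(f'SyntaxError: {exc.msg}')

# Without the default value, the same signature parses fine.
good_source = 'async def input_pin_code(code: int, retries: int, page: Page): ...'
compile(good_source, '<example>', 'exec')
```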

---

## Using Custom Actions with multiple agents

You can use the same controller for multiple agents.

```python
controller = Controller()

# ... register actions to the controller

agent = Agent(
    task="Go to website X and find the latest news",
    llm=llm,
    controller=controller
)

# Run the agent
await agent.run()

agent2 = Agent(
    task="Go to website Y and find the latest news",
    llm=llm,
    controller=controller
)

await agent2.run()
```

<Note>
  The controller is stateless: you can register multiple actions on it and
  share it across multiple agents.
</Note>

## Exclude functions

If you want to exclude some registered actions and make them unavailable to the agent, pass their names to `exclude_actions`:

```python
controller = Controller(exclude_actions=['open_tab', 'search_google'])
agent = Agent(controller=controller, ...)
```

For more examples like file upload or notifications, visit [examples/custom-functions](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions).

If you want actions to only be available on certain pages, and to not tell the LLM about them on other pages,
you can use the `allowed_domains` and `page_filter` parameters:

```python
import logging

from playwright.async_api import Page
from browser_use import Controller, ActionResult

logger = logging.getLogger(__name__)
controller = Controller()

async def is_ai_allowed(page: Page):
    # `api.some_service.check_url` is a placeholder for your own allow-list check
    if api.some_service.check_url(page.url):
        logger.warning('Allowing AI agent to visit url: %s', page.url)
        return True
    return False

@controller.action('Fill out secret_form', allowed_domains=['https://*.example.com'], page_filter=is_ai_allowed)
def fill_out_form(...) -> ActionResult:
    # will only be runnable by the LLM on pages that match https://*.example.com *AND* where is_ai_allowed(page) returns True
    ...
```
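
The *AND* combination of `allowed_domains` and `page_filter` can be sketched with the standard library alone. This is an assumption-laden illustration, not Browser Use's matching code: `action_is_available` is a hypothetical helper, the filter receives a URL string instead of a Playwright `Page`, and glob matching on `scheme://host` via `fnmatchcase` is only one plausible reading of the `https://*.example.com` pattern. The library's real rules may differ, for example on whether the bare domain matches.

```python
from fnmatch import fnmatchcase
from typing import Callable, Optional
from urllib.parse import urlsplit


def action_is_available(
    url: str,
    allowed_domains: Optional[list] = None,
    page_filter: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True only if every configured restriction accepts the URL."""
    parts = urlsplit(url)
    origin = f'{parts.scheme}://{parts.hostname}'
    if allowed_domains is not None:
        # At least one glob pattern must match the scheme and host.
        if not any(fnmatchcase(origin, pattern) for pattern in allowed_domains):
            return False
    # The page filter, when given, must also accept the URL.
    if page_filter is not None and not page_filter(url):
        return False
    return True


patterns = ['https://*.example.com']
print(action_is_available('https://app.example.com/login', patterns))  # True
print(action_is_available('https://evil.test/', patterns))  # False
print(action_is_available('https://app.example.com/admin', patterns, lambda u: '/admin' not in u))  # False
```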