mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
---
title: "Lifecycle Hooks"
description: "Customize agent behavior with lifecycle hooks"
icon: "Wrench"
author: "Carlos A. Planchón"
---

Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution.
Hook functions can read and modify agent state while the agent is running, implement custom logic, change configuration, and integrate the Agent with external applications.

## Available Hooks

Currently, Browser-Use provides the following hooks:

| Hook            | Description                                  | When it's called                                                                                  |
| --------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action                       |
| `on_step_end`   | Executed at the end of each agent step       | After the agent has executed all the actions for the current step, before it starts the next step |

```python
await agent.run(on_step_start=..., on_step_end=...)
```

Each hook should be an `async` callable function that accepts the `agent` instance as its only parameter.

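As a minimal sketch of that signature, the two hooks can also be bound methods of one object, which lets them share state between the start and the end of a step. The `StepTimer` class below is a hypothetical helper, not part of Browser-Use; it only relies on each hook being an `async` callable that receives the agent.

```python
import asyncio
import time
from typing import Any, List, Optional


class StepTimer:
    """Hypothetical helper (not part of Browser-Use): times each agent step."""

    def __init__(self) -> None:
        self.durations: List[float] = []
        self._started_at: Optional[float] = None

    async def on_step_start(self, agent: Any) -> None:
        # The agent is accepted to match the hook signature,
        # but this hook does not need to read anything from it.
        self._started_at = time.perf_counter()

    async def on_step_end(self, agent: Any) -> None:
        if self._started_at is None:
            return  # on_step_end fired without a matching on_step_start
        self.durations.append(time.perf_counter() - self._started_at)
        self._started_at = None
```

Assuming an existing `agent`, the timer would be wired in as `timer = StepTimer()` followed by `await agent.run(on_step_start=timer.on_step_start, on_step_end=timer.on_step_end)`, after which `timer.durations` holds one entry per completed step.
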
### Basic Example

```python
import asyncio
from pathlib import Path

from browser_use import Agent
from browser_use.llm import ChatOpenAI


async def my_step_hook(agent: Agent):
    # Inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.controller, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the playwright Page and Browser Context
    page = await agent.browser_session.get_current_page()
    #   https://playwright.dev/python/docs/api/class-page

    current_url = page.url
    visit_log = agent.state.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    # Example: listen for events on the page, interact with the DOM, run JS directly, etc.
    # page.on() is synchronous, and the handler receives the page that fired the event
    page.on('domcontentloaded', lambda _page: print('page navigated to a new url...'))
    await page.locator("css=form > input[type=submit]").click()
    await page.evaluate('() => alert(1)')
    # Open a new tab in the same browser context
    await page.context.new_page()
    await agent.browser_session.browser_context.add_init_script('/* some JS to run on every page */')

    # Example: monitor or intercept all network requests
    async def handle_request(route):
        # Print, modify, block, etc. do anything to the requests here
        #   https://playwright.dev/python/docs/network#handle-requests
        print(route.request, route.request.headers)
        await route.continue_(headers=route.request.headers)

    await page.route("**/*", handle_request)

    # Example: pause agent execution and resume it based on some custom code
    if '/completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()


async def main():
    agent = Agent(
        task="Search for the latest news about AI",
        llm=ChatOpenAI(model="gpt-4o"),
    )

    await agent.run(
        on_step_start=my_step_hook,
        # on_step_end=...
        max_steps=10,
    )


asyncio.run(main())
```

## Data Available in Hooks

When working with agent hooks, you have access to the entire `Agent` instance. Here are some useful data points you can access:

- `agent.task` lets you see what the main task is; `agent.add_new_task(...)` lets you queue up a new one
- `agent.controller` gives access to the `Controller()` object and the `Registry()` containing the available actions
  - `agent.controller.registry.execute_action('click_element_by_index', {'index': 123}, browser_session=agent.browser_session)`
- `agent.context` lets you access any user-provided context object passed in to `Agent(context=...)`
- `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
- `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
- `agent.llm` gives direct access to the main LLM object (e.g. `ChatOpenAI`)
- `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
  - `agent.state.history.model_thoughts()`: Reasoning from Browser Use's model
  - `agent.state.history.model_outputs()`: Raw outputs from Browser Use's model
  - `agent.state.history.model_actions()`: Actions taken by the agent
  - `agent.state.history.extracted_content()`: Content extracted from web pages
  - `agent.state.history.urls()`: URLs visited by the agent
- `agent.browser_session` gives direct access to the `BrowserSession()` and playwright objects
  - `agent.browser_session.get_current_page()`: Get the current playwright `Page` object the agent is focused on
  - `agent.browser_session.browser_context`: Get the current playwright `BrowserContext` object
  - `agent.browser_session.browser_context.pages`: Get all the tabs currently open in the context
  - `agent.browser_session.get_page_html()`: Current page HTML
  - `agent.browser_session.take_screenshot()`: Screenshot of the current page

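The history accessors listed above each return one entry per step, so a hook usually wants only the most recent one. The following is a minimal sketch of that pattern; `last_or_none`, `summarize_latest`, and `log_latest` are hypothetical helper names, and the code assumes each accessor returns a plain list.

```python
from typing import Any, Dict, List


def last_or_none(items: List[Any]) -> Any:
    """Return the most recent entry, or None when the history is still empty."""
    return items[-1] if items else None


def summarize_latest(agent: Any) -> Dict[str, Any]:
    """Collect the latest URL, action, and extracted content from the history."""
    history = agent.state.history
    return {
        "url": last_or_none(history.urls()),
        "action": last_or_none(history.model_actions()),
        "extracted_content": last_or_none(history.extracted_content()),
    }


async def log_latest(agent: Any) -> None:
    """Hook that prints the latest snapshot, e.g. on_step_end=log_latest."""
    print(summarize_latest(agent))
```

Guarding against an empty list matters for `on_step_start`, because on the first step the history has no entries yet.
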
## Tips for Using Hooks

- **Avoid blocking operations**: Hooks run in the same execution thread as the agent, so a blocking call stalls the agent until it returns. Keep hooks efficient or use asynchronous patterns.
- **Handle exceptions**: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow.
- **Use custom actions instead**: Hooks are fairly advanced. Most things can be implemented with [custom action functions](/customize/custom-functions) instead.

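The first two tips can be sketched together as follows. `safe_hook`, `slow_blocking_call`, and `report_step` are hypothetical names introduced for illustration; the sketch assumes Python 3.9+ for `asyncio.to_thread` and assumes that logging a hook failure and continuing is the behavior you want.

```python
import asyncio
import functools
import logging
import time
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

Hook = Callable[[Any], Awaitable[None]]


def safe_hook(hook: Hook) -> Hook:
    """Wrap a hook so that an exception is logged instead of propagating."""

    @functools.wraps(hook)
    async def wrapper(agent: Any) -> None:
        try:
            await hook(agent)
        except Exception:
            logger.exception("Hook %s failed; continuing", hook.__name__)

    return wrapper


def slow_blocking_call(payload: dict) -> None:
    """Stand-in for blocking work such as a synchronous HTTP request."""
    time.sleep(0.05)


@safe_hook
async def report_step(agent: Any) -> None:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the event loop stays free while it waits.
    payload = {"task": getattr(agent, "task", None)}
    await asyncio.to_thread(slow_blocking_call, payload)
```

Swallowing exceptions is a design choice: it keeps the agent running, but a hook that enforces a safety check should probably let the error propagate instead.
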

---

## Complex Example: Agent Activity Recording System

This example records and saves Browser-Use agent activity. It consists of a server component that stores the data and a client component that runs the agent with a recording hook.

### Setup Instructions

To use this example, you'll need to:

1. Set up the required dependencies (the `dotenv` module is provided by the `python-dotenv` package, and the client also imports `requests`):

   ```bash
   pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv requests browser-use
   ```

2. Create two separate Python files:

   - `api.py` - The FastAPI server component
   - `client.py` - The Browser-Use agent with recording hook

3. Run both components:
   - Start the API server first: `python api.py`
   - Then run the client: `python client.py`

### Server Component (api.py)

The server component handles receiving and storing the agent's activity data:

```python
#!/usr/bin/env python3

#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
#

import json
import base64
from pathlib import Path

from fastapi import FastAPI, Request
import prettyprinter
import uvicorn

prettyprinter.install_extras()

# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
    """
    Convert a Base64-encoded string to a PNG file.

    :param b64_string: A string containing Base64-encoded data
    :param output_file: The path to the output PNG file
    """
    with open(output_file, "wb") as f:
        f.write(base64.b64decode(b64_string))

# Initialize FastAPI app
app = FastAPI()


@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
    data = await request.json()
    prettyprinter.cpprint(data)

    # Ensure the "recordings" folder exists using pathlib
    recordings_folder = Path("recordings")
    recordings_folder.mkdir(exist_ok=True)

    # Determine the next file number by examining existing .json files
    existing_numbers = []
    for item in recordings_folder.iterdir():
        if item.is_file() and item.suffix == ".json":
            try:
                file_num = int(item.stem)
                existing_numbers.append(file_num)
            except ValueError:
                # In case the file name isn't just a number
                pass

    if existing_numbers:
        next_number = max(existing_numbers) + 1
    else:
        next_number = 1

    # Construct the file path
    file_path = recordings_folder / f"{next_number}.json"

    # Save the JSON data to the file
    with file_path.open("w") as f:
        json.dump(data, f, indent=2)

    # Optionally save screenshot if needed
    # if "website_screenshot" in data and data["website_screenshot"]:
    #     screenshot_folder = Path("screenshots")
    #     screenshot_folder.mkdir(exist_ok=True)
    #     b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")

    return {"status": "ok", "message": f"Saved to {file_path}"}

if __name__ == "__main__":
    print("Starting Browser-Use recording API on http://0.0.0.0:9000")
    uvicorn.run(app, host="0.0.0.0", port=9000)
```

### Client Component (client.py)

The client component runs the Browser-Use agent with a recording hook:

```python
#!/usr/bin/env python3

#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#

import asyncio
import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from browser_use.llm import ChatOpenAI
from browser_use import Agent

# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')

    # Capture current page state
    website_html = await agent_obj.browser_session.get_page_html()
    website_screenshot = await agent_obj.browser_session.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]

    # Process model outputs
    model_outputs_json = obj_to_json(
        obj=history.model_outputs(),
        check_circular=False
    )
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions_json = obj_to_json(
        obj=history.model_actions(),
        check_circular=False
    )
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content_json = obj_to_json(
        obj=history.extracted_content(),
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls_json = obj_to_json(
        obj=history.urls(),
        check_circular=False
    )
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }

    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")

    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )

    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(
            on_step_start=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if API is running
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except requests.exceptions.RequestException:
        print("Warning: Recording API may not be running. Start api.py first.")

    # Run the agent
    asyncio.run(run_agent())
```

Contribution by Carlos A. Planchón.

### Working with the Recorded Data

After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:

1. **View recorded sessions**: Each JSON file contains a snapshot of agent activity for one step
2. **Extract screenshots**: You can modify the API to save screenshots separately
3. **Analyze agent behavior**: Use the recorded data to study how the agent navigates websites

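As a starting point for viewing and analyzing the recordings, the sketch below loads the numbered JSON files in step order and lists the distinct URLs the agent visited. `load_recordings` and `visited_urls` are hypothetical helpers; the code assumes the file layout written by `api.py` above (`recordings/1.json`, `recordings/2.json`, ...) and the `"url"` key from the client's step summary.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_recordings(folder: str = "recordings") -> List[dict]:
    """Load the numbered step files in step order (1.json, 2.json, ...)."""
    paths = [p for p in Path(folder).glob("*.json") if p.stem.isdigit()]
    paths.sort(key=lambda p: int(p.stem))  # numeric sort, so 10 comes after 9
    return [json.loads(p.read_text()) for p in paths]


def visited_urls(steps: List[dict]) -> List[str]:
    """Collect the distinct non-empty URLs in first-seen order."""
    seen: Dict[str, None] = {}
    for step in steps:
        url = step.get("url")
        if url:
            seen.setdefault(url, None)
    return list(seen)
```

Sorting by the integer value of the file stem avoids the lexicographic order a plain `sorted()` on paths would give, where `10.json` sorts before `2.json`.
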
### Extending the Example

You can extend this recording system in several ways:

1. **Save screenshots separately**: Uncomment the screenshot saving code in the API
2. **Add a web dashboard**: Create a simple web interface to view recorded sessions
3. **Add session IDs**: Modify the API to group steps by agent session
4. **Add filtering**: Implement filters to record only specific types of actions
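
One possible shape for the session ID extension is sketched below. It is an assumption about how steps could be grouped, not the example's actual behavior: `new_session_id` and `save_step` are hypothetical helpers, with each session stored in its own subfolder and step numbering restarted per session.

```python
import json
import uuid
from pathlib import Path


def new_session_id() -> str:
    """Generate an opaque ID the client can create once per agent run."""
    return uuid.uuid4().hex


def save_step(data: dict, session_id: str, root: str = "recordings") -> Path:
    """Write one step to <root>/<session_id>/<n>.json and return the path."""
    # Reject IDs that could escape the recordings folder (e.g. "../x").
    if not session_id.isalnum():
        raise ValueError("session_id must be alphanumeric")
    folder = Path(root) / session_id
    folder.mkdir(parents=True, exist_ok=True)
    numbers = [int(p.stem) for p in folder.glob("*.json") if p.stem.isdigit()]
    file_path = folder / f"{max(numbers, default=0) + 1}.json"
    file_path.write_text(json.dumps(data, indent=2))
    return file_path
```

In this sketch the client would generate the ID once before calling `agent.run(...)` and include it in every payload, and the endpoint would pass it to `save_step` in place of the flat numbering logic.
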