mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
347 lines
11 KiB
Plaintext
---
title: "Lifecycle Hooks"
description: "Customize agent behavior with lifecycle hooks"
icon: "Wrench"
author: "Carlos A. Planchón"
---

# Using Agent Lifecycle Hooks

Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution. These hooks enable you to capture detailed information about the agent's actions, modify behavior, or integrate with external systems.

## Available Hooks

Currently, Browser-Use provides the following hooks:

| Hook | Description | When it's called |
| ---- | ----------- | ---------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action |
| `on_step_end` | Executed at the end of each agent step | After the agent has executed the action for the current step |
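
To illustrate how the two hooks pair up, here is a minimal sketch of a per-step timer. The `StepTimer` class and its method names are hypothetical and not part of Browser-Use; the sketch only assumes that each hook is awaited with the agent instance as its single argument, as described on this page.

```python
import time


class StepTimer:
    """Hypothetical helper: measures wall-clock time between the two hooks."""

    def __init__(self):
        self.durations = []      # seconds spent in each completed step
        self._started_at = None  # timestamp recorded by on_step_start

    async def on_step_start(self, agent):
        # Runs before the agent decides on its next action.
        self._started_at = time.perf_counter()

    async def on_step_end(self, agent):
        # Runs after the agent has executed the action for this step.
        if self._started_at is not None:
            self.durations.append(time.perf_counter() - self._started_at)
            self._started_at = None
```

With a `timer = StepTimer()` instance, the bound methods could be passed as `on_step_start=timer.on_step_start` and `on_step_end=timer.on_step_end`, and `timer.durations` inspected after the run.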

## Using Hooks

Hooks are passed as parameters to the `agent.run()` method. Each hook should be a callable that accepts the agent instance as its only parameter; the examples on this page define hooks as `async` functions.

### Basic Example

```python
import asyncio
from pathlib import Path

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def my_step_hook(agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.controller, agent.llm, agent.browser, agent.browser_context
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    current_page = await agent.browser_context.get_current_page()

    visit_log = agent.state.history.urls()
    current_url = current_page.url
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    if 'completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await current_page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()


async def main():
    agent = Agent(
        task="Search for the latest news about AI",
        llm=ChatOpenAI(model="gpt-4o"),
    )

    await agent.run(
        on_step_start=my_step_hook,
        # on_step_end=...
        max_steps=10,
    )


asyncio.run(main())
```

## Complete Example: Agent Activity Recording System

This example implements a system for recording and saving Browser-Use agent activity. It consists of a server component and a client component.

### Setup Instructions

To use this example, you'll need to:

1. Set up the required dependencies:
   ```bash
   pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv requests browser-use langchain-openai
   ```

2. Create two separate Python files:
   - `api.py` - The FastAPI server component
   - `client.py` - The Browser-Use agent with recording hook

3. Run both components:
   - Start the API server first: `python api.py`
   - Then run the client: `python client.py`

### Server Component (api.py)

The server component handles receiving and storing the agent's activity data:

```python
#!/usr/bin/env python3

#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
#

import json
import base64
from pathlib import Path

from fastapi import FastAPI, Request
import prettyprinter
import uvicorn

prettyprinter.install_extras()

# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
    """
    Convert a Base64-encoded string to a PNG file.

    :param b64_string: A string containing Base64-encoded data
    :param output_file: The path to the output PNG file
    """
    with open(output_file, "wb") as f:
        f.write(base64.b64decode(b64_string))

# Initialize FastAPI app
app = FastAPI()


@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
    data = await request.json()
    prettyprinter.cpprint(data)

    # Ensure the "recordings" folder exists using pathlib
    recordings_folder = Path("recordings")
    recordings_folder.mkdir(exist_ok=True)

    # Determine the next file number by examining existing .json files
    existing_numbers = []
    for item in recordings_folder.iterdir():
        if item.is_file() and item.suffix == ".json":
            try:
                file_num = int(item.stem)
                existing_numbers.append(file_num)
            except ValueError:
                # In case the file name isn't just a number
                pass

    if existing_numbers:
        next_number = max(existing_numbers) + 1
    else:
        next_number = 1

    # Construct the file path
    file_path = recordings_folder / f"{next_number}.json"

    # Save the JSON data to the file
    with file_path.open("w") as f:
        json.dump(data, f, indent=2)

    # Optionally save screenshot if needed
    # if "website_screenshot" in data and data["website_screenshot"]:
    #     screenshot_folder = Path("screenshots")
    #     screenshot_folder.mkdir(exist_ok=True)
    #     b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")

    return {"status": "ok", "message": f"Saved to {file_path}"}

if __name__ == "__main__":
    print("Starting Browser-Use recording API on http://0.0.0.0:9000")
    uvicorn.run(app, host="0.0.0.0", port=9000)
```

### Client Component (client.py)

The client component runs the Browser-Use agent with a recording hook:

```python
#!/usr/bin/env python3

#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#

import asyncio
import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from langchain_openai import ChatOpenAI
from browser_use import Agent

# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')

    # Capture current page state
    website_html = await agent_obj.browser_context.get_page_html()
    website_screenshot = await agent_obj.browser_context.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        history = None
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]

    # Process model outputs
    model_outputs = agent_obj.state.history.model_outputs()
    model_outputs_json = obj_to_json(
        obj=model_outputs,
        check_circular=False
    )
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions = agent_obj.state.history.model_actions()
    model_actions_json = obj_to_json(
        obj=model_actions,
        check_circular=False
    )
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content = agent_obj.state.history.extracted_content()
    extracted_content_json = obj_to_json(
        obj=extracted_content,
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls = agent_obj.state.history.urls()
    urls_json = obj_to_json(
        obj=urls,
        check_circular=False
    )
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }

    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")

    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )

    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(
            on_step_start=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if API is running
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except requests.exceptions.RequestException:
        # Connection errors, timeouts, etc. (avoids a bare `except:`)
        print("Warning: Recording API may not be running. Start api.py first.")

    # Run the agent
    asyncio.run(run_agent())
```

### Working with the Recorded Data

After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:

1. **View recorded sessions**: Each JSON file contains a snapshot of agent activity for one step
2. **Extract screenshots**: You can modify the API to save screenshots separately
3. **Analyze agent behavior**: Use the recorded data to study how the agent navigates websites
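
As a starting point for viewing and analyzing recordings, the sketch below loads every numbered JSON file from a recordings folder in step order. The helper name `load_recordings` is hypothetical; the folder layout (`1.json`, `2.json`, ...) and the `url` key are the ones written by the `api.py` and `client.py` examples above.

```python
import json
from pathlib import Path


def load_recordings(folder="recordings"):
    """Return the recorded steps as dicts, sorted by numeric file name."""
    # Only consider files named like "<number>.json", as produced by api.py.
    files = [p for p in Path(folder).glob("*.json") if p.stem.isdigit()]
    # Sort numerically so that 10.json comes after 2.json.
    files.sort(key=lambda p: int(p.stem))
    return [json.loads(p.read_text()) for p in files]
```

For instance, `[step.get("url") for step in load_recordings()]` would give the sequence of URLs recorded across the steps.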

### Extending the Example

You can extend this recording system in several ways:

1. **Save screenshots separately**: Uncomment the screenshot saving code in the API
2. **Add a web dashboard**: Create a simple web interface to view recorded sessions
3. **Add session IDs**: Modify the API to group steps by agent session
4. **Add filtering**: Implement filters to record only specific types of actions
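
The session ID and filtering ideas could be sketched on the client side roughly as follows. All three helper names are hypothetical, and the filter assumes that the `model_actions` entry of a step summary is a dict keyed by action name; that shape is an assumption, so check it against your own recordings before relying on it.

```python
import uuid


def new_session_id():
    """Generate a random ID to share across all steps of one agent run."""
    return uuid.uuid4().hex


def tag_with_session(step_summary, session_id):
    """Return a copy of the step summary with a session_id field added."""
    return {**step_summary, "session_id": session_id}


def should_record(step_summary, wanted_actions):
    """Keep a step only if its action contains one of the wanted action names.

    Assumes step_summary["model_actions"] is a dict keyed by action name
    (an assumption about the recorded shape, not a documented guarantee).
    """
    action = step_summary.get("model_actions") or {}
    return any(name in action for name in wanted_actions)
```

A recording hook could then call `should_record(...)` before sending, and send `tag_with_session(...)` instead of the raw summary so the server can group steps by `session_id`.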

## Data Available in Hooks

When working with agent hooks, you have access to the entire agent instance. Here are some useful data points you can access:

- `agent.state.history.model_thoughts()`: Reasoning from Browser Use's model
- `agent.state.history.model_outputs()`: Raw outputs from Browser Use's model
- `agent.state.history.model_actions()`: Actions taken by the agent
- `agent.state.history.extracted_content()`: Content extracted from web pages
- `agent.state.history.urls()`: URLs visited by the agent
- `agent.browser_context.get_page_html()`: Current page HTML
- `agent.browser_context.take_screenshot()`: Screenshot of the current page
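
A hook often only needs the most recent entry from each history accessor. The sketch below gathers those into one dict; `latest` and `snapshot_history` are hypothetical helper names, and the code assumes each accessor returns a list, which is how the `client.py` example above treats them.

```python
def latest(items):
    """Return the last element of a list, or None when it is empty or missing."""
    return items[-1] if items else None


def snapshot_history(agent):
    """Collect the most recent entry from each history accessor."""
    history = agent.state.history
    return {
        "thoughts": latest(history.model_thoughts()),
        "output": latest(history.model_outputs()),
        "action": latest(history.model_actions()),
        "extracted": latest(history.extracted_content()),
        "url": latest(history.urls()),
    }
```

Returning `None` for empty lists keeps the hook safe on the first step, when no history has been recorded yet.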

## Tips for Using Hooks

- **Avoid blocking operations**: Since hooks run in the same execution thread as the agent, try to keep them efficient or use asynchronous patterns.
- **Handle exceptions**: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow.
- **Consider storage needs**: When capturing full HTML and screenshots, be mindful of storage requirements.
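
The first two tips can be combined in a small wrapper, sketched here under the assumption that hooks are `async` functions taking the agent as their only argument. `safe_hook` and `run_blocking` are hypothetical names; `asyncio.to_thread` is standard library and requires Python 3.9 or later.

```python
import asyncio
import functools
import logging

logger = logging.getLogger(__name__)


def safe_hook(hook):
    """Wrap an async hook so that its exceptions are logged, not propagated."""

    @functools.wraps(hook)
    async def wrapper(agent):
        try:
            await hook(agent)
        except Exception:
            # Log the traceback and let the agent continue with its step.
            logger.exception("Hook %s failed; continuing", hook.__name__)

    return wrapper


async def run_blocking(func, *args):
    """Run a blocking function (for example requests.post) in a worker thread."""
    return await asyncio.to_thread(func, *args)
```

A hook decorated with `@safe_hook` can then use `await run_blocking(...)` for slow synchronous calls, so the event loop is not held up while they complete.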

Contribution by Carlos A. Planchón.