---
title: "Lifecycle Hooks"
description: "Customize agent behavior with lifecycle hooks"
icon: "Wrench"
author: "Carlos A. Planchón"
---

Browser-Use provides lifecycle hooks that allow you to execute custom code at specific points during the agent's execution.
Hook functions can be used to read and modify agent state while running, implement custom logic, change configuration, and integrate the Agent with external applications.

## Available Hooks

Currently, Browser-Use provides the following hooks:

| Hook | Description | When it's called |
| ---- | ----------- | ---------------- |
| `on_step_start` | Executed at the beginning of each agent step | Before the agent processes the current state and decides on the next action |
| `on_step_end` | Executed at the end of each agent step | After the agent has executed all the actions for the current step, before it starts the next step |

```python
await agent.run(on_step_start=..., on_step_end=...)
```

Each hook should be an `async` callable function that accepts the `agent` instance as its only parameter.

### Basic Example

```python
import asyncio
from pathlib import Path

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def my_step_hook(agent: Agent):
    # inside a hook you can access all the state and methods under the Agent object:
    #   agent.settings, agent.state, agent.task
    #   agent.controller, agent.llm, agent.browser_session
    #   agent.pause(), agent.resume(), agent.add_new_task(...), etc.

    # You also have direct access to the playwright Page and Browser Context
    page = await agent.browser_session.get_current_page()
    #   https://playwright.dev/python/docs/api/class-page

    current_url = page.url
    visit_log = agent.state.history.urls()
    previous_url = visit_log[-2] if len(visit_log) >= 2 else None
    print(f"Agent was last on URL: {previous_url} and is now on {current_url}")

    # Example: listen for events on the page, interact with the DOM, run JS directly, etc.
    # page.on() registers the listener synchronously, so it is not awaited
    page.on('domcontentloaded', lambda _page: print('page navigated to a new url...'))
    await page.locator("css=form > input[type=submit]").click()
    await page.evaluate('() => alert(1)')
    await page.context.new_page()  # open a new tab in the same browser context
    await agent.browser_session.browser_context.add_init_script('/* some JS to run on every page */')

    # Example: monitor or intercept all network requests
    async def handle_request(route):
        # Print, modify, block, etc. do anything to the requests here
        #   https://playwright.dev/python/docs/network#handle-requests
        print(route.request, route.request.headers)
        await route.continue_(headers=route.request.headers)

    await page.route("**/*", handle_request)

    # Example: pause agent execution and resume it based on some custom code
    if '/completed' in current_url:
        agent.pause()
        Path('result.txt').write_text(await page.content())
        input('Saved "completed" page content to result.txt, press [Enter] to resume...')
        agent.resume()


async def main():
    agent = Agent(
        task="Search for the latest news about AI",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run(
        on_step_start=my_step_hook,
        # on_step_end=...
        max_steps=10,
    )


asyncio.run(main())
```

## Data Available in Hooks

When working with agent hooks, you have access to the entire `Agent` instance. Here are some useful data points you can access:

- `agent.task` lets you see what the main task is, `agent.add_new_task(...)` lets you queue up a new one
- `agent.controller` gives access to the `Controller()` object and `Registry()` containing the available actions
  - `agent.controller.registry.execute_action('click_element_by_index', {'index': 123}, browser_session=agent.browser_session)`
- `agent.context` lets you access any user-provided context object passed in to `Agent(context=...)`
- `agent.sensitive_data` contains the sensitive data dict, which can be updated in-place to add/remove/modify items
- `agent.settings` contains all the configuration options passed to the `Agent(...)` at init time
- `agent.llm` gives direct access to the main LLM object (e.g.
`ChatOpenAI`)
- `agent.state` gives access to lots of internal state, including agent thoughts, outputs, actions, etc.
  - `agent.state.history.model_thoughts()`: Reasoning from Browser Use's model
  - `agent.state.history.model_outputs()`: Raw outputs from Browser Use's model
  - `agent.state.history.model_actions()`: Actions taken by the agent
  - `agent.state.history.extracted_content()`: Content extracted from web pages
  - `agent.state.history.urls()`: URLs visited by the agent
- `agent.browser_session` gives direct access to the `BrowserSession()` and playwright objects
  - `agent.browser_session.get_current_page()`: Get the current playwright `Page` object the agent is focused on
  - `agent.browser_session.browser_context`: Get the current playwright `BrowserContext` object
  - `agent.browser_session.browser_context.pages`: Get all the tabs currently open in the context
  - `agent.browser_session.get_page_html()`: Current page HTML
  - `agent.browser_session.take_screenshot()`: Screenshot of the current page

## Tips for Using Hooks

- **Avoid blocking operations**: Since hooks run in the same execution thread as the agent, try to keep them efficient or use asynchronous patterns.
- **Handle exceptions**: Make sure your hook functions handle exceptions gracefully to prevent interrupting the agent's main flow.
- **Use custom actions instead**: Hooks are fairly advanced; most things can be implemented with [custom action functions](/customize/custom-functions) instead.

---

## Complex Example: Agent Activity Recording System

This comprehensive example demonstrates a complete implementation for recording and saving Browser-Use agent activity, consisting of both server and client components.

### Setup Instructions

To use this example, you'll need to:

1. Set up the required dependencies:

   ```bash
   pip install fastapi uvicorn prettyprinter pyobjtojson python-dotenv requests browser-use langchain-openai
   ```

2. Create two separate Python files:
   - `api.py` - The FastAPI server component
   - `client.py` - The Browser-Use agent with recording hook

3. Run both components:
   - Start the API server first: `python api.py`
   - Then run the client: `python client.py`

### Server Component (api.py)

The server component handles receiving and storing the agent's activity data:

```python
#!/usr/bin/env python3

#
# FastAPI API to record and save Browser-Use activity data.
# Save this code to api.py and run with `python api.py`
#

import json
import base64
from pathlib import Path

from fastapi import FastAPI, Request
import prettyprinter
import uvicorn

prettyprinter.install_extras()


# Utility function to save screenshots
def b64_to_png(b64_string: str, output_file):
    """
    Convert a Base64-encoded string to a PNG file.

    :param b64_string: A string containing Base64-encoded data
    :param output_file: The path to the output PNG file
    """
    with open(output_file, "wb") as f:
        f.write(base64.b64decode(b64_string))


# Initialize FastAPI app
app = FastAPI()


@app.post("/post_agent_history_step")
async def post_agent_history_step(request: Request):
    data = await request.json()
    prettyprinter.cpprint(data)

    # Ensure the "recordings" folder exists using pathlib
    recordings_folder = Path("recordings")
    recordings_folder.mkdir(exist_ok=True)

    # Determine the next file number by examining existing .json files
    existing_numbers = []
    for item in recordings_folder.iterdir():
        if item.is_file() and item.suffix == ".json":
            try:
                file_num = int(item.stem)
                existing_numbers.append(file_num)
            except ValueError:
                # In case the file name isn't just a number
                pass

    if existing_numbers:
        next_number = max(existing_numbers) + 1
    else:
        next_number = 1

    # Construct the file path
    file_path = recordings_folder / f"{next_number}.json"

    # Save the JSON data to the file
    with file_path.open("w") as f:
        json.dump(data, f, indent=2)

    # Optionally save screenshot if needed
    # if "website_screenshot" in data and data["website_screenshot"]:
    #     screenshot_folder = Path("screenshots")
    #     screenshot_folder.mkdir(exist_ok=True)
    #     b64_to_png(data["website_screenshot"], screenshot_folder / f"{next_number}.png")

    return {"status": "ok", "message": f"Saved to {file_path}"}


if __name__ == "__main__":
    print("Starting Browser-Use recording API on http://0.0.0.0:9000")
    uvicorn.run(app, host="0.0.0.0", port=9000)
```

### Client Component (client.py)

The client component runs the Browser-Use agent with a recording hook:

```python
#!/usr/bin/env python3

#
# Client to record and save Browser-Use activity.
# Save this code to client.py and run with `python client.py`
#

import asyncio

import requests
from dotenv import load_dotenv
from pyobjtojson import obj_to_json
from langchain_openai import ChatOpenAI
from browser_use import Agent

# Load environment variables (for API keys)
load_dotenv()


def send_agent_history_step(data):
    """Send the agent step data to the recording API"""
    url = "http://127.0.0.1:9000/post_agent_history_step"
    response = requests.post(url, json=data)
    return response.json()


async def record_activity(agent_obj):
    """Hook function that captures and records agent activity at each step"""
    website_html = None
    website_screenshot = None
    urls_json_last_elem = None
    model_thoughts_last_elem = None
    model_outputs_json_last_elem = None
    model_actions_json_last_elem = None
    extracted_content_json_last_elem = None

    print('--- ON_STEP_START HOOK ---')

    # Capture current page state
    website_html = await agent_obj.browser_session.get_page_html()
    website_screenshot = await agent_obj.browser_session.take_screenshot()

    # Make sure we have state history
    if hasattr(agent_obj, "state"):
        history = agent_obj.state.history
    else:
        history = None
        print("Warning: Agent has no state history")
        return

    # Process model thoughts
    model_thoughts = obj_to_json(
        obj=history.model_thoughts(),
        check_circular=False
    )
    if len(model_thoughts) > 0:
        model_thoughts_last_elem = model_thoughts[-1]

    # Process model outputs
    model_outputs = agent_obj.state.history.model_outputs()
    model_outputs_json = obj_to_json(
        obj=model_outputs,
        check_circular=False
    )
    if len(model_outputs_json) > 0:
        model_outputs_json_last_elem = model_outputs_json[-1]

    # Process model actions
    model_actions = agent_obj.state.history.model_actions()
    model_actions_json = obj_to_json(
        obj=model_actions,
        check_circular=False
    )
    if len(model_actions_json) > 0:
        model_actions_json_last_elem = model_actions_json[-1]

    # Process extracted content
    extracted_content = agent_obj.state.history.extracted_content()
    extracted_content_json = obj_to_json(
        obj=extracted_content,
        check_circular=False
    )
    if len(extracted_content_json) > 0:
        extracted_content_json_last_elem = extracted_content_json[-1]

    # Process URLs
    urls = agent_obj.state.history.urls()
    urls_json = obj_to_json(
        obj=urls,
        check_circular=False
    )
    if len(urls_json) > 0:
        urls_json_last_elem = urls_json[-1]

    # Create a summary of all data for this step
    model_step_summary = {
        "website_html": website_html,
        "website_screenshot": website_screenshot,
        "url": urls_json_last_elem,
        "model_thoughts": model_thoughts_last_elem,
        "model_outputs": model_outputs_json_last_elem,
        "model_actions": model_actions_json_last_elem,
        "extracted_content": extracted_content_json_last_elem
    }

    print("--- MODEL STEP SUMMARY ---")
    print(f"URL: {urls_json_last_elem}")

    # Send data to the API
    result = send_agent_history_step(data=model_step_summary)
    print(f"Recording API response: {result}")


async def run_agent():
    """Run the Browser-Use agent with the recording hook"""
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="gpt-4o"),
    )

    try:
        print("Starting Browser-Use agent with recording hook")
        await agent.run(
            on_step_start=record_activity,
            max_steps=30
        )
    except Exception as e:
        print(f"Error running agent: {e}")


if __name__ == "__main__":
    # Check if API is running
    try:
        requests.get("http://127.0.0.1:9000")
        print("Recording API is available")
    except requests.exceptions.RequestException:
        print("Warning: Recording API may not be running. Start api.py first.")

    # Run the agent
    asyncio.run(run_agent())
```

Contribution by Carlos A. Planchón.

### Working with the Recorded Data

After running the agent, you'll find the recorded data in the `recordings` directory. Here's how you can use this data:

1. **View recorded sessions**: Each JSON file contains a snapshot of agent activity for one step
2. **Extract screenshots**: You can modify the API to save screenshots separately
3. **Analyze agent behavior**: Use the recorded data to study how the agent navigates websites

### Extending the Example

You can extend this recording system in several ways:

1. **Save screenshots separately**: Uncomment the screenshot saving code in the API
2. **Add a web dashboard**: Create a simple web interface to view recorded sessions
3. **Add session IDs**: Modify the API to group steps by agent session
4. **Add filtering**: Implement filters to record only specific types of actions
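
As a possible starting point for the session ID idea above, here is a minimal sketch of how steps could be grouped into one folder per agent run. The helper names `new_session_id` and `save_step` are hypothetical and are not part of Browser-Use or of the `api.py` shown earlier; the sketch only assumes that each step arrives as a JSON-serializable dict.

```python
import json
import tempfile
import uuid
from pathlib import Path


def new_session_id() -> str:
    """Hypothetical helper: return a random identifier for one agent run."""
    return uuid.uuid4().hex


def save_step(root: Path, session_id: str, data: dict) -> Path:
    """Hypothetical helper: write one step to <root>/<session_id>/<n>.json."""
    session_folder = root / session_id
    session_folder.mkdir(parents=True, exist_ok=True)

    # Number files per session, similar to the numbering logic in api.py
    existing = [int(p.stem) for p in session_folder.glob("*.json") if p.stem.isdigit()]
    next_number = max(existing, default=0) + 1

    file_path = session_folder / f"{next_number}.json"
    file_path.write_text(json.dumps(data, indent=2))
    return file_path


if __name__ == "__main__":
    # Demo in a temporary folder so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        session = new_session_id()
        for url in ("https://example.com/a", "https://example.com/b"):
            print(save_step(Path(tmp), session, {"url": url}))
```

One way to wire this in would be for the client to generate a session ID once per run and include it in every posted summary, with the endpoint reading it from the request body and calling `save_step`. Since the ID becomes a folder name, the server should validate it before use.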