mirror of
https://github.com/browser-use/browser-use
synced 2026-05-06 17:52:15 +02:00
Merge branch 'main' into feat/scalable-memory-backend
This commit is contained in:
83
.cursor/rules/browser-use-rules.mdc
Normal file
83
.cursor/rules/browser-use-rules.mdc
Normal file
@@ -0,0 +1,83 @@
|
||||
---
|
||||
description:
|
||||
globs:
|
||||
alwaysApply: true
|
||||
---
|
||||
## 🧠 General Guidelines for Contributing to `browser-use`
|
||||
|
||||
**Browser-Use** is an AI agent that autonomously interacts with the web. It takes a user-defined task, navigates web pages using Chromium via Playwright, processes HTML, and repeatedly queries a language model (like `gpt-4o`) to decide the next action—until the task is completed.
|
||||
|
||||
### 🗂️ File Documentation
|
||||
|
||||
When you create a **new file**:
|
||||
|
||||
* **For humans**: At the top of the file, include a docstring in natural language explaining:
|
||||
|
||||
* What this file does.
|
||||
* How it fits into the browser-use system.
|
||||
* If it introduces a new abstraction or replaces an old one.
|
||||
* **For LLMs/AI**: Include structured metadata using standardized comments such as:
|
||||
|
||||
```python
|
||||
# @file purpose: Defines <purpose>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 🧰 Development Rules
|
||||
|
||||
* ✅ **Always use [`uv`](mdc:https:/github.com/astral-sh/uv) instead of `pip`**
|
||||
For deterministic and fast dependency installs.
|
||||
|
||||
```bash
|
||||
uv venv --python 3.11
|
||||
source .venv/bin/activate
|
||||
uv sync
|
||||
```
|
||||
|
||||
* ✅ **Use real model names**
|
||||
Do **not** replace `gpt-4o` with `gpt-4`. The model `gpt-4o` is a distinct release and supported.
|
||||
|
||||
* ✅ **Type-safe coding**
|
||||
Use **Pydantic models** for all internal action schemas, task inputs/outputs, and controller I/O. This ensures robust validation and LLM-call integrity.
|
||||
|
||||
---
|
||||
|
||||
## ⚙️ Adding New Actions
|
||||
|
||||
To add a new action that your browser agent can execute:
|
||||
|
||||
```python
|
||||
from browser_use.core.controller import Controller, ActionResult
|
||||
|
||||
controller = Controller()
|
||||
|
||||
@controller.registry.action("Search the web for a specific query")
|
||||
async def search_web(query: str):
|
||||
# Implement your logic here, e.g., query a search engine and return results
|
||||
result = ...
|
||||
return ActionResult(extracted_content=result, include_in_memory=True)
|
||||
```
|
||||
|
||||
### Notes:
|
||||
|
||||
* Use descriptive names and docstrings for each action.
|
||||
* Prefer returning `ActionResult` with structured content to help the agent reason better.
|
||||
|
||||
---
|
||||
|
||||
## 🧠 Creating and Running an Agent
|
||||
|
||||
To define a task and run a browser-use agent:
|
||||
|
||||
```python
|
||||
from browser_use.core.agent import Agent
|
||||
from langchain.chat_models import ChatOpenAI
|
||||
|
||||
task = "Find the CEO of OpenAI and return their name"
|
||||
model = ChatOpenAI(model="gpt-4o")
|
||||
|
||||
agent = Agent(task=task, llm=model, controller=controller)
|
||||
|
||||
history = await agent.run()
|
||||
```
|
||||
@@ -1,6 +1,9 @@
|
||||
name: 🎯 Agent Page Interaction Issue
|
||||
name: 🎯 AI Agent ✚ Page Interaction Issue
|
||||
description: Agent fails to detect, click, scroll, input, or otherwise interact with some type of element on some page(s)
|
||||
labels: ["bug", "element-detection"]
|
||||
title: "Interaction Issue: ..."
|
||||
assignees:
|
||||
- pirate
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
@@ -11,7 +14,7 @@ body:
|
||||
id: version
|
||||
attributes:
|
||||
label: Browser Use Version
|
||||
description: What version of the `browser-use` library are you using? (Run `uv pip show browser-use` or `git log -n 1` to find out) **DO NOT JUST WRITE `latest version` or `main`**
|
||||
description: What version of `browser-use` are you using? (Run `uv pip show browser-use` or `git log -n 1`) **DO NOT JUST WRITE `latest release` or `main`**
|
||||
placeholder: "e.g. 0.4.45 or 62760baaefd"
|
||||
validations:
|
||||
required: true
|
||||
@@ -45,29 +48,32 @@ body:
|
||||
- type: textarea
|
||||
id: prompt
|
||||
attributes:
|
||||
label: Screenshots, Description, and Task Prompt Given to Agent
|
||||
description: The full task prompt you're giving the agent (redact any sensitive data) + a description of the issue and screenshots.
|
||||
label: Screenshots, Description, and task prompt given to Agent
|
||||
description: |
|
||||
A description of the issue + screenshots, and the full task prompt you're giving the agent (redact sensitive data).
|
||||
To help us fix it even faster, screenshot the Chome devtools [`Computed Styles` pane](https://developer.chrome.com/docs/devtools/css/reference#computed) for each failing element.
|
||||
placeholder: |
|
||||
1. go to https://example.com and click the xyz button...
|
||||
2. type "abc" in the dropdown search to find the "abc" option <- agent fails to click dropdown here
|
||||
3. Click the "Submit" button, then extract the result as JSON
|
||||
...
|
||||
include relevant URLs and/or redacted screenshots of the relevant page(s) if possible
|
||||
🎯 High-level goal: Compare the prices of 3 items on a few different seller pages
|
||||
💬 Agent(task='''
|
||||
1. go to https://example.com and click the "xyz" dropdown
|
||||
2. type "abc" into search then select the "abc" option <- ❌ agent fails to select this option
|
||||
3. ...
|
||||
☝️ please include real URLs 🔗 and screenshots 📸 when possible!
|
||||
validations:
|
||||
required: true
|
||||
|
||||
- type: textarea
|
||||
id: html
|
||||
attributes:
|
||||
label: HTML around where it's failing
|
||||
label: "HTML around where it's failing"
|
||||
description: A snippet of the HTML from the failing page around where the Agent is failing to interact.
|
||||
render: html
|
||||
placeholder: |
|
||||
<form na-someform="abc">
|
||||
<form na-someform="abc"> <!-- ⬅️ at least one parent element above -->
|
||||
<div class="element-to-click">
|
||||
<div data-isbutton="true">Click me</div>
|
||||
</div>
|
||||
<input id="someinput" name="someinput" type="text" />
|
||||
<input id="someinput" name="someinput" type="text" /> <!-- ⬅️ failing element -->
|
||||
...
|
||||
</form>
|
||||
validations:
|
||||
@@ -76,11 +82,11 @@ body:
|
||||
- type: input
|
||||
id: os
|
||||
attributes:
|
||||
label: Operating System
|
||||
description: What operating system are you using?
|
||||
placeholder: "e.g., macOS 13.1, Windows 11, Ubuntu 22.04"
|
||||
label: Operating System & Browser Versions
|
||||
description: What operating system and browser are you using?
|
||||
placeholder: "e.g. Ubuntu 24.04 + playwright chromium v136, Windows 11 + Chrome.exe v133, macOS ..."
|
||||
validations:
|
||||
required: true
|
||||
required: false
|
||||
|
||||
- type: textarea
|
||||
id: code
|
||||
@@ -90,13 +96,15 @@ body:
|
||||
render: python
|
||||
placeholder: |
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv()
|
||||
load_dotenv() # tip: always load_dotenv() before other imports
|
||||
from browser_use import Agent, BrowserSession, Controller
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
llm = ChatOpenAI(model="gpt-4o")
|
||||
browser_session = BrowserSession(executable_path='/usr/bin/google-chrome')
|
||||
agent = Agent(llm=llm, browser_session=browser_session)
|
||||
agent = Agent(
|
||||
task='...',
|
||||
llm=ChatOpenAI(model="gpt-4o"),
|
||||
browser_session=BrowserSession(headless=False),
|
||||
)
|
||||
...
|
||||
|
||||
- type: textarea
|
||||
@@ -114,3 +122,15 @@ body:
|
||||
DEBUG [langsmith.client] Sending multipart request with context: trace=91282a01-6667-48a1-8cd7-21aa9337a580,id=91282a01-6667-48a1-8cd7-21aa9337a580
|
||||
DEBUG [agent] 🪪 LLM API keys OPENAI_API_KEY work, ChatOpenAI model is connected & responding correctly.
|
||||
...
|
||||
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
---
|
||||
> [!IMPORTANT]
|
||||
> 🙏 Please **go check *right now before submitting* that that you are on the [⬆️ LATEST VERSION](https://github.com/browser-use/browser-use/releases)**.
|
||||
> 🚀 We ship new agent and element detection improvements every day and we might've already fixed your issue!
|
||||
> <a href="https://github.com/browser-use/browser-use/releases"><img src="https://github.com/user-attachments/assets/4cd34ee6-bafb-4f24-87e2-27a31dc5b9a4" width="500px"/></a>
|
||||
> If you are running an old version, the **first thing we will ask you to do is *try the latest `beta`***:
|
||||
> - 🆕 [`beta`](https://docs.browser-use.com/development/local-setup): `uv pip install --upgrade git+https://github.com/browser-use/browser-use.git@main`
|
||||
> - 📦 [`stable`](https://pypi.org/project/browser-use/#history): `uv pip install --upgrade browser-use`
|
||||
|
||||
35
.github/ISSUE_TEMPLATE/2_bug_report.yml
vendored
35
.github/ISSUE_TEMPLATE/2_bug_report.yml
vendored
@@ -1,6 +1,9 @@
|
||||
name: 🐛 Library Bug Report
|
||||
name: 👾 Library Bug Report
|
||||
description: Report a bug in the browser-use Python library
|
||||
labels: ["bug", "triage"]
|
||||
title: "Bug: ..."
|
||||
assignees:
|
||||
- pirate
|
||||
body:
|
||||
# - type: markdown
|
||||
# attributes:
|
||||
@@ -11,7 +14,7 @@ body:
|
||||
id: version
|
||||
attributes:
|
||||
label: Browser Use Version
|
||||
description: What version of the `browser-use` library are you using? (Run `uv pip show browser-use` or `git log -n 1` to find out) **DO NOT JUST WRITE `latest version` or `main`**
|
||||
description: What version of `browser-use` are you using? (Run `uv pip show browser-use` or `git log -n 1`) **DO NOT JUST WRITE `latest` or `main`**
|
||||
placeholder: "e.g. 0.4.45 or 62760baaefd"
|
||||
validations:
|
||||
required: true
|
||||
@@ -37,13 +40,15 @@ body:
|
||||
render: python
|
||||
placeholder: |
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv()
|
||||
load_dotenv() # tip: always load_dotenv() before other imports
|
||||
from browser_use import Agent, BrowserSession, Controller
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
llm = ChatOpenAI(model="gpt-4o")
|
||||
browser_session = BrowserSession(executable_path='/usr/bin/google-chrome')
|
||||
agent = Agent(llm=llm, browser_session=browser_session)
|
||||
agent = Agent(
|
||||
task='...',
|
||||
llm=ChatOpenAI(model="gpt-4o"),
|
||||
browser_session=BrowserSession(headless=False),
|
||||
)
|
||||
...
|
||||
|
||||
- type: dropdown
|
||||
@@ -75,9 +80,9 @@ body:
|
||||
- type: input
|
||||
id: os
|
||||
attributes:
|
||||
label: Operating System
|
||||
description: What operating system are you using?
|
||||
placeholder: "e.g., macOS 13.1, Windows 11, Ubuntu 22.04"
|
||||
label: Operating System & Browser Versions
|
||||
description: What operating system and browser are you using?
|
||||
placeholder: "e.g. Ubuntu 24.04 + playwright chromium v136, Windows 11 + Chrome.exe v133, macOS ..."
|
||||
validations:
|
||||
required: true
|
||||
|
||||
@@ -96,3 +101,15 @@ body:
|
||||
DEBUG [langsmith.client] Sending multipart request with context: trace=91282a01-6667-48a1-8cd7-21aa9337a580,id=91282a01-6667-48a1-8cd7-21aa9337a580
|
||||
DEBUG [agent] 🪪 LLM API keys OPENAI_API_KEY work, ChatOpenAI model is connected & responding correctly.
|
||||
...
|
||||
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
---
|
||||
> [!IMPORTANT]
|
||||
> 🙏 Please **go check *right now before submitting* that that you are on the [⬆️ LATEST VERSION](https://github.com/browser-use/browser-use/releases)**.
|
||||
> 🚀 We ship changes every day and we might've already fixed your issue yesterday!
|
||||
> <a href="https://github.com/browser-use/browser-use/releases"><img src="https://github.com/user-attachments/assets/4cd34ee6-bafb-4f24-87e2-27a31dc5b9a4" width="500px"/></a>
|
||||
> If you are running an old version, the **first thing we will ask you to do is *try the latest `beta`***:
|
||||
> - 🆕 [`beta`](https://docs.browser-use.com/development/local-setup): `uv pip install --upgrade git+https://github.com/browser-use/browser-use.git@main`
|
||||
> - 📦 [`stable`](https://pypi.org/project/browser-use/#history): `uv pip install --upgrade browser-use`
|
||||
|
||||
41
.github/ISSUE_TEMPLATE/3_feature_request.yml
vendored
41
.github/ISSUE_TEMPLATE/3_feature_request.yml
vendored
@@ -1,10 +1,10 @@
|
||||
name: 💡 Feature or enhancement request
|
||||
name: 💡 New Feature or Enhancement Request
|
||||
description: Suggest an idea or improvement for the browser-use library or Agent capabilities
|
||||
title: "Feature Request: ..."
|
||||
assignees:
|
||||
- pirate
|
||||
type: 'Enhancement'
|
||||
labels: 'enhancement'
|
||||
labels: ['enhancement']
|
||||
body:
|
||||
- type: textarea
|
||||
id: current_problem
|
||||
@@ -24,9 +24,9 @@ body:
|
||||
description: |
|
||||
Describe the ideal specific solution you'd want, *and whether it fits into any broader scope of changes*.
|
||||
placeholder: |
|
||||
e.g. I want to add a default controller action that can hover/drag the mouse on a path when given a series
|
||||
of x,y coordinates. More broadly it may be useful add a computer-use x,y-coordinate style automation
|
||||
fallback method that can do complex mouse interaction tasks.
|
||||
e.g. I want to add a default action that can hover/drag the mouse on a path when given a series
|
||||
of x,y coordinates. More broadly it may be useful add a computer-use/x,y-coordinate-style automation
|
||||
method fallback that can do complex mouse movements.
|
||||
validations:
|
||||
required: true
|
||||
|
||||
@@ -48,10 +48,10 @@ body:
|
||||
attributes:
|
||||
label: What version of browser-use are you currently using?
|
||||
description: |
|
||||
Run `pip show browser-use` or `git log -n 1` and share the exact number of git hash. DO NOT JUST ENTER "the latest release" OR "main".
|
||||
We need to know what version of the browser-use library you're currently running in order to contextualize your feature request.
|
||||
Sometimes we've already added your feature in a newer version, sometimes features already exist but may not be available in your specific environment.
|
||||
placeholder: 0.1.48
|
||||
Run `pip show browser-use` or `git log -n 1` and share the exact number or git hash. DO NOT JUST ENTER `latest release` OR `main`.
|
||||
We need to know what version of the browser-use library you're running in order to contextualize your feature request.
|
||||
Sometimes features are already available and just need to be enabled with config on certain versions.
|
||||
placeholder: "e.g. 0.1.48 or 62760baaefd"
|
||||
validations:
|
||||
required: true
|
||||
|
||||
@@ -59,8 +59,13 @@ body:
|
||||
attributes:
|
||||
value: |
|
||||
---
|
||||
> [!TIP]
|
||||
> 🚀 Please ***double-check you are on the [latest release](https://github.com/browser-use/browser-use/releases)***, we might've already shipped your feature!
|
||||
> [!IMPORTANT]
|
||||
> 🙏 Please **go check *right now before submitting* that that you have tried the [⬆️ LATEST VERSION](https://github.com/browser-use/browser-use/releases)**.
|
||||
> 🚀 We ship new features every day and we might've already added a solution to your need yesterday!
|
||||
> <a href="https://github.com/browser-use/browser-use/releases"><img src="https://github.com/user-attachments/assets/4cd34ee6-bafb-4f24-87e2-27a31dc5b9a4" width="500px"/></a>
|
||||
> If you are running an old version, the **first thing we will ask you to do is *try the latest `beta`***:
|
||||
> - 🆕 [`beta`](https://docs.browser-use.com/development/local-setup): `uv pip install --upgrade git+https://github.com/browser-use/browser-use.git@main`
|
||||
> - 📦 [`stable`](https://pypi.org/project/browser-use/#history): `uv pip install --upgrade browser-use`
|
||||
|
||||
- type: checkboxes
|
||||
id: priority
|
||||
@@ -71,11 +76,11 @@ body:
|
||||
required: false
|
||||
- label: "It's important to add it in the near-mid term future"
|
||||
required: false
|
||||
- label: "It would be nice to have eventually"
|
||||
- label: "It would be nice to add it sometime in the next 2 years"
|
||||
required: false
|
||||
- label: "I'm willing to [start a PR](https://docs.browser-use.com/development/contribution-guide) to develop this myself"
|
||||
- label: "💪 I'm willing to [start a PR](https://docs.browser-use.com/development/contribution-guide) to work on this myself"
|
||||
required: false
|
||||
- label: "My company would spend >$5k/mo on Browser-Use Cloud if it solved this need completely for us"
|
||||
- label: "💼 My company would spend >$5k on [Browser-Use Cloud](https://browser-use.com) if it solved this reliably for us"
|
||||
required: false
|
||||
|
||||
- type: markdown
|
||||
@@ -83,8 +88,8 @@ body:
|
||||
value: |
|
||||
---
|
||||
> [!TIP]
|
||||
> Start discussions about your feature request in other places too,
|
||||
> the more 📣 hype we see around a request the more likely we are to add it!
|
||||
> Start conversations about your feature request in other places too, the more
|
||||
> 📣 hype we see around a request the more likely we are to add it!
|
||||
>
|
||||
> - 💬 Discord: [https://link.browser-use.com/discord](https://link.browser-use.com/discord)
|
||||
> - 🦋 Twitter/X: [https://x.com/browser_use](https://x.com/browser_use)
|
||||
> - 👾 Discord: [https://link.browser-use.com/discord](https://link.browser-use.com/discord)
|
||||
> - 𝕏 Twitter: [https://x.com/browser_use](https://x.com/browser_use)
|
||||
|
||||
20
.github/ISSUE_TEMPLATE/4_docs_issue.yml
vendored
20
.github/ISSUE_TEMPLATE/4_docs_issue.yml
vendored
@@ -1,11 +1,12 @@
|
||||
name: 📚 Documentation Issue
|
||||
description: Report an issue in the browser-use documentation
|
||||
labels: ["documentation"]
|
||||
title: "Documentation: ..."
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for taking the time to improve our documentation! Please fill out the form below to help us understand the issue.
|
||||
Thanks for taking the time to improve our documentation! Please fill out the form below to help us fix the issue quickly.
|
||||
|
||||
- type: dropdown
|
||||
id: type
|
||||
@@ -26,7 +27,7 @@ body:
|
||||
attributes:
|
||||
label: Documentation Page
|
||||
description: Which page or section of the documentation is this about?
|
||||
placeholder: "e.g., https://docs.browser-use.com/getting-started or Installation Guide"
|
||||
placeholder: "e.g. https://docs.browser-use.com/customize/browser-settings > Context Configuration > headless"
|
||||
validations:
|
||||
required: true
|
||||
|
||||
@@ -34,8 +35,8 @@ body:
|
||||
id: description
|
||||
attributes:
|
||||
label: Issue Description
|
||||
description: Describe what's wrong or missing in the documentation
|
||||
placeholder: The documentation should...
|
||||
description: "Describe what's wrong or missing in the documentation"
|
||||
placeholder: e.g. Docs should clarify whether BrowserSession(no_viewport=False) is supported when running in BrowserSession(headless=False) mode...
|
||||
validations:
|
||||
required: true
|
||||
|
||||
@@ -45,11 +46,10 @@ body:
|
||||
label: Suggested Changes
|
||||
description: If you have specific suggestions for how to improve the documentation, please share them
|
||||
placeholder: |
|
||||
The documentation could be improved by...
|
||||
|
||||
Example:
|
||||
```python
|
||||
# Your suggested code example or text here
|
||||
e.g. The documentation could be improved by adding one more line here:
|
||||
```diff
|
||||
Use `BrowserSession(headless=False)` to open the browser window (aka headful mode).
|
||||
+ Viewports are not supported when headful, if `headless=False` it will force `no_viewport=True`.
|
||||
```
|
||||
validations:
|
||||
required: true
|
||||
required: false
|
||||
|
||||
4
.github/ISSUE_TEMPLATE/config.yml
vendored
4
.github/ISSUE_TEMPLATE/config.yml
vendored
@@ -1,9 +1,9 @@
|
||||
blank_issues_enabled: false # Set to true if you want to allow blank issues
|
||||
contact_links:
|
||||
- name: 🤔 Quickstart Guide
|
||||
- name: 🔢 Quickstart Guide
|
||||
url: https://docs.browser-use.com/quickstart
|
||||
about: Most common issues can be resolved by following our quickstart guide
|
||||
- name: 🤔 Questions and Help
|
||||
- name: 💬 Questions and Help
|
||||
url: https://link.browser-use.com/discord
|
||||
about: Please ask questions in our Discord community
|
||||
- name: 📖 Documentation
|
||||
|
||||
54
.github/workflows/test.yaml
vendored
54
.github/workflows/test.yaml
vendored
@@ -1,4 +1,6 @@
|
||||
name: test
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
on:
|
||||
push:
|
||||
@@ -12,42 +14,30 @@ on:
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
find_tests:
|
||||
runs-on: ubuntu-latest
|
||||
outputs:
|
||||
filename_list: ${{ steps.list_test_files.outputs.filename_list }} # ./tests/ci/test_controller.py, ./tests/ci/test_browser.py, etc.
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- id: list_test_files
|
||||
run: echo "::set-output name=filename_list::$(ls tests/ci/*.py | jq -R -s -c 'split("\n")[:-1]')"
|
||||
# https://code.dblock.org/2021/09/03/generating-task-matrix-by-looping-over-repo-files-with-github-actions.html
|
||||
|
||||
tests:
|
||||
name: ${{matrix.test}}
|
||||
name: ${{matrix.test_filename}}
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
IN_DOCKER: 'True'
|
||||
strategy:
|
||||
matrix:
|
||||
test:
|
||||
# TODO:
|
||||
# - browser/patchright
|
||||
# - browser/playwright
|
||||
# - browser/user_binary
|
||||
# - browser/remote_cdp
|
||||
# - models/openai
|
||||
# - models/google
|
||||
# - models/anthropic
|
||||
# - models/azure
|
||||
# - models/deepseek
|
||||
# - models/grok
|
||||
# - functionality/click
|
||||
# - functionality/tabs
|
||||
# - functionality/input
|
||||
# - functionality/scroll
|
||||
# - functionality/upload
|
||||
# - functionality/download
|
||||
# - functionality/save
|
||||
# - functionality/vision
|
||||
# - functionality/memory
|
||||
# - functionality/planner
|
||||
# - functionality/hooks
|
||||
- test_browser
|
||||
- test_controller
|
||||
- test_browser_session
|
||||
- test_tab_management
|
||||
- test_sensitive_data
|
||||
- test_url_allowlist_security
|
||||
test_filename: ${{ fromJson(needs.find_tests.outputs.filename_list) }}
|
||||
# autodiscovers all the files in tests/ci/test_*.py
|
||||
# - test_browser
|
||||
# - test_controller
|
||||
# - test_browser_session
|
||||
# - test_tab_management
|
||||
# ... and more
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: astral-sh/setup-uv@v6
|
||||
@@ -57,7 +47,7 @@ jobs:
|
||||
|
||||
- run: uv sync
|
||||
|
||||
- name: Detect installed Playwright or Patchright version
|
||||
- name: Detect installed Playwright version
|
||||
run: echo "PLAYWRIGHT_VERSION=$(uv pip list --format json | jq -r '.[] | select(.name == "playwright") | .version')" >> $GITHUB_ENV
|
||||
|
||||
- name: Cache playwright binaries
|
||||
@@ -70,4 +60,4 @@ jobs:
|
||||
- run: playwright install chrome
|
||||
- run: playwright install chromium
|
||||
|
||||
- run: pytest tests/${{ matrix.test }}.py
|
||||
- run: pytest tests/ci/${{ matrix.test_filename }}.py
|
||||
|
||||
2
.gitignore
vendored
2
.gitignore
vendored
@@ -40,3 +40,5 @@ private_example.py
|
||||
private_example
|
||||
|
||||
uv.lock
|
||||
temp
|
||||
tmp
|
||||
|
||||
12
bin/lint.sh
Executable file
12
bin/lint.sh
Executable file
@@ -0,0 +1,12 @@
|
||||
#!/usr/bin/env bash
|
||||
# This script is used to run the formatter, linter, and type checker pre-commit hooks.
|
||||
# Usage:
|
||||
# $ ./bin/lint.sh
|
||||
|
||||
IFS=$'\n'
|
||||
|
||||
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
|
||||
|
||||
cd "$SCRIPT_DIR/.." || exit 1
|
||||
|
||||
exec uv run pre-commit run --all-files
|
||||
52
bin/setup.sh
Executable file
52
bin/setup.sh
Executable file
@@ -0,0 +1,52 @@
|
||||
#!/usr/bin/env bash
|
||||
# This script is used to setup a local development environment for the browser-use project.
|
||||
# Usage:
|
||||
# $ ./bin/setup.sh
|
||||
|
||||
### Bash Environment Setup
|
||||
# http://redsymbol.net/articles/unofficial-bash-strict-mode/
|
||||
# https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
|
||||
# set -o xtrace
|
||||
# set -x
|
||||
# shopt -s nullglob
|
||||
set -o errexit
|
||||
set -o errtrace
|
||||
set -o nounset
|
||||
set -o pipefail
|
||||
IFS=$'\n'
|
||||
|
||||
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
|
||||
if [ -f "$SCRIPT_DIR/lint.sh" ]; then
|
||||
echo "[√] already inside a cloned browser-use repo"
|
||||
else
|
||||
echo "[+] Cloning browser-use repo into current directory: $SCRIPT_DIR"
|
||||
git clone https://github.com/browser-use/browser-use
|
||||
cd browser-use
|
||||
fi
|
||||
|
||||
echo "[+] Installing uv..."
|
||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
|
||||
#git checkout main git pull
|
||||
echo
|
||||
echo "[+] Setting up venv"
|
||||
uv venv
|
||||
echo
|
||||
echo "[+] Installing packages in venv"
|
||||
uv sync --dev --all-extras
|
||||
echo
|
||||
echo "[i] Tip: make sure to set BROWSER_USE_LOGGING_LEVEL=debug and your LLM API keys in your .env file"
|
||||
echo
|
||||
uv pip show browser-use
|
||||
|
||||
echo "Usage:"
|
||||
echo " $ browser-use use the CLI"
|
||||
echo " or"
|
||||
echo " $ source .venv/bin/activate"
|
||||
echo " $ ipython use the library"
|
||||
echo " >>> from browser_use import BrowserSession, Agent"
|
||||
echo " >>> await Agent(task='book me a flight to fiji', browser=BrowserSession(headless=False)).run()"
|
||||
echo ""
|
||||
9
bin/test.sh
Executable file
9
bin/test.sh
Executable file
@@ -0,0 +1,9 @@
|
||||
#!/usr/bin/env bash
|
||||
# This script is used to run all the main project tests that run on CI via .github/workflows/test.yaml.
|
||||
# Usage:
|
||||
# $ ./bin/test.sh
|
||||
|
||||
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
|
||||
cd "$SCRIPT_DIR/.." || exit 1
|
||||
|
||||
exec uv run pytest tests/ci
|
||||
@@ -1,11 +1,3 @@
|
||||
import warnings
|
||||
|
||||
# Suppress specific deprecation warnings from FAISS
|
||||
warnings.filterwarnings('ignore', category=DeprecationWarning, module='faiss.loader')
|
||||
warnings.filterwarnings('ignore', message='builtin type SwigPyPacked has no __module__ attribute')
|
||||
warnings.filterwarnings('ignore', message='builtin type SwigPyObject has no __module__ attribute')
|
||||
warnings.filterwarnings('ignore', message='builtin type swigvarlink has no __module__ attribute')
|
||||
|
||||
from browser_use.logging_config import setup_logging
|
||||
|
||||
setup_logging()
|
||||
|
||||
@@ -89,7 +89,7 @@ class Memory:
|
||||
Args:
|
||||
current_step: The current step number of the agent
|
||||
"""
|
||||
logger.info(f'Creating procedural memory at step {current_step}')
|
||||
logger.debug(f'Creating procedural memory at step {current_step}')
|
||||
|
||||
# Get all messages
|
||||
all_messages = self.message_manager.state.history.messages
|
||||
@@ -108,7 +108,7 @@ class Memory:
|
||||
|
||||
# Need at least 2 messages to create a meaningful summary
|
||||
if len(messages_to_process) <= 1:
|
||||
logger.info('Not enough non-memory messages to summarize')
|
||||
logger.debug('Not enough non-memory messages to summarize')
|
||||
return
|
||||
# Create a procedural memory
|
||||
memory_content = self._create([m.message for m in messages_to_process], current_step)
|
||||
|
||||
@@ -1,6 +1,8 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import re
|
||||
import textwrap
|
||||
|
||||
from langchain_core.messages import (
|
||||
AIMessage,
|
||||
@@ -26,7 +28,8 @@ class MessageManagerSettings(BaseModel):
|
||||
image_tokens: int = 800
|
||||
include_attributes: list[str] = []
|
||||
message_context: str | None = None
|
||||
sensitive_data: dict[str, str] | None = None
|
||||
# Support both old format {key: value} and new format {domain: {key: value}}
|
||||
sensitive_data: dict[str, str | dict[str, str]] | None = None
|
||||
available_file_paths: list[str] | None = None
|
||||
|
||||
|
||||
@@ -180,18 +183,134 @@ class MessageManager:
|
||||
msg = AIMessage(content=plan)
|
||||
self._add_message_with_tokens(msg, position)
|
||||
|
||||
def _get_message_emoji(self, message_type: str) -> str:
|
||||
"""Get emoji for a message type"""
|
||||
emoji_map = {
|
||||
'HumanMessage': '💬',
|
||||
'AIMessage': '🧠',
|
||||
'ToolMessage': '🔨',
|
||||
}
|
||||
return emoji_map.get(message_type, '🎮')
|
||||
|
||||
def _clean_whitespace(self, text: str) -> str:
|
||||
"""Replace all repeated whitespace with single space and strip"""
|
||||
return re.sub(r'\s+', ' ', text).strip()
|
||||
|
||||
def _truncate_text(self, text: str, max_length: int) -> str:
|
||||
"""Truncate text to max_length and add ellipsis if needed"""
|
||||
if len(text) <= max_length:
|
||||
return text
|
||||
return text[:max_length] + '...'
|
||||
|
||||
def _extract_text_from_list_content(self, content: list) -> str:
|
||||
"""Extract text from list content structure"""
|
||||
text_content = ''
|
||||
for item in content:
|
||||
if isinstance(item, dict) and 'text' in item:
|
||||
text_content += item['text']
|
||||
return text_content
|
||||
|
||||
def _format_agent_output_content(self, tool_call: dict) -> str:
|
||||
"""Format AgentOutput tool call into readable content"""
|
||||
args = tool_call.get('args', {})
|
||||
action_info = ''
|
||||
|
||||
# Get action name
|
||||
if 'action' in args and args['action']:
|
||||
first_action = args['action'][0] if isinstance(args['action'], list) and args['action'] else args['action']
|
||||
if isinstance(first_action, dict):
|
||||
action_name = next(iter(first_action.keys())) if first_action else 'unknown'
|
||||
action_info = f'{action_name}()'
|
||||
|
||||
# Get goal
|
||||
goal_info = ''
|
||||
if 'current_state' in args and isinstance(args['current_state'], dict):
|
||||
next_goal = args['current_state'].get('next_goal', '').strip()
|
||||
if next_goal:
|
||||
goal_info = f': {self._truncate_text(next_goal, 40)}'
|
||||
|
||||
# Combine action and goal info
|
||||
if action_info and goal_info:
|
||||
return f'{action_info}{goal_info}'
|
||||
elif action_info:
|
||||
return action_info
|
||||
elif goal_info:
|
||||
return goal_info[2:] # Remove ': ' prefix for goal-only
|
||||
else:
|
||||
return 'AgentOutput'
|
||||
|
||||
def _generate_history_log(self) -> str:
|
||||
"""Generate a formatted log string of message history for debugging / printing to terminal"""
|
||||
total_input_tokens = 0
|
||||
message_lines = []
|
||||
|
||||
for i, m in enumerate(self.state.history.messages):
|
||||
total_input_tokens += m.metadata.tokens
|
||||
is_last_message = i == len(self.state.history.messages) - 1
|
||||
|
||||
# Get emoji based on message type
|
||||
message_type = m.message.__class__.__name__
|
||||
emoji = self._get_message_emoji(message_type)
|
||||
|
||||
# Extract content based on message structure
|
||||
if is_last_message and message_type == 'HumanMessage' and isinstance(m.message.content, list):
|
||||
# Special handling for last message with list content
|
||||
text_content = self._extract_text_from_list_content(m.message.content)
|
||||
text_content = self._clean_whitespace(text_content)
|
||||
|
||||
# Look for current state section
|
||||
if '[Current state starts here]' in text_content:
|
||||
start_idx = text_content.find('[Current state starts here]')
|
||||
content = self._truncate_text(text_content[start_idx:], 150)
|
||||
else:
|
||||
content = self._truncate_text(text_content, 150)
|
||||
else:
|
||||
# Standard content extraction
|
||||
content = self._clean_whitespace(str(m.message.content)[:80])
|
||||
|
||||
# Shorten "Action result:" to "Result:" for display
|
||||
if content.startswith('Action result:'):
|
||||
content = 'Result:' + content[14:]
|
||||
|
||||
# Handle AIMessages with tool calls
|
||||
if hasattr(m.message, 'tool_calls') and m.message.tool_calls and not content:
|
||||
tool_call = m.message.tool_calls[0]
|
||||
tool_name = tool_call.get('name', 'unknown')
|
||||
|
||||
if tool_name == 'AgentOutput':
|
||||
content = self._format_agent_output_content(tool_call)
|
||||
else:
|
||||
content = f'[TOOL: {tool_name}]'
|
||||
elif len(str(m.message.content)) > 80:
|
||||
content += '...'
|
||||
|
||||
# Format the message line
|
||||
left_part = f' {emoji}[{m.metadata.tokens}]'
|
||||
|
||||
# For last message, allow multiple lines if needed
|
||||
if is_last_message and '\n' not in content:
|
||||
wrapped = textwrap.wrap(content, width=80, subsequent_indent=' ' * 20)
|
||||
if len(wrapped) > 2:
|
||||
wrapped = wrapped[:2]
|
||||
wrapped[-1] = self._truncate_text(wrapped[-1], 77)
|
||||
message_lines.append(f'{left_part.ljust(16)}: {wrapped[0]}')
|
||||
message_lines.extend(wrapped[1:])
|
||||
else:
|
||||
message_lines.append(f'{left_part.ljust(16)}: {content}')
|
||||
|
||||
# Build final log message
|
||||
return (
|
||||
f'📜 LLM Message history ({len(self.state.history.messages)} messages, {total_input_tokens} tokens):\n'
|
||||
+ '\n'.join(message_lines)
|
||||
)
|
||||
|
||||
@time_execution_sync('--get_messages')
|
||||
def get_messages(self) -> list[BaseMessage]:
|
||||
"""Get current message list, potentially trimmed to max tokens"""
|
||||
|
||||
msg = [m.message for m in self.state.history.messages]
|
||||
# debug which messages are in history with token count # log
|
||||
total_input_tokens = 0
|
||||
logger.debug(f'Messages in history: {len(self.state.history.messages)}:')
|
||||
for m in self.state.history.messages:
|
||||
total_input_tokens += m.metadata.tokens
|
||||
logger.debug(f'{m.message.__class__.__name__} - Token count: {m.metadata.tokens}')
|
||||
logger.debug(f'Total input tokens: {total_input_tokens}')
|
||||
|
||||
# Log message history for debugging
|
||||
logger.debug(self._generate_history_log())
|
||||
|
||||
return msg
|
||||
|
||||
@@ -218,16 +337,27 @@ class MessageManager:
|
||||
if not self.settings.sensitive_data:
|
||||
return value
|
||||
|
||||
# Create a dictionary with all key-value pairs from sensitive_data where value is not None or empty
|
||||
valid_sensitive_data = {k: v for k, v in self.settings.sensitive_data.items() if v}
|
||||
# Collect all sensitive values, immediately converting old format to new format
|
||||
sensitive_values: dict[str, str] = {}
|
||||
|
||||
# Process all sensitive data entries
|
||||
for key_or_domain, content in self.settings.sensitive_data.items():
|
||||
if isinstance(content, dict):
|
||||
# Already in new format: {domain: {key: value}}
|
||||
for key, val in content.items():
|
||||
if val: # Skip empty values
|
||||
sensitive_values[key] = val
|
||||
elif content: # Old format: {key: value} - convert to new format internally
|
||||
# We treat this as if it was {'http*://*': {key_or_domain: content}}
|
||||
sensitive_values[key_or_domain] = content
|
||||
|
||||
# If there are no valid sensitive data entries, just return the original value
|
||||
if not valid_sensitive_data:
|
||||
if not sensitive_values:
|
||||
logger.warning('No valid entries found in sensitive_data dictionary')
|
||||
return value
|
||||
|
||||
# Replace all valid sensitive data values with their placeholder tags
|
||||
for key, val in valid_sensitive_data.items():
|
||||
for key, val in sensitive_values.items():
|
||||
value = value.replace(val, f'<secret>{key}</secret>')
|
||||
|
||||
return value
|
||||
|
||||
@@ -13,6 +13,8 @@ from typing import Any, Generic, TypeVar
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from browser_use.browser.session import DEFAULT_BROWSER_PROFILE
|
||||
|
||||
load_dotenv()
|
||||
|
||||
from langchain_core.language_models.chat_models import BaseChatModel
|
||||
@@ -70,21 +72,19 @@ logger = logging.getLogger(__name__)
|
||||
SKIP_LLM_API_KEY_VERIFICATION = os.environ.get('SKIP_LLM_API_KEY_VERIFICATION', 'false').lower()[0] in 'ty1'
|
||||
|
||||
|
||||
def log_response(response: AgentOutput) -> None:
|
||||
def log_response(response: AgentOutput, registry=None) -> None:
|
||||
"""Utility function to log the model's response."""
|
||||
|
||||
if 'Success' in response.current_state.evaluation_previous_goal:
|
||||
emoji = '👍'
|
||||
elif 'Failed' in response.current_state.evaluation_previous_goal:
|
||||
emoji = '⚠'
|
||||
emoji = '⚠️'
|
||||
else:
|
||||
emoji = '🤷'
|
||||
emoji = '❓'
|
||||
|
||||
logger.info(f'{emoji} Eval: {response.current_state.evaluation_previous_goal}')
|
||||
logger.info(f'🧠 Memory: {response.current_state.memory}')
|
||||
logger.info(f'🎯 Next goal: {response.current_state.next_goal}')
|
||||
for i, action in enumerate(response.action):
|
||||
logger.info(f'🛠️ Action {i + 1}/{len(response.action)}: {action.model_dump_json(exclude_unset=True)}')
|
||||
|
||||
|
||||
Context = TypeVar('Context')
|
||||
@@ -105,7 +105,7 @@ class Agent(Generic[Context]):
|
||||
browser_session: BrowserSession | None = None,
|
||||
controller: Controller[Context] = Controller(),
|
||||
# Initial agent run parameters
|
||||
sensitive_data: dict[str, str] | None = None,
|
||||
sensitive_data: dict[str, str | dict[str, str]] | None = None,
|
||||
initial_actions: list[dict[str, dict[str, Any]]] | None = None,
|
||||
# Cloud Callbacks
|
||||
register_new_step_callback: (
|
||||
@@ -227,15 +227,15 @@ class Agent(Generic[Context]):
|
||||
self.settings.use_vision_for_planner = False
|
||||
|
||||
logger.info(
|
||||
f'🧠 Starting a v{self.version} agent with main_model={self.model_name}'
|
||||
f'🧠 Starting a browser-use agent {self.version} with base_model={self.model_name}'
|
||||
f'{" +tools" if self.tool_calling_method == "function_calling" else ""}'
|
||||
f'{" +rawtools" if self.tool_calling_method == "raw" else ""}'
|
||||
f'{" +vision" if self.settings.use_vision else ""}'
|
||||
f'{" +memory" if self.enable_memory else ""}, '
|
||||
f'planner_model={self.planner_model_name}'
|
||||
f'{" +memory" if self.enable_memory else ""}'
|
||||
f' extraction_model={getattr(self.settings.page_extraction_llm, "model_name", None)}'
|
||||
f'{f" planner_model={self.planner_model_name}" if self.planner_model_name else ""}'
|
||||
f'{" +reasoning" if self.settings.is_planner_reasoning else ""}'
|
||||
f'{" +vision" if self.settings.use_vision_for_planner else ""}, '
|
||||
f'extraction_model={getattr(self.settings.page_extraction_llm, "model_name", None)} '
|
||||
f'{" +vision" if self.settings.use_vision_for_planner else ""} '
|
||||
)
|
||||
|
||||
# Verify we can connect to the LLM
|
||||
@@ -291,29 +291,71 @@ class Agent(Generic[Context]):
|
||||
assert not (browser_profile and browser_context), 'Cannot provide both browser_profile and browser_context'
|
||||
assert not (browser and browser_context), 'Cannot provide both browser and browser_context'
|
||||
assert not (browser_session and browser_context), 'Cannot provide both browser_session and browser_context'
|
||||
|
||||
browser_profile = browser_profile or DEFAULT_BROWSER_PROFILE
|
||||
self.browser_session = browser_session or BrowserSession(
|
||||
profile=browser_profile, browser=browser, browser_context=browser_context
|
||||
)
|
||||
|
||||
if self.sensitive_data and not self.browser_profile.allowed_domains:
|
||||
logger.error(
|
||||
'⚠️⚠️⚠️ Agent(sensitive_data=••••••••) was provided but BrowserSession(allowed_domains=[...]) is not locked down! ⚠️⚠️⚠️\n'
|
||||
' ☠️ If the agent visits a malicious website and encounters a prompt-injection attack, your sensitive_data may be exposed!\n\n'
|
||||
' https://docs.browser-use.com/customize/browser-settings#restrict-urls\n'
|
||||
'Waiting 10 seconds before continuing... Press [Ctrl+C] to abort.'
|
||||
)
|
||||
if sys.stdin.isatty():
|
||||
try:
|
||||
time.sleep(10)
|
||||
except KeyboardInterrupt:
|
||||
print(
|
||||
'\n\n 🛑 Exiting now... set BrowserSession(allowed_domains=["example.com", "example.org"]) to only domains you trust to see your sensitive_data.'
|
||||
)
|
||||
sys.exit(0)
|
||||
else:
|
||||
pass # no point waiting if we're not in an interactive shell
|
||||
logger.warning('‼️ Continuing with insecure settings for now... but this will become a hard error in the future!')
|
||||
if self.sensitive_data:
|
||||
# Check if sensitive_data has domain-specific credentials
|
||||
has_domain_specific_credentials = any(isinstance(v, dict) for v in self.sensitive_data.values())
|
||||
|
||||
# If no allowed_domains are configured, show a security warning
|
||||
if not self.browser_profile.allowed_domains:
|
||||
logger.error(
|
||||
'⚠️⚠️⚠️ Agent(sensitive_data=••••••••) was provided but BrowserSession(allowed_domains=[...]) is not locked down! ⚠️⚠️⚠️\n'
|
||||
' ☠️ If the agent visits a malicious website and encounters a prompt-injection attack, your sensitive_data may be exposed!\n\n'
|
||||
' https://docs.browser-use.com/customize/browser-settings#restrict-urls\n'
|
||||
'Waiting 10 seconds before continuing... Press [Ctrl+C] to abort.'
|
||||
)
|
||||
if sys.stdin.isatty():
|
||||
try:
|
||||
time.sleep(10)
|
||||
except KeyboardInterrupt:
|
||||
print(
|
||||
'\n\n 🛑 Exiting now... set BrowserSession(allowed_domains=["example.com", "example.org"]) to only domains you trust to see your sensitive_data.'
|
||||
)
|
||||
sys.exit(0)
|
||||
else:
|
||||
pass # no point waiting if we're not in an interactive shell
|
||||
logger.warning('‼️ Continuing with insecure settings for now... but this will become a hard error in the future!')
|
||||
|
||||
# If we're using domain-specific credentials, validate domain patterns
|
||||
elif has_domain_specific_credentials:
|
||||
# For domain-specific format, ensure all domain patterns are included in allowed_domains
|
||||
domain_patterns = [k for k, v in self.sensitive_data.items() if isinstance(v, dict)]
|
||||
|
||||
# Validate each domain pattern against allowed_domains
|
||||
for domain_pattern in domain_patterns:
|
||||
is_allowed = False
|
||||
for allowed_domain in self.browser_profile.allowed_domains:
|
||||
# Special cases that don't require URL matching
|
||||
if domain_pattern == allowed_domain or allowed_domain == '*':
|
||||
is_allowed = True
|
||||
break
|
||||
|
||||
# Need to create example URLs to compare the patterns
|
||||
# Extract the domain parts, ignoring scheme
|
||||
pattern_domain = domain_pattern.split('://')[-1] if '://' in domain_pattern else domain_pattern
|
||||
allowed_domain_part = allowed_domain.split('://')[-1] if '://' in allowed_domain else allowed_domain
|
||||
|
||||
# Check if pattern is covered by an allowed domain
|
||||
# Example: "google.com" is covered by "*.google.com"
|
||||
if pattern_domain == allowed_domain_part or (
|
||||
allowed_domain_part.startswith('*.')
|
||||
and (
|
||||
pattern_domain == allowed_domain_part[2:]
|
||||
or pattern_domain.endswith('.' + allowed_domain_part[2:])
|
||||
)
|
||||
):
|
||||
is_allowed = True
|
||||
break
|
||||
|
||||
if not is_allowed:
|
||||
logger.warning(
|
||||
f'⚠️ Domain pattern "{domain_pattern}" in sensitive_data is not covered by any pattern in allowed_domains={self.browser_profile.allowed_domains}\n'
|
||||
f' This may be a security risk as credentials could be used on unintended domains.'
|
||||
)
|
||||
|
||||
# Callbacks
|
||||
self.register_new_step_callback = register_new_step_callback
|
||||
@@ -427,7 +469,7 @@ class Agent(Generic[Context]):
|
||||
# Azure OpenAI API requires 'tools' parameter for GPT-4
|
||||
# The error 'content must be either a string or an array' occurs when
|
||||
# the API expects a tools array but gets something else
|
||||
if 'gpt-4' in self.model_name.lower():
|
||||
if 'gpt-4-' in self.model_name.lower():
|
||||
return 'tools'
|
||||
else:
|
||||
return 'function_calling'
|
||||
@@ -454,8 +496,7 @@ class Agent(Generic[Context]):
|
||||
@time_execution_async('--step (agent)')
|
||||
async def step(self, step_info: AgentStepInfo | None = None) -> None:
|
||||
"""Execute one step of the task"""
|
||||
logger.info(f'📍 Step {self.state.n_steps}')
|
||||
state = None
|
||||
browser_state_summary = None
|
||||
model_output = None
|
||||
result: list[ActionResult] = []
|
||||
step_start_time = time.time()
|
||||
@@ -465,6 +506,8 @@ class Agent(Generic[Context]):
|
||||
browser_state_summary = await self.browser_session.get_state_summary(cache_clickable_elements_hashes=True)
|
||||
current_page = await self.browser_session.get_current_page()
|
||||
|
||||
self._log_step_context(current_page, browser_state_summary)
|
||||
|
||||
# generate procedural memory if needed
|
||||
if self.enable_memory and self.memory and self.state.n_steps % self.memory.config.memory_interval == 0:
|
||||
self.memory.create_procedural_memory(self.state.n_steps)
|
||||
@@ -615,7 +658,7 @@ class Agent(Generic[Context]):
|
||||
if not result:
|
||||
return
|
||||
|
||||
if state:
|
||||
if browser_state_summary:
|
||||
metadata = StepMetadata(
|
||||
step_number=self.state.n_steps,
|
||||
step_start_time=step_start_time,
|
||||
@@ -624,6 +667,9 @@ class Agent(Generic[Context]):
|
||||
)
|
||||
self._make_history_item(model_output, browser_state_summary, result, metadata)
|
||||
|
||||
# Log step completion summary
|
||||
self._log_step_completion_summary(step_start_time, result)
|
||||
|
||||
@time_execution_async('--handle_step_error (agent)')
|
||||
async def _handle_step_error(self, error: Exception) -> list[ActionResult]:
|
||||
"""Handle all types of errors that can occur during a step"""
|
||||
@@ -719,7 +765,7 @@ class Agent(Generic[Context]):
|
||||
input_messages = self._convert_input_messages(input_messages)
|
||||
|
||||
if self.tool_calling_method == 'raw':
|
||||
logger.debug(f'Using {self.tool_calling_method} for {self.chat_model_library}')
|
||||
self._log_llm_call_info(input_messages, self.tool_calling_method)
|
||||
try:
|
||||
output = self.llm.invoke(input_messages)
|
||||
response = {'raw': output, 'parsed': None}
|
||||
@@ -747,7 +793,7 @@ class Agent(Generic[Context]):
|
||||
raise LLMException(401, 'LLM API call failed') from e
|
||||
|
||||
else:
|
||||
logger.debug(f'Using {self.tool_calling_method} for {self.chat_model_library}')
|
||||
self._log_llm_call_info(input_messages, self.tool_calling_method)
|
||||
structured_llm = self.llm.with_structured_output(self.AgentOutput, include_raw=True, method=self.tool_calling_method)
|
||||
response: dict[str, Any] = await structured_llm.ainvoke(input_messages) # type: ignore
|
||||
|
||||
@@ -792,8 +838,9 @@ class Agent(Generic[Context]):
|
||||
parsed.action = parsed.action[: self.settings.max_actions_per_step]
|
||||
|
||||
if not (hasattr(self.state, 'paused') and (self.state.paused or self.state.stopped)):
|
||||
log_response(parsed)
|
||||
log_response(parsed, self.controller.registry.registry)
|
||||
|
||||
self._log_next_action_summary(parsed)
|
||||
return parsed
|
||||
|
||||
def _log_agent_run(self) -> None:
|
||||
@@ -802,6 +849,97 @@ class Agent(Generic[Context]):
|
||||
|
||||
logger.debug(f'Version: {self.version}, Source: {self.source}')
|
||||
|
||||
def _log_step_context(self, current_page, browser_state_summary) -> None:
|
||||
"""Log step context information"""
|
||||
url_short = current_page.url[:50] + '...' if len(current_page.url) > 50 else current_page.url
|
||||
interactive_count = len(browser_state_summary.selector_map) if browser_state_summary else 0
|
||||
logger.info(
|
||||
f'📍 Step {self.state.n_steps}: Evaluating page with {interactive_count} interactive elements on: {url_short}'
|
||||
)
|
||||
|
||||
def _log_next_action_summary(self, parsed: 'AgentOutput') -> None:
|
||||
"""Log a comprehensive summary of the next action(s)"""
|
||||
if not (logger.isEnabledFor(logging.DEBUG) and parsed.action):
|
||||
return
|
||||
|
||||
action_count = len(parsed.action)
|
||||
|
||||
# Collect action details
|
||||
action_details = []
|
||||
for i, action in enumerate(parsed.action):
|
||||
action_data = action.model_dump(exclude_unset=True)
|
||||
action_name = next(iter(action_data.keys())) if action_data else 'unknown'
|
||||
action_params = action_data.get(action_name, {}) if action_data else {}
|
||||
|
||||
# Format key parameters concisely
|
||||
param_summary = []
|
||||
if isinstance(action_params, dict):
|
||||
for key, value in action_params.items():
|
||||
if key == 'index':
|
||||
param_summary.append(f'#{value}')
|
||||
elif key == 'text' and isinstance(value, str):
|
||||
text_preview = value[:30] + '...' if len(value) > 30 else value
|
||||
param_summary.append(f'text="{text_preview}"')
|
||||
elif key == 'url':
|
||||
param_summary.append(f'url="{value}"')
|
||||
elif key == 'success':
|
||||
param_summary.append(f'success={value}')
|
||||
elif isinstance(value, (str, int, bool)) and len(str(value)) < 20:
|
||||
param_summary.append(f'{key}={value}')
|
||||
|
||||
param_str = f'({", ".join(param_summary)})' if param_summary else ''
|
||||
action_details.append(f'{action_name}{param_str}')
|
||||
|
||||
# Create summary based on single vs multi-action
|
||||
if action_count == 1:
|
||||
logger.info(f'⚡️ Decided next action: {action_details[0]}')
|
||||
else:
|
||||
summary_lines = [f'⚡️ Decided next {action_count} multi-actions:']
|
||||
for i, detail in enumerate(action_details):
|
||||
summary_lines.append(f' {i + 1}. {detail}')
|
||||
logger.info('\n'.join(summary_lines))
|
||||
|
||||
def _log_step_completion_summary(self, step_start_time: float, result: list[ActionResult]) -> None:
|
||||
"""Log step completion summary with action count, timing, and success/failure stats"""
|
||||
if not result:
|
||||
return
|
||||
|
||||
step_duration = time.time() - step_start_time
|
||||
action_count = len(result)
|
||||
|
||||
# Count success and failures
|
||||
success_count = sum(1 for r in result if not r.error)
|
||||
failure_count = action_count - success_count
|
||||
|
||||
# Format success/failure indicators
|
||||
success_indicator = f'✅ {success_count}' if success_count > 0 else ''
|
||||
failure_indicator = f'❌ {failure_count}' if failure_count > 0 else ''
|
||||
status_parts = [part for part in [success_indicator, failure_indicator] if part]
|
||||
status_str = ' | '.join(status_parts) if status_parts else '✅ 0'
|
||||
|
||||
logger.info(f'📍 Step {self.state.n_steps}: Ran {action_count} actions in {step_duration:.2f}s: {status_str}')
|
||||
|
||||
def _log_llm_call_info(self, input_messages: list[BaseMessage], method: str) -> None:
|
||||
"""Log comprehensive information about the LLM call being made"""
|
||||
# Count messages and check for images
|
||||
message_count = len(input_messages)
|
||||
total_chars = sum(len(str(msg.content)) for msg in input_messages)
|
||||
has_images = any(
|
||||
hasattr(msg, 'content')
|
||||
and isinstance(msg.content, list)
|
||||
and any(isinstance(item, dict) and item.get('type') == 'image_url' for item in msg.content)
|
||||
for msg in input_messages
|
||||
)
|
||||
current_tokens = getattr(self._message_manager.state.history, 'current_tokens', 0)
|
||||
|
||||
# Determine output type
|
||||
output_type = 'raw text output' if method == 'raw' else 'structured output + tools'
|
||||
image_status = '📷 images' if has_images else 'no images'
|
||||
|
||||
logger.info(
|
||||
f'🧠 LLM call: {self.chat_model_library} ({method}) | {message_count} msgs, ~{current_tokens} tokens, {total_chars} chars | {image_status} | {output_type}'
|
||||
)
|
||||
|
||||
def _log_agent_event(self, max_steps: int, agent_run_error: str | None = None) -> None:
|
||||
"""Sent the agent event for this run to telemetry"""
|
||||
|
||||
@@ -923,7 +1061,7 @@ class Agent(Generic[Context]):
|
||||
|
||||
# Check control flags before each step
|
||||
if self.state.stopped:
|
||||
logger.info('Agent stopped')
|
||||
logger.info('🛑 Agent stopped')
|
||||
agent_run_error = 'Agent stopped programmatically'
|
||||
break
|
||||
|
||||
@@ -989,7 +1127,6 @@ class Agent(Generic[Context]):
|
||||
if not self._force_exit_telemetry_logged: # MODIFIED: Check the flag
|
||||
try:
|
||||
self._log_agent_event(max_steps=max_steps, agent_run_error=agent_run_error)
|
||||
logger.info('Agent run telemetry logged.')
|
||||
except Exception as log_e: # Catch potential errors during logging itself
|
||||
logger.error(f'Failed to log telemetry event: {log_e}', exc_info=True)
|
||||
else:
|
||||
@@ -1075,7 +1212,10 @@ class Agent(Generic[Context]):
|
||||
|
||||
results.append(result)
|
||||
|
||||
logger.debug(f'Executed action {i + 1} / {len(actions)}')
|
||||
# Get action name from the action model
|
||||
action_data = action.model_dump(exclude_unset=True)
|
||||
action_name = next(iter(action_data.keys())) if action_data else 'unknown'
|
||||
logger.info(f'☑️ Executed action {i + 1}/{len(actions)}: {action_name}')
|
||||
if results[-1].is_done or results[-1].error or i == len(actions) - 1:
|
||||
break
|
||||
|
||||
@@ -1140,14 +1280,13 @@ class Agent(Generic[Context]):
|
||||
|
||||
async def log_completion(self) -> None:
|
||||
"""Log the completion of the task"""
|
||||
logger.info('✅ Task completed')
|
||||
if self.state.history.is_successful():
|
||||
logger.info('✅ Successfully')
|
||||
logger.info('✅ Task completed successfully')
|
||||
else:
|
||||
logger.info('❌ Unfinished')
|
||||
logger.info('❌ Task completed without success')
|
||||
|
||||
total_tokens = self.state.history.total_input_tokens()
|
||||
logger.info(f'📝 Total input tokens used (approximate): {total_tokens}')
|
||||
logger.debug(f'📝 Total input tokens used (approximate): {total_tokens}')
|
||||
|
||||
if self.register_done_callback:
|
||||
if inspect.iscoroutinefunction(self.register_done_callback):
|
||||
|
||||
@@ -337,7 +337,7 @@ class BrowserContextArgs(BaseModel):
|
||||
proxy: ProxySettings | None = None
|
||||
permissions: list[str] = Field(
|
||||
default_factory=lambda: ['clipboard-read', 'clipboard-write', 'notifications'],
|
||||
description='Browser permissions to grant.',
|
||||
description='Browser permissions to grant (see playwright docs for valid permissions).',
|
||||
# clipboard is for google sheets and pyperclip automations
|
||||
# notifications are to avoid browser fingerprinting
|
||||
)
|
||||
@@ -552,7 +552,10 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro
|
||||
# custom options we provide that aren't native playwright kwargs
|
||||
disable_security: bool = Field(default=False, description='Disable browser security features.')
|
||||
deterministic_rendering: bool = Field(default=False, description='Enable deterministic rendering flags.')
|
||||
allowed_domains: list[str] | None = Field(default=None, description='List of allowed domains for navigation.')
|
||||
allowed_domains: list[str] | None = Field(
|
||||
default=None,
|
||||
description='List of allowed domains for navigation e.g. ["*.google.com", "https://example.com", "chrome-extension://*"]',
|
||||
)
|
||||
keep_alive: bool | None = Field(default=None, description='Keep browser alive after agent run.')
|
||||
window_size: ViewportSize | None = Field(
|
||||
default=None,
|
||||
@@ -570,6 +573,8 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro
|
||||
)
|
||||
|
||||
# --- Page load/wait timings ---
|
||||
default_navigation_timeout: float | None = Field(default=None, description='Default page navigation timeout.')
|
||||
default_timeout: float | None = Field(default=None, description='Default playwright call timeout.')
|
||||
minimum_wait_page_load_time: float = Field(default=0.25, description='Minimum time to wait before capturing page state.')
|
||||
wait_for_network_idle_page_load_time: float = Field(default=0.5, description='Time to wait for network idle.')
|
||||
maximum_wait_page_load_time: float = Field(default=5.0, description='Maximum time to wait for page load.')
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -28,7 +28,7 @@ class BrowserStateSummary(DOMState):
|
||||
url: str
|
||||
title: str
|
||||
tabs: list[TabInfo]
|
||||
screenshot: str | None = None
|
||||
screenshot: str | None = field(default=None, repr=False)
|
||||
pixels_above: int = 0
|
||||
pixels_below: int = 0
|
||||
browser_errors: list[str] = field(default_factory=list)
|
||||
|
||||
@@ -1,10 +1,12 @@
|
||||
import asyncio
|
||||
import logging
|
||||
import re
|
||||
from collections.abc import Callable
|
||||
from inspect import iscoroutinefunction, signature
|
||||
from typing import Any, Generic, Optional, TypeVar
|
||||
|
||||
from langchain_core.language_models.chat_models import BaseChatModel
|
||||
from playwright.async_api import Page
|
||||
from pydantic import BaseModel, Field, create_model
|
||||
|
||||
from browser_use.browser import BrowserSession
|
||||
@@ -18,13 +20,33 @@ from browser_use.telemetry.views import (
|
||||
ControllerRegisteredFunctionsTelemetryEvent,
|
||||
RegisteredFunction,
|
||||
)
|
||||
from browser_use.utils import time_execution_async
|
||||
from browser_use.utils import match_url_with_domain_pattern, time_execution_async
|
||||
|
||||
Context = TypeVar('Context')
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SpecialActionParameters(BaseModel):
|
||||
"""Model defining all special parameters that can be injected into actions"""
|
||||
|
||||
model_config = {'arbitrary_types_allowed': True}
|
||||
|
||||
context: Context | None = None
|
||||
browser_session: BrowserSession | None = None
|
||||
browser: BrowserSession | None = None # legacy support
|
||||
browser_context: BrowserSession | None = None # legacy support
|
||||
page: Page | None = None
|
||||
page_extraction_llm: BaseChatModel | None = None
|
||||
available_file_paths: list[str] | None = None
|
||||
has_sensitive_data: bool = False
|
||||
|
||||
@classmethod
|
||||
def get_browser_requiring_params(cls) -> set[str]:
|
||||
"""Get parameter names that require browser_session"""
|
||||
return {'browser_session', 'browser', 'browser_context', 'page'}
|
||||
|
||||
|
||||
class Registry(Generic[Context]):
|
||||
"""Service for registering and managing actions"""
|
||||
|
||||
@@ -37,14 +59,11 @@ class Registry(Generic[Context]):
|
||||
def _create_param_model(self, function: Callable) -> type[BaseModel]:
|
||||
"""Creates a Pydantic model from function signature"""
|
||||
sig = signature(function)
|
||||
special_param_names = set(SpecialActionParameters.model_fields.keys())
|
||||
params = {
|
||||
name: (param.annotation, ... if param.default == param.empty else param.default)
|
||||
for name, param in sig.parameters.items()
|
||||
if name != 'browser'
|
||||
and name != 'page_extraction_llm'
|
||||
and name != 'available_file_paths'
|
||||
and name != 'browser_session'
|
||||
and name != 'browser_context'
|
||||
if name not in special_param_names
|
||||
}
|
||||
# TODO: make the types here work
|
||||
return create_model(
|
||||
@@ -58,9 +77,15 @@ class Registry(Generic[Context]):
|
||||
description: str,
|
||||
param_model: type[BaseModel] | None = None,
|
||||
domains: list[str] | None = None,
|
||||
allowed_domains: list[str] | None = None,
|
||||
page_filter: Callable[[Any], bool] | None = None,
|
||||
):
|
||||
"""Decorator for registering actions"""
|
||||
# Handle aliases: domains and allowed_domains are the same parameter
|
||||
if allowed_domains is not None and domains is not None:
|
||||
raise ValueError("Cannot specify both 'domains' and 'allowed_domains' - they are aliases for the same parameter")
|
||||
|
||||
final_domains = allowed_domains if allowed_domains is not None else domains
|
||||
|
||||
def decorator(func: Callable):
|
||||
# Skip registration if action is in exclude_actions
|
||||
@@ -89,7 +114,7 @@ class Registry(Generic[Context]):
|
||||
description=description,
|
||||
function=wrapped_func,
|
||||
param_model=actual_param_model,
|
||||
domains=domains,
|
||||
domains=final_domains,
|
||||
page_filter=page_filter,
|
||||
)
|
||||
self.registry.actions[func.__name__] = action
|
||||
@@ -104,12 +129,12 @@ class Registry(Generic[Context]):
|
||||
params: dict,
|
||||
browser_session: BrowserSession | None = None,
|
||||
page_extraction_llm: BaseChatModel | None = None,
|
||||
sensitive_data: dict[str, str] | None = None,
|
||||
sensitive_data: dict[str, str | dict[str, str]] | None = None,
|
||||
available_file_paths: list[str] | None = None,
|
||||
#
|
||||
context: Context | None = None,
|
||||
) -> Any:
|
||||
"""Execute a registered action"""
|
||||
"""Execute a registered action with enhanced parameter handling for backward compatibility"""
|
||||
if action_name not in self.registry.actions:
|
||||
raise ValueError(f'Action {action_name} not found')
|
||||
|
||||
@@ -121,77 +146,184 @@ class Registry(Generic[Context]):
|
||||
except Exception as e:
|
||||
raise ValueError(f'Invalid parameters {params} for action {action_name}: {type(e)}: {e}') from e
|
||||
|
||||
# Check if the first parameter is a Pydantic model
|
||||
# Analyze function signature for smart parameter injection
|
||||
sig = signature(action.function)
|
||||
parameters = list(sig.parameters.values())
|
||||
is_pydantic = parameters and issubclass(parameters[0].annotation, BaseModel)
|
||||
parameter_names = [param.name for param in parameters]
|
||||
|
||||
if sensitive_data:
|
||||
validated_params = self._replace_sensitive_data(validated_params, sensitive_data)
|
||||
# Check if the first parameter is a Pydantic model (using original safe logic)
|
||||
# Only consider it pydantic if:
|
||||
# 1. There are parameters
|
||||
# 2. First parameter has a BaseModel annotation
|
||||
# 3. AND the function signature actually takes a BaseModel as first param (not auto-generated)
|
||||
try:
|
||||
is_pydantic = (
|
||||
parameters
|
||||
and len(parameters) > 0
|
||||
and hasattr(parameters[0], 'annotation')
|
||||
and parameters[0].annotation != parameters[0].empty
|
||||
and issubclass(parameters[0].annotation, BaseModel)
|
||||
and
|
||||
# Additional check: make sure the first parameter name suggests it's actually a pydantic model
|
||||
parameters[0].name in ['params', 'param', 'model']
|
||||
or parameters[0].name.endswith('_model')
|
||||
)
|
||||
except (TypeError, AttributeError):
|
||||
is_pydantic = False
|
||||
|
||||
# Check if the action requires browser
|
||||
if sensitive_data:
|
||||
validated_params = self._replace_sensitive_data(validated_params, sensitive_data, browser_session)
|
||||
|
||||
# Check if the action requires special parameters and validate they're provided
|
||||
if (
|
||||
'browser_session' in parameter_names or 'browser' in parameter_names or 'browser_context' in parameter_names
|
||||
'browser_session' in parameter_names
|
||||
or 'browser' in parameter_names
|
||||
or 'browser_context' in parameter_names
|
||||
or 'page' in parameter_names
|
||||
) and not browser_session:
|
||||
raise ValueError(f'Action {action_name} requires browser_session but none provided.')
|
||||
if 'page_extraction_llm' in parameter_names and not page_extraction_llm:
|
||||
raise ValueError(f'Action {action_name} requires page_extraction_llm but none provided.')
|
||||
if 'available_file_paths' in parameter_names and not available_file_paths:
|
||||
raise ValueError(f'Action {action_name} requires available_file_paths but none provided.')
|
||||
|
||||
if 'context' in parameter_names and not context:
|
||||
raise ValueError(f'Action {action_name} requires context but none provided.')
|
||||
|
||||
# Prepare arguments based on parameter type
|
||||
extra_args = {}
|
||||
if 'context' in parameter_names:
|
||||
extra_args['context'] = context
|
||||
if 'browser_session' in parameter_names:
|
||||
extra_args['browser_session'] = browser_session
|
||||
if 'browser' in parameter_names: # support legacy browser: BrowserContext arg
|
||||
# Create special parameters model with all available values
|
||||
special_params_data = {
|
||||
'context': context,
|
||||
'browser_session': browser_session,
|
||||
'browser': browser_session, # legacy support
|
||||
'browser_context': browser_session, # legacy support
|
||||
'page_extraction_llm': page_extraction_llm,
|
||||
'available_file_paths': available_file_paths,
|
||||
'has_sensitive_data': action_name == 'input_text' and bool(sensitive_data),
|
||||
}
|
||||
|
||||
# Handle async page parameter if needed
|
||||
if 'page' in parameter_names and browser_session:
|
||||
special_params_data['page'] = await browser_session.get_current_page()
|
||||
|
||||
# Create special parameters object without validation to preserve BrowserSession state
|
||||
# We bypass model_validate to avoid copying BrowserSession and losing private attributes
|
||||
special_params = SpecialActionParameters.model_construct(**special_params_data)
|
||||
|
||||
# Log legacy usage
|
||||
if 'browser' in parameter_names:
|
||||
logger.debug(
|
||||
f'You should update this action {action_name}(browser: BrowserContext) -> to take {action_name}(browser_session: BrowserSession) instead'
|
||||
f'You should update this action {action_name}(browser: BrowserContext) -> to take {action_name}(browser_session: BrowserSession) instead'
|
||||
)
|
||||
extra_args['browser'] = browser_session
|
||||
if 'browser_context' in parameter_names: # support legacy browser: BrowserContext arg
|
||||
if 'browser_context' in parameter_names:
|
||||
logger.debug(
|
||||
f'You should update this action {action_name}(browser_context: BrowserContext) -> to take {action_name}(browser_session: BrowserSession) instead'
|
||||
f'You should update this action {action_name}(browser_context: BrowserContext) -> to take {action_name}(browser_session: BrowserSession) instead'
|
||||
)
|
||||
extra_args['browser_context'] = browser_session
|
||||
if 'page_extraction_llm' in parameter_names:
|
||||
extra_args['page_extraction_llm'] = page_extraction_llm
|
||||
if 'available_file_paths' in parameter_names:
|
||||
extra_args['available_file_paths'] = available_file_paths
|
||||
if action_name == 'input_text' and sensitive_data:
|
||||
extra_args['has_sensitive_data'] = True
|
||||
|
||||
# Enhanced parameter injection logic using Pydantic
|
||||
if is_pydantic:
|
||||
return await action.function(validated_params, **extra_args)
|
||||
return await action.function(**validated_params.model_dump(), **extra_args)
|
||||
# For pydantic functions: function(pydantic_model, **special_params)
|
||||
# Extract special parameters needed by this function (keep objects, don't serialize)
|
||||
needed_special_params = set(parameter_names[1:]) & set(SpecialActionParameters.model_fields.keys())
|
||||
injection_params = {}
|
||||
for param_name in needed_special_params:
|
||||
value = getattr(special_params, param_name, None)
|
||||
if value is not None:
|
||||
injection_params[param_name] = value
|
||||
|
||||
return await action.function(validated_params, **injection_params)
|
||||
else:
|
||||
# For individual parameter functions: function(**all_params)
|
||||
# Merge user params with needed special params, avoiding conflicts
|
||||
param_dict = validated_params.model_dump()
|
||||
|
||||
# Extract special parameters needed by this function (keep objects, don't serialize)
|
||||
needed_special_params = set(parameter_names) & set(SpecialActionParameters.model_fields.keys())
|
||||
injection_params = {}
|
||||
for param_name in needed_special_params:
|
||||
value = getattr(special_params, param_name, None)
|
||||
if value is not None:
|
||||
injection_params[param_name] = value
|
||||
|
||||
# Remove any special params from user params to avoid conflicts (special params take precedence)
|
||||
for param_name in injection_params:
|
||||
if param_name in param_dict:
|
||||
logger.debug(f'Removing {param_name} from param_dict to avoid conflict')
|
||||
param_dict.pop(param_name)
|
||||
|
||||
# Combine all parameters
|
||||
final_params = {**param_dict, **injection_params}
|
||||
return await action.function(**final_params)
|
||||
|
||||
except Exception as e:
|
||||
raise RuntimeError(f'Error executing action {action_name}: {str(e)}') from e
|
||||
|
||||
def _replace_sensitive_data(self, params: BaseModel, sensitive_data: dict[str, str]) -> BaseModel:
|
||||
"""Replaces the sensitive data in the params"""
|
||||
# if there are any str with <secret>placeholder</secret> in the params, replace them with the actual value from sensitive_data
|
||||
def _log_sensitive_data_usage(self, placeholders_used: set[str], current_url: str | None) -> None:
|
||||
"""Log when sensitive data is being used on a page"""
|
||||
if placeholders_used:
|
||||
url_info = f' on {current_url}' if current_url and current_url != 'about:blank' else ''
|
||||
logger.info(f'🔒 Using sensitive data placeholders: {", ".join(sorted(placeholders_used))}{url_info}')
|
||||
|
||||
import logging
|
||||
import re
|
||||
def _replace_sensitive_data(
|
||||
self, params: BaseModel, sensitive_data: dict[str, Any], browser_session: BrowserSession = None
|
||||
) -> BaseModel:
|
||||
"""
|
||||
Replaces sensitive data placeholders in params with actual values.
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
Args:
|
||||
params: The parameter object containing <secret>placeholder</secret> tags
|
||||
sensitive_data: Dictionary of sensitive data, either in old format {key: value}
|
||||
or new format {domain_pattern: {key: value}}
|
||||
browser_session: Optional browser session to get the current URL for domain matching
|
||||
|
||||
Returns:
|
||||
BaseModel: The parameter object with placeholders replaced by actual values
|
||||
"""
|
||||
secret_pattern = re.compile(r'<secret>(.*?)</secret>')
|
||||
|
||||
# Set to track all missing placeholders across the full object
|
||||
all_missing_placeholders = set()
|
||||
# Set to track successfully replaced placeholders
|
||||
replaced_placeholders = set()
|
||||
|
||||
# Determine current URL if browser_session is provided
|
||||
current_url = None
|
||||
if browser_session:
|
||||
try:
|
||||
# Get current URL from browser session - do this synchronously to avoid complications
|
||||
loop = asyncio.get_event_loop()
|
||||
current_page = loop.run_until_complete(browser_session.get_current_page())
|
||||
current_url = current_page.url if current_page else None
|
||||
except Exception as e:
|
||||
logger.debug(f'Failed to get current URL from browser session: {e}')
|
||||
|
||||
# Process sensitive data based on format and current URL
|
||||
applicable_secrets = {}
|
||||
|
||||
for domain_or_key, content in sensitive_data.items():
|
||||
if isinstance(content, dict):
|
||||
# New format: {domain_pattern: {key: value}}
|
||||
# Only include secrets for domains that match the current URL
|
||||
if current_url is None:
|
||||
# No URL available, include all secrets for all domains
|
||||
applicable_secrets.update(content)
|
||||
elif current_url != 'about:blank':
|
||||
# Don't expose domain-specific secrets on about:blank
|
||||
if match_url_with_domain_pattern(current_url, domain_or_key):
|
||||
applicable_secrets.update(content)
|
||||
else:
|
||||
# Old format: {key: value}
|
||||
applicable_secrets[domain_or_key] = content
|
||||
|
||||
# Filter out empty values
|
||||
applicable_secrets = {k: v for k, v in applicable_secrets.items() if v}
|
||||
|
||||
def replace_secrets(value):
|
||||
if isinstance(value, str):
|
||||
matches = secret_pattern.findall(value)
|
||||
|
||||
for placeholder in matches:
|
||||
if placeholder in sensitive_data and sensitive_data[placeholder]:
|
||||
value = value.replace(f'<secret>{placeholder}</secret>', sensitive_data[placeholder])
|
||||
if placeholder in applicable_secrets:
|
||||
value = value.replace(f'<secret>{placeholder}</secret>', applicable_secrets[placeholder])
|
||||
replaced_placeholders.add(placeholder)
|
||||
else:
|
||||
# Keep track of missing placeholders
|
||||
all_missing_placeholders.add(placeholder)
|
||||
@@ -207,6 +339,9 @@ class Registry(Generic[Context]):
|
||||
params_dump = params.model_dump()
|
||||
processed_params = replace_secrets(params_dump)
|
||||
|
||||
# Log sensitive data usage
|
||||
self._log_sensitive_data_usage(replaced_placeholders, current_url)
|
||||
|
||||
# Log a warning if any placeholders are missing
|
||||
if all_missing_placeholders:
|
||||
logger.warning(f'Missing or empty keys in sensitive_data dictionary: {", ".join(all_missing_placeholders)}')
|
||||
|
||||
@@ -76,7 +76,7 @@ class ActionRegistry(BaseModel):
|
||||
Match a list of domain glob patterns against a URL.
|
||||
|
||||
Args:
|
||||
domain_patterns: A list of domain patterns that can include glob patterns (* wildcard)
|
||||
domains: A list of domain patterns that can include glob patterns (* wildcard)
|
||||
url: The URL to match against
|
||||
|
||||
Returns:
|
||||
@@ -86,26 +86,13 @@ class ActionRegistry(BaseModel):
|
||||
if domains is None or not url:
|
||||
return True
|
||||
|
||||
import fnmatch
|
||||
from urllib.parse import urlparse
|
||||
# Use the centralized URL matching logic from utils
|
||||
from browser_use.utils import match_url_with_domain_pattern
|
||||
|
||||
# Parse the URL to get the domain
|
||||
try:
|
||||
parsed_url = urlparse(url)
|
||||
if not parsed_url.netloc:
|
||||
return False
|
||||
|
||||
domain = parsed_url.netloc
|
||||
# Remove port if present
|
||||
if ':' in domain:
|
||||
domain = domain.split(':')[0]
|
||||
|
||||
for domain_pattern in domains:
|
||||
if fnmatch.fnmatch(domain, domain_pattern): # Perform glob *.matching.*
|
||||
return True
|
||||
return False
|
||||
except Exception:
|
||||
return False
|
||||
for domain_pattern in domains:
|
||||
if match_url_with_domain_pattern(url, domain_pattern):
|
||||
return True
|
||||
return False
|
||||
|
||||
@staticmethod
|
||||
def _match_page_filter(page_filter: Callable[[Page], bool] | None, page: Page) -> bool:
|
||||
|
||||
@@ -79,18 +79,19 @@ class Controller(Generic[Context]):
|
||||
|
||||
# Basic Navigation Actions
|
||||
@self.registry.action(
|
||||
'Search the query in Google in the current tab, the query should be a search query like humans search in Google, concrete and not vague or super long. More the single most important items. ',
|
||||
'Search the query in Google, the query should be a search query like humans search in Google, concrete and not vague or super long.',
|
||||
param_model=SearchGoogleAction,
|
||||
)
|
||||
async def search_google(params: SearchGoogleAction, browser_session: BrowserSession):
|
||||
search_url = f'https://www.google.com/search?q={params.query}&udm=14'
|
||||
|
||||
page = await browser_session.get_current_page()
|
||||
if page:
|
||||
if page.url in ('about:blank', 'https://www.google.com'):
|
||||
await page.goto(search_url)
|
||||
await page.wait_for_load_state()
|
||||
else:
|
||||
page = await browser_session.create_new_tab(search_url)
|
||||
|
||||
msg = f'🔍 Searched for "{params.query}" in Google'
|
||||
logger.info(msg)
|
||||
return ActionResult(extracted_content=msg, include_in_memory=True)
|
||||
@@ -108,7 +109,7 @@ class Controller(Generic[Context]):
|
||||
return ActionResult(extracted_content=msg, include_in_memory=True)
|
||||
|
||||
@self.registry.action('Go back', param_model=NoParamsAction)
|
||||
async def go_back(_: NoParamsAction, browser_session: BrowserSession):
|
||||
async def go_back(params: NoParamsAction, browser_session: BrowserSession):
|
||||
await browser_session.go_back()
|
||||
msg = '🔙 Navigated back'
|
||||
logger.info(msg)
|
||||
@@ -179,9 +180,7 @@ class Controller(Generic[Context]):
|
||||
return ActionResult(extracted_content=msg, include_in_memory=True)
|
||||
|
||||
# Save PDF
|
||||
@self.registry.action(
|
||||
'Save the current page as a PDF file',
|
||||
)
|
||||
@self.registry.action('Save the current page as a PDF file')
|
||||
async def save_pdf(browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
short_url = re.sub(r'^https?://(?:www\.)?|/$', '', page.url)
|
||||
@@ -205,7 +204,7 @@ class Controller(Generic[Context]):
|
||||
logger.info(msg)
|
||||
return ActionResult(extracted_content=msg, include_in_memory=True)
|
||||
|
||||
@self.registry.action('Open url in new tab', param_model=OpenTabAction)
|
||||
@self.registry.action('Open a specific url in new tab', param_model=OpenTabAction)
|
||||
async def open_tab(params: OpenTabAction, browser_session: BrowserSession):
|
||||
await browser_session.create_new_tab(params.url)
|
||||
msg = f'🔗 Opened new tab with {params.url}'
|
||||
@@ -218,22 +217,27 @@ class Controller(Generic[Context]):
|
||||
page = await browser_session.get_current_page()
|
||||
url = page.url
|
||||
await page.close()
|
||||
msg = f'❌ Closed tab #{params.page_id} with url {url}'
|
||||
new_page = await browser_session.get_current_page()
|
||||
new_page_idx = browser_session.tabs.index(new_page)
|
||||
msg = f'❌ Closed tab #{params.page_id} with {url}, now focused on tab #{new_page_idx} with url {new_page.url}'
|
||||
logger.info(msg)
|
||||
return ActionResult(extracted_content=msg, include_in_memory=True)
|
||||
|
||||
# Content Actions
|
||||
@self.registry.action(
|
||||
'Extract page content to retrieve specific information from the page, e.g. all company names, a specific description, all information about, links with companies in structured format or simply links',
|
||||
'Extract page content to retrieve specific information from the page, e.g. all company names, a specific description, all information about xyc, 4 links with companies in structured format. Use include_links true if the goal requires links',
|
||||
)
|
||||
async def extract_content(
|
||||
goal: str, should_strip_link_urls: bool, browser_session: BrowserSession, page_extraction_llm: BaseChatModel
|
||||
goal: str,
|
||||
browser_session: BrowserSession,
|
||||
page_extraction_llm: BaseChatModel,
|
||||
include_links: bool = False,
|
||||
):
|
||||
page = await browser_session.get_current_page()
|
||||
import markdownify
|
||||
|
||||
strip = []
|
||||
if should_strip_link_urls:
|
||||
if not include_links:
|
||||
strip = ['a', 'img']
|
||||
|
||||
content = markdownify.markdownify(await page.content(), strip=strip)
|
||||
@@ -257,6 +261,28 @@ class Controller(Generic[Context]):
|
||||
logger.info(msg)
|
||||
return ActionResult(extracted_content=msg)
|
||||
|
||||
@self.registry.action(
|
||||
'Get the accessibility tree of the page in the format "role name" with the number_of_elements to return',
|
||||
)
|
||||
async def get_ax_tree(number_of_elements: int, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
node = await page.accessibility.snapshot(interesting_only=True)
|
||||
|
||||
def flatten_ax_tree(node, lines):
|
||||
if not node:
|
||||
return
|
||||
role = node.get('role', '')
|
||||
name = node.get('name', '')
|
||||
lines.append(f'{role} {name}')
|
||||
for child in node.get('children', []):
|
||||
flatten_ax_tree(child, lines)
|
||||
|
||||
lines = []
|
||||
flatten_ax_tree(node, lines)
|
||||
msg = '\n'.join(lines)
|
||||
logger.info(msg)
|
||||
return ActionResult(extracted_content=msg, include_in_memory=False)
|
||||
|
||||
@self.registry.action(
|
||||
'Scroll down the page by pixel amount - if none is given, scroll one page',
|
||||
param_model=ScrollAction,
|
||||
@@ -343,7 +369,7 @@ class Controller(Generic[Context]):
|
||||
if await locator.count() == 0:
|
||||
continue
|
||||
|
||||
element = await locator.first
|
||||
element = locator.first
|
||||
is_visible = await element.is_visible()
|
||||
bbox = await element.bounding_box()
|
||||
|
||||
@@ -747,8 +773,8 @@ class Controller(Generic[Context]):
|
||||
logger.error(error_msg)
|
||||
return ActionResult(error=error_msg, include_in_memory=True)
|
||||
|
||||
@self.registry.action('Google Sheets: Get the contents of the entire sheet', domains=['sheets.google.com'])
|
||||
async def get_sheet_contents(browser_session: BrowserSession):
|
||||
@self.registry.action('Google Sheets: Get the contents of the entire sheet', domains=['https://docs.google.com'])
|
||||
async def read_sheet_contents(browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
# select all cells
|
||||
@@ -760,7 +786,44 @@ class Controller(Generic[Context]):
|
||||
extracted_tsv = await page.evaluate('() => navigator.clipboard.readText()')
|
||||
return ActionResult(extracted_content=extracted_tsv, include_in_memory=True)
|
||||
|
||||
@self.registry.action('Google Sheets: Select a specific cell or range of cells', domains=['sheets.google.com'])
|
||||
@self.registry.action('Google Sheets: Get the contents of a cell or range of cells', domains=['https://docs.google.com'])
|
||||
async def read_cell_contents(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
await page.keyboard.press('ControlOrMeta+C')
|
||||
await asyncio.sleep(0.1)
|
||||
extracted_tsv = await page.evaluate('() => navigator.clipboard.readText()')
|
||||
return ActionResult(extracted_content=extracted_tsv, include_in_memory=True)
|
||||
|
||||
@self.registry.action(
|
||||
'Google Sheets: Update the content of a cell or range of cells', domains=['https://docs.google.com']
|
||||
)
|
||||
async def update_cell_contents(browser_session: BrowserSession, cell_or_range: str, new_contents_tsv: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
# simulate paste event from clipboard with TSV content
|
||||
await page.evaluate(f"""
|
||||
const clipboardData = new DataTransfer();
|
||||
clipboardData.setData('text/plain', `{new_contents_tsv}`);
|
||||
document.activeElement.dispatchEvent(new ClipboardEvent('paste', {{clipboardData}}));
|
||||
""")
|
||||
|
||||
return ActionResult(extracted_content=f'Updated cells: {cell_or_range} = {new_contents_tsv}', include_in_memory=False)
|
||||
|
||||
@self.registry.action('Google Sheets: Clear whatever cells are currently selected', domains=['https://docs.google.com'])
|
||||
async def clear_cell_contents(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
await page.keyboard.press('Backspace')
|
||||
return ActionResult(extracted_content=f'Cleared cells: {cell_or_range}', include_in_memory=False)
|
||||
|
||||
@self.registry.action('Google Sheets: Select a specific cell or range of cells', domains=['https://docs.google.com'])
|
||||
async def select_cell_or_range(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
@@ -777,30 +840,13 @@ class Controller(Generic[Context]):
|
||||
await page.keyboard.press('Enter')
|
||||
await asyncio.sleep(0.2)
|
||||
await page.keyboard.press('Escape') # to make sure the popup still closes in the case where the jump failed
|
||||
return ActionResult(extracted_content=f'Selected cell {cell_or_range}', include_in_memory=False)
|
||||
return ActionResult(extracted_content=f'Selected cells: {cell_or_range}', include_in_memory=False)
|
||||
|
||||
@self.registry.action(
|
||||
'Google Sheets: Get the contents of a specific cell or range of cells', domains=['sheets.google.com']
|
||||
'Google Sheets: Fallback method to type text into (only one) currently selected cell',
|
||||
domains=['https://docs.google.com'],
|
||||
)
|
||||
async def get_range_contents(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
await page.keyboard.press('ControlOrMeta+C')
|
||||
await asyncio.sleep(0.1)
|
||||
extracted_tsv = await page.evaluate('() => navigator.clipboard.readText()')
|
||||
return ActionResult(extracted_content=extracted_tsv, include_in_memory=True)
|
||||
|
||||
@self.registry.action('Google Sheets: Clear the currently selected cells', domains=['sheets.google.com'])
|
||||
async def clear_selected_range(browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await page.keyboard.press('Backspace')
|
||||
return ActionResult(extracted_content='Cleared selected range', include_in_memory=False)
|
||||
|
||||
@self.registry.action('Google Sheets: Input text into the currently selected cell', domains=['sheets.google.com'])
|
||||
async def input_selected_cell_text(browser_session: BrowserSession, text: str):
|
||||
async def fallback_input_into_single_selected_cell(browser_session: BrowserSession, text: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await page.keyboard.type(text, delay=0.1)
|
||||
@@ -808,21 +854,6 @@ class Controller(Generic[Context]):
|
||||
await page.keyboard.press('ArrowUp')
|
||||
return ActionResult(extracted_content=f'Inputted text {text}', include_in_memory=False)
|
||||
|
||||
@self.registry.action('Google Sheets: Batch update a range of cells', domains=['sheets.google.com'])
|
||||
async def update_range_contents(browser_session: BrowserSession, range: str, new_contents_tsv: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, range)
|
||||
|
||||
# simulate paste event from clipboard with TSV content
|
||||
await page.evaluate(f"""
|
||||
const clipboardData = new DataTransfer();
|
||||
clipboardData.setData('text/plain', `{new_contents_tsv}`);
|
||||
document.activeElement.dispatchEvent(new ClipboardEvent('paste', {{clipboardData}}));
|
||||
""")
|
||||
|
||||
return ActionResult(extracted_content=f'Updated cell {range} with {new_contents_tsv}', include_in_memory=False)
|
||||
|
||||
# Register ---------------------------------------------------------------
|
||||
|
||||
def action(self, description: str, **kwargs):
|
||||
|
||||
@@ -824,30 +824,12 @@
|
||||
|
||||
if (hasInteractiveRole) return true;
|
||||
|
||||
// check whether element has event listeners
|
||||
try {
|
||||
if (typeof getEventListeners === 'function') {
|
||||
const listeners = getEventListeners(element);
|
||||
const mouseEvents = ['click', 'mousedown', 'mouseup', 'dblclick'];
|
||||
for (const eventType of mouseEvents) {
|
||||
for (const listener of listeners) {
|
||||
if (listener.type === eventType) {
|
||||
return true; // Found a mouse interaction listener
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Fallback: Check common event attributes if getEventListeners is not available
|
||||
const commonMouseAttrs = ['onclick', 'onmousedown', 'onmouseup', 'ondblclick'];
|
||||
for (const attr of commonMouseAttrs) {
|
||||
if (element.hasAttribute(attr) || typeof element[attr] === 'function') {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
// Check common event attributes (getEventListeners doesn't work in page.evaluate context)
|
||||
const commonMouseAttrs = ['onclick', 'onmousedown', 'onmouseup', 'ondblclick'];
|
||||
for (const attr of commonMouseAttrs) {
|
||||
if (element.hasAttribute(attr) || typeof element[attr] === 'function') {
|
||||
return true;
|
||||
}
|
||||
} catch (e) {
|
||||
// console.warn(`Could not check event listeners for ${element.tagName}:`, e);
|
||||
// If checking listeners fails, rely on other checks
|
||||
}
|
||||
|
||||
return false
|
||||
@@ -1116,29 +1098,11 @@
|
||||
if (element.hasAttribute('onclick') || typeof element.onclick === 'function') {
|
||||
return true;
|
||||
}
|
||||
// Check for other common interaction event listeners
|
||||
try {
|
||||
const getEventListeners = window.getEventListenersForNode;
|
||||
if (typeof getEventListeners === 'function') {
|
||||
const listeners = getEventListeners(element);
|
||||
const interactionEvents = ['click', 'mousedown', 'mouseup', 'keydown', 'keyup', 'submit', 'change', 'input', 'focus', 'blur'];
|
||||
for (const eventType of interactionEvents) {
|
||||
for (const listener of listeners) {
|
||||
if (listener.type === eventType) {
|
||||
return true; // Found a common interaction listener
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Fallback: Check common event attributes if getEventListeners is not available
|
||||
const commonEventAttrs = ['onmousedown', 'onmouseup', 'onkeydown', 'onkeyup', 'onsubmit', 'onchange', 'oninput', 'onfocus', 'onblur'];
|
||||
if (commonEventAttrs.some(attr => element.hasAttribute(attr))) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
} catch (e) {
|
||||
// console.warn(`Could not check event listeners for ${element.tagName}:`, e);
|
||||
// If checking listeners fails, rely on other checks
|
||||
|
||||
// Check common event attributes (getEventListenersForNode doesn't work in page.evaluate context)
|
||||
const commonEventAttrs = ['onmousedown', 'onmouseup', 'onkeydown', 'onkeyup', 'onsubmit', 'onchange', 'oninput', 'onfocus', 'onblur'];
|
||||
if (commonEventAttrs.some(attr => element.hasAttribute(attr))) {
|
||||
return true;
|
||||
}
|
||||
|
||||
// if the element is not strictly interactive but appears clickable based on heuristic signals
|
||||
|
||||
@@ -1,4 +1,3 @@
|
||||
import json
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from importlib import resources
|
||||
@@ -105,10 +104,27 @@ class DomService:
|
||||
|
||||
# Only log performance metrics in debug mode
|
||||
if debug_mode and 'perfMetrics' in eval_page:
|
||||
perf = eval_page['perfMetrics']
|
||||
|
||||
# Get key metrics for summary
|
||||
total_nodes = perf.get('nodeMetrics', {}).get('totalNodes', 0)
|
||||
# processed_nodes = perf.get('nodeMetrics', {}).get('processedNodes', 0)
|
||||
|
||||
# Count interactive elements from the DOM map
|
||||
interactive_count = 0
|
||||
if 'map' in eval_page:
|
||||
for node_data in eval_page['map'].values():
|
||||
if isinstance(node_data, dict) and node_data.get('isInteractive'):
|
||||
interactive_count += 1
|
||||
|
||||
# Create concise summary
|
||||
url_short = self.page.url[:50] + '...' if len(self.page.url) > 50 else self.page.url
|
||||
logger.debug(
|
||||
'DOM Tree Building Performance Metrics for: %s\n%s',
|
||||
self.page.url,
|
||||
json.dumps(eval_page['perfMetrics'], indent=2),
|
||||
'🔎 Ran buildDOMTree.js interactive element detection on: %s interactive=%d/%d',
|
||||
url_short,
|
||||
interactive_count,
|
||||
total_nodes,
|
||||
# processed_nodes,
|
||||
)
|
||||
|
||||
return await self._construct_dom_tree(eval_page)
|
||||
|
||||
95
browser_use/dom/tests/test_accessibility_playground.py
Normal file
95
browser_use/dom/tests/test_accessibility_playground.py
Normal file
@@ -0,0 +1,95 @@
|
||||
"""
|
||||
Accessibility Tree Playground for browser-use
|
||||
|
||||
- Launches a browser and navigates to a target URL (default: amazon.com)
|
||||
- Extracts both the full and interesting-only accessibility trees using Playwright
|
||||
- Prints and saves both trees to JSON files
|
||||
- Recursively prints relevant info for each node (role, name, value, description, focusable, focused, checked, selected, disabled, children count)
|
||||
- Explains the difference between the accessibility tree and the DOM tree
|
||||
- Notes on React/Vue/SPA apps
|
||||
- Easy to modify for your own experiments
|
||||
|
||||
Run with: python browser_use/dom/tests/test_accessibility_playground.py
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
|
||||
from playwright.async_api import async_playwright
|
||||
|
||||
# Change this to any site you want to test
|
||||
|
||||
|
||||
# Helper to recursively print relevant info from the accessibility tree
|
||||
def print_ax_tree(node, depth=0):
|
||||
if not node:
|
||||
return
|
||||
indent = ' ' * depth
|
||||
info = [
|
||||
f'role={node.get("role")!r}',
|
||||
f'name={node.get("name")!r}' if node.get('name') else None,
|
||||
f'value={node.get("value")!r}' if node.get('value') else None,
|
||||
f'desc={node.get("description")!r}' if node.get('description') else None,
|
||||
f'focusable={node.get("focusable")!r}' if 'focusable' in node else None,
|
||||
f'focused={node.get("focused")!r}' if 'focused' in node else None,
|
||||
f'checked={node.get("checked")!r}' if 'checked' in node else None,
|
||||
f'selected={node.get("selected")!r}' if 'selected' in node else None,
|
||||
f'disabled={node.get("disabled")!r}' if 'disabled' in node else None,
|
||||
f'children={len(node.get("children", []))}' if node.get('children') else None,
|
||||
]
|
||||
print('--------------------------------')
|
||||
print(indent + ', '.join([x for x in info if x]))
|
||||
for child in node.get('children', []):
|
||||
print_ax_tree(child, depth + 1)
|
||||
|
||||
|
||||
# Helper to print all available accessibility node attributes
|
||||
# Prints all key-value pairs for each node (except 'children'), then recurses into children
|
||||
def print_all_fields(node, depth=0):
|
||||
if not node:
|
||||
return
|
||||
indent = ' ' * depth
|
||||
for k, v in node.items():
|
||||
if k != 'children':
|
||||
print(f'{indent}{k}: {v!r}')
|
||||
if 'children' in node:
|
||||
print(f'{indent}children: {len(node["children"])}')
|
||||
for child in node['children']:
|
||||
print_all_fields(child, depth + 1)
|
||||
|
||||
|
||||
def flatten_ax_tree(node, lines):
|
||||
if not node:
|
||||
return
|
||||
role = node.get('role', '')
|
||||
name = node.get('name', '')
|
||||
lines.append(f'{role} {name}')
|
||||
for child in node.get('children', []):
|
||||
flatten_ax_tree(child, lines)
|
||||
|
||||
|
||||
async def get_ax_tree(TARGET_URL):
|
||||
async with async_playwright() as p:
|
||||
browser = await p.chromium.launch(headless=True)
|
||||
page = await browser.new_page()
|
||||
print(f'Navigating to {TARGET_URL}')
|
||||
await page.goto(TARGET_URL, wait_until='domcontentloaded')
|
||||
|
||||
ax_tree_interesting = await page.accessibility.snapshot(interesting_only=True)
|
||||
lines = []
|
||||
flatten_ax_tree(ax_tree_interesting, lines)
|
||||
print(lines)
|
||||
print(f'length of ax_tree_interesting: {len(lines)}')
|
||||
|
||||
await browser.close()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
TARGET_URL = [
|
||||
# 'https://amazon.com/',
|
||||
# 'https://www.google.com/',
|
||||
# 'https://www.facebook.com/',
|
||||
# 'https://platform.openai.com/tokenizer',
|
||||
'https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/input/checkbox',
|
||||
]
|
||||
for url in TARGET_URL:
|
||||
asyncio.run(get_ax_tree(url))
|
||||
@@ -1,6 +1,7 @@
|
||||
import logging
|
||||
import os
|
||||
import sys
|
||||
import warnings
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
@@ -59,6 +60,12 @@ def addLoggingLevel(levelName, levelNum, methodName=None):
|
||||
|
||||
|
||||
def setup_logging():
|
||||
# Suppress specific deprecation warnings from FAISS
|
||||
warnings.filterwarnings('ignore', category=DeprecationWarning, module='faiss.loader')
|
||||
warnings.filterwarnings('ignore', message='builtin type SwigPyPacked has no __module__ attribute')
|
||||
warnings.filterwarnings('ignore', message='builtin type SwigPyObject has no __module__ attribute')
|
||||
warnings.filterwarnings('ignore', message='builtin type swigvarlink has no __module__ attribute')
|
||||
|
||||
# Try to add RESULT level, but ignore if it already exists
|
||||
try:
|
||||
addLoggingLevel('RESULT', 35) # This allows ERROR, FATAL and CRITICAL
|
||||
@@ -110,8 +117,8 @@ def setup_logging():
|
||||
|
||||
logger = logging.getLogger('browser_use')
|
||||
# logger.info('BrowserUse logging setup complete with level %s', log_type)
|
||||
# Silence third-party loggers
|
||||
for logger in [
|
||||
# Silence or adjust third-party loggers
|
||||
third_party_loggers = [
|
||||
'WDM',
|
||||
'httpx',
|
||||
'selenium',
|
||||
@@ -119,6 +126,8 @@ def setup_logging():
|
||||
'urllib3',
|
||||
'asyncio',
|
||||
'langchain',
|
||||
'langsmith',
|
||||
'langsmith.client',
|
||||
'openai',
|
||||
'httpcore',
|
||||
'charset_normalizer',
|
||||
@@ -126,7 +135,12 @@ def setup_logging():
|
||||
'PIL.PngImagePlugin',
|
||||
'trafilatura.htmlprocessing',
|
||||
'trafilatura',
|
||||
]:
|
||||
third_party = logging.getLogger(logger)
|
||||
'mem0',
|
||||
'mem0.vector_stores.faiss',
|
||||
'mem0.vector_stores',
|
||||
'mem0.memory',
|
||||
]
|
||||
for logger_name in third_party_loggers:
|
||||
third_party = logging.getLogger(logger_name)
|
||||
third_party.setLevel(logging.ERROR)
|
||||
third_party.propagate = False
|
||||
|
||||
@@ -5,9 +5,11 @@ import platform
|
||||
import signal
|
||||
import time
|
||||
from collections.abc import Callable, Coroutine
|
||||
from fnmatch import fnmatch
|
||||
from functools import wraps
|
||||
from sys import stderr
|
||||
from typing import Any, ParamSpec, TypeVar
|
||||
from urllib.parse import urlparse
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -304,7 +306,9 @@ def time_execution_sync(additional_text: str = '') -> Callable[[Callable[P, R]],
|
||||
start_time = time.time()
|
||||
result = func(*args, **kwargs)
|
||||
execution_time = time.time() - start_time
|
||||
logger.debug(f'{additional_text} Execution time: {execution_time:.2f} seconds')
|
||||
# Only log if execution takes more than 0.25 seconds
|
||||
if execution_time > 0.25:
|
||||
logger.debug(f'⏳ {additional_text.strip("-")}() took {execution_time:.2f}s')
|
||||
return result
|
||||
|
||||
return wrapper
|
||||
@@ -321,7 +325,10 @@ def time_execution_async(
|
||||
start_time = time.time()
|
||||
result = await func(*args, **kwargs)
|
||||
execution_time = time.time() - start_time
|
||||
logger.debug(f'{additional_text} Execution time: {execution_time:.2f} seconds')
|
||||
# Only log if execution takes more than 0.25 seconds to avoid spamming the logs
|
||||
# you can lower this threshold locally when you're doing dev work to performance optimize stuff
|
||||
if execution_time > 0.25:
|
||||
logger.debug(f'⏳ {additional_text.strip("-")}() took {execution_time:.2f}s')
|
||||
return result
|
||||
|
||||
return wrapper
|
||||
@@ -343,3 +350,126 @@ def singleton(cls):
|
||||
def check_env_variables(keys: list[str], any_or_all=all) -> bool:
|
||||
"""Check if all required environment variables are set"""
|
||||
return any_or_all(os.getenv(key, '').strip() for key in keys)
|
||||
|
||||
|
||||
def is_unsafe_pattern(pattern: str) -> bool:
|
||||
"""
|
||||
Check if a domain pattern has complex wildcards that could match too many domains.
|
||||
|
||||
Args:
|
||||
pattern: The domain pattern to check
|
||||
|
||||
Returns:
|
||||
bool: True if the pattern has unsafe wildcards, False otherwise
|
||||
"""
|
||||
# Extract domain part if there's a scheme
|
||||
if '://' in pattern:
|
||||
_, pattern = pattern.split('://', 1)
|
||||
|
||||
# Remove safe patterns (*.domain and domain.*)
|
||||
bare_domain = pattern.replace('.*', '').replace('*.', '')
|
||||
|
||||
# If there are still wildcards, it's potentially unsafe
|
||||
return '*' in bare_domain
|
||||
|
||||
|
||||
def match_url_with_domain_pattern(url: str, domain_pattern: str, log_warnings: bool = False) -> bool:
|
||||
"""
|
||||
Check if a URL matches a domain pattern. SECURITY CRITICAL.
|
||||
|
||||
Supports optional glob patterns and schemes:
|
||||
- *.example.com will match sub.example.com and example.com
|
||||
- *google.com will match google.com, agoogle.com, and www.google.com
|
||||
- http*://example.com will match http://example.com, https://example.com
|
||||
- chrome-extension://* will match chrome-extension://aaaaaaaaaaaa and chrome-extension://bbbbbbbbbbbbb
|
||||
|
||||
When no scheme is specified, https is used by default for security.
|
||||
For example, 'example.com' will match 'https://example.com' but not 'http://example.com'.
|
||||
|
||||
Note: about:blank must be handled at the callsite, not inside this function.
|
||||
|
||||
Args:
|
||||
url: The URL to check
|
||||
domain_pattern: Domain pattern to match against
|
||||
log_warnings: Whether to log warnings about unsafe patterns
|
||||
|
||||
Returns:
|
||||
bool: True if the URL matches the pattern, False otherwise
|
||||
"""
|
||||
try:
|
||||
# Note: about:blank should be handled at the callsite, not here
|
||||
if url == 'about:blank':
|
||||
return False
|
||||
|
||||
parsed_url = urlparse(url)
|
||||
|
||||
# Extract only the hostname and scheme components
|
||||
scheme = parsed_url.scheme.lower() if parsed_url.scheme else ''
|
||||
domain = parsed_url.hostname.lower() if parsed_url.hostname else ''
|
||||
|
||||
if not scheme or not domain:
|
||||
return False
|
||||
|
||||
# Normalize the domain pattern
|
||||
domain_pattern = domain_pattern.lower()
|
||||
|
||||
# Handle pattern with scheme
|
||||
if '://' in domain_pattern:
|
||||
pattern_scheme, pattern_domain = domain_pattern.split('://', 1)
|
||||
else:
|
||||
pattern_scheme = 'https' # Default to matching only https for security
|
||||
pattern_domain = domain_pattern
|
||||
|
||||
# Handle port in pattern (we strip ports from patterns since we already
|
||||
# extracted only the hostname from the URL)
|
||||
if ':' in pattern_domain and not pattern_domain.startswith(':'):
|
||||
pattern_domain = pattern_domain.split(':', 1)[0]
|
||||
|
||||
# If scheme doesn't match, return False
|
||||
if not fnmatch(scheme, pattern_scheme):
|
||||
return False
|
||||
|
||||
# Check for exact match
|
||||
if pattern_domain == '*' or domain == pattern_domain:
|
||||
return True
|
||||
|
||||
# Handle glob patterns
|
||||
if '*' in pattern_domain:
|
||||
# Check for unsafe glob patterns
|
||||
# First, check for patterns like *.*.domain which are unsafe
|
||||
if pattern_domain.count('*.') > 1 or pattern_domain.count('.*') > 1:
|
||||
if log_warnings:
|
||||
logger = logging.getLogger(__name__)
|
||||
logger.error(f'⛔️ Multiple wildcards in pattern=[{domain_pattern}] are not supported')
|
||||
return False # Don't match unsafe patterns
|
||||
|
||||
# Check for wildcards in TLD part (example.*)
|
||||
if pattern_domain.endswith('.*'):
|
||||
if log_warnings:
|
||||
logger = logging.getLogger(__name__)
|
||||
logger.error(f'⛔️ Wildcard TLDs like in pattern=[{domain_pattern}] are not supported for security')
|
||||
return False # Don't match unsafe patterns
|
||||
|
||||
# Then check for embedded wildcards
|
||||
bare_domain = pattern_domain.replace('*.', '')
|
||||
if '*' in bare_domain:
|
||||
if log_warnings:
|
||||
logger = logging.getLogger(__name__)
|
||||
logger.error(f'⛔️ Only *.domain style patterns are supported, ignoring pattern=[{domain_pattern}]')
|
||||
return False # Don't match unsafe patterns
|
||||
|
||||
# Special handling so that *.google.com also matches bare google.com
|
||||
if pattern_domain.startswith('*.'):
|
||||
parent_domain = pattern_domain[2:]
|
||||
if domain == parent_domain or fnmatch(domain, parent_domain):
|
||||
return True
|
||||
|
||||
# Normal case: match domain against pattern
|
||||
if fnmatch(domain, pattern_domain):
|
||||
return True
|
||||
|
||||
return False
|
||||
except Exception as e:
|
||||
logger = logging.getLogger(__name__)
|
||||
logger.error(f'⛔️ Error matching URL {url} with pattern {domain_pattern}: {type(e).__name__}: {e}')
|
||||
return False
|
||||
|
||||
34
debug_pydantic.py
Normal file
34
debug_pydantic.py
Normal file
@@ -0,0 +1,34 @@
|
||||
import inspect
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from browser_use.controller.views import ClickElementAction
|
||||
|
||||
|
||||
# Check the pydantic detection logic
|
||||
def click_element_by_index(params: ClickElementAction, browser_session):
|
||||
pass
|
||||
|
||||
|
||||
sig = inspect.signature(click_element_by_index)
|
||||
parameters = list(sig.parameters.values())
|
||||
parameter_names = [param.name for param in parameters]
|
||||
|
||||
print('Parameters:', parameter_names)
|
||||
print('First param name:', parameters[0].name)
|
||||
print('First param annotation:', parameters[0].annotation)
|
||||
print('Is BaseModel:', issubclass(parameters[0].annotation, BaseModel))
|
||||
|
||||
# Check the name detection logic
|
||||
name_check = parameters[0].name in ['params', 'param', 'model'] or parameters[0].name.endswith('_model')
|
||||
print('Name check passed:', name_check)
|
||||
|
||||
is_pydantic = (
|
||||
parameters
|
||||
and len(parameters) > 0
|
||||
and hasattr(parameters[0], 'annotation')
|
||||
and parameters[0].annotation != parameters[0].empty
|
||||
and issubclass(parameters[0].annotation, BaseModel)
|
||||
and name_check
|
||||
)
|
||||
print('Is pydantic:', is_pydantic)
|
||||
111
docs/cloud/webhooks.mdx
Normal file
111
docs/cloud/webhooks.mdx
Normal file
@@ -0,0 +1,111 @@
|
||||
---
|
||||
title: "Webhooks"
|
||||
description: "Learn how to integrate webhooks with Browser Use Cloud API"
|
||||
icon: "code"
|
||||
---
|
||||
|
||||
Webhooks allow you to receive real-time notifications about events in your Browser Use tasks. This guide will show you how to set up and verify webhook endpoints.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
<Note>
|
||||
You need an active subscription to create webhooks. See your billing page
|
||||
[cloud.browser-use.com/billing](https://cloud.browser-use.com/billing)
|
||||
</Note>
|
||||
|
||||
## Setting Up Webhooks
|
||||
|
||||
To receive webhook notifications, you need to:
|
||||
|
||||
1. Create an endpoint that can receive HTTPS POST requests
|
||||
2. Configure your webhook URL in the Browser Use dashboard
|
||||
3. Implement signature verification to ensure webhook authenticity
|
||||
|
||||
<Note>
|
||||
When adding a webhook URL in the dashboard, it must be a valid HTTPS URL that can receive POST requests.
|
||||
On creation, we will send a test payload `{"test": "ok"}` to verify the endpoint is working correctly before creating the actual webhook!
|
||||
</Note>
|
||||
|
||||
## Webhook Events
|
||||
|
||||
Browser Use currently only sends status updates for your running tasks:
|
||||
|
||||
| Status | Description |
|
||||
| -------------- | -------------------------------------- |
|
||||
| `initializing` | A task is initializing |
|
||||
| `started` | A Task has started (browser available) |
|
||||
| `paused` | A task has been paused mid execution |
|
||||
| `stopped` | A task has been stopped mid execution |
|
||||
| `finished` | A task has finished |
|
||||
|
||||
## Webhook Payload
|
||||
|
||||
Each webhook call includes:
|
||||
|
||||
- A JSON payload with event details
|
||||
- `X-Browser-Use-Timestamp` header with the current timestamp
|
||||
- `X-Browser-Use-Signature` header for verification
|
||||
|
||||
Example payload:
|
||||
|
||||
```json
|
||||
{
|
||||
"session_id": "602c8809-61ee-461d-acfd-3e8783f23326",
|
||||
"task_id": "b9792a06-0411-4838-96de-c720f34206a2",
|
||||
"status": "initializing"
|
||||
}
|
||||
```
|
||||
|
||||
## Implementing Webhook Verification
|
||||
|
||||
To ensure webhook authenticity, you must verify the signature. Here's an example implementation in Python using FastAPI:
|
||||
|
||||
```python
|
||||
import uvicorn
|
||||
import hmac
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
|
||||
from fastapi import FastAPI, Request, HTTPException
|
||||
|
||||
app = FastAPI()
|
||||
|
||||
SECRET_KEY = os.environ['SECRET_KEY']
|
||||
|
||||
def verify_signature(payload: dict, timestamp: str, received_signature: str) -> bool:
|
||||
message = f'{timestamp}.{json.dumps(payload, separators=(",", ":"), sort_keys=True)}'
|
||||
expected_signature = hmac.new(SECRET_KEY.encode(), message.encode(), hashlib.sha256).hexdigest()
|
||||
return hmac.compare_digest(expected_signature, received_signature)
|
||||
|
||||
@app.post('/webhook')
|
||||
async def webhook(request: Request):
|
||||
body = await request.json()
|
||||
|
||||
timestamp = request.headers.get('X-Browser-Use-Timestamp')
|
||||
signature = request.headers.get('X-Browser-Use-Signature')
|
||||
if not timestamp or not signature:
|
||||
raise HTTPException(status_code=400, detail='Missing timestamp or signature')
|
||||
|
||||
if not verify_signature(body, timestamp, signature):
|
||||
raise HTTPException(status_code=403, detail='Invalid signature')
|
||||
|
||||
print('Valid webhook call received:', body)
|
||||
return {'status': 'success', 'message': 'Webhook received'}
|
||||
|
||||
if __name__ == '__main__':
|
||||
uvicorn.run(app, host='0.0.0.0', port=8080)
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always verify signatures**: Never process webhook payloads without verifying the signature
|
||||
2. **Handle retries**: Browser Use will retry failed webhook deliveries up to 5 times
|
||||
3. **Respond quickly**: Return a 200 response as soon as you've verified the signature
|
||||
4. **Process asynchronously**: Handle the webhook payload processing in a background task
|
||||
5. **Monitor failures**: Set up monitoring for webhook delivery failures
|
||||
|
||||
<Note>
|
||||
Need help? Contact our support team at support@browser-use.com or join our
|
||||
[Discord community](https://link.browser-use.com/discord)
|
||||
</Note>
|
||||
@@ -10,13 +10,15 @@ When working with sensitive information like passwords, you can use the `sensiti
|
||||
|
||||
Make sure to always set [`allowed_domains`](https://docs.browser-use.com/customize/browser-settings#restrict-urls) to restrict the domains the Agent is allowed to visit when working with sensitive data or logins.
|
||||
|
||||
Here's an example of how to use sensitive data:
|
||||
### Basic Usage
|
||||
|
||||
Here's a basic example of how to use sensitive data:
|
||||
|
||||
```python
|
||||
from dotenv import load_dotenv
|
||||
from langchain_openai import ChatOpenAI
|
||||
from browser_use import Agent, Browser, BrowserConfig
|
||||
from browser_use.browser.context import BrowserContextConfig
|
||||
from browser_use import Agent
|
||||
from browser_use.browser.session import BrowserSession
|
||||
|
||||
load_dotenv()
|
||||
|
||||
@@ -33,9 +35,9 @@ sensitive_data = {'x_name': 'magnus', 'x_password': '12345678'}
|
||||
# Use the placeholder names in your task description
|
||||
task = 'go to x.com and login with x_name and x_password then write a post about the meaning of life'
|
||||
|
||||
# Configure allowed_domains that the agent should be restricted to in BrowserContextConfig
|
||||
context_config = BrowserContextConfig(
|
||||
allowed_domains=['example.com'],
|
||||
# Configure browser session with allowed domains
|
||||
browser_session = BrowserSession(
|
||||
allowed_domains=['example.com']
|
||||
)
|
||||
|
||||
# Pass the sensitive data to the agent
|
||||
@@ -43,11 +45,7 @@ agent = Agent(
|
||||
task=task,
|
||||
llm=llm,
|
||||
sensitive_data=sensitive_data,
|
||||
browser=Browser(
|
||||
config=BrowserConfig(
|
||||
new_context_config=context_config
|
||||
)
|
||||
)
|
||||
browser_session=browser_session
|
||||
)
|
||||
|
||||
async def main():
|
||||
@@ -63,6 +61,79 @@ In this example:
|
||||
3. When your password is visible on the current page, we replace it in the LLM input - so that the model never has it in its state.
|
||||
4. The agent will be prevented from going to any site not on `example.com` to protect from prompt injection attacks and jailbreaks
|
||||
|
||||
### Domain-Specific Sensitive Data
|
||||
|
||||
For enhanced security, you can associate sensitive data with specific domains. This ensures credentials are only used on the domains they're intended for:
|
||||
|
||||
```python
|
||||
from dotenv import load_dotenv
|
||||
from langchain_openai import ChatOpenAI
|
||||
from browser_use import Agent
|
||||
from browser_use.browser.session import BrowserSession
|
||||
|
||||
load_dotenv()
|
||||
|
||||
# Initialize the model
|
||||
llm = ChatOpenAI(
|
||||
model='gpt-4o',
|
||||
temperature=0.0,
|
||||
)
|
||||
|
||||
# Domain-specific sensitive data
|
||||
sensitive_data = {
|
||||
'https://*.google.com': {'x_email': '...', 'x_pass': '...'},
|
||||
'chrome-extension://abcd': {'x_api_key': '...'},
|
||||
'http*://example.com': {'x_authcode': '123123'}
|
||||
}
|
||||
|
||||
# Set browser session with allowed domains that match all domain patterns in sensitive_data
|
||||
browser_session = BrowserSession(
|
||||
allowed_domains=[
|
||||
'https://*.google.com',
|
||||
'chrome-extension://abcd',
|
||||
'http://example.com', # Explicitly include http:// if needed
|
||||
'https://example.com' # By default, only https:// is matched
|
||||
]
|
||||
)
|
||||
|
||||
# Pass the sensitive data to the agent
|
||||
agent = Agent(
|
||||
task="Log into Google, then check my account information",
|
||||
llm=llm,
|
||||
sensitive_data=sensitive_data,
|
||||
browser_session=browser_session
|
||||
)
|
||||
|
||||
async def main():
|
||||
await agent.run()
|
||||
|
||||
if __name__ == '__main__':
|
||||
asyncio.run(main())
|
||||
```
|
||||
|
||||
With this approach:
|
||||
1. The Google credentials (`x_email` and `x_pass`) will only be used on Google domains (any subdomain)
|
||||
2. The API key (`x_api_key`) will only be used in the specific Chrome extension
|
||||
3. The auth code (`x_authcode`) will only be used on example.com via http or https
|
||||
|
||||
### Domain Pattern Format
|
||||
|
||||
Domain patterns in sensitive_data follow the same format as `allowed_domains`:
|
||||
|
||||
- `example.com` - Matches only example.com
|
||||
- `*.example.com` - Matches any subdomain of example.com
|
||||
- `http*://example.com` - Matches both http and https protocols for example.com
|
||||
- `chrome-extension://*` - Matches any Chrome extension
|
||||
|
||||
> **Security Warning**: For security reasons, certain patterns are explicitly rejected:
|
||||
> - Wildcards in TLD part (e.g., `example.*`) are not allowed as they could match any TLD
|
||||
> - Embedded wildcards (e.g., `g*e.com`) are rejected to prevent overly broad matches
|
||||
> - Multiple wildcards like `*.*.domain` are not supported to avoid security issues
|
||||
|
||||
The default protocol when no scheme is specified is now `https` for enhanced security.
|
||||
|
||||
The system will validate that all domain patterns used in `sensitive_data` are covered by the patterns in `allowed_domains`.
|
||||
|
||||
### Missing or Empty Values
|
||||
|
||||
When working with sensitive data, keep these details in mind:
|
||||
|
||||
@@ -4,9 +4,68 @@ description: "Learn how to contribute to Browser Use"
|
||||
icon: "github"
|
||||
---
|
||||
|
||||
# Join the Browser Use Community!
|
||||
|
||||
- check out our most active issues or ask in [Discord](https://discord.gg/zXJJHtJf3k) for ideas of what to work on
|
||||
- get inspiration / share what you build in the [`#showcase-your-work`](https://discord.com/channels/1303749220842340412/1305549200678850642) channel and on [`awesome-browser-use-prompts`](https://github.com/browser-use/awesome-prompts)!
|
||||
- no typo/style-only nit PRs, you can submit nit fixes but only if part of larger bugfix or new feature PRs
|
||||
- include a demo screenshot/gif, tests, and ideally an example script demonstrating any changes in your PR
|
||||
- bump your issues/PRs with comments periodically if you want them to be merged faster
|
||||
We're thrilled you're interested in contributing to Browser Use! This guide will help you get started with contributing to our project. Your contributions are what make the open-source community such an amazing place to learn, inspire, and create.
|
||||
|
||||
## Quick Setup
|
||||
|
||||
Get started with Browser Use development in minutes:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/browser-use/browser-use
|
||||
cd browser-use
|
||||
uv sync --all-extras --dev
|
||||
# or pip install -U git+https://github.com/browser-use/browser-use.git@main
|
||||
|
||||
echo "BROWSER_USE_LOGGING_LEVEL=debug" >> .env
|
||||
```
|
||||
|
||||
For more detailed setup instructions, see our [Local Setup Guide](/development/local-setup).
|
||||
|
||||
## How to Contribute
|
||||
|
||||
### Find Something to Work On
|
||||
|
||||
- Browse our [GitHub Issues](https://github.com/browser-use/browser-use/issues) for beginner-friendly issues labeled `good-first-issue`
|
||||
- Check out our most active issues or ask in [Discord](https://discord.gg/zXJJHtJf3k) for ideas of what to work on
|
||||
- Get inspiration and share what you build in the [`#showcase-your-work`](https://discord.com/channels/1303749220842340412/1305549200678850642) channel
|
||||
- Explore or contribute to [`awesome-browser-use-prompts`](https://github.com/browser-use/awesome-prompts)!
|
||||
|
||||
### Making a Great Pull Request
|
||||
|
||||
When submitting a pull request, please:
|
||||
|
||||
- Include a clear description of what the PR does and why it's needed
|
||||
- Add tests that cover your changes
|
||||
- Include a demo screenshot/gif or an example script demonstrating your changes
|
||||
- Make sure the PR passes all CI checks and tests
|
||||
- Keep your PR focused on a single issue or feature to make it easier to review
|
||||
|
||||
Note: We appreciate quality over quantity. Instead of submitting small typo/style-only PRs, consider including those fixes as part of larger bugfix or feature PRs.
|
||||
|
||||
### Contribution Process
|
||||
|
||||
1. Fork the repository
|
||||
2. Create a new branch for your feature or bugfix
|
||||
3. Make your changes
|
||||
4. Run tests to ensure everything works
|
||||
5. Submit a pull request
|
||||
6. Respond to any feedback from maintainers
|
||||
7. Celebrate your contribution!
|
||||
|
||||
Feel free to bump your issues/PRs with comments periodically if you need faster feedback.
|
||||
|
||||
## Code of Conduct
|
||||
|
||||
We're committed to providing a welcoming and inclusive environment for all contributors. Please be respectful and constructive in all interactions.
|
||||
|
||||
## Getting Help
|
||||
|
||||
If you need help at any point:
|
||||
|
||||
- Join our [Discord community](https://link.browser-use.com/discord)
|
||||
- Ask questions in the appropriate GitHub issue
|
||||
- Check our [documentation](/introduction)
|
||||
|
||||
We're here to help you succeed in contributing to Browser Use!
|
||||
|
||||
@@ -4,11 +4,45 @@ description: "Set up Browser Use development environment locally"
|
||||
icon: "laptop-code"
|
||||
---
|
||||
|
||||
# Welcome to Browser Use Development!
|
||||
|
||||
We're excited to have you join our community of contributors. This guide will help you set up your local development environment quickly and easily.
|
||||
|
||||
## Quick Setup
|
||||
|
||||
If you're familiar with Python development, here's the quick way to get started:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/browser-use/browser-use
|
||||
cd browser-use
|
||||
uv sync --all-extras --dev
|
||||
# or pip install -U git+https://github.com/browser-use/browser-use.git@main
|
||||
|
||||
echo "BROWSER_USE_LOGGING_LEVEL=debug" >> .env
|
||||
```
|
||||
|
||||
## Helper Scripts
|
||||
|
||||
We provide several convenient shell scripts in the `bin/` directory to help with common development tasks:
|
||||
|
||||
```bash
|
||||
# Complete setup script - installs uv, creates a venv, and installs dependencies
|
||||
./bin/setup.sh
|
||||
|
||||
# Run all pre-commit hooks (formatting, linting, type checking)
|
||||
./bin/lint.sh
|
||||
|
||||
# Run the core test suite that's executed in CI
|
||||
./bin/test.sh
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Browser Use requires Python 3.11 or higher. We recommend using [uv](https://docs.astral.sh/uv/) for Python environment management.
|
||||
|
||||
## Clone the Repository
|
||||
## Detailed Setup Instructions
|
||||
|
||||
### Clone the Repository
|
||||
|
||||
First, clone the Browser Use repository:
|
||||
|
||||
@@ -17,7 +51,7 @@ git clone https://github.com/browser-use/browser-use
|
||||
cd browser-use
|
||||
```
|
||||
|
||||
## Environment Setup
|
||||
### Environment Setup
|
||||
|
||||
1. Create and activate a virtual environment:
|
||||
|
||||
@@ -56,6 +90,7 @@ GOOGLE_API_KEY=
|
||||
DEEPSEEK_API_KEY=
|
||||
GROK_API_KEY=
|
||||
NOVITA_API_KEY=
|
||||
BROWSER_USE_LOGGING_LEVEL=debug # Helpful for development
|
||||
```
|
||||
|
||||
<Note>
|
||||
@@ -78,6 +113,8 @@ After setup, you can:
|
||||
```bash
|
||||
# Run the linter on the whole project (must pass for PR to be allowed to merge)
|
||||
uv run pre-commit run --all-files
|
||||
# or use our convenience script
|
||||
./bin/lint.sh
|
||||
|
||||
# Install the linter & formatter pre-commit hooks to run automatically
|
||||
pre-commit install --install-hooks
|
||||
@@ -89,7 +126,10 @@ uv run type
|
||||
### Tests
|
||||
|
||||
```bash
|
||||
# Run tests
|
||||
# Run all tests that run in CI
|
||||
./bin/test.sh
|
||||
|
||||
# Run specific tests
|
||||
uv run pytest # run everything
|
||||
uv run pytest tests/test_controller.py # run a specific test file
|
||||
uv run pytest tests/test_sensitive_data.py tests/test_tab_management.py # run two test files
|
||||
@@ -102,7 +142,7 @@ uv run pytest tests/test_tab_management.py::TestTabManagement::test_user_changes
|
||||
uv build
|
||||
uv pip install dist/*.whl
|
||||
|
||||
# bush build to PyPI (automatically run by Github Actions CI)
|
||||
# push build to PyPI (automatically run by Github Actions CI)
|
||||
uv publish
|
||||
```
|
||||
|
||||
|
||||
@@ -77,7 +77,7 @@
|
||||
},
|
||||
{
|
||||
"group": "Cloud API",
|
||||
"pages": ["cloud/quickstart", "cloud/implementation"]
|
||||
"pages": ["cloud/quickstart", "cloud/implementation", "cloud/webhooks"]
|
||||
}
|
||||
],
|
||||
"footerSocials": {
|
||||
|
||||
@@ -71,7 +71,7 @@ async def main():
|
||||
"""Main function to run the example"""
|
||||
browser_session = BrowserSession()
|
||||
await browser_session.start()
|
||||
llm = ChatOpenAI(model_name='gpt-4o')
|
||||
llm = ChatOpenAI(model='gpt-4o')
|
||||
|
||||
# Create the agent
|
||||
agent = Agent( # disco mode will not be triggered on apple.com because the LLM won't be able to see that action available, it should work on Google.com though.
|
||||
|
||||
@@ -57,7 +57,7 @@ async def search_web(query: str):
|
||||
# to string
|
||||
serp_data_str = json.dumps(serp_data)
|
||||
|
||||
return ActionResult(extracted_content=serp_data_str, include_in_memory=True)
|
||||
return ActionResult(extracted_content=serp_data_str, include_in_memory=False)
|
||||
|
||||
|
||||
names = [
|
||||
@@ -85,7 +85,7 @@ names = [
|
||||
|
||||
|
||||
async def main():
|
||||
task = 'use search_web with "find email address of the following ETH professor:" for each of the following persons in a list of actions. Finally return the list with name and email if provided'
|
||||
task = 'use search_web with "find email address of the following ETH professor:" for each of the following persons in a list of actions. Finally return the list with name and email if provided - do always 5 at once'
|
||||
task += '\n' + '\n'.join(names)
|
||||
model = ChatOpenAI(model='gpt-4o')
|
||||
browser_profile = BrowserProfile()
|
||||
|
||||
@@ -17,11 +17,34 @@ llm = ChatOpenAI(
|
||||
model='gpt-4o',
|
||||
temperature=0.0,
|
||||
)
|
||||
# the model will see x_name and x_password, but never the actual values.
|
||||
sensitive_data = {'x_name': 'my_x_name', 'x_password': 'my_x_password'}
|
||||
task = 'go to x.com and login with x_name and x_password then find interesting posts and like them'
|
||||
# Simple case: the model will see x_name and x_password, but never the actual values.
|
||||
# sensitive_data = {'x_name': 'my_x_name', 'x_password': 'my_x_password'}
|
||||
|
||||
agent = Agent(task=task, llm=llm, sensitive_data=sensitive_data)
|
||||
# Advanced case: domain-specific credentials with reusable data
|
||||
# Define a single credential set that can be reused
|
||||
company_credentials = {'company_username': 'user@example.com', 'company_password': 'securePassword123'}
|
||||
|
||||
# Map the same credentials to multiple domains for secure access control
|
||||
sensitive_data = {
|
||||
'https://example.com': company_credentials,
|
||||
'https://admin.example.com': company_credentials,
|
||||
'https://*.example-staging.com': company_credentials,
|
||||
'http*://test.example.com': company_credentials,
|
||||
# You can also add domain-specific credentials
|
||||
'https://*.google.com': {'g_email': 'user@gmail.com', 'g_pass': 'google_password'},
|
||||
}
|
||||
# Update task to use one of the credentials above
|
||||
task = 'Go to example.com and login with company_username and company_password'
|
||||
|
||||
# Always set allowed_domains when using sensitive_data for security
|
||||
from browser_use.browser.session import BrowserSession
|
||||
|
||||
browser_session = BrowserSession(
|
||||
allowed_domains=list(sensitive_data.keys())
|
||||
+ ['https://*.trusted-partner.com'] # Domain patterns from sensitive_data + additional allowed domains
|
||||
)
|
||||
|
||||
agent = Agent(task=task, llm=llm, sensitive_data=sensitive_data, browser_session=browser_session)
|
||||
|
||||
|
||||
async def main():
|
||||
|
||||
58
examples/integrations/browserbase_stagehand.py
Normal file
58
examples/integrations/browserbase_stagehand.py
Normal file
@@ -0,0 +1,58 @@
|
||||
import asyncio
|
||||
import os
|
||||
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
||||
from stagehand import Stagehand, StagehandConfig
|
||||
|
||||
from browser_use.agent.service import Agent
|
||||
|
||||
|
||||
async def main():
|
||||
# Configure Stagehand
|
||||
# https://pypi.org/project/stagehand-py/
|
||||
# https://github.com/browserbase/stagehand-python-examples/blob/main/agent_example.py
|
||||
config = StagehandConfig(
|
||||
env='BROWSERBASE',
|
||||
api_key=os.getenv('BROWSERBASE_API_KEY'),
|
||||
project_id=os.getenv('BROWSERBASE_PROJECT_ID'),
|
||||
headless=False,
|
||||
dom_settle_timeout_ms=3000,
|
||||
model_name='gpt-4o',
|
||||
self_heal=True,
|
||||
wait_for_captcha_solves=True,
|
||||
system_prompt='You are a browser automation assistant that helps users navigate websites effectively.',
|
||||
model_client_options={'model_api_key': os.getenv('OPENAI_API_KEY')},
|
||||
verbose=2,
|
||||
)
|
||||
|
||||
# Create a Stagehand client using the configuration object.
|
||||
stagehand = Stagehand(
|
||||
config=config,
|
||||
model_api_key=os.getenv('OPENAI_API_KEY'),
|
||||
# server_url=os.getenv('STAGEHAND_SERVER_URL'),
|
||||
)
|
||||
|
||||
# Initialize - this creates a new session automatically.
|
||||
await stagehand.init()
|
||||
print(f'\nCreated new session: {stagehand.session_id}')
|
||||
print(f'🌐 View your live browser: https://www.browserbase.com/sessions/{stagehand.session_id}')
|
||||
|
||||
await stagehand.page.goto('https://google.com/')
|
||||
|
||||
await stagehand.page.act('search for openai')
|
||||
|
||||
# Combine with Browser Use
|
||||
agent = Agent(task='click the first result', page=stagehand.page)
|
||||
await agent.run()
|
||||
|
||||
# go back and forth
|
||||
await stagehand.page.act('open the 3 first links on the page in new tabs')
|
||||
|
||||
await Agent(task='click the first result', page=stagehand.page).run()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
asyncio.run(main())
|
||||
@@ -27,10 +27,9 @@ if not azure_openai_api_key or not azure_openai_endpoint:
|
||||
|
||||
# Initialize the Azure OpenAI client
|
||||
llm = AzureChatOpenAI(
|
||||
model_name='gpt-4o',
|
||||
openai_api_key=azure_openai_api_key,
|
||||
model='gpt-4o',
|
||||
api_key=azure_openai_api_key,
|
||||
azure_endpoint=azure_openai_endpoint, # Corrected to use azure_endpoint instead of openai_api_base
|
||||
deployment_name='gpt-4o', # Use deployment_name for Azure models
|
||||
api_version='2024-08-01-preview', # Explicitly set the API version here
|
||||
)
|
||||
|
||||
|
||||
@@ -17,8 +17,7 @@ llm = ChatOpenAI(
|
||||
model='gpt-4o',
|
||||
temperature=0.0,
|
||||
)
|
||||
task = 'Go to kayak.com and find the cheapest flight from Zurich to San Francisco on 2025-05-01'
|
||||
|
||||
task = 'Go to kayak.com and find the cheapest one-way flight from Zurich to San Francisco in 3 weeks.'
|
||||
agent = Agent(task=task, llm=llm)
|
||||
|
||||
|
||||
|
||||
@@ -4,11 +4,10 @@ import sys
|
||||
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
import asyncio
|
||||
|
||||
import pyperclip
|
||||
from dotenv import load_dotenv
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
from browser_use import ActionResult, Agent, Controller
|
||||
from browser_use import Agent, Controller
|
||||
from browser_use.browser import BrowserProfile, BrowserSession
|
||||
|
||||
# Load environment variables
|
||||
@@ -16,106 +15,17 @@ load_dotenv()
|
||||
if not os.getenv('OPENAI_API_KEY'):
|
||||
raise ValueError('OPENAI_API_KEY is not set. Please add it to your environment variables.')
|
||||
|
||||
|
||||
# Use the default controller with built-in Google Sheets actions
|
||||
# The controller already includes all the necessary Google Sheets actions:
|
||||
# - select_cell_or_range: Select specific cells or ranges (Ctrl+G navigation)
|
||||
# - get_range_contents: Get contents of cells using clipboard
|
||||
# - get_sheet_contents: Get entire sheet contents
|
||||
# - clear_selected_range: Clear selected cells
|
||||
# - input_selected_cell_text: Input text into selected cells
|
||||
# - update_range_contents: Batch update ranges with TSV data
|
||||
controller = Controller()
|
||||
|
||||
|
||||
def is_google_sheet(page) -> bool:
|
||||
return page.url.startswith('https://docs.google.com/spreadsheets/')
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Open a specific Google Sheet')
|
||||
async def open_google_sheet(browser_session: BrowserSession, google_sheet_url: str):
|
||||
page = await browser_session.get_current_page()
|
||||
if page.url != google_sheet_url:
|
||||
await page.goto(google_sheet_url)
|
||||
await page.wait_for_load_state()
|
||||
if not is_google_sheet(page):
|
||||
return ActionResult(error='Failed to open Google Sheet, are you sure you have permissions to access this sheet?')
|
||||
return ActionResult(extracted_content=f'Opened Google Sheet {google_sheet_url}', include_in_memory=False)
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Get the contents of the entire sheet', page_filter=is_google_sheet)
|
||||
async def get_sheet_contents(browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
# select all cells
|
||||
await page.keyboard.press('Enter')
|
||||
await page.keyboard.press('Escape')
|
||||
await page.keyboard.press('ControlOrMeta+A')
|
||||
await page.keyboard.press('ControlOrMeta+C')
|
||||
|
||||
extracted_tsv = pyperclip.paste()
|
||||
return ActionResult(extracted_content=extracted_tsv, include_in_memory=True)
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Select a specific cell or range of cells', page_filter=is_google_sheet)
|
||||
async def select_cell_or_range(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await page.keyboard.press('Enter') # make sure we dont delete current cell contents if we were last editing
|
||||
await page.keyboard.press('Escape') # to clear current focus (otherwise select range popup is additive)
|
||||
await asyncio.sleep(0.1)
|
||||
await page.keyboard.press('Home') # move cursor to the top left of the sheet first
|
||||
await page.keyboard.press('ArrowUp')
|
||||
await asyncio.sleep(0.1)
|
||||
await page.keyboard.press('Control+G') # open the goto range popup
|
||||
await asyncio.sleep(0.2)
|
||||
await page.keyboard.type(cell_or_range, delay=0.05)
|
||||
await asyncio.sleep(0.2)
|
||||
await page.keyboard.press('Enter')
|
||||
await asyncio.sleep(0.2)
|
||||
await page.keyboard.press('Escape') # to make sure the popup still closes in the case where the jump failed
|
||||
return ActionResult(extracted_content=f'Selected cell {cell_or_range}', include_in_memory=False)
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Get the contents of a specific cell or range of cells', page_filter=is_google_sheet)
|
||||
async def get_range_contents(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
await page.keyboard.press('ControlOrMeta+C')
|
||||
await asyncio.sleep(0.1)
|
||||
extracted_tsv = pyperclip.paste()
|
||||
return ActionResult(extracted_content=extracted_tsv, include_in_memory=True)
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Clear the currently selected cells', page_filter=is_google_sheet)
|
||||
async def clear_selected_range(browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await page.keyboard.press('Backspace')
|
||||
return ActionResult(extracted_content='Cleared selected range', include_in_memory=False)
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Input text into the currently selected cell', page_filter=is_google_sheet)
|
||||
async def input_selected_cell_text(browser_session: BrowserSession, text: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await page.keyboard.type(text, delay=0.1)
|
||||
await page.keyboard.press('Enter') # make sure to commit the input so it doesn't get overwritten by the next action
|
||||
await page.keyboard.press('ArrowUp')
|
||||
return ActionResult(extracted_content=f'Inputted text {text}', include_in_memory=False)
|
||||
|
||||
|
||||
@controller.registry.action('Google Sheets: Batch update a range of cells', page_filter=is_google_sheet)
|
||||
async def update_range_contents(browser_session: BrowserSession, range: str, new_contents_tsv: str):
|
||||
page = await browser_session.get_current_page()
|
||||
|
||||
await select_cell_or_range(browser_session, range)
|
||||
|
||||
# simulate paste event from clipboard with TSV content
|
||||
await page.evaluate(f"""
|
||||
const clipboardData = new DataTransfer();
|
||||
clipboardData.setData('text/plain', `{new_contents_tsv}`);
|
||||
document.activeElement.dispatchEvent(new ClipboardEvent('paste', {{clipboardData}}));
|
||||
""")
|
||||
|
||||
return ActionResult(extracted_content=f'Updated cell {range} with {new_contents_tsv}', include_in_memory=False)
|
||||
|
||||
|
||||
# many more snippets for keyboard-shortcut based Google Sheets automation can be found here, see:
|
||||
# For more Google Sheets keyboard shortcuts and automation ideas, see:
|
||||
# - https://github.com/philc/sheetkeys/blob/master/content_scripts/sheet_actions.js
|
||||
# - https://github.com/philc/sheetkeys/blob/master/content_scripts/commands.js
|
||||
# - https://support.google.com/docs/answer/181110?hl=en&co=GENIE.Platform%3DDesktop#zippy=%2Cmac-shortcuts
|
||||
@@ -129,7 +39,8 @@ async def main():
|
||||
browser_profile=BrowserProfile(
|
||||
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
|
||||
user_data_dir='~/.config/browseruse/profiles/default',
|
||||
)
|
||||
),
|
||||
keep_alive=True,
|
||||
)
|
||||
|
||||
async with browser_session:
|
||||
@@ -137,7 +48,7 @@ async def main():
|
||||
|
||||
eraser = Agent(
|
||||
task="""
|
||||
Clear all the existing values in columns A through F in this Google Sheet:
|
||||
Clear all the existing values in columns A through M in this Google Sheet:
|
||||
https://docs.google.com/spreadsheets/d/1INaIcfpYXlMRWO__de61SHFCaqt1lfHlcvtXZPItlpI/edit
|
||||
""",
|
||||
llm=model,
|
||||
@@ -148,15 +59,16 @@ async def main():
|
||||
|
||||
researcher = Agent(
|
||||
task="""
|
||||
Google to find the full name, nationality, and date of birth of the CEO of the top 10 Fortune 100 companies.
|
||||
For each company, append a row to this existing Google Sheet: https://docs.google.com/spreadsheets/d/1INaIcfpYXlMRWO__de61SHFCaqt1lfHlcvtXZPItlpI/edit
|
||||
Open this Google Sheet and read it to understand the structure: https://docs.google.com/spreadsheets/d/1INaIcfpYXlMRWO__de61SHFCaqt1lfHlcvtXZPItlpI/edit
|
||||
Make sure column headers are present and all existing values in the sheet are formatted correctly.
|
||||
Columns:
|
||||
A: Company Name
|
||||
B: CEO Full Name
|
||||
C: CEO Country of Birth
|
||||
D: CEO Date of Birth (YYYY-MM-DD)
|
||||
E: Source URL where the information was found
|
||||
D: Source URL where the information was found
|
||||
Then Google to find the full name and nationality of the CEO of the top 10 Fortune 100 companies.
|
||||
For each company, append a row to this existing Google Sheet.
|
||||
At the end, double check the formatting and structure and fix any issues by updating/overwriting cells.
|
||||
""",
|
||||
llm=model,
|
||||
browser_session=browser_session,
|
||||
@@ -175,17 +87,17 @@ async def main():
|
||||
)
|
||||
await improvised_continuer.run()
|
||||
|
||||
final_fact_checker = Agent(
|
||||
task="""
|
||||
Read the Google Sheet https://docs.google.com/spreadsheets/d/1INaIcfpYXlMRWO__de61SHFCaqt1lfHlcvtXZPItlpI/edit
|
||||
Fact-check every entry, add a new column F with your findings for each row.
|
||||
Make sure to check the source URL for each row, and make sure the information is correct.
|
||||
""",
|
||||
llm=model,
|
||||
browser_session=browser_session,
|
||||
controller=controller,
|
||||
)
|
||||
await final_fact_checker.run()
|
||||
# final_fact_checker = Agent(
|
||||
# task="""
|
||||
# Read the Google Sheet https://docs.google.com/spreadsheets/d/1INaIcfpYXlMRWO__de61SHFCaqt1lfHlcvtXZPItlpI/edit
|
||||
# Fact-check every entry, add a new column F with your findings for each row.
|
||||
# Make sure to check the source URL for each row, and make sure the information is correct.
|
||||
# """,
|
||||
# llm=model,
|
||||
# browser_session=browser_session,
|
||||
# controller=controller,
|
||||
# )
|
||||
# await final_fact_checker.run()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
@@ -63,6 +63,8 @@ examples = [
|
||||
# botocore: only needed for Bedrock Claude boto3 examples/models/bedrock_claude.py
|
||||
"botocore>=1.37.23",
|
||||
"imgcat>=0.6.0",
|
||||
"stagehand-py>=0.3.6",
|
||||
"browserbase>=0.4.0",
|
||||
]
|
||||
all = [
|
||||
"browser-use[memory,cli,examples]",
|
||||
|
||||
625
tests/ci/test_action_registry.py
Normal file
625
tests/ci/test_action_registry.py
Normal file
@@ -0,0 +1,625 @@
|
||||
"""
|
||||
Comprehensive tests for the action registry system to ensure backward compatibility
|
||||
and proper parameter handling for all existing patterns.
|
||||
|
||||
Tests cover:
|
||||
1. Existing parameter patterns (individual params, pydantic models)
|
||||
2. Special parameter injection (browser_session, page_extraction_llm, etc.)
|
||||
3. Action-to-action calling scenarios
|
||||
4. Mixed parameter patterns
|
||||
5. Registry execution edge cases
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
|
||||
import pytest
|
||||
from playwright.async_api import Page
|
||||
from pydantic import Field
|
||||
from pytest_httpserver import HTTPServer
|
||||
|
||||
from browser_use.agent.views import ActionResult
|
||||
from browser_use.browser import BrowserSession
|
||||
from browser_use.controller.registry.service import Registry
|
||||
from browser_use.controller.registry.views import ActionModel as BaseActionModel
|
||||
from browser_use.controller.views import (
|
||||
ClickElementAction,
|
||||
InputTextAction,
|
||||
NoParamsAction,
|
||||
SearchGoogleAction,
|
||||
)
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class MockLLM:
|
||||
"""Mock LLM for testing"""
|
||||
|
||||
async def ainvoke(self, prompt):
|
||||
class MockResponse:
|
||||
content = 'Mocked LLM response'
|
||||
|
||||
return MockResponse()
|
||||
|
||||
|
||||
class TestContext:
|
||||
"""Simple context for testing"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
# Test parameter models
|
||||
class SimpleParams(BaseActionModel):
|
||||
"""Simple parameter model"""
|
||||
|
||||
value: str = Field(description='Test value')
|
||||
|
||||
|
||||
class ComplexParams(BaseActionModel):
|
||||
"""Complex parameter model with multiple fields"""
|
||||
|
||||
text: str = Field(description='Text input')
|
||||
number: int = Field(description='Number input', default=42)
|
||||
optional_flag: bool = Field(description='Optional boolean', default=False)
|
||||
|
||||
|
||||
# Test fixtures
|
||||
@pytest.fixture(scope='module')
|
||||
def event_loop():
|
||||
"""Create and provide an event loop for async tests."""
|
||||
loop = asyncio.get_event_loop_policy().new_event_loop()
|
||||
yield loop
|
||||
loop.close()
|
||||
|
||||
|
||||
@pytest.fixture(scope='module')
|
||||
def http_server():
|
||||
"""Create and provide a test HTTP server that serves static content."""
|
||||
server = HTTPServer()
|
||||
server.start()
|
||||
|
||||
# Add a simple test page
|
||||
server.expect_request('/test').respond_with_data(
|
||||
'<html><head><title>Test Page</title></head><body><h1>Test Page</h1><p>Hello from test page</p></body></html>',
|
||||
content_type='text/html',
|
||||
)
|
||||
|
||||
yield server
|
||||
server.stop()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def base_url(http_server):
|
||||
"""Return the base URL for the test HTTP server."""
|
||||
return f'http://{http_server.host}:{http_server.port}'
|
||||
|
||||
|
||||
@pytest.fixture(scope='module')
|
||||
async def browser_session(event_loop):
|
||||
"""Create and provide a real BrowserSession instance."""
|
||||
browser_session = BrowserSession(
|
||||
headless=True,
|
||||
user_data_dir=None,
|
||||
)
|
||||
await browser_session.start()
|
||||
yield browser_session
|
||||
await browser_session.stop()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_llm():
|
||||
"""Create a mock LLM"""
|
||||
return MockLLM()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def registry():
|
||||
"""Create a fresh registry for each test"""
|
||||
return Registry[TestContext]()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def test_browser(base_url):
|
||||
"""Create a real BrowserSession for testing"""
|
||||
browser_session = BrowserSession(
|
||||
headless=True,
|
||||
user_data_dir=None,
|
||||
)
|
||||
await browser_session.start()
|
||||
# Navigate to test page
|
||||
await browser_session.create_new_tab(f'{base_url}/test')
|
||||
yield browser_session
|
||||
await browser_session.stop()
|
||||
|
||||
|
||||
class TestActionRegistryParameterPatterns:
|
||||
"""Test different parameter patterns that should all continue to work"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_individual_parameters_no_browser(self, registry):
|
||||
"""Test action with individual parameters, no special injection"""
|
||||
|
||||
@registry.action('Simple action with individual params')
|
||||
async def simple_action(text: str, number: int = 10):
|
||||
return ActionResult(extracted_content=f'Text: {text}, Number: {number}')
|
||||
|
||||
# Test execution
|
||||
result = await registry.execute_action('simple_action', {'text': 'hello', 'number': 42})
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Text: hello, Number: 42' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_individual_parameters_with_browser(self, registry, browser_session, base_url):
|
||||
"""Test action with individual parameters plus browser_session injection"""
|
||||
|
||||
@registry.action('Action with individual params and browser')
|
||||
async def action_with_browser(text: str, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(extracted_content=f'Text: {text}, URL: {page.url}')
|
||||
|
||||
# Navigate to test page first
|
||||
await browser_session.create_new_tab(f'{base_url}/test')
|
||||
|
||||
# Test execution
|
||||
result = await registry.execute_action('action_with_browser', {'text': 'hello'}, browser_session=browser_session)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Text: hello, URL:' in result.extracted_content
|
||||
assert base_url in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_page_parameter_injection(self, registry, browser_session, base_url):
|
||||
"""Test action with direct Page parameter injection"""
|
||||
|
||||
@registry.action('Action with page parameter')
|
||||
async def action_with_page(text: str, page: Page):
|
||||
title = await page.title()
|
||||
return ActionResult(extracted_content=f'Text: {text}, Page Title: {title}')
|
||||
|
||||
# Navigate to test page first
|
||||
await browser_session.create_new_tab(f'{base_url}/test')
|
||||
|
||||
# Test execution
|
||||
result = await registry.execute_action('action_with_page', {'text': 'hello'}, browser_session=browser_session)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Text: hello, Page Title: Test Page' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pydantic_model_with_page_parameter(self, registry, browser_session, base_url):
|
||||
"""Test pydantic model action with page parameter injection"""
|
||||
|
||||
@registry.action('Pydantic action with page', param_model=ComplexParams)
|
||||
async def pydantic_action_with_page(params: ComplexParams, page: Page):
|
||||
title = await page.title()
|
||||
return ActionResult(extracted_content=f'Text: {params.text}, Number: {params.number}, Page Title: {title}')
|
||||
|
||||
# Navigate to test page first
|
||||
await browser_session.create_new_tab(f'{base_url}/test')
|
||||
|
||||
# Test execution
|
||||
result = await registry.execute_action(
|
||||
'pydantic_action_with_page', {'text': 'test', 'number': 100}, browser_session=browser_session
|
||||
)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Text: test, Number: 100, Page Title: Test Page' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pydantic_model_parameters(self, registry, browser_session, base_url):
|
||||
"""Test action that takes a pydantic model as first parameter"""
|
||||
|
||||
@registry.action('Action with pydantic model', param_model=ComplexParams)
|
||||
async def pydantic_action(params: ComplexParams, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(
|
||||
extracted_content=f'Text: {params.text}, Number: {params.number}, Flag: {params.optional_flag}, URL: {page.url}'
|
||||
)
|
||||
|
||||
# Navigate to test page first
|
||||
await browser_session.create_new_tab(f'{base_url}/test')
|
||||
|
||||
# Test execution
|
||||
result = await registry.execute_action(
|
||||
'pydantic_action', {'text': 'test', 'number': 100, 'optional_flag': True}, browser_session=browser_session
|
||||
)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Text: test, Number: 100, Flag: True' in result.extracted_content
|
||||
assert base_url in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_mixed_special_parameters(self, registry, browser_session, base_url, mock_llm):
|
||||
"""Test action with multiple special injected parameters"""
|
||||
|
||||
@registry.action('Action with multiple special params')
|
||||
async def multi_special_action(
|
||||
text: str,
|
||||
browser_session: BrowserSession,
|
||||
page_extraction_llm: MockLLM,
|
||||
available_file_paths: list[str] | None = None,
|
||||
):
|
||||
page = await browser_session.get_current_page()
|
||||
llm_response = await page_extraction_llm.ainvoke('test')
|
||||
files = available_file_paths or []
|
||||
|
||||
return ActionResult(
|
||||
extracted_content=f'Text: {text}, URL: {page.url}, LLM: {llm_response.content}, Files: {len(files)}'
|
||||
)
|
||||
|
||||
# Navigate to test page first
|
||||
await browser_session.create_new_tab(f'{base_url}/test')
|
||||
|
||||
# Test execution
|
||||
result = await registry.execute_action(
|
||||
'multi_special_action',
|
||||
{'text': 'hello'},
|
||||
browser_session=browser_session,
|
||||
page_extraction_llm=mock_llm,
|
||||
available_file_paths=['file1.txt', 'file2.txt'],
|
||||
)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Text: hello' in result.extracted_content
|
||||
assert base_url in result.extracted_content
|
||||
assert 'LLM: Mocked LLM response' in result.extracted_content
|
||||
assert 'Files: 2' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_params_action(self, registry, test_browser):
|
||||
"""Test action with NoParamsAction model"""
|
||||
|
||||
@registry.action('No params action', param_model=NoParamsAction)
|
||||
async def no_params_action(params: NoParamsAction, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(extracted_content=f'No params action executed on {page.url}')
|
||||
|
||||
# Test execution with any parameters (should be ignored)
|
||||
result = await registry.execute_action(
|
||||
'no_params_action', {'random': 'data', 'should': 'be', 'ignored': True}, browser_session=test_browser
|
||||
)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'No params action executed on' in result.extracted_content
|
||||
assert '/test' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_legacy_browser_parameter_names(self, registry, test_browser):
|
||||
"""Test that legacy browser parameter names still work"""
|
||||
|
||||
@registry.action('Action with legacy browser param')
|
||||
async def legacy_browser_action(text: str, browser: BrowserSession):
|
||||
page = await browser.get_current_page()
|
||||
return ActionResult(extracted_content=f'Legacy browser: {text}, URL: {page.url}')
|
||||
|
||||
@registry.action('Action with legacy browser_context param')
|
||||
async def legacy_context_action(text: str, browser_context: BrowserSession):
|
||||
page = await browser_context.get_current_page()
|
||||
return ActionResult(extracted_content=f'Legacy context: {text}, URL: {page.url}')
|
||||
|
||||
# Test legacy browser parameter
|
||||
result1 = await registry.execute_action('legacy_browser_action', {'text': 'test1'}, browser_session=test_browser)
|
||||
assert 'Legacy browser: test1, URL:' in result1.extracted_content
|
||||
assert '/test' in result1.extracted_content
|
||||
|
||||
# Test legacy browser_context parameter
|
||||
result2 = await registry.execute_action('legacy_context_action', {'text': 'test2'}, browser_session=test_browser)
|
||||
assert 'Legacy context: test2, URL:' in result2.extracted_content
|
||||
assert '/test' in result2.extracted_content
|
||||
|
||||
|
||||
class TestActionToActionCalling:
|
||||
"""Test scenarios where actions call other actions"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_action_calling_action_with_kwargs(self, registry, test_browser):
|
||||
"""Test action calling another action using kwargs (current problematic pattern)"""
|
||||
|
||||
# Helper function that actions can call
|
||||
async def helper_function(browser_session: BrowserSession, data: str):
|
||||
page = await browser_session.get_current_page()
|
||||
return f'Helper processed: {data} on {page.url}'
|
||||
|
||||
@registry.action('First action')
|
||||
async def first_action(text: str, browser_session: BrowserSession):
|
||||
# This should work without parameter conflicts
|
||||
result = await helper_function(browser_session=browser_session, data=text)
|
||||
return ActionResult(extracted_content=f'First: {result}')
|
||||
|
||||
@registry.action('Calling action')
|
||||
async def calling_action(message: str, browser_session: BrowserSession):
|
||||
# Call the first action through the registry (simulates action-to-action calling)
|
||||
intermediate_result = await registry.execute_action(
|
||||
'first_action', {'text': message}, browser_session=browser_session
|
||||
)
|
||||
return ActionResult(extracted_content=f'Called result: {intermediate_result.extracted_content}')
|
||||
|
||||
# Test the calling chain
|
||||
result = await registry.execute_action('calling_action', {'message': 'test'}, browser_session=test_browser)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Called result: First: Helper processed: test on' in result.extracted_content
|
||||
assert '/test' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_google_sheets_style_calling_pattern(self, registry, test_browser):
|
||||
"""Test the specific pattern from Google Sheets actions that causes the error"""
|
||||
|
||||
# Simulate the _select_cell_or_range helper function
|
||||
async def _select_cell_or_range(browser_session: BrowserSession, cell_or_range: str):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(extracted_content=f'Selected cell {cell_or_range} on {page.url}')
|
||||
|
||||
@registry.action('Select cell or range')
|
||||
async def select_cell_or_range(browser_session: BrowserSession, cell_or_range: str):
|
||||
# This is the PROBLEMATIC pattern that currently fails
|
||||
# Passing browser_session by name causes "multiple values for argument" error
|
||||
return await _select_cell_or_range(browser_session=browser_session, cell_or_range=cell_or_range)
|
||||
|
||||
@registry.action('Select cell or range (fixed)')
|
||||
async def select_cell_or_range_fixed(browser_session: BrowserSession, cell_or_range: str):
|
||||
# This is the WORKING pattern using positional args
|
||||
return await _select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
@registry.action('Update range contents')
|
||||
async def update_range_contents(browser_session: BrowserSession, range_name: str, new_contents: str):
|
||||
# This action calls select_cell_or_range, simulating the real Google Sheets pattern
|
||||
await select_cell_or_range_fixed(browser_session, range_name) # Should use positional args
|
||||
return ActionResult(extracted_content=f'Updated range {range_name} with {new_contents}')
|
||||
|
||||
# Test the fixed version (should work)
|
||||
result_fixed = await registry.execute_action(
|
||||
'select_cell_or_range_fixed', {'cell_or_range': 'A1:F100'}, browser_session=test_browser
|
||||
)
|
||||
assert 'Selected cell A1:F100 on' in result_fixed.extracted_content
|
||||
assert '/test' in result_fixed.extracted_content
|
||||
|
||||
# Test the chained calling pattern
|
||||
result_chain = await registry.execute_action(
|
||||
'update_range_contents', {'range_name': 'B2:D4', 'new_contents': 'test data'}, browser_session=test_browser
|
||||
)
|
||||
assert 'Updated range B2:D4 with test data' in result_chain.extracted_content
|
||||
|
||||
# Test the problematic version (may fail with current registry, should work with enhanced registry)
|
||||
try:
|
||||
result_problematic = await registry.execute_action(
|
||||
'select_cell_or_range', {'cell_or_range': 'A1:F100'}, browser_session=test_browser
|
||||
)
|
||||
# If this succeeds, great! The enhanced registry is working
|
||||
assert 'Selected cell A1:F100 on' in result_problematic.extracted_content
|
||||
assert '/test' in result_problematic.extracted_content
|
||||
except TypeError as e:
|
||||
# This is the expected error with the current registry
|
||||
assert 'multiple values for argument' in str(e) or 'got multiple values' in str(e)
|
||||
logger.info(f'Expected error with current registry: {e}')
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_complex_action_chain(self, registry, test_browser):
|
||||
"""Test a complex chain of actions calling other actions"""
|
||||
|
||||
@registry.action('Base action')
|
||||
async def base_action(value: str, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(extracted_content=f'Base: {value} on {page.url}')
|
||||
|
||||
@registry.action('Middle action')
|
||||
async def middle_action(input_val: str, browser_session: BrowserSession):
|
||||
# Call base action
|
||||
base_result = await registry.execute_action(
|
||||
'base_action', {'value': f'processed-{input_val}'}, browser_session=browser_session
|
||||
)
|
||||
return ActionResult(extracted_content=f'Middle: {base_result.extracted_content}')
|
||||
|
||||
@registry.action('Top action')
|
||||
async def top_action(original: str, browser_session: BrowserSession):
|
||||
# Call middle action
|
||||
middle_result = await registry.execute_action(
|
||||
'middle_action', {'input_val': f'enhanced-{original}'}, browser_session=browser_session
|
||||
)
|
||||
return ActionResult(extracted_content=f'Top: {middle_result.extracted_content}')
|
||||
|
||||
# Test the full chain
|
||||
result = await registry.execute_action('top_action', {'original': 'test'}, browser_session=test_browser)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Top: Middle: Base: processed-enhanced-test on' in result.extracted_content
|
||||
assert '/test' in result.extracted_content
|
||||
|
||||
|
||||
class TestRegistryEdgeCases:
|
||||
"""Test edge cases and error conditions"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_missing_required_browser_session(self, registry):
|
||||
"""Test that actions requiring browser_session fail appropriately when not provided"""
|
||||
|
||||
@registry.action('Requires browser')
|
||||
async def requires_browser(text: str, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(extracted_content=f'Text: {text}, URL: {page.url}')
|
||||
|
||||
# Should raise RuntimeError when browser_session is required but not provided
|
||||
with pytest.raises(RuntimeError, match='requires browser_session but none provided'):
|
||||
await registry.execute_action(
|
||||
'requires_browser',
|
||||
{'text': 'test'},
|
||||
# No browser_session provided
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_missing_required_llm(self, registry, test_browser):
|
||||
"""Test that actions requiring page_extraction_llm fail appropriately when not provided"""
|
||||
|
||||
@registry.action('Requires LLM')
|
||||
async def requires_llm(text: str, browser_session: BrowserSession, page_extraction_llm: MockLLM):
|
||||
page = await browser_session.get_current_page()
|
||||
llm_response = await page_extraction_llm.ainvoke('test')
|
||||
return ActionResult(extracted_content=f'Text: {text}, LLM: {llm_response.content}')
|
||||
|
||||
# Should raise RuntimeError when page_extraction_llm is required but not provided
|
||||
with pytest.raises(RuntimeError, match='requires page_extraction_llm but none provided'):
|
||||
await registry.execute_action(
|
||||
'requires_llm',
|
||||
{'text': 'test'},
|
||||
browser_session=test_browser,
|
||||
# No page_extraction_llm provided
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_invalid_parameters(self, registry, test_browser):
|
||||
"""Test handling of invalid parameters"""
|
||||
|
||||
@registry.action('Typed action')
|
||||
async def typed_action(number: int, browser_session: BrowserSession):
|
||||
return ActionResult(extracted_content=f'Number: {number}')
|
||||
|
||||
# Should raise RuntimeError when parameter validation fails
|
||||
with pytest.raises(RuntimeError, match='Invalid parameters'):
|
||||
await registry.execute_action(
|
||||
'typed_action',
|
||||
{'number': 'not a number'}, # Invalid type
|
||||
browser_session=test_browser,
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_nonexistent_action(self, registry, test_browser):
|
||||
"""Test calling a non-existent action"""
|
||||
|
||||
with pytest.raises(ValueError, match='Action nonexistent_action not found'):
|
||||
await registry.execute_action('nonexistent_action', {'param': 'value'}, browser_session=test_browser)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_sync_action_wrapper(self, registry, test_browser):
|
||||
"""Test that sync functions are properly wrapped to be async"""
|
||||
|
||||
@registry.action('Sync action')
|
||||
def sync_action(text: str, browser_session: BrowserSession):
|
||||
# This is a sync function that should be wrapped
|
||||
return ActionResult(extracted_content=f'Sync: {text}')
|
||||
|
||||
# Should work even though the original function is sync
|
||||
result = await registry.execute_action('sync_action', {'text': 'test'}, browser_session=test_browser)
|
||||
|
||||
assert isinstance(result, ActionResult)
|
||||
assert 'Sync: test' in result.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_excluded_actions(self, test_browser):
|
||||
"""Test that excluded actions are not registered"""
|
||||
|
||||
registry_with_exclusions = Registry[TestContext](exclude_actions=['excluded_action'])
|
||||
|
||||
@registry_with_exclusions.action('Excluded action')
|
||||
async def excluded_action(text: str):
|
||||
return ActionResult(extracted_content=f'Should not execute: {text}')
|
||||
|
||||
@registry_with_exclusions.action('Included action')
|
||||
async def included_action(text: str):
|
||||
return ActionResult(extracted_content=f'Should execute: {text}')
|
||||
|
||||
# Excluded action should not be in registry
|
||||
assert 'excluded_action' not in registry_with_exclusions.registry.actions
|
||||
assert 'included_action' in registry_with_exclusions.registry.actions
|
||||
|
||||
# Should raise error when trying to execute excluded action
|
||||
with pytest.raises(ValueError, match='Action excluded_action not found'):
|
||||
await registry_with_exclusions.execute_action('excluded_action', {'text': 'test'})
|
||||
|
||||
# Included action should work
|
||||
result = await registry_with_exclusions.execute_action('included_action', {'text': 'test'})
|
||||
assert 'Should execute: test' in result.extracted_content
|
||||
|
||||
|
||||
class TestExistingControllerActions:
|
||||
"""Test that existing controller actions continue to work"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_existing_action_models(self, registry, test_browser):
|
||||
"""Test that existing action parameter models work correctly"""
|
||||
|
||||
@registry.action('Test search', param_model=SearchGoogleAction)
|
||||
async def test_search(params: SearchGoogleAction, browser_session: BrowserSession):
|
||||
return ActionResult(extracted_content=f'Searched for: {params.query}')
|
||||
|
||||
@registry.action('Test click', param_model=ClickElementAction)
|
||||
async def test_click(params: ClickElementAction, browser_session: BrowserSession):
|
||||
return ActionResult(extracted_content=f'Clicked element: {params.index}')
|
||||
|
||||
@registry.action('Test input', param_model=InputTextAction)
|
||||
async def test_input(params: InputTextAction, browser_session: BrowserSession):
|
||||
return ActionResult(extracted_content=f'Input text: {params.text} at index: {params.index}')
|
||||
|
||||
# Test SearchGoogleAction
|
||||
result1 = await registry.execute_action('test_search', {'query': 'python testing'}, browser_session=test_browser)
|
||||
assert 'Searched for: python testing' in result1.extracted_content
|
||||
|
||||
# Test ClickElementAction
|
||||
result2 = await registry.execute_action('test_click', {'index': 42}, browser_session=test_browser)
|
||||
assert 'Clicked element: 42' in result2.extracted_content
|
||||
|
||||
# Test InputTextAction
|
||||
result3 = await registry.execute_action('test_input', {'index': 5, 'text': 'test input'}, browser_session=test_browser)
|
||||
assert 'Input text: test input at index: 5' in result3.extracted_content
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pydantic_vs_individual_params_consistency(self, registry, test_browser):
|
||||
"""Test that pydantic and individual parameter patterns produce consistent results"""
|
||||
|
||||
# Action using individual parameters
|
||||
@registry.action('Individual params')
|
||||
async def individual_params_action(text: str, number: int, browser_session: BrowserSession):
|
||||
return ActionResult(extracted_content=f'Individual: {text}-{number}')
|
||||
|
||||
# Action using pydantic model
|
||||
class TestParams(BaseActionModel):
|
||||
text: str
|
||||
number: int
|
||||
|
||||
@registry.action('Pydantic params', param_model=TestParams)
|
||||
async def pydantic_params_action(params: TestParams, browser_session: BrowserSession):
|
||||
return ActionResult(extracted_content=f'Pydantic: {params.text}-{params.number}')
|
||||
|
||||
# Both should produce similar results
|
||||
test_data = {'text': 'hello', 'number': 42}
|
||||
|
||||
result1 = await registry.execute_action('individual_params_action', test_data, browser_session=test_browser)
|
||||
|
||||
result2 = await registry.execute_action('pydantic_params_action', test_data, browser_session=test_browser)
|
||||
|
||||
# Both should extract the same content (just different prefixes)
|
||||
assert 'hello-42' in result1.extracted_content
|
||||
assert 'hello-42' in result2.extracted_content
|
||||
assert 'Individual:' in result1.extracted_content
|
||||
assert 'Pydantic:' in result2.extracted_content
|
||||
|
||||
|
||||
# Test runner for manual execution
|
||||
if __name__ == '__main__':
|
||||
# Run a simple test manually
|
||||
import asyncio
|
||||
|
||||
async def manual_test():
|
||||
"""Manual test runner for debugging"""
|
||||
print('Running manual test...')
|
||||
|
||||
registry = Registry[TestContext]()
|
||||
browser_session = BrowserSession(headless=True)
|
||||
await browser_session.start()
|
||||
await browser_session.create_new_tab('https://example.com')
|
||||
|
||||
@registry.action('Manual test action')
|
||||
async def manual_action(text: str, browser_session: BrowserSession):
|
||||
page = await browser_session.get_current_page()
|
||||
return ActionResult(extracted_content=f'Manual: {text} on {page.url}')
|
||||
|
||||
result = await registry.execute_action('manual_action', {'text': 'test'}, browser_session=browser_session)
|
||||
|
||||
print(f'Result: {result.extracted_content}')
|
||||
await browser_session.stop()
|
||||
print('Manual test passed!')
|
||||
|
||||
if __name__ == '__main__':
|
||||
asyncio.run(manual_test())
|
||||
@@ -85,7 +85,8 @@ class TestBrowserContext:
|
||||
assert context1._is_url_allowed('https://anotherdomain.org/path') is True
|
||||
|
||||
# Scenario 2: allowed_domains is provided.
|
||||
allowed = ['example.com', '*.mysite.org']
|
||||
# Note: match_url_with_domain_pattern defaults to https:// scheme when none is specified
|
||||
allowed = ['https://example.com', 'http://example.com', 'http://*.mysite.org', 'https://*.mysite.org']
|
||||
config2 = BrowserProfile(allowed_domains=allowed)
|
||||
context2 = BrowserSession(browser_profile=config2)
|
||||
|
||||
@@ -93,7 +94,7 @@ class TestBrowserContext:
|
||||
assert context2._is_url_allowed('http://example.com') is True
|
||||
# URL with subdomain (should not be allowed)
|
||||
assert context2._is_url_allowed('http://sub.example.com/path') is False
|
||||
# URL with different domain (should not be allowed)
|
||||
# URL with subdomain for wildcard pattern (should be allowed)
|
||||
assert context2._is_url_allowed('http://sub.mysite.org') is True
|
||||
# URL that matches second allowed domain
|
||||
assert context2._is_url_allowed('https://mysite.org/page') is True
|
||||
257
tests/ci/test_browser_session_param.py
Normal file
257
tests/ci/test_browser_session_param.py
Normal file
@@ -0,0 +1,257 @@
|
||||
"""
|
||||
Test script to reproduce and debug the browser_session parameter issue with actions
|
||||
like select_cell_or_range in Google Sheets.
|
||||
|
||||
This test demonstrates a specific parameter passing issue that can occur in registry.execute_action
|
||||
when a parameter (like browser_session) is:
|
||||
1. Required by a function registered with the Registry
|
||||
2. Added to extra_args by the Registry.execute_action method
|
||||
3. Passed by name when the function calls another function
|
||||
|
||||
The bug would manifest as:
|
||||
"TypeError: select_cell_or_range() got multiple values for argument 'browser_session'"
|
||||
|
||||
The fix is to pass browser_session positionally, not by name, when calling from one action to another,
|
||||
to avoid the conflict when the Registry also adds it to extra_args.
|
||||
|
||||
This test validates the issue exists and confirms the fix works.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
|
||||
from pydantic import Field
|
||||
|
||||
from browser_use.controller.registry.service import Registry
|
||||
from browser_use.controller.registry.views import ActionModel
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# Use real browser session for testing
|
||||
import pytest
|
||||
|
||||
from browser_use.browser import BrowserSession
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def browser_session():
|
||||
"""Create and provide a real BrowserSession instance."""
|
||||
browser_session = BrowserSession(
|
||||
headless=True,
|
||||
user_data_dir=None,
|
||||
)
|
||||
await browser_session.start()
|
||||
yield browser_session
|
||||
await browser_session.stop()
|
||||
|
||||
|
||||
# Model that doesn't include browser_session (renamed to avoid pytest collecting it)
|
||||
class CellActionParams(ActionModel):
|
||||
value: str = Field(description='Test value')
|
||||
|
||||
|
||||
# Model that includes browser_session
|
||||
class ModelWithBrowser(ActionModel):
|
||||
value: str = Field(description='Test value')
|
||||
browser_session: BrowserSession = None
|
||||
|
||||
|
||||
# Simple context for testing
|
||||
class TestContext:
|
||||
pass
|
||||
|
||||
|
||||
async def main(browser_session):
|
||||
"""Run the test to diagnose browser_session parameter issue
|
||||
|
||||
This test demonstrates the problem and our fix. The issue happens because:
|
||||
|
||||
1. In controller/service.py, we have:
|
||||
```python
|
||||
@registry.action('Google Sheets: Select a specific cell or range of cells')
|
||||
async def select_cell_or_range(browser_session: BrowserSession, cell_or_range: str):
|
||||
return await _select_cell_or_range(browser_session=browser_session, cell_or_range=cell_or_range)
|
||||
```
|
||||
|
||||
2. When registry.execute_action calls this function, it adds browser_session to extra_args:
|
||||
```python
|
||||
# In registry/service.py
|
||||
if 'browser_session' in parameter_names:
|
||||
extra_args['browser_session'] = browser_session
|
||||
```
|
||||
|
||||
3. Then later, when calling action.function:
|
||||
```python
|
||||
return await action.function(**params_dict, **extra_args)
|
||||
```
|
||||
|
||||
4. This effectively means browser_session is passed twice:
|
||||
- Once through extra_args['browser_session']
|
||||
- And again through params_dict['browser_session'] (from the original function)
|
||||
|
||||
The fix is to pass browser_session positionally in select_cell_or_range:
|
||||
```python
|
||||
return await _select_cell_or_range(browser_session, cell_or_range)
|
||||
```
|
||||
|
||||
This test confirms that this approach works.
|
||||
"""
|
||||
logger.info('Starting browser_session parameter test')
|
||||
|
||||
# Create registry
|
||||
registry = Registry[TestContext]()
|
||||
|
||||
# Create a custom param model for select_cell_or_range
|
||||
class CellRangeParams(ActionModel):
|
||||
cell_or_range: str = Field(description='Cell or range to select')
|
||||
|
||||
# Use the provided real browser session
|
||||
|
||||
# Test with the real issue: select_cell_or_range
|
||||
logger.info('\n\n=== Test: Simulating select_cell_or_range issue with correct model ===')
|
||||
|
||||
# Define the function without using our registry - this will be a helper function
|
||||
async def _select_cell_or_range(browser_session, cell_or_range):
|
||||
"""Helper function for select_cell_or_range"""
|
||||
logger.info(f'_select_cell_or_range internal implementation called with cell_or_range={cell_or_range}')
|
||||
return f'Selected cell {cell_or_range}'
|
||||
|
||||
# This simulates the actual issue we're seeing in the real code
|
||||
# The browser_session parameter is in both the function signature and passed as a named arg
|
||||
@registry.action('Google Sheets: Select a cell or range', param_model=CellRangeParams)
|
||||
async def select_cell_or_range(browser_session: BrowserSession, cell_or_range: str):
|
||||
logger.info(f'select_cell_or_range called with browser_session={browser_session}, cell_or_range={cell_or_range}')
|
||||
|
||||
# PROBLEMATIC LINE: browser_session is passed by name, matching the parameter name
|
||||
# This is what causes the "got multiple values" error in the real code
|
||||
return await _select_cell_or_range(browser_session=browser_session, cell_or_range=cell_or_range)
|
||||
|
||||
# Fix attempt: Register a version that uses positional args instead
|
||||
@registry.action('Google Sheets: Select a cell or range (fixed)', param_model=CellRangeParams)
|
||||
async def select_cell_or_range_fixed(browser_session: BrowserSession, cell_or_range: str):
|
||||
logger.info(f'select_cell_or_range_fixed called with browser_session={browser_session}, cell_or_range={cell_or_range}')
|
||||
|
||||
# FIXED LINE: browser_session is passed positionally, avoiding the parameter name conflict
|
||||
return await _select_cell_or_range(browser_session, cell_or_range)
|
||||
|
||||
# Another attempt: explicitly call using **kwargs to simulate what the registry does
|
||||
@registry.action('Google Sheets: Select with kwargs', param_model=CellRangeParams)
|
||||
async def select_with_kwargs(browser_session: BrowserSession, cell_or_range: str):
|
||||
logger.info(f'select_with_kwargs called with browser_session={browser_session}, cell_or_range={cell_or_range}')
|
||||
|
||||
# Get params and extra_args, like in Registry.execute_action
|
||||
params = {'cell_or_range': cell_or_range, 'browser_session': browser_session}
|
||||
extra_args = {'browser_session': browser_session}
|
||||
|
||||
# Try to call _select_cell_or_range with both params and extra_args
|
||||
# This will fail with "got multiple values for keyword argument 'browser_session'"
|
||||
try:
|
||||
logger.info('Attempting to call with both params and extra_args (should fail):')
|
||||
await _select_cell_or_range(**params, **extra_args)
|
||||
except TypeError as e:
|
||||
logger.info(f'Expected error: {e}')
|
||||
|
||||
# Remove browser_session from params to avoid the conflict
|
||||
params_fixed = dict(params)
|
||||
del params_fixed['browser_session']
|
||||
|
||||
logger.info(f'Fixed params: {params_fixed}')
|
||||
|
||||
# This should work
|
||||
result = await _select_cell_or_range(**params_fixed, **extra_args)
|
||||
logger.info(f'Success after fix: {result}')
|
||||
return result
|
||||
|
||||
# Test the original problematic version
|
||||
logger.info('\n--- Testing original problematic version ---')
|
||||
try:
|
||||
result1 = await registry.execute_action(
|
||||
'select_cell_or_range', {'cell_or_range': 'A1:F100'}, browser_session=browser_session
|
||||
)
|
||||
logger.info(f'Success! Result: {result1}')
|
||||
except Exception as e:
|
||||
logger.error(f'Error: {str(e)}')
|
||||
|
||||
# Test the fixed version (using positional args)
|
||||
logger.info('\n--- Testing fixed version (positional args) ---')
|
||||
try:
|
||||
result2 = await registry.execute_action(
|
||||
'select_cell_or_range_fixed', {'cell_or_range': 'A1:F100'}, browser_session=browser_session
|
||||
)
|
||||
logger.info(f'Success! Result: {result2}')
|
||||
except Exception as e:
|
||||
logger.error(f'Error: {str(e)}')
|
||||
|
||||
# Test with kwargs version that simulates what Registry.execute_action does
|
||||
logger.info('\n--- Testing kwargs simulation version ---')
|
||||
try:
|
||||
result3 = await registry.execute_action(
|
||||
'select_with_kwargs', {'cell_or_range': 'A1:F100'}, browser_session=browser_session
|
||||
)
|
||||
logger.info(f'Success! Result: {result3}')
|
||||
except Exception as e:
|
||||
logger.error(f'Error: {str(e)}')
|
||||
|
||||
# Manual test of our theory: browser_session is passed twice
|
||||
logger.info('\n--- Direct test of our theory ---')
|
||||
try:
|
||||
# Create the model instance
|
||||
params = CellRangeParams(cell_or_range='A1:F100')
|
||||
|
||||
# First check if the extra_args approach works
|
||||
logger.info('Checking if extra_args approach works:')
|
||||
extra_args = {'browser_session': browser_session}
|
||||
|
||||
# If we were to modify Registry.execute_action:
|
||||
# 1. Check if the function parameter needs browser_session
|
||||
parameter_names = ['browser_session', 'cell_or_range']
|
||||
browser_keys = ['browser_session', 'browser', 'browser_context']
|
||||
|
||||
# Create params dict
|
||||
param_dict = params.model_dump()
|
||||
logger.info(f'params dict before: {param_dict}')
|
||||
|
||||
# Apply our fix: remove browser_session from params dict
|
||||
for key in browser_keys:
|
||||
if key in param_dict and key in extra_args:
|
||||
logger.info(f'Removing {key} from params dict')
|
||||
del param_dict[key]
|
||||
|
||||
logger.info(f'params dict after: {param_dict}')
|
||||
logger.info(f'extra_args: {extra_args}')
|
||||
|
||||
# This would be the fixed code:
|
||||
# return await action.function(**param_dict, **extra_args)
|
||||
|
||||
# Call directly to test
|
||||
result3 = await select_cell_or_range(**param_dict, **extra_args)
|
||||
logger.info(f'Success with our fix! Result: {result3}')
|
||||
except Exception as e:
|
||||
logger.error(f'Error with our manual test: {str(e)}')
|
||||
|
||||
|
||||
# Add a proper pytest test function
|
||||
import pytest
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_browser_session_parameter_issue(browser_session):
|
||||
"""Test that the browser_session parameter issue is fixed."""
|
||||
# Run the main test logic
|
||||
await main(browser_session)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# For direct execution (not through pytest)
|
||||
async def run_with_real_browser():
|
||||
browser_session = BrowserSession(headless=True, user_data_dir=None)
|
||||
await browser_session.start()
|
||||
try:
|
||||
await main(browser_session)
|
||||
finally:
|
||||
await browser_session.stop()
|
||||
|
||||
asyncio.run(run_with_real_browser())
|
||||
436
tests/ci/test_debug_selector_map.py
Normal file
436
tests/ci/test_debug_selector_map.py
Normal file
@@ -0,0 +1,436 @@
|
||||
"""
|
||||
Systematic debugging of the selector map issue.
|
||||
Test each assumption step by step to isolate the problem.
|
||||
"""
|
||||
|
||||
import os
|
||||
|
||||
import pytest
|
||||
|
||||
from browser_use.browser import BrowserProfile, BrowserSession
|
||||
from browser_use.controller.service import Controller
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def browser_session():
|
||||
"""Create a real browser session for testing."""
|
||||
session = BrowserSession(
|
||||
browser_profile=BrowserProfile(
|
||||
executable_path=os.getenv('BROWSER_PATH'),
|
||||
user_data_dir=None, # Use temporary profile
|
||||
headless=True,
|
||||
)
|
||||
)
|
||||
async with session:
|
||||
yield session
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def controller():
|
||||
"""Create a controller instance."""
|
||||
return Controller()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_1_dom_processing_works(browser_session):
|
||||
"""Test assumption 1: DOM processing works and finds elements."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Trigger DOM processing
|
||||
state = await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
|
||||
print('DOM processing result:')
|
||||
print(f' - Elements found: {len(state.selector_map)}')
|
||||
print(f' - Element indices: {list(state.selector_map.keys())}')
|
||||
|
||||
# Verify DOM processing works
|
||||
assert len(state.selector_map) > 0, 'DOM processing should find elements'
|
||||
assert 0 in state.selector_map, 'Element index 0 should exist'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_2_cached_selector_map_persists(browser_session):
|
||||
"""Test assumption 2: Cached selector map persists after get_state_summary."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Trigger DOM processing and cache
|
||||
state = await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
initial_selector_map = dict(state.selector_map)
|
||||
|
||||
# Check if cached selector map is still available
|
||||
cached_selector_map = await browser_session.get_selector_map()
|
||||
|
||||
print('Selector map persistence:')
|
||||
print(f' - Initial elements: {len(initial_selector_map)}')
|
||||
print(f' - Cached elements: {len(cached_selector_map)}')
|
||||
print(f' - Maps are identical: {initial_selector_map.keys() == cached_selector_map.keys()}')
|
||||
|
||||
# Verify the cached map persists
|
||||
assert len(cached_selector_map) > 0, 'Cached selector map should persist'
|
||||
assert initial_selector_map.keys() == cached_selector_map.keys(), 'Cached map should match initial map'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_3_action_gets_same_selector_map(browser_session, controller):
|
||||
"""Test assumption 3: Action gets the same selector map as cached."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Trigger DOM processing and cache
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
cached_selector_map = await browser_session.get_selector_map()
|
||||
|
||||
print('Pre-action state:')
|
||||
print(f' - Cached elements: {len(cached_selector_map)}')
|
||||
print(f' - Element 0 exists in cache: {0 in cached_selector_map}')
|
||||
|
||||
# Create a test action that checks the selector map it receives
|
||||
@controller.registry.action('Test: Check selector map')
|
||||
async def test_check_selector_map(browser_session: BrowserSession):
|
||||
from browser_use import ActionResult
|
||||
|
||||
action_selector_map = await browser_session.get_selector_map()
|
||||
return ActionResult(
|
||||
extracted_content=f'Action sees {len(action_selector_map)} elements, index 0 exists: {0 in action_selector_map}',
|
||||
include_in_memory=False,
|
||||
)
|
||||
|
||||
# Execute the test action
|
||||
result = await controller.registry.execute_action('test_check_selector_map', {}, browser_session=browser_session)
|
||||
|
||||
print(f'Action result: {result.extracted_content}')
|
||||
|
||||
# Verify the action sees the same selector map
|
||||
assert 'index 0 exists: True' in result.extracted_content, 'Action should see element 0'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_4_click_action_specific_issue(browser_session, controller):
|
||||
"""Test assumption 4: Specific issue with click_element_by_index action."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Trigger DOM processing and cache
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
cached_selector_map = await browser_session.get_selector_map()
|
||||
|
||||
print('Pre-click state:')
|
||||
print(f' - Cached elements: {len(cached_selector_map)}')
|
||||
print(f' - Element 0 exists: {0 in cached_selector_map}')
|
||||
|
||||
# Create a test action that replicates click_element_by_index logic
|
||||
@controller.registry.action('Test: Debug click logic')
|
||||
async def test_debug_click_logic(browser_session: BrowserSession, index: int):
|
||||
from browser_use import ActionResult
|
||||
|
||||
# This is the exact logic from click_element_by_index
|
||||
selector_map = await browser_session.get_selector_map()
|
||||
|
||||
print(f' - Action selector map size: {len(selector_map)}')
|
||||
print(f' - Action selector map keys: {list(selector_map.keys())[:10]}') # First 10
|
||||
print(f' - Index {index} in selector map: {index in selector_map}')
|
||||
|
||||
if index not in selector_map:
|
||||
return ActionResult(
|
||||
error=f'Debug: Element with index {index} does not exist in map of size {len(selector_map)}',
|
||||
include_in_memory=False,
|
||||
)
|
||||
|
||||
return ActionResult(
|
||||
extracted_content=f'Debug: Element {index} found in map of size {len(selector_map)}', include_in_memory=False
|
||||
)
|
||||
|
||||
# Test with index 0
|
||||
result = await controller.registry.execute_action('test_debug_click_logic', {'index': 0}, browser_session=browser_session)
|
||||
|
||||
print(f'Debug click result: {result.extracted_content or result.error}')
|
||||
|
||||
# This will help us see exactly what the click action sees
|
||||
if result.error:
|
||||
pytest.fail(f'Click logic debug failed: {result.error}')
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_5_multiple_get_selector_map_calls(browser_session):
|
||||
"""Test assumption 5: Multiple calls to get_selector_map return consistent results."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Trigger DOM processing and cache
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
|
||||
# Call get_selector_map multiple times
|
||||
map1 = await browser_session.get_selector_map()
|
||||
map2 = await browser_session.get_selector_map()
|
||||
map3 = await browser_session.get_selector_map()
|
||||
|
||||
print('Multiple selector map calls:')
|
||||
print(f' - Call 1: {len(map1)} elements')
|
||||
print(f' - Call 2: {len(map2)} elements')
|
||||
print(f' - Call 3: {len(map3)} elements')
|
||||
print(f' - All calls identical: {map1.keys() == map2.keys() == map3.keys()}')
|
||||
|
||||
# Verify consistency
|
||||
assert len(map1) == len(map2) == len(map3), 'Multiple calls should return same size'
|
||||
assert map1.keys() == map2.keys() == map3.keys(), 'Multiple calls should return same elements'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_6_page_changes_affect_selector_map(browser_session):
|
||||
"""Test assumption 6: Check if page navigation affects cached selector map."""
|
||||
# Go to first page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Get initial selector map
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
initial_map = await browser_session.get_selector_map()
|
||||
|
||||
print('Page change test:')
|
||||
print(f' - Google.com elements: {len(initial_map)}')
|
||||
|
||||
# Navigate to a different page (without calling get_state_summary)
|
||||
await page.goto('https://www.example.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Check if cached selector map is still from old page
|
||||
cached_map_after_nav = await browser_session.get_selector_map()
|
||||
|
||||
print(f' - After navigation (cached): {len(cached_map_after_nav)}')
|
||||
print(f' - Cache unchanged after nav: {len(initial_map) == len(cached_map_after_nav)}')
|
||||
|
||||
# Update with new page
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
new_page_map = await browser_session.get_selector_map()
|
||||
|
||||
print(f' - Example.com elements (fresh): {len(new_page_map)}')
|
||||
|
||||
# This will tell us if cached maps get stale
|
||||
assert len(new_page_map) != len(initial_map) or initial_map.keys() != new_page_map.keys(), (
|
||||
'Different pages should have different selector maps'
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_8_same_browser_session_instance(browser_session, controller):
|
||||
"""Test assumption 8: Action gets the same browser_session instance."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
print('=== BROWSER SESSION INSTANCE DEBUG ===')
|
||||
|
||||
# Get fresh state
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
|
||||
# Store the ID of our browser session instance
|
||||
original_session_id = id(browser_session)
|
||||
print(f'1. Original browser_session ID: {original_session_id}')
|
||||
print(f'2. Original cache exists: {browser_session._cached_browser_state_summary is not None}')
|
||||
|
||||
# Create action that checks browser session identity
|
||||
@controller.registry.action('Test: Check browser session identity')
|
||||
async def test_check_session_identity(browser_session: BrowserSession):
|
||||
from browser_use import ActionResult
|
||||
|
||||
action_session_id = id(browser_session)
|
||||
cache_exists = browser_session._cached_browser_state_summary is not None
|
||||
return ActionResult(
|
||||
extracted_content=f'Action session ID: {action_session_id}, Cache exists: {cache_exists}', include_in_memory=False
|
||||
)
|
||||
|
||||
# Execute action
|
||||
result = await controller.registry.execute_action('test_check_session_identity', {}, browser_session=browser_session)
|
||||
|
||||
print(f'3. Action result: {result.extracted_content}')
|
||||
|
||||
# Parse the result to check if session IDs match
|
||||
action_session_id = int(result.extracted_content.split('Action session ID: ')[1].split(',')[0])
|
||||
|
||||
if original_session_id == action_session_id:
|
||||
print('✅ Same browser_session instance passed to action')
|
||||
else:
|
||||
print('❌ DIFFERENT browser_session instance passed to action!')
|
||||
print(f' Original: {original_session_id}')
|
||||
print(f' Action: {action_session_id}')
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_9_pydantic_private_attrs(browser_session, controller):
|
||||
"""Test assumption 9: Pydantic model validation affects private attributes."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
print('=== PYDANTIC PRIVATE ATTRS DEBUG ===')
|
||||
|
||||
# Get fresh state
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
|
||||
print(f'1. Original browser_session cache: {browser_session._cached_browser_state_summary is not None}')
|
||||
print(f'2. Original browser_session ID: {id(browser_session)}')
|
||||
|
||||
# Import the SpecialActionParameters to test directly
|
||||
from browser_use.controller.registry.service import SpecialActionParameters
|
||||
|
||||
# Test what happens when we put browser_session through model_validate
|
||||
special_params_data = {
|
||||
'context': None,
|
||||
'browser_session': browser_session,
|
||||
'browser': browser_session,
|
||||
'browser_context': browser_session,
|
||||
'page_extraction_llm': None,
|
||||
'available_file_paths': None,
|
||||
'has_sensitive_data': False,
|
||||
}
|
||||
|
||||
print(f'3. Before model_validate - browser_session cache: {browser_session._cached_browser_state_summary is not None}')
|
||||
|
||||
# Test the fixed version using model_construct instead of model_validate
|
||||
special_params = SpecialActionParameters.model_construct(**special_params_data)
|
||||
|
||||
print(
|
||||
f'4. After model_validate - original browser_session cache: {browser_session._cached_browser_state_summary is not None}'
|
||||
)
|
||||
|
||||
# Check the browser_session that comes out of the model
|
||||
extracted_browser_session = special_params.browser_session
|
||||
print(f'5. Extracted browser_session ID: {id(extracted_browser_session)}')
|
||||
print(f'6. Extracted browser_session cache: {extracted_browser_session._cached_browser_state_summary is not None}')
|
||||
|
||||
# Check if they're the same object
|
||||
if id(browser_session) == id(extracted_browser_session):
|
||||
print('✅ Same object - no copying occurred')
|
||||
else:
|
||||
print('❌ DIFFERENT object - Pydantic copied the browser_session!')
|
||||
|
||||
# Check if private attributes were preserved
|
||||
print(f'7. Original has _cached_browser_state_summary attr: {hasattr(browser_session, "_cached_browser_state_summary")}')
|
||||
print(
|
||||
f'8. Extracted has _cached_browser_state_summary attr: {hasattr(extracted_browser_session, "_cached_browser_state_summary")}'
|
||||
)
|
||||
|
||||
if hasattr(extracted_browser_session, '_cached_browser_state_summary'):
|
||||
print(f'9. Extracted _cached_browser_state_summary value: {extracted_browser_session._cached_browser_state_summary}')
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_assumption_7_cache_gets_cleared(browser_session, controller):
|
||||
"""Test assumption 7: Check if _cached_browser_state_summary gets cleared."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
print('=== CACHE CLEARING DEBUG ===')
|
||||
|
||||
# Check initial cache state
|
||||
print(f'1. Initial cache state: {browser_session._cached_browser_state_summary}')
|
||||
|
||||
# Get fresh state
|
||||
state = await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
print(f'2. After get_state_summary: cache exists = {browser_session._cached_browser_state_summary is not None}')
|
||||
print(f'3. Cache has {len(state.selector_map)} elements')
|
||||
|
||||
# Check cache before action
|
||||
print(f'4. Pre-action cache: {browser_session._cached_browser_state_summary is not None}')
|
||||
|
||||
# Create action that checks cache state (NO page parameter)
|
||||
@controller.registry.action('Test: Check cache state no page')
|
||||
async def test_check_cache_state_no_page(browser_session: BrowserSession):
|
||||
from browser_use import ActionResult
|
||||
|
||||
cache_exists = browser_session._cached_browser_state_summary is not None
|
||||
if cache_exists:
|
||||
cache_size = len(browser_session._cached_browser_state_summary.selector_map)
|
||||
else:
|
||||
cache_size = 0
|
||||
return ActionResult(
|
||||
extracted_content=f'NoPage - Cache exists: {cache_exists}, Cache size: {cache_size}', include_in_memory=False
|
||||
)
|
||||
|
||||
# Create action that checks cache state (WITH page parameter)
|
||||
@controller.registry.action('Test: Check cache state with page')
|
||||
async def test_check_cache_state_with_page(browser_session: BrowserSession, page):
|
||||
from browser_use import ActionResult
|
||||
|
||||
cache_exists = browser_session._cached_browser_state_summary is not None
|
||||
if cache_exists:
|
||||
cache_size = len(browser_session._cached_browser_state_summary.selector_map)
|
||||
else:
|
||||
cache_size = 0
|
||||
return ActionResult(
|
||||
extracted_content=f'WithPage - Cache exists: {cache_exists}, Cache size: {cache_size}', include_in_memory=False
|
||||
)
|
||||
|
||||
# Test action WITHOUT page parameter
|
||||
result_no_page = await controller.registry.execute_action(
|
||||
'test_check_cache_state_no_page', {}, browser_session=browser_session
|
||||
)
|
||||
|
||||
print(f'5a. Action result (NO page): {result_no_page.extracted_content}')
|
||||
|
||||
# Test action WITH page parameter
|
||||
result_with_page = await controller.registry.execute_action(
|
||||
'test_check_cache_state_with_page', {}, browser_session=browser_session
|
||||
)
|
||||
|
||||
print(f'5b. Action result (WITH page): {result_with_page.extracted_content}')
|
||||
print(f'6. Post-action cache: {browser_session._cached_browser_state_summary is not None}')
|
||||
|
||||
# This will tell us if the page parameter injection clears the cache
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_final_real_click_with_debug(browser_session, controller):
|
||||
"""Final test: Try actual click with maximum debugging."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
print('=== FINAL CLICK TEST WITH FULL DEBUG ===')
|
||||
|
||||
# Get fresh state
|
||||
state = await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
print(f'1. Fresh state has {len(state.selector_map)} elements')
|
||||
|
||||
# Check cached map
|
||||
cached_map = await browser_session.get_selector_map()
|
||||
print(f'2. Cached map has {len(cached_map)} elements')
|
||||
print(f'3. Element 0 in cached map: {0 in cached_map}')
|
||||
|
||||
# Try the real click action
|
||||
if 0 in cached_map:
|
||||
print('4. Attempting real click_element_by_index...')
|
||||
try:
|
||||
result = await controller.registry.execute_action(
|
||||
'click_element_by_index', {'index': 0}, browser_session=browser_session
|
||||
)
|
||||
print(f'5. Click SUCCESS: {result.extracted_content}')
|
||||
except Exception as e:
|
||||
print(f'5. Click FAILED: {e}')
|
||||
|
||||
# Additional debug: check selector map inside the exception
|
||||
debug_map = await browser_session.get_selector_map()
|
||||
print(f'6. Post-failure selector map: {len(debug_map)} elements')
|
||||
print(f'7. Element 0 still in map: {0 in debug_map}')
|
||||
|
||||
raise e
|
||||
else:
|
||||
pytest.fail('Element 0 not found in cached map - test setup issue')
|
||||
130
tests/ci/test_google_sheets_real.py
Normal file
130
tests/ci/test_google_sheets_real.py
Normal file
@@ -0,0 +1,130 @@
|
||||
"""
|
||||
Real integration tests for Google Sheets actions against the actual Google Sheets website.
|
||||
Tests the enhanced action registry system with Google Sheets keyboard automation.
|
||||
Uses the existing Google Sheets actions from the main controller.
|
||||
"""
|
||||
|
||||
import os
|
||||
|
||||
import pytest
|
||||
|
||||
from browser_use.browser import BrowserProfile, BrowserSession
|
||||
from browser_use.controller.service import Controller
|
||||
|
||||
# Test Google Sheets URL (public read-only spreadsheet for testing)
|
||||
TEST_GOOGLE_SHEET_URL = 'https://docs.google.com/spreadsheets/d/1INaIcfpYXlMRWO__de61SHFCaqt1lfHlcvtXZPItlpI/edit'
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
async def browser_session():
|
||||
"""Create a real browser session for testing."""
|
||||
session = BrowserSession(
|
||||
browser_profile=BrowserProfile(
|
||||
executable_path=os.getenv('BROWSER_PATH'),
|
||||
user_data_dir=None, # Use temporary profile
|
||||
headless=True,
|
||||
)
|
||||
)
|
||||
async with session:
|
||||
yield session
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def controller():
|
||||
"""Create a controller instance (Google Sheets actions are already registered)."""
|
||||
return Controller()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_selector_map_basic(browser_session, controller):
|
||||
"""Test that the selector map gets populated on a basic page."""
|
||||
# Go to a simple page first
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Update browser state to populate selector map
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
|
||||
# Check selector map
|
||||
selector_map = await browser_session.get_selector_map()
|
||||
print(f'Selector map size: {len(selector_map)}')
|
||||
|
||||
# Should have some elements
|
||||
assert len(selector_map) > 0, 'No clickable elements found in selector map'
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_click_element_basic(browser_session, controller):
|
||||
"""Test basic click element action to verify registry works."""
|
||||
# Go to a simple page
|
||||
page = await browser_session.get_current_page()
|
||||
await page.goto('https://www.google.com')
|
||||
await page.wait_for_load_state()
|
||||
|
||||
# Update browser state to populate selector map
|
||||
await browser_session.get_state_summary(cache_clickable_elements_hashes=False)
|
||||
|
||||
# Check selector map
|
||||
selector_map = await browser_session.get_selector_map()
|
||||
print(f'Available elements: {list(selector_map.keys())}')
|
||||
|
||||
if len(selector_map) > 0:
|
||||
# Try to click the first available element
|
||||
first_index = list(selector_map.keys())[0]
|
||||
print(f'Trying to click element index: {first_index}')
|
||||
|
||||
result = await controller.registry.execute_action(
|
||||
'click_element_by_index', {'index': first_index}, browser_session=browser_session
|
||||
)
|
||||
|
||||
# Should not have an error about element not existing
|
||||
print(f'Click result: {result.extracted_content if result.extracted_content else "No content"}')
|
||||
print(f'Click error: {result.error if result.error else "No error"}')
|
||||
|
||||
# The click might fail for other reasons (like navigation) but shouldn't fail due to "element does not exist"
|
||||
if result.error:
|
||||
assert 'Element with index' not in result.error, f'Element indexing failed: {result.error}'
|
||||
else:
|
||||
pytest.fail('No clickable elements found - DOM processing issue')
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_google_sheets_open(browser_session, controller):
|
||||
"""Test opening a Google Sheet using the existing action."""
|
||||
# First check what actions are available
|
||||
available_actions = list(controller.registry.registry.actions.keys())
|
||||
print(f'Available actions: {[a for a in available_actions if "Google" in a]}')
|
||||
|
||||
# Try to find the right action name
|
||||
google_sheet_actions = [a for a in available_actions if 'google sheet' in a.lower()]
|
||||
|
||||
if not google_sheet_actions:
|
||||
pytest.skip('No Google Sheets actions found in controller')
|
||||
|
||||
# Use the first Google Sheets action we find
|
||||
open_action = google_sheet_actions[0]
|
||||
print(f'Using action: {open_action}')
|
||||
|
||||
result = await controller.registry.execute_action(
|
||||
open_action, {'google_sheet_url': TEST_GOOGLE_SHEET_URL}, browser_session=browser_session
|
||||
)
|
||||
|
||||
print(f'Open result: {result.extracted_content if result.extracted_content else "No content"}')
|
||||
print(f'Open error: {result.error if result.error else "No error"}')
|
||||
|
||||
# Verify we're on the Google Sheets page
|
||||
page = await browser_session.get_current_page()
|
||||
assert 'docs.google.com/spreadsheets' in page.url
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_list_all_actions(browser_session, controller):
|
||||
"""Debug test to list all available actions."""
|
||||
available_actions = list(controller.registry.registry.actions.keys())
|
||||
print('All available actions:')
|
||||
for action in sorted(available_actions):
|
||||
print(f' - {action}')
|
||||
|
||||
# Just verify the controller has some actions
|
||||
assert len(available_actions) > 0
|
||||
255
tests/ci/test_sensitive_data.py
Normal file
255
tests/ci/test_sensitive_data.py
Normal file
@@ -0,0 +1,255 @@
|
||||
import pytest
|
||||
from langchain_core.messages import HumanMessage, SystemMessage
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from browser_use.agent.message_manager.service import MessageManager, MessageManagerSettings
|
||||
from browser_use.agent.views import MessageManagerState
|
||||
from browser_use.controller.registry.service import Registry
|
||||
from browser_use.utils import match_url_with_domain_pattern
|
||||
|
||||
|
||||
class SensitiveParams(BaseModel):
|
||||
"""Test parameter model for sensitive data testing."""
|
||||
|
||||
text: str = Field(description='Text with sensitive data placeholders')
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def registry():
|
||||
return Registry()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def message_manager():
|
||||
return MessageManager(
|
||||
task='Test task',
|
||||
system_message=SystemMessage(content='System message'),
|
||||
settings=MessageManagerSettings(),
|
||||
state=MessageManagerState(),
|
||||
)
|
||||
|
||||
|
||||
def test_replace_sensitive_data_with_missing_keys(registry, caplog):
|
||||
"""Test that _replace_sensitive_data handles missing keys gracefully"""
|
||||
# Set log level to capture warnings
|
||||
import logging
|
||||
|
||||
caplog.set_level(logging.WARNING)
|
||||
|
||||
# Create a simple Pydantic model with sensitive data placeholders
|
||||
params = SensitiveParams(text='Please enter <secret>username</secret> and <secret>password</secret>')
|
||||
|
||||
# Case 1: All keys present
|
||||
sensitive_data = {'username': 'user123', 'password': 'pass456'}
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'user123' in result.text
|
||||
assert 'pass456' in result.text
|
||||
# Both keys should be replaced
|
||||
assert 'Missing' not in caplog.text
|
||||
caplog.clear()
|
||||
|
||||
# Case 2: One key missing
|
||||
sensitive_data = {'username': 'user123'} # password is missing
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'user123' in result.text
|
||||
assert '<secret>password</secret>' in result.text
|
||||
# Verify the behavior - username replaced, password kept as tag
|
||||
assert 'password' in caplog.text
|
||||
caplog.clear()
|
||||
|
||||
# Case 3: Multiple keys missing
|
||||
sensitive_data = {} # both keys missing
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert '<secret>username</secret>' in result.text
|
||||
assert '<secret>password</secret>' in result.text
|
||||
# Verify both tags are preserved when keys are missing
|
||||
assert 'Missing' in caplog.text
|
||||
caplog.clear()
|
||||
|
||||
# Case 4: One key empty
|
||||
sensitive_data = {'username': 'user123', 'password': ''}
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'user123' in result.text
|
||||
assert '<secret>password</secret>' in result.text
|
||||
# Empty value should be treated the same as missing key
|
||||
assert 'password' in caplog.text
|
||||
caplog.clear()
|
||||
|
||||
|
||||
def test_simple_domain_specific_sensitive_data(registry, caplog):
|
||||
"""Test the basic functionality of domain-specific sensitive data replacement"""
|
||||
# Set log level to capture warnings
|
||||
import logging
|
||||
|
||||
caplog.set_level(logging.WARNING)
|
||||
|
||||
# Create a simple Pydantic model with sensitive data placeholders
|
||||
params = SensitiveParams(text='Please enter <secret>username</secret> and <secret>password</secret>')
|
||||
|
||||
# Simple test with directly instantiable values
|
||||
sensitive_data = {
|
||||
'example.com': {'username': 'example_user'},
|
||||
'other_data': 'non_secret_value', # Old format mixed with new
|
||||
}
|
||||
|
||||
# Without a browser_session, it should still replace known keys
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'example_user' in result.text
|
||||
assert '<secret>password</secret>' in result.text # Password is missing in sensitive_data
|
||||
assert 'password' in caplog.text
|
||||
caplog.clear()
|
||||
|
||||
|
||||
def test_match_url_with_domain_pattern():
|
||||
"""Test that the domain pattern matching utility works correctly"""
|
||||
|
||||
# Test exact domain matches
|
||||
assert match_url_with_domain_pattern('https://example.com', 'example.com') is True
|
||||
assert match_url_with_domain_pattern('http://example.com', 'example.com') is False # Default scheme is now https
|
||||
assert match_url_with_domain_pattern('https://google.com', 'example.com') is False
|
||||
|
||||
# Test subdomain pattern matches
|
||||
assert match_url_with_domain_pattern('https://sub.example.com', '*.example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.com', '*.example.com') is True # Base domain should match too
|
||||
assert match_url_with_domain_pattern('https://sub.sub.example.com', '*.example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.org', '*.example.com') is False
|
||||
|
||||
# Test protocol pattern matches
|
||||
assert match_url_with_domain_pattern('https://example.com', 'http*://example.com') is True
|
||||
assert match_url_with_domain_pattern('http://example.com', 'http*://example.com') is True
|
||||
assert match_url_with_domain_pattern('ftp://example.com', 'http*://example.com') is False
|
||||
|
||||
# Test explicit http protocol
|
||||
assert match_url_with_domain_pattern('http://example.com', 'http://example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.com', 'http://example.com') is False
|
||||
|
||||
# Test Chrome extension pattern
|
||||
assert match_url_with_domain_pattern('chrome-extension://abcdefghijkl', 'chrome-extension://*') is True
|
||||
assert match_url_with_domain_pattern('chrome-extension://mnopqrstuvwx', 'chrome-extension://abcdefghijkl') is False
|
||||
|
||||
# Test about:blank handling
|
||||
assert match_url_with_domain_pattern('about:blank', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('about:blank', '*://*') is False
|
||||
|
||||
|
||||
def test_unsafe_domain_patterns():
|
||||
"""Test that unsafe domain patterns are rejected"""
|
||||
|
||||
# These are unsafe patterns that could match too many domains
|
||||
assert match_url_with_domain_pattern('https://evil.com', '*google.com') is False
|
||||
assert match_url_with_domain_pattern('https://google.com.evil.com', '*.*.com') is False
|
||||
assert match_url_with_domain_pattern('https://google.com', '**google.com') is False
|
||||
assert match_url_with_domain_pattern('https://google.com', 'g*e.com') is False
|
||||
assert match_url_with_domain_pattern('https://google.com', '*com*') is False
|
||||
|
||||
# Test with patterns that have multiple asterisks in different positions
|
||||
assert match_url_with_domain_pattern('https://subdomain.example.com', '*domain*example*') is False
|
||||
assert match_url_with_domain_pattern('https://sub.domain.example.com', '*.*.example.com') is False
|
||||
|
||||
# Test patterns with wildcards in TLD part
|
||||
assert match_url_with_domain_pattern('https://example.com', 'example.*') is False
|
||||
assert match_url_with_domain_pattern('https://example.org', 'example.*') is False
|
||||
|
||||
|
||||
def test_malformed_urls_and_patterns():
|
||||
"""Test handling of malformed URLs or patterns"""
|
||||
|
||||
# Malformed URLs
|
||||
assert match_url_with_domain_pattern('not-a-url', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('http://', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('https://', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('ftp:/example.com', 'example.com') is False # Missing slash
|
||||
|
||||
# Empty URLs or patterns
|
||||
assert match_url_with_domain_pattern('', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('https://example.com', '') is False
|
||||
|
||||
# URLs with no hostname
|
||||
assert match_url_with_domain_pattern('file:///path/to/file.txt', 'example.com') is False
|
||||
|
||||
# Invalid pattern formats
|
||||
assert match_url_with_domain_pattern('https://example.com', '..example.com') is False
|
||||
assert match_url_with_domain_pattern('https://example.com', '.*.example.com') is False
|
||||
assert match_url_with_domain_pattern('https://example.com', '**') is False
|
||||
|
||||
# Nested URL attacks in path, query or fragments
|
||||
assert match_url_with_domain_pattern('https://example.com/redirect?url=https://evil.com', 'example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.com/path/https://evil.com', 'example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.com#https://evil.com', 'example.com') is True
|
||||
# These should match example.com, not evil.com since urlparse extracts the hostname correctly
|
||||
|
||||
# Complex URL obfuscation attempts
|
||||
assert match_url_with_domain_pattern('https://example.com/path?next=//evil.com/attack', 'example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.com@evil.com', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('https://evil.com?example.com', 'example.com') is False
|
||||
assert match_url_with_domain_pattern('https://user:example.com@evil.com', 'example.com') is False
|
||||
# urlparse correctly identifies evil.com as the hostname in these cases
|
||||
|
||||
|
||||
def test_url_components():
|
||||
"""Test handling of URL components like credentials, ports, fragments, etc."""
|
||||
|
||||
# URLs with credentials (username:password@)
|
||||
assert match_url_with_domain_pattern('https://user:pass@example.com', 'example.com') is True
|
||||
assert match_url_with_domain_pattern('https://user:pass@example.com', '*.example.com') is True
|
||||
|
||||
# URLs with ports
|
||||
assert match_url_with_domain_pattern('https://example.com:8080', 'example.com') is True
|
||||
assert match_url_with_domain_pattern('https://example.com:8080', 'example.com:8080') is True # Port is stripped from pattern
|
||||
|
||||
# URLs with paths
|
||||
assert match_url_with_domain_pattern('https://example.com/path/to/page', 'example.com') is True
|
||||
assert (
|
||||
match_url_with_domain_pattern('https://example.com/path/to/page', 'example.com/path') is False
|
||||
) # Paths in patterns are not supported
|
||||
|
||||
# URLs with query parameters
|
||||
assert match_url_with_domain_pattern('https://example.com?param=value', 'example.com') is True
|
||||
|
||||
# URLs with fragments
|
||||
assert match_url_with_domain_pattern('https://example.com#section', 'example.com') is True
|
||||
|
||||
# URLs with all components
|
||||
assert match_url_with_domain_pattern('https://user:pass@example.com:8080/path?query=val#fragment', 'example.com') is True
|
||||
|
||||
|
||||
def test_filter_sensitive_data(message_manager):
|
||||
"""Test that _filter_sensitive_data handles all sensitive data scenarios correctly"""
|
||||
# Set up a message with sensitive information
|
||||
message = HumanMessage(content='My username is admin and password is secret123')
|
||||
|
||||
# Case 1: No sensitive data provided
|
||||
message_manager.settings.sensitive_data = None
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
assert result.content == 'My username is admin and password is secret123'
|
||||
|
||||
# Case 2: All sensitive data is properly replaced
|
||||
message_manager.settings.sensitive_data = {'username': 'admin', 'password': 'secret123'}
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
assert '<secret>username</secret>' in result.content
|
||||
assert '<secret>password</secret>' in result.content
|
||||
|
||||
# Case 3: Make sure it works with nested content
|
||||
nested_message = HumanMessage(content=[{'type': 'text', 'text': 'My username is admin and password is secret123'}])
|
||||
result = message_manager._filter_sensitive_data(nested_message)
|
||||
assert '<secret>username</secret>' in result.content[0]['text']
|
||||
assert '<secret>password</secret>' in result.content[0]['text']
|
||||
|
||||
# Case 4: Test with empty values
|
||||
message_manager.settings.sensitive_data = {'username': 'admin', 'password': ''}
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
assert '<secret>username</secret>' in result.content
|
||||
# Only username should be replaced since password is empty
|
||||
|
||||
# Case 5: Test with domain-specific sensitive data format
|
||||
message_manager.settings.sensitive_data = {
|
||||
'example.com': {'username': 'admin', 'password': 'secret123'},
|
||||
'google.com': {'email': 'user@example.com', 'password': 'google_pass'},
|
||||
}
|
||||
# Update the message to include the values we're going to test
|
||||
message = HumanMessage(content='My username is admin, email is user@example.com and password is secret123 or google_pass')
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
# All sensitive values should be replaced regardless of domain
|
||||
assert '<secret>username</secret>' in result.content
|
||||
assert '<secret>password</secret>' in result.content
|
||||
assert '<secret>email</secret>' in result.content
|
||||
@@ -203,24 +203,28 @@ class TestTabManagement:
|
||||
"""Test that agent_current_page changes and human_current_page remains the same when a new tab is opened."""
|
||||
|
||||
initial_tab = await self._reset_tab_state(browser_session, base_url)
|
||||
assert initial_tab.url == 'about:blank'
|
||||
await initial_tab.goto(f'{base_url}/page1')
|
||||
await self._simulate_human_tab_change(initial_tab, browser_session)
|
||||
assert initial_tab.url == f'{base_url}/page1'
|
||||
initial_tab_count = len(browser_session.tabs)
|
||||
assert initial_tab_count == 1
|
||||
|
||||
# test opening a new tab
|
||||
new_tab = await browser_session.create_new_tab(f'{base_url}/page2')
|
||||
new_tab_count = len(browser_session.browser_context.pages)
|
||||
assert new_tab_count == len(browser_session.tabs) == 2
|
||||
assert (
|
||||
new_tab_count == len(browser_session.tabs) == 2
|
||||
) # get_current_page/create_new_tab should have auto-closed unused about:blank pages
|
||||
|
||||
# test agent open new tab updates agent focus + doesn't steal human focus
|
||||
assert browser_session.agent_current_page.url == new_tab.url == f'{base_url}/page2'
|
||||
assert browser_session.human_current_page.url == initial_tab.url == 'about:blank'
|
||||
assert browser_session.human_current_page.url == initial_tab.url == f'{base_url}/page1'
|
||||
|
||||
# test agent navigation updates agent focus +doesn't steal human focus
|
||||
await browser_session.navigate(f'{base_url}/page3')
|
||||
assert browser_session.agent_current_page.url == f'{base_url}/page3' # agent should now be on the new tab
|
||||
assert (
|
||||
browser_session.human_current_page.url == initial_tab.url == 'about:blank'
|
||||
browser_session.human_current_page.url == initial_tab.url == f'{base_url}/page1'
|
||||
) # human should still be on the very first tab
|
||||
|
||||
@pytest.mark.asyncio
|
||||
@@ -38,23 +38,31 @@ class TestUrlAllowlistSecurity:
|
||||
assert browser_session._is_url_allowed('https://example.org') is False
|
||||
|
||||
# Test more complex glob patterns
|
||||
browser_profile = BrowserProfile(allowed_domains=['*google.com', 'wiki*'])
|
||||
browser_profile = BrowserProfile(
|
||||
allowed_domains=['*.google.com', 'https://wiki.org', 'https://good.com', 'chrome://version', 'brave://*']
|
||||
)
|
||||
browser_session = BrowserSession(browser_profile=browser_profile)
|
||||
|
||||
# Should match domains ending with google.com
|
||||
assert browser_session._is_url_allowed('https://google.com') is True
|
||||
assert browser_session._is_url_allowed('https://www.google.com') is True
|
||||
assert browser_session._is_url_allowed('https://anygoogle.com') is True
|
||||
assert (
|
||||
browser_session._is_url_allowed('https://evilgood.com') is False
|
||||
) # make sure we dont allow *good.com patterns, only *.good.com
|
||||
|
||||
# Should match domains starting with wiki
|
||||
assert browser_session._is_url_allowed('http://wiki.org') is False
|
||||
assert browser_session._is_url_allowed('https://wiki.org') is True
|
||||
assert browser_session._is_url_allowed('https://wikipedia.org') is True
|
||||
|
||||
# Should not match other domains
|
||||
assert browser_session._is_url_allowed('https://example.com') is False
|
||||
# Should not match internal domains because scheme was not provided
|
||||
assert browser_session._is_url_allowed('chrome://google.com') is False
|
||||
assert browser_session._is_url_allowed('chrome://abc.google.com') is False
|
||||
|
||||
# Test browser internal URLs
|
||||
assert browser_session._is_url_allowed('chrome://settings') is True
|
||||
assert browser_session._is_url_allowed('chrome://settings') is False
|
||||
assert browser_session._is_url_allowed('chrome://version') is True
|
||||
assert browser_session._is_url_allowed('chrome-extension://version/') is False
|
||||
assert browser_session._is_url_allowed('brave://anything/') is True
|
||||
assert browser_session._is_url_allowed('about:blank') is True
|
||||
|
||||
# Test security for glob patterns (authentication credentials bypass attempts)
|
||||
@@ -67,7 +75,7 @@ class TestUrlAllowlistSecurity:
|
||||
def test_glob_pattern_edge_cases(self):
|
||||
"""Test edge cases for glob pattern matching to ensure proper behavior."""
|
||||
# Test with domains containing glob pattern in the middle
|
||||
browser_profile = BrowserProfile(allowed_domains=['*google.com', 'wiki*'])
|
||||
browser_profile = BrowserProfile(allowed_domains=['*.google.com', 'https://wiki.org'])
|
||||
browser_session = BrowserSession(browser_profile=browser_profile)
|
||||
|
||||
# Verify that 'wiki*' pattern doesn't match domains that merely contain 'wiki' in the middle
|
||||
@@ -79,13 +87,13 @@ class TestUrlAllowlistSecurity:
|
||||
assert browser_session._is_url_allowed('https://mygoogle.company.com') is False
|
||||
|
||||
# Create context with potentially risky glob pattern that demonstrates security concerns
|
||||
browser_profile = BrowserProfile(allowed_domains=['*.google.*'])
|
||||
browser_profile = BrowserProfile(allowed_domains=['*.google.com', '*.google.co.uk'])
|
||||
browser_session = BrowserSession(browser_profile=browser_profile)
|
||||
|
||||
# Should match legitimate Google domains
|
||||
assert browser_session._is_url_allowed('https://www.google.com') is True
|
||||
assert browser_session._is_url_allowed('https://mail.google.co.uk') is True
|
||||
|
||||
# But could also match potentially malicious domains with a subdomain structure
|
||||
# This demonstrates why such wildcard patterns can be risky
|
||||
assert browser_session._is_url_allowed('https://www.google.evil.com') is True
|
||||
# Shouldn't match potentially malicious domains with a similar structure
|
||||
# This demonstrates why the previous pattern was risky and why it's now rejected
|
||||
assert browser_session._is_url_allowed('https://www.google.evil.com') is False
|
||||
91
tests/test_action_params.py
Normal file
91
tests/test_action_params.py
Normal file
@@ -0,0 +1,91 @@
|
||||
import asyncio
|
||||
import logging
|
||||
from inspect import signature
|
||||
|
||||
import pytest
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from browser_use.browser import BrowserSession
|
||||
from browser_use.controller.registry.service import Registry
|
||||
from browser_use.controller.registry.views import ActionModel
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(level=logging.DEBUG)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# Test model
|
||||
class TestActionParams(ActionModel):
|
||||
value: str = Field(description='Test value')
|
||||
|
||||
|
||||
# Our Context type for the Registry
|
||||
class TestContext:
|
||||
def __init__(self, value):
|
||||
self.value = value
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_registry_param_handling():
|
||||
"""Test how Registry handles parameter passing for different function signatures."""
|
||||
# Create a Registry instance
|
||||
registry = Registry[TestContext]()
|
||||
|
||||
# Create test functions with different signatures
|
||||
|
||||
# 1. Function with browser_session as a positional parameter
|
||||
@registry.action('Test action with browser_session', param_model=TestActionParams)
|
||||
async def action_with_browser_session(params: TestActionParams, browser_session: BrowserSession):
|
||||
logger.debug(f'action_with_browser_session called with params={params}, browser_session={browser_session}')
|
||||
return {'params': params.model_dump(), 'has_browser': browser_session is not None}
|
||||
|
||||
# 2. Function with browser_session in the model
|
||||
class ModelWithBrowserSession(BaseModel):
|
||||
value: str
|
||||
browser_session: BrowserSession = None
|
||||
|
||||
@registry.action('Test action with browser_session in model')
|
||||
async def action_with_browser_in_model(params: ModelWithBrowserSession):
|
||||
logger.debug(f'action_with_browser_in_model called with params={params}')
|
||||
return {'params': params.model_dump(), 'has_browser': params.browser_session is not None}
|
||||
|
||||
# 3. Function using **kwargs
|
||||
@registry.action('Test action with kwargs')
|
||||
async def action_with_kwargs(params: TestActionParams, **kwargs):
|
||||
logger.debug(f'action_with_kwargs called with params={params}, kwargs={kwargs}')
|
||||
return {'params': params.model_dump(), 'kwargs': kwargs}
|
||||
|
||||
# Create a mock browser session
|
||||
mock_browser_session = object() # Just a placeholder
|
||||
|
||||
# Execute the actions
|
||||
logger.debug('\n\n=== Testing action_with_browser_session ===')
|
||||
result1 = await registry.execute_action(
|
||||
'action_with_browser_session', {'value': 'test1'}, browser_session=mock_browser_session
|
||||
)
|
||||
logger.debug(f'Result: {result1}')
|
||||
|
||||
logger.debug('\n\n=== Testing action_with_browser_in_model ===')
|
||||
result2 = await registry.execute_action(
|
||||
'action_with_browser_in_model',
|
||||
{'value': 'test2', 'browser_session': None}, # Browser session in model is None
|
||||
browser_session=mock_browser_session, # Browser session in execute_action is provided
|
||||
)
|
||||
logger.debug(f'Result: {result2}')
|
||||
|
||||
logger.debug('\n\n=== Testing action_with_kwargs ===')
|
||||
result3 = await registry.execute_action('action_with_kwargs', {'value': 'test3'}, browser_session=mock_browser_session)
|
||||
logger.debug(f'Result: {result3}')
|
||||
|
||||
# Print all signatures
|
||||
logger.debug('\n\n=== Function Signatures ===')
|
||||
logger.debug(f'action_with_browser_session: {signature(action_with_browser_session)}')
|
||||
logger.debug(f'action_with_browser_in_model: {signature(action_with_browser_in_model)}')
|
||||
logger.debug(f'action_with_kwargs: {signature(action_with_kwargs)}')
|
||||
|
||||
return result1, result2, result3
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# Run the test
|
||||
asyncio.run(test_registry_param_handling())
|
||||
@@ -12,9 +12,7 @@ async def test_proxy_settings_pydantic_model():
|
||||
Test that ProxySettings as a Pydantic model is correctly converted to a dictionary when used.
|
||||
"""
|
||||
# Create ProxySettings with Pydantic model
|
||||
proxy_settings = ProxySettings(
|
||||
server='http://example.proxy:8080', bypass='localhost', username='testuser', password='testpass'
|
||||
)
|
||||
proxy_settings = dict(server='http://example.proxy:8080', bypass='localhost', username='testuser', password='testpass')
|
||||
|
||||
# Verify the model has correct dict-like access
|
||||
assert proxy_settings['server'] == 'http://example.proxy:8080'
|
||||
@@ -22,7 +20,7 @@ async def test_proxy_settings_pydantic_model():
|
||||
assert proxy_settings.get('nonexistent', 'default') == 'default'
|
||||
|
||||
# Verify model_dump works correctly
|
||||
proxy_dict = proxy_settings.model_dump()
|
||||
proxy_dict = dict(proxy_settings)
|
||||
assert isinstance(proxy_dict, dict)
|
||||
assert proxy_dict['server'] == 'http://example.proxy:8080'
|
||||
assert proxy_dict['bypass'] == 'localhost'
|
||||
|
||||
@@ -1,91 +0,0 @@
|
||||
import pytest
|
||||
from langchain_core.messages import HumanMessage, SystemMessage
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from browser_use.agent.message_manager.service import MessageManager, MessageManagerSettings
|
||||
from browser_use.agent.views import MessageManagerState
|
||||
from browser_use.controller.registry.service import Registry
|
||||
|
||||
|
||||
class SensitiveParams(BaseModel):
|
||||
"""Test parameter model for sensitive data testing."""
|
||||
|
||||
text: str = Field(description='Text with sensitive data placeholders')
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def registry():
|
||||
return Registry()
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def message_manager():
|
||||
return MessageManager(
|
||||
task='Test task',
|
||||
system_message=SystemMessage(content='System message'),
|
||||
settings=MessageManagerSettings(),
|
||||
state=MessageManagerState(),
|
||||
)
|
||||
|
||||
|
||||
def test_replace_sensitive_data_with_missing_keys(registry):
|
||||
"""Test that _replace_sensitive_data handles missing keys gracefully"""
|
||||
# Create a simple Pydantic model with sensitive data placeholders
|
||||
params = SensitiveParams(text='Please enter <secret>username</secret> and <secret>password</secret>')
|
||||
|
||||
# Case 1: All keys present
|
||||
sensitive_data = {'username': 'user123', 'password': 'pass456'}
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'user123' in result.text
|
||||
assert 'pass456' in result.text
|
||||
# Both keys should be replaced
|
||||
|
||||
# Case 2: One key missing
|
||||
sensitive_data = {'username': 'user123'} # password is missing
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'user123' in result.text
|
||||
assert '<secret>password</secret>' in result.text
|
||||
# Verify the behavior - username replaced, password kept as tag
|
||||
|
||||
# Case 3: Multiple keys missing
|
||||
sensitive_data = {} # both keys missing
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert '<secret>username</secret>' in result.text
|
||||
assert '<secret>password</secret>' in result.text
|
||||
# Verify both tags are preserved when keys are missing
|
||||
|
||||
# Case 4: One key empty
|
||||
sensitive_data = {'username': 'user123', 'password': ''}
|
||||
result = registry._replace_sensitive_data(params, sensitive_data)
|
||||
assert 'user123' in result.text
|
||||
assert '<secret>password</secret>' in result.text
|
||||
# Empty value should be treated the same as missing key
|
||||
|
||||
|
||||
def test_filter_sensitive_data(message_manager):
|
||||
"""Test that _filter_sensitive_data handles all sensitive data scenarios correctly"""
|
||||
# Set up a message with sensitive information
|
||||
message = HumanMessage(content='My username is admin and password is secret123')
|
||||
|
||||
# Case 1: No sensitive data provided
|
||||
message_manager.settings.sensitive_data = None
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
assert result.content == 'My username is admin and password is secret123'
|
||||
|
||||
# Case 2: All sensitive data is properly replaced
|
||||
message_manager.settings.sensitive_data = {'username': 'admin', 'password': 'secret123'}
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
assert '<secret>username</secret>' in result.content
|
||||
assert '<secret>password</secret>' in result.content
|
||||
|
||||
# Case 3: Make sure it works with nested content
|
||||
nested_message = HumanMessage(content=[{'type': 'text', 'text': 'My username is admin and password is secret123'}])
|
||||
result = message_manager._filter_sensitive_data(nested_message)
|
||||
assert '<secret>username</secret>' in result.content[0]['text']
|
||||
assert '<secret>password</secret>' in result.content[0]['text']
|
||||
|
||||
# Case 4: Test with empty values
|
||||
message_manager.settings.sensitive_data = {'username': 'admin', 'password': ''}
|
||||
result = message_manager._filter_sensitive_data(message)
|
||||
assert '<secret>username</secret>' in result.content
|
||||
# Only username should be replaced since password is empty
|
||||
Reference in New Issue
Block a user