From 781f2131ee1a138e4827b38746d5e7adf452b2d1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 01:00:48 -0700 Subject: [PATCH 01/15] Remove efficiency_guidelines --- browser_use/agent/system_prompt.md | 43 ------------------- browser_use/agent/system_prompt_flash.md | 42 ------------------ .../agent/system_prompt_no_thinking.md | 41 ------------------ 3 files changed, 126 deletions(-) diff --git a/browser_use/agent/system_prompt.md b/browser_use/agent/system_prompt.md index a19305678..f8c42c80a 100644 --- a/browser_use/agent/system_prompt.md +++ b/browser_use/agent/system_prompt.md @@ -131,49 +131,6 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. - - -**IMPORTANT: Be More Efficient with Multi-Action Outputs** - -Maximize efficiency by combining related actions in one step instead of doing them separately: - -**Highly Recommended Action Combinations:** -- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information -- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step -- `input_text` + `click_element_by_index` → Fill form field and submit/search in one step -- `click_element_by_index` + `input_text` → Click input field and fill it immediately -- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) -- File operations + browser actions - -**Examples of Efficient Combinations:** -```json -"action": [ - {{"click_element_by_index": {{"index": 15}}}}, - {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} -] -``` - -```json -"action": [ - {{"input_text": {{"index": 23, "text": "laptop"}}}}, - {{"click_element_by_index": {{"index": 24}}}} -] 
-``` - -```json -"action": [ - {{"go_to_url": {{"url": "https://example.com/search"}}}}, - {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} -] -``` - -**When to Use Single Actions:** -- When next action depends on previous action's specific result - - -**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. - - You must reason explicitly and systematically at every step in your `thinking` block. diff --git a/browser_use/agent/system_prompt_flash.md b/browser_use/agent/system_prompt_flash.md index 40b4ee1bb..a57d997e5 100644 --- a/browser_use/agent/system_prompt_flash.md +++ b/browser_use/agent/system_prompt_flash.md @@ -129,48 +129,6 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. - - -**IMPORTANT: Be More Efficient with Multi-Action Outputs** - -Maximize efficiency by combining related actions in one step instead of doing them separately: - -**Highly Recommended Action Combinations:** -- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information -- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step -- `input_text` + `click_element_by_index` → Fill form field and submit/search in one step -- `click_element_by_index` + `input_text` → Click input field and fill it immediately -- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) -- File operations + browser actions - -**Examples of Efficient Combinations:** -```json -"action": [ - {{"click_element_by_index": {{"index": 15}}}}, - {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} -] -``` - -```json -"action": [ - {{"input_text": {{"index": 23, "text": "laptop"}}}}, 
- {{"click_element_by_index": {{"index": 24}}}} -] -``` - -```json -"action": [ - {{"go_to_url": {{"url": "https://example.com/search"}}}}, - {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} -] -``` - -**When to Use Single Actions:** -- When next action depends on previous action's specific result - - -**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. - Be clear and concise in your decision-making. Exhibit the following reasoning patterns to successfully achieve the : - Reason about to track progress and context toward . diff --git a/browser_use/agent/system_prompt_no_thinking.md b/browser_use/agent/system_prompt_no_thinking.md index 9e6117d4a..05dcf2c4d 100644 --- a/browser_use/agent/system_prompt_no_thinking.md +++ b/browser_use/agent/system_prompt_no_thinking.md @@ -131,47 +131,6 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. 
- -**IMPORTANT: Be More Efficient with Multi-Action Outputs** - -Maximize efficiency by combining related actions in one step instead of doing them separately: - -**Highly Recommended Action Combinations:** -- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information -- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step -- `input_text` + `click_element_by_index` → Fill form field and submit/search in one step -- `click_element_by_index` + `input_text` → Click input field and fill it immediately -- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) -- File operations + browser actions - -**Examples of Efficient Combinations:** -```json -"action": [ - {{"click_element_by_index": {{"index": 15}}}}, - {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} -] -``` - -```json -"action": [ - {{"input_text": {{"index": 23, "text": "laptop"}}}}, - {{"click_element_by_index": {{"index": 24}}}} -] -``` - -```json -"action": [ - {{"go_to_url": {{"url": "https://example.com/search"}}}}, - {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} -] -``` - -**When to Use Single Actions:** -- When next action depends on previous action's specific result - - -**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. - Be clear and concise in your decision-making. Exhibit the following reasoning patterns to successfully achieve the : From 3dac24ddec3bf65c2f799e3486e31235676e814e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 01:01:03 -0700 Subject: [PATCH 02/15] Update docstrings for click_element_by_index and input_text actions to clarify usage restrictions regarding browser_state indices. 
--- browser_use/tools/service.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/browser_use/tools/service.py b/browser_use/tools/service.py index 492c8c2c5..e685a0f2c 100644 --- a/browser_use/tools/service.py +++ b/browser_use/tools/service.py @@ -263,7 +263,7 @@ class Tools(Generic[Context]): # Element Interaction Actions @self.registry.action( - 'Click element by index, set while_holding_ctrl=True to open any resulting navigation in a new tab. Only click on indices that are inside your current browser_state. Never click or assume not existing indices.', + 'Click element by index. Only indices from your browser_state are allowed. Never use and index that is not inside your current browser_state. Set while_holding_ctrl=True to open any resulting navigation in a new tab.', param_model=ClickElementAction, ) async def click_element_by_index(params: ClickElementAction, browser_session: BrowserSession): @@ -314,7 +314,7 @@ class Tools(Generic[Context]): return ActionResult(error=error_msg) @self.registry.action( - 'Click and input text into a input interactive element. Only input text into indices that are inside your current browser_state. Never input text into indices that are not inside your current browser_state.', + 'Input text into an input interactive element. Only input text into indices that are inside your current browser_state. Never input text into indices that are not inside your current browser_state.', param_model=InputTextAction, ) async def input_text(params: InputTextAction, browser_session: BrowserSession, has_sensitive_data: bool = False): From b638958bb5d0e1872f534a588a3689e045a5b6d1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 09:46:48 -0700 Subject: [PATCH 03/15] Revert "Remove efficiency_guidelines" This reverts commit 781f2131ee1a138e4827b38746d5e7adf452b2d1. 
--- browser_use/agent/system_prompt.md | 43 +++++++++++++++++++ browser_use/agent/system_prompt_flash.md | 42 ++++++++++++++++++ .../agent/system_prompt_no_thinking.md | 41 ++++++++++++++++++ 3 files changed, 126 insertions(+) diff --git a/browser_use/agent/system_prompt.md b/browser_use/agent/system_prompt.md index f8c42c80a..a19305678 100644 --- a/browser_use/agent/system_prompt.md +++ b/browser_use/agent/system_prompt.md @@ -131,6 +131,49 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. + + +**IMPORTANT: Be More Efficient with Multi-Action Outputs** + +Maximize efficiency by combining related actions in one step instead of doing them separately: + +**Highly Recommended Action Combinations:** +- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information +- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step +- `input_text` + `click_element_by_index` → Fill form field and submit/search in one step +- `click_element_by_index` + `input_text` → Click input field and fill it immediately +- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) +- File operations + browser actions + +**Examples of Efficient Combinations:** +```json +"action": [ + {{"click_element_by_index": {{"index": 15}}}}, + {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} +] +``` + +```json +"action": [ + {{"input_text": {{"index": 23, "text": "laptop"}}}}, + {{"click_element_by_index": {{"index": 24}}}} +] +``` + +```json +"action": [ + {{"go_to_url": {{"url": "https://example.com/search"}}}}, + {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} +] +``` + +**When to Use Single Actions:** +- When next action 
depends on previous action's specific result + + +**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. + + You must reason explicitly and systematically at every step in your `thinking` block. diff --git a/browser_use/agent/system_prompt_flash.md b/browser_use/agent/system_prompt_flash.md index a57d997e5..40b4ee1bb 100644 --- a/browser_use/agent/system_prompt_flash.md +++ b/browser_use/agent/system_prompt_flash.md @@ -129,6 +129,48 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. + + +**IMPORTANT: Be More Efficient with Multi-Action Outputs** + +Maximize efficiency by combining related actions in one step instead of doing them separately: + +**Highly Recommended Action Combinations:** +- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information +- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step +- `input_text` + `click_element_by_index` → Fill form field and submit/search in one step +- `click_element_by_index` + `input_text` → Click input field and fill it immediately +- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) +- File operations + browser actions + +**Examples of Efficient Combinations:** +```json +"action": [ + {{"click_element_by_index": {{"index": 15}}}}, + {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} +] +``` + +```json +"action": [ + {{"input_text": {{"index": 23, "text": "laptop"}}}}, + {{"click_element_by_index": {{"index": 24}}}} +] +``` + +```json +"action": [ + {{"go_to_url": {{"url": "https://example.com/search"}}}}, + {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} +] +``` + 
+**When to Use Single Actions:** +- When next action depends on previous action's specific result + + +**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. + Be clear and concise in your decision-making. Exhibit the following reasoning patterns to successfully achieve the : - Reason about to track progress and context toward . diff --git a/browser_use/agent/system_prompt_no_thinking.md b/browser_use/agent/system_prompt_no_thinking.md index 05dcf2c4d..9e6117d4a 100644 --- a/browser_use/agent/system_prompt_no_thinking.md +++ b/browser_use/agent/system_prompt_no_thinking.md @@ -131,6 +131,47 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. + +**IMPORTANT: Be More Efficient with Multi-Action Outputs** + +Maximize efficiency by combining related actions in one step instead of doing them separately: + +**Highly Recommended Action Combinations:** +- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information +- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step +- `input_text` + `click_element_by_index` → Fill form field and submit/search in one step +- `click_element_by_index` + `input_text` → Click input field and fill it immediately +- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) +- File operations + browser actions + +**Examples of Efficient Combinations:** +```json +"action": [ + {{"click_element_by_index": {{"index": 15}}}}, + {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} +] +``` + +```json +"action": [ + {{"input_text": {{"index": 23, "text": "laptop"}}}}, + {{"click_element_by_index": {{"index": 24}}}} +] +``` + +```json +"action": [ + 
{{"go_to_url": {{"url": "https://example.com/search"}}}}, + {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} +] +``` + +**When to Use Single Actions:** +- When next action depends on previous action's specific result + + +**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. + Be clear and concise in your decision-making. Exhibit the following reasoning patterns to successfully achieve the : From d97462e49ab6a2c54163d343fdb47ef03ed51ba3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 09:54:40 -0700 Subject: [PATCH 04/15] New efficiency guidelines --- browser_use/agent/system_prompt.md | 43 +++---------------- browser_use/agent/system_prompt_flash.md | 42 +++--------------- .../agent/system_prompt_no_thinking.md | 41 +++--------------- 3 files changed, 17 insertions(+), 109 deletions(-) diff --git a/browser_use/agent/system_prompt.md b/browser_use/agent/system_prompt.md index a19305678..3320063db 100644 --- a/browser_use/agent/system_prompt.md +++ b/browser_use/agent/system_prompt.md @@ -128,50 +128,19 @@ The `done` action is your opportunity to terminate and share your findings with - You are allowed to use a maximum of {max_actions} actions per step. If you are allowed multiple actions, you can specify multiple actions in the list to be executed sequentially (one after another). -- If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. +- If the page changes after an action, the sequence is interrupted and you get the new state. -**IMPORTANT: Be More Efficient with Multi-Action Outputs** +You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page. 
-Maximize efficiency by combining related actions in one step instead of doing them separately: - -**Highly Recommended Action Combinations:** -- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information -- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step +**Recommended Action Combinations:** - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step -- `click_element_by_index` + `input_text` → Click input field and fill it immediately -- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) +- `input_text` + `input_text` → Fill form fields +- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows - File operations + browser actions - -**Examples of Efficient Combinations:** -```json -"action": [ - {{"click_element_by_index": {{"index": 15}}}}, - {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} -] -``` - -```json -"action": [ - {{"input_text": {{"index": 23, "text": "laptop"}}}}, - {{"click_element_by_index": {{"index": 24}}}} -] -``` - -```json -"action": [ - {{"go_to_url": {{"url": "https://example.com/search"}}}}, - {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} -] -``` - -**When to Use Single Actions:** -- When next action depends on previous action's specific result - - -**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. +Think "What's the logical sequence of actions I would do?" and group them together when safe. 
diff --git a/browser_use/agent/system_prompt_flash.md b/browser_use/agent/system_prompt_flash.md index 40b4ee1bb..9180d9433 100644 --- a/browser_use/agent/system_prompt_flash.md +++ b/browser_use/agent/system_prompt_flash.md @@ -131,46 +131,16 @@ If you are allowed multiple actions, you can specify multiple actions in the lis -**IMPORTANT: Be More Efficient with Multi-Action Outputs** +You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page. -Maximize efficiency by combining related actions in one step instead of doing them separately: - -**Highly Recommended Action Combinations:** -- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information -- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step +**Recommended Action Combinations:** - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step -- `click_element_by_index` + `input_text` → Click input field and fill it immediately -- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) +- `input_text` + `input_text` → Fill form fields +- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows - File operations + browser actions - -**Examples of Efficient Combinations:** -```json -"action": [ - {{"click_element_by_index": {{"index": 15}}}}, - {{"extract_structured_data": {{"query": "Extract the first 3 headlines", "extract_links": false}}}} -] -``` - -```json -"action": [ - {{"input_text": {{"index": 23, "text": "laptop"}}}}, - {{"click_element_by_index": {{"index": 24}}}} -] -``` - -```json -"action": [ - {{"go_to_url": {{"url": "https://example.com/search"}}}}, - {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} -] -``` - -**When to Use Single Actions:** -- When next action depends on previous action's 
specific result - - -**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. +Think "What's the logical sequence of actions I would do?" and group them together when safe. + Be clear and concise in your decision-making. Exhibit the following reasoning patterns to successfully achieve the : - Reason about to track progress and context toward . diff --git a/browser_use/agent/system_prompt_no_thinking.md b/browser_use/agent/system_prompt_no_thinking.md index 9e6117d4a..a003ea731 100644 --- a/browser_use/agent/system_prompt_no_thinking.md +++ b/browser_use/agent/system_prompt_no_thinking.md @@ -132,45 +132,14 @@ If you are allowed multiple actions, you can specify multiple actions in the lis -**IMPORTANT: Be More Efficient with Multi-Action Outputs** +You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page. -Maximize efficiency by combining related actions in one step instead of doing them separately: - -**Highly Recommended Action Combinations:** -- `click_element_by_index` + `extract_structured_data` → Click element and immediately extract information -- `go_to_url` + `extract_structured_data` → Navigate and extract data in one step +**Recommended Action Combinations:** - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step -- `click_element_by_index` + `input_text` → Click input field and fill it immediately -- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when safe) +- `input_text` + `input_text` → Fill form fields +- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows - File operations + browser actions - -**Examples of Efficient Combinations:** -```json -"action": [ - {{"click_element_by_index": {{"index": 15}}}}, - {{"extract_structured_data": {{"query": "Extract the first 3 headlines", 
"extract_links": false}}}} -] -``` - -```json -"action": [ - {{"input_text": {{"index": 23, "text": "laptop"}}}}, - {{"click_element_by_index": {{"index": 24}}}} -] -``` - -```json -"action": [ - {{"go_to_url": {{"url": "https://example.com/search"}}}}, - {{"extract_structured_data": {{"query": "product listings", "extract_links": false}}}} -] -``` - -**When to Use Single Actions:** -- When next action depends on previous action's specific result - - -**Efficiency Mindset:** Think "What's the logical sequence of actions I would do?" and group them together when safe. +Think "What's the logical sequence of actions I would do?" and group them together when safe. From 89cb9ef9545727625a8412696494d8fb244e1e63 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 10:00:28 -0700 Subject: [PATCH 05/15] More efficient scroll instruction --- browser_use/tools/service.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/browser_use/tools/service.py b/browser_use/tools/service.py index e685a0f2c..204fd5796 100644 --- a/browser_use/tools/service.py +++ b/browser_use/tools/service.py @@ -647,7 +647,9 @@ Provide the extracted information in a clear, structured format.""" raise RuntimeError(str(e)) @self.registry.action( - 'Scroll the page by specified number of pages (set down=True to scroll down, down=False to scroll up, num_pages=number of pages to scroll like 0.5 for half page, 1.0 for one page, etc.). Optional index parameter to scroll within a specific element or its scroll container (works well for dropdowns and custom UI components). Use index=0 or omit index to scroll the entire page.', + """Scroll the page by specified number of pages (set down=True to scroll down, down=False to scroll up, num_pages=number of pages to scroll like 0.5 for half page, 3.0 for three pages, etc.). 
Optional index parameter to scroll within a specific element or its scroll container (works well for dropdowns and custom UI components). Don't use index to scroll the entire page. + Instead of scrolling multiple step after step, use a high number of pages at once like 10 to get to the bottom of the page. + """, param_model=ScrollAction, ) async def scroll(params: ScrollAction, browser_session: BrowserSession): From 3d8f7934063c41467317147787fb1500c1cb2cc9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 10:24:26 -0700 Subject: [PATCH 06/15] Enhance efficiency guidelines by clarifying action chaining restrictions and emphasizing the importance of clear goals per step. Added instructions to avoid multiple state changes in a single action sequence. --- browser_use/agent/system_prompt.md | 7 ++++++- browser_use/agent/system_prompt_flash.md | 8 ++++++-- browser_use/agent/system_prompt_no_thinking.md | 7 ++++++- 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/browser_use/agent/system_prompt.md b/browser_use/agent/system_prompt.md index 3320063db..c74d947ef 100644 --- a/browser_use/agent/system_prompt.md +++ b/browser_use/agent/system_prompt.md @@ -135,12 +135,17 @@ If you are allowed multiple actions, you can specify multiple actions in the lis You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page. + **Recommended Action Combinations:** - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step - `input_text` + `input_text` → Fill form fields - `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows - File operations + browser actions -Think "What's the logical sequence of actions I would do?" and group them together when safe. + +Do not try multiple different paths in one step. 
Always have one clear goal per step. +Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not. + +Scroll allows you with num_pages to directly execute it multiple times. diff --git a/browser_use/agent/system_prompt_flash.md b/browser_use/agent/system_prompt_flash.md index 9180d9433..e73937682 100644 --- a/browser_use/agent/system_prompt_flash.md +++ b/browser_use/agent/system_prompt_flash.md @@ -129,16 +129,20 @@ If you are allowed multiple actions, you can specify multiple actions in the lis - If the page changes after an action, the sequence is interrupted and you get the new state. You can see this in your agent history when this happens. - You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page. + **Recommended Action Combinations:** - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step - `input_text` + `input_text` → Fill form fields - `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows - File operations + browser actions -Think "What's the logical sequence of actions I would do?" and group them together when safe. + +Do not try multiple different paths in one step. Always have one clear goal per step. +Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not. + +Scroll allows you with num_pages to directly execute it multiple times. 
diff --git a/browser_use/agent/system_prompt_no_thinking.md b/browser_use/agent/system_prompt_no_thinking.md index a003ea731..85157fdf2 100644 --- a/browser_use/agent/system_prompt_no_thinking.md +++ b/browser_use/agent/system_prompt_no_thinking.md @@ -134,12 +134,17 @@ If you are allowed multiple actions, you can specify multiple actions in the lis You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page. + **Recommended Action Combinations:** - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step - `input_text` + `input_text` → Fill form fields - `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows - File operations + browser actions -Think "What's the logical sequence of actions I would do?" and group them together when safe. + +Do not try multiple different paths in one step. Always have one clear goal per step. +Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not. + +Scroll allows you with num_pages to directly execute it multiple times. From b9ab642d0c264b552ca6584d83b1bc10e7be4ef1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 10:39:50 -0700 Subject: [PATCH 07/15] Update efficiency guidelines to include new scrolling instructions for extracting structured data. Clarified usage of the scroll action to enhance clarity and efficiency in multi-step processes. 
---
 browser_use/agent/system_prompt.md             | 3 +--
 browser_use/agent/system_prompt_flash.md       | 3 +--
 browser_use/agent/system_prompt_no_thinking.md | 3 +--
 browser_use/tools/service.py                   | 4 ++--
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/browser_use/agent/system_prompt.md b/browser_use/agent/system_prompt.md
index c74d947ef..36b969949 100644
--- a/browser_use/agent/system_prompt.md
+++ b/browser_use/agent/system_prompt.md
@@ -140,12 +140,11 @@ You can output multiple actions in one step. Try to be efficient where it makes
 - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step
 - `input_text` + `input_text` → Fill form fields
 - `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows
+- `scroll` with num_pages 10 + `extract_structured_data` → Scroll to the bottom of the page to load more content before extracting structured data
 - File operations + browser actions
 Do not try multiple different paths in one step. Always have one clear goal per step.
 Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not.
-
-Scroll allows you with num_pages to directly execute it multiple times.
diff --git a/browser_use/agent/system_prompt_flash.md b/browser_use/agent/system_prompt_flash.md
index e73937682..3161b6a68 100644
--- a/browser_use/agent/system_prompt_flash.md
+++ b/browser_use/agent/system_prompt_flash.md
@@ -137,12 +137,11 @@ You can output multiple actions in one step. Try to be efficient where it makes
 - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step
 - `input_text` + `input_text` → Fill form fields
 - `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows
+- `scroll` with num_pages 10 + `extract_structured_data` → Scroll to the bottom of the page to load more content before extracting structured data
 - File operations + browser actions
 Do not try multiple different paths in one step. Always have one clear goal per step.
 Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not.
-
-Scroll allows you with num_pages to directly execute it multiple times.
diff --git a/browser_use/agent/system_prompt_no_thinking.md b/browser_use/agent/system_prompt_no_thinking.md
index 85157fdf2..cf2803d2a 100644
--- a/browser_use/agent/system_prompt_no_thinking.md
+++ b/browser_use/agent/system_prompt_no_thinking.md
@@ -139,12 +139,11 @@ You can output multiple actions in one step. Try to be efficient where it makes
 - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step
 - `input_text` + `input_text` → Fill form fields
 - `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows
+- `scroll` with num_pages 10 + `extract_structured_data` → Scroll to the bottom of the page to load more content before extracting structured data
 - File operations + browser actions
 Do not try multiple different paths in one step. Always have one clear goal per step.
 Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not.
-
-Scroll allows you with num_pages to directly execute it multiple times.
diff --git a/browser_use/tools/service.py b/browser_use/tools/service.py
index 204fd5796..d0ef4d93c 100644
--- a/browser_use/tools/service.py
+++ b/browser_use/tools/service.py
@@ -647,8 +647,8 @@ Provide the extracted information in a clear, structured format."""
 				raise RuntimeError(str(e))
 		@self.registry.action(
-			"""Scroll the page by specified number of pages (set down=True to scroll down, down=False to scroll up, num_pages=number of pages to scroll like 0.5 for half page, 3.0 for three pages, etc.). Optional index parameter to scroll within a specific element or its scroll container (works well for dropdowns and custom UI components). Don't use index to scroll the entire page.
-			Instead of scrolling multiple step after step, use a high number of pages at once like 10 to get to the bottom of the page.
+			"""Scroll the page by specified number of pages (set down=True to scroll down, down=False to scroll up, num_pages=number of pages to scroll like 0.5 for half page, 10.0 for ten pages, etc.). Optional index parameter to scroll within a specific element or its scroll container (works well for dropdowns and custom UI components). If you want to scroll the entire page, don't use index.
+			Instead of scrolling step after step, use a high number of pages at once like 10 to get to the bottom of the page.
""", param_model=ScrollAction, ) From 2c40bd23a826b44bea59a5532fcd93021c9233d6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 10:42:25 -0700 Subject: [PATCH 08/15] Fix url preload --- browser_use/agent/service.py | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/browser_use/agent/service.py b/browser_use/agent/service.py index 98b01e8c3..a015a381f 100644 --- a/browser_use/agent/service.py +++ b/browser_use/agent/service.py @@ -1139,23 +1139,21 @@ class Agent(Generic[Context, AgentStructuredOutput]): """Extract URL from task string using naive pattern matching.""" import re + # Remove email addresses from task before looking for URLs + task_without_emails = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b', '', task) + # Look for common URL patterns patterns = [ r'https?://[^\s<>"\']+', # Full URLs with http/https r'(?:www\.)?[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}(?:/[^\s<>"\']*)?', # Domain names with subdomains and optional paths ] - # Email pattern to exclude - email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b' - found_urls = [] for pattern in patterns: - matches = re.finditer(pattern, task) + matches = re.finditer(pattern, task_without_emails) for match in matches: url = match.group(0) - # Skip if this looks like an email address - if re.search(email_pattern, url): - continue + # Remove trailing punctuation that's not part of URLs url = re.sub(r'[.,;:!?()\[\]]+$', '', url) # Add https:// if missing From 799fe88aa5cf1fd4266d87f512f839572c0248c0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Fri, 29 Aug 2025 23:48:55 -0700 Subject: [PATCH 09/15] New extension --- browser_use/browser/profile.py | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/browser_use/browser/profile.py 
b/browser_use/browser/profile.py index 86e4eada5..15f638e97 100644 --- a/browser_use/browser/profile.py +++ b/browser_use/browser/profile.py @@ -759,11 +759,11 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro 'id': 'cjpalhdlnbpafiamejdnhcphjbkeiagm', 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dcjpalhdlnbpafiamejdnhcphjbkeiagm%26uc', }, - { - 'name': "I still don't care about cookies", - 'id': 'edibdbjcniadpccecjdfdjjppcpchdlm', - 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dedibdbjcniadpccecjdfdjjppcpchdlm%26uc', - }, + # { + # 'name': "I still don't care about cookies", + # 'id': 'edibdbjcniadpccecjdfdjjppcpchdlm', + # 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dedibdbjcniadpccecjdfdjjppcpchdlm%26uc', + # }, { 'name': 'ClearURLs', 'id': 'lckanjgmijmafbedllaakclkaicjfmnk', @@ -774,11 +774,11 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro # 'id': 'pgojnojmmhpofjgdmaebadhbocahppod', # 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dpgojnojmmhpofjgdmaebadhbocahppod%26uc', # }, - # { - # 'name': 'Consent-O-Matic', - # 'id': 'mdjildafknihdffpkfmmpnpoiajfjnjd', - # 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dmdjildafknihdffpkfmmpnpoiajfjnjd%26uc', - # }, + { + 'name': 'Consent-O-Matic', + 'id': 'mdjildafknihdffpkfmmpnpoiajfjnjd', + 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dmdjildafknihdffpkfmmpnpoiajfjnjd%26uc', + }, # { # 'name': 'Privacy | Protect Your Payments', # 'id': 'hmgpakheknboplhmlicfkkgjipfabmhp', From 89af16361f4dcce6ac17d7a705bb1bb115f98e03 Mon Sep 17 00:00:00 2001 From: 
=?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Fri, 29 Aug 2025 23:49:29 -0700 Subject: [PATCH 10/15] Add timeouts --- browser_use/browser/watchdogs/dom_watchdog.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/browser_use/browser/watchdogs/dom_watchdog.py b/browser_use/browser/watchdogs/dom_watchdog.py index 33cad6257..4362801b6 100644 --- a/browser_use/browser/watchdogs/dom_watchdog.py +++ b/browser_use/browser/watchdogs/dom_watchdog.py @@ -255,16 +255,16 @@ class DOMWatchdog(BaseWatchdog): # Get target title safely try: self.logger.debug('🔍 DOMWatchdog.on_BrowserStateRequestEvent: Getting page title...') - title = await asyncio.wait_for(self.browser_session.get_current_page_title(), timeout=2.0) + title = await asyncio.wait_for(self.browser_session.get_current_page_title(), timeout=1.0) self.logger.debug(f'🔍 DOMWatchdog.on_BrowserStateRequestEvent: Got title: {title}') except Exception as e: self.logger.debug(f'🔍 DOMWatchdog.on_BrowserStateRequestEvent: Failed to get title: {e}') title = 'Page' - # Get comprehensive page info from CDP + # Get comprehensive page info from CDP with timeout try: self.logger.debug('🔍 DOMWatchdog.on_BrowserStateRequestEvent: Getting page info from CDP...') - page_info = await self._get_page_info() + page_info = await asyncio.wait_for(self._get_page_info(), timeout=1.0) self.logger.debug(f'🔍 DOMWatchdog.on_BrowserStateRequestEvent: Got page info from CDP: {page_info}') except Exception as e: self.logger.debug( From f50cc45b51db7583dd677e994385f3515981d0cb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 18:14:59 -0700 Subject: [PATCH 11/15] Works to remove cookies --- browser_use/browser/profile.py | 146 ++++++++++++++++++++++++++++++--- browser_use/browser/session.py | 41 +++++---- 2 files changed, 160 insertions(+), 27 deletions(-) diff --git a/browser_use/browser/profile.py 
b/browser_use/browser/profile.py index 15f638e97..cce3286c8 100644 --- a/browser_use/browser/profile.py +++ b/browser_use/browser/profile.py @@ -172,6 +172,11 @@ CHROME_DEFAULT_ARGS = [ '--disable-extensions-http-throttling', '--extensions-on-chrome-urls', '--disable-default-apps', + '--disable-component-extensions-with-background-pages', + '--disable-background-networking', + '--disable-extensions-except-api', + '--disable-extension-content-verification', + '--allow-running-insecure-content', f'--disable-features={",".join(CHROME_DISABLED_COMPONENTS)}', ] @@ -558,6 +563,7 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro default=True, description="Enable automation-optimized extensions: ad blocking (uBlock Origin), cookie handling (I still don't care about cookies), and URL cleaning (ClearURLs). All extensions work automatically without manual intervention. Extensions are automatically downloaded and loaded when enabled.", ) + window_size: ViewportSize | None = Field( default=None, description='Browser window size to use when headless=False.', @@ -753,32 +759,34 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro """ # Extension definitions - optimized for automation and content extraction + # Combines uBlock Origin (ad blocking) + "I still don't care about cookies" (cookie banner handling) extensions = [ { 'name': 'uBlock Origin', 'id': 'cjpalhdlnbpafiamejdnhcphjbkeiagm', - 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dcjpalhdlnbpafiamejdnhcphjbkeiagm%26uc', + 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=133&acceptformat=crx3&x=id%3Dcjpalhdlnbpafiamejdnhcphjbkeiagm%26uc', + }, + { + 'name': "I still don't care about cookies", + 'id': 'edibdbjcniadpccecjdfdjjppcpchdlm', + 'url': 
'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=133&acceptformat=crx3&x=id%3Dedibdbjcniadpccecjdfdjjppcpchdlm%26uc', }, - # { - # 'name': "I still don't care about cookies", - # 'id': 'edibdbjcniadpccecjdfdjjppcpchdlm', - # 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dedibdbjcniadpccecjdfdjjppcpchdlm%26uc', - # }, { 'name': 'ClearURLs', 'id': 'lckanjgmijmafbedllaakclkaicjfmnk', - 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dlckanjgmijmafbedllaakclkaicjfmnk%26uc', + 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=133&acceptformat=crx3&x=id%3Dlckanjgmijmafbedllaakclkaicjfmnk%26uc', }, # { # 'name': 'Captcha Solver: Auto captcha solving service', # 'id': 'pgojnojmmhpofjgdmaebadhbocahppod', # 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dpgojnojmmhpofjgdmaebadhbocahppod%26uc', # }, - { - 'name': 'Consent-O-Matic', - 'id': 'mdjildafknihdffpkfmmpnpoiajfjnjd', - 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dmdjildafknihdffpkfmmpnpoiajfjnjd%26uc', - }, + # Consent-O-Matic disabled - using uBlock Origin's cookie lists instead for simplicity + # { + # 'name': 'Consent-O-Matic', + # 'id': 'mdjildafknihdffpkfmmpnpoiajfjnjd', + # 'url': 'https://clients2.google.com/service/update2/crx?response=redirect&prodversion=130&acceptformat=crx3&x=id%3Dmdjildafknihdffpkfmmpnpoiajfjnjd%26uc', + # }, # { # 'name': 'Privacy | Protect Your Payments', # 'id': 'hmgpakheknboplhmlicfkkgjipfabmhp', @@ -816,6 +824,10 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro # Extract extension logger.info(f'📂 Extracting {ext["name"]} extension...') self._extract_extension(crx_file, ext_dir) + + # Log extension version info + 
self._log_extension_version(ext_dir, ext['name']) + extension_paths.append(str(ext_dir)) loaded_extension_names.append(ext['name']) @@ -823,6 +835,11 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro logger.warning(f'⚠️ Failed to setup {ext["name"]} extension: {e}') continue + # Apply minimal patch to cookie extension - hardcode nature.com whitelist + for i, path in enumerate(extension_paths): + if loaded_extension_names[i] == "I still don't care about cookies": + self._apply_minimal_extension_patch(Path(path)) + if extension_paths: logger.debug(f'[BrowserProfile] 🧩 Extensions loaded ({len(extension_paths)}): [{", ".join(loaded_extension_names)}]') else: @@ -830,6 +847,111 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro return extension_paths + def _log_extension_version(self, ext_dir: Path, ext_name: str) -> None: + """Log extension version information from manifest.""" + try: + manifest_path = ext_dir / 'manifest.json' + if manifest_path.exists(): + import json + + with open(manifest_path, 'r', encoding='utf-8') as f: + manifest = json.load(f) + + version = manifest.get('version', 'unknown') + manifest_version = manifest.get('manifest_version', 'unknown') + + logger.info(f'📦 {ext_name} v{version} (manifest v{manifest_version}) loaded') + + # Special logging for uBlock Origin to show it's current + if ext_name == 'uBlock Origin': + logger.info(f'🛡️ uBlock Origin version {version} - Latest is 1.61.2+ (as of Aug 2025)') + + except Exception as e: + logger.debug(f'Could not read version for {ext_name}: {e}') + + def _get_extension_name(self, ext_dir: Path) -> str | None: + """Get the actual extension name from manifest.""" + try: + manifest_path = ext_dir / 'manifest.json' + if manifest_path.exists(): + import json + + with open(manifest_path, 'r', encoding='utf-8') as f: + manifest = json.load(f) + + name = manifest.get('name', '') + if name.startswith('__MSG_'): + # Resolve localized name + 
locale_path = ext_dir / '_locales' / 'en' / 'messages.json' + if locale_path.exists(): + with open(locale_path, 'r', encoding='utf-8') as f: + messages = json.load(f) + key = name.replace('__MSG_', '').replace('__', '') + return messages.get(key, {}).get('message', name) + return name + except Exception: + pass + return None + + def _apply_minimal_extension_patch(self, ext_dir: Path) -> None: + """Minimal patch: pre-populate chrome.storage.local with nature.com whitelist.""" + try: + bg_path = ext_dir / 'data' / 'background.js' + if not bg_path.exists(): + return + + with open(bg_path, 'r', encoding='utf-8') as f: + content = f.read() + + # Find the initialize() function and inject storage setup before updateSettings() + # The actual function uses 2-space indentation, not tabs + old_init = """async function initialize(checkInitialized, magic) { + if (checkInitialized && initialized) { + return; + } + loadCachedRules(); + await updateSettings(); + await recreateTabList(magic); + initialized = true; +}""" + + # New function with nature.com whitelist initialization + new_init = """// Pre-populate storage with nature.com whitelist if empty +async function ensureWhitelistStorage() { + const result = await chrome.storage.local.get({ settings: null }); + if (!result.settings) { + const defaultSettings = { + statusIndicators: true, + whitelistedDomains: { "nature.com": true } + }; + await chrome.storage.local.set({ settings: defaultSettings }); + } +} + +async function initialize(checkInitialized, magic) { + if (checkInitialized && initialized) { + return; + } + loadCachedRules(); + await ensureWhitelistStorage(); // Add storage initialization + await updateSettings(); + await recreateTabList(magic); + initialized = true; +}""" + + if old_init in content: + content = content.replace(old_init, new_init) + + with open(bg_path, 'w', encoding='utf-8') as f: + f.write(content) + + logger.info('[BrowserProfile] ✅ Cookie extension: nature.com pre-populated in storage') + else: + 
logger.debug('[BrowserProfile] Initialize function not found for patching') + + except Exception as e: + logger.debug(f'[BrowserProfile] Could not patch extension storage: {e}') + def _download_extension(self, url: str, output_path: Path) -> None: """Download extension .crx file.""" import urllib.request diff --git a/browser_use/browser/session.py b/browser_use/browser/session.py index 5e6e9d578..be3858964 100644 --- a/browser_use/browser/session.py +++ b/browser_use/browser/session.py @@ -630,6 +630,9 @@ class BrowserSession(BaseModel): # # Wait a bit to ensure page starts loading # await asyncio.sleep(0.5) + # Close any extension options pages that might have opened + await self._close_extension_options_pages() + # Dispatch navigation complete self.logger.debug(f'Dispatching NavigationCompleteEvent for {event.url} (tab #{target_id[-4:]})') await self.event_bus.dispatch( @@ -1589,21 +1592,29 @@ class BrowserSession(BaseModel): except Exception as e: self.logger.warning(f'Failed to remove highlights: {e}') - # Try again with simpler script if the complex one fails - try: - simple_script = """ - const highlights = document.querySelectorAll('[data-browser-use-highlight]'); - highlights.forEach(el => el.remove()); - const container = document.getElementById('browser-use-debug-highlights'); - if (container) container.remove(); - """ - cdp_session = await self.get_or_create_cdp_session() - await cdp_session.cdp_client.send.Runtime.evaluate( - params={'expression': simple_script}, session_id=cdp_session.session_id - ) - self.logger.debug('Fallback highlight removal completed') - except Exception as fallback_error: - self.logger.error(f'Both highlight removal attempts failed: {fallback_error}') + + async def _close_extension_options_pages(self) -> None: + """Close any extension options/welcome pages that have opened.""" + try: + # Get all open pages + targets = await self._cdp_get_all_pages() + + for target in targets: + target_url = target.get('url', '') + target_id = 
target.get('targetId', '') + + # Check if this is an extension options/welcome page + if 'chrome-extension://' in target_url and ( + 'options.html' in target_url or 'welcome.html' in target_url or 'onboarding.html' in target_url + ): + self.logger.info(f'[BrowserSession] 🚫 Closing extension options page: {target_url}') + try: + await self._cdp_close_page(target_id) + except Exception as e: + self.logger.debug(f'[BrowserSession] Could not close extension page {target_id}: {e}') + + except Exception as e: + self.logger.debug(f'[BrowserSession] Error closing extension options pages: {e}') @property def downloaded_files(self) -> list[str]: From 84c644aedbacd68d1f4489cbfc197e1a0babe0a1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 18:27:21 -0700 Subject: [PATCH 12/15] Patch extension with whitelist --- browser_use/browser/profile.py | 103 ++++++++++----------------------- browser_use/browser/session.py | 1 + 2 files changed, 30 insertions(+), 74 deletions(-) diff --git a/browser_use/browser/profile.py b/browser_use/browser/profile.py index cce3286c8..32777a25a 100644 --- a/browser_use/browser/profile.py +++ b/browser_use/browser/profile.py @@ -172,11 +172,6 @@ CHROME_DEFAULT_ARGS = [ '--disable-extensions-http-throttling', '--extensions-on-chrome-urls', '--disable-default-apps', - '--disable-component-extensions-with-background-pages', - '--disable-background-networking', - '--disable-extensions-except-api', - '--disable-extension-content-verification', - '--allow-running-insecure-content', f'--disable-features={",".join(CHROME_DISABLED_COMPONENTS)}', ] @@ -563,6 +558,10 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro default=True, description="Enable automation-optimized extensions: ad blocking (uBlock Origin), cookie handling (I still don't care about cookies), and URL cleaning (ClearURLs). 
All extensions work automatically without manual intervention. Extensions are automatically downloaded and loaded when enabled.", ) + cookie_whitelist_domains: list[str] = Field( + default_factory=lambda: ['nature.com', 'qatarairways.com'], + description='List of domains to whitelist in the "I still don\'t care about cookies" extension, preventing automatic cookie banner handling on these sites.', + ) window_size: ViewportSize | None = Field( default=None, @@ -825,9 +824,6 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro logger.info(f'📂 Extracting {ext["name"]} extension...') self._extract_extension(crx_file, ext_dir) - # Log extension version info - self._log_extension_version(ext_dir, ext['name']) - extension_paths.append(str(ext_dir)) loaded_extension_names.append(ext['name']) @@ -835,10 +831,10 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro logger.warning(f'⚠️ Failed to setup {ext["name"]} extension: {e}') continue - # Apply minimal patch to cookie extension - hardcode nature.com whitelist + # Apply minimal patch to cookie extension with configurable whitelist for i, path in enumerate(extension_paths): if loaded_extension_names[i] == "I still don't care about cookies": - self._apply_minimal_extension_patch(Path(path)) + self._apply_minimal_extension_patch(Path(path), self.cookie_whitelist_domains) if extension_paths: logger.debug(f'[BrowserProfile] 🧩 Extensions loaded ({len(extension_paths)}): [{", ".join(loaded_extension_names)}]') @@ -847,54 +843,8 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro return extension_paths - def _log_extension_version(self, ext_dir: Path, ext_name: str) -> None: - """Log extension version information from manifest.""" - try: - manifest_path = ext_dir / 'manifest.json' - if manifest_path.exists(): - import json - - with open(manifest_path, 'r', encoding='utf-8') as f: - manifest = json.load(f) - - version = 
manifest.get('version', 'unknown') - manifest_version = manifest.get('manifest_version', 'unknown') - - logger.info(f'📦 {ext_name} v{version} (manifest v{manifest_version}) loaded') - - # Special logging for uBlock Origin to show it's current - if ext_name == 'uBlock Origin': - logger.info(f'🛡️ uBlock Origin version {version} - Latest is 1.61.2+ (as of Aug 2025)') - - except Exception as e: - logger.debug(f'Could not read version for {ext_name}: {e}') - - def _get_extension_name(self, ext_dir: Path) -> str | None: - """Get the actual extension name from manifest.""" - try: - manifest_path = ext_dir / 'manifest.json' - if manifest_path.exists(): - import json - - with open(manifest_path, 'r', encoding='utf-8') as f: - manifest = json.load(f) - - name = manifest.get('name', '') - if name.startswith('__MSG_'): - # Resolve localized name - locale_path = ext_dir / '_locales' / 'en' / 'messages.json' - if locale_path.exists(): - with open(locale_path, 'r', encoding='utf-8') as f: - messages = json.load(f) - key = name.replace('__MSG_', '').replace('__', '') - return messages.get(key, {}).get('message', name) - return name - except Exception: - pass - return None - - def _apply_minimal_extension_patch(self, ext_dir: Path) -> None: - """Minimal patch: pre-populate chrome.storage.local with nature.com whitelist.""" + def _apply_minimal_extension_patch(self, ext_dir: Path, whitelist_domains: list[str]) -> None: + """Minimal patch: pre-populate chrome.storage.local with configurable domain whitelist.""" try: bg_path = ext_dir / 'data' / 'background.js' if not bg_path.exists(): @@ -903,6 +853,10 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro with open(bg_path, 'r', encoding='utf-8') as f: content = f.read() + # Create the whitelisted domains object for JavaScript with proper indentation + whitelist_entries = [f' "{domain}": true' for domain in whitelist_domains] + whitelist_js = '{\n' + ',\n'.join(whitelist_entries) + '\n }' + # Find the 
initialize() function and inject storage setup before updateSettings() # The actual function uses 2-space indentation, not tabs old_init = """async function initialize(checkInitialized, magic) { @@ -915,29 +869,29 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro initialized = true; }""" - # New function with nature.com whitelist initialization - new_init = """// Pre-populate storage with nature.com whitelist if empty -async function ensureWhitelistStorage() { - const result = await chrome.storage.local.get({ settings: null }); - if (!result.settings) { - const defaultSettings = { + # New function with configurable whitelist initialization + new_init = f"""// Pre-populate storage with configurable domain whitelist if empty +async function ensureWhitelistStorage() {{ + const result = await chrome.storage.local.get({{ settings: null }}); + if (!result.settings) {{ + const defaultSettings = {{ statusIndicators: true, - whitelistedDomains: { "nature.com": true } - }; - await chrome.storage.local.set({ settings: defaultSettings }); - } -} + whitelistedDomains: {whitelist_js} + }}; + await chrome.storage.local.set({{ settings: defaultSettings }}); + }} +}} -async function initialize(checkInitialized, magic) { - if (checkInitialized && initialized) { +async function initialize(checkInitialized, magic) {{ + if (checkInitialized && initialized) {{ return; - } + }} loadCachedRules(); await ensureWhitelistStorage(); // Add storage initialization await updateSettings(); await recreateTabList(magic); initialized = true; -}""" +}}""" if old_init in content: content = content.replace(old_init, new_init) @@ -945,7 +899,8 @@ async function initialize(checkInitialized, magic) { with open(bg_path, 'w', encoding='utf-8') as f: f.write(content) - logger.info('[BrowserProfile] ✅ Cookie extension: nature.com pre-populated in storage') + domain_list = ', '.join(whitelist_domains) + logger.info(f'[BrowserProfile] ✅ Cookie extension: {domain_list} 
pre-populated in storage') else: logger.debug('[BrowserProfile] Initialize function not found for patching') diff --git a/browser_use/browser/session.py b/browser_use/browser/session.py index be3858964..c85631b23 100644 --- a/browser_use/browser/session.py +++ b/browser_use/browser/session.py @@ -268,6 +268,7 @@ class BrowserSession(BaseModel): filter_highlight_ids: bool | None = None, auto_download_pdfs: bool | None = None, profile_directory: str | None = None, + cookie_whitelist_domains: list[str] | None = None, ): # Following the same pattern as AgentSettings in service.py # Only pass non-None values to avoid validation errors From 889657a27371aaf3426f52cca9fe9905751d5b0c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 18:30:49 -0700 Subject: [PATCH 13/15] Linter --- browser_use/browser/profile.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/browser_use/browser/profile.py b/browser_use/browser/profile.py index 32777a25a..4fa41b1ad 100644 --- a/browser_use/browser/profile.py +++ b/browser_use/browser/profile.py @@ -850,7 +850,7 @@ class BrowserProfile(BrowserConnectArgs, BrowserLaunchPersistentContextArgs, Bro if not bg_path.exists(): return - with open(bg_path, 'r', encoding='utf-8') as f: + with open(bg_path, encoding='utf-8') as f: content = f.read() # Create the whitelisted domains object for JavaScript with proper indentation From f17a13b308e018959a265caf35185e7da66cf862 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com> Date: Sat, 30 Aug 2025 18:33:30 -0700 Subject: [PATCH 14/15] Typo --- browser_use/tools/service.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/browser_use/tools/service.py b/browser_use/tools/service.py index d0ef4d93c..c659ab7dd 100644 --- a/browser_use/tools/service.py +++ b/browser_use/tools/service.py @@ -263,7 +263,7 @@ class Tools(Generic[Context]): 
 		# Element Interaction Actions
 		@self.registry.action(
-			'Click element by index. Only indices from your browser_state are allowed. Never use and index that is not inside your current browser_state. Set while_holding_ctrl=True to open any resulting navigation in a new tab.',
+			'Click element by index. Only indices from your browser_state are allowed. Never use an index that is not inside your current browser_state. Set while_holding_ctrl=True to open any resulting navigation in a new tab.',
 			param_model=ClickElementAction,
 		)
 		async def click_element_by_index(params: ClickElementAction, browser_session: BrowserSession):
From 1596f25a5af67851d620928b270aeb68ae11bdcc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Magnus=20M=C3=BCller?= <67061560+MagMueller@users.noreply.github.com>
Date: Sat, 30 Aug 2025 18:39:10 -0700
Subject: [PATCH 15/15] System prompt efficiency_guidelines

---
 browser_use/agent/system_prompt.md             | 9 +++++----
 browser_use/agent/system_prompt_flash.md       | 9 +++++----
 browser_use/agent/system_prompt_no_thinking.md | 9 +++++----
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/browser_use/agent/system_prompt.md b/browser_use/agent/system_prompt.md
index ceda7ef73..83640b832 100644
--- a/browser_use/agent/system_prompt.md
+++ b/browser_use/agent/system_prompt.md
@@ -136,16 +136,17 @@ If you are allowed multiple actions, you can specify multiple actions in the lis
 You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page.
-
 **Recommended Action Combinations:**
 - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step
-- `input_text` + `input_text` → Fill form fields
-- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows
+- `input_text` + `input_text` → Fill multiple form fields
+- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when the page does not navigate between clicks)
 - `scroll` with num_pages 10 + `extract_structured_data` → Scroll to the bottom of the page to load more content before extracting structured data
 - File operations + browser actions
 Do not try multiple different paths in one step. Always have one clear goal per step.
-Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not.
+Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, e.g.
+- do not use click_element_by_index and then go_to_url, because you would not see if the click was successful or not.
+- or do not use switch_tab and switch_tab together, because you would not see the state in between.
diff --git a/browser_use/agent/system_prompt_flash.md b/browser_use/agent/system_prompt_flash.md
index cb2bdefbe..a2f9257fc 100644
--- a/browser_use/agent/system_prompt_flash.md
+++ b/browser_use/agent/system_prompt_flash.md
@@ -133,16 +133,17 @@ If you are allowed multiple actions, you can specify multiple actions in the lis
 You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page.
-
 **Recommended Action Combinations:**
 - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step
-- `input_text` + `input_text` → Fill form fields
-- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows
+- `input_text` + `input_text` → Fill multiple form fields
+- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when the page does not navigate between clicks)
 - `scroll` with num_pages 10 + `extract_structured_data` → Scroll to the bottom of the page to load more content before extracting structured data
 - File operations + browser actions
 Do not try multiple different paths in one step. Always have one clear goal per step.
-Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not.
+Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, e.g.
+- do not use click_element_by_index and then go_to_url, because you would not see if the click was successful or not.
+- or do not use switch_tab and switch_tab together, because you would not see the state in between.
diff --git a/browser_use/agent/system_prompt_no_thinking.md b/browser_use/agent/system_prompt_no_thinking.md
index a3038cb6d..a79cb569b 100644
--- a/browser_use/agent/system_prompt_no_thinking.md
+++ b/browser_use/agent/system_prompt_no_thinking.md
@@ -135,16 +135,17 @@ If you are allowed multiple actions, you can specify multiple actions in the lis
 You can output multiple actions in one step. Try to be efficient where it makes sense. Do not predict actions which do not make sense for the current page.
-
 **Recommended Action Combinations:**
 - `input_text` + `click_element_by_index` → Fill form field and submit/search in one step
-- `input_text` + `input_text` → Fill form fields
-- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows
+- `input_text` + `input_text` → Fill multiple form fields
+- `click_element_by_index` + `click_element_by_index` → Navigate through multi-step flows (when the page does not navigate between clicks)
 - `scroll` with num_pages 10 + `extract_structured_data` → Scroll to the bottom of the page to load more content before extracting structured data
 - File operations + browser actions
 Do not try multiple different paths in one step. Always have one clear goal per step.
-Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, like do not use click and then go to url, because you would not see if the click was successful or not.
+Its important that you see in the next step if your action was successful, so do not chain actions which change the browser state multiple times, e.g.
+- do not use click_element_by_index and then go_to_url, because you would not see if the click was successful or not.
+- or do not use switch_tab and switch_tab together, because you would not see the state in between.
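
Note on PATCH 08/15 (Fix url preload): the change in `browser_use/agent/service.py` strips email addresses from the task string before running the URL patterns, instead of testing each matched URL against the email pattern afterwards. The likely reason is that the domain pattern matches only the `example.com` part of `bob@example.com`, which contains no `@`, so the old per-match check could not catch it. Below is a minimal standalone sketch of the new order of operations. The three regexes and the trailing-punctuation cleanup are copied from the diff; the function name `find_urls`, the de-duplication, and returning a list are assumptions made for illustration and are not taken from the repository.

```python
import re

# Copied from the diff in PATCH 08/15.
EMAIL_PATTERN = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
URL_PATTERNS = [
    r'https?://[^\s<>"\']+',  # Full URLs with http/https
    r'(?:www\.)?[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}(?:/[^\s<>"\']*)?',  # Bare domains
]


def find_urls(task: str) -> list[str]:
    """Hypothetical helper: collect URL candidates from a task string."""
    # Remove email addresses first, so their domain part is never matched as a URL.
    task_without_emails = re.sub(EMAIL_PATTERN, '', task)

    found: list[str] = []
    for pattern in URL_PATTERNS:
        for match in re.finditer(pattern, task_without_emails):
            # Remove trailing punctuation that is not part of the URL (as in the diff).
            url = re.sub(r'[.,;:!?()\[\]]+$', '', match.group(0))
            # Add a scheme if missing.
            if not url.startswith(('http://', 'https://')):
                url = 'https://' + url
            # Assumption: skip duplicates produced by the two overlapping patterns.
            if url not in found:
                found.append(url)
    return found
```

With this ordering, a task that mentions only an email address yields no URL candidates, while a bare domain elsewhere in the same task is still picked up and given an `https://` prefix.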