Commit Graph

3270 Commits

Author SHA1 Message Date
Mert Unsal
25a2eecbfd Enhance system prompt reasoning (#2022)
- Improve clarity on using extract_structured_data and user request
processing.
- Added guidance on file system to avoid overwriting existing content.
- Included additional reasoning patterns for better task management and
progress tracking.
2025-06-21 09:56:34 +02:00
Mert Unsal
59c05a4f59 Merge branch 'main' into mert/improve_system_prompt 2025-06-21 09:55:56 +02:00
Magnus Müller
8194ecbc3e Pass laminar_eval_id from frontend (#2024) 2025-06-21 09:37:51 +02:00
Magnus Müller
f1d5dc5a17 Pass laminar_eval_id from frontend 2025-06-21 09:31:14 +02:00
Magnus Müller
726dd30c82 Monitor eval cpu (#2023)
## Summary by cubic
Added detailed CPU and memory monitoring to the evaluation workflow and
service to help track resource usage and catch issues like high memory
or CPU load during eval runs.

- **New Features**
- Logs system resource stats before, during, and after evaluation runs.
- Starts a background resource monitor that checks CPU, memory, and
process counts every 30 seconds.
- Adds signal handling and heartbeat logging for better debugging and
graceful shutdowns.
- Collects and uploads resource logs and debug info as workflow
artifacts.
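A background monitor of the kind summarized above could look roughly like the following. This is only a sketch under stated assumptions: `ResourceMonitor` and `fake_sample` are hypothetical names, not taken from the commit, and the sampler returns placeholder values where a real one would read CPU, memory, and process counts.

```python
import threading
import time
from typing import Callable


class ResourceMonitor:
    """Hypothetical helper: calls `sample` on a background thread at a fixed interval."""

    def __init__(self, sample: Callable[[], dict], interval: float = 30.0) -> None:
        self._sample = sample
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples: list[dict] = []

    def _run(self) -> None:
        while True:
            self.samples.append(self._sample())
            # wait() returns True as soon as stop() sets the event,
            # so shutdown does not have to wait out a full interval.
            if self._stop.wait(self._interval):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def fake_sample() -> dict:
    # Placeholder metrics; a real sampler would read CPU and memory usage.
    return {"time": time.time(), "threads": threading.active_count()}


# Short interval for demonstration only; the summary above mentions 30 seconds.
monitor = ResourceMonitor(fake_sample, interval=0.01)
monitor.start()
time.sleep(0.05)
monitor.stop()
print(f"collected {len(monitor.samples)} samples")
```

Waiting on a `threading.Event` instead of calling `time.sleep` lets the monitor stop promptly, which fits the graceful-shutdown goal mentioned in the summary.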

2025-06-21 00:33:26 +02:00
Magnus Müller
adfc553692 Merge branch 'main' into monitor-eval 2025-06-21 00:32:20 +02:00
Magnus Müller
83d92513a4 Monitor eval cpu 2025-06-20 23:35:56 +02:00
Mert Unsal
64eeac9a17 Update browser_use/agent/system_prompt.md
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-06-20 22:45:44 +02:00
mertunsall
16708e916d Enhance system prompt reasoning
- Improve clarity on using extract_structured_data and user request processing.
- Added guidance on file system to avoid overwriting existing content.
- Included additional reasoning patterns for better task management and progress tracking.
2025-06-20 22:42:21 +02:00
Mert Unsal
6b9263d9f6 Quick Fix to History (#2021)
Fixed a bug where a "No model output" message was added to the agent
history even when step_number was zero, where there is no model output.
2025-06-20 21:38:28 +02:00
Mert Unsal
97bd9fb3d2 Update browser_use/agent/message_manager/service.py
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-06-20 21:31:51 +02:00
mertunsall
f3b406f42a Quick Fix to History 2025-06-20 21:28:06 +02:00
Magnus Müller
1c54954a0d Improve AgentOutput format and reasoning style (#2020)
- Change reasoning rules in system prompt
- Flatten AgentOutput to get rid of current_state
- Add agent initialization in history.
- Update the example tool call.
- Add a current_state property to AgentOutput for compatibility.
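The flattening with a compatibility property might be sketched as follows. This is not the project's actual model: it uses plain dataclasses, and the field names `memory`, `next_goal`, and `action` are assumptions made for illustration (only `thinking` and `current_state` appear in this log).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentBrain:
    """Stand-in for the old nested state object (field names are assumed)."""

    thinking: str
    memory: str
    next_goal: str


@dataclass
class AgentOutput:
    """Flattened output: the state fields live at the top level."""

    thinking: str
    memory: str
    next_goal: str
    action: list[dict[str, Any]] = field(default_factory=list)

    @property
    def current_state(self) -> AgentBrain:
        """Rebuild the old nested view so existing callers keep working."""
        return AgentBrain(self.thinking, self.memory, self.next_goal)


output = AgentOutput(
    thinking="The search box is element 3.",
    memory="Opened the home page.",
    next_goal="Type the query.",
    action=[{"input_text": {"index": 3, "text": "browser automation"}}],
)
# Old-style access still works through the read-only property.
print(output.current_state.next_goal)
```

Because `current_state` is a derived property and not a stored field, it does not appear in the output schema, while code that still reads `output.current_state` keeps working.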
    

## Summary by cubic
Flattened the AgentOutput model by removing the current_state field,
updated the system prompt reasoning rules, and added agent
initialization to the history for better clarity and compatibility.

- **Refactors**
- AgentOutput now uses top-level fields instead of a nested
current_state.
- Added a current_state property for backward compatibility.
- Simplified and clarified reasoning rules in the system prompt.
- Updated example tool call and included agent initialization in the
history.

2025-06-20 21:24:01 +02:00
mertunsall
0c656cb631 fix more 2025-06-20 19:33:14 +02:00
mertunsall
8bc6c3d2b7 hotfix 2025-06-20 19:32:51 +02:00
mertunsall
2ec840ed4a fix tests for new output 2025-06-20 19:13:58 +02:00
mertunsall
26fdce7357 fix test case with new output structure 2025-06-20 19:02:07 +02:00
mertunsall
9ae6354b76 - Change reasoning rules in system prompt
- Flatten AgentOutput to get rid of current_state
- Add agent initialization in history.
- Update the example tool call.
- Add a current_state property to AgentOutput for compatibility.
2025-06-20 18:55:04 +02:00
Magnus Müller
179f2917f5 Change Agent State, System Prompt, Add FileSystem, etc. (#1874)
## Agent State and System Prompt

This PR aims to improve the agent by rewriting the way the state is
constructed. As described in `system_prompt.md`, the new state consists of:
1. Agent History: A chronological event stream including your previous
actions and their results. This may be truncated or partially omitted.
2. Agent State: Includes the ultimate goal provided by the user, current
progress, and relevant contextual memory.
3. Browser State: Contains current URL, open tabs, interactive elements
indexed for actions, visible page content, and (sometimes) any visual
context provided by screenshots or page snapshots.
4. Read State: If your previous action involved reading a file or
extracting content (e.g., from a webpage), the full result will be
included here. This data is **only shown in the current step** and will
not appear in future Agent History. You are responsible for saving or
interpreting the information appropriately during this step.

Please refer to `system_prompt.md` for an explanation of what the model
gets in its context. The PR rewrites the `MessageManager` to display
this state. This helps maintain a better state, as the model always sees
a constant number of messages: system prompt, example tool calls, and
current state.

## File System
The model now has access to a File System it can interact with so that
it can make a plan, write intermediate results, etc.
1. The agent is initialized with two files, `todo.md` and `results.md`.
The first is for planning, and its contents are always displayed in the
agent's state. The second lets the model accumulate results for long
tasks. The file system is always displayed in the Agent State in the
format:
file_name — num_lines lines
2. The model can use three functions: `write_file`, `append_file`, and
`read_file`. We should improve the descriptions of these functions.
There should also be an option to edit a string (replace line
`"{content}"` with `"{new_content}"`).
3. Currently, for safety reasons, agent can ONLY create files in format
`{file_name}.{md|txt}` and cannot create subdirectories or go back. This
makes sure that agent's
4. The file system is currently launched in a temporary directory with a
random uuid - this can be changed so that we just use `tempfile`'s
temporary file and temporary directory functionality, stored in memory
and to be deleted when program terminates.
5. Optionally, the user might want to keep the results saved in the
directory, so there should be an option to set a directory that will not
be deleted at the end.
6. Finally, the agent cannot be initialized in an existing directory.
This ensures that files written in a previous run do not accidentally
affect a new agent's behavior. However, this is annoying, as it requires
the user to delete the directory between runs.
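The rules in this list can be sketched in a few lines of Python. This is a minimal illustration, not the PR's implementation: the class layout, the name pattern, and the `describe` helper are assumptions; only the three function names, the two initial files, the `.md`/`.txt` restriction, and the refusal to reuse an existing directory come from the description above.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: flat names ending in .md or .txt, so no separators and no "..".
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_\-]+\.(md|txt)")


class FileSystem:
    def __init__(self, base_dir: Path) -> None:
        # Refuse an existing directory so files from a previous run cannot leak in.
        if base_dir.exists():
            raise FileExistsError(f"{base_dir} already exists")
        base_dir.mkdir(parents=True)
        self.base_dir = base_dir
        for name in ("todo.md", "results.md"):
            (base_dir / name).write_text("")

    def _path(self, file_name: str) -> Path:
        if not _ALLOWED_NAME.fullmatch(file_name):
            raise ValueError(f"invalid file name: {file_name!r}")
        return self.base_dir / file_name

    def write_file(self, file_name: str, content: str) -> None:
        self._path(file_name).write_text(content)

    def append_file(self, file_name: str, content: str) -> None:
        with self._path(file_name).open("a") as handle:
            handle.write(content)

    def read_file(self, file_name: str) -> str:
        return self._path(file_name).read_text()

    def describe(self) -> str:
        """List each file with its line count, one file per line."""
        rows = []
        for path in sorted(self.base_dir.iterdir()):
            num_lines = len(path.read_text().splitlines())
            rows.append(f"{path.name} - {num_lines} lines")
        return "\n".join(rows)


with tempfile.TemporaryDirectory() as tmp:
    fs = FileSystem(Path(tmp) / "agent_files")
    fs.write_file("todo.md", "- [ ] open page\n")
    fs.append_file("todo.md", "- [ ] extract prices\n")
    listing = fs.describe()
    print(listing)
```

Validating the bare file name before joining it to the base directory is what rules out subdirectories and parent-directory traversal in this sketch.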

## AgentBrain
The AgentBrain object has a new `thinking` field. We need to make sure
it is handled throughout the codebase and saved or displayed where
necessary.

## AgentResult
The AgentResult object has two new fields: `memory`, which is what will
be added to the Action History that the agent sees, and
`update_read_state`, a boolean that determines whether the action should
update the read state. The latter is used for actions such as
`extract_content` and `read_file`, where the model should see the
content once but it should not remain in history forever.
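One way to picture how the two fields interact is sketched below. `StateBuilder`, the `extracted_content` field, and the tag names are hypothetical and only illustrate the idea: `memory` always goes into the history, while the full content is shown for a single step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentResult:
    extracted_content: str           # hypothetical field: full output of the action
    memory: str                      # short note that stays in the history
    update_read_state: bool = False  # if True, show the full content for one step


class StateBuilder:
    """Hypothetical helper that assembles what the model sees each step."""

    def __init__(self) -> None:
        self.history: list[str] = []
        self.read_state: Optional[str] = None

    def record(self, result: AgentResult) -> None:
        self.history.append(result.memory)
        # The read state is replaced on every step, so old content drops out.
        self.read_state = result.extracted_content if result.update_read_state else None

    def render(self) -> str:
        parts = ["<agent_history>", *self.history, "</agent_history>"]
        if self.read_state is not None:
            parts += ["<read_state>", self.read_state, "</read_state>"]
        return "\n".join(parts)


builder = StateBuilder()
builder.record(AgentResult("price: 42 EUR", "Read results.md", update_read_state=True))
first_step = builder.render()   # contains the file content once
builder.record(AgentResult("", "Clicked element 7"))
second_step = builder.render()  # content is gone, both memory notes remain
print(second_step)
```

Under this scheme the history grows only by short memory notes, so long file contents or page extractions do not accumulate in the context.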

## Details and Next Steps
1. Currently the browser state is capped at 10k characters. This is very
primitive and should be replaced with a smarter semantic processing
layer.
2. I think we want to permanently include file system in the Agent -
this feature seems very important. So the controller functions for using
a file system should be moved from agent into controller.
3. We should add a function call for sending the user a message with the
results before calling `done`; this is cleaner and more intuitive for
the model.
4. Saving the current page as a PDF file should use the file system
directly.
5. Currently, there seems to be a bug with `append_file` that should be
fixed.
6. Not all language models seem to work with current version - for
example, maybe we should get rid of `current_state` field.
7. Get rid of langchain and do our own LLM calls for better tractability
and more control over what goes into the LLM.

I think there are a lot more things to be addressed regarding this PR,
but this is all I could gather.
2025-06-20 15:40:10 +02:00
Magnus Müller
4a8cf30dac Merge branch 'main' into mert/new_everything 2025-06-20 12:27:19 +02:00
Magnus Müller
902a1dfb66 Add gemini-2.5-flash (#2018) 2025-06-20 12:20:53 +02:00
Magnus Müller
0e5a8942f3 Add gemini-2.5-flash 2025-06-20 12:19:47 +02:00
Nick Sweeting
1cc94b6688 improve cloud sync logging 2025-06-20 03:00:35 -07:00
Nick Sweeting
ca9588d6d6 bump version 0.3.1 2025-06-20 02:38:33 -07:00
Nick Sweeting
3177118aa7 fix lint errors 2025-06-20 02:38:22 -07:00
Nick Sweeting
14e420b74b Revert "Fix cross-origin iframe DOM retrieval" (#2017) 2025-06-20 02:37:52 -07:00
Nick Sweeting
e9b2fea57f Revert "Fix cross-origin iframe DOM retrieval" 2025-06-20 05:37:22 -04:00
Nick Sweeting
51626ef42d Eventbus fixes and cleanup (#2016) 0.3.0 2025-06-20 02:30:13 -07:00
Nick Sweeting
2baf4e7907 Merge branch 'main' into eventbus 2025-06-20 02:26:44 -07:00
Nick Sweeting
933bddc02f improve error logging and bump bubus version 2025-06-20 02:24:01 -07:00
Nick Sweeting
7766b8d630 cleanup event wrapping and unwrapping 2025-06-20 02:06:23 -07:00
Magnus Müller
4c2952d640 Squashed commit of the following:
commit a9cf53a1b1
Merge: 5aa62c11 0f9ffa10
Author: Magnus Müller <67061560+MagMueller@users.noreply.github.com>
Date:   Fri Jun 20 10:41:19 2025 +0200

    Set user_data_dir to None (#2015)

    <!-- This is an auto-generated description by cubic. -->
    Changed browser session setup to use incognito mode by setting
    user_data_dir to None, preventing persistent state between evaluation
    runs.

    <!-- End of auto-generated description by cubic. -->

commit 0f9ffa1072
Author: Magnus Müller <67061560+MagMueller@users.noreply.github.com>
Date:   Fri Jun 20 10:38:01 2025 +0200

    Set user_data_dir to None

commit 5aa62c1113
Merge: d8a9d21b e559ff5e
Author: Nick Sweeting <git@sweeting.me>
Date:   Thu Jun 19 23:01:49 2025 -0700

    Fix cross-origin iframe DOM retrieval (#1965)

commit d8a9d21b00
Merge: 3e5f3049 b6be1583
Author: Nick Sweeting <git@sweeting.me>
Date:   Thu Jun 19 23:01:21 2025 -0700

    Fix critical domain restriction bypass vulnerability (#2006)

commit b6be158319
Author: Sahar <saharhashai@gmail.com>
Date:   Thu Jun 19 02:28:34 2025 -0700

    Delete tests/ci/test_security_url_validation.py

commit aca4b57329
Author: Sahar <saharhashai@gmail.com>
Date:   Thu Jun 19 02:27:57 2025 -0700

    Delete SECURITY_FIX_REPORT.md

commit 45872c1e45
Author: Your Name <your.email@example.com>
Date:   Thu Jun 19 11:24:50 2025 +0200

    fix(security): prevent domain restriction bypass in controller actions

    - Add domain validation to controller.click() and controller.type() methods
    - Implement comprehensive security checks before executing actions
    - Prevent potential prompt injection and unauthorized data access
    - Add extensive test coverage for domain validation scenarios
    - Update documentation with security considerations

    This critical fix prevents complete bypass of domain restrictions that
    could enable attackers to perform unauthorized actions on any domain.

commit e559ff5eaa
Merge: 19ae8a11 f348e0c5
Author: Nick Sweeting <git@sweeting.me>
Date:   Sat Jun 14 01:56:09 2025 -0700

    Merge branch 'main' into main

commit 19ae8a1146
Merge: e1b3ff9e 08ed0be3
Author: Nick Sweeting <git@sweeting.me>
Date:   Sat Jun 14 00:31:30 2025 -0700

    Merge branch 'main' into main

commit e1b3ff9e9d
Author: Ilya Biryukov <ilbiryuk@microsoft.com>
Date:   Thu Jun 12 17:40:40 2025 -0700

    Revert changes to  examples/features/multiple_agents_same_browser.py

commit d20a3b55d6
Author: Ilya Biryukov <ilbiryuk@microsoft.com>
Date:   Thu Jun 12 17:30:59 2025 -0700

    Fix pre-commit lint issues and compile error in multiple_agents_same_browser

commit 13d5468aa2
Author: Ilya Biryukov <ilbiryuk@microsoft.com>
Date:   Thu Jun 12 14:07:21 2025 -0700

    Fix cross-origin iframe DOM retrieval
2025-06-20 10:51:06 +02:00
Magnus Müller
a9cf53a1b1 Set user_data_dir to None (#2015)
## Summary by cubic
Changed browser session setup to use incognito mode by setting
user_data_dir to None, preventing persistent state between evaluation
runs.

2025-06-20 10:41:19 +02:00
Magnus Müller
0f9ffa1072 Set user_data_dir to None 2025-06-20 10:38:01 +02:00
Magnus Müller
907867f976 Refactor agent history update logic to handle None model_output case, ensuring proper logging and description formatting for failed parsing scenarios. 2025-06-20 09:28:23 +02:00
Magnus Müller
eda5140363 Improve error logging in message manager by appending ellipsis to truncated error messages for better clarity in action results. 2025-06-20 08:27:46 +02:00
Magnus Müller
1a891202ea Enhance system prompt documentation by clarifying the structure of the JSON response. Added important notes regarding the top-level elements "current_state" and "action" to improve understanding of the expected format. 2025-06-20 08:25:48 +02:00
Nick Sweeting
5aa62c1113 Fix cross-origin iframe DOM retrieval (#1965) 2025-06-19 23:01:49 -07:00
Nick Sweeting
d8a9d21b00 Fix critical domain restriction bypass vulnerability (#2006) 2025-06-19 23:01:21 -07:00
Magnus Müller
3af0a37e71 Rename step information 2025-06-20 07:47:35 +02:00
Magnus Müller
8b93dc626f Only show task once 2025-06-20 07:35:02 +02:00
Magnus Müller
c4f1b5f935 Fix parsing error handling - the entire tool call is in tool_call_args. Before it parsed it wrong with no error 2025-06-20 07:32:24 +02:00
mertunsall
4e55d7c886 Update test case for max string length validation to use MAX_TASK_LENGTH constant, improving maintainability and clarity in error messages. 2025-06-20 01:21:49 +02:00
mertunsall
0c2de169f8 Update maximum length constants in cloud_events.py to accommodate larger data sizes: MAX_STRING_LENGTH increased to 100K, MAX_URL_LENGTH and MAX_TASK_LENGTH adjusted to 10K. 2025-06-20 01:13:23 +02:00
mertunsall
7cdcbc1385 Remove the test_extract_content_action method from TestControllerIntegration, streamlining the test suite by eliminating outdated or redundant tests. 2025-06-20 01:06:28 +02:00
Mert Unsal
b77393917d Apply suggestions from code review
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-06-20 01:00:39 +02:00
Mert Unsal
198dd161b8 Update browser_use/agent/system_prompt.md
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-06-20 00:56:37 +02:00
Mert Unsal
832c59c9c3 Update browser_use/agent/system_prompt.md
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
2025-06-20 00:56:17 +02:00
Magnus Müller
2ee381d283 Refactor message_manager tests to use a temporary file system path, enhancing isolation and reliability of test cases. 2025-06-20 00:18:48 +02:00
Magnus Müller
831682e2a3 Add FileSystem dependency to message_manager tests 2025-06-20 00:13:06 +02:00