Commit Graph

3974 Commits

Author SHA1 Message Date
Alezander9
e8db375401 make eval service fetch rotating auth info from server 2025-07-02 16:30:48 -07:00
Magnus Müller
4c1b79cde5 eval-change-timing (#2262)
Auto-generated PR for: eval-change-timing
2025-07-03 00:20:04 +02:00
Magnus Müller
70149369eb eval-change-timing 2025-07-03 00:19:51 +02:00
Mert Unsal
ed9ab09552 fix: groq model config (#2261)
Fix groq model config to use service_tier auto
    
## Summary by cubic
Updated Groq model configs to use the correct provider and set
service_tier to auto for all Groq models.

- **Bug Fixes**
  - Fixed provider field for Groq models.
  - Added service_tier: auto to Groq model configs.
  - Updated model loading to use ChatGroq with service_tier support.

2025-07-02 23:22:00 +02:00
mertunsall
58ed5da177 migrate evals to ChatGroq 2025-07-02 23:19:10 +02:00
Mert Unsal
91c6b95af0 Fix browser use library output (#2260)
Replaced use of SchemaOptimizer with direct model JSON schema generation
in the Groq chat.
2025-07-02 23:08:46 +02:00
mertunsall
39599b6dd0 fix linter 2025-07-02 23:06:47 +02:00
Cursor Agent
e1a6f6c31c Remove SchemaOptimizer and use direct model JSON schema generation
Co-authored-by: gregor <gregor@browser-use.com>
2025-07-02 20:51:33 +00:00
Gregor Žunič
5bb3bfc03b Refactoring dom process tree script (#2258)

## Summary by cubic
Refactored the DOM processing tree to simplify element extraction,
improve attribute handling, and remove unnecessary performance metrics
and optimizations.

- **Refactors**
  - Cleaned up and streamlined DOM tree code, removing debug metrics and
    redundant logic.
  - Improved clickable element string formatting and attribute
    deduplication for token efficiency.
  - Updated file system to only create `todo.md` by default.
  - Adjusted tests and documentation to match new file system and DOM
    extraction behavior.

2025-07-02 20:18:54 +02:00
Gregor Žunič
db4423ead8 Merge branch 'main' into fuckaround/dom-processing-tree 2025-07-02 20:17:23 +02:00
Magnus Müller
3b0f0d7cb9 eval-runner-status-updates (#2256)
Auto-generated PR for: eval-runner-status-updates
    
## Summary by cubic
Added runner progress tracking to the evaluation workflow, sending
status updates to the backend at key stages and on completion.

- **New Features**
  - Workflow now registers runner start and completion status with the
    backend API.
  - Service code sends progress updates for each task stage, including
    errors.

2025-07-02 19:42:00 +02:00
Magnus Müller
5b6a00032c Enhance evaluation workflow with validation and error handling improvements
- Added validation for START_INDEX and TOTAL_TASKS to ensure they are numeric, with default values set to prevent errors.
- Improved logging for task range calculations and runner ID generation, including warnings for non-numeric inputs.
- Enhanced evaluation output handling with comprehensive error capture and logging, ensuring better debugging information is available.
- Implemented checks for the existence of evaluation logs and provided statistics for better visibility into evaluation outcomes.
2025-07-02 19:39:54 +02:00
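The numeric validation described in commit 5b6a00032c can be sketched as follows. This is a minimal illustration, not the workflow's actual code (which may well be shell inside the GitHub Actions file). The helper name `read_int_env` and the default values are assumptions; only the variable names `START_INDEX` and `TOTAL_TASKS` come from the commit message.

```python
import os


def read_int_env(name: str, default: int) -> int:
    """Return env var `name` as an int, or `default` if it is unset or non-numeric."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        # Warn instead of failing so the run can continue with a safe default.
        print(f"warning: {name}={raw!r} is not numeric, using {default}")
        return default


# Default values here are assumptions chosen for illustration.
start_index = read_int_env("START_INDEX", 0)
total_tasks = read_int_env("TOTAL_TASKS", 0)
```

Falling back to a default with a warning, rather than raising, matches the commit's stated goal of preventing errors while still logging non-numeric inputs.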
Magnus Müller
2f8da485b1 Merge branch 'main' into eval-runner-status-updates 2025-07-02 19:28:31 +02:00
Magnus Müller
0cbda40a2c Enhance evaluation workflow with improved runner ID generation and progress tracking
- Added support for dynamic runner ID generation that aligns with GitHub Actions patterns, incorporating start index from environment variables.
- Updated the evaluation script to send detailed progress updates, including task range and total assigned tasks, to the tracking API.
- Improved error handling and logging for runner registration and completion updates to ensure reliability during evaluations.
2025-07-02 19:27:15 +02:00
Gregor Žunič
d5ee99d91d fixed nasty error with null values 2025-07-02 18:54:12 +02:00
Gregor Žunič
ef1f74fd40 added extra line 2025-07-02 18:22:41 +02:00
Gregor Žunič
dcd9be9e21 Merge branch 'main' into fuckaround/dom-processing-tree 2025-07-02 18:20:55 +02:00
Gregor Žunič
177d5bcb7e Merge branch 'main' into fuckaround/dom-processing-tree 2025-07-02 18:14:40 +02:00
Magnus Müller
fe3fe67d50 Refactor evaluation stages in service.py
- Moved the formatting and evaluation stages outside the browser session block to ensure they are executed regardless of session state.
- Updated error handling for evaluation and server save stages to maintain consistent logging and task result management.
- Ensured that server save attempts are always made, improving reliability in task completion.
2025-07-02 18:08:28 +02:00
Mert Unsal
f5a4c75579 add haiku to evals (#2257)
Added support for the claude-3.5-haiku-latest model in evals.
2025-07-02 17:41:47 +02:00
Gregor Žunič
741c554dc2 added some basic types and removed unnecessary code 2025-07-02 17:41:28 +02:00
mertunsall
144870a024 add haiku to evals 2025-07-02 17:40:58 +02:00
Magnus Müller
44a180f716 eval-runner-status-updates 2025-07-02 17:38:58 +02:00
Gregor Žunič
7664a87562 reverted removed element detections 2025-07-02 17:12:50 +02:00
Magnus Müller
31e243d789 eval include runner link (#2253)
Auto-generated PR for: eval include runner link
    
## Summary by cubic
Added support for passing the GitHub workflow run URL through the
evaluation pipeline for better tracking and visibility.

- **New Features**
  - The workflow URL is now constructed and passed as a command-line
    argument.
  - The evaluation service accepts and stores the workflow URL for each
    run.

2025-07-02 14:10:54 +02:00
Magnus Müller
239fd3f86b eval include runner link 2025-07-02 14:10:22 +02:00
Mert Unsal
a8b79accbc New element updates in browser state (#2248)
- Update DOM element highlighting from *[index]* to <new>[index]</new>
- Update system prompts to reflect new <new> tag format
- Provides clearer semantic meaning for new elements in browser state
2025-07-02 11:30:56 +02:00
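The marker change in commit a8b79accbc, from `*[index]*` to `<new>[index]</new>`, might be expressed roughly as below. The function name `format_index_marker` is hypothetical, and rendering unchanged elements as a plain `[index]` is an assumption; the commit message only specifies the format for new elements.

```python
def format_index_marker(index: int, is_new: bool) -> str:
    """Render an element index, wrapping it in <new> tags when the element is new."""
    marker = f"[{index}]"
    # Previously new elements were rendered as *[index]*; the tag form is
    # meant to carry clearer semantic meaning in the browser state.
    return f"<new>{marker}</new>" if is_new else marker
```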
Mert Unsal
8094204488 Merge branch 'main' into cursor/create-branch-for-new-element-updates-0831 2025-07-02 11:27:48 +02:00
Mert Unsal
c3f2c87f15 Clarify new elements tagging criteria (#2249)
Clarify system prompt description of `<new>` tags to specify new
clickable elements.
    
## Summary by cubic
Updated the system prompt to clarify that elements tagged with <new> are
clickable elements that appeared since the last step, if the URL has not
changed.

2025-07-02 11:27:22 +02:00
Cursor Agent
9dc85ee203 Checkpoint before follow-up message 2025-07-02 09:25:18 +00:00
Cursor Agent
fd5bb775dc Clarify description of <new> elements in browser state documentation
Co-authored-by: mailmertunsal <mailmertunsal@gmail.com>
2025-07-02 09:25:07 +00:00
Cursor Agent
755b96e6a0 Simplify note about <new> tag in browser state description
Co-authored-by: mailmertunsal <mailmertunsal@gmail.com>
2025-07-02 09:23:26 +00:00
Cursor Agent
e13c4cee9c Clarify definition of <new> tag in browser state documentation
Co-authored-by: mailmertunsal <mailmertunsal@gmail.com>
2025-07-02 09:22:12 +00:00
Mert Unsal
7ae2e70ce2 Remove mem0 dependencies from pyproject (#2245)
Removed the mem0ai dependency from pyproject.toml to simplify project
requirements.
2025-07-02 11:11:17 +02:00
Mert Unsal
bba5b6fbb5 Quick fix for eval (#2246)
Fixed the evaluation service so that both 'gpt-o4-mini' and 'gpt-o3'
models use temperature 1 as required.
2025-07-02 11:08:32 +02:00
mertunsall
3a5f43bb3f eval should run with temperature 1 for o3 2025-07-02 11:07:58 +02:00
Cursor Agent
695171f90b Remove mem0ai dependency from project requirements
Co-authored-by: mailmertunsal <mailmertunsal@gmail.com>
2025-07-02 09:00:45 +00:00
Magnus Müller
5a42354d53 remove git function helpers (#2244)
Auto-generated PR for: remove git function helpers
    
## Summary by cubic
Removed the git-functions.sh script, which included helper functions for
automating branch creation, commits, pushes, and pull requests. This
cleans up unused shell helpers from the codebase.

2025-07-02 10:45:01 +02:00
Magnus Müller
7ae04893ee Merge branch 'main' into remove-git-function-helpers 2025-07-02 10:44:45 +02:00
Magnus Müller
18db8926bd remove git function helpers 2025-07-02 10:44:24 +02:00
Mert Unsal
95045620de Add sensitive data example (#2243)
Added example for sensitive data handling
2025-07-02 10:41:52 +02:00
mertunsall
8880a2db37 add sensitive data example 2025-07-02 10:40:58 +02:00
Magnus Müller
4a0eab6fb8 test git automation functions 2025-07-02 10:36:41 +02:00
Mert Unsal
1f097275bd Remove xpath from click_element_by_index (#2240)
Removed the optional xpath field from the ClickElementAction model - this was wrong.
2025-07-02 10:15:12 +02:00
Mert Unsal
0d781ce20c fix (#2241)
Fixed a bug where the reasoning_effort parameter was sent to all models.
Now it is only sent to supported models.
2025-07-02 10:13:47 +02:00
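The fix in commit 0d781ce20c, sending `reasoning_effort` only to models that support it, can be illustrated with a small guard like the one below. This is a sketch under assumptions: the function name, the parameter dictionary shape, and the allow-list are illustrative and are not taken from the repository.

```python
from typing import Any, Optional

# Illustrative allow-list; the set the project actually uses is not given in the log.
REASONING_EFFORT_MODELS = frozenset({"o3", "o4-mini"})


def build_request_params(
    model: str,
    temperature: float,
    reasoning_effort: Optional[str] = None,
) -> dict[str, Any]:
    """Build request parameters, adding reasoning_effort only for supported models."""
    params: dict[str, Any] = {"model": model, "temperature": temperature}
    if reasoning_effort is not None and model in REASONING_EFFORT_MODELS:
        params["reasoning_effort"] = reasoning_effort
    return params
```

Omitting the key entirely, rather than sending it as `None`, avoids passing an unknown parameter to models that would reject it.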
mertunsall
b37f5294c5 fix 2025-07-02 10:08:11 +02:00
mertunsall
fbd3a11737 fix 2025-07-02 10:01:56 +02:00
Cursor Agent
ee2c1d2ad0 Remove optional xpath from ClickElementAction model
Co-authored-by: mailmertunsal <mailmertunsal@gmail.com>
2025-07-02 07:43:21 +00:00
Mert Unsal
fc98494b00 Remove xpath from input text action (#2239)
Removed the optional xpath field from the InputTextAction model to
simplify its structure.
2025-07-02 09:37:23 +02:00
Cursor Agent
c24d5b4320 Remove optional xpath parameter from InputTextAction model
Co-authored-by: mamagnus00 <mamagnus00@gmail.com>
2025-07-01 21:51:05 +00:00