browser-use

mirror of https://github.com/browser-use/browser-use synced 2026-05-06 17:52:15 +02:00

Author	SHA1	Message	Date
Magnus Müller	239fd3f86b	eval include runner link	2025-07-02 14:10:22 +02:00
Magnus Müller	a9488feead	Change to 2 core runners	2025-07-01 12:56:36 +02:00
Magnus Müller	9376d9d91e	Update evaluation workflow to use new runner label 'eval-4-core-500'	2025-07-01 12:54:18 +02:00
Magnus Müller	31b503fb42	Name group in eval	2025-07-01 11:14:07 +02:00
Magnus Müller	4e2b5b2f5c	Fix typo in evaluation workflow runner name from '16-cores' to '16-core'	2025-07-01 10:45:42 +02:00
Magnus Müller	bd45b18508	Update evaluation workflow to maintain Ubuntu runner with 16 cores	2025-07-01 10:36:38 +02:00
Magnus Müller	bd4066354a	eval-runner-16-core	2025-07-01 10:31:33 +02:00
Magnus Müller	81765de5ec	Update evaluation workflow to specify Ubuntu runner with 8 cores	2025-07-01 10:09:04 +02:00
Magnus Müller	b9c81ec295	Add support for anchor usage in evaluation script arguments	2025-07-01 08:03:39 +02:00
Magnus Müller	c891af4ad1	Update eval workflow to use ubuntu-latest and streamline dependency installation - Changed runner from Blacksmith to ubuntu-latest for improved compatibility. - Updated setup-uv action to use astral-sh/setup-uv@v6. - Simplified dependency installation steps by removing unnecessary verification and debug outputs. - Adjusted Playwright version detection and caching actions for better performance.	2025-06-30 11:37:23 +02:00
blacksmith-sh[bot]	a172125484	Migrate workflows to Blacksmith	2025-06-30 09:16:19 +00:00
Nick Sweeting	7d0fb62bda	dont re-run entire tests suite before pypi release	2025-06-30 04:35:51 -04:00
Magnus Müller	035fba1f29	eval-runners-cache-enable	2025-06-30 09:30:57 +02:00
Magnus Müller	0954da334d	eval-runners-disable-cache	2025-06-30 09:20:02 +02:00
Magnus Müller	32303e156c	Add debug steps to eval workflow for repository structure and dependency checks	2025-06-30 09:14:46 +02:00
Magnus Müller	b271370f81	Add logging	2025-06-30 09:09:01 +02:00
Magnus Müller	42c5cb7e73	Remove unnecessary activate-environment option from eval workflow	2025-06-30 09:02:05 +02:00
Magnus Müller	6d344b54d9	Change config to blacksmith	2025-06-30 08:46:19 +02:00
Magnus Müller	5905ad949c	feat: add thinking parameter to control agent system prompt - Add --no-thinking flag to disable thinking in agent system prompt - Default is true (thinking enabled) for backward compatibility - Pass thinking parameter through entire evaluation pipeline - Update GitHub Actions workflow to handle thinking parameter	2025-06-29 20:11:45 +02:00
Nick Sweeting	eb88fe98e9	disable fast docker for now	2025-06-27 06:05:57 -07:00
Nick Sweeting	db6b0ae440	fast-docker	2025-06-27 05:36:38 -07:00
Nick Sweeting	f3dc2b300a	Merge branch 'main' into semaphores	2025-06-27 03:28:20 -07:00
Nick Sweeting	fe3af9479a	run only one proc for tests for now	2025-06-27 03:02:03 -07:00
Magnus Müller	3de686d6af	refactor: remove branch argument from single task mode in evaluation workflow - Eliminated the branch argument from both eval.yaml and service.py for single task mode, simplifying argument parsing. - Updated related logic to ensure backward compatibility while maintaining functionality for task ID, text, and website. - Enhanced environment variable loading for improved clarity and consistency.	2025-06-27 10:11:21 +02:00
Magnus Müller	bb11c7e7ca	feat: add single task mode support in evaluation workflow - Introduced parameters for single task mode in eval.yaml, allowing task ID, text, website, and branch to be specified. - Updated service.py to handle single task mode, including conditional saving to the server and local run ID generation. - Enhanced argument parsing to accommodate single task mode, ensuring backward compatibility with existing multi-task functionality.	2025-06-27 09:57:37 +02:00
Nick Sweeting	74d02c07a7	increase screenshot timeout to default timeout	2025-06-26 02:59:57 -07:00
Nick Sweeting	170c3e0bb7	remove duplicate timeout config	2025-06-26 01:24:03 -07:00
Nick Sweeting	7c317e9515	always group tests by class so they can reuse fixtures	2025-06-25 23:31:02 -07:00
Gregor Žunič	7a10ae0c96	Squashed commit langchain to native	2025-06-24 12:26:55 +02:00
Nick Sweeting	27f63622bf	dont sent telemetry or cloud sync events during evals	2025-06-23 15:32:28 -07:00
Magnus Müller	537e86da4c	Simplifies evaluation pipeline execution Removes the `fresh_start` option and the stage for loading existing results. This change streamlines the evaluation pipeline by removing the option to load existing results. The pipeline now always executes from the browser setup stage, ensuring consistent and repeatable evaluation runs.	2025-06-23 10:11:07 +02:00
Magnus Müller	d4a29c4b93	Improves evaluation robustness and reporting Enhances evaluation by improving error handling, providing more detailed logging, and adding a local summary calculation. The changes include: - Adds comprehensive judge fallback to Mind2Web judge and ensures backward compatibility. - Improves error handling during evaluation by capturing and logging the last part of the output on failure. - Adds a new function to calculate a summary of local evaluation results, displaying total tasks, success rate, and average score. - Includes comprehensive evaluation data for debugging purposes.	2025-06-23 00:08:14 +02:00
Magnus Müller	be16ff3f69	Implement comprehensive judge system for task evaluation Added a new judge system in `judge_system.py` that evaluates browser-use agent runs, providing detailed structured feedback. Updated the evaluation workflow in `eval.yaml` to include a new command-line argument for using the comprehensive judge. Modified `service.py` to integrate the new judge system, allowing for fallback to the original Mind2Web evaluation if specified. Enhanced error handling and logging throughout the evaluation process.	2025-06-22 22:43:57 +02:00
Nick Sweeting	ac22e6ae20	Test fixes, evenbus tweaks, docs updates, and better warnings (#2027)	2025-06-21 06:32:11 -07:00
Nick Sweeting	0af8c8c0fe	imports	2025-06-21 06:29:10 -07:00
Nick Sweeting	eb21d92d34	include extras packages in CI to avoid missing imports errors	2025-06-21 06:23:23 -07:00
Magnus Müller	aeea1788fa	fix: CLI argument default conflict for highlight_elements - Change from --highlight-elements (action='store_true') to --no-highlight-elements (action='store_false') - Fix CLI argument defaulting to False when flag not provided, conflicting with function default of True - Update GitHub workflow to use new flag logic (add flag when highlight_elements=false) - Ensure consistent behavior: highlighting enabled by default, can be disabled with --no-highlight-elements Resolves bug where CLI users got highlighting disabled by default instead of enabled	2025-06-21 12:52:40 +02:00
Magnus Müller	9292e6c48d	feat: add highlight_elements flag for controlling element highlighting - Add --highlight-elements CLI argument to eval/service.py - Pass highlight_elements through entire execution pipeline - Add highlight_elements support to GitHub workflow (eval.yaml) - Allow users to control whether interactive elements are highlighted on pages during automation - Improves debugging and visibility options for browser automation	2025-06-21 12:40:41 +02:00
Magnus Müller	f1d5dc5a17	Pass laminar_eval_id from frontend	2025-06-21 09:31:14 +02:00
Magnus Müller	83d92513a4	Monitor eval cpu	2025-06-20 23:35:56 +02:00
mertunsall	76ef41da45	Merge branch 'main' into mert/new_everything	2025-06-19 14:11:56 +02:00
Magnus Müller	f89a97f256	Delete laminar files since it is integrated to the main eval service	2025-06-19 11:26:53 +02:00
Magnus Müller	0b6ebea431	Merge branch 'main' into mert/new_everything	2025-06-19 09:49:43 +02:00
Nick Sweeting	829eafe982	Merge branch 'main' into eventbus	2025-06-18 10:26:50 -07:00
Magnus Müller	f89a1aac84	Update evaluation workflow to use laminar_eval.py - Changed the script executed in the evaluation workflow from `eval/service.py` to `eval/laminar_eval.py` for consistency with recent updates.	2025-06-18 16:43:06 +02:00
Magnus Müller	7f1d256964	Update dependencies in pyproject.toml and rename evaluation workflow - Removed duplicate lmnr dependency from the `dependencies` section in `pyproject.toml`. - Updated `lmnr` version to `0.6.11` in the `eval` extras group. - Renamed the evaluation workflow from "Run Evaluation Script" to "Run Laminar Eval Script" for clarity. - Adjusted the dependency installation command in the workflow to include the `--extra eval` flag.	2025-06-18 16:31:00 +02:00
Robert Kim	01a36a587e	gh action for laminar evals	2025-06-18 14:38:27 +01:00
mertunsall	1864e52635	Merge branch 'main' into mert/new_everything	2025-06-18 10:00:25 +02:00
Nick Sweeting	d0ec528802	fixed events	2025-06-17 22:38:59 -07:00
Nick Sweeting	ec32fee074	Merge branch 'main' into eventbus	2025-06-17 15:25:24 -07:00
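
Note on commit aeea1788fa: the message describes a CLI default conflict where a `store_true` flag defaults to False when omitted, overriding a function default of True. The sketch below reproduces that behaviour with plain `argparse`. It is an illustration only, not the code from `eval/service.py`; `build_parser` and `run` are hypothetical names, and only the flag names come from the commit message.

```python
import argparse


def build_parser(buggy: bool) -> argparse.ArgumentParser:
    """Build a parser in the old (buggy) or new (fixed) style. Hypothetical helper."""
    parser = argparse.ArgumentParser()
    if buggy:
        # store_true defaults to False when the flag is absent,
        # which silently disagrees with a function default of True.
        parser.add_argument('--highlight-elements', action='store_true')
    else:
        # store_false defaults to True when the flag is absent,
        # so highlighting stays on unless explicitly disabled.
        parser.add_argument(
            '--no-highlight-elements',
            dest='highlight_elements',
            action='store_false',
        )
    return parser


def run(highlight_elements: bool = True) -> bool:
    """Stand-in for the evaluation entry point; returns the effective setting."""
    return highlight_elements


# No flags passed on the command line in either case.
buggy_args = build_parser(buggy=True).parse_args([])
fixed_args = build_parser(buggy=False).parse_args([])
disabled_args = build_parser(buggy=False).parse_args(['--no-highlight-elements'])

print(run(buggy_args.highlight_elements))     # old style: highlighting off by accident
print(run(fixed_args.highlight_elements))     # new style: highlighting on by default
print(run(disabled_args.highlight_elements))  # new style: explicitly turned off
```

Inverting the flag keeps the CLI default aligned with the function default, so callers who pass nothing get the same behaviour either way.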

263 Commits