Commit Graph

18 Commits

Author | SHA1 | Message | Date
Nick Sweeting | 3209fd95f7 | lint and hint fixes | 2025-06-21 06:07:21 -07:00
Nick Sweeting | e3d21d33a1 | fix evaluate_tasks.py errors in CI | 2025-06-16 17:20:27 -07:00
Nick Sweeting | b0ed680419 | use patchright for stealth tests | 2025-06-09 15:29:37 -07:00
Magnus Müller | eaab9f04d7 | Enhance GitHub Actions workflow and evaluate_tasks.py to include detailed task evaluation results. The workflow now captures and displays detailed results in a structured format, while the Python script outputs detailed results as JSON for better integration with GitHub Actions. This improves visibility and understanding of task outcomes in the evaluation process. | 2025-06-07 13:39:16 +02:00
Magnus Müller | e6171e0fc1 | Disable Chromium sandbox in BrowserProfile for CI environment in evaluate_tasks.py to improve compatibility with GitHub Actions. | 2025-06-07 12:59:22 +02:00
Magnus Müller | 43d96da06c | Enhance error handling and debugging in evaluate_tasks.py by adding browser and LLM test calls. Capture and log errors during browser session initiation and agent execution, improving overall troubleshooting capabilities. | 2025-06-07 12:53:14 +02:00
Magnus Müller | 9899262186 | Add additional debug logging in evaluate_tasks.py to track task execution flow and subprocess outputs. Enhance error reporting by capturing and displaying full stdout and stderr for better troubleshooting during agent execution. | 2025-06-07 12:48:04 +02:00
Magnus Müller | 54226c7b51 | Add detailed debug logging in evaluate_tasks.py to capture agent execution steps and outputs. Enhance error reporting for subprocess failures and improve output handling when no agent output is provided. | 2025-06-07 12:43:51 +02:00
Magnus Müller | 592b9afbce | Refactor evaluate_tasks.py to run agent tasks in separate subprocesses, enhancing isolation and error handling. Introduce argparse for task selection and improve logging management during task execution. | 2025-06-07 12:37:07 +02:00
Magnus Müller | 8246bcd299 | Refactor run_task in evaluate_tasks.py to create a dedicated BrowserSession for each agent, improving session management and ensuring headless execution with --no-sandbox argument. | 2025-06-07 11:58:01 +02:00
Magnus Müller | 38cfa86738 | Update browser_use_pip.yaml to simplify output requirements and refactor run_task in evaluate_tasks.py to remove shared profile parameter, enhancing browser session management with a dedicated profile. | 2025-06-07 11:50:30 +02:00
Magnus Müller | bf13dbb452 | Refactor run_task function in evaluate_tasks.py to accept a shared profile for consistent browser sessions across tasks | 2025-06-07 11:40:19 +02:00
Magnus Müller | d50581e87c | Remove --no-sandbox argument from BrowserProfile in evaluate_tasks.py to simplify configuration | 2025-06-07 11:28:46 +02:00
Magnus Müller | df14af7f00 | Add --no-sandbox argument to BrowserProfile in evaluate_tasks.py for enhanced security during task execution | 2025-06-07 11:20:26 +02:00
Magnus Müller | b29dbe5f2a | Remove keep_alive option from BrowserProfile in evaluate_tasks.py to streamline session management | 2025-06-07 11:10:51 +02:00
Magnus Müller | 576519ee40 | Enhance CI workflow by adding agent tasks evaluation step and updating evaluate_tasks.py to output evaluation results | 2025-06-07 10:59:24 +02:00
Magnus Müller | 412904dc65 | Add keep_alive option to BrowserProfile in evaluate_tasks.py for improved session persistence | 2025-06-07 10:53:58 +02:00
Magnus Müller | 3666d2b077 | Add agent tasks evaluation script and update CI workflow to include it | 2025-06-07 10:49:03 +02:00