Author | Commit | Message | Date
Nick Sweeting | 3209fd95f7 | lint and hint fixes | 2025-06-21 06:07:21 -07:00
Nick Sweeting | e3d21d33a1 | fix evaluate_tasks.py errors in CI | 2025-06-16 17:20:27 -07:00
Nick Sweeting | b0ed680419 | use patchright for stealth tests | 2025-06-09 15:29:37 -07:00
Magnus Müller | eaab9f04d7 | Enhance GitHub Actions workflow and evaluate_tasks.py to include detailed task evaluation results. The workflow now captures and displays detailed results in a structured format, while the Python script outputs detailed results as JSON for better integration with GitHub Actions. This improves visibility and understanding of task outcomes in the evaluation process. | 2025-06-07 13:39:16 +02:00
Magnus Müller | e6171e0fc1 | Disable Chromium sandbox in BrowserProfile for CI environment in evaluate_tasks.py to improve compatibility with GitHub Actions. | 2025-06-07 12:59:22 +02:00
Magnus Müller | 43d96da06c | Enhance error handling and debugging in evaluate_tasks.py by adding browser and LLM test calls. Capture and log errors during browser session initiation and agent execution, improving overall troubleshooting capabilities. | 2025-06-07 12:53:14 +02:00
Magnus Müller | 9899262186 | Add additional debug logging in evaluate_tasks.py to track task execution flow and subprocess outputs. Enhance error reporting by capturing and displaying full stdout and stderr for better troubleshooting during agent execution. | 2025-06-07 12:48:04 +02:00
Magnus Müller | 54226c7b51 | Add detailed debug logging in evaluate_tasks.py to capture agent execution steps and outputs. Enhance error reporting for subprocess failures and improve output handling when no agent output is provided. | 2025-06-07 12:43:51 +02:00
Magnus Müller | 592b9afbce | Refactor evaluate_tasks.py to run agent tasks in separate subprocesses, enhancing isolation and error handling. Introduce argparse for task selection and improve logging management during task execution. | 2025-06-07 12:37:07 +02:00
Magnus Müller | 8246bcd299 | Refactor run_task in evaluate_tasks.py to create a dedicated BrowserSession for each agent, improving session management and ensuring headless execution with --no-sandbox argument. | 2025-06-07 11:58:01 +02:00
Magnus Müller | 38cfa86738 | Update browser_use_pip.yaml to simplify output requirements and refactor run_task in evaluate_tasks.py to remove shared profile parameter, enhancing browser session management with a dedicated profile. | 2025-06-07 11:50:30 +02:00
Magnus Müller | bf13dbb452 | Refactor run_task function in evaluate_tasks.py to accept a shared profile for consistent browser sessions across tasks | 2025-06-07 11:40:19 +02:00
Magnus Müller | d50581e87c | Remove --no-sandbox argument from BrowserProfile in evaluate_tasks.py to simplify configuration | 2025-06-07 11:28:46 +02:00
Magnus Müller | df14af7f00 | Add --no-sandbox argument to BrowserProfile in evaluate_tasks.py for enhanced security during task execution | 2025-06-07 11:20:26 +02:00
Magnus Müller | b29dbe5f2a | Remove keep_alive option from BrowserProfile in evaluate_tasks.py to streamline session management | 2025-06-07 11:10:51 +02:00
Magnus Müller | 576519ee40 | Enhance CI workflow by adding agent tasks evaluation step and updating evaluate_tasks.py to output evaluation results | 2025-06-07 10:59:24 +02:00
Magnus Müller | 412904dc65 | Add keep_alive option to BrowserProfile in evaluate_tasks.py for improved session persistence | 2025-06-07 10:53:58 +02:00
Magnus Müller | 3666d2b077 | Add agent tasks evaluation script and update CI workflow to include it | 2025-06-07 10:49:03 +02:00