---
title: "Evaluations"
description: "Test the Browser Use agent on standardized benchmarks"
icon: "chart-bar"
---

## Prerequisites

Browser Use uses proprietary, private test sets that must never be committed to GitHub and must be fetched through an authorized API request.
Accessing these test sets requires an approved Browser Use account.
There are currently no publicly available test sets, but some may be released in the future.
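
For context, an authorized fetch of this kind might look like the sketch below. It is purely illustrative: the `/testsets` path, the bearer-token header, and the response shape are assumptions rather than the documented API, and the two environment variables are the ones configured in the next section.

```python
# Hypothetical sketch of an authorized test-set fetch. The endpoint path,
# auth scheme, and response shape are assumptions, not the documented API.
import os

import requests

response = requests.get(
    f"{os.environ['EVALUATION_TOOL_URL']}/testsets",  # hypothetical path
    headers={"Authorization": f"Bearer {os.environ['EVALUATION_TOOL_SECRET_KEY']}"},
    timeout=30,
)
response.raise_for_status()
test_sets = response.json()  # payload shape is an assumption
```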
## Get an API Access Key

First, navigate to https://browser-use.tools and log in with an authorized Browser Use account.

Then, click the "Account" button at the top right of the page, and click the "Cycle New Key" button on that page.

Copy the resulting URL and secret key into your `.env` file. It should look like this:

```bash .env
EVALUATION_TOOL_URL=...
EVALUATION_TOOL_SECRET_KEY=...
```
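To sanity-check the setup, you can confirm that both variables are visible to Python before kicking off a run. A minimal sketch, assuming the `python-dotenv` package is installed and the `.env` file sits in your working directory:

```python
# Quick check that the evaluation credentials load before starting a run.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for name in ("EVALUATION_TOOL_URL", "EVALUATION_TOOL_SECRET_KEY"):
    assert os.getenv(name), f"{name} is not set -- check your .env file"
print("Evaluation credentials found.")
```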
## Running Evaluations

First, ensure your local copy of `eval/service.py` is up to date.

Then run the file:

```bash
python eval/service.py
```
## Configuring Evaluations

You can modify the evaluation by providing flags to the evaluation script. For instance:

```bash
python eval/service.py --parallel_runs 5 --parallel_evaluations 5 --max-steps 25 --start 0 --end 100 --model gpt-4o
```
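These flags are ordinary command-line arguments, so runs can also be scripted. As a rough sketch of how flags like these might be declared with `argparse` (the actual definitions in `eval/service.py` may differ in names, types, and defaults):

```python
# Sketch only: the real eval/service.py may declare these flags differently.
import argparse

parser = argparse.ArgumentParser(description="Run Browser Use evaluations")
parser.add_argument("--parallel_runs", type=int, default=1)  # concurrent agent runs
parser.add_argument("--parallel_evaluations", type=int, default=1)  # concurrent judge calls
parser.add_argument("--max-steps", dest="max_steps", type=int, default=25)  # step cap per task
parser.add_argument("--start", type=int, default=0)  # index of the first task
parser.add_argument("--end", type=int, default=100)  # index of the last task
parser.add_argument("--model", type=str, default="gpt-4o")  # model under evaluation
args = parser.parse_args()

print(f"Evaluating tasks {args.start}-{args.end} with {args.model}, max {args.max_steps} steps")
```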
The evaluations webpage has a convenient GUI for generating these commands. To use it, navigate to https://browser-use.tools/dashboard.

Then click the "New Eval Run" button on the left panel. This will open an interface with selectors, inputs, sliders, and switches.

Input your desired configuration into the interface and copy the resulting Python command at the bottom. Then run this command as before.