
🌐 Browser-Use

Open-Source Web Automation with LLMs

GitHub stars License: MIT Python 3.11+ Discord

Let LLMs interact with websites through a simple interface.

Short Example

from src import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task='Go to hackernews Show HN and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.',
    llm=ChatOpenAI(model='gpt-4o'),
)

await agent.run()
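The ratio the prompt asks the model to compute is simply points divided by hours. As a plain-Python sketch of that calculation (the post data below is invented for illustration, not real Hacker News output):

```python
# Invented sample data standing in for scraped Show HN posts.
posts = [
    {"title": "Show HN: Example A", "points": 120, "hours": 4},
    {"title": "Show HN: Example B", "points": 45, "hours": 9},
]

# Points-per-hour ratio, as the prompt requests.
for post in posts:
    post["ratio"] = post["points"] / post["hours"]

# Sort so the fastest-rising post comes first.
posts.sort(key=lambda p: p["ratio"], reverse=True)
```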

Demo

Prompt: Go to hackernews Show HN and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)

Prompt: Search the top 3 AI companies of 2024 and find out what concrete hardware each is using for their models. (1x speed)

Kayak flight search demo

Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)

Photos search demo

Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)

Setup

  1. Create a virtual environment and install dependencies:
# I recommend using uv
pip install -r requirements.txt
  2. Add your API keys to the .env file:
cp .env.example .env

You can use any LLM model supported by LangChain by adding the appropriate environment variables. See langchain models for available options.
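As a sketch of how such environment-based configuration might be wired up (the helper below is an illustration, not part of this repo; only the env-var names are real, since LangChain reads OPENAI_API_KEY and ANTHROPIC_API_KEY):

```python
import os

# Hypothetical helper: decide which chat model provider to construct
# based on which API key is present in the environment.
def pick_provider(env=None):
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    raise RuntimeError("No supported LLM API key found in the environment")
```

With the provider name you would then instantiate ChatOpenAI or ChatAnthropic as in the examples in this README.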

Features

  • Universal LLM Support - Works with any Language Model
  • Interactive Element Detection - Automatically finds interactive elements
  • Multi-Tab Management - Seamless handling of browser tabs
  • XPath Extraction for scraping functions - No more manual DevTools inspection
  • Vision Model Support - Process visual page information
  • Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
  • Handles dynamic content - don't worry about cookies or changing content
  • Chain-of-thought prompting with memory - Solve long-term tasks
  • Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
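The "Customizable Actions" point above can be pictured as a name-to-callable registry from which the LLM picks actions. The class below is a hypothetical sketch of that pattern, not the repo's actual Controller API:

```python
# Hypothetical sketch: a registry mapping action names to Python
# callables that an agent could invoke by name.
class ActionRegistry:
    def __init__(self):
        self._actions = {}

    def action(self, name):
        # Decorator that registers a function under the given name.
        def decorator(fn):
            self._actions[name] = fn
            return fn
        return decorator

    def run(self, name, *args, **kwargs):
        # Look up and execute a registered action.
        return self._actions[name](*args, **kwargs)


registry = ActionRegistry()

@registry.action("save_to_database")
def save_to_database(row):
    # A real custom action might write to SQLite or Postgres here.
    return f"saved: {row}"
```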

Advanced Examples

Chain of Agents

You can persist the browser across multiple agents and chain them together.

from langchain_anthropic import ChatAnthropic
from src import Agent, Controller

# Persist browser state across agents
controller = Controller()

# Initialize browser agent
agent1 = Agent(
	task='Open 5 VCs websites in the New York area.',
	llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
	controller=controller,
)
agent2 = Agent(
	task='Give me the names of the founders of the companies in all tabs.',
	llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
	controller=controller,
)

await agent1.run()
founders, history = await agent2.run()

print(founders)

You can use the history to run the agents again deterministically.
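Deterministic replay can be pictured as recording each (action, argument) pair the agent took, then re-executing the list without further LLM calls. A hypothetical sketch (the action names and executor here are invented for illustration):

```python
# Invented example of a recorded action history.
history = [
    ("goto", "https://news.ycombinator.com/show"),
    ("click", "post-1"),
]

def replay(history, execute):
    # Re-run each recorded step in order; no LLM is involved.
    return [execute(action, arg) for action, arg in history]
```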

Command Line Usage

Run examples directly from the command line:

python examples/try.py "Your query here" --provider [openai|anthropic]

Anthropic

You need to add ANTHROPIC_API_KEY to your environment variables. Example usage:


python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic

OpenAI

You need to add OPENAI_API_KEY to your environment variables. Example usage:

python examples/try.py "Go to hackernews Show HN and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai

🤖 Supported Models

All LangChain chat models are supported. Tested with:

  • GPT-4o
  • GPT-4o Mini
  • Claude 3.5 Sonnet
  • Llama 3.1 405B

Limitations

  • When extracting page content, the messages grow longer, which slows down the LLM.
  • Currently, a single agent run costs about $0.01.
  • Sometimes the agent repeats the same action over and over.
  • Some elements you want to interact with might not be extracted.
  • What should we focus on the most?
    • Robustness
    • Speed
    • Cost reduction

Roadmap

  • Save agent actions and execute them deterministically
  • Pydantic forced output
  • Third party SERP API for faster Google Search results
  • Multi-step action execution to increase speed
  • Test on mind2web dataset
  • Add more browser actions

Contributing

Contributions are welcome! Feel free to open issues for bugs or feature requests.


Star this repo if you find it useful!
Made with ❤️ by the Browser-Use team