
🌐 Browser-Use

Open-Source Web Automation with LLMs

GitHub stars License: MIT Python 3.11+ Discord

Let LLMs interact with websites through a simple interface.

Short Example

from src import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task='Go to hackernews Show HN and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.',
    llm=ChatOpenAI(model='gpt-4o'),
)

await agent.run()
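The ratio the prompt asks the model to compute is simply points divided by hours. As a plain-Python sketch of that calculation (the post data below is invented for illustration, not real Hacker News output):

```python
# Invented sample data standing in for scraped Show HN posts.
posts = [
    {"title": "Show HN: Example A", "points": 120, "hours": 4},
    {"title": "Show HN: Example B", "points": 45, "hours": 9},
]

# Points-per-hour ratio, as the prompt requests.
for post in posts:
    post["ratio"] = post["points"] / post["hours"]

# Sort so the fastest-rising post comes first.
posts.sort(key=lambda p: p["ratio"], reverse=True)
```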

Demo

Prompt: Go to hackernews Show HN and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)

Prompt: Search the top 3 AI companies of 2024 and find out what concrete hardware each is using for their models. (1x speed)

Kayak flight search demo

Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)

Photos search demo

Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)

Setup

  1. Create a virtual environment and install dependencies:
# I recommend using uv
pip install -r requirements.txt
  2. Add your API keys to the .env file:
cp .env.example .env

You can use any LLM model supported by LangChain by adding the appropriate environment variables. See langchain models for available options.
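As a sketch of how such environment-based configuration might be wired up (the helper below is an illustration, not part of this repo; only the env-var names are real, since LangChain reads OPENAI_API_KEY and ANTHROPIC_API_KEY):

```python
import os

# Hypothetical helper: decide which chat model provider to construct
# based on which API key is present in the environment.
def pick_provider(env=None):
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    raise RuntimeError("No supported LLM API key found in the environment")
```

With the provider name you would then instantiate ChatOpenAI or ChatAnthropic as in the examples in this README.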

Features

  • Universal LLM Support - Works with any Language Model
  • Interactive Element Detection - Automatically finds interactive elements
  • Multi-Tab Management - Seamless handling of browser tabs
  • XPath Extraction for scraping functions - No more manual DevTools inspection
  • Vision Model Support - Process visual page information
  • Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
  • Handles dynamic content - don't worry about cookies or changing content
  • Chain-of-thought prompting with memory - Solve long-term tasks
  • Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
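The "Customizable Actions" point above can be pictured as a name-to-callable registry from which the LLM picks actions. The class below is a hypothetical sketch of that pattern, not the repo's actual Controller API:

```python
# Hypothetical sketch: a registry mapping action names to Python
# callables that an agent could invoke by name.
class ActionRegistry:
    def __init__(self):
        self._actions = {}

    def action(self, name):
        # Decorator that registers a function under the given name.
        def decorator(fn):
            self._actions[name] = fn
            return fn
        return decorator

    def run(self, name, *args, **kwargs):
        # Look up and execute a registered action.
        return self._actions[name](*args, **kwargs)


registry = ActionRegistry()

@registry.action("save_to_database")
def save_to_database(row):
    # A real custom action might write to SQLite or Postgres here.
    return f"saved: {row}"
```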

Advanced Examples

Chain of Agents

You can persist the browser across multiple agents and chain them together.

from langchain_anthropic import ChatAnthropic
from src import Agent, Controller

# Persist browser state across agents
controller = Controller()

# Initialize browser agent
agent1 = Agent(
	task='Open 5 VCs websites in the New York area.',
	llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
	controller=controller,
)
agent2 = Agent(
	task='Give me the names of the founders of the companies in all tabs.',
	llm=ChatAnthropic(model_name='claude-3-sonnet', timeout=25, stop=None, temperature=0.3),
	controller=controller,
)

await agent1.run()
founders, history = await agent2.run()

print(founders)

You can use the history to run the agents again deterministically.
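Deterministic replay can be pictured as recording each (action, argument) pair the agent took, then re-executing the list without further LLM calls. A hypothetical sketch (the action names and executor here are invented for illustration):

```python
# Invented example of a recorded action history.
history = [
    ("goto", "https://news.ycombinator.com/show"),
    ("click", "post-1"),
]

def replay(history, execute):
    # Re-run each recorded step in order; no LLM is involved.
    return [execute(action, arg) for action, arg in history]
```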

Command Line Usage

Run examples directly from the command line:

python examples/try.py "Your query here" --provider [openai|anthropic]

Anthropic

You need to add ANTHROPIC_API_KEY to your environment variables. Example usage:


python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic

OpenAI

You need to add OPENAI_API_KEY to your environment variables. Example usage:

python examples/try.py "Go to hackernews Show HN and give me the top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai

🤖 Supported Models

All LangChain chat models are supported. Tested with:

  • GPT-4o
  • GPT-4o Mini
  • Claude 3.5 Sonnet
  • Llama 3.1 405B

Limitations

  • When extracting page content, the messages grow longer, which slows down the LLM.
  • Currently, a single agent run costs about $0.01.
  • Sometimes the agent repeats the same action over and over.
  • Some elements you want to interact with might not be extracted.
  • What should we focus on the most?
    • Robustness
    • Speed
    • Cost reduction

Roadmap

  • Save agent actions and execute them deterministically
  • Pydantic forced output
  • Third party SERP API for faster Google Search results
  • Multi-step action execution to increase speed
  • Test on mind2web dataset
  • Add more browser actions

Contributing

Contributions are welcome! Feel free to open issues for bugs or feature requests.


Star this repo if you find it useful!
Made with ❤️ by the Browser-Use team