
🌐 Browser-Use

Open-Source Web Automation with LLMs

GitHub stars License: MIT Python 3.11+ Discord

Let LLMs interact with websites through a simple interface.

Short Example

pip install browser-use
from langchain_openai import ChatOpenAI
from browser_use import Agent
import asyncio

async def main():
    agent = Agent(
        task="Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour.",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())

Demo

Prompt: Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour. (1x speed)
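For reference, the points-per-hour calculation this prompt asks for is simple post-processing once the agent has scraped titles, points, and post ages. The sample data below is made up for illustration:

```python
# Hypothetical scraped data: (title, points, hours since posting).
posts = [
    ("Show HN: Example project", 120, 4),
    ("Show HN: Another tool", 30, 2),
    ("Show HN: Tiny library", 10, 1),
]

# Points-per-hour ratio for each post, sorted best-first.
ranked = sorted(
    ((title, points / hours) for title, points, hours in posts),
    key=lambda pair: pair[1],
    reverse=True,
)

for title, ratio in ranked:
    print(f"{title}: {ratio:.1f} points/hour")
```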

Prompt: Search the top 3 AI companies 2024 and find out what concrete hardware each is using for their models. (1x speed)

Kayak flight search demo

Prompt: Go to kayak.com and find a one-way flight from Zürich to San Francisco on 12 January 2025. (2.5x speed)

Photos search demo

Prompt: Opening new tabs and searching for images for these people: Albert Einstein, Oprah Winfrey, Steve Jobs. (2.5x speed)

Local Setup

  1. Create a virtual environment and install dependencies:
# To install all dependencies, including dev extras
pip install ".[dev]"
  2. Add your API keys to the .env file:
cp .env.example .env

E.g. for OpenAI:

OPENAI_API_KEY=

You can use any LLM model supported by LangChain by adding the appropriate environment variables. See langchain models for available options.
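As a minimal sketch of how the environment variables drive model choice, the helper below picks a provider name based on which API key is set. The function name and mapping are illustrative, not part of the library:

```python
import os

# Illustrative helper (not part of browser-use): pick a provider
# based on which API key is present in the environment.
def pick_provider() -> str:
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    raise RuntimeError("No supported LLM API key found in the environment")
```

Given the provider name, you would then instantiate the matching LangChain chat model (e.g. ChatOpenAI or ChatAnthropic).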

Features

  • Universal LLM Support - Works with any Language Model
  • Interactive Element Detection - Automatically finds interactive elements
  • Multi-Tab Management - Seamless handling of browser tabs
  • XPath Extraction for scraping functions - No more manual DevTools inspection
  • Vision Model Support - Process visual page information
  • Customizable Actions - Add your own browser interactions (e.g. add data to database which the LLM can use)
  • Handles dynamic content - No need to worry about cookies or changing content
  • Chain-of-thought prompting with memory - Solve long-term tasks
  • Self-correcting - If the LLM makes a mistake, the agent will self-correct its actions
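To illustrate the "Customizable Actions" idea, here is a toy registry in the decorator style such libraries often use. This is a sketch of the pattern only, not browser-use's actual API:

```python
# Toy action registry (illustrative): maps an action name to a callable.
ACTIONS = {}

def action(name):
    """Register a custom agent action under `name` (pattern sketch)."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register

@action("save_founder")
def save_founder(record, db):
    # e.g. push data the LLM extracted into your own database
    db.append(record)
    return record

# The agent could then dispatch actions by name:
db = []
ACTIONS["save_founder"]({"name": "Ada"}, db)
```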

Advanced Examples

Chain of Agents

You can persist the browser across multiple agents and chain them together.

from asyncio import run
from browser_use import Agent, Controller
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
load_dotenv()

# Persist browser state across agents
controller = Controller()

# Initialize browser agent
agent1 = Agent(
    task="Open 3 VCs websites in the New York area.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller)
agent2 = Agent(
    task="Give me the names of the founders of the companies in all tabs.",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20240620", timeout=25, stop=None),
    controller=controller)

run(agent1.run())
founders, history = run(agent2.run())

print(founders)

You can use the history to run the agents again deterministically.
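One plausible shape for deterministic replay, with a hypothetical history format: the history is an ordered list of (action, params) records that can be re-executed without consulting the LLM.

```python
# Hypothetical recorded history: ordered (action_name, params) pairs.
history = [
    ("goto", {"url": "https://example.com"}),
    ("click", {"element_index": 3}),
    ("extract", {"xpath": "//h1"}),
]

def replay(history, execute):
    """Re-run recorded actions in order; `execute` performs one action."""
    results = []
    for name, params in history:
        results.append(execute(name, params))
    return results

# Dry run with a stub executor that just echoes the action names:
log = replay(history, lambda name, params: name)
```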

Command Line Usage

Run examples directly from the command line (clone the repo first):

python examples/try.py "Your query here" --provider [openai|anthropic]

Anthropic

You need to add ANTHROPIC_API_KEY to your environment variables. Example usage:


python examples/try.py "Search the top 3 AI companies 2024 and find out in 3 new tabs what hardware each is using for their models" --provider anthropic

OpenAI

You need to add OPENAI_API_KEY to your environment variables. Example usage:

python examples/try.py "Go to hackernews on show hn and give me top 10 post titles, their points and hours. Calculate for each the ratio of points per hour." --provider openai

🤖 Supported Models

All LangChain chat models are supported. Tested with:

  • GPT-4o
  • GPT-4o Mini
  • Claude 3.5 Sonnet
  • Llama 3.1 405B

Limitations

  • When extracting page content, the message length grows and LLM responses slow down.
  • Currently, a single agent run costs about $0.01.
  • Sometimes the agent repeats the same action over and over.
  • Some elements you want to interact with might not be extracted.
  • What should we focus on the most?
    • Robustness
    • Speed
    • Cost reduction

Roadmap

  • Save agent actions and execute them deterministically
  • Pydantic forced output
  • Third party SERP API for faster Google Search results
  • Multi-step action execution to increase speed
  • Test on mind2web dataset
  • Add more browser actions

Contributing

Contributions are welcome! Feel free to open issues for bugs or feature requests.

Feel free to join the Discord for discussions and support.


Star this repo if you find it useful!
Made with ❤️ by the Browser-Use team