browser-use/docs/customize/sensitive-data.mdx
2025-05-27 15:31:26 -07:00


---
title: "Sensitive Data"
description: "Handle sensitive information securely by preventing the model from seeing actual passwords."
icon: "shield"
---
## Handling Sensitive Data
When working with sensitive information like passwords, you can use the `sensitive_data` parameter to prevent the model from seeing the actual values while still allowing it to reference them in its actions.
Make sure to always set [`allowed_domains`](https://docs.browser-use.com/customize/browser-settings#restrict-urls) to restrict the domains the Agent is allowed to visit when working with sensitive data or logins.
### Basic Usage
Here's a basic example of how to use sensitive data:
```python
import asyncio

from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from browser_use import Agent, BrowserSession

# Initialize the model
llm = ChatOpenAI(model='gpt-4o', temperature=0.0)

# Define sensitive data
# The LLM will only see placeholder names (x_member_number, x_passphrase), never the actual values
sensitive_data = {
    'https://*.example.com': {
        'x_member_number': '123235325',
        'x_passphrase': 'abcwe234',
    },
}

# Use the placeholder names in your task description
task = """
1. go to https://travel.example.com
2. sign in with your member number x_member_number and private access code x_passphrase
3. extract today's list of travel deals as JSON
"""

# Recommended: Limit the domains available for the entire browser so the Agent can't be tricked into visiting untrusted URLs
browser_session = BrowserSession(
    allowed_domains=['https://*.example.com']
)

# Pass the sensitive data to the agent
agent = Agent(
    task=task,
    llm=llm,
    sensitive_data=sensitive_data,
    browser_session=browser_session,
    use_vision=False,  # recommended: disable vision or the LLM might see entered values in screenshots
)


async def main():
    await agent.run()


if __name__ == '__main__':
    asyncio.run(main())
```
In this example:
1. The model only sees `x_member_number` and `x_passphrase` as placeholders.
2. When the model wants to use your password, it outputs the placeholder `x_passphrase`, and we replace it with the actual value in the DOM.
3. When sensitive data appears in the content of the current page, we replace it in the page summary fed to the LLM, so the model never has the actual value in its state.
4. The browser is prevented from visiting any site not under `https://*.example.com`, which protects against prompt injection attacks, jailbreaks, and Agent distraction.
This approach ensures that sensitive information remains secure while still allowing the agent to perform tasks that require authentication.
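
The masking described in step 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: `mask_sensitive_values` is a hypothetical helper, and the choice to emit `<secret>name</secret>` tags as the placeholder form is an assumption.

```python
def mask_sensitive_values(text: str, secrets: dict[str, str]) -> str:
    """Replace every occurrence of a secret value with a placeholder tag.

    Hypothetical helper for illustration; not part of browser_use.
    """
    # Replace longer values first, so a short secret that happens to be a
    # substring of a longer one cannot split the longer match.
    ordered = sorted(secrets.items(), key=lambda item: len(item[1]), reverse=True)
    for name, value in ordered:
        if value:  # skip empty values: replacing '' would corrupt the text
            text = text.replace(value, f'<secret>{name}</secret>')
    return text


secrets = {'x_member_number': '123235325', 'x_passphrase': 'abcwe234'}
page_text = 'Welcome back, member 123235325. Your access code is abcwe234.'

masked = mask_sensitive_values(page_text, secrets)
print(masked)
```

After masking, neither real value appears in the text that would be handed to the model; only the placeholder names remain.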
<Warning>
Vision models still see a screenshot of the page by default, and the sensitive data might be visible in it.
It's recommended to set `Agent(use_vision=False)` when working with `sensitive_data`.
</Warning>
### Full Example
Here's a more complex example demonstrating multiple domains and sensitive data values.
```python
import asyncio

from dotenv import load_dotenv
load_dotenv()

from langchain_openai import ChatOpenAI
from browser_use import Agent, BrowserSession

llm = ChatOpenAI(model='gpt-4o', temperature=0.0)

# Domain-specific sensitive data
sensitive_data = {
    'https://*.google.com': {'x_email': '...', 'x_pass': '...'},
    'chrome-extension://abcd': {'x_api_key': '...'},
    'http*://example.com': {'x_authcode': '123123'},
}

# Set browser session with allowed domains that match all domain patterns in sensitive_data
browser_session = BrowserSession(
    allowed_domains=[
        'https://*.google.com',
        'chrome-extension://abcd',
        'http://example.com',   # Explicitly include http:// if needed
        'https://example.com',  # By default, only https:// is matched
    ]
)

# Pass the sensitive data to the agent
agent = Agent(
    task="Log into Google, then check my account information",
    llm=llm,
    sensitive_data=sensitive_data,
    browser_session=browser_session,
)


async def main():
    await agent.run()


if __name__ == '__main__':
    asyncio.run(main())
```
With this approach:
1. The Google credentials (`x_email` and `x_pass`) will only be used on Google domains (any subdomain)
2. The API key (`x_api_key`) will only be used in the specific Chrome extension
3. The auth code (`x_authcode`) will only be used on example.com via http or https
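
To make the scoping above concrete, here is a rough sketch of how secrets could be selected for the page the agent is currently on. Both `url_matches_pattern` and `secrets_for_url` are hypothetical helpers written for this illustration, the credential values are dummies, and the matching logic covers only the pattern forms used in this example. The library's real matcher may behave differently.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def url_matches_pattern(url: str, pattern: str) -> bool:
    """Simplified matcher (assumption): handles only the pattern forms shown in these docs."""
    parsed = urlparse(url)
    scheme, host = parsed.scheme, parsed.hostname or ''

    # A pattern without a scheme is treated as https:// only.
    if '://' in pattern:
        pattern_scheme, pattern_host = pattern.split('://', 1)
    else:
        pattern_scheme, pattern_host = 'https', pattern

    # 'http*' matches both http and https; other schemes must match exactly.
    if not fnmatchcase(scheme, pattern_scheme):
        return False
    if pattern_host == '*':
        return True
    if pattern_host.startswith('*.'):
        bare = pattern_host[2:]
        # '*.example.com' covers the bare domain and every subdomain.
        return host == bare or host.endswith('.' + bare)
    return host == pattern_host


def secrets_for_url(url: str, sensitive_data: dict[str, dict[str, str]]) -> dict[str, str]:
    """Collect the secrets whose domain pattern matches the given URL."""
    available: dict[str, str] = {}
    for pattern, secrets in sensitive_data.items():
        if url_matches_pattern(url, pattern):
            available.update(secrets)
    return available


# Dummy values for illustration only.
sensitive_data = {
    'https://*.google.com': {'x_email': 'user@example.com', 'x_pass': 'dummy-pass'},
    'chrome-extension://abcd': {'x_api_key': 'dummy-key'},
    'http*://example.com': {'x_authcode': '123123'},
}

print(sorted(secrets_for_url('https://accounts.google.com/signin', sensitive_data)))
# ['x_email', 'x_pass']
print(sorted(secrets_for_url('http://example.com/login', sensitive_data)))
# ['x_authcode']
print(sorted(secrets_for_url('https://evil.test/', sensitive_data)))
# []
```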
<Warning>
Vision models still see the image of the page, where the sensitive data might be visible.
It's recommended to set `Agent(use_vision=False)` when working with `sensitive_data` and forms.
</Warning>
<a name="allowed_domains"></a>
### Domain Pattern Format
Domain patterns in `sensitive_data` follow the same format as `allowed_domains`:
- `example.com` - Matches only `https://example.com/*`
- `*.example.com` - Matches `https://example.com/*` and any subdomain `https://*.example.com/*`
- `http*://example.com` - Matches both `http://` and `https://` protocols for `example.com/*`
- `chrome-extension://*` - Matches any Chrome extension URL e.g. `chrome-extension://anyextensionid/options.html`
> **Security Warning**: For security reasons, certain patterns are explicitly rejected:
> - Wildcards in the TLD part (e.g., `example.*`) are **not allowed**: `google.*` would also match `google.ninja`, `google.pizza`, and so on.
> - Embedded wildcards (e.g., `g*e.com`) are rejected to prevent overly broad matches.
> - Multiple wildcards like `*.*.domain` are not currently supported; open an issue if you need this feature.
The default protocol when no scheme is specified is now `https://` for enhanced security.
For convenience the system will validate that all domain patterns used in `Agent(sensitive_data)` are also included in `BrowserSession(allowed_domains)`.
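
If you want to catch a rejected pattern before constructing the agent, the rules above can be approximated with a small pre-flight check. This is a sketch under assumptions: `validate_domain_pattern` and `is_valid` are hypothetical helpers, not part of the library, and the library's own validation may be stricter or differ in its error messages.

```python
def validate_domain_pattern(pattern: str) -> None:
    """Raise ValueError for the wildcard forms listed as rejected (illustrative only)."""
    # Drop an optional scheme; a wildcard in the scheme (http*://) is allowed.
    host = pattern.split('://', 1)[1] if '://' in pattern else pattern
    host = host.split('/', 1)[0]

    if host == '*':
        return  # e.g. chrome-extension://* matches any extension id
    if host.count('*') > 1:
        raise ValueError(f'multiple wildcards are not supported: {pattern!r}')
    if host.endswith('.*'):
        raise ValueError(f'wildcard in the TLD part is not allowed: {pattern!r}')
    if '*' in host and not host.startswith('*.'):
        raise ValueError(f'embedded wildcard is not allowed: {pattern!r}')


def is_valid(pattern: str) -> bool:
    """Convenience wrapper that returns a boolean instead of raising."""
    try:
        validate_domain_pattern(pattern)
    except ValueError:
        return False
    return True


for candidate in ['example.com', '*.example.com', 'http*://example.com',
                  'chrome-extension://*', 'example.*', 'g*e.com', '*.*.domain']:
    print(f'{candidate:24} {"ok" if is_valid(candidate) else "rejected"}')
```

The first four patterns pass and the last three are rejected, mirroring the list above.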
### Missing or Empty Values
When working with sensitive data, keep these details in mind:
- If a key referenced by the model (`<secret>key_name</secret>`) is missing from your `sensitive_data` dictionary, a warning will be logged but the substitution tag will be preserved.
- If you provide an empty value for a key in the `sensitive_data` dictionary, it will be treated the same as a missing key.
- The system will always attempt to process all valid substitutions, even if some keys are missing or empty.
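
The behavior described above can be illustrated with a short, self-contained sketch. The function `substitute_secrets` and the regular expression are assumptions made for this example, not the library's actual code; only the `<secret>key_name</secret>` tag form comes from the description above.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Matches <secret>key_name</secret> and captures the key name.
SECRET_TAG = re.compile(r'<secret>(.*?)</secret>')


def substitute_secrets(text: str, secrets: dict[str, str]) -> str:
    """Replace secret tags with real values; keep tags for missing or empty keys."""
    missing: list[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        value = secrets.get(name)
        if not value:  # a missing key and an empty value are treated the same
            missing.append(name)
            return match.group(0)  # preserve the original tag
        return value

    result = SECRET_TAG.sub(replace, text)
    if missing:
        logger.warning('Missing or empty keys in sensitive_data: %s', ', '.join(missing))
    return result


secrets = {'x_member_number': '123235325', 'x_passphrase': ''}
text = ('id=<secret>x_member_number</secret> '
        'pass=<secret>x_passphrase</secret> '
        'otp=<secret>x_otp</secret>')

print(substitute_secrets(text, secrets))
# id=123235325 pass=<secret>x_passphrase</secret> otp=<secret>x_otp</secret>
```

The valid key is substituted, while the empty key (`x_passphrase`) and the missing key (`x_otp`) keep their tags and trigger a single logged warning.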