Files
browser-use/docs/customize/sensitive-data.mdx
2025-05-22 04:58:23 -07:00

148 lines
5.3 KiB
Plaintext

---
title: "Sensitive Data"
description: "Handle sensitive information securely by preventing the model from seeing actual passwords."
icon: "shield"
---
## Handling Sensitive Data
When working with sensitive information like passwords, you can use the `sensitive_data` parameter to prevent the model from seeing the actual values while still allowing it to reference them in its actions.
Make sure to always set [`allowed_domains`](https://docs.browser-use.com/customize/browser-settings#restrict-urls) to restrict the domains the Agent is allowed to visit when working with sensitive data or logins.
### Basic Usage
Here's a basic example of how to use sensitive data:
```python
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent
from browser_use.browser.session import BrowserSession
load_dotenv()
# Initialize the model
llm = ChatOpenAI(
model='gpt-4o',
temperature=0.0,
)
# Define sensitive data
# The model will only see the keys (x_name, x_password) but never the actual values
sensitive_data = {'x_name': 'magnus', 'x_password': '12345678'}
# Use the placeholder names in your task description
task = 'go to x.com and login with x_name and x_password then write a post about the meaning of life'
# Configure browser session with allowed domains
browser_session = BrowserSession(
allowed_domains=['example.com']
)
# Pass the sensitive data to the agent
agent = Agent(
task=task,
llm=llm,
sensitive_data=sensitive_data,
browser_session=browser_session
)
async def main():
await agent.run()
if __name__ == '__main__':
asyncio.run(main())
```
In this example:
1. The model only sees `x_name` and `x_password` as placeholders.
2. When the model wants to use your password it outputs x_password - and we replace it with the actual value.
3. When your password is visible on the current page, we replace it in the LLM input - so that the model never has it in its state.
4. The agent will be prevented from going to any site not on `example.com` to protect from prompt injection attacks and jailbreaks
### Domain-Specific Sensitive Data
For enhanced security, you can associate sensitive data with specific domains. This ensures credentials are only used on the domains they're intended for:
```python
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from browser_use import Agent
from browser_use.browser.session import BrowserSession
load_dotenv()
# Initialize the model
llm = ChatOpenAI(
model='gpt-4o',
temperature=0.0,
)
# Domain-specific sensitive data
sensitive_data = {
'https://*.google.com': {'x_email': '...', 'x_pass': '...'},
'chrome-extension://abcd': {'x_api_key': '...'},
'http*://example.com': {'x_authcode': '123123'}
}
# Set browser session with allowed domains that match all domain patterns in sensitive_data
browser_session = BrowserSession(
allowed_domains=[
'https://*.google.com',
'chrome-extension://abcd',
'http://example.com', # Explicitly include http:// if needed
'https://example.com' # By default, only https:// is matched
]
)
# Pass the sensitive data to the agent
agent = Agent(
task="Log into Google, then check my account information",
llm=llm,
sensitive_data=sensitive_data,
browser_session=browser_session
)
async def main():
await agent.run()
if __name__ == '__main__':
asyncio.run(main())
```
With this approach:
1. The Google credentials (`x_email` and `x_pass`) will only be used on Google domains (any subdomain)
2. The API key (`x_api_key`) will only be used in the specific Chrome extension
3. The auth code (`x_authcode`) will only be used on example.com via http or https
### Domain Pattern Format
Domain patterns in sensitive_data follow the same format as `allowed_domains`:
- `example.com` - Matches only example.com
- `*.example.com` - Matches any subdomain of example.com
- `http*://example.com` - Matches both http and https protocols for example.com
- `chrome-extension://*` - Matches any Chrome extension
> **Security Warning**: For security reasons, certain patterns are explicitly rejected:
> - Wildcards in TLD part (e.g., `example.*`) are not allowed as they could match any TLD
> - Embedded wildcards (e.g., `g*e.com`) are rejected to prevent overly broad matches
> - Multiple wildcards like `*.*.domain` are not supported to avoid security issues
The default protocol when no scheme is specified is now `https` for enhanced security.
The system will validate that all domain patterns used in `sensitive_data` are covered by the patterns in `allowed_domains`.
### Missing or Empty Values
When working with sensitive data, keep these details in mind:
- If a key referenced by the model (`<secret>key_name</secret>`) is missing from your `sensitive_data` dictionary, a warning will be logged but the substitution tag will be preserved.
- If you provide an empty value for a key in the `sensitive_data` dictionary, it will be treated the same as a missing key.
- The system will always attempt to process all valid substitutions, even if some keys are missing or empty.
Warning: Vision models still see the image of the page - where the sensitive data might be visible.
This approach ensures that sensitive information remains secure while still allowing the agent to perform tasks that require authentication.