Signed-off-by: Oleg Ivaniv <me@olegivaniv.com> Co-authored-by: Albert Alises <albert.alises@gmail.com> Co-authored-by: Jaakko Husso <jaakko@n8n.io> Co-authored-by: Dimitri Lavrenük <20122620+dlavrenuek@users.noreply.github.com> Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> Co-authored-by: Tuukka Kantola <Tuukkaa@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Mutasem Aldmour <4711238+mutdmour@users.noreply.github.com> Co-authored-by: Raúl Gómez Morales <raul00gm@gmail.com> Co-authored-by: Elias Meire <elias@meire.dev> Co-authored-by: Dimitri Lavrenük <dimitri.lavrenuek@n8n.io> Co-authored-by: Tomi Turtiainen <10324676+tomi@users.noreply.github.com> Co-authored-by: Mutasem Aldmour <mutasem@n8n.io>
14 KiB
Browser MCP — Technical Specification
Feature behaviour is defined in browser-mcp.md. This document covers the implementation in
packages/@n8n/mcp-browser.
Table of Contents
- Component Overview
- Package Structure
- Connection Flow
- CDP Relay Architecture
- Extension Protocol
- Tool System
- Tab Lifecycle
- Error Model
1. Component Overview
The system involves three runtime components:
- MCP Server (
@n8n/mcp-browser) — hosts MCP tools, manages the Playwright connection, and runs the CDP relay. - CDP Relay — WebSocket server bridging Playwright's CDP traffic to the Chrome extension.
- Browser Bridge Extension (
@n8n/mcp-browser-extension) — Chrome extension that useschrome.debuggerto execute CDP commands in the user's real browser.
graph LR
AI[AI Agent / MCP Client]
MCP[MCP Server]
PW[Playwright]
RELAY[CDP Relay Server]
EXT[Browser Bridge Extension]
CHROME[Chrome / chrome.debugger]
AI -- "MCP tool calls" --> MCP
MCP -- "High-level API\n(page.goto, page.click, ...)" --> PW
PW -- "CDP over WebSocket\n(/cdp/{uuid})" --> RELAY
RELAY -- "Extension protocol\n(/extension/{uuid})" --> EXT
EXT -- "chrome.debugger.*" --> CHROME
Key Classes
| Class | File | Responsibility |
|---|---|---|
BrowserConnection |
connection.ts |
Single-connection lifecycle: connect, disconnect, expose state |
PlaywrightAdapter |
adapters/playwright.ts |
All browser operations via Playwright's high-level API |
CDPRelayServer |
cdp-relay.ts |
WebSocket bridge: translates CDP ↔ extension protocol |
ExtensionConnection |
cdp-relay.ts (private) |
Manages the WebSocket to the extension with request/response tracking |
2. Package Structure
src/
├── adapters/
│ └── playwright.ts # PlaywrightAdapter — all browser operations
├── tools/
│ ├── index.ts # createBrowserTools() — tool factory
│ ├── schemas.ts # Composable Zod schemas and output envelope builders
│ ├── response-envelope.ts # Response enrichment (snapshot, modals, console) and error formatting
│ ├── helpers.ts # createConnectedTool() — tool factory with auto-enrichment
│ ├── session.ts # browser_connect, browser_disconnect
│ ├── tabs.ts # browser_tab_open, browser_tab_list, browser_tab_focus, browser_tab_close
│ ├── navigation.ts # browser_navigate, browser_back, browser_forward, browser_reload
│ ├── interaction.ts # browser_click, browser_type, browser_select, browser_drag, ...
│ ├── inspection.ts # browser_snapshot, browser_screenshot, browser_content, browser_evaluate, ...
│ ├── wait.ts # browser_wait
│ └── state.ts # browser_cookies, browser_storage, browser_set_*, ...
├── __tests__/ # Unit tests
├── browser-discovery.ts # Auto-detect Chrome/Brave/Edge executables
├── cdp-relay-protocol.ts # TypeScript types for the relay ↔ extension wire format
├── cdp-relay.ts # CDPRelayServer + ExtensionConnection
├── connection.ts # BrowserConnection — single-connection manager
├── errors.ts # Custom error classes
├── logger.ts # Tagged logger with log-level filtering
├── index.ts # Public API exports
├── server-config.ts # CLI flag + env var parsing
├── server.ts # MCP server setup (http/stdio transport)
├── vendor.d.ts # Type declarations for untyped dependencies
├── types.ts # Shared TypeScript types
└── utils.ts # Utilities (ID generation, error formatting)
3. Connection Flow
connect()
sequenceDiagram
participant AI as AI Agent
participant CONN as BrowserConnection
participant PA as PlaywrightAdapter
participant RELAY as CDPRelayServer
participant EXT as Browser Bridge Extension
AI->>CONN: browser_connect tool
CONN->>PA: new PlaywrightAdapter(config)
PA->>RELAY: new CDPRelayServer()
RELAY->>RELAY: listen() on random port
PA->>PA: chromium.connectOverCDP(relay.cdpEndpoint)
RELAY->>RELAY: waitForExtension() (15s timeout)
EXT->>RELAY: WebSocket connect to /extension/{uuid}
RELAY-->>PA: extension connected
PA->>RELAY: Target.setAutoAttach (root session)
RELAY->>EXT: listRegisteredTabs
EXT-->>RELAY: { tabs: [{ id, title, url }, ...] }
RELAY->>RELAY: cache tab metadata (lazy — no debugger attachment)
RELAY-->>PA: {} (ack)
PA-->>CONN: { pages, activePageId }
CONN-->>AI: { browser, pages }
Tabs are not attached on connect. The debugger is lazily attached to a tab on its first interaction (see §7 Tab Lifecycle).
disconnect()
BrowserConnection.disconnect()callsadapter.close()PlaywrightAdapter.close()closes the Playwright browser contextCDPRelayServer.stop()closes all WebSocket connections and the HTTP server- Extension detects WebSocket close and detaches from all tabs
4. CDP Relay Architecture
The relay server runs on 127.0.0.1 on a random port with two WebSocket
endpoints:
/cdp/{uuid}— Playwright connects here (speaks CDP)/extension/{uuid}— Browser Bridge extension connects here (speaks the extension protocol)
Intercepted CDP Commands
These commands are handled locally by the relay and not forwarded to the extension:
| CDP Command | Relay Behaviour |
|---|---|
Browser.getVersion |
Returns synthetic version info |
Browser.setDownloadBehavior |
Acknowledged, no-op |
Target.createBrowserContext |
Creates a browser context ID. Returns { browserContextId } |
Target.disposeBrowserContext |
Acknowledged, no-op |
Target.setAutoAttach |
On root session: sends listRegisteredTabs to extension, caches tab metadata (no Target.attachedToTarget — tabs are lazy-activated on first use). On child session: acknowledged, no-op |
Target.createTarget |
Sends createTab to extension, registers new tab, eagerly activates (emits Target.attachedToTarget) |
Target.closeTarget |
Sends closeTab to extension, deregisters tab, emits Target.detachedFromTarget |
Target.getTargetInfo |
Returns cached targetInfo from tab cache |
Forwarded Commands
All other CDP commands (e.g. Runtime.evaluate, Page.navigate,
DOM.getDocument) are forwarded to the extension via forwardCDPCommand.
The relay resolves the Playwright session ID to a Chrome tab ID for routing.
Session ID Mapping
The relay uses CDP targetId strings (e.g. "B4FE7A8D1C3E…") as Playwright
session IDs directly — there is no separate session-to-tab mapping. The relay
maintains:
tabCache: Map<targetId, { title, url }>— lightweight metadata for all known tabsactivatedTabs: Set<targetId>— tabs whose debugger has been attached andTarget.attachedToTargetsent to PlaywrightprimaryTabId: string | undefined— the first-seen tab ID
When forwarding CDP commands, the relay passes the Playwright sessionId
directly as the extension's id parameter, since they are the same targetId
string.
5. Extension Protocol
Defined in cdp-relay-protocol.ts. Current version: PROTOCOL_VERSION = 1.
All tab identifiers use CDP Target.targetId strings resolved by the
extension via chrome.debugger + Target.getTargetInfo (e.g.
"B4FE7A8D1C3E…"). The extension is the only component that maps these to
Chrome internals.
Commands (relay → extension)
| Command | Params | Description |
|---|---|---|
listRegisteredTabs |
{} |
List all registered (user-selected) tabs |
forwardCDPCommand |
{ method, params?, id? } |
Forward a CDP command to a tab |
createTab |
{ url? } |
Create and attach to a new tab |
closeTab |
{ id } |
Close a controlled tab |
attachTab |
{ id } |
Attach the debugger to a tab (lazy, on first interaction) |
listTabs |
{} |
List all currently controlled tabs |
Events (extension → relay)
| Event | Params | Description |
|---|---|---|
forwardCDPEvent |
{ method, params?, id? } |
CDP event from a tab |
tabOpened |
{ id, title, url } |
New tab opened |
tabClosed |
{ id } |
Tab was closed |
Wire Format
Request (relay → extension):
{ "id": 1, "method": "forwardCDPCommand", "params": { "method": "Runtime.evaluate", "params": { "expression": "1+1" }, "id": "B4FE7A8D1C3E" } }
Response (extension → relay):
{ "id": 1, "result": { "result": { "type": "number", "value": 2 } } }
Event (extension → relay, no id):
{ "method": "tabOpened", "params": { "id": "A1B2C3D4E5F6", "title": "New Tab", "url": "https://example.com" } }
6. Tool System
Tool Factory Pattern
All connected tools are created via createConnectedTool() in
tools/helpers.ts:
createConnectedTool(connection, name, description, inputSchema, async (state, input, pageId) => {
// state.adapter.* — Playwright operations
// pageId — resolved from input.pageId or state.activePageId
return formatCallToolResult({ clicked: true });
}, outputSchema, { autoSnapshot: true, waitForCompletion: true });
The factory accepts ConnectedToolOptions:
autoSnapshot— append accessibility snapshot, modal state, and console summary to the response after the actionwaitForCompletion— wrap the action in a network/navigation settle wait
Schema Composition
Output schemas use withSnapshotEnvelope() from tools/schemas.ts to
merge tool-specific fields with the auto-injected envelope:
import { withSnapshotEnvelope } from './schemas';
const outputSchema = withSnapshotEnvelope({
clicked: z.boolean(),
ref: z.string().optional(),
});
// → z.object({ clicked, ref, snapshot?, modalStates?, consoleSummary? })
Response Enrichment Pipeline
The createConnectedTool wrapper delegates enrichment to
tools/response-envelope.ts:
resolvePageContext(connection, args) → { state, pageId }
↓
fn(state, args, pageId) — optionally wrapped in waitForCompletion
↓
enrichResponse(result, state, pageId, options)
→ inject snapshot (if autoSnapshot)
→ inject modalStates (if any pending)
→ inject consoleSummary (if errors/warnings)
↓
return result
On error:
buildErrorResponse(error, connection, args, options)
→ structured { error, hint? } with best-effort snapshot + modals
→ isError: true
This wrapper handles:
- Getting the active
ConnectionStatefromBrowserConnection - Resolving
pageId(explicit or default tostate.activePageId) - Post-action response enrichment (snapshot, modals, console summary)
- Non-exclusive error handling — errors still include page context
Tool → Playwright → CDP → Extension flow
Tools call PlaywrightAdapter methods, which use Playwright's high-level API
(e.g. page.goto(), page.click(), locator.fill()). Playwright
internally translates these to CDP commands, which flow through the relay to
the extension, which executes them via chrome.debugger.sendCommand().
Tools never speak CDP directly. The abstraction layers are:
Tool code → PlaywrightAdapter → Playwright API → CDP → CDPRelayServer → Extension → chrome.debugger
7. Tab Lifecycle
Discovery (on connect)
When Playwright sends Target.setAutoAttach on the root session, the relay
sends listRegisteredTabs to the extension. The extension:
- Returns its list of user-selected tabs with their CDP
targetId, title, and URL - The CDP
targetIdis resolved viachrome.debugger.getTargets()without attaching the debugger
The relay caches this metadata but does not activate any tabs yet.
Lazy Activation
When a tool targets a tab for the first time, the relay activates it:
- Sends
attachTab { id }to the extension, which callschrome.debugger.attach() - Sends
Target.attachedToTargetto Playwright so it creates aPageobject - Marks the tab as activated (idempotent on subsequent calls)
Agent-created tabs (via browser_tab_open / Target.createTarget) are
eagerly activated since the extension attaches the debugger during creation.
Dynamic Tracking
The extension's service worker listens to Chrome tab events:
chrome.tabs.onCreated— registers new tab, sendstabOpenedchrome.tabs.onUpdated(status: 'complete') — updates tab infochrome.tabs.onRemoved— deregisters tab, sendstabClosed
The relay maps these events to Playwright's Target.attachedToTarget and
Target.detachedFromTarget CDP events.
Tab Eligibility
A tab is eligible for control if its URL starts with http:// or https://.
Tabs with chrome://, chrome-extension://, about:, or empty URLs are
excluded.
8. Error Model
Defined in errors.ts. All errors extend McpBrowserError.
| Error | When |
|---|---|
NotConnectedError |
Tool called without an active connection |
AlreadyConnectedError |
browser_connect called while already connected |
PageNotFoundError |
Tool targets a pageId that doesn't exist |
StaleRefError |
Element ref from a previous snapshot is no longer valid |
UnsupportedOperationError |
Operation not supported in the current mode |
BrowserNotAvailableError |
Requested browser not found on the system |
BrowserExecutableNotFoundError |
Detected browser has no executable path configured |
ExtensionNotConnectedError |
Extension WebSocket did not connect within timeout. Includes phase: browser_not_launched, extension_missing, or unknown |
Non-Exclusive Errors
Errors are non-exclusive: when a tool action fails, buildErrorResponse()
in tools/response-envelope.ts still attempts to include the accessibility
snapshot and modal state in the error response. This gives the AI page
context to understand and recover from failures.
Error responses include structured JSON with an error field, optional
hint (actionable guidance from McpBrowserError), and best-effort
snapshot and modalStates fields. The isError: true flag is set for
MCP SDK compatibility.
Session tools (browser_connect, browser_disconnect) use a separate
error path via formatErrorResponse() in utils.ts, since they don't
go through createConnectedTool.