9.2 KiB
Structured Output Implementation Analysis
Overview
The structured output functionality implementation in the browser-use evaluation system works by converting JSON schemas from task datasets into Pydantic models that are passed to the Controller. The Controller then creates a custom done action that enforces the structured output format. The judge system evaluates the final structured response without needing the original schema.
Correct Implementation Flow
1. Task Class Enhancement (eval/service.py)
✅ Status: Correctly Implemented
- Added
output_schemaas an optional field in the Task class__init__method (line 1074) - Properly included in known_fields set for clean handling (line 1077)
- Added to string representation for debugging (line 1086)
self.output_schema = kwargs.get('output_schema', None) # Add structured output schema support
known_fields = {'website', 'reference_length', 'level', 'cluster_id', 'login_cookie', 'login_type', 'category', 'output_schema'}
2. Controller Integration (Correct Approach)
✅ Status: Should be implemented in create_controller function
The correct implementation should:
- Extract
output_schemafrom task inrun_agent_with_browser - Convert JSON schema to a Pydantic model
- Pass the model to
Controller(output_model=PydanticModel) - Controller automatically creates a structured
doneaction
Example of correct pattern:
# From examples/features/custom_output.py
class Posts(BaseModel):
posts: list[Post]
controller = Controller(output_model=Posts)
3. How Controller Handles Structured Output
✅ Status: Already implemented in Controller class
When output_model is provided to Controller:
- Creates
ExtendedOutputModelwith success parameter + data field - Registers custom
doneaction that enforces the structure - Returns JSON serialized structured data in
extracted_content
# From browser_use/controller/service.py lines 77-104
if output_model is not None:
class ExtendedOutputModel(BaseModel):
success: bool = True
data: output_model
@self.registry.action(
'Complete task - with return text and if the task is finished (success=True)...',
param_model=ExtendedOutputModel,
)
async def done(params: ExtendedOutputModel):
output_dict = params.data.model_dump()
return ActionResult(
is_done=True,
success=params.success,
extracted_content=json.dumps(output_dict),
long_term_memory=f'Task completed. Success Status: {params.success}',
)
4. Judge System Evaluation
✅ Status: Correct - No schema needed
The judge system correctly evaluates the final structured response without needing the original schema:
- Receives the final JSON output in
final_result - Evaluates whether the output is properly structured and meaningful
- No need to pass
output_schemato judge functions
Required Changes to Complete Implementation
1. Modify create_controller function
def create_controller(use_serp: bool = False, output_schema: dict | None = None):
"""Create a controller, optionally with SERP search and structured output"""
if output_schema:
# Convert JSON schema to Pydantic model
output_model = create_pydantic_model_from_schema(output_schema)
controller = Controller(output_model=output_model)
else:
controller = Controller()
if use_serp:
# Add SERP search action to existing controller
add_serp_search_to_controller(controller)
return controller
2. Update run_agent_with_browser function
async def run_agent_with_browser(...):
# Extract output_schema from task if available
output_schema = None
if hasattr(task, 'output_schema') and task.output_schema:
output_schema = task.output_schema
logger.info(f'🎯 Task {task.task_id}: Using structured output schema')
# Create controller with structured output support
controller = create_controller(use_serp=use_serp, output_schema=output_schema)
agent = Agent(
task=task.confirmed_task,
llm=llm,
controller=controller, # Pass controller with structured output
# ... other parameters
)
3. Schema to Pydantic Conversion Utility
Need to implement a utility function to convert JSON schemas to Pydantic models:
def create_pydantic_model_from_schema(schema: dict) -> type[BaseModel]:
"""Convert JSON schema to Pydantic model"""
# Implementation needed to dynamically create Pydantic models from JSON schemas
pass
Incorrect Previous Implementation
❌ Removed: Agent-level output_schema handling
output_schemaparameter in Agent constructoroutput_schemain AgentSettings- Schema injection in
get_next_actionmethod
❌ Removed: Judge system schema validation
output_schemaparameter in judge functions- Schema validation prompts in judge system
- Schema extraction in evaluation functions
Technical Flow (Corrected)
End-to-End Process:
- Dataset Loading: JSON datasets include
output_schemafield in task definitions - Task Creation: Task class loads and stores the schema as an attribute
- Controller Creation: Schema is converted to Pydantic model and passed to Controller
- Controller Setup: Controller creates custom
doneaction with structured output enforcement - Agent Execution: Agent uses controller with structured
doneaction - Output Generation: Agent calls structured
doneaction, returns JSON inextracted_content - Judge Evaluation: Judge evaluates final JSON output for quality and correctness
Key Features
✅ Backward Compatibility
- Tasks without output_schema continue to work with standard
doneaction - No breaking changes to existing functionality
✅ Proper Separation of Concerns
- Controller handles output structure enforcement
- Agent focuses on task execution
- Judge evaluates final output quality
✅ Leverages Existing Patterns
- Uses established Controller
output_modelpattern - Follows existing examples in codebase
- Maintains consistency with browser-use architecture
Implementation Status
✅ Completed Components:
- Task class enhancement
- Controller structured output handling (already exists)
- Judge system evaluation (no changes needed)
🔄 Required Components:
- Modify
create_controllerfunction to acceptoutput_schema - Update
run_agent_with_browserto pass schema to controller - Implement JSON schema to Pydantic model conversion utility
❌ Removed Incorrect Components:
- Agent-level output_schema handling: Removed
output_schemaparameter from Agent constructor and AgentSettings - Judge system schema validation: Removed
output_schemaparameters from judge functions and schema validation prompts - Schema injection in LLM prompts: Removed schema instruction injection in
get_next_actionmethod
✅ Correctly Implemented Components:
- Task class enhancement:
output_schemafield properly added to Task class ineval/service.py - Controller structured output handling: Already exists in Controller class via
output_modelparameter - Judge system evaluation: Correctly evaluates final structured response without needing schema
🔄 Required Implementation:
The correct implementation requires:
- Convert JSON Schema to Pydantic Model: Create utility function to dynamically generate Pydantic models from JSON schemas
- Update
create_controllerfunction: Acceptoutput_schemaparameter and convert to Pydantic model for Controller - Update
run_agent_with_browser: Extract schema from task and pass to controller creation
Implementation Plan:
# 1. Add schema conversion utility
def create_pydantic_model_from_schema(schema: dict) -> type[BaseModel]:
"""Convert JSON schema to Pydantic model dynamically"""
# Implementation needed
# 2. Update create_controller function
def create_controller(use_serp: bool = False, output_schema: dict | None = None):
if output_schema:
output_model = create_pydantic_model_from_schema(output_schema)
controller = Controller(output_model=output_model)
else:
controller = Controller()
if use_serp:
add_serp_search_to_controller(controller)
return controller
# 3. Update run_agent_with_browser
async def run_agent_with_browser(...):
output_schema = getattr(task, 'output_schema', None)
controller = create_controller(use_serp=use_serp, output_schema=output_schema)
agent = Agent(task=task.confirmed_task, llm=llm, controller=controller, ...)
The main remaining work is implementing the JSON schema to Pydantic model conversion utility, which is the core technical challenge for this feature.
Conclusion
The structured output functionality should leverage the existing Controller output_model pattern rather than implementing custom schema handling in the Agent or Judge systems. This approach is cleaner, follows established patterns, and maintains proper separation of concerns. The main work required is converting JSON schemas to Pydantic models and updating the controller creation flow.