Files
browser-use/structured_output_implementation_analysis.md
2025-06-28 18:29:10 +00:00

9.2 KiB

Structured Output Implementation Analysis

Overview

The structured output functionality implementation in the browser-use evaluation system works by converting JSON schemas from task datasets into Pydantic models that are passed to the Controller. The Controller then creates a custom done action that enforces the structured output format. The judge system evaluates the final structured response without needing the original schema.

Correct Implementation Flow

1. Task Class Enhancement (eval/service.py)

Status: Correctly Implemented

  • Added output_schema as an optional field in the Task class __init__ method (line 1074)
  • Properly included in known_fields set for clean handling (line 1077)
  • Added to string representation for debugging (line 1086)
self.output_schema = kwargs.get('output_schema', None)  # Add structured output schema support
known_fields = {'website', 'reference_length', 'level', 'cluster_id', 'login_cookie', 'login_type', 'category', 'output_schema'}

2. Controller Integration (Correct Approach)

Status: Should be implemented in create_controller function

The correct implementation should:

  • Extract output_schema from task in run_agent_with_browser
  • Convert JSON schema to a Pydantic model
  • Pass the model to Controller(output_model=PydanticModel)
  • Controller automatically creates a structured done action

Example of correct pattern:

# From examples/features/custom_output.py
class Posts(BaseModel):
    posts: list[Post]

controller = Controller(output_model=Posts)

3. How Controller Handles Structured Output

Status: Already implemented in Controller class

When output_model is provided to Controller:

  • Creates ExtendedOutputModel with success parameter + data field
  • Registers custom done action that enforces the structure
  • Returns JSON serialized structured data in extracted_content
# From browser_use/controller/service.py lines 77-104
if output_model is not None:
    class ExtendedOutputModel(BaseModel):
        success: bool = True
        data: output_model

    @self.registry.action(
        'Complete task - with return text and if the task is finished (success=True)...',
        param_model=ExtendedOutputModel,
    )
    async def done(params: ExtendedOutputModel):
        output_dict = params.data.model_dump()
        return ActionResult(
            is_done=True,
            success=params.success,
            extracted_content=json.dumps(output_dict),
            long_term_memory=f'Task completed. Success Status: {params.success}',
        )

4. Judge System Evaluation

Status: Correct - No schema needed

The judge system correctly evaluates the final structured response without needing the original schema:

  • Receives the final JSON output in final_result
  • Evaluates whether the output is properly structured and meaningful
  • No need to pass output_schema to judge functions

Required Changes to Complete Implementation

1. Modify create_controller function

def create_controller(use_serp: bool = False, output_schema: dict | None = None):
    """Create a controller, optionally with SERP search and structured output"""
    if output_schema:
        # Convert JSON schema to Pydantic model
        output_model = create_pydantic_model_from_schema(output_schema)
        controller = Controller(output_model=output_model)
    else:
        controller = Controller()
    
    if use_serp:
        # Add SERP search action to existing controller
        add_serp_search_to_controller(controller)
    
    return controller

2. Update run_agent_with_browser function

async def run_agent_with_browser(...):
    # Extract output_schema from task if available
    output_schema = None
    if hasattr(task, 'output_schema') and task.output_schema:
        output_schema = task.output_schema
        logger.info(f'🎯 Task {task.task_id}: Using structured output schema')

    # Create controller with structured output support
    controller = create_controller(use_serp=use_serp, output_schema=output_schema)
    
    agent = Agent(
        task=task.confirmed_task,
        llm=llm,
        controller=controller,  # Pass controller with structured output
        # ... other parameters
    )

3. Schema to Pydantic Conversion Utility

Need to implement a utility function to convert JSON schemas to Pydantic models:

def create_pydantic_model_from_schema(schema: dict) -> type[BaseModel]:
    """Convert JSON schema to Pydantic model"""
    # Implementation needed to dynamically create Pydantic models from JSON schemas
    pass

Incorrect Previous Implementation

Removed: Agent-level output_schema handling

  • output_schema parameter in Agent constructor
  • output_schema in AgentSettings
  • Schema injection in get_next_action method

Removed: Judge system schema validation

  • output_schema parameter in judge functions
  • Schema validation prompts in judge system
  • Schema extraction in evaluation functions

Technical Flow (Corrected)

End-to-End Process:

  1. Dataset Loading: JSON datasets include output_schema field in task definitions
  2. Task Creation: Task class loads and stores the schema as an attribute
  3. Controller Creation: Schema is converted to Pydantic model and passed to Controller
  4. Controller Setup: Controller creates custom done action with structured output enforcement
  5. Agent Execution: Agent uses controller with structured done action
  6. Output Generation: Agent calls structured done action, returns JSON in extracted_content
  7. Judge Evaluation: Judge evaluates final JSON output for quality and correctness

Key Features

Backward Compatibility

  • Tasks without output_schema continue to work with standard done action
  • No breaking changes to existing functionality

Proper Separation of Concerns

  • Controller handles output structure enforcement
  • Agent focuses on task execution
  • Judge evaluates final output quality

Leverages Existing Patterns

  • Uses established Controller output_model pattern
  • Follows existing examples in codebase
  • Maintains consistency with browser-use architecture

Implementation Status

Completed Components:

  • Task class enhancement
  • Controller structured output handling (already exists)
  • Judge system evaluation (no changes needed)

🔄 Required Components:

  • Modify create_controller function to accept output_schema
  • Update run_agent_with_browser to pass schema to controller
  • Implement JSON schema to Pydantic model conversion utility

Removed Incorrect Components:

  • Agent-level output_schema handling: Removed output_schema parameter from Agent constructor and AgentSettings
  • Judge system schema validation: Removed output_schema parameters from judge functions and schema validation prompts
  • Schema injection in LLM prompts: Removed schema instruction injection in get_next_action method

Correctly Implemented Components:

  • Task class enhancement: output_schema field properly added to Task class in eval/service.py
  • Controller structured output handling: Already exists in Controller class via output_model parameter
  • Judge system evaluation: Correctly evaluates final structured response without needing schema

🔄 Required Implementation:

The correct implementation requires:

  1. Convert JSON Schema to Pydantic Model: Create utility function to dynamically generate Pydantic models from JSON schemas
  2. Update create_controller function: Accept output_schema parameter and convert to Pydantic model for Controller
  3. Update run_agent_with_browser: Extract schema from task and pass to controller creation

Implementation Plan:

# 1. Add schema conversion utility
def create_pydantic_model_from_schema(schema: dict) -> type[BaseModel]:
    """Convert JSON schema to Pydantic model dynamically"""
    # Implementation needed

# 2. Update create_controller function  
def create_controller(use_serp: bool = False, output_schema: dict | None = None):
    if output_schema:
        output_model = create_pydantic_model_from_schema(output_schema)
        controller = Controller(output_model=output_model)
    else:
        controller = Controller()
    
    if use_serp:
        add_serp_search_to_controller(controller)
    return controller

# 3. Update run_agent_with_browser
async def run_agent_with_browser(...):
    output_schema = getattr(task, 'output_schema', None)
    controller = create_controller(use_serp=use_serp, output_schema=output_schema)
    agent = Agent(task=task.confirmed_task, llm=llm, controller=controller, ...)

The main remaining work is implementing the JSON schema to Pydantic model conversion utility, which is the core technical challenge for this feature.

Conclusion

The structured output functionality should leverage the existing Controller output_model pattern rather than implementing custom schema handling in the Agent or Judge systems. This approach is cleaner, follows established patterns, and maintains proper separation of concerns. The main work required is converting JSON schemas to Pydantic models and updating the controller creation flow.