mirror of https://github.com/browser-use/browser-use synced 2026-05-06 17:52:15 +02:00

Files

Cursor Agent b2baabd934 Remove output_schema handling from Agent and judge system

Co-authored-by: mamagnus00 <mamagnus00@gmail.com>

2025-06-28 18:29:10 +00:00

9.2 KiB

Raw Blame History

Structured Output Implementation Analysis

Overview

The structured output functionality implementation in the browser-use evaluation system works by converting JSON schemas from task datasets into Pydantic models that are passed to the Controller. The Controller then creates a custom done action that enforces the structured output format. The judge system evaluates the final structured response without needing the original schema.

Correct Implementation Flow

1. Task Class Enhancement (`eval/service.py`)

✅ Status: Correctly Implemented

Added output_schema as an optional field in the Task class __init__ method (line 1074)
Properly included in known_fields set for clean handling (line 1077)
Added to string representation for debugging (line 1086)

self.output_schema = kwargs.get('output_schema', None)  # Add structured output schema support
known_fields = {'website', 'reference_length', 'level', 'cluster_id', 'login_cookie', 'login_type', 'category', 'output_schema'}

2. Controller Integration (Correct Approach)

✅ Status: Should be implemented in create_controller function

The correct implementation should:

Extract output_schema from task in run_agent_with_browser
Convert JSON schema to a Pydantic model
Pass the model to Controller(output_model=PydanticModel)
Controller automatically creates a structured done action

Example of correct pattern:

# From examples/features/custom_output.py
class Posts(BaseModel):
    posts: list[Post]

controller = Controller(output_model=Posts)

3. How Controller Handles Structured Output

✅ Status: Already implemented in Controller class

When output_model is provided to Controller:

Creates ExtendedOutputModel with success parameter + data field
Registers custom done action that enforces the structure
Returns JSON serialized structured data in extracted_content

# From browser_use/controller/service.py lines 77-104
if output_model is not None:
    class ExtendedOutputModel(BaseModel):
        success: bool = True
        data: output_model

    @self.registry.action(
        'Complete task - with return text and if the task is finished (success=True)...',
        param_model=ExtendedOutputModel,
    )
    async def done(params: ExtendedOutputModel):
        output_dict = params.data.model_dump()
        return ActionResult(
            is_done=True,
            success=params.success,
            extracted_content=json.dumps(output_dict),
            long_term_memory=f'Task completed. Success Status: {params.success}',
        )

4. Judge System Evaluation

✅ Status: Correct - No schema needed

The judge system correctly evaluates the final structured response without needing the original schema:

Receives the final JSON output in final_result
Evaluates whether the output is properly structured and meaningful
No need to pass output_schema to judge functions

Required Changes to Complete Implementation

1. Modify `create_controller` function

def create_controller(use_serp: bool = False, output_schema: dict | None = None):
    """Create a controller, optionally with SERP search and structured output"""
    if output_schema:
        # Convert JSON schema to Pydantic model
        output_model = create_pydantic_model_from_schema(output_schema)
        controller = Controller(output_model=output_model)
    else:
        controller = Controller()
    
    if use_serp:
        # Add SERP search action to existing controller
        add_serp_search_to_controller(controller)
    
    return controller

2. Update `run_agent_with_browser` function

async def run_agent_with_browser(...):
    # Extract output_schema from task if available
    output_schema = None
    if hasattr(task, 'output_schema') and task.output_schema:
        output_schema = task.output_schema
        logger.info(f'🎯 Task {task.task_id}: Using structured output schema')

    # Create controller with structured output support
    controller = create_controller(use_serp=use_serp, output_schema=output_schema)
    
    agent = Agent(
        task=task.confirmed_task,
        llm=llm,
        controller=controller,  # Pass controller with structured output
        # ... other parameters
    )

3. Schema to Pydantic Conversion Utility

Need to implement a utility function to convert JSON schemas to Pydantic models:

def create_pydantic_model_from_schema(schema: dict) -> type[BaseModel]:
    """Convert JSON schema to Pydantic model"""
    # Implementation needed to dynamically create Pydantic models from JSON schemas
    pass

Incorrect Previous Implementation

❌ Removed: Agent-level output_schema handling

output_schema parameter in Agent constructor
output_schema in AgentSettings
Schema injection in get_next_action method

❌ Removed: Judge system schema validation

output_schema parameter in judge functions
Schema validation prompts in judge system
Schema extraction in evaluation functions

Technical Flow (Corrected)

End-to-End Process:

Dataset Loading: JSON datasets include output_schema field in task definitions
Task Creation: Task class loads and stores the schema as an attribute
Controller Creation: Schema is converted to Pydantic model and passed to Controller
Controller Setup: Controller creates custom done action with structured output enforcement
Agent Execution: Agent uses controller with structured done action
Output Generation: Agent calls structured done action, returns JSON in extracted_content
Judge Evaluation: Judge evaluates final JSON output for quality and correctness

Key Features

✅ Backward Compatibility

Tasks without output_schema continue to work with standard done action
No breaking changes to existing functionality

✅ Proper Separation of Concerns

Controller handles output structure enforcement
Agent focuses on task execution
Judge evaluates final output quality

✅ Leverages Existing Patterns

Uses established Controller output_model pattern
Follows existing examples in codebase
Maintains consistency with browser-use architecture

Implementation Status

✅ Completed Components:

Task class enhancement
Controller structured output handling (already exists)
Judge system evaluation (no changes needed)

🔄 Required Components:

Modify create_controller function to accept output_schema
Update run_agent_with_browser to pass schema to controller
Implement JSON schema to Pydantic model conversion utility

❌ Removed Incorrect Components:

Agent-level output_schema handling: Removed output_schema parameter from Agent constructor and AgentSettings
Judge system schema validation: Removed output_schema parameters from judge functions and schema validation prompts
Schema injection in LLM prompts: Removed schema instruction injection in get_next_action method

✅ Correctly Implemented Components:

Task class enhancement: output_schema field properly added to Task class in eval/service.py
Controller structured output handling: Already exists in Controller class via output_model parameter
Judge system evaluation: Correctly evaluates final structured response without needing schema

🔄 Required Implementation:

The correct implementation requires:

Convert JSON Schema to Pydantic Model: Create utility function to dynamically generate Pydantic models from JSON schemas
Update create_controller function: Accept output_schema parameter and convert to Pydantic model for Controller
Update run_agent_with_browser: Extract schema from task and pass to controller creation

Implementation Plan:

# 1. Add schema conversion utility
def create_pydantic_model_from_schema(schema: dict) -> type[BaseModel]:
    """Convert JSON schema to Pydantic model dynamically"""
    # Implementation needed

# 2. Update create_controller function  
def create_controller(use_serp: bool = False, output_schema: dict | None = None):
    if output_schema:
        output_model = create_pydantic_model_from_schema(output_schema)
        controller = Controller(output_model=output_model)
    else:
        controller = Controller()
    
    if use_serp:
        add_serp_search_to_controller(controller)
    return controller

# 3. Update run_agent_with_browser
async def run_agent_with_browser(...):
    output_schema = getattr(task, 'output_schema', None)
    controller = create_controller(use_serp=use_serp, output_schema=output_schema)
    agent = Agent(task=task.confirmed_task, llm=llm, controller=controller, ...)

The main remaining work is implementing the JSON schema to Pydantic model conversion utility, which is the core technical challenge for this feature.

Conclusion

The structured output functionality should leverage the existing Controller output_model pattern rather than implementing custom schema handling in the Agent or Judge systems. This approach is cleaner, follows established patterns, and maintains proper separation of concerns. The main work required is converting JSON schemas to Pydantic models and updating the controller creation flow.

9.2 KiB Raw Blame History