# OCI Raw API Integration

This module provides direct integration with Oracle Cloud Infrastructure's Generative AI service using raw API calls, without LangChain dependencies.
## Features
- **Direct API Integration**: Uses OCI's native Python SDK for direct API calls
- **Async Support**: Full async/await support for non-blocking operations
- **Structured Output**: Support for Pydantic model validation of responses
- **Error Handling**: Comprehensive error handling with proper exception types
- **Authentication**: Support for multiple OCI authentication methods
## Installation

Make sure you have the required OCI dependencies installed:

```bash
pip install oci
```
## Usage

### Basic Usage
```python
import asyncio

from browser_use import Agent
from browser_use.llm import ChatOCIRaw

# Configure the model
model = ChatOCIRaw(
    model_id="ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceya...",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.tenancy.oc1..aaaaaaaayeiis5uk2nuubznrekd...",
    provider="meta",  # or "cohere"
    temperature=1.0,
    max_tokens=600,
    top_p=0.75,
    auth_type="API_KEY",
    auth_profile="DEFAULT",
)

# Use with the browser-use Agent
agent = Agent(
    task="Search for Python tutorials and summarize them",
    llm=model,
)

# Run with asyncio
history = asyncio.run(agent.run())
```
### Provider-Specific Configuration Examples

#### Meta Llama Model
```python
meta_model = ChatOCIRaw(
    model_id="ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceya...",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.tenancy.oc1..aaaaaaaayeiis5uk2nuubznrekd...",
    provider="meta",  # Uses GenericChatRequest
    temperature=0.7,
    max_tokens=800,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    top_p=0.9,
)
```
#### Cohere Model

```python
cohere_model = ChatOCIRaw(
    model_id="ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceya...",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.tenancy.oc1..aaaaaaaayeiis5uk2nuubznrekd...",
    provider="cohere",  # Uses CohereChatRequest
    temperature=1.0,
    max_tokens=600,
    frequency_penalty=0.0,
    top_p=0.75,
    top_k=0,  # Cohere-specific parameter
)
```
#### xAI Model

```python
xai_model = ChatOCIRaw(
    model_id="ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceya...",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id="ocid1.tenancy.oc1..aaaaaaaayeiis5uk2nuubznrekd...",
    provider="xai",  # Uses GenericChatRequest
    temperature=1.0,
    max_tokens=20000,
    top_p=1.0,
    top_k=0,
)
```
### Structured Output

```python
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    summary: str
    relevance_score: float

# Request structured output
response = await model.ainvoke(messages, output_format=SearchResult)
result = response.completion  # This is a SearchResult instance
```
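Under the hood, this kind of structured output amounts to validating the model's raw JSON against the Pydantic schema. A minimal sketch of that step, independent of the integration (the raw JSON string here is made up for illustration):

```python
from pydantic import BaseModel, ValidationError

class SearchResult(BaseModel):
    title: str
    summary: str
    relevance_score: float

# A raw JSON payload as the model might return it (illustrative only)
raw = '{"title": "Python Basics", "summary": "An intro tutorial.", "relevance_score": 0.92}'

try:
    result = SearchResult.model_validate_json(raw)
except ValidationError:
    # Malformed or incomplete model output surfaces as a validation error
    raise

print(result.title)  # -> Python Basics
```

If the model omits a required field or returns a non-numeric score, `ValidationError` is raised instead of silently producing a bad object.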
## Available Models

For the complete list of available models in Oracle Cloud Infrastructure Generative AI, refer to the official documentation: OCI Generative AI Pretrained Models
## Tool Calling Support

**Important**: Only models that support tool calling (function calling) are compatible with browser-use. Tool calling is essential because the agent needs to call browser automation functions (click, type, scroll, etc.) to interact with web pages.

According to Oracle's documentation, tool calling is available exclusively through the API and is not supported in the browser-based Console. When browser-use calls OCI models through this integration, tool calling happens at the application level via the API, so it remains compatible.
## Image Support Models

Several OCI models support image processing capabilities, which are useful when browser-use needs to analyze webpage screenshots:

### Vision-Enabled Chat Models
- **Meta Llama 3.2 90B Vision**: Supports both text and image inputs
- **Meta Llama 3.2 11B Vision**: Supports both text and image inputs

### Image Embedding Models

- **Cohere Embed English Image 3**: Supports image inputs for semantic searches
- **Cohere Embed Multilingual Image 3**: Supports multilingual image processing
- **Cohere Embed English Light Image 3**: Lightweight version with image support
- **Cohere Embed Multilingual Light Image 3**: Lightweight multilingual version with image support
These vision-enabled models are particularly useful for browser-use tasks that require understanding webpage content through screenshots, such as:
- Identifying UI elements and buttons
- Reading text from images
- Understanding page layouts and visual context
- Processing charts, graphs, and visual data
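When a screenshot is sent to a vision-enabled model, the request carries both a text part and a base64-encoded image part. The content-parts shape below is an assumption for illustration, not the integration's actual internal schema:

```python
import base64

def screenshot_message(prompt: str, png_bytes: bytes) -> dict:
    """Build a text+image user message.

    The content-parts layout here is assumed for illustration; the real
    integration builds its own provider-specific request objects.
    """
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "TEXT", "text": prompt},
            {"type": "IMAGE", "image_url": f"data:image/png;base64,{b64}"},
        ],
    }

msg = screenshot_message("What buttons are visible?", b"\x89PNG...")
```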
## Configuration

### Provider-Specific API Formats

Different model providers in OCI use different API request formats:
#### Meta and xAI Models

- Use `GenericChatRequest` with a messages array
- Support structured conversations with multiple message types
- Parameters: `temperature`, `max_tokens`, `frequency_penalty`, `presence_penalty`, `top_p`
#### Cohere Models

- Use `CohereChatRequest` with a single message string
- Convert conversation history to a single formatted string
- Parameters: `temperature`, `max_tokens`, `frequency_penalty`, `top_p`, `top_k`
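The history-flattening step for Cohere can be sketched roughly as follows; the exact role labels and separator the integration uses are assumptions:

```python
def flatten_history(messages: list[dict]) -> str:
    """Collapse a chat history into one formatted string for a
    CohereChatRequest-style single-message API.

    Role labels and the newline separator are illustrative, not the
    integration's exact format.
    """
    return "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Find Python tutorials."},
]
print(flatten_history(history))
# -> SYSTEM: You are a helpful assistant.
#    USER: Find Python tutorials.
```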
The integration automatically detects the correct format based on the `provider` parameter and handles the conversion transparently.
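That detection amounts to a small dispatch on the provider name. A minimal sketch (the request class names match the OCI SDK, but this helper itself is illustrative):

```python
def request_format_for(provider: str) -> str:
    """Map a provider name to the OCI chat-request class it requires."""
    formats = {
        "meta": "GenericChatRequest",
        "xai": "GenericChatRequest",
        "cohere": "CohereChatRequest",
    }
    if provider not in formats:
        raise ValueError(f"Unsupported provider: {provider!r}")
    return formats[provider]
```

Passing the wrong `provider` for a model is what produces the "apiFormat must match" error described under Troubleshooting.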
### Authentication Types

The integration supports multiple OCI authentication methods:

- `API_KEY`: Uses API key authentication (default)
- `INSTANCE_PRINCIPAL`: Uses instance principal authentication
- `RESOURCE_PRINCIPAL`: Uses resource principal authentication
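Each `auth_type` corresponds to a different OCI SDK credential mechanism. The constructs named in the comments below are real OCI SDK APIs, but the dispatch helper itself is an illustrative sketch (it returns the dotted path rather than importing the SDK, so it stays self-contained):

```python
def signer_spec(auth_type: str) -> str:
    """Return the OCI SDK construct used for each auth type.

    - API_KEY: config-file credentials loaded via oci.config.from_file()
    - INSTANCE_PRINCIPAL: oci.auth.signers.InstancePrincipalsSecurityTokenSigner
    - RESOURCE_PRINCIPAL: oci.auth.signers.get_resource_principals_signer()
    """
    specs = {
        "API_KEY": "oci.config.from_file",
        "INSTANCE_PRINCIPAL": "oci.auth.signers.InstancePrincipalsSecurityTokenSigner",
        "RESOURCE_PRINCIPAL": "oci.auth.signers.get_resource_principals_signer",
    }
    if auth_type not in specs:
        raise ValueError(f"Unknown auth_type: {auth_type!r}")
    return specs[auth_type]
```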
### Model Parameters

- `model_id`: The OCID of your OCI GenAI model
- `service_endpoint`: The OCI service endpoint URL
- `compartment_id`: The OCID of your OCI compartment
- `provider`: Model provider (`"meta"`, `"cohere"`, or `"xai"`)
- `temperature`: Response randomness (0.0-2.0)
- `max_tokens`: Maximum tokens in the response
- `top_p`: Top-p sampling parameter
- `frequency_penalty`: Frequency penalty for repetition
- `presence_penalty`: Presence penalty for repetition
- `top_k`: Top-k sampling parameter (used by Cohere models)
## Error Handling

The integration provides proper error handling with specific exception types:

- `ModelRateLimitError`: For rate limiting (429 errors)
- `ModelProviderError`: For other API errors (4xx, 5xx)
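The mapping from HTTP status codes to these exception types can be sketched as below; the class bodies are stand-ins mirroring the names above, not the integration's actual definitions:

```python
class ModelProviderError(Exception):
    """Stand-in for the real class: raised for general 4xx/5xx API errors."""
    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code

class ModelRateLimitError(ModelProviderError):
    """Raised when the service returns 429 Too Many Requests."""

def raise_for_status(status_code: int, message: str) -> None:
    """Translate an HTTP error status into the matching exception type."""
    if status_code == 429:
        raise ModelRateLimitError(message, status_code)
    if status_code >= 400:
        raise ModelProviderError(message, status_code)
```

Because `ModelRateLimitError` subclasses `ModelProviderError`, callers can catch the broad class and still special-case rate limits.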
## Comparison with the LangChain Integration

| Feature | OCI Raw API | LangChain Integration |
|---|---|---|
| Dependencies | OCI SDK only | LangChain + OCI SDK |
| Performance | Direct API calls | Additional abstraction layer |
| Control | Full control over requests | Limited by the LangChain interface |
| Updates | Direct OCI SDK updates | Dependent on LangChain updates |
| Complexity | Lower | Higher |
## Example Response Format

The OCI GenAI API returns responses in this format:

```json
{
  "chat_response": {
    "api_format": "GENERIC",
    "choices": [
      {
        "finish_reason": "stop",
        "index": 0,
        "message": {
          "content": [
            {
              "text": "Response text here",
              "type": "TEXT"
            }
          ],
          "role": "ASSISTANT"
        }
      }
    ],
    "usage": {
      "completion_tokens": 18,
      "prompt_tokens": 38,
      "total_tokens": 56
    }
  }
}
```
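Given that shape, pulling the assistant text out of a response is a short traversal. A minimal sketch over a plain dict matching the example above:

```python
def extract_text(response: dict) -> str:
    """Concatenate the TEXT parts of the first choice's message."""
    message = response["chat_response"]["choices"][0]["message"]
    return "".join(p["text"] for p in message["content"] if p["type"] == "TEXT")

response = {
    "chat_response": {
        "api_format": "GENERIC",
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "content": [{"text": "Response text here", "type": "TEXT"}],
                    "role": "ASSISTANT",
                },
            }
        ],
        "usage": {"completion_tokens": 18, "prompt_tokens": 38, "total_tokens": 56},
    }
}

print(extract_text(response))  # -> Response text here
```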
## Troubleshooting

### Common Issues

- **Authentication Errors**: Ensure your OCI configuration is correct and you have the necessary permissions
- **Model Not Found**: Verify your model OCID and ensure it's available in your compartment
- **Rate Limiting**: The integration handles rate limits automatically with proper error types
- **API Format Mismatch**: If you get a "Chat request's apiFormat must match serving model's apiFormat" error, ensure you're using the correct `provider` parameter:
  - Use `provider="meta"` for Meta Llama models
  - Use `provider="cohere"` for Cohere models
  - Use `provider="xai"` for xAI models
### Debug Mode

Verbose logging is not implemented in this version, but a `verbose` parameter could be added to enable it.
## Contributing

When contributing to this module:

- Follow the existing code style
- Add proper type hints
- Include comprehensive error handling
- Add tests for new features
- Update the documentation