OCI GenAI Integration

This document explains how the Oracle Cloud Infrastructure (OCI) Generative AI integration works in Coda, including implementation details, streaming response handling, and testing.

Table of Contents

  • Overview
  • Architecture
  • Implementation Details
  • Streaming Response Handling
  • Test Suite
  • Configuration
  • Troubleshooting
  • Future Enhancements
  • Contributing

Overview

The OCI GenAI integration provides native support for Oracle's Generative AI service, offering access to over 30 models from providers like Cohere, Meta, and xAI. This integration was the first provider implemented in Coda and serves as a reference implementation for future providers.

Key Features

  • Native OCI SDK Integration: Direct use of Oracle's Python SDK
  • Dynamic Model Discovery: Automatically discovers available models
  • Multi-Format Streaming: Handles different response formats per provider
  • Comprehensive Testing: Unit, integration, and functional test coverage
  • Zero Configuration: Works with existing OCI CLI configuration

Architecture

Provider Interface

The OCI GenAI provider implements the abstract Provider interface:

from abc import ABC, abstractmethod
from typing import AsyncIterator, List

# Message, ChatCompletion, ChatCompletionChunk, and Model are Coda's own types.

class Provider(ABC):
    @abstractmethod
    async def chat(self, messages: List[Message], model: str, **kwargs) -> ChatCompletion:
        """Non-streaming chat completion"""

    @abstractmethod
    async def stream_chat(self, messages: List[Message], model: str, **kwargs) -> AsyncIterator[ChatCompletionChunk]:
        """Streaming chat completion"""

    @abstractmethod
    def list_models(self) -> List[Model]:
        """List available models"""

Class Structure

OCIGenAIProvider
├── __init__()              # Initialize OCI client and config
├── list_models()           # Discover available models
├── chat()                  # Non-streaming chat completion
├── stream_chat()           # Streaming chat completion
├── _parse_streaming_chunk() # Parse SSE chunks
└── _validate_model()       # Validate model ID

Implementation Details

Initialization

The provider initializes with OCI configuration from ~/.oci/config:

def __init__(self, compartment_id: Optional[str] = None):
    self.config = oci.config.from_file()
    self.compartment_id = compartment_id or os.getenv("OCI_COMPARTMENT_ID")
    self.client = GenerativeAiInferenceClient(
        config=self.config,
        service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
    )
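
oci.config.from_file() reads the same profile the OCI CLI uses, so no Coda-specific credential setup is needed. A typical ~/.oci/config (all values below are placeholders) looks like:

[DEFAULT]
user=ocid1.user.oc1..xxxx
fingerprint=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
tenancy=ocid1.tenancy.oc1..xxxx
region=us-chicago-1
key_file=~/.oci/oci_api_key.pem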

Model Discovery

Models are discovered dynamically from the OCI API:

def list_models(self) -> List[Model]:
    """Discover all available models in the compartment"""
    models = []
    # Discovery uses the management-plane client (GenerativeAiClient);
    # the inference client created in __init__ is only used for chat.
    response = self.generative_ai_client.list_models(
        compartment_id=self.compartment_id,
        state="ACTIVE"
    )
    for model in response.data:
        models.append(Model(
            id=self._convert_to_model_id(model.display_name),
            name=model.display_name,
            provider="oci-genai",
            capabilities=model.capabilities
        ))
    return models
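
Once constructed, the provider can be used directly; a minimal usage sketch, assuming OCI_COMPARTMENT_ID is set (or a compartment id is passed to the constructor):

provider = OCIGenAIProvider()
for model in provider.list_models():
    print(model.id, model.capabilities)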

Streaming Response Handling

The most complex part of the integration is handling different streaming response formats from various model providers.

Response Format Discovery

Through testing, we discovered three distinct response formats:

1. xAI/Meta Format

{
  "message": {
    "role": "ASSISTANT",
    "content": [
      {
        "type": "TEXT",
        "text": "Hello, world!"
      }
    ]
  }
}

2. Cohere Format

{
  "apiFormat": "COHERE",
  "text": "Hello from Cohere!",
  "finishReason": "stop"
}

3. Legacy Chat Format

{
  "chatResponse": {
    "choices": [
      {
        "delta": {
          "content": "Streaming text..."
        }
      }
    ]
  }
}

Streaming Parser Implementation

The _parse_streaming_chunk method handles all formats:

def _parse_streaming_chunk(self, chunk: str, model: str) -> Optional[ChatCompletionChunk]:
    """Parse SSE chunk based on provider format"""
    
    # Skip empty lines and SSE headers
    if not chunk or chunk.startswith(':'):
        return None
        
    # Extract JSON from SSE data
    if chunk.startswith('data: '):
        chunk = chunk[6:]
        
    try:
        data = json.loads(chunk)
        
        # Handle Cohere format
        if "cohere" in model.lower():
            if "text" in data and "finishReason" not in data:
                return ChatCompletionChunk(content=data.get("text", ""))
            elif "finishReason" in data:
                return ChatCompletionChunk(content="")  # Avoid duplication
                
        # Handle xAI/Meta format
        else:
            message = data.get("message", {})
            if message:
                content_list = message.get("content", [])
                if content_list and isinstance(content_list, list):
                    content = content_list[0].get("text", "")
                    return ChatCompletionChunk(content=content)
                    
        # Handle legacy format (kept for compatibility)
        if "chatResponse" in data:
            choices = data["chatResponse"].get("choices", [])
            if choices:
                delta = choices[0].get("delta", {})
                return ChatCompletionChunk(content=delta.get("content", ""))
                
    except json.JSONDecodeError:
        logger.warning(f"Failed to parse chunk: {chunk}")
        
    return None

Streaming Flow

  1. Request Creation: Build OCI chat request with messages
  2. Stream Initiation: Call chat_stream with SSE details
  3. Event Processing: Parse Server-Sent Events line by line
  4. Format Detection: Identify provider format from response
  5. Content Extraction: Extract text based on format
  6. Chunk Yielding: Yield ChatCompletionChunk objects
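
Put together, a simplified stream_chat looks like the sketch below. The request-building and transport helpers (_build_chat_request, _open_sse_stream) are hypothetical stand-ins for the actual OCI SDK calls; the parsing step is the _parse_streaming_chunk method shown earlier:

async def stream_chat(self, messages, model, **kwargs):
    """Yield ChatCompletionChunk objects as SSE events arrive."""
    # 1. Validate the model and build the OCI chat request.
    self._validate_model(model)
    request = self._build_chat_request(messages, model, stream=True, **kwargs)  # hypothetical helper

    # 2-3. Open the SSE stream and walk the events line by line.
    async for line in self._open_sse_stream(request):  # hypothetical transport wrapper
        # 4-5. Detect the provider format and extract the text.
        chunk = self._parse_streaming_chunk(line, model)
        if chunk is not None:
            # 6. Hand the chunk to the caller.
            yield chunk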

Test Suite

The test suite follows a layered approach for comprehensive coverage while maintaining fast CI/CD cycles.

Test Categories

1. Unit Tests (Fast, No Dependencies)

Located in tests/unit/test_oci_parsing.py:

  • Test response parsing logic
  • Model name conversion
  • JSON handling edge cases
  • No OCI SDK dependencies

Example:

@pytest.mark.unit
def test_parse_xai_message_format():
    """Test parsing xAI/Meta message format"""
    response = {
        "message": {
            "content": [{"type": "TEXT", "text": "Hello"}]
        }
    }
    assert extract_content(response) == "Hello"

2. Integration Tests (Real API Calls)

Located in tests/integration/test_oci_genai_integration.py:

  • Test actual OCI API connectivity
  • Model discovery validation
  • Real chat completions
  • Require OCI credentials

Example:

@pytest.mark.integration
@pytest.mark.skipif(not os.getenv("OCI_COMPARTMENT_ID"), 
                    reason="No credentials")
def test_real_model_discovery(provider):
    """Test discovering models from OCI API"""
    models = provider.list_models()
    assert len(models) > 0
    assert any("cohere" in m.id for m in models)

3. Functional Tests (End-to-End)

Located in tests/functional/test_oci_genai_functional.py:

  • Test CLI interactive mode
  • Concurrent requests
  • Error handling
  • Full user workflows
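
These workflows can be exercised end-to-end with a test like the sketch below. It assumes the same provider fixture the integration tests use, pytest-asyncio (or equivalent) for the async test function, and a chat-capable model id; the functional marker name is also an assumption:

import os

import pytest

@pytest.mark.functional  # marker name assumed; match the project's pytest configuration
@pytest.mark.skipif(not os.getenv("OCI_COMPARTMENT_ID"), reason="No credentials")
async def test_streaming_end_to_end(provider):
    """Stream a short completion and verify that text arrives."""
    chunks = []
    async for chunk in provider.stream_chat(
        messages=[{"role": "user", "content": "Say hello"}],
        model="cohere.command-r-plus",  # any chat-capable model id works
    ):
        chunks.append(chunk.content)
    assert "".join(chunks).strip()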

Running Tests Locally

# Unit tests only (fast, no credentials needed)
make test

# All tests including integration
make test-all

# Specific test category
make test-unit
make test-integration

# With coverage
make test-cov

Configuration

Environment Variables

# Required for OCI GenAI
export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxxx"

# Optional overrides
export OCI_CONFIG_FILE="~/.oci/config"
export OCI_CONFIG_PROFILE="DEFAULT"

Config File

# ~/.config/coda/config.toml
[providers.oci_genai]
compartment_id = "ocid1.compartment.oc1..xxxx"
region = "us-chicago-1"

Troubleshooting

Common Issues

1. No Models Found

Error: No models available for provider oci-genai

Solution: Ensure OCI_COMPARTMENT_ID is set and you have access to GenAI models.

2. Streaming Not Working

Error: EOF when reading a line

Solution: This was the original streaming bug; update to the latest version, which includes the streaming fixes described above.

3. Authentication Errors

Error: Invalid private key

Solution: Check ~/.oci/config and ensure key file exists and has correct permissions.

Debug Mode

Enable debug logging to see detailed OCI requests:

export CODA_LOG_LEVEL=DEBUG
uv run coda --debug

Testing Response Formats

Use the debug script to test specific models:

# tests/debug_streaming.py
import asyncio

async def test_model_format(model_id):
    provider = OCIGenAIProvider()
    async for chunk in provider.stream_chat(
        messages=[{"role": "user", "content": "Hi"}],
        model=model_id
    ):
        print(f"Chunk: {chunk.content}")

if __name__ == "__main__":
    # Pass any model id reported by list_models()
    asyncio.run(test_model_format("cohere.command-r-plus"))

Future Enhancements

  1. Response Caching: Cache model discovery results
  2. Retry Logic: Add exponential backoff for transient failures
  3. Token Counting: Implement accurate token estimation
  4. Fine-tuning Support: Add support for custom models
  5. Multi-Region: Support multiple OCI regions
  6. Batch Inference: Support batch chat completions

Contributing

When adding new features to the OCI GenAI provider:

  1. Add Unit Tests First: Test parsing logic without dependencies
  2. Mock OCI Calls: Use mocks for complex OCI interactions (see the sketch after this list)
  3. Document Response Formats: Add examples of new formats
  4. Update Integration Tests: Add tests for new capabilities
  5. Follow Streaming Pattern: Maintain consistency with existing code
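
For point 2, unittest.mock can stand in for the OCI clients so discovery and parsing logic are testable without credentials. A minimal sketch, with the import path and internal attribute names assumed rather than taken from the actual module layout:

import pytest
from unittest.mock import MagicMock, patch

from coda.providers.oci_genai import OCIGenAIProvider  # import path is illustrative

@pytest.mark.unit
def test_list_models_with_mocked_client():
    """Exercise list_models() without touching the OCI API."""
    fake_model = MagicMock(display_name="cohere.command-r-plus", capabilities=["CHAT"])
    fake_response = MagicMock(data=[fake_model])

    # Skip real OCI config loading, then wire in a fake discovery client.
    with patch.object(OCIGenAIProvider, "__init__", return_value=None):
        provider = OCIGenAIProvider()
        provider.compartment_id = "ocid1.compartment.oc1..xxxx"
        provider.generative_ai_client = MagicMock()
        provider.generative_ai_client.list_models.return_value = fake_response

        models = provider.list_models()

    assert models and models[0].provider == "oci-genai"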

See also: Configuration, Development Guide, Troubleshooting