OCI GenAI Integration

This document explains how the Oracle Cloud Infrastructure (OCI) Generative AI integration works in Coda, including implementation details, streaming response handling, and testing.

Table of Contents

  • Overview
  • Architecture
  • Implementation Details
  • Streaming Response Handling
  • Test Suite
  • Configuration
  • Troubleshooting
  • Future Enhancements
  • Contributing

Overview

The OCI GenAI integration provides native support for Oracle's Generative AI service, offering access to over 30 models from providers like Cohere, Meta, and xAI. This integration was the first provider implemented in Coda and serves as a reference implementation for future providers.

Key Features

  • Native OCI SDK Integration: Direct use of Oracle's Python SDK
  • Dynamic Model Discovery: Automatically discovers available models
  • Multi-Format Streaming: Handles different response formats per provider
  • Comprehensive Testing: Unit, integration, and functional test coverage
  • Zero Configuration: Works with existing OCI CLI configuration

Architecture

Provider Interface

The OCI GenAI provider implements the abstract Provider interface:

from abc import ABC, abstractmethod
from typing import AsyncIterator, List

# Message, ChatCompletion, ChatCompletionChunk, and Model are Coda's own types.

class Provider(ABC):
    @abstractmethod
    async def chat(self, messages: List[Message], model: str, **kwargs) -> ChatCompletion:
        """Non-streaming chat completion"""

    @abstractmethod
    async def stream_chat(self, messages: List[Message], model: str, **kwargs) -> AsyncIterator[ChatCompletionChunk]:
        """Streaming chat completion"""

    @abstractmethod
    def list_models(self) -> List[Model]:
        """List available models"""

Class Structure

OCIGenAIProvider
├── __init__()              # Initialize OCI client and config
├── list_models()           # Discover available models
├── chat()                  # Non-streaming chat completion
├── stream_chat()           # Streaming chat completion
├── _parse_streaming_chunk() # Parse SSE chunks
└── _validate_model()       # Validate model ID

Implementation Details

Initialization

The provider initializes with OCI configuration from ~/.oci/config:

def __init__(self, compartment_id: Optional[str] = None):
    self.config = oci.config.from_file()
    self.compartment_id = compartment_id or os.getenv("OCI_COMPARTMENT_ID")
    self.client = GenerativeAiInferenceClient(
        config=self.config,
        service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
    )
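
oci.config.from_file() reads the same profile the OCI CLI uses, so no Coda-specific credential setup is needed. A typical ~/.oci/config (all values below are placeholders) looks like:

[DEFAULT]
user=ocid1.user.oc1..xxxx
fingerprint=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
tenancy=ocid1.tenancy.oc1..xxxx
region=us-chicago-1
key_file=~/.oci/oci_api_key.pem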

Model Discovery

Models are discovered dynamically from the OCI API:

def list_models(self) -> List[Model]:
    """Discover all available models in the compartment"""
    models = []
    # Discovery uses the management-plane client (GenerativeAiClient);
    # the inference client created in __init__ is only used for chat.
    response = self.generative_ai_client.list_models(
        compartment_id=self.compartment_id,
        state="ACTIVE"
    )
    for model in response.data:
        models.append(Model(
            id=self._convert_to_model_id(model.display_name),
            name=model.display_name,
            provider="oci-genai",
            capabilities=model.capabilities
        ))
    return models
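
Once constructed, the provider can be used directly; a minimal usage sketch, assuming OCI_COMPARTMENT_ID is set (or a compartment id is passed to the constructor):

provider = OCIGenAIProvider()
for model in provider.list_models():
    print(model.id, model.capabilities)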

Streaming Response Handling

The most complex part of the integration is handling different streaming response formats from various model providers.

Response Format Discovery

Through testing, we discovered three distinct response formats:

1. xAI/Meta Format

{
  "message": {
    "role": "ASSISTANT",
    "content": [
      {
        "type": "TEXT",
        "text": "Hello, world!"
      }
    ]
  }
}

2. Cohere Format

{
  "apiFormat": "COHERE",
  "text": "Hello from Cohere!",
  "finishReason": "stop"
}

3. Legacy Chat Format

{
  "chatResponse": {
    "choices": [
      {
        "delta": {
          "content": "Streaming text..."
        }
      }
    ]
  }
}

Streaming Parser Implementation

The _parse_streaming_chunk method handles all formats:

def _parse_streaming_chunk(self, chunk: str, model: str) -> Optional[ChatCompletionChunk]:
    """Parse SSE chunk based on provider format"""
    
    # Skip empty lines and SSE headers
    if not chunk or chunk.startswith(':'):
        return None
        
    # Extract JSON from SSE data
    if chunk.startswith('data: '):
        chunk = chunk[6:]
        
    try:
        data = json.loads(chunk)
        
        # Handle Cohere format
        if "cohere" in model.lower():
            if "text" in data and "finishReason" not in data:
                return ChatCompletionChunk(content=data.get("text", ""))
            elif "finishReason" in data:
                return ChatCompletionChunk(content="")  # Avoid duplication
                
        # Handle xAI/Meta format
        else:
            message = data.get("message", {})
            if message:
                content_list = message.get("content", [])
                if content_list and isinstance(content_list, list):
                    content = content_list[0].get("text", "")
                    return ChatCompletionChunk(content=content)
                    
        # Handle legacy format (kept for compatibility)
        if "chatResponse" in data:
            choices = data["chatResponse"].get("choices", [])
            if choices:
                delta = choices[0].get("delta", {})
                return ChatCompletionChunk(content=delta.get("content", ""))
                
    except json.JSONDecodeError:
        logger.warning(f"Failed to parse chunk: {chunk}")
        
    return None

Streaming Flow

  1. Request Creation: Build OCI chat request with messages
  2. Stream Initiation: Call chat_stream with SSE details
  3. Event Processing: Parse Server-Sent Events line by line
  4. Format Detection: Identify provider format from response
  5. Content Extraction: Extract text based on format
  6. Chunk Yielding: Yield ChatCompletionChunk objects
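
Put together, a simplified stream_chat looks like the sketch below. The request-building and transport helpers (_build_chat_request, _open_sse_stream) are hypothetical stand-ins for the actual OCI SDK calls; the parsing step is the _parse_streaming_chunk method shown earlier:

async def stream_chat(self, messages, model, **kwargs):
    """Yield ChatCompletionChunk objects as SSE events arrive."""
    # 1. Validate the model and build the OCI chat request.
    self._validate_model(model)
    request = self._build_chat_request(messages, model, stream=True, **kwargs)  # hypothetical helper

    # 2-3. Open the SSE stream and walk the events line by line.
    async for line in self._open_sse_stream(request):  # hypothetical transport wrapper
        # 4-5. Detect the provider format and extract the text.
        chunk = self._parse_streaming_chunk(line, model)
        if chunk is not None:
            # 6. Hand the chunk to the caller.
            yield chunk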

Test Suite

The test suite follows a layered approach for comprehensive coverage while maintaining fast CI/CD cycles.

Test Categories

1. Unit Tests (Fast, No Dependencies)

Located in tests/unit/test_oci_parsing.py:

  • Test response parsing logic
  • Model name conversion
  • JSON handling edge cases
  • No OCI SDK dependencies

Example:

@pytest.mark.unit
def test_parse_xai_message_format():
    """Test parsing xAI/Meta message format"""
    response = {
        "message": {
            "content": [{"type": "TEXT", "text": "Hello"}]
        }
    }
    assert extract_content(response) == "Hello"

2. Integration Tests (Real API Calls)

Located in tests/integration/test_oci_genai_integration.py:

  • Test actual OCI API connectivity
  • Model discovery validation
  • Real chat completions
  • Require OCI credentials

Example:

@pytest.mark.integration
@pytest.mark.skipif(not os.getenv("OCI_COMPARTMENT_ID"), 
                    reason="No credentials")
def test_real_model_discovery(provider):
    """Test discovering models from OCI API"""
    models = provider.list_models()
    assert len(models) > 0
    assert any("cohere" in m.id for m in models)

3. Functional Tests (End-to-End)

Located in tests/functional/test_oci_genai_functional.py:

  • Test CLI interactive mode
  • Concurrent requests
  • Error handling
  • Full user workflows
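
These workflows can be exercised end-to-end with a test like the sketch below. It assumes the same provider fixture the integration tests use, pytest-asyncio (or equivalent) for the async test function, and a chat-capable model id; the functional marker name is also an assumption:

import os

import pytest

@pytest.mark.functional  # marker name assumed; match the project's pytest configuration
@pytest.mark.skipif(not os.getenv("OCI_COMPARTMENT_ID"), reason="No credentials")
async def test_streaming_end_to_end(provider):
    """Stream a short completion and verify that text arrives."""
    chunks = []
    async for chunk in provider.stream_chat(
        messages=[{"role": "user", "content": "Say hello"}],
        model="cohere.command-r-plus",  # any chat-capable model id works
    ):
        chunks.append(chunk.content)
    assert "".join(chunks).strip()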

Running Tests Locally

# Unit tests only (fast, no credentials needed)
make test

# All tests including integration
make test-all

# Specific test category
make test-unit
make test-integration

# With coverage
make test-cov

Configuration

Environment Variables

# Required for OCI GenAI
export OCI_COMPARTMENT_ID="ocid1.compartment.oc1..xxxx"

# Optional overrides
export OCI_CONFIG_FILE="~/.oci/config"
export OCI_CONFIG_PROFILE="DEFAULT"

Config File

# ~/.config/coda/config.toml
[providers.oci_genai]
compartment_id = "ocid1.compartment.oc1..xxxx"
region = "us-chicago-1"

Troubleshooting

Common Issues

1. No Models Found

Error: No models available for provider oci-genai

Solution: Ensure OCI_COMPARTMENT_ID is set and you have access to GenAI models.

2. Streaming Not Working

Error: EOF when reading a line

Solution: This was the original streaming bug; update to the latest version, which includes the streaming fixes described above.

3. Authentication Errors

Error: Invalid private key

Solution: Check ~/.oci/config and ensure key file exists and has correct permissions.

Debug Mode

Enable debug logging to see detailed OCI requests:

export CODA_LOG_LEVEL=DEBUG
uv run coda --debug

Testing Response Formats

Use the debug script to test specific models:

# tests/debug_streaming.py
import asyncio

async def test_model_format(model_id):
    provider = OCIGenAIProvider()
    async for chunk in provider.stream_chat(
        messages=[{"role": "user", "content": "Hi"}],
        model=model_id
    ):
        print(f"Chunk: {chunk.content}")

if __name__ == "__main__":
    # Pass any model id reported by list_models()
    asyncio.run(test_model_format("cohere.command-r-plus"))

Future Enhancements

  1. Response Caching: Cache model discovery results
  2. Retry Logic: Add exponential backoff for transient failures
  3. Token Counting: Implement accurate token estimation
  4. Fine-tuning Support: Add support for custom models
  5. Multi-Region: Support multiple OCI regions
  6. Batch Inference: Support batch chat completions

Contributing

When adding new features to the OCI GenAI provider:

  1. Add Unit Tests First: Test parsing logic without dependencies
  2. Mock OCI Calls: Use mocks for complex OCI interactions (see the sketch after this list)
  3. Document Response Formats: Add examples of new formats
  4. Update Integration Tests: Add tests for new capabilities
  5. Follow Streaming Pattern: Maintain consistency with existing code
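
For point 2, unittest.mock can stand in for the OCI clients so discovery and parsing logic are testable without credentials. A minimal sketch, with the import path and internal attribute names assumed rather than taken from the actual module layout:

import pytest
from unittest.mock import MagicMock, patch

from coda.providers.oci_genai import OCIGenAIProvider  # import path is illustrative

@pytest.mark.unit
def test_list_models_with_mocked_client():
    """Exercise list_models() without touching the OCI API."""
    fake_model = MagicMock(display_name="cohere.command-r-plus", capabilities=["CHAT"])
    fake_response = MagicMock(data=[fake_model])

    # Skip real OCI config loading, then wire in a fake discovery client.
    with patch.object(OCIGenAIProvider, "__init__", return_value=None):
        provider = OCIGenAIProvider()
        provider.compartment_id = "ocid1.compartment.oc1..xxxx"
        provider.generative_ai_client = MagicMock()
        provider.generative_ai_client.list_models.return_value = fake_response

        models = provider.list_models()

    assert models and models[0].provider == "oci-genai"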

See also: Configuration, Development Guide, Troubleshooting