MDS Kafka Streaming: Design and Concepts

Introduction

The Mobility Data Space (MDS) enables secure, federated sharing of mobility and transportation data between organizations. This document explains the design and core concepts for implementing real-time data streaming using Apache Kafka within the MDS ecosystem, leveraging the Eclipse Dataspace Connector (EDC) framework and MDS Kafka extension.

Key Concepts

Data Space

A federated ecosystem where organizations share data under agreed-upon rules and policies. Data spaces enable secure, interoperable data sharing while maintaining data sovereignty and control.

EDC Connector

A framework that facilitates secure data sharing between data space participants. The Eclipse Dataspace Connector (EDC) provides:

  • Data Plane: Handles actual data transfer operations
  • Control Plane: Manages metadata, contracts, and policies

Kafka Streaming

Real-time data streaming using Apache Kafka topics, enabling high-throughput, fault-tolerant data distribution across the data space.
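
For orientation, the sketch below is a plain Java consumer subscribing to a Kafka topic. The broker address, topic name, and group id are placeholders; in an MDS setup the connection and security settings come from the connector rather than being hard-coded (see the EDR and Kafka data plane sections below).

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MobilityEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; in MDS this comes from the negotiated DataAddress/EDR.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "mds-demo-consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("mobility-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}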

Contract Negotiation

The process of establishing data usage agreements between providers and consumers. This includes:

  • Policy evaluation
  • Terms agreement
  • Access credential generation
  • Usage monitoring

EDR (Endpoint Data Reference)

Secure references that contain access credentials for data endpoints. EDRs include:

  • Authentication tokens
  • Connection parameters
  • Access permissions
  • Usage constraints
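
Field names inside an EDR depend on the connector and data plane versions in use, so the sketch below uses hypothetical names (endpoint, topic, token, expiresAt) purely to illustrate how a consumer backend might hold an EDR and derive Kafka client settings from it.

import java.time.Instant;
import java.util.Map;
import java.util.Properties;

// Hypothetical, simplified view of an EDR for a Kafka asset; the real field names
// depend on the EDC and data plane versions in use.
public record KafkaEndpointDataReference(String endpoint, String topic, String token, Instant expiresAt) {

    public static KafkaEndpointDataReference fromProperties(Map<String, String> edrProps) {
        return new KafkaEndpointDataReference(
                edrProps.get("endpoint"),
                edrProps.get("topic"),
                edrProps.get("token"),
                Instant.parse(edrProps.get("expiresAt")));
    }

    // Maps the EDR onto plain Kafka client properties (security settings omitted here;
    // see the SASL/OAUTHBEARER example further below).
    public Properties toKafkaProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", endpoint);
        return props;
    }

    public boolean isExpired() {
        return Instant.now().isAfter(expiresAt);
    }
}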

Architecture Overview

[tbd]

Data Space Protocol

Contract Negotiation Flow

Contract negotiation follows a standardized state machine:

[tbd]

Key Message Types

  • Contract Request: Consumer initiates negotiation for a specific dataset
  • Contract Agreement: Provider responds with terms and conditions
  • EDR (Endpoint Data Reference): Contains credentials and connection details for data access
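
As a consumer-side illustration, the sketch below posts a contract request to the consumer connector's management API using Java's built-in HTTP client. The endpoint path (/v3/contractnegotiations), the JSON-LD field names, and all identifiers are assumptions that differ between EDC versions; the connector's management API documentation defines the exact shape.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ContractNegotiationStarter {
    public static void main(String[] args) throws Exception {
        // Assumed management API payload shape; field names differ between EDC versions.
        String body = """
            {
              "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
              "@type": "ContractRequest",
              "counterPartyAddress": "https://provider.example.com/api/dsp",
              "protocol": "dataspace-protocol-http",
              "policy": {
                "@context": "http://www.w3.org/ns/odrl.jsonld",
                "@type": "Offer",
                "@id": "example-offer-id",
                "assigner": "provider-participant-id",
                "target": "mobility-events-asset"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://consumer-connector.example.com/management/v3/contractnegotiations"))
                .header("Content-Type", "application/json")
                .header("X-Api-Key", "<management-api-key>")   // placeholder credential
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response carries the negotiation id, which can be polled until an agreement is reached.
        System.out.println(response.body());
    }
}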

MDS EDC with Kafka Extension

MDS Kafka Data Plane Extension Components

The MDS Kafka Extension provides:

1. data-plane-kafka

Manages Kafka topic access by:

  • Creating dynamic Kafka credentials
  • Managing Access Control Lists (ACLs), as sketched after this list
  • Handling SASL authentication
  • Supporting OIDC-based client registration
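
To make the ACL responsibility concrete, the sketch below grants a freshly registered client read access to a topic through the standard Kafka Admin API. It illustrates the mechanism only and is not the extension's actual implementation; the broker address, principal, topic, and group names are placeholders.

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantTopicRead {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");

        try (Admin admin = Admin.create(props)) {
            // Allow the dynamically registered client (placeholder principal) to read the topic.
            ResourcePattern topic = new ResourcePattern(ResourceType.TOPIC, "mobility-events", PatternType.LITERAL);
            AccessControlEntry allowRead =
                    new AccessControlEntry("User:generated-client-id", "*", AclOperation.READ, AclPermissionType.ALLOW);
            admin.createAcls(List.of(new AclBinding(topic, allowRead))).all().get();

            // Consumers also need READ on their consumer group.
            ResourcePattern group = new ResourcePattern(ResourceType.GROUP, "mds-demo-consumer", PatternType.LITERAL);
            admin.createAcls(List.of(new AclBinding(group, allowRead))).all().get();
        }
    }
}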

2. data-plane-kafka-spi

Defines the DataAddress format for Kafka assets:

{
  "type": "Kafka", 
  "topic": "mobility-events", 
  "kafka.bootstrap.servers": "kafka.example.com:9092", 
  "kafka.sasl.mechanism": "OAUTHBEARER",
  "kafka.security.protocol": "SASL_SSL", 
  "oidc.discovery.url": "<https://auth.example.com/.well-known/openid_configuration>",
  "oidc.client.registration.endpoint": "<https://auth.example.com/clients>"
}
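
On the client side, a DataAddress like the one above maps onto standard Kafka security settings. The sketch below shows one plausible consumer configuration using Kafka's built-in OAUTHBEARER support (KIP-768); the token endpoint, client id, and client secret are placeholders, and the callback handler class name varies slightly between Kafka versions.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.config.SaslConfigs;

public class KafkaOAuthConfig {
    // Builds consumer properties matching a SASL_SSL / OAUTHBEARER DataAddress.
    public static Properties oauthConsumerProperties() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "OAUTHBEARER");
        // Token endpoint, typically resolved from the OIDC discovery document (placeholder URL).
        props.put(SaslConfigs.SASL_OAUTHBEARER_TOKEN_ENDPOINT_URL, "https://auth.example.com/oauth2/token");
        // Class name differs between Kafka versions (package with or without ".secured").
        props.put(SaslConfigs.SASL_LOGIN_CALLBACK_HANDLER_CLASS,
                "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler");
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required "
                        + "clientId=\"<dynamic-client-id>\" clientSecret=\"<dynamic-client-secret>\";");
        return props;
    }
}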

Security Model

The extension implements multi-layered security:

  1. EDC-level: Contract-based access control
  2. Kafka-level: SASL authentication with dynamic credentials
  3. Network-level: TLS encryption for all communications
  4. Application-level: OIDC tokens for client authentication
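
For the application-level layer, obtaining an OIDC token typically follows the standard OAuth 2.0 client-credentials flow. The sketch below shows that flow generically with Java's HTTP client; the token endpoint and credentials are placeholders, and in practice Kafka's OAUTHBEARER login callback handler performs this exchange on the client's behalf.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClientCredentialsTokenRequest {
    public static void main(String[] args) throws Exception {
        // Standard OAuth 2.0 client-credentials request; endpoint and credentials are placeholders.
        String form = "grant_type=client_credentials"
                + "&client_id=<dynamic-client-id>"
                + "&client_secret=<dynamic-client-secret>";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://auth.example.com/oauth2/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The JSON response contains the access_token used as the OAUTHBEARER credential.
        System.out.println(response.body());
    }
}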

Monitoring and Observability

Key metrics to monitor:

  • Kafka Consumer Metrics: Lag, throughput, error rates (a lag probe sketch follows this list)
  • Database Metrics: Connection pool, query performance
  • Application Metrics: Event processing rates, error counts
  • Infrastructure Metrics: CPU, memory, network usage
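
Consumer lag and related metrics are exposed through JMX and through the client's metrics() API. The sketch below reads records-lag-max from a running consumer; a monitoring agent can scrape the same values instead.

import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class ConsumerLagProbe {
    // Prints the maximum record lag observed by this consumer instance.
    public static void printMaxLag(KafkaConsumer<?, ?> consumer) {
        for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("consumer-fetch-manager-metrics".equals(name.group())
                    && "records-lag-max".equals(name.name())) {
                System.out.printf("records-lag-max=%s%n", entry.getValue().metricValue());
            }
        }
    }
}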

Scaling Considerations

  • Consumer Pool: Configure thread pool sizes based on expected load (see the consumer pool sketch after this list)
  • Database: Use connection pooling and consider read replicas
  • Kafka: Partition topics appropriately for parallel processing
  • Load Balancing: Use multiple backend instances behind a load balancer
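
A common consumer pool pattern is one KafkaConsumer per thread, all in the same consumer group, with at most as many threads as the topic has partitions. The sketch below is a minimal version of that pattern; the thread count, topic, and configuration are placeholders.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerPool {
    // One consumer per thread, all in the same group; Kafka assigns partitions across them.
    public static void start(Properties consumerProps, String topic, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                    consumer.subscribe(List.of(topic));
                    while (!Thread.currentThread().isInterrupted()) {
                        consumer.poll(Duration.ofMillis(500))
                                .forEach(record -> process(record.value()));
                    }
                }
            });
        }
    }

    private static void process(String value) {
        // Placeholder for the actual event-handling logic.
    }
}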

Performance Tuning

  • Kafka Consumer: Adjust max.poll.records and fetch.min.bytes (illustrative values follow this list)
  • Database: Tune connection pool size and query timeouts
  • JVM: Configure heap size and garbage collection settings
  • Monitoring: Set up alerts for key metrics
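
The consumer settings above are ordinary Kafka client properties. The values in the sketch below are illustrative starting points rather than recommendations; suitable values depend on message sizes, processing cost, and latency requirements.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class ConsumerTuning {
    // Illustrative throughput-oriented settings; tune against measured lag and latency.
    public static Properties tuningOverrides() {
        Properties props = new Properties();
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);         // records returned per poll()
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 64 * 1024);     // wait for larger batches
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 200);         // ...but not longer than this
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);  // allow time for slow processing
        return props;
    }
}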