MDS Kafka Streaming: Design and Concepts - Mobility-Data-Space/mobility-data-space GitHub Wiki
Introduction
The Mobility Data Space (MDS) enables secure, federated sharing of mobility and transportation data between organizations. This document explains the design and core concepts for implementing real-time data streaming with Apache Kafka within the MDS ecosystem, leveraging the Eclipse Dataspace Connector (EDC) framework and the MDS Kafka extension.
Key Concepts
Data Space
A federated ecosystem where organizations share data under agreed-upon rules and policies. Data spaces enable secure, interoperable data sharing while maintaining data sovereignty and control.
EDC Connector
A framework that facilitates secure data sharing between data space participants. The Eclipse Dataspace Connector (EDC) provides:
- Data Plane: Handles actual data transfer operations
- Control Plane: Manages metadata, contracts, and policies
Kafka Streaming
Real-time data streaming using Apache Kafka topics, enabling high-throughput, fault-tolerant data distribution across the data space.
Contract Negotiation
The process of establishing data usage agreements between providers and consumers. This includes:
- Policy evaluation
- Terms agreement
- Access credential generation
- Usage monitoring
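The steps above can be sketched as an ordered progression. The phase names below simply mirror the list and are assumptions for illustration; they are not the actual EDC contract-negotiation state machine.

```python
from enum import Enum
from typing import Optional

class NegotiationPhase(Enum):
    """Hypothetical phases mirroring the list above (not EDC state names)."""
    POLICY_EVALUATION = 1
    TERMS_AGREEMENT = 2
    CREDENTIAL_GENERATION = 3
    USAGE_MONITORING = 4

def next_phase(current: NegotiationPhase) -> Optional[NegotiationPhase]:
    """Advance to the following phase; None once usage monitoring is reached."""
    phases = list(NegotiationPhase)
    idx = phases.index(current)
    return phases[idx + 1] if idx + 1 < len(phases) else None
```

Note that usage monitoring is terminal here: it continues for the lifetime of the agreement rather than handing off to a further phase.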
EDR (Endpoint Data Reference)
Secure references that contain access credentials for data endpoints. EDRs include:
- Authentication tokens
- Connection parameters
- Access permissions
- Usage constraints
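A minimal sketch of what such a reference might look like, with a check that all four components are present. The field names are assumptions based on the list above, not the exact EDC wire format.

```python
# Hypothetical EDR payload; field names mirror the list above and are
# assumptions, not the actual EDC serialization.
EXAMPLE_EDR = {
    "authToken": "token-placeholder",          # authentication token
    "endpoint": "kafka.example.com:9092",      # connection parameters
    "permissions": ["read"],                   # access permissions
    "constraints": {"expiresInSeconds": 3600}, # usage constraints
}

def validate_edr(edr: dict) -> bool:
    """Check that all four EDR components from the list are present."""
    required = {"authToken", "endpoint", "permissions", "constraints"}
    return required.issubset(edr)
```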
Architecture Overview
[tbd]
Data Space Protocol
Contract Negotiation Flow
Contract negotiation follows a standardized state machine:
[tbd]
Key Message Types
- Contract Request: Consumer initiates negotiation for a specific dataset
- Contract Agreement: Provider responds with terms and conditions
- EDR (Endpoint Data Reference): Contains credentials and connection details for data access
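The exchange of these messages can be sketched as plain data classes with a provider-side stub. The types and the `negotiate` helper are illustrative assumptions; a real provider evaluates policies before agreeing.

```python
from dataclasses import dataclass

@dataclass
class ContractRequest:
    """Consumer initiates negotiation for a specific dataset."""
    dataset_id: str
    consumer_id: str

@dataclass
class ContractAgreement:
    """Provider's response carrying the agreed terms."""
    agreement_id: str
    dataset_id: str
    policy: dict

def negotiate(request: ContractRequest) -> ContractAgreement:
    # Stub: accepts every request. A real provider would evaluate the
    # consumer's credentials and the dataset's policy here.
    return ContractAgreement(
        agreement_id=f"agr-{request.dataset_id}",
        dataset_id=request.dataset_id,
        policy={"use": "permitted"},
    )
```

Once an agreement exists, the provider issues the EDR with credentials for the actual Kafka endpoint.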
MDS EDC with Kafka Extension
MDS Kafka Data Plane Extension Components
The MDS Kafka Extension provides:
1. data-plane-kafka
Manages Kafka topic access by:
- Creating dynamic Kafka credentials
- Managing Access Control Lists (ACLs)
- Handling SASL authentication
- Supporting OIDC-based client registration
2. data-plane-kafka-spi
Defines the DataAddress format for Kafka assets:
{
  "type": "Kafka",
  "topic": "mobility-events",
  "kafka.bootstrap.servers": "kafka.example.com:9092",
  "kafka.sasl.mechanism": "OAUTHBEARER",
  "kafka.security.protocol": "SASL_SSL",
  "oidc.discovery.url": "https://auth.example.com/.well-known/openid-configuration",
  "oidc.client.registration.endpoint": "https://auth.example.com/clients"
}
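One plausible way the data plane could turn such a DataAddress into Kafka client properties is to strip the `kafka.` prefix. This mapping is an assumption for illustration, not the documented SPI behavior.

```python
import json

# DataAddress example from above, reproduced for the sketch.
DATA_ADDRESS = json.loads("""
{
  "type": "Kafka",
  "topic": "mobility-events",
  "kafka.bootstrap.servers": "kafka.example.com:9092",
  "kafka.sasl.mechanism": "OAUTHBEARER",
  "kafka.security.protocol": "SASL_SSL"
}
""")

def to_consumer_config(address: dict) -> dict:
    """Derive client properties by stripping the 'kafka.' prefix
    (an assumed convention, not the documented SPI mapping)."""
    if address.get("type") != "Kafka":
        raise ValueError("not a Kafka DataAddress")
    return {k[len("kafka."):]: v for k, v in address.items()
            if k.startswith("kafka.")}
```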
Security Model
The extension implements multi-layered security:
- EDC-level: Contract-based access control
- Kafka-level: SASL authentication with dynamic credentials
- Network-level: TLS encryption for all communications
- Application-level: OIDC tokens for client authentication
Monitoring and Observability
Key metrics to monitor:
- Kafka Consumer Metrics: Lag, throughput, error rates
- Database Metrics: Connection pool, query performance
- Application Metrics: Event processing rates, error counts
- Infrastructure Metrics: CPU, memory, network usage
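Of these, consumer lag is the most Kafka-specific: per partition, it is the log-end offset minus the consumer's committed offset. A small helper makes the arithmetic explicit (offsets here are hypothetical inputs; in practice they come from the Kafka admin/consumer APIs).

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag = log-end offset minus committed offset.
    A partition with no committed offset is treated as fully unread."""
    return {
        partition: log_end_offsets[partition] - committed_offsets.get(partition, 0)
        for partition in log_end_offsets
    }
```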
Scaling Considerations
- Consumer Pool: Configure thread pool sizes based on expected load
- Database: Use connection pooling and consider read replicas
- Kafka: Partition topics appropriately for parallel processing
- Load Balancing: Use multiple backend instances behind a load balancer
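The partitioning and consumer-pool points interact: within a single consumer group, each partition is assigned to at most one consumer, so the partition count caps effective parallelism.

```python
def active_consumers(partitions: int, group_size: int) -> int:
    """Within one consumer group, at most one consumer reads each partition,
    so any consumers beyond the partition count sit idle."""
    return min(partitions, group_size)
```

For example, a topic with 6 partitions saturates at 6 active consumers; scaling the group to 10 leaves 4 idle, whereas 4 consumers each handle one or two partitions.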
Performance Tuning
- Kafka Consumer: Adjust max.poll.records and fetch.min.bytes
- Database: Tune connection pool size and query timeouts
- JVM: Configure heap size and garbage collection settings
- Monitoring: Set up alerts for key metrics
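A sketch of how those consumer settings might be chosen from expected load. The property names (max.poll.records, fetch.min.bytes, fetch.max.wait.ms) are real Kafka consumer configuration keys; the threshold and values are illustrative assumptions, not tuning recommendations.

```python
def tuned_consumer_props(expected_msgs_per_sec: int) -> dict:
    """Illustrative heuristic: favor larger batches at high throughput
    (bigger polls, wait for more bytes per fetch); keep Kafka's defaults
    (max.poll.records=500, fetch.min.bytes=1) at low volume."""
    high_volume = expected_msgs_per_sec > 1000  # assumed threshold
    return {
        "max.poll.records": 1000 if high_volume else 500,
        "fetch.min.bytes": 65536 if high_volume else 1,
        "fetch.max.wait.ms": 500,
    }
```

The trade-off: larger fetch.min.bytes improves broker efficiency and throughput but adds latency up to fetch.max.wait.ms, so latency-sensitive consumers should keep it small.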