Domain - julianvb03/MOM-Implementation GitHub Wiki
Problem Domain Document
1. Objective
The primary goal of this project is to develop a Message-Oriented Middleware (MOM) that supports two types of communication:
- N:1 Communication (Queue-Based Messaging) – Multiple senders push messages to a single queue, and only one receiver processes them.
- N:M Communication (Topic-Based Messaging - Publish/Subscribe) – Multiple senders publish messages to a topic, which are received simultaneously by multiple subscribers.
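The difference between the two modes can be illustrated with a minimal in-memory sketch. The class names `SimpleQueue` and `SimpleTopic` and their methods are hypothetical and not part of the project's API. The sketch assumes pull-based reception and ignores persistence, authentication, and concurrency.

```python
from collections import deque


class SimpleQueue:
    """N:1 delivery: each message is consumed exactly once."""

    def __init__(self):
        self._messages = deque()

    def push(self, message):
        self._messages.append(message)

    def pull(self):
        # Return the oldest pending message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class SimpleTopic:
    """N:M delivery: every subscriber receives its own copy of each message."""

    def __init__(self):
        self._inboxes = {}  # subscriber id -> pending messages

    def subscribe(self, subscriber):
        self._inboxes.setdefault(subscriber, deque())

    def publish(self, message):
        # Fan out: append the message to every subscriber's inbox.
        for inbox in self._inboxes.values():
            inbox.append(message)

    def pull(self, subscriber):
        inbox = self._inboxes.get(subscriber)
        return inbox.popleft() if inbox else None
```

In this sketch a message pulled from a `SimpleQueue` is gone for everyone else, while a message published to a `SimpleTopic` stays available to each subscriber until that subscriber pulls it.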
Additionally, the implementation must support a distributed deployment across a cluster, initially consisting of three nodes, each running its own instance of the MOM service. Each instance will manage its own queues and topics while also maintaining a backup of those from other nodes to ensure data replication and fault tolerance.
To enhance scalability and availability, message partitioning will be implemented, distributing different queues and topics across multiple MOM instances. This will optimize load balancing and improve the overall resilience of the system.
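One way to picture partitioning combined with replication on a three-node cluster is a deterministic, hash-based placement. The sketch below is only an assumption about how such a policy could look: the node names, the function `place`, and the choice of the next node in the list as backup are all hypothetical, and the document does not prescribe a specific policy.

```python
import hashlib

NODES = ["mom-node-1", "mom-node-2", "mom-node-3"]  # hypothetical node names


def place(resource_name, nodes=NODES):
    """Return the (primary, backup) nodes for a queue or topic name."""
    # Use a stable hash: Python's built-in hash() for str varies between runs.
    digest = hashlib.sha256(resource_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    primary = nodes[index]
    # Assumption: the backup is simply the next node in the list, wrapping around.
    backup = nodes[(index + 1) % len(nodes)]
    return primary, backup
```

Because the placement depends only on the resource name and the node list, every instance can compute it locally without asking a coordinator. A known drawback of plain modulo hashing is that most resources move when the number of nodes changes; consistent hashing is a common way to reduce that.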
2. Introduction
MOM Responsibilities:
- The MOM should expose a REST-like API through which clients interact with the middleware, covering connection handling, message sending, message receiving, and disconnection, all with proper authentication mechanisms.
  - For message receiving, the MOM must support at least one of the following reception modes:
    - Pull-based, where the client explicitly requests available messages.
    - Push/event-based, where the MOM delivers messages to subscribed clients as soon as they are available.
- Communication between multiple MOM instances should be handled via gRPC, ensuring secure interactions with an appropriate authentication system.
- The MOM should allow clients to create new topics and queues, ensuring that each created queue or topic is registered under the user who created it. The creator should have the privilege to delete their own topics and queues.
- The MOM should distribute created topics and queues across multiple instances to enable system partitioning, improving scalability and load balancing.
- The MOM should maintain a mapping system that links:
  - Users to the queues and topics they create.
  - Queues and topics to the MOM instances where they are stored.
  - Replication details for fault tolerance.
- The MOM should be able to detect when another instance loses connection and handle failures accordingly.
- The MOM should implement a replication strategy where each local queue is backed up on another node. This ensures that if the primary instance loses connection, the backup instance can take over seamlessly.
- When looking up a queue or topic, the MOM should first attempt to retrieve the data from the original instance. If the primary instance is unavailable, it should automatically fall back to a replicated instance to retrieve the data.
- The MOM must implement a failover mechanism that, in the event of a primary instance failure, automatically redirects client requests to a backup instance holding the necessary information, ensuring service continuity without message loss or interruptions.
- The MOM must define its failback policy, specifying whether it will be:
  - Stateful: The previous client state (sessions, pending messages, etc.) is restored when the primary instance is brought back online.
  - Stateless: The client must reconnect and manually reestablish its context, as the state is not transferred.
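The primary-first lookup with automatic fallback to a replica, described in the responsibilities above, can be sketched as follows. The names `NodeUnavailable`, `fetch_with_fallback`, and the `fetch` callable are hypothetical. In a real deployment the callable would presumably wrap a gRPC call to the target instance, which is not shown here.

```python
class NodeUnavailable(Exception):
    """Raised by a fetch function when the target instance cannot be reached."""


def fetch_with_fallback(resource, primary, replicas, fetch):
    """Try the primary instance first, then each replica in order.

    `fetch(node, resource)` is a caller-supplied function (an assumption of
    this sketch) that returns the data or raises NodeUnavailable.
    Returns a (node, data) tuple naming the instance that answered.
    """
    last_error = None
    for node in [primary, *replicas]:
        try:
            return node, fetch(node, resource)
        except NodeUnavailable as error:
            last_error = error  # remember the failure and try the next node
    raise NodeUnavailable(f"no instance could serve {resource!r}") from last_error
```

Returning the node that actually answered lets the caller notice that a failover happened, which is useful when deciding how to apply the chosen failback policy once the primary returns.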
3. Implementation Decisions
- Message Reception Modes: support for pull, push, or dual mode.
- Queue/Topic Partitioning Policy: for example, hash-based partitioning.
- Metadata Coordination: use of a coordination system (a commercial example is Rkaf) or implementation of a ZooKeeper.
- Configuration Management: static or dynamic approach.
- Authentication and Authorization: for example, OAuth 2.0 with JWT, plus mTLS for internal communication.
- Replication Strategy: for example, fully synchronous, fully asynchronous, or multi-leader.
- Fault Tolerance: for example, health-check-based detection plus automatic redirection.
- Failback Policy: stateful or stateless.
- Message Storage: memory, disk, or a hybrid approach.
- Load Balancing: for example, client-side load balancing with DNS discovery.
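As an illustration of the health-check-based detection option listed above, the sketch below records the last heartbeat seen from each peer and reports peers that have been silent for longer than a timeout. The class `HeartbeatMonitor`, its method names, and the 5-second default are assumptions made for this example, not decisions taken in this document.

```python
import time


class HeartbeatMonitor:
    """Tracks peer heartbeats and flags peers that have gone silent."""

    def __init__(self, timeout_seconds=5.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = {}  # node name -> time of the last heartbeat

    def record_heartbeat(self, node):
        self._last_seen[node] = self._clock()

    def failed_nodes(self):
        # A node is considered failed once its last heartbeat is older than the timeout.
        now = self._clock()
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

The list returned by `failed_nodes()` could then drive the automatic redirection step, for example by routing requests for the affected queues and topics to their backup instances.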