Design decision - kristianrpo/mom-grpc-microservices
To ensure a scalable and resilient distributed system, we made key technology choices:
1. FastAPI for API Gateway
- High Performance: FastAPI runs on ASGI (Asynchronous Server Gateway Interface), so one worker can serve many concurrent requests while each of them waits on I/O.
- REST to gRPC Translation: Simplifies exposing gRPC microservices to REST clients with automatic OpenAPI documentation.
- Async Support: Route handlers can await our async gRPC calls directly, so the gateway is not blocked while a microservice responds, which improves throughput.
- Easy Prototyping: Rapid development with Python, reducing boilerplate code while maintaining robustness.
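The throughput benefit of async handlers can be illustrated with a minimal sketch that uses only asyncio. It assumes a hypothetical `call_microservice` coroutine standing in for an awaited gRPC stub call (for example one made through `grpc.aio`), with a sleep simulating the network round trip. The handler name and response shape are illustrative and not taken from the project's code.

```python
import asyncio
import time


async def call_microservice(name: str, a: float, b: float) -> float:
    """Hypothetical stand-in for an awaited gRPC stub call."""
    await asyncio.sleep(0.1)  # simulated network round trip
    operations = {"sum": a + b, "subtraction": a - b, "multiplication": a * b}
    return operations[name]


async def handle_request(name: str, a: float, b: float) -> dict:
    """Shape of a gateway handler: await the backend, return a JSON-ready dict."""
    result = await call_microservice(name, a, b)
    return {"operation": name, "result": result}


async def main() -> tuple:
    start = time.perf_counter()
    # Three requests are in flight at once; the event loop interleaves their waits.
    responses = await asyncio.gather(
        handle_request("sum", 2, 3),
        handle_request("subtraction", 2, 3),
        handle_request("multiplication", 2, 3),
    )
    return responses, time.perf_counter() - start


responses, elapsed = asyncio.run(main())
print(responses, f"{elapsed:.2f}s")
```

Because the three simulated calls overlap, the total time stays close to one round trip rather than three. In a FastAPI application the same `await` would sit inside an `async def` route function.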
2. Redis as Storage
Our Message-Oriented Middleware (MOM) service stores pending tasks and responses in Redis. We chose Redis to power the MOM task queue because it delivers:
- Speed: Redis keeps data in RAM, so reads and writes typically complete in well under a millisecond, which keeps latency low for task queuing and retrieval.
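- Optimized for Queue Operations:
  - RPUSH/LPOP (FIFO task processing: push to the tail, pop from the head)
  - SETEX (auto-expiring responses)
  - Each command executes atomically, so two consumers never pop the same task. A sequence of several commands is not atomic unless it is wrapped in a transaction (MULTI/EXEC) or a Lua script.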
- No Complex Broker Setup: Unlike RabbitMQ/Kafka, Redis requires minimal configuration—just a running instance.
- TTL-Based Cleanup: Keys written with SETEX expire automatically, so responses left behind by orphaned tasks do not accumulate in memory.
- Horizontal Scaling: Redis Cluster shards keys across nodes to spread load. A single list lives on one node, so scaling one queue further means splitting it across several keys.
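The queue semantics described above can be sketched without a running Redis server. The class below is a small in-memory model, not redis-py: its method names mirror the Redis commands, the clock is injectable so expiry can be shown deterministically, and the key names (`pending:sum`, `response:task-1`) are hypothetical. A real redis-py client also returns bytes unless configured to decode responses.

```python
import time
from collections import deque
from typing import Callable, Optional


class InMemoryQueueStore:
    """Tiny in-memory model of RPUSH, LPOP, SETEX and GET (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lists = {}   # key -> deque of values
        self._values = {}  # key -> (value, expiry deadline)

    def rpush(self, key: str, value: str) -> int:
        """Append to the tail of the list, like RPUSH; returns the new length."""
        queue = self._lists.setdefault(key, deque())
        queue.append(value)
        return len(queue)

    def lpop(self, key: str) -> Optional[str]:
        """Remove from the head of the list, like LPOP; None when empty."""
        queue = self._lists.get(key)
        return queue.popleft() if queue else None

    def setex(self, key: str, ttl_seconds: float, value: str) -> None:
        """Store a value that disappears once ttl_seconds have passed."""
        self._values[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._values.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if self._clock() >= deadline:
            del self._values[key]  # lazily drop the expired key
            return None
        return value


now = [0.0]  # fake clock so the example does not have to sleep
store = InMemoryQueueStore(clock=lambda: now[0])
store.rpush("pending:sum", "task-1")
store.rpush("pending:sum", "task-2")
first = store.lpop("pending:sum")        # the oldest task comes out first
store.setex("response:task-1", 60, "5")  # response kept for 60 seconds
fresh = store.get("response:task-1")
now[0] = 61.0                            # advance past the TTL
expired = store.get("response:task-1")
```

Pushing at the tail and popping at the head gives FIFO order, and the response vanishes on its own once its TTL has elapsed, which is the cleanup behavior the list above relies on.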
3. Docker Swarm Manager-Worker Architecture
We designed our cluster with a clear manager-worker separation:
3.1. Roles Defined
- Manager Node (API Gateway):
  - Runs the Swarm control plane (orchestrates workers).
  - Hosts the API Gateway service (routes client requests to workers).
  - Handles service discovery and load balancing.
- Worker Nodes (Microservices):
  - Execute stateless microservices (multiplication, subtraction, sum, MOM).
  - Scale horizontally (e.g., microservice-sum: replicas: 2).
  - Report status to the manager.
  - Auto-register with Swarm’s internal load balancer for traffic distribution.
We chose this design because it delivers:
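- Fault Isolation: Containers already running on workers keep serving if the manager goes down. Swarm can elect a new leader only when the cluster has several managers and a majority of them remain available.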
- Scalability: Workers focus on processing; the manager handles routing.
- Security: Only the manager needs Swarm control plane access; workers only run the tasks assigned to them.
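How much manager failure the cluster can absorb depends on the number of manager nodes, because Swarm managers coordinate through Raft and need a majority to stay available. The arithmetic can be sketched as follows; the function names are illustrative.

```python
def raft_quorum(managers: int) -> int:
    """Smallest majority of managers that must stay reachable."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_manager_failures(managers: int) -> int:
    """Managers that can be lost while a majority still remains."""
    return managers - raft_quorum(managers)


for n in (1, 3, 5):
    print(f"managers={n} quorum={raft_quorum(n)} "
          f"tolerated_failures={tolerated_manager_failures(n)}")
```

With a single manager, as in the layout described here, no manager failure is tolerated for orchestration purposes, although services already running on the workers continue to run. Three managers tolerate the loss of one.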
4. Unified Development with Python
To streamline development and ensure consistency across the system, we chose Python as the programming language for all major components: the client, API Gateway, each microservice, and the MOM (Message-Oriented Middleware) service. This decision was based on several factors:
- Team Expertise: Python is well-known and widely used within our team, reducing the learning curve and increasing productivity.
- Simplicity and Readability: Python’s clean syntax and vast ecosystem allow for rapid development without sacrificing maintainability.
- Seamless Communication: Python supports both REST (via FastAPI) and gRPC (via grpcio), making it ideal for building services that expose and consume APIs efficiently.
- Rich Ecosystem: Python’s mature libraries and tools (e.g., asyncio, FastAPI, grpcio, redis-py) simplified implementation of asynchronous communication, service orchestration, and data handling.
This unified tech stack not only accelerated development but also facilitated debugging, testing, and onboarding new contributors.