System performance analysis
1. Response Time
Processing times were measured for tasks executed both in a local environment and on AWS, using the log files stored per client_id. Measurements were taken while the services were available and also after forcibly terminating their containers to simulate failure scenarios. These experiments yielded the following results:
For tasks that were processed almost immediately, that is, while the service was available, the graph shows highly efficient system behavior. The median is very close to 0.02 seconds, indicating that more than 50% of the requests were handled almost instantly, which reflects a fast and responsive system under normal conditions. Additionally, the interquartile box sits very close to the X-axis, indicating low dispersion between the 25th and 75th percentiles and confirming stable overall performance. However, some outliers appear above the one-second mark, suggesting that certain requests experienced unusual delays. These anomalies could be attributed to brief load spikes, network latency, or specific components responding more slowly than expected.
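As an illustration of how these statistics can be derived from the per-client logs, the sketch below computes the median, interquartile range, and outlier count with the Python standard library. The log location and line format are assumptions, not the repository's actual layout; the parsing would need to be adapted to the real log format.

```python
# Sketch: summarizing per-client response times from log files.
# Assumed layout (hypothetical): logs/<client_id>.log, where each line
# ends with the measured processing time, e.g. "task_42 completed in 0.0213".
import re
import statistics
from pathlib import Path

OUTLIER_THRESHOLD = 1.0  # seconds; matches the >1 s outliers discussed above
TIME_PATTERN = re.compile(r"completed in ([0-9]*\.?[0-9]+)")

def load_times(log_path: Path) -> list[float]:
    """Extract one response time (in seconds) per matching log line."""
    return [float(m.group(1))
            for line in log_path.read_text().splitlines()
            if (m := TIME_PATTERN.search(line))]

def summarize(times: list[float]) -> str:
    q1, median, q3 = statistics.quantiles(times, n=4)  # 25th/50th/75th percentiles
    outliers = sum(1 for t in times if t > OUTLIER_THRESHOLD)
    return (f"median={median:.4f}s  IQR=[{q1:.4f}, {q3:.4f}]s  "
            f"outliers(>{OUTLIER_THRESHOLD}s)={outliers}")

if __name__ == "__main__":
    for log_file in Path("logs").glob("*.log"):
        print(log_file.stem, summarize(load_times(log_file)))
```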
For tasks sent while the service was unavailable, triggering the failover mechanism, the graph reveals a clear pattern: all response times cluster around 5 seconds, with a slightly lower median of approximately 4.98 seconds. The interquartile box is very compact, indicating highly consistent response times during failover. This suggests that although the system responds more slowly in these cases, it does so predictably. The behavior can be explained by AWS's automatic service recovery logic: when a failure is detected, the system redirects the request or launches a new container/instance, which adds a consistent but controlled delay. It is also important to note that the client sends requests every 5 seconds, which directly shapes these results: many requests likely arrive just as the service has recovered, producing latencies that closely match the polling interval and reinforcing the perceived uniformity of failover response times. The absence of outliers or sudden variations further supports the conclusion that the failover mechanism works correctly, albeit with a noticeable impact on latency. A minimal sketch of this retry timing follows.
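The clustering around the polling interval can be reproduced with a small retry sketch. The function and exception type below are illustrative assumptions, not the project's actual client code:

```python
# Sketch: why failover latencies cluster near the 5-second polling interval.
import time

POLL_INTERVAL = 5.0  # seconds between client attempts, as described above

def send_with_failover(call, max_attempts: int = 3):
    """Attempt the request; on failure, wait one interval and retry.

    If the first attempt fails and the service recovers during the wait,
    the total latency lands near POLL_INTERVAL, matching the tight ~5 s
    cluster observed in the failover measurements.
    """
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()  # e.g. a gRPC stub invocation
            print(f"succeeded on attempt {attempt} "
                  f"after {time.monotonic() - start:.2f}s")
            return result
        except ConnectionError:
            time.sleep(POLL_INTERVAL)
    raise RuntimeError("service did not recover within the retry budget")
```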
1.1. Requests per Minute
Given that the average response time for immediately processed tasks is approximately 0.2912 seconds (noticeably higher than the ~0.02-second median because the outliers above one second pull the mean up), a single client could ideally issue about 206 sequential requests per minute. This throughput is further improved by the use of threads to track tasks in the background, allowing concurrent execution without blocking the terminal. As a result, multiple tasks can be managed simultaneously, improving the system's responsiveness under high-concurrency scenarios.
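As a quick sanity check, the per-minute bound follows directly from the measured mean:

$$
\frac{60\ \text{s/min}}{0.2912\ \text{s/request}} \approx 206\ \text{requests/min}
$$

The background-tracking pattern can be illustrated with a short sketch, assuming a hypothetical poll_task_status helper; the project's actual client code may differ:

```python
# Sketch: tracking a submitted task in a background (daemon) thread so the
# terminal stays free for new requests. poll_task_status is a hypothetical
# stand-in for the client's real status-polling logic.
import threading
import time

def poll_task_status(task_id: str) -> None:
    """Placeholder: poll the service until the task finishes."""
    time.sleep(5)  # stand-in for the real polling loop
    print(f"task {task_id} finished")

def track_in_background(task_id: str) -> threading.Thread:
    worker = threading.Thread(target=poll_task_status,
                              args=(task_id,), daemon=True)
    worker.start()  # returns immediately, so the terminal is not blocked
    return worker
```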
2. Scalability
Given that the entire system is deployed on AWS using Docker containers orchestrated through Docker Swarm, with multiple replicas per microservice and an internal load balancer that distributes incoming requests, we can confidently say that the project presents a highly scalable architecture. Moreover, the ability to self-heal (replicas are automatically restarted in case of failure) ensures high availability and resilience, which are essential features for distributed applications expected to handle variable workloads efficiently. A minimal sketch of such a deployment configuration is shown below.
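As an illustration of that setup, a minimal Docker Swarm stack definition with a replicated, self-healing service might look like the following; the service name, image, and replica count are placeholders, not the project's actual configuration:

```yaml
# Sketch: one replicated, self-healing service under Docker Swarm.
# Deploy with: docker stack deploy -c docker-compose.yml mom_stack
version: "3.8"
services:
  task-service:                          # placeholder microservice name
    image: example/task-service:latest   # placeholder image
    deploy:
      replicas: 3                        # multiple replicas per microservice
      restart_policy:
        condition: on-failure            # self-healing: failed replicas restart
networks:
  default:
    driver: overlay                      # Swarm-managed network between services
```

With the default VIP endpoint mode, Swarm resolves the service name to a virtual IP and balances connections across the healthy replicas, which corresponds to the internal load balancing described above.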