System Performance
a) What Is Benchmarking?
Benchmarking is the process of systematically evaluating a system's performance using predefined tests and workloads. The goal is to quantify performance under various conditions.
Methods:
- Standardized Tests: Use predefined scenarios.
- Real-World Workloads: Simulate actual usage.
- Comparative Analysis: Compare different systems or configurations.
Types of Benchmarks:
- Microbenchmarks:
  - Evaluate specific components or functions (a minimal example follows this list).
  - Focus on granular metrics (e.g., database query latency).
  - Useful for isolating inefficiencies.
- Macrobenchmarks:
  - Assess system performance as a whole under realistic workloads.
  - Simulate user scenarios (e.g., load testing web applications).
  - Provide holistic insights but are more complex.
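As a concrete microbenchmark, the sketch below times one small function in isolation with Python's standard `timeit` module; `parse_row` is a hypothetical stand-in for whichever component is under test.

```python
import timeit

# Hypothetical component under test: parse one comma-separated row.
def parse_row(line: str) -> list[int]:
    return [int(field) for field in line.split(",")]

# Repeat the timing five times and keep the best run, which reduces
# noise from the OS scheduler and CPU frequency scaling.
runs = timeit.repeat(lambda: parse_row("1,2,3,4,5"), number=100_000, repeat=5)
best_per_call = min(runs) / 100_000
print(f"parse_row: {best_per_call * 1e6:.2f} µs per call")
```

Because the function runs in a tight loop with no I/O or concurrency, the result isolates the component's cost but says nothing about whole-system behavior; that is what macrobenchmarks add.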
Key Metrics:
- Throughput: Number of requests processed per second (RPS); the sketch after this list shows how it is computed from raw samples.
- Latency: Time taken to process a request.
- Resource Utilization: Efficiency of CPU, memory, and network use.
- Scalability: Performance under increasing load.
- Fault Tolerance: System's ability to operate during component failures.
- Consistency: Accuracy and integrity of distributed databases.
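To make the first two metrics concrete, here is a minimal sketch that derives throughput and percentile latency from raw per-request samples; the sample values and test duration below are invented for illustration.

```python
import statistics

# Hypothetical per-request latencies (in seconds) from a 10-second run.
latencies = [0.012, 0.015, 0.011, 0.050, 0.013, 0.014, 0.120, 0.013]
test_duration_s = 10.0

throughput_rps = len(latencies) / test_duration_s
p50 = statistics.median(latencies)
p99 = statistics.quantiles(latencies, n=100)[98]  # 99th percentile

print(f"throughput: {throughput_rps:.1f} RPS")
print(f"latency p50: {p50 * 1000:.1f} ms, p99: {p99 * 1000:.1f} ms")
```

Reporting a high percentile alongside the median matters because tail latency (the 0.120 s outlier here) is invisible in averages.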
Benchmarking Methodologies:
- Load Testing: Simulates high traffic to assess system limits.
- Stress Testing: Pushes the system beyond its capacity.
- Endurance Testing: Long-term performance evaluation.
- Spike Testing: Tests system response to sudden traffic surges.
- Configuration Testing: Compares performance under different settings.
- Failover Testing: Assesses recovery from component failures.
Tools and Frameworks:
- Apache JMeter: Open-source tool for web application performance testing.
- Gatling: Scala-based DSL for creating test scenarios.
- Locust: Python-based customizable load testing (a minimal scenario follows this list).
- Sysbench: Cross-platform database benchmarking tool.
- YCSB: Framework for cloud database performance testing.
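As one example of these tools in practice, a Locust scenario is an ordinary Python class. The sketch below is a minimal load test, assuming Locust is installed; the `/` and `/api/items` endpoints are placeholders for whatever the target application exposes.

```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    # Simulated think time: each virtual user waits 1-3 s between requests.
    wait_time = between(1, 3)

    @task(3)  # weighted: issued three times as often as browse_items
    def index(self):
        self.client.get("/")

    @task(1)
    def browse_items(self):
        self.client.get("/api/items")  # placeholder endpoint
```

Saved as `locustfile.py`, it can be run with `locust -f locustfile.py --host http://localhost:8080` (the host value is an assumption) and ramped up from Locust's web UI.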
b) Latency:
Latency is the measure of the time taken for a request to travel from a source to its destination and back (round-trip time), or the time elapsed between a request being initiated and a response being received. It is typically expressed in milliseconds (ms) and is critical for maintaining the responsiveness of a distributed system, especially in applications requiring real-time interactions.
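A minimal way to observe round-trip latency from the client side, assuming the target URL is reachable, is to wrap a request in a monotonic timer; `example.com` below is a placeholder.

```python
import time
import urllib.request

URL = "https://example.com/"  # placeholder target

# time.perf_counter() is monotonic and high-resolution, which makes it
# suitable for timing short elapsed intervals.
start = time.perf_counter()
with urllib.request.urlopen(URL, timeout=5) as resp:
    resp.read()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"round-trip latency: {elapsed_ms:.1f} ms")
```

Note that this single number folds together several distinct delays; the types below break it apart.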
Types of Latency:
- Network Latency:
  - The delay caused by data traveling over the network.
  - Affected by factors like distance between nodes, network congestion, routing hops, and packet loss.
- Processing Latency:
  - The time taken by the server or node to process a request.
  - Depends on CPU speed, memory availability, and algorithmic efficiency.
- Disk I/O Latency:
  - The delay in reading from or writing to storage devices.
  - Influenced by the type of storage (SSD vs. HDD) and the I/O operations per second (IOPS).
- Queueing Latency:
  - Delay caused by a backlog of requests waiting to be processed.
  - Occurs during high-load scenarios where requests outpace processing capacity.
- Application Latency:
  - Introduced by inefficiencies in the application's architecture or code, such as blocking calls or poor database queries.
Components of Latency:
- Propagation Delay:
  - The time for data to travel the physical distance between two nodes.
  - Determined by the distance and the signal's propagation speed in the medium (fiber optics, copper, etc.).
- Transmission Delay:
  - The time taken to push all bits of a packet onto the network.
  - Depends on packet size and network bandwidth.
- Processing Delay:
  - Time spent at intermediate nodes (routers, servers) analyzing the data and making forwarding decisions.
- Queuing Delay:
  - Time spent waiting in queues at intermediate nodes when traffic exceeds network capacity (these four terms are summed in the sketch after this list).
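These four components sum to the one-way delay across a link. The sketch below plugs in illustrative numbers for a single 1,000 km fiber hop; every input is an assumption, not a measurement.

```python
# Illustrative one-way delay for one link; all inputs are assumed values.
distance_m = 1_000_000        # 1,000 km link
signal_speed_mps = 2e8        # roughly 2/3 the speed of light, typical of fiber
packet_bits = 1500 * 8        # one 1500-byte frame
bandwidth_bps = 100e6         # 100 Mbps link

propagation_s = distance_m / signal_speed_mps   # distance / signal speed
transmission_s = packet_bits / bandwidth_bps    # packet size / bandwidth
processing_s = 50e-6                            # assumed router lookup time
queuing_s = 200e-6                              # assumed queue wait under load

total_ms = (propagation_s + transmission_s + processing_s + queuing_s) * 1000
print(f"one-way delay: {total_ms:.2f} ms")  # 5.37 ms, dominated by propagation
```

On long links propagation dominates (5 ms here versus 0.12 ms of transmission delay), which is why adding bandwidth alone does not reduce cross-continental latency.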
c) Throughput:
Throughput measures the capacity of a system to handle a certain amount of work in a given period. In distributed systems, it is typically expressed as the number of requests, tasks, or data units processed per second (RPS, TPS, or Mbps). Throughput provides insights into how well a system handles load and is a critical metric for evaluating its performance, scalability, and resource efficiency.
Types of Throughput:
- Request Throughput:
  - Measures the number of API calls, database queries, or other user requests completed per second.
  - Relevant for web applications and microservices.
- Data Throughput:
  - Measures the amount of data processed in a given time, typically in megabytes per second (MBps) or gigabytes per second (GBps).
  - Common in file processing, streaming systems, and data pipelines.
- Task Throughput:
  - The number of jobs, transactions, or tasks completed per second in batch systems or computational workflows.
Factors Affecting Throughput:
- Hardware Constraints:
  - CPU Speed: Limits the number of computations per second.
  - Memory: Insufficient memory can lead to swapping and reduce throughput.
  - Disk I/O: Slow storage can become a bottleneck.
- Network Bandwidth: The maximum data transfer rate of the network limits throughput.
- Concurrency: A system's ability to handle parallel tasks impacts its throughput; insufficient thread pools or connection limits reduce concurrency (see the sketch after this list).
- Load Balancing: Uneven distribution of tasks among servers can lower overall throughput.
- Application Efficiency: Inefficient algorithms, suboptimal database queries, or high contention for shared resources can degrade throughput.
- Latency: High latency reduces throughput because requests spend more time in the system.
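To see the concurrency factor in isolation, the sketch below runs the same I/O-bound task, simulated with `time.sleep` as a stand-in for a real network or database call, under different thread-pool sizes and reports the measured task throughput.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request() -> None:
    time.sleep(0.05)  # stand-in for a 50 ms network or database call

def measure_throughput(workers: int, tasks: int = 100) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every task and block until all of them finish.
        list(pool.map(lambda _: fake_request(), range(tasks)))
    return tasks / (time.perf_counter() - start)

for workers in (1, 5, 20):
    print(f"{workers:>2} workers: {measure_throughput(workers):.0f} tasks/s")
```

With a 50 ms task, throughput scales almost linearly with worker count (roughly 20, 100, and 400 tasks/s) until some other resource, such as CPU, bandwidth, or the remote server, saturates.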
Throughput vs. Latency:
- Throughput measures the number of tasks completed in a given time.
- Latency measures the time taken to complete a single task.
- A high-throughput system may still have high latency if tasks spend time waiting in queues, while a low-latency system may process fewer tasks overall; Little's Law, sketched below, ties the two together.
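Under the usual steady-state assumption, Little's Law makes the relationship precise: the average number of requests in flight equals throughput multiplied by average latency. A minimal sketch with assumed numbers:

```python
# Little's Law (steady state): in_flight = throughput * latency
throughput_rps = 200.0   # assumed: 200 requests complete per second
avg_latency_s = 0.050    # assumed: 50 ms average time in the system

in_flight = throughput_rps * avg_latency_s
print(f"average requests in flight: {in_flight:.0f}")  # 10

# Rearranged: doubling throughput at the same latency requires the
# system to sustain twice the concurrency (threads, connections, workers).
```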
Example of Throughput:
- A web application processes 10,000 requests per minute during peak hours. The throughput is 10,000 requests / 60 seconds = 166.67 RPS.
- A data pipeline ingests and processes 50 GB of logs per hour. The throughput is 50 GB / 3600 seconds = ~13.89 MBps.
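Both calculations can be reproduced directly (using decimal units, 1 GB = 1000 MB):

```python
web_rps = 10_000 / 60              # 10,000 requests per minute
pipeline_mbps = 50 * 1000 / 3600   # 50 GB per hour

print(f"web app:  {web_rps:.2f} RPS")         # 166.67 RPS
print(f"pipeline: {pipeline_mbps:.2f} MBps")  # 13.89 MBps
```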