Page Cache Vs Disk vs Zero Copy Transfer - srivalligade04/ConfluentExamPreparationNotes GitHub Wiki
1. Page Cache vs. Disk
- Page Cache (in RAM)
- When Kafka writes data, the write lands first in the OS page cache rather than going straight to the physical disk.
- The page cache holds recently written or read data in memory, allowing fast access without hitting the physical disk.
- Kafka relies on the OS to flush this data to disk asynchronously.
- Disk
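- Eventually, the OS flushes the page cache to persistent storage (e.g., SSD or HDD).
- Kafka can force data to disk with fsync when log.flush.interval.messages or log.flush.interval.ms is set. By default these thresholds are effectively disabled, so Kafka leaves flushing to the OS and relies mainly on replication for durability.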
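The difference between a write that only reaches the page cache and one that is forced to disk can be sketched as follows. This is an illustration, not Kafka's code: Kafka runs on the JVM, Python is used here only for brevity, and the names `append_record` and `force_to_disk` are hypothetical.

```python
import os
import tempfile


def append_record(fd: int, record: bytes, force_to_disk: bool = False) -> int:
    """Write `record` to the file behind `fd`; optionally fsync it."""
    # os.write() returns once the bytes are in the OS page cache (RAM).
    written = os.write(fd, record)
    if force_to_disk:
        # os.fsync() blocks until the kernel has pushed the file's dirty
        # pages to the storage device.
        os.fsync(fd)
    return written


def demo() -> bytes:
    fd, path = tempfile.mkstemp(suffix=".log")
    try:
        append_record(fd, b"event-1\n")                      # page cache only
        append_record(fd, b"event-2\n", force_to_disk=True)  # page cache + fsync
        # A read issued right after the writes is served from the page cache.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.remove(path)
```

Both records are readable immediately in either case. The fsync call only changes whether the data would survive a sudden power loss, and it costs extra latency on every call.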
2. Zero-Copy Transfer in Kafka
Kafka uses zero-copy to efficiently send data from disk to the network socket without copying it into user space.
How It Works:
- Kafka stores messages in log segments on disk.
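- When a consumer fetches data, Kafka uses the sendfile() system call (on the JVM, through FileChannel.transferTo()).
- sendfile() moves data from the page cache straight to the network socket inside the kernel, bypassing user space. Data that is not yet cached is first read from disk into the page cache.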
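The steps above can be sketched in a few lines. This is a minimal illustration in Python, not Kafka's implementation. A local socket pair stands in for the broker-to-consumer connection, and `send_segment_zero_copy` and `demo` are hypothetical names. Python's `socket.sendfile()` uses `os.sendfile()` where the platform supports it and falls back to ordinary `send()` calls otherwise.

```python
import os
import socket
import tempfile


def send_segment_zero_copy(path: str, sock: socket.socket) -> int:
    """Send the whole file at `path` over `sock`; return the bytes sent."""
    with open(path, "rb") as segment:
        # The file contents are handed to the socket by the kernel where
        # sendfile is available, so they never enter a Python buffer.
        return sock.sendfile(segment)


def demo():
    payload = b"record-" * 100  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    broker_side, consumer_side = socket.socketpair()
    try:
        sent = send_segment_zero_copy(path, broker_side)
        broker_side.close()  # signals end-of-stream to the reader
        received = b""
        while True:
            chunk = consumer_side.recv(4096)
            if not chunk:
                break
            received += chunk
        return sent, received
    finally:
        broker_side.close()
        consumer_side.close()
        os.remove(path)
```

Calling `demo()` returns the number of bytes sent together with the bytes the receiving side read, which should match the file contents.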
Benefits:
- Reduces CPU usage (fewer context switches between user and kernel mode).
- Minimizes memory copies.
- Improves throughput for high-volume data transfer.
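For contrast, the conventional copy-based path that zero-copy avoids looks roughly like the sketch below. Each chunk is copied from the kernel into a user-space buffer and then back into the kernel's socket buffer, and each `read`/`send` pair crosses the user-kernel boundary. The function names and the chunk size are illustrative assumptions.

```python
import os
import socket
import tempfile


def send_with_copies(path: str, sock: socket.socket, chunk_size: int = 4096) -> int:
    """Copy-based transfer: every chunk passes through a user-space buffer."""
    total = 0
    # buffering=0 makes each read() map to one read system call.
    with open(path, "rb", buffering=0) as segment:
        while True:
            chunk = segment.read(chunk_size)  # page cache -> user-space buffer
            if not chunk:
                break
            sock.sendall(chunk)               # user-space buffer -> socket buffer
            total += len(chunk)
    return total


def demo_copy_path():
    payload = b"record-" * 100  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_with_copies(path, sender, chunk_size=256)
        sender.close()  # signals end-of-stream to the reader
        received = b""
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
        return sent, received
    finally:
        sender.close()
        receiver.close()
        os.remove(path)
```

The result is the same bytes on the wire. The saving from sendfile() comes from skipping the two intermediate copies and the extra system calls per chunk.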