Linux Understanding Memory: - samit/angular-site GitHub Wiki

How Memory Is Used by the Linux Kernel:

Pages, Paging Unit and Page Frames: Memory in Linux is organized in pages (typically 4 KB in size). Contiguous linear addresses within a page are mapped onto contiguous physical addresses in RAM, so the kernel maps addresses at page granularity rather than for every individual linear address. The term page refers both to a set of linear addresses and to the data stored at those addresses. The paging unit treats all physical RAM as partitioned into fixed-length page frames; each page frame holds one page, and the paging unit is responsible for translating linear addresses into physical ones. A key job of the paging unit is to check the requested access type against the access rights of the linear address. If the access is not valid, a Page Fault exception is generated. The data structures that map linear addresses to physical ones are called page tables; they are stored in main memory and must be initialized by the kernel before the paging unit is enabled. Page frames are generally 4 KB in size, which may not suit applications whose expected data units are large; some architectures therefore also support larger pages (for example, 4 MB on 32-bit x86). The kernel reserves the page frames that contain kernel code and initialized data structures, as well as those that fall in unavailable physical address ranges. The remaining portion is called dynamic memory.
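To make the page-level mapping concrete, the sketch below splits a linear address into a page number and an offset within that page. The function name page_of is hypothetical, and the example assumes a 4096-byte page; without a second argument the function falls back to whatever getconf PAGESIZE reports.

```shell
# page_of ADDRESS [PAGE_SIZE]
# Hypothetical helper: print which page an address falls in and the
# offset inside that page. PAGE_SIZE defaults to the system page size.
page_of() {
    addr=$1
    size=${2:-$(getconf PAGESIZE)}
    echo "page=$((addr / size)) offset=$((addr % size))"
}

# Address 8200 with 4 KB pages: 8200 = 2 * 4096 + 8
page_of 8200 4096    # prints: page=2 offset=8
```

Only the page number needs to be translated by the page tables; the offset is carried over unchanged into the page frame.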

Allocation of Memory to Processes: Kernel functions obtain dynamic memory in a fairly direct way, but allocating memory to a user-mode process is different. When a user-mode process asks for dynamic memory, no additional page frames are assigned immediately; instead, the process is authorized to use a new range of linear addresses, which becomes part of its address space. This interval is known as a memory region and consists of a range of linear addresses representing one or more page frames. A process may get new memory regions under the following conditions:

- The running process loads an entirely different program using exec().
- The running process performs a memory mapping on a file.
- The process expands its dynamic area (the heap) through malloc().
- The process keeps adding data to its user-mode stack; once all the addresses mapped to the stack are in use, the kernel may decide to expand that memory region.
- The process creates an IPC shared memory region to share data with cooperating processes.
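The memory regions of a process can be inspected in /proc/pid/maps, where each line describes one region. The following is a minimal sketch; summarize_regions is a hypothetical helper that counts the regions and reports the address ranges of the heap and the stack from maps-format text on standard input.

```shell
# Hypothetical helper: summarize maps-format text read from stdin.
# Each input line is one memory region; the sixth column, when present,
# names the backing file or a special region such as [heap] or [stack].
summarize_regions() {
    awk '
        { n++ }
        $6 == "[heap]"  { heap = $1 }
        $6 == "[stack]" { stack = $1 }
        END {
            printf "regions=%d heap=%s stack=%s\n", n, (heap ? heap : "none"), (stack ? stack : "none")
        }'
}

# Regions of the current shell, if /proc is available
if [ -r /proc/self/maps ]; then
    summarize_regions < /proc/self/maps
fi
```

Running it before and after a program allocates memory or maps a file is one way to watch regions appear or grow.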

Demand Paging: Demand paging is a dynamic memory allocation technique that defers page frame allocation until the last possible moment, that is, until the process attempts to address a page that is not present in RAM and thereby causes a Page Fault exception. The motivation is that processes do not touch all the addresses in their address space right from the start, and some of those addresses are never used at all. Furthermore, the program locality principle ensures that at each stage of execution only a small subset of the process's pages is actually referenced. Page frames that would otherwise back unused pages can therefore be used by other processes. Demand paging thus increases the average number of free page frames and makes better use of the available memory.
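One rough way to see demand paging at work is to compare how much address space a process has been granted (VmSize) with how much of it is actually backed by page frames (VmRSS), both reported in /proc/pid/status. The sketch below assumes status-format input on stdin; resident_share is a hypothetical name.

```shell
# Hypothetical helper: read /proc/<pid>/status-format text from stdin and
# report what share of the virtual size is currently resident in RAM.
resident_share() {
    awk '
        $1 == "VmSize:" { size = $2 }
        $1 == "VmRSS:"  { rss = $2 }
        END {
            if (size > 0)
                printf "VmSize=%d kB VmRSS=%d kB resident=%d%%\n", size, rss, rss * 100 / size
        }'
}

# The current shell, if /proc is available
if [ -r /proc/self/status ]; then
    resident_share < /proc/self/status
fi
```

For most processes the resident share is well below 100%, which reflects the pages that were granted but have not been touched yet (or have been reclaimed since).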

Overcommitting Memory: Linux allows memory to be overcommitted. For example, if your system has 4 GB of RAM and five processes each ask for 1 GB, the kernel can grant all 5 GB without any error. Whether and how far this is allowed depends on the vm.overcommit_memory and vm.overcommit_ratio settings. Because the default settings allow overcommitting, it is helpful to be able to identify cases where processes actually use all the memory they were granted and no space is left.
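As an illustration of how these two settings interact: with vm.overcommit_memory set to 2 (strict accounting), the kernel caps the total commitment at swap plus overcommit_ratio percent of RAM. The sketch below computes that limit; commit_limit_kb is a hypothetical helper, and the formula is a simplification that ignores huge pages and the alternative vm.overcommit_kbytes setting.

```shell
# commit_limit_kb RAM_KB SWAP_KB RATIO
# Hypothetical helper: approximate commit limit under strict accounting
# (vm.overcommit_memory = 2): swap + RAM * ratio / 100.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# 4 GB of RAM, no swap, ratio 50: only 2 GB may be committed
commit_limit_kb 4194304 0 50    # prints: 2097152

# Current settings (0 = heuristic, 1 = always allow, 2 = strict)
if [ -r /proc/sys/vm/overcommit_memory ]; then
    echo "overcommit_memory=$(cat /proc/sys/vm/overcommit_memory)"
    echo "overcommit_ratio=$(cat /proc/sys/vm/overcommit_ratio)"
fi
```

The kernel's own figures are exposed as CommitLimit and Committed_AS in /proc/meminfo, which is the place to check when deciding whether the system is close to its limit.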

Page Faults and Swapping:

A page fault occurs when a process accesses an address that belongs to its address space but has no page frame assigned to it yet. Swapping occurs when the kernel needs to allocate a page frame and finds that no free memory is available. In this case the kernel swaps out the least recently used pages of existing processes to disk and allocates the freed frames to the requesting process.

Types of Page Fault: Minor Fault: A minor fault occurs when the page is already loaded in memory at the time the fault is generated but is not marked by the memory management unit as being loaded for the faulting process. One possible scenario is memory shared between programs, where the page has already been brought into memory on behalf of another program.

Major Fault: A major fault occurs when the page is not loaded in memory at the time the fault is generated. Major faults are expensive because they add disk latency to the interrupted program's execution. The page fault handler resolves a major fault by finding a free page frame (or choosing one already in use to evict), reading the data into it, pointing the memory management unit's entry for the page at that frame, and marking the page as loaded in memory.

Invalid Page Fault: This type of fault occurs when a request refers to an address that is not part of the process's virtual address space. The page fault handler generally terminates the code that made the invalid request (on Linux, typically by delivering SIGSEGV to the process).

Linux Page Cache: The Linux kernel tries to use as much of the available dynamic memory as possible for the page cache, because it can reclaim page frames from the page cache when they are needed. Page reclaiming uses an LRU algorithm. The kernel consults the page cache whenever it reads from or writes to disk. On a read, if the requested pages are not already in the cache, new entries are added and filled with data from disk to satisfy the user-mode read request. If the system has enough free memory, pages are kept in the cache indefinitely so that they can later be reused by other processes. Similarly, before writing a page of data to a block device, the kernel checks whether the page is already in the cache; if it is not, a new entry is added and filled with the data to be written to disk.
When system load is low, RAM is mostly filled by disk caches; when load is high, RAM is mostly filled by process pages, and the caches are shrunk to make room for additional processes.

Page Frame Reclaiming Algorithm (PFRA): The PFRA picks page frames and makes them free, handling each frame differently depending on its content. It is activated either when the kernel detects a low-memory condition or periodically by a kernel thread (kswapd) that reclaims memory in the background. The kernel divides memory into memory used by processes, disk cache, free memory and memory used by the kernel itself. It continually sorts pages into active and inactive lists based on whether they have been accessed recently. When memory is low, pages are reclaimed first from the inactive list and then, if necessary, from the active list. Only syncable, swappable and discardable pages can be reclaimed. If a page is dirty, it is written out to disk before it is reclaimed. If a page belongs to a user-mode process, it is written out to swap; whether the kernel prefers swapping out process pages or shrinking the disk cache depends on the vm.swappiness setting.
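The sizes of the active and inactive lists, and the current swappiness value, can be read from /proc. The sketch below is one way to do so; lru_lists is a hypothetical helper that filters meminfo-format text on stdin, and the exact set of Active/Inactive lines depends on the kernel version.

```shell
# Hypothetical helper: print the active/inactive list sizes from
# /proc/meminfo-format text on stdin.
lru_lists() {
    awk '/^(Active|Inactive)/ { printf "%s %d kB\n", $1, $2 }'
}

if [ -r /proc/meminfo ]; then
    lru_lists < /proc/meminfo
fi

# Higher swappiness makes the kernel more willing to swap out process
# pages instead of shrinking the disk cache (the usual default is 60).
if [ -r /proc/sys/vm/swappiness ]; then
    echo "swappiness=$(cat /proc/sys/vm/swappiness)"
fi
```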

OOM: If few syncable and discardable pages remain and the swap space is full, the OOM (out of memory) killer is invoked: the kernel heuristically chooses a process to kill so that it can reclaim that process's memory. The OOM killer selects its victim according to several criteria:

- The process should not be owned by root.
- The process should not be directly accessing hardware devices.
- The process cannot be init, swapper or a kernel thread.
- The process should own a large number of page frames, so that a significant amount of memory can be freed.
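The kernel exposes the result of its heuristic per process in /proc/pid/oom_score: the higher the score, the more likely the process is to be chosen. A minimal sketch for ranking processes by that score follows. The function name oom_rank is hypothetical, and its optional argument (an alternative proc root) exists only so the function can be tried against a test directory.

```shell
# oom_rank [PROC_ROOT]
# Hypothetical helper: list "score pid name" for every process,
# highest OOM score first. PROC_ROOT defaults to /proc.
oom_rank() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process may exit while we are reading; skip it in that case
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$score ${dir##*/} $name"
    done | sort -rn
}

# The five most likely OOM victims right now
oom_rank | head -n 5
```

A process can be made more or less attractive to the OOM killer by writing a value between -1000 and 1000 to /proc/pid/oom_score_adj; -1000 exempts it entirely.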

Drop Cache: Linux kernel 2.6.16 and higher provides a mechanism to drop the page cache, dentry cache and inode cache, which can free up a lot of memory. This is a non-destructive operation: it frees only clean, unused objects and never dirty ones, so running sync first increases the amount that can be freed. The commands below must be run as root. To free the page cache: echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes: echo 2 > /proc/sys/vm/drop_caches

To free dentries, inodes and the page cache: echo 3 > /proc/sys/vm/drop_caches

Memory Measurement Tools and Commands:

Benchmarking: To optimize memory usage on a Linux system, we first need to measure system activity. Commands like vmstat and sar, together with the /proc file system, help to explore and investigate memory issues.

The /proc/meminfo file provides overall memory statistics, including total, used, free, active and inactive memory, and more. Screenshot attached below.
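For a quick text summary of the same file, a few fields can be pulled out with awk. This is a minimal sketch: mem_summary is a hypothetical helper reading meminfo-format text from stdin, and it assumes a kernel recent enough to report MemAvailable (older kernels will show 0 for that field).

```shell
# Hypothetical helper: condense /proc/meminfo-format text from stdin
# into one line, converting kB to MiB.
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Cached:"       { cached = $2 }
        END {
            printf "total=%d MiB free=%d MiB available=%d MiB cached=%d MiB\n", total / 1024, free / 1024, avail / 1024, cached / 1024
        }'
}

if [ -r /proc/meminfo ]; then
    mem_summary < /proc/meminfo
fi
```

A low "free" figure alone is not a sign of trouble, since the page cache counts as used memory but can be reclaimed; "available" is the better estimate of what new processes can get.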

Similarly, to identify per-process memory and swap usage, /proc/pid/status is more helpful. In the screenshot below, VmRSS and VmSwap are the physical memory and swap currently used by the process. VmHWM (the "high water mark") is the peak resident memory the process has used so far.
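The three fields can also be extracted directly. The sketch below uses a hypothetical helper, proc_mem, that filters status-format text from stdin; replace /proc/self with /proc/PID to look at another process.

```shell
# Hypothetical helper: print peak RSS, current RSS and swap usage from
# /proc/<pid>/status-format text on stdin, in the order they appear.
proc_mem() {
    awk '$1 == "VmHWM:" || $1 == "VmRSS:" || $1 == "VmSwap:" { printf "%s %d kB\n", $1, $2 }'
}

# The current shell, if /proc is available
if [ -r /proc/self/status ]; then
    proc_mem < /proc/self/status
fi
```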

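To identify major and minor faults per process, read /proc/pid/stat (the minflt and majflt fields). Please refer to man proc for further details. The /proc/pid/statm file also lists a dirty-pages field, but man proc documents it as unused since Linux 2.6 (always 0); per-mapping dirty page sizes are reported in /proc/pid/smaps instead.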
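According to man proc, minflt is field 10 and majflt is field 12 of /proc/pid/stat. Because the command name in field 2 may itself contain spaces, a robust parser first strips everything up to the closing parenthesis. The following sketch does that; fault_counts is a hypothetical name.

```shell
# Hypothetical helper: print minor and major fault counts from
# /proc/<pid>/stat-format text on stdin.
fault_counts() {
    # Drop "pid (comm) " so that field 1 is the state (original field 3);
    # minflt (field 10) and majflt (field 12) then become fields 8 and 10.
    sed 's/^.*) //' | awk '{ printf "minflt=%s majflt=%s\n", $8, $10 }'
}

# The current shell, if /proc is available
if [ -r /proc/self/stat ]; then
    fault_counts < /proc/self/stat
fi
```

Both counters are cumulative since the process started, so sampling them twice and taking the difference gives the fault rate over an interval.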

Commands like top and ps are handy when you are trying to debug memory usage per process. For example, ps -p PID -o %mem outputs the percentage of physical memory used by the given process, while top provides more verbose information.

The /proc/vmstat file provides virtual memory statistics from the kernel. Similarly, /proc/slabinfo provides verbose information about memory usage at the slab level; the slabtop command can be used for this too. Finally, /proc/pid/smaps shows how a process's memory is distributed across its libraries, data and program code, and what portion of it is shared.
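Because smaps reports the same set of fields for every mapping, per-process totals are obtained by summing one field across all mappings. A minimal sketch, with smaps_total as a hypothetical helper that takes the field name as its argument and reads smaps-format text from stdin:

```shell
# smaps_total FIELD
# Hypothetical helper: sum one smaps field (for example Rss, Pss,
# Shared_Clean or Private_Dirty) over all mappings read from stdin.
smaps_total() {
    awk -v key="$1:" '$1 == key { sum += $2 } END { printf "%s %d kB\n", key, sum }'
}

# Totals for the current shell, if smaps is readable
if [ -r /proc/self/smaps ]; then
    smaps_total Rss < /proc/self/smaps
    smaps_total Pss < /proc/self/smaps
fi
```

Pss (proportional set size) divides each shared page among the processes sharing it, so comparing the Rss and Pss totals gives a rough idea of how much of a process's memory is shared.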

Memory Optimization Tips and Techniques: