Windows Memory Cheat Sheet - microsoft/MSO-Scripts GitHub Wiki

The Short Story

For a computer, "Memory" is an ambiguous term, with various classes and categories that are often misunderstood.

What follows is a definitive, step-by-step guide to understanding, measuring, and tracing Windows Memory.

In brief:

  • Windows provides each running Process with a private Virtual Address Space, effectively filled with available Virtualized RAM.

  • To facilitate virtualization, every object in Virtual Memory is backed by (charged to) a corresponding file on disk: module, data file, or pagefile.

  • Out-of-Memory occurs when a) a process's Virtual Address Space is fully allocated, or b) the system's pagefile is fully committed.

  • Memory Pressure occurs when demands for Virtualized RAM exceed the available Physical RAM, causing delays due to paging (disk I/O to repurpose Physical RAM).

  • Windows uses otherwise unoccupied Physical RAM as a file cache called the Standby List to improve system and application responsiveness.

  • Memory Latency is an underappreciated source of execution slowness. Memory is fastest when its usage is limited, local, and sequential.

  • Various Software Tools exist to measure and attribute memory usage. Event Tracing for Windows (ETW) can trace memory activity over time and identify the responsible code.



The Long Story

The Windows Memory Manager provides for the following:

  1. Every process gets its own private Virtual Address Space:

    • For a 32-bit Process:     4 GB = 2^32 = ~4.3 billion bytes / addresses, maximum
    • For a 64-bit Process: 128 TB = 2^47 = ~140 trillion bytes / available addresses
  2. Every process's entire Virtual Address Space is potentially populated with Virtualized RAM (containing code/data).

  But the total Virtualized RAM in use across all processes may easily exceed the installed, Physical RAM on the device.
  Therefore...

  3. 'Memory' in Windows is demand-paged:
    • Loading a module, mapping a data file, or allocating a range of memory is simply a bookkeeping event.
    • The corresponding code/data is paged into the address space of the process only as it is accessed (read or written) by the executing program.

  In order for this system to work:

  4. Every memory object in a process's address space is backed by or charged to a corresponding file on disk, where the code/data is also stored.
    • Code and Data in a module image are (mostly) backed by the module on disk (.exe/.dll).
    • A Memory-Mapped File is backed by its corresponding file on disk.
    • Other memory allocations are backed by and charged to the system pagefile.
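This bookkeeping-first behavior can be sketched with Python's cross-platform `mmap` module (an anonymous mapping is backed by the pagefile on Windows, swap on other systems; the sizes here are arbitrary):

```python
import mmap
import time

# Creating a large anonymous (pagefile-backed) mapping is only a
# bookkeeping event: no physical RAM is assigned yet.
SIZE = 64 * 1024 * 1024               # 64 MB
t0 = time.perf_counter()
buf = mmap.mmap(-1, SIZE)             # anonymous mapping, charged to the pagefile
map_time = time.perf_counter() - t0

# Physical pages are demand-paged in only as they are first touched.
t0 = time.perf_counter()
for offset in range(0, SIZE, mmap.PAGESIZE):  # touch one byte per page
    buf[offset] = 1                           # first access faults the page in
touch_time = time.perf_counter() - t0

# Creating the mapping is near-instant regardless of SIZE;
# touching every page takes measurable time.
print(f"map: {map_time:.6f}s  touch: {touch_time:.6f}s")
buf.close()
```

Running this shows the allocation itself completing almost instantly, while the page-in loop does the real work of populating RAM.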

Key Definitions

  • Address Space: the totality of all numerical addresses accessible by a process. (Every byte of memory has an address.)
  • Memory Page: a unit of memory management. RAM and Address Space are managed as (usually) 4 KB pages.
  • Paging (Swapping): moving code/data into and out of a process's address space via Virtualized RAM.
  • System Pagefile: backing file(s) for virtualized data not otherwise backed by a module or data file. Usually: c:\pagefile.sys
  • Hard Page Fault: page(s) of RAM must be populated with code/data from disk before entering the process's address space.
  • Soft Page Fault: page(s) of RAM are already cached with code/data, or contain zeros, before entering the process's address space.
  • Working Set: the system's accounting of how RAM is currently assigned to each process.
  • Trimming: removing Physical RAM from one or more processes so that it may be used elsewhere. (This reduces the Working Set.)
  • Memory Pressure: demand for Virtualized RAM exceeds available Physical RAM, requiring trimming.
  • Image: the in-memory representation of an executable module on disk (.exe/.dll).
  • Section: the in-memory representation of a data file on disk. A memory-mapped file, or file mapping.
  • Pagefile-backed Section: a range of memory backed by the system pagefile, with a handle which may be shared across processes.
  • Copy-on-Write: data originally loaded from and backed by a file or module on disk, but forked to a private copy when modified.
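Copy-on-Write can be observed directly (a minimal sketch using Python's cross-platform `mmap` with `ACCESS_COPY`, which requests a private, copy-on-write view of a file):

```python
import mmap
import os
import tempfile

# Create a small file to map.
fd, path = tempfile.mkstemp()
os.write(fd, b"original page contents")

# ACCESS_COPY gives a private, copy-on-write view: reads come from the
# file on disk, but the first write forks a private copy of the page
# (which is then charged to the pagefile, not to the file).
view = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
view[0:8] = b"MODIFIED"       # forks a private copy; the file is untouched
in_memory = view[:]

view.close()
os.close(fd)
with open(path, "rb") as f:
    on_disk = f.read()        # still the original bytes
os.unlink(path)

print(in_memory)              # b'MODIFIED page contents'
print(on_disk)                # b'original page contents'
```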


Committed Address Space and "Commit Charge"

Every page of a process's Address Space is in one of three states:

  • Committed: allocated and accessible
  • Reserved: inaccessible until explicitly allocated
  • Free: unallocated and inaccessible

Virtually all of the Committed Address Space is backed by file objects: modules, data files, or the system pagefile.

Commit Charge is the subset of the Committed Address Space in each process backed by (charged to) the system pagefile.

System Commit Charge Limit (or System Virtual Memory Limit) is the sum total size of all pagefiles on the system plus the size of Available Physical RAM.

The sum total Commit Charge across the entire system cannot exceed the System Commit Charge Limit.


Note

In some configurations, Windows will limit the automatic growth of its pagefiles to 3X Physical RAM size, for a total System Commit Charge Limit of 4X installed RAM. The size of system pagefiles can also be set manually.
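A worked example of that limit, using hypothetical sizes for illustration only:

```python
# Hypothetical machine with 16 GB of installed RAM.
ram_gb = 16

# Automatic pagefile growth capped at 3X Physical RAM (per the note above).
max_pagefile_gb = 3 * ram_gb

# System Commit Charge Limit = total pagefile size + Physical RAM.
commit_limit_gb = max_pagefile_gb + ram_gb
print(commit_limit_gb)        # 64, i.e. 4X installed RAM
```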

See also: Commit Charge vs. Committed Address Space



Out-of-Memory Conditions (OOM)

  1. Commit Charge: OOM occurs when a requested Virtual Memory allocation would cause the sum total Commit Charge across all processes to exceed the System Commit Charge Limit.

  2. Virtual Address Space: OOM occurs when there is no contiguous range within the Virtual Address Space of a process available to satisfy an allocation request. (Typically for a 32-bit process: 4 GB max)

  3. Physical RAM: OOM failures almost never occur due to insufficient RAM. Nevertheless, excessive paging due to system-wide demands on limited Physical RAM can cause substantial slowdowns.

Important

TRUE or FALSE?
Unused Memory is Wasted Memory.

  • Application: FALSE - RAM left unused by an application is not wasted.
  • Operating System: TRUE - The OS should put almost all Physical RAM to good use.

The Desktop OS makes use of otherwise unused RAM to improve system responsiveness through a cache called the Standby List.



The Standby List: Code and Data Caching

Physical RAM usage can be categorized in this way:

  • Active / In Use: RAM used by processes, drivers or the operating system.
  • Standby: RAM containing cached code/data not currently in use. ◄ ◄ ◄
  • Modified: RAM whose contents must be flushed to disk before being reused.
  • Zeroed: RAM containing all zeros and ready for use.
  • Free: RAM needing to be zeroed or loaded with data before use.

CONSIDER:
A process loads and pages-in a module or data file, then unloads it.

Q:  Where does that code/data go, once unloaded?
A:  It goes to the Standby List: RAM containing cached code/data awaiting future use.

Q:  What about a system service which speculatively preloads code/data from disk into unused RAM (Standby List) to improve responsiveness?
A:  This is the SysMain (SuperFetch) system service, active on Desktop versions of Windows.

Q:  Can I see what code/data is currently in this 'Standby List'?
A:  Yes: RAMMap > File Summary tab > Standby column

Q:  Does the 'Standby List' compete with my application for available RAM?
A:  No! Its RAM is released whenever needed elsewhere. Lowest priority data is released first.

Caution

Extensive memory usage by a process or driver has opportunity cost.
Stage 1: Large amounts of RAM used by an application might otherwise have been used by the OS for improving system responsiveness via the Standby List.
Stage 2: Using large amounts of RAM in one process can cause trimming and paging in others, inducing delays.
Stage 3: When processes or drivers compete for a substantial fraction of available Physical RAM, thrashing may occur, slowing the system substantially.

Note

See the current Physical RAM usage using:



Memory and Performance

"Out of CPU, memory and disk, memory [RAM] is typically the most important for overall system performance. The more, the better."
-Mark Russinovich

Memory virtualization and caching are transparent to the application as it executes. But that transparency cannot hide substantial latency and slowdowns when memory is accessed haphazardly or inefficiently.

Memory is fastest when its usage is Limited, Local, and Sequential.

  1. Limited:

    • Data-filled RAM requires time and energy to write and read. Therefore, in general, the greater the Working Set (RAM usage) of a process, the slower it launches and runs. (Time/Space Tradeoff describes a narrow set of algorithms where using more memory can sometimes result in faster execution. So measure it!)
    • Memory usage has opportunity costs. Large amounts of RAM used by an application could have been used by Windows' Standby List (file cache) to improve system responsiveness.
    • Memory Pressure induces paging delays when the Memory Manager must trim Physical RAM from other processes to satisfy new demands.
  2. Local:

    • Data Locality organizes data elements to be cached and accessed together, speeding execution by reducing latency.
    • Code Locality requires tools (such as PGO) to optimize code layout so as to load fewer memory pages along common execution paths.
      See: Profile Guided Optimization (PGO) - Basic Block Optimization or Basic Block Reordering
  3. Sequential:
    Sequential data access is far faster than random access. This is true for accessing data on disk (SSD or HDD), in RAM, and via CPU caches. (cf. Burst Mode)

    • Disk: To test your own disk's Sequential vs. Random Access speed, run (as Administrator):
      winsat disk -read -seq and winsat disk -read -ran
    • RAM: Physical RAM delivers data usually in 64-byte blocks (cache lines), and adjacent blocks in rapid succession via Burst Mode.
    • CPU: Modern CPUs predict regular, sequential data access patterns to prefetch the caches.

    Therefore, traversing data structures which rely on 'pointer chasing' to access separately allocated nodes (e.g. linked lists, trees, ...) may be far slower than traversing adjacent array elements. (!)
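    A rough sketch of that difference in Python (interpreter overhead masks much of the hardware effect here; in native code the gap caused by cache misses on dependent loads is typically far larger):

```python
import time

N = 500_000

# Contiguous buffer, traversed sequentially.
values = list(range(N))

# Separately allocated nodes, traversed by 'pointer chasing'.
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

head = None
for v in reversed(range(N)):          # build the list front-to-back
    head = Node(v, head)

t0 = time.perf_counter()
array_sum = sum(values)               # sequential access over adjacent elements
array_time = time.perf_counter() - t0

t0 = time.perf_counter()
list_sum, node = 0, head
while node is not None:               # each step is a dependent load
    list_sum += node.value
    node = node.next
list_time = time.perf_counter() - t0

print(f"array: {array_time:.4f}s  linked list: {list_time:.4f}s")
```

    Both traversals do identical arithmetic; only the memory layout differs. So measure it!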

  • Practical Guidance (Videos):

    • Memory Locality often drives Performance!, Modern C++: What You Need to Know, Herb Sutter, Build 2014
      "Assume your CPU has a prefetcher. If you're not using it, your code is not as efficient as it could be!" (0:40:39)

    • CPU Caches and Why You Care, Scott Meyers, code::dive conference 2014
      "CPU caches ultimately determine the performance of your program. It does not matter what programming language you are using. ..." (0:00:51)

    • Native Code Performance and Memory, Eric Brumer, Build 2013
      "Memory really matters: Working Set, Caching, Spatial and Temporal Locality of Access." (0:01:29)



MEMORY TOOLS

VMMap: Per-process Memory Usage

VMMap is a Microsoft SysInternals tool which reveals the Windows memory bookkeeping for one specific process.

There are many numbers and colors (categories), but each is an essential aspect of per-process memory usage.

Screenshot of Sysinternals VMMap

Bar Charts (top)
Committed: Committed Virtual Address Space of the process
Private Bytes: Private Commit Charge* of the process
Working Set: Physical RAM assigned to the process
Color Legend: see the Type column of the middle table.

Memory Categories (middle table rows)
Image: in-memory representation of an executable module (.exe/.dll)
Mapped File: data file mapped into memory - mapfile
Shareable*: pagefile-backed section, shareable across processes
  In a 64-bit process, 2+ TB in the Size column is likely reserved address space due to Control Flow Guard.
Private Data*: memory allocated via VirtualAlloc (except the Windows Heap)
Heap: memory acquired by the Windows Heap Manager
Managed Heap: memory acquired by the CLR Managed GC Allocator
Stack: memory allocated/reserved for each thread's stack (usually 1 MB address space)   Also: stack guard page
Page Table: Windows memory bookkeeping
Unusable: Virtual Address Space unusable due to allocation alignment and granularity
Free†: Virtual Address Space available for allocation

Memory Characteristic (middle table columns)
Size: total Virtual Address Space of the memory region
Committed: Virtual Address Space in the Committed state
Private*: the Private Commit Charge of the memory region (subset of Committed)
Total WS: the Physical RAM assigned to the region
Private WS: the Physical RAM which cannot be shared across processes (such as Heap)
Shareable WS: the Physical RAM which can be shared across processes (such as executable code)
Shared WS: the Physical RAM which is currently shared across processes (subset of Shareable WS)
Locked WS: the Physical RAM which cannot be paged-out
Blocks: sub-regions of differing sub-characteristics such as Protection
Largest†: the largest block of contiguous Virtual Address Space with this characteristic

† The Largest/Free value is the largest block of contiguous Virtual Address Space available for a single new allocation, loaded module, mapped file, etc. (This is typically a concern only for 32-bit processes, where the total available address space is 4 GB, max.) For example, if the largest free block is less than 1 MB then thread creation (with a 1 MB stack reservation) will fail.

* The Commit Charge is sometimes subdivided into Private and Shareable so that when summing across processes (to compare against the System Commit Charge Limit), the Shareable section can avoid double-counting. In VMMap, the Process Commit Charge is roughly the sum of Private Total + Committed Shareable values (middle table).
    • Private: usually allocated via VirtualAlloc (often through the Windows Heap)
    • Shareable: usually a Pagefile-backed Section - a file mapping against the system pagefile, with a shareable handle




RAMMap: System-wide Memory Usage

RAMMap is another Microsoft SysInternals tool which gives a snapshot of how Physical RAM is being employed across the system.

Screenshot of SysInternals RAMMap

Use Counts tab - Columns
Active: RAM in the Working Set of process(es) or the system
Standby: RAM containing code/data cached from disk, not part of any Working Set
Modified: RAM with modified data waiting to be flushed to its backing store on disk before reuse
Zeroed: RAM cleared of data and available for use
Free: RAM needing to be cleared or loaded with data before reuse (for security)
Bad: RAM pages identified as defective and excluded from use

Use Counts tab - Rows
Process Private: RAM which cannot be shared across processes (heap, stack, etc.)
Shareable: Pagefile-backed Section, shareable across processes
Mapped File: Module Images and memory-mapped data files
Paged Pool: Kernel Memory Manager - pageable to disk
Nonpaged Pool: Kernel Memory Manager - not pageable to disk
Page Table: Windows Memory Manager bookkeeping
System PTE: Page Table Entries for managing virtual memory
Session Private: Memory that is private to a particular logged in session
Metafile: NTFS filesystem metadata (unrelated to .wmf/.emf files)
Driver Locked: nonpageable RAM allocated by a kernel driver
Kernel Stack: RAM used by kernel thread stacks
AWE: Address Windowing Extensions
Large Page: nonpageable RAM allocated and managed in much larger units
Unused: Mostly zeroed and free RAM

Processes tab: RAM usage associated with individual processes. (Only Private memory is part of the Working Set.)

File Summary tab: RAM containing file/module code/data. (Only Active memory is part of a Working Set.)




ETW: Event Tracing for Windows

ETW provides various ways to capture memory snapshots and traces.

Memory Stats: per process every ½ second

Every ½ second capture statistics on the various categories of memory usage for all running processes: Working Set, Commit Charge, Process Virtual Size

Resident Set: snapshot of physical RAM usage

The Resident Set is a detailed, single snapshot of how physical RAM is currently employed on the system and distributed across the running processes, including the modules and data files.

Commit Charge: trace of charges to the System Pagefile

The Commit Charge is the subset of the Committed Virtual Address Space which is charged to (backed by) the system pagefile. The sum total Commit Charge across the system cannot exceed the System Commit Charge Limit.

Reference Set: trace of RAM acquisition

Reference Set is a trace and accounting of RAM acquisition on demand by each process over a time interval.

See also: Memory: Expose RAM Usage in Windows Video



More to Explore

  • Pushing the Limits of Windows: