AMD Ryzen 9 7900X3D, DDR5-6000 - ssvb/tinymembench GitHub Wiki

2× 32 GB G.Skill Trident Z5, DDR5-6000 at 30-40-40-96 timings (tCL-tRCD-tRP-tRAS); Ryzen 9 7900X3D at stock speed; default CPU frequency governor; Linux 6.1.

```
 tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :  48321.7 MB/s (18.8%)
 C copy backwards (32 byte blocks)                    :  47628.1 MB/s (0.3%)
 C copy backwards (64 byte blocks)                    :  47885.9 MB/s
 C copy                                               :  48211.4 MB/s (0.1%)
 C copy prefetched (32 bytes step)                    :  47304.3 MB/s
 C copy prefetched (64 bytes step)                    :  50757.6 MB/s
 C 2-pass copy                                        :  25137.1 MB/s (0.5%)
 C 2-pass copy prefetched (32 bytes step)             :  23931.1 MB/s
 C 2-pass copy prefetched (64 bytes step)             :  25435.1 MB/s (0.2%)
 C fill                                               :  75265.3 MB/s
 C fill (shuffle within 16 byte blocks)               :  76294.6 MB/s (0.2%)
 C fill (shuffle within 32 byte blocks)               :  76073.8 MB/s
 C fill (shuffle within 64 byte blocks)               :  62570.5 MB/s
 ---
 standard memcpy                                      :  31184.8 MB/s
 standard memset                                      :  31158.8 MB/s
 ---
 MOVSB copy                                           :  31211.1 MB/s
 MOVSD copy                                           :  31211.9 MB/s
 SSE2 copy                                            :  57494.7 MB/s (0.2%)
 SSE2 nontemporal copy                                :  31169.8 MB/s
 SSE2 copy prefetched (32 bytes step)                 :  56528.2 MB/s
 SSE2 copy prefetched (64 bytes step)                 :  56897.5 MB/s (0.1%)
 SSE2 nontemporal copy prefetched (32 bytes step)     :  31195.8 MB/s
 SSE2 nontemporal copy prefetched (64 bytes step)     :  31197.4 MB/s
 SSE2 2-pass copy                                     :  37280.4 MB/s (0.3%)
 SSE2 2-pass copy prefetched (32 bytes step)          :  36950.4 MB/s (0.3%)
 SSE2 2-pass copy prefetched (64 bytes step)          :  36885.6 MB/s (0.8%)
 SSE2 2-pass nontemporal copy                         :   5190.9 MB/s (0.3%)
 SSE2 fill                                            :  78373.8 MB/s
 SSE2 nontemporal fill                                :  31131.0 MB/s
```
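Notes 1 and 2 in the bandwidth output above fix the units. A megabyte is 10^6 bytes, and a "copy" figure counts each byte once, even though that byte is both read and written. The small C sketch below converts a reported copy rate into combined read+write traffic. It also converts to binary MiB/s for comparison with tools that use 2^20-byte units. The helper names are hypothetical and not part of tinymembench. For example, the SSE2 copy result of 57494.7 MB/s corresponds to about 114989.4 MB/s of combined read and write traffic.

```c
#include <assert.h>

/* Note 1: tinymembench's "MB" is decimal, 1 MB = 1,000,000 bytes. */
#define BYTES_PER_MB  1000000.0
#define BYTES_PER_MIB 1048576.0

/* Note 2: a copy result counts each copied byte once, but every byte is
 * read from the source and written to the destination, so the combined
 * memory traffic is twice the reported figure. (Hypothetical helper.) */
double copy_rate_to_total_traffic_mb_s(double copy_mb_s)
{
    assert(copy_mb_s >= 0.0);
    return 2.0 * copy_mb_s;
}

/* Convert a decimal MB/s figure to binary MiB/s (2^20-byte units).
 * (Hypothetical helper.) */
double mb_s_to_mib_s(double mb_s)
{
    return mb_s * BYTES_PER_MB / BYTES_PER_MIB;
}
```

For instance, `copy_rate_to_total_traffic_mb_s(57494.7)` yields 114989.4, and `mb_s_to_mib_s(57494.7)` gives roughly 54831 MiB/s.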
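Note 3 above describes the 2-pass copy: data is staged through a small temporary buffer that stays in L1 cache. Each chunk is read from the source into that buffer, then written from the buffer to the destination. The portable C sketch below shows the idea under two assumptions. First, `memcpy` stands in for the benchmark's per-variant C and SSE2 copy loops. Second, the 2048-byte staging size (`STAGING_BYTES`) was chosen here to fit in L1 data cache; it is not a value taken from tinymembench. `two_pass_copy` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed staging size, small enough to stay resident in L1 data cache.
 * tinymembench's actual temporary buffer size is not given in this output. */
#define STAGING_BYTES 2048

/* 2-pass copy (Note 3): each chunk goes source -> small L1-resident buffer,
 * then buffer -> destination, instead of going straight from source to
 * destination. memcpy stands in for the benchmark's own copy loops. */
void two_pass_copy(void *dst, const void *src, size_t size)
{
    unsigned char staging[STAGING_BYTES];
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(size == 0 || (dst != NULL && src != NULL));
    while (size > 0) {
        size_t chunk = size < STAGING_BYTES ? size : STAGING_BYTES;
        memcpy(staging, s, chunk); /* pass 1: fetch into the L1-resident buffer */
        memcpy(d, staging, chunk); /* pass 2: write out to the destination */
        s += chunk;
        d += chunk;
        size -= chunk;
    }
}
```

The extra pass costs throughput. In this run, the C 2-pass rows reach roughly half of the plain C copy (about 25 GB/s versus 48 GB/s).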
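Note 4 above refers to the sample standard deviation of repeated measurements. The output prints it as a percentage but does not say which reference value it is relative to. Assuming it is relative to the mean of the n samples x_1 to x_n, the bracketed figure would correspond to the expression below. It is printed only when it exceeds 0.1%:

```math
s_{\mathrm{rel}} = \frac{100\%}{\bar{x}} \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

If the value is printed with one decimal place, as the table suggests, a deviation just above the threshold (say 0.12%) would appear as "(0.1%)". That would explain entries such as "C copy ... (0.1%)". By the same reading, the 18.8% on "C copy backwards" marks a far noisier measurement than the rest of the table.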

```
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    1.0 ns          /     1.4 ns
    131072 :    1.5 ns          /     1.8 ns
    262144 :    1.7 ns          /     1.9 ns
    524288 :    2.5 ns          /     2.9 ns
   1048576 :    3.0 ns          /     3.4 ns
   2097152 :    7.1 ns          /     9.3 ns
   4194304 :    9.2 ns          /    11.0 ns
   8388608 :   10.2 ns          /    11.6 ns
  16777216 :   11.9 ns          /    13.6 ns
  33554432 :   13.9 ns          /    15.7 ns
  67108864 :   15.2 ns          /    16.9 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    1.0 ns          /     1.4 ns
    131072 :    1.5 ns          /     1.8 ns
    262144 :    1.7 ns          /     1.9 ns
    524288 :    1.9 ns          /     2.0 ns
   1048576 :    2.0 ns          /     2.0 ns
   2097152 :    5.9 ns          /     7.9 ns
   4194304 :    7.7 ns          /     9.6 ns
   8388608 :    8.7 ns          /    10.1 ns
  16777216 :    9.2 ns          /    10.3 ns
  33554432 :    9.4 ns          /    10.4 ns
  67108864 :    9.6 ns          /    10.4 ns
```
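The latency section measures the average cost of random reads as the buffer grows past each cache level. A common way to get such numbers is pointer chasing. The buffer is linked into one random cycle, and each load's address comes from the previous load, so the reads cannot overlap. The C sketch below illustrates that technique; it is not tinymembench's own code. The following choices were made for this sketch:

- It uses `size_t` slots rather than cache-line-sized nodes.
- It builds the cycle with Sattolo's algorithm and a xorshift64 generator.
- It reports total time per access, not the extra time over L1 latency that Note 1 describes.

`build_random_cycle`, `chase` and `ns_per_access` are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* xorshift64 pseudo-random generator; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fill next[0..n-1] so that i -> next[i] is a single random cycle through
 * all n slots (Sattolo's algorithm: j is drawn from [0, i), never i). */
void build_random_cycle(size_t *next, size_t n, uint64_t seed)
{
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15u;
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&state) % i);
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads: each address comes from
 * the previous load, so the reads are serialized. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps-- > 0)
        i = next[i];
    return i;
}

/* Average wall-clock nanoseconds per dependent load, starting at slot 0. */
double ns_per_access(const size_t *next, size_t steps)
{
    struct timespec t0, t1;
    assert(steps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile size_t sink = chase(next, 0, steps); /* keep the loads alive */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

For a block of B bytes, as in the table's first column, n would be B / sizeof(size_t). On a 64-bit system, the 65536-byte row therefore corresponds to n = 8192.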
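Note 2 of the latency section defines a dual random read as two independent accesses in flight at once. The sketch below assumes the buffer has already been linked into a cycle of indices (pointer chasing). It advances two chains in one loop; the name `chase2` is hypothetical. The two loads in one iteration do not depend on each other, so a memory subsystem that tracks several outstanding misses can overlap them. If it could not, each iteration would cost about two single reads. In this run, the dual column stays close to the single one. For example, at 64 MiB with MADV_NOHUGEPAGE it is 16.9 ns versus 15.2 ns, which suggests the two misses are largely overlapped. Whether tinymembench runs both chains through one buffer or two is not stated in this output; the sketch shares one.

```c
#include <assert.h>
#include <stddef.h>

/* Advance two independent pointer chains together for `steps` iterations.
 * Within one iteration the two loads do not depend on each other, so the
 * hardware may keep both cache misses outstanding at the same time.
 * Both chains share `next` here; *a and *b hold the start slots and
 * receive the final slots. (Hypothetical helper.) */
void chase2(const size_t *next, size_t *a, size_t *b, size_t steps)
{
    assert(next != NULL && a != NULL && b != NULL);
    size_t ia = *a, ib = *b;
    while (steps-- > 0) {
        ia = next[ia]; /* chain A depends only on the previous ia */
        ib = next[ib]; /* chain B depends only on the previous ib */
    }
    *a = ia;
    *b = ib;
}
```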
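The two latency tables differ only in the madvise() hint applied to the test buffer: MADV_NOHUGEPAGE versus MADV_HUGEPAGE. On x86-64, one 2 MiB transparent huge page covers what would otherwise need 512 separate 4 KiB TLB entries. This fits the results: the huge-page table pulls ahead from the 524288-byte row upward and reaches 9.6 ns instead of 15.2 ns at 64 MiB. That is consistent with the TLB and page-walk contributions mentioned in the test description. The sketch below shows how a Linux program can request either behaviour for an anonymous mapping. `alloc_test_buffer` and `free_test_buffer` are hypothetical names. Huge pages are only used in the MADV_HUGEPAGE case when /sys/kernel/mm/transparent_hugepage/enabled is set to `always` or `madvise`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map an anonymous read/write buffer and advise the kernel whether it may
 * back it with transparent huge pages. madvise() is only a hint: if it
 * fails (e.g. a kernel without THP support), the mapping simply keeps
 * normal 4 KiB pages. (Hypothetical helper.) */
void *alloc_test_buffer(size_t size, int want_huge_pages)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    (void)madvise(p, size, want_huge_pages ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);
    return p;
}

/* Release a buffer obtained from alloc_test_buffer(). */
void free_test_buffer(void *p, size_t size)
{
    if (p != NULL)
        munmap(p, size);
}
```

This sketch does not enforce 2 MiB alignment. As a result, partial 2 MiB ranges at the start and end of the buffer may still be backed by 4 KiB pages.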