F2FS Debugging - nicktehrany/notes GitHub Wiki

This contains my notes on debugging and understanding some internal performance of F2FS. It includes locking overheads and performance profiling of functions, since I am aiming to increase its performance. These are also my notes on implementing multi streams in F2FS, hence often contain comments for me to track what did and did not work and why.

Write Call Stack

This retrieves and shows the call write stack using perf, and is running on 5.19 Kernel without modifications. I don't include all output but only what is important. I include all commands if I ever happen to need them again (which is very likely).

user@stosys:~$ sudo perf record -F 99 -a --call-graph dwarf sudo env "NR_FILES=1" "NR_THREAD=1" ${FIO_HOME}/fio src/f2fs-zns-workloads/f2fs-multi-stream/jobs/f2fs_single_stream.fio
user@stosys:~$ sudo perf report

    32.14%     0.00%  kworker/u22:2-e  [kernel.kallsyms]           [k] ret_from_fork
            |
            ---ret_from_fork
               kthread
               worker_thread
               process_one_work
               wb_workfn
               wb_do_writeback
               wb_writeback
               __writeback_inodes_wb
               writeback_sb_inodes
               __writeback_single_inode
               do_writepages
               f2fs_write_data_pages
               __f2fs_write_data_pages
               |
                --31.97%--f2fs_write_cache_pages
                          |
                          |--29.07%--f2fs_write_single_data_page
                          |          |
                          |           --27.08%--f2fs_do_write_data_page
                          |                     |
                          |                     |--14.29%--f2fs_outplace_write_data
                          |                     |          |
                          |                     |          |--11.11%--do_write_page
                          |                     |          |          |
                          |                     |          |          |--4.02%--f2fs_allocate_data_block
                          |                     |          |          |          |
                          |                     |          |          |          |--1.13%--update_sit_entry
                          |                     |          |          |          |          |
                          |                     |          |          |          |           --0.51%--f2fs_usable_zone_blks_in_seg
                          |                     |          |          |          |
                          |                     |          |          |           --0.82%--locate_dirty_segment
                          |                     |          |          |                     |
                          |                     |          |          |                      --0.53%--__locate_dirty_segment
                          |                     |          |          |
                          |                     |          |          |--3.63%--f2fs_submit_page_write
                          |                     |          |          |          |
                          |                     |          |          |           --0.66%--__wake_up
                          |                     |          |          |                     __wake_up_common_lock
                          |                     |          |          |
                          |                     |          |           --1.25%--f2fs_update_device_state.part.0
                          |                     |          |                      --0.97%--f2fs_set_dirty_device
                          |                     |          |                                |
                          |                     |          |                                 --0.70%--__add_ino_entry
                          |                     |          |                                           |
                          |                     |          |                                            --0.68%--radix_tree_lookup
                          |                     |          |          |                      --0.53%--__locate_dirty_segment
                          |                     |          |          |
                          |                     |          |          |--3.63%--f2fs_submit_page_write
                          |                     |          |          |
                          |                     |          |          |--3.63%--f2fs_submit_page_write
                          |                     |          |          |          |
                          |                     |          |          |           --0.66%--__wake_up
                          |                     |          |          |                     __wake_up_common_lock
                          |                     |          |          |
                          |                     |          |           --1.25%--f2fs_update_device_state.part.0
                          |                     |          |                     |
                          |                     |          |                      --0.97%--f2fs_set_dirty_device
                          |                     |          |                                |
                          |                     |          |                                 --0.70%--__add_ino_entry
                          |                     |          |                                           |
                          |                     |          |                                                      __radix_tree_lookup
                          |                     |          |
                          |                     |          |--1.89%--f2fs_update_data_blkaddr
                          |                     |          |          f2fs_update_extent_cache
                          |                     |          |          |
                          |                     |          |           --1.29%--f2fs_update_extent_tree_range
                          |                     |          |                     |
                          |                     |          |                      --0.68%--__try_merge_extent_node
                          |                     |          |                                _raw_spin_unlock
                          |                     |          |                                __raw_callee_save___pv_queued_spin_unlock
                          |                     |          |
                          |                     |           --1.05%--f2fs_update_extent_cache
                          |                     |
                          |                     |--5.91%--f2fs_get_dnode_of_data
                          |                     |          |
                          |                     |           --4.66%--__get_node_page
                          |                     |                     |
                          |                     |                      --4.28%--__get_node_page.part.0
                          |                     |                                |
                          |                     |                                |--3.23%--pagecache_get_page
                          |                     |                                |          |
                          |                     |                                |           --2.56%--__filemap_get_folio
                          |                     |                                |
                          |                     |                                 --0.82%--read_node_page
                          |                     |
                          |                     |--3.97%--set_page_writeback
                          |                     |          __folio_start_writeback
                          |                     |          |
                          |                     |          |--1.31%--__lock_text_start
                          |                     |          |          __raw_spin_lock_irqsave
                          |                     |          |
                          |                     |          |--0.61%--_raw_spin_unlock_irqrestore
                          |                     |          |
                          |                     |           --0.52%--mod_lruvec_page_state
                          |                     |
                          |                      --1.07%--f2fs_put_dnode
                          |
                           --1.82%--clear_page_dirty_for_io
                                     |
                                      --1.72%--folio_clear_dirty_for_io

Lock Tracing

When debugging kernel code we often need to see contention points for used locks (semaphores, mutexes, etc.). This can easily be done with the builtin kernel lock statistics. These collect information on acquisition times, holding times, and much more. Usage is as follows:

# Enable lock stats
root@stosys:/home/user# echo 1 > /proc/sys/kernel/lock_stat

# Get stats with cat/less/head etc.
root@stosys:/home/user# less /proc/lock_stat

# Reset the collected stats
root@stosys:/home/user# echo 0 > /proc/lock_stat

F2FS Locks

For this I was aiming to find the lock contention in F2FS. Identifying if there are particular parts that can be parallelized with multiple locks. Note, this is run on a debug kernel with qemu -kernel, hence times are increased as a result of all the kernel debugging tools (kasan, etc.). I grep for the F2FS related locks to only see what is needed. For convenience I also pasted the column information of lock_stat, these are not actually part of the output (get them with head -5 /proc/lock-stat)

root@stosys:/home/user# cat /proc/lock_stat | grep -E "f2fs|sbi|sit_i|nm_i|sm_info"
lock_stat version 0.4
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                   &sit_i->sentry_lock-W:          4644           4846           0.34        1179.80        9679.18           2.00          57326        5920627           1.04      294251.76    61121145.04          10.32
                   &sit_i->sentry_lock-R:         56802          56991           0.42        1080.25     1839784.99          32.28          57247       18361436           0.21        6758.12    11502506.77           0.63
                     &sit_i->sentry_lock          56991          [<0000000025468521>] check_valid_map+0x48/0xd0
                     &sit_i->sentry_lock           4846          [<000000000ad77fab>] f2fs_allocate_data_block+0x16b/0x1470
                     &sit_i->sentry_lock          56721          [<000000000ad77fab>] f2fs_allocate_data_block+0x16b/0x1470
                     &sit_i->sentry_lock           5116          [<0000000025468521>] check_valid_map+0x48/0xd0
                    &nm_i->nat_list_lock:          4331           4379           0.36         624.16        4840.38           1.11          93982       13225060           0.29        2130.67     8722715.54           0.66
                    &nm_i->nat_list_lock           4379          [<0000000092ecb2b0>] __lookup_nat_cache+0x62/0x110
                    &nm_i->nat_list_lock           4379          [<0000000092ecb2b0>] __lookup_nat_cache+0x62/0x110
                               &et->lock             54          [<000000008cdbedea>] f2fs_lookup_extent_cache+0x190/0x710
                               &et->lock             83          [<00000000607fc718>] f2fs_update_extent_tree_range+0x184/0x950
                               &et->lock             54          [<00000000607fc718>] f2fs_update_extent_tree_range+0x184/0x950
                               &et->lock             83          [<000000008cdbedea>] f2fs_lookup_extent_cache+0x190/0x710
                       &sbi->extent_lock:           117            120           0.45          40.12         218.89           1.82          77783       12040725           0.39        6024.66     8658383.88           0.72
                       &sbi->extent_lock             61          [<000000006d1d302b>] f2fs_lookup_extent_cache+0x292/0x710
                       &sbi->extent_lock             58          [<00000000e65f9d87>] __try_merge_extent_node+0x15e/0x450
                       &sbi->extent_lock              1          [<00000000d253b52e>] __insert_extent_tree+0x106/0x240
                       &sbi->extent_lock             56          [<00000000e65f9d87>] __try_merge_extent_node+0x15e/0x450
                       &sbi->extent_lock             63          [<000000006d1d302b>] f2fs_lookup_extent_cache+0x292/0x710
                       &sbi->extent_lock              1          [<00000000d253b52e>] __insert_extent_tree+0x106/0x240
                      &sbinfo->stat_lock:             0              0           0.00           0.00           0.00           0.00            833           1695           0.00          17.69        2011.67           1.19
                          f2fs_stat_lock:             0              0           0.00           0.00           0.00           0.00              5              5           1.29          23.21          38.30           7.66
                &sbi->gc_thread->fggc_wq:             0              0           0.00           0.00           0.00           0.00              2              2           2.65          12.71          15.36           7.68
                         &sbi->s_es_lock:             0              0           0.00           0.00           0.00           0.00            661           7489           0.60         366.76        8040.86           1.07
                      &sbi->s_error_lock:             0              0           0.00           0.00           0.00           0.00              1              2           6.66          15.78          22.44          11.22
   (work_completion)(&sbi->s_error_work):             0              0           0.00           0.00           0.00           0.00              0              1          15.33          15.33          15.33          15.33
                         &sbi->s_md_lock:             0              0           0.00           0.00           0.00           0.00             89            743           0.54         179.06        1871.22           2.52
              &sbi->s_writepages_rwsem-R:             0              0           0.00           0.00           0.00           0.00              0            564           0.84      203262.41     3876161.76        6872.63
                     &sbi->s_orphan_lock:             0              0           0.00           0.00           0.00           0.00            115            876           1.26         226.80        3986.01           4.55
                    &sbi->s_mb_rb_lock-W:             0              0           0.00           0.00           0.00           0.00             62            444           0.69          23.42        1328.66           2.99
                    &sbi->s_mb_rb_lock-R:             0              0           0.00           0.00           0.00           0.00              0             47           1.95         216.46         745.21          15.86
 &sbi->s_mb_largest_free_orders_locks[-W:             0              0           0.00           0.00           0.00           0.00            131            795           0.58          23.76        1788.86           2.25
 &sbi->s_mb_largest_free_orders_locks[-R:             0              0           0.00           0.00           0.00           0.00              8             10           2.30         277.55         391.22          39.12
                           &sbi->fs_lock:             0              0           0.00           0.00           0.00           0.00              6             33           1.76          49.70         261.42           7.92
                          &sbi->wq_mutex:             0              0           0.00           0.00           0.00           0.00              3              3          28.37         278.69         341.32         113.77
                        &sbi->pipe_mutex:             0              0           0.00           0.00           0.00           0.00              1              1         568.39         568.39         568.39         568.39
                    &sbi->streammap_lock:             0              0           0.00           0.00           0.00           0.00             78       15203885           0.53        9988.61    12619640.08           0.83
                    &nm_i->nid_list_lock:             0              0           0.00           0.00           0.00           0.00              6         121398           0.58        2476.12     1789743.32          14.74
                       &nm_i->build_lock:             0              0           0.00           0.00           0.00           0.00             52            204           0.82      254992.18      314326.95        1540.82
                  &nm_i->nat_tree_lock-W:             0              0           0.00           0.00           0.00           0.00              4          13311           2.61       11699.63      230141.52          17.29
                  &nm_i->nat_tree_lock-R:             0              0           0.00           0.00           0.00           0.00          94083       15101090           0.67      254943.71    23852332.99           1.58
                     &sbi->cp_global_sem:             0              0           0.00           0.00           0.00           0.00              5             37           2.63      131434.47     1522091.42       41137.61
                          f2fs_list_lock:             0              0           0.00           0.00           0.00           0.00              5              5           1.00          16.19          27.66           5.53
     &sbi->gc_thread->gc_wait_queue_head:             0              0           0.00           0.00           0.00           0.00              7             87           0.64          26.58         247.85           2.85
                         &sbi->stat_lock:             0              0           0.00           0.00           0.00           0.00              7           9320           0.60         125.66       33056.64           3.55
                        &sbi->cp_rwsem-W:             0              0           0.00           0.00           0.00           0.00              4             28           2.26      125212.94     1478869.86       52816.78
                        &sbi->cp_rwsem-R:             0              0           0.00           0.00           0.00           0.00              3        5911958           5.25      295129.33   193559611.41          32.74
                     &sbi->inode_lock[i]:             0              0           0.00           0.00           0.00           0.00             31        6331921           0.58       18825.87     5245790.91           0.83
                  &sbi->extent_tree_lock:             0              0           0.00           0.00           0.00           0.00              2             13           1.37       54017.61       54900.16        4223.09
                         &sbi->sb_lock-R:             0              0           0.00           0.00           0.00           0.00              1              7           2.55          71.76          89.13          12.73
                     &sbi->node_change-W:             0              0           0.00           0.00           0.00           0.00              6             25           8.06       96483.89      551267.14       22050.69
                     &sbi->node_change-R:             0              0           0.00           0.00           0.00           0.00              9        2883752           2.50      255611.36    19970056.97           6.93
                           &sbi->gc_lock:             0              0           0.00           0.00           0.00           0.00              4             52           1.96     9837437.61   106697626.03     2051877.42
                        &sbi->writepages:             0              0           0.00           0.00           0.00           0.00             39            535       33242.83     4954863.85   242587877.71      453435.29
          &sbi->rr_active_stream_lock[i]:             0              0           0.00           0.00           0.00           0.00             55        5920572           0.60       21137.25     4544948.94           0.77
                 &sm_info->curseg_lock-R:             0              0           0.00           0.00           0.00           0.00             65        5920575           5.23      294307.94    83926844.31          14.18
            &sbi->write_io[i][j].io_lock:             0              0           0.00           0.00           0.00           0.00             55       17761716           0.60       23164.41    13938704.95           0.78
         &sbi->write_io[i][j].io_rwsem-W:             0              0           0.00           0.00           0.00           0.00            132        5933719           0.54       50951.26    25478283.65           4.29
         &sbi->write_io[i][j].io_rwsem-R:             0              0           0.00           0.00           0.00           0.00             90           1614           0.63          27.19        1440.30           0.89
                          &sbi->dev_lock:             0              0           0.00           0.00           0.00           0.00             43             44           0.98          76.89         211.17           4.80
                      &sbi->node_write-W:             0              0           0.00           0.00           0.00           0.00              6             46           1.25       78238.63      927459.38       20162.16
                      &sbi->node_write-R:             0              0           0.00           0.00           0.00           0.00              1           8628          28.39        7867.27      503611.17          58.37
                   &sbi->fsync_node_lock:             0              0           0.00           0.00           0.00           0.00              7          17259           0.51          17.67       14546.59           0.84
                           &sbi->cp_lock:             0              0           0.00           0.00           0.00           0.00              5             50           0.85           9.60          92.48           1.85
                           &sbi->cp_wait:             0              0           0.00           0.00           0.00           0.00              7            100           0.64         191.37         773.98           7.74
                      &sbi->umount_mutex:             0              0           0.00           0.00           0.00           0.00              2              2        3511.32       56781.12       60292.44       30146.22
                   &sbi->io_order_lock-R:             0              0           0.00           0.00           0.00           0.00             28        3032831          23.31        7000.68    84128416.81          27.74

Updated locks

One of the earlier implementations I made was to handle the streams in a different way in the F2FS I/O management. By default it manages the I/O in write_io, which is indexed by the type (hot/warm/cold). With multiple streams and hence multiple instances of a single type, this indexing is no longer the best. I extended the indexing to also index by the stream id. Then I profiled the locks and compare with the profiling of the unmodified F2FS.

root@stosys:/home/user# cat /proc/lock_stat | grep -E "f2fs|sbi|sit_i|nm_i|sm_info"
                   &sit_i->sentry_lock-W:         14219          14278           0.34        4017.78       19863.39           1.39         182271        5308208           5.70       32587.04    63464553.30          11.96
                   &sit_i->sentry_lock-R:        181612         181768           0.40       20589.27     5625626.63          30.95         182176       25623969           0.21        2782.04    16956195.57           0.66
                     &sit_i->sentry_lock         181768          [<000000006ce8634d>] check_valid_map+0x48/0xd0
                     &sit_i->sentry_lock          14278          [<00000000bd1804fe>] f2fs_allocate_data_block+0x17d/0x14a0
                     &sit_i->sentry_lock         180963          [<00000000bd1804fe>] f2fs_allocate_data_block+0x17d/0x14a0
                     &sit_i->sentry_lock          15083          [<000000006ce8634d>] check_valid_map+0x48/0xd0
                     &sbi->inode_lock[i]:        103638         103724           0.41       12975.84      161188.09           1.55         951026        5939719           0.30        1645.99     4923380.33           0.83
                     &sbi->inode_lock[i]         103724          [<000000007e654fa2>] f2fs_update_dirty_folio+0xa6/0x490
                     &sbi->inode_lock[i]         103724          [<000000007e654fa2>] f2fs_update_dirty_folio+0xa6/0x490
                    &nm_i->nat_list_lock:         17429          17472           0.36          46.89       11330.13           0.65         307119       18167578           0.28       25467.76    12727449.60           0.70
                    &nm_i->nat_list_lock          17472          [<000000002abff106>] __lookup_nat_cache+0x62/0x110
                    &nm_i->nat_list_lock          17472          [<000000002abff106>] __lookup_nat_cache+0x62/0x110
                       &sbi->extent_lock:           164            165           0.46          17.18         230.38           1.40         246398       13294578           0.34       24788.96     9961278.19           0.75
                       &sbi->extent_lock            100          [<00000000fb493d48>] f2fs_lookup_extent_cache+0x292/0x710
                       &sbi->extent_lock             64          [<00000000e61a1502>] __try_merge_extent_node+0x15e/0x450
                       &sbi->extent_lock              1          [<000000009073a0ea>] __insert_extent_tree+0x106/0x240
                       &sbi->extent_lock             94          [<00000000e61a1502>] __try_merge_extent_node+0x15e/0x450
                       &sbi->extent_lock             69          [<00000000fb493d48>] f2fs_lookup_extent_cache+0x292/0x710
                       &sbi->extent_lock              2          [<000000009073a0ea>] __insert_extent_tree+0x106/0x240
                               &et->lock             70          [<00000000e7e0ee04>] f2fs_lookup_extent_cache+0x190/0x710
                               &et->lock             23          [<000000007c9309ff>] f2fs_update_extent_tree_range+0x184/0x950
                               &et->lock             70          [<000000007c9309ff>] f2fs_update_extent_tree_range+0x184/0x950
                               &et->lock             23          [<00000000e7e0ee04>] f2fs_lookup_extent_cache+0x190/0x710
                     &sbi->node_change-W:             1              1         417.07         417.07         417.07         417.07              3             30          28.71      343037.33      758207.58       25273.59
                     &sbi->node_change-R:             7              8           0.85      343830.49     2061876.06      257734.51         951017        1167127           3.37       29638.65    13200975.92          11.31
                       &sbi->node_change              8          [<00000000533e9bb7>] f2fs_do_map_lock+0x1e/0x40
                       &sbi->node_change              1          [<0000000057f046d0>] f2fs_write_checkpoint+0x3cf/0x2fd0
                       &sbi->node_change              2          [<0000000057f046d0>] f2fs_write_checkpoint+0x3cf/0x2fd0
                       &sbi->node_change              7          [<00000000533e9bb7>] f2fs_do_map_lock+0x1e/0x40
                      &sbinfo->stat_lock:             1              1           1.33           1.33           1.33           1.33            808           1615           0.00          21.69        1986.07           1.23
                      &sbinfo->stat_lock              1          [<0000000052969bfb>] shmem_free_inode+0x4a/0x70
                      &sbinfo->stat_lock              1          [<0000000052969bfb>] shmem_free_inode+0x4a/0x70
                         &sbi->sb_lock-R:             0              0           0.00           0.00           0.00           0.00              1              6           2.92          19.77          35.19           5.86
          &sbi->rr_active_stream_lock[i]:             0              0           0.00           0.00           0.00           0.00             55        5308147           0.63       19083.77     4083601.88           0.77
                         &sbi->s_es_lock:             0              0           0.00           0.00           0.00           0.00            507           6874           0.58          26.90        6777.34           0.99
                      &sbi->s_error_lock:             0              0           0.00           0.00           0.00           0.00              2              2           5.81          39.45          45.26          22.63
 &sbi->s_mb_largest_free_orders_locks[-W:             0              0           0.00           0.00           0.00           0.00            107            428           0.66          21.76         740.25           1.73
 &sbi->s_mb_largest_free_orders_locks[-R:             0              0           0.00           0.00           0.00           0.00              6              7           2.65          38.33          73.39          10.48
                    &sbi->s_mb_rb_lock-W:             0              0           0.00           0.00           0.00           0.00             74            258           0.89          21.49         663.33           2.57
                    &sbi->s_mb_rb_lock-R:             0              0           0.00           0.00           0.00           0.00              0             28           7.92         191.82         698.42          24.94
   (work_completion)(&sbi->s_error_work):             0              0           0.00           0.00           0.00           0.00              0              1          11.83          11.83          11.83          11.83
                         &sbi->s_md_lock:             0              0           0.00           0.00           0.00           0.00             78            488           0.58          45.25         907.16           1.86
              &sbi->s_writepages_rwsem-R:             0              0           0.00           0.00           0.00           0.00              0            882           0.78      118426.33     1354388.06        1535.59
                     &sbi->s_orphan_lock:             0              0           0.00           0.00           0.00           0.00             91            416           1.27         218.72        2659.34           6.39
                           &sbi->fs_lock:             0              0           0.00           0.00           0.00           0.00              5             33           1.73          47.74         259.33           7.86
                          &sbi->wq_mutex:             0              0           0.00           0.00           0.00           0.00              2              3          32.37         295.96         369.09         123.03
                        &sbi->pipe_mutex:             0              0           0.00           0.00           0.00           0.00              0              1         255.44         255.44         255.44         255.44
                    &sbi->streammap_lock:             0              0           0.00           0.00           0.00           0.00             97       17767752           0.56       26587.72    15591418.66           0.88
                    &nm_i->nid_list_lock:             0              0           0.00           0.00           0.00           0.00              2          73619           0.62        2182.86      741000.55          10.07
                       &nm_i->build_lock:             0              0           0.00           0.00           0.00           0.00             78            145           1.13       29530.74       83847.61         578.26
                  &nm_i->nat_tree_lock-W:             0              0           0.00           0.00           0.00           0.00              1          23934           2.59       26425.01      205609.87           8.59
                  &nm_i->nat_tree_lock-R:             0              0           0.00           0.00           0.00           0.00         307024       18120088           0.69       29506.25    32271001.18           1.78
                          f2fs_stat_lock:             0              0           0.00           0.00           0.00           0.00              1              1           9.88           9.88           9.88           9.88
                     &sbi->cp_global_sem:             0              0           0.00           0.00           0.00           0.00              2             33           2.51      460989.54     2656346.58       80495.35
                          f2fs_list_lock:             0              0           0.00           0.00           0.00           0.00              1              1         129.95         129.95         129.95         129.95
     &sbi->gc_thread->gc_wait_queue_head:             0              0           0.00           0.00           0.00           0.00              2             96           0.64          20.53         200.48           2.09
                         &sbi->stat_lock:             0              0           0.00           0.00           0.00           0.00              2          18594           0.59          44.40       12536.14           0.67
                        &sbi->cp_rwsem-W:             0              0           0.00           0.00           0.00           0.00              1             36           1.83      451498.90     2623255.31       72868.20
                        &sbi->cp_rwsem-R:             0              0           0.00           0.00           0.00           0.00              1        5293538           5.77       38946.28   186140159.31          35.16
                  &sbi->extent_tree_lock:             0              0           0.00           0.00           0.00           0.00              1              6          11.07          70.25         169.07          28.18
                           &sbi->gc_lock:             0              0           0.00           0.00           0.00           0.00              1             62           1.86    11082989.32   157825138.93     2545566.76
                 &sm_info->curseg_lock-R:             0              0           0.00           0.00           0.00           0.00             78        5308147           8.98       32591.70    85100803.94          16.03
   &sbi->write_io[k + (MAX_ACTIVE_LOGS *:             0              0           0.00           0.00           0.00           0.00             55       15924441           0.63       32671.30    13189589.10           0.83
 &sbi->write_io[k + (MAX_ACTIVE_LOGS *-W:             0              0           0.00           0.00           0.00           0.00            124        5320593           0.62       45234.11    22182713.59           4.17
 &sbi->write_io[k + (MAX_ACTIVE_LOGS *-R:             0              0           0.00           0.00           0.00           0.00            108           2565           0.63          16.36        2629.88           1.03
                          &sbi->dev_lock:             0              0           0.00           0.00           0.00           0.00             51             58           1.09          24.72         198.29           3.42
                      &sbi->node_write-W:             0              0           0.00           0.00           0.00           0.00              1             58           1.22      108041.75     1864362.02       32144.17
                      &sbi->node_write-R:             0              0           0.00           0.00           0.00           0.00              0          14616          28.13        1291.81      657126.17          44.96
                   &sbi->fsync_node_lock:             0              0           0.00           0.00           0.00           0.00              1          29225           0.52          27.45       22347.58           0.76
                           &sbi->cp_lock:             0              0           0.00           0.00           0.00           0.00              1             58           0.86           8.79          94.32           1.63
                           &sbi->cp_wait:             0              0           0.00           0.00           0.00           0.00              1            116           0.63         505.93         864.57           7.45
                        &sbi->writepages:             0              0           0.00           0.00           0.00           0.00             43            850           3.94      849710.55   233279903.39      274446.95
                   &sbi->io_order_lock-R:             0              0           0.00           0.00           0.00           0.00             48        4135679          23.15       38928.24   116564852.66          28.19

Performance for this particular is the same, which logically now makes sense, since I/O is still managed (submitted) in the same way, hence locked up for the same time. The issue now is that performance significantly drops because we do not know which stream to flush when syncing page cache to the device, hence we have to flush all. This increases the writeback calls and hence drops the performance, since flushing is still called the same number of times. Next we do some more profiling on what else we can improve (and from the locking profiling the obvious choice is the sbi->writepages lock, where I/Os are submitted under a single global lock).

F2FS I/O submission

For this I am tracing the I/O submission, in order to easily see entire call stacks and overheads of particular functions. This is retrieved with perf, and only includes the part that is of interest.

user@stosys:~$ sudo perf record -F 99 -a --call-graph dwarf sudo env "NR_FILES=1" "NR_THREAD=1" ${FIO_HOME}/fio src/f2fs-zns-workloads/f2fs-multi-stream/jobs/f2fs_single_stream.fio
user@stosys:~$ sudo perf report --symbol-filter=f2fs_write_cache_pages


    43.24%     0.00%  kworker/u22:3-e  [kernel.kallsyms]           [k] ret_from_fork
            |
            ---ret_from_fork
               kthread
               worker_thread
               process_one_work
               wb_workfn
               wb_writeback
               __writeback_inodes_wb
               writeback_sb_inodes
               __writeback_single_inode
               do_writepages
               f2fs_write_data_pages
               |
                --43.23%--f2fs_write_cache_pages
                          |
                           --42.09%--f2fs_write_single_data_page
                                     |
                                     |--39.98%--f2fs_do_write_data_page
                                     |          |
                                     |          |--27.12%--f2fs_outplace_write_data
                                     |          |          |
                                     |          |          |--22.98%--do_write_page
                                     |          |          |          |
                                     |          |          |          |--14.48%--f2fs_allocate_data_block
                                     |          |          |          |          |
                                     |          |          |          |          |--3.50%--update_segment_mtime
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --3.45%--ktime_get_coarse_with_offset
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                     |--1.30%--trace_hardirqs_on
                                     |          |          |          |          |                     |          |
                                     |          |          |          |          |                     |           --1.23%--lockdep_hardirqs_on_prepare
                                     |          |          |          |          |                     |                     |
                                     |          |          |          |          |                     |                      --1.05%--mark_held_locks
                                     |          |          |          |          |                     |                                |
                                     |          |          |          |          |                     |                                 --0.94%--mark_lock.part.0
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                     |--1.15%--lock_acquire
                                     |          |          |          |          |                     |          |
                                     |          |          |          |          |                     |           --0.80%--__lock_acquire
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                      --0.71%--lock_release
                                     |          |          |          |          |
                                     |          |          |          |          |--2.55%--mutex_lock_nested
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --2.50%--__mutex_lock
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                      --1.53%--lock_acquire
                                     |          |          |          |          |                                |
                                     |          |          |          |          |                                 --1.02%--__lock_acquire
                                     |          |          |          |          |
                                     |          |          |          |          |--1.72%--__lock_text_start
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --1.29%--lock_acquire
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                      --0.95%--__lock_acquire
                                     |          |          |          |          |
                                     |          |          |          |          |--1.70%--locate_dirty_segment
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --1.68%--IS_CURSEG
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                      --1.34%--__lock_text_start
                                     |          |          |          |          |                                |
                                     |          |          |          |          |                                 --0.87%--lock_acquire
                                     |          |          |          |          |                                           |
                                     |          |          |          |          |                                            --0.52%--__lock_acquire
                                     |          |          |          |          |
                                     |          |          |          |          |--1.01%--down_write
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --0.75%--lock_acquire
                                     |          |          |          |          |
                                     |          |          |          |          |--0.92%--down_read
                                     |          |          |          |          |
                                     |          |          |          |          |--0.67%--_raw_spin_unlock
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --0.55%--lock_release
                                     |          |          |          |          |
                                     |          |          |          |           --0.52%--up_write
                                     |          |          |          |
                                     |          |          |          |--4.35%--f2fs_submit_page_write
                                     |          |          |          |          |
                                     |          |          |          |          |--1.74%--__lock_text_start
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --1.32%--lock_acquire
                                     |          |          |          |          |                     |
                                     |          |          |          |          |                      --0.99%--__lock_acquire
                                     |          |          |          |          |
                                     |          |          |          |          |--0.90%--down_write
                                     |          |          |          |          |          |
                                     |          |          |          |          |           --0.68%--lock_acquire
                                     |          |          |          |          |
                                     |          |          |          |           --0.59%--_raw_spin_unlock
                                     |          |          |          |                     |
                                     |          |          |          |                      --0.53%--lock_release
                                     |          |          |          |
                                     |          |          |           --3.74%--f2fs_update_device_state.part.0
                                     |          |          |                     |
                                     |          |          |                      --3.72%--f2fs_set_dirty_device
                                     |          |          |                                |
                                     |          |          |                                 --3.71%--__add_ino_entry
                                     |          |          |                                           |
                                     |          |          |                                           |--1.07%--__lock_text_start
                                     |          |          |                                           |          |
                                     |          |          |                                           |           --0.66%--lock_acquire
                                     |          |          |                                           |                     |
                                     |          |          |                                           |                      --0.53%--__lock_acquire
                                     |          |          |                                           |
                                     |          |          |                                           |--0.82%--lock_acquire
                                     |          |          |                                           |          |
                                     |          |          |                                           |           --0.58%--__lock_acquire
                                     |          |          |                                           |
                                     |          |          |                                           |--0.66%--radix_tree_preload
                                     |          |          |                                           |          |
                                     |          |          |                                           |           --0.64%--__radix_tree_preload
                                     |          |          |                                           |                     |
                                     |          |          |                                           |                      --0.61%--lock_acquire
                                     |          |          |                                           |
                                     |          |          |                                            --0.54%--lock_release
                                     |          |          |
                                     |          |           --4.05%--f2fs_update_data_blkaddr
                                     |          |                     |
                                     |          |                      --3.71%--f2fs_update_extent_cache
                                     |          |                                |
                                     |          |                                 --3.50%--f2fs_update_extent_tree_range
                                     |          |                                           |
                                     |          |                                           |--1.40%--__try_merge_extent_node
                                     |          |                                           |          |
                                     |          |                                           |           --1.06%--__lock_text_start
                                     |          |                                           |                     |
                                     |          |                                           |                      --0.72%--lock_acquire
                                     |          |                                           |                                |
                                     |          |                                           |                                 --0.54%--__lock_acquire
                                     |          |                                           |
                                     |          |                                            --1.03%--_raw_write_lock
                                     |          |                                                      |
                                     |          |                                                       --0.77%--lock_acquire
                                     |          |                                                                 |
                                     |          |                                                                  --0.61%--__lock_acquire
                                     |          |
                                     |          |--6.10%--f2fs_get_dnode_of_data
                                     |          |          |
                                     |          |           --5.66%--__get_node_page
                                     |          |                     |
                                     |          |                      --5.30%--pagecache_get_page
                                     |          |                                |
                                     |          |                                 --5.14%--__filemap_get_folio
                                     |          |                                           |
                                     |          |                                           |--2.07%--xas_load
                                     |          |                                           |          |
                                     |          |                                           |           --1.49%--xas_descend
                                     |          |                                           |                     |
                                     |          |                                           |                      --0.90%--rcu_read_lock_held
                                     |          |                                           |
                                     |          |                                           |--1.56%--lock_acquire
                                     |          |                                           |          |
                                     |          |                                           |           --1.13%--__lock_acquire
                                     |          |                                           |
                                     |          |                                            --0.71%--lock_release
                                     |          |
                                     |          |--2.72%--f2fs_get_node_info
                                     |          |          |
                                     |          |          |--1.19%--__lookup_nat_cache
                                     |          |          |          |
                                     |          |          |           --0.79%--__lock_text_start
                                     |          |          |
                                     |          |           --0.84%--down_read
                                     |          |                     |
                                     |          |                      --0.60%--lock_acquire
                                     |          |
                                     |          |--2.68%--set_page_writeback
                                     |          |          |
                                     |          |           --2.62%--__folio_start_writeback
                                     |          |                     |
                                     |          |                     |--1.02%--_raw_spin_unlock_irqrestore
                                     |          |                     |
                                     |          |                      --0.81%--_raw_spin_lock_irqsave
                                     |          |                                |
                                     |          |                                 --0.54%--lock_acquire
                                     |          |
                                     |           --0.72%--down_read_trylock
                                     |                     |
                                     |                      --0.55%--lock_acquire
                                     |
                                      --1.12%--__lock_text_start
                                                |
                                                 --0.82%--lock_acquire