Ubuntu Troubleshooting - alx696/share GitHub Wiki

Too many open files

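By default a process may open at most 1024 files, which is far too few for database applications (RocksDB, PostgreSQL). When you hit this error, use the following commands to check the limits in effect.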

  1. $ ulimit -n
  2. $ systemctl --user show syncthing | grep LimitNOFILE
  3. $ systemctl show -p DefaultLimitNOFILE
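The commands above show configured limits. As a minimal sketch for checking what a process that is already running actually received, the helpers below read `/proc/<pid>/limits` and count the entries in `/proc/<pid>/fd`. The function names `nofile_limits` and `open_fd_count` are illustrative and not part of any tool.

```shell
# Print the soft and hard "Max open files" limits from a /proc/<pid>/limits style file.
# (hypothetical helper name) Usage: nofile_limits /proc/1234/limits
nofile_limits() {
    awk '/^Max open files/ { print "soft=" $4 " hard=" $5 }' "$1"
}

# Count the file descriptors a process currently holds open.
# (hypothetical helper name) Usage: open_fd_count 1234
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# Example: inspect the current shell itself, if /proc is available.
if [ -r "/proc/$$/limits" ]; then
    nofile_limits "/proc/$$/limits"
    open_fd_count "$$"
fi
```

If the open descriptor count is close to the soft limit, the process is likely to fail soon with "Too many open files". Inspecting another user's process usually requires root.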

Reference 1, Reference 2

On Ubuntu 20.04 the systemd defaults should be sufficient:

u@u:~$ systemctl --user show syncthing | grep LimitNOFILE
LimitNOFILE=1048576
LimitNOFILESoft=1048576
u@u:~$ systemctl show -p DefaultLimitNOFILE
DefaultLimitNOFILE=524288

Setting ulimit

$ echo "* soft nofile 1048576"  | sudo tee -a /etc/security/limits.conf
$ echo "* hard nofile 1048576"  | sudo tee -a /etc/security/limits.conf
$ echo "root soft nofile 1048576"  | sudo tee -a /etc/security/limits.conf
$ echo "root hard nofile 1048576"  | sudo tee -a /etc/security/limits.conf
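Because `tee -a` appends unconditionally, running the commands above twice leaves duplicate entries in the file. A small sketch of an idempotent variant follows; `append_once` is a hypothetical helper name, and it must run as root when the target is /etc/security/limits.conf. Note that limits.conf is applied by pam_limits at login, so the new values only show up in `ulimit -n` after starting a new session.

```shell
# Append a line to a file only if that exact line is not already present.
# (hypothetical helper name) Usage: append_once "<line>" <file>
append_once() {
    line="$1"
    file="$2"
    # -x matches whole lines only, -F treats the text literally (the "*" is not a pattern)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root):
#   append_once "* soft nofile 1048576" /etc/security/limits.conf
#   append_once "* hard nofile 1048576" /etc/security/limits.conf
```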

Setting the systemd default

$ echo "DefaultLimitNOFILE=1048576"  | sudo tee -a /etc/systemd/system.conf
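The line above changes the default for every service. If only one service needs a higher limit, a per-unit drop-in with `LimitNOFILE=` in its `[Service]` section is an alternative. The sketch below writes such a drop-in; the helper name `write_nofile_dropin`, the file name `nofile.conf`, and the unit name `myapp.service` are all assumptions made for illustration.

```shell
# Write a drop-in that raises LimitNOFILE for a single unit.
# (hypothetical helper name)
# $1 = unit directory (e.g. /etc/systemd/system, or ~/.config/systemd/user for user units)
# $2 = unit name, $3 = limit
write_nofile_dropin() {
    dropin_dir="$1/$2.d"
    mkdir -p "$dropin_dir" &&
        printf '[Service]\nLimitNOFILE=%s\n' "$3" > "$dropin_dir/nofile.conf"
}

# Example (as root; myapp.service is a placeholder unit name):
#   write_nofile_dropin /etc/systemd/system myapp.service 1048576
#   systemctl daemon-reload
#   systemctl restart myapp.service
```

After the restart, `systemctl show myapp.service | grep LimitNOFILE` should report the new value.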

Process killed for running out of memory (OOM)

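This is a self-protection mechanism of the Linux kernel: when free memory runs out, the OOM killer terminates a process, typically the one using the most memory.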
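The kernel exposes a per-process badness score in `/proc/<pid>/oom_score`; a higher score makes a process a more likely victim. As a rough sketch, the helper below lists the processes with the highest scores. The name `top_oom_scores` is hypothetical, and the proc root is passed as a parameter only to keep the function easy to try on a copy.

```shell
# Print "score pid name" for the N processes with the highest oom_score.
# (hypothetical helper name)
# $1 = proc root (normally /proc), $2 = number of entries to show
top_oom_scores() {
    for dir in "$1"/[0-9]*; do
        # Skip entries that vanished or are not readable
        [ -r "$dir/oom_score" ] || continue
        printf '%s %s %s\n' \
            "$(cat "$dir/oom_score" 2>/dev/null)" \
            "${dir##*/}" \
            "$(cat "$dir/comm" 2>/dev/null)"
    done | sort -rn | head -n "$2"
}

# Example:
#   top_oom_scores /proc 5
```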

Log path: /var/log/kern.log. Quick search command: $ grep "Out of memory" /var/log/kern.log
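When the log is long, it can help to reduce the matches to just the victim's name and PID. The following is a sketch that assumes the message format "Out of memory: Killed process PID (NAME)" seen on the 5.3 kernel in this page; other kernel versions may word the message differently. `oom_victims` is a hypothetical helper name.

```shell
# Print "name pid" for each process the OOM killer killed, from a kern.log style file.
# (hypothetical helper name) Usage: oom_victims /var/log/kern.log
oom_victims() {
    sed -n 's/.*Out of memory: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\2 \1/p' "$1"
}

# Example:
#   oom_victims /var/log/kern.log
```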

Below is the log from IPFS being killed for running out of memory on a Raspberry Pi 3:

Nov 18 16:06:22 ubuntu kernel: [63798.590156] sshd invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
Nov 18 16:06:22 ubuntu kernel: [63798.590181] CPU: 1 PID: 5133 Comm: sshd Tainted: G         C  E     5.3.0-1012-raspi2 #14-Ubuntu
Nov 18 16:06:22 ubuntu kernel: [63798.590186] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
Nov 18 16:06:22 ubuntu kernel: [63798.590191] Call trace:
Nov 18 16:06:22 ubuntu kernel: [63798.590208]  dump_backtrace+0x0/0x190
Nov 18 16:06:22 ubuntu kernel: [63798.590217]  show_stack+0x24/0x30
Nov 18 16:06:22 ubuntu kernel: [63798.590230]  dump_stack+0xd0/0x11c
Nov 18 16:06:22 ubuntu kernel: [63798.590242]  dump_header+0x48/0x1ec
Nov 18 16:06:22 ubuntu kernel: [63798.590251]  oom_kill_process+0x19c/0x1a0
Nov 18 16:06:22 ubuntu kernel: [63798.590260]  out_of_memory+0x1a4/0x2b8
Nov 18 16:06:22 ubuntu kernel: [63798.590269]  __alloc_pages_slowpath+0x990/0xb68
Nov 18 16:06:22 ubuntu kernel: [63798.590276]  __alloc_pages_nodemask+0x26c/0x2c8
Nov 18 16:06:22 ubuntu kernel: [63798.590283]  __get_free_pages+0x30/0x58
Nov 18 16:06:22 ubuntu kernel: [63798.590292]  __pud_alloc+0x30/0x120
Nov 18 16:06:22 ubuntu kernel: [63798.590301]  __handle_mm_fault+0x244/0x320
Nov 18 16:06:22 ubuntu kernel: [63798.590310]  handle_mm_fault+0xd4/0x1a0
Nov 18 16:06:22 ubuntu kernel: [63798.590318]  __get_user_pages+0x148/0x3b0
Nov 18 16:06:22 ubuntu kernel: [63798.590326]  get_user_pages_remote+0x134/0x250
Nov 18 16:06:22 ubuntu kernel: [63798.590335]  copy_strings.isra.0+0x134/0x350
Nov 18 16:06:22 ubuntu kernel: [63798.590343]  copy_strings_kernel+0x6c/0xc8
Nov 18 16:06:22 ubuntu kernel: [63798.590351]  __do_execve_file.isra.0+0x448/0x798
Nov 18 16:06:22 ubuntu kernel: [63798.590359]  __arm64_sys_execve+0x48/0x58
Nov 18 16:06:22 ubuntu kernel: [63798.590368]  el0_svc_common.constprop.0+0xe0/0x1e8
Nov 18 16:06:22 ubuntu kernel: [63798.590374]  el0_svc_handler+0x34/0xa0
Nov 18 16:06:22 ubuntu kernel: [63798.590382]  el0_svc+0x10/0x14
Nov 18 16:06:22 ubuntu kernel: [63798.590388] Mem-Info:
Nov 18 16:06:22 ubuntu kernel: [63798.590415] active_anon:174961 inactive_anon:33 isolated_anon:0
Nov 18 16:06:22 ubuntu kernel: [63798.590415]  active_file:311 inactive_file:428 isolated_file:32
Nov 18 16:06:22 ubuntu kernel: [63798.590415]  unevictable:4250 dirty:0 writeback:0 unstable:0
Nov 18 16:06:22 ubuntu kernel: [63798.590415]  slab_reclaimable:8852 slab_unreclaimable:16188
Nov 18 16:06:22 ubuntu kernel: [63798.590415]  mapped:1959 shmem:914 pagetables:691 bounce:0
Nov 18 16:06:22 ubuntu kernel: [63798.590415]  free:5693 free_pcp:39 free_cma:3471
Nov 18 16:06:22 ubuntu kernel: [63798.590435] Node 0 active_anon:699844kB inactive_anon:132kB active_file:1244kB inactive_file:1712kB unevictable:17000kB isolated(anon):0kB isolated(file):128kB mapped:7836kB dirty:0kB writeback:0kB shmem:3656kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Nov 18 16:06:22 ubuntu kernel: [63798.590461] DMA32 free:22772kB min:10324kB low:11116kB high:11908kB active_anon:699844kB inactive_anon:132kB active_file:1048kB inactive_file:1652kB unevictable:17000kB writepending:0kB present:970752kB managed:927952kB mlocked:17000kB kernel_stack:2784kB pagetables:2764kB bounce:0kB free_pcp:156kB local_pcp:0kB free_cma:13884kB
Nov 18 16:06:22 ubuntu kernel: [63798.590478] lowmem_reserve[]: 0 0 0
Nov 18 16:06:22 ubuntu kernel: [63798.590499] DMA32: 2002*4kB (UMEHC) 869*8kB (UMEHC) 274*16kB (UMHC) 76*32kB (UHC) 3*64kB (HC) 4*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22992kB
Nov 18 16:06:22 ubuntu kernel: [63798.590583] 3220 total pagecache pages
Nov 18 16:06:22 ubuntu kernel: [63798.590594] 0 pages in swap cache
Nov 18 16:06:22 ubuntu kernel: [63798.590605] Swap cache stats: add 0, delete 0, find 0/0
Nov 18 16:06:22 ubuntu kernel: [63798.590613] Free swap  = 0kB
Nov 18 16:06:22 ubuntu kernel: [63798.590621] Total swap = 0kB
Nov 18 16:06:22 ubuntu kernel: [63798.590630] 242688 pages RAM
Nov 18 16:06:22 ubuntu kernel: [63798.590638] 0 pages HighMem/MovableOnly
Nov 18 16:06:22 ubuntu kernel: [63798.590647] 10700 pages reserved
Nov 18 16:06:22 ubuntu kernel: [63798.590655] 65536 pages cma reserved
Nov 18 16:06:22 ubuntu kernel: [63798.590664] Tasks state (memory values in pages):
Nov 18 16:06:22 ubuntu kernel: [63798.590673] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 18 16:06:22 ubuntu kernel: [63798.590714] [    692]     0   692    12638      554   159744        0             0 systemd-journal
Nov 18 16:06:22 ubuntu kernel: [63798.590731] [    703]     0   703     5054      724    61440        0         -1000 systemd-udevd
Nov 18 16:06:22 ubuntu kernel: [63798.590750] [   1046]     0  1046     3137      595    65536        0             0 wpa_supplicant
Nov 18 16:06:22 ubuntu kernel: [63798.590766] [   1049]   101  1049     6400      664    73728        0             0 systemd-network
Nov 18 16:06:22 ubuntu kernel: [63798.590784] [   1053]     0  1053    70024     4120    94208        0         -1000 multipathd
Nov 18 16:06:22 ubuntu kernel: [63798.590802] [   1091]   100  1091    22364      451    73728        0             0 systemd-timesyn
Nov 18 16:06:22 ubuntu kernel: [63798.590818] [   1093]   102  1093     5158      775    77824        0             0 systemd-resolve
Nov 18 16:06:22 ubuntu kernel: [63798.590834] [   1160]   104  1160    55122      608    73728        0             0 rsyslogd
Nov 18 16:06:22 ubuntu kernel: [63798.590850] [   1161]     0  1161    59341      703    98304        0             0 accounts-daemon
Nov 18 16:06:22 ubuntu kernel: [63798.590866] [   1162]     0  1162    20184      516    61440        0             0 irqbalance
Nov 18 16:06:22 ubuntu kernel: [63798.590882] [   1164]     0  1164     7468     2502    90112        0             0 networkd-dispat
Nov 18 16:06:22 ubuntu kernel: [63798.590898] [   1165]     0  1165     3905      416    73728        0             0 systemd-logind
Nov 18 16:06:22 ubuntu kernel: [63798.590914] [   1166]   103  1166     1810      674    57344        0          -900 dbus-daemon
Nov 18 16:06:22 ubuntu kernel: [63798.590929] [   1167]     0  1167     3070      285    61440        0             0 wpa_supplicant
Nov 18 16:06:22 ubuntu kernel: [63798.590945] [   1171]     0  1171   284903     2831   245760        0          -900 snapd
Nov 18 16:06:22 ubuntu kernel: [63798.590962] [   1199]     0  1199    27312     2541   106496        0             0 unattended-upgr
Nov 18 16:06:22 ubuntu kernel: [63798.590984] [   1207]     0  1207     2073      469    57344        0             0 cron
Nov 18 16:06:22 ubuntu kernel: [63798.591000] [   1222]     0  1222      887      388    49152        0             0 atd
Nov 18 16:06:22 ubuntu kernel: [63798.591015] [   1224]     0  1224   188917   162750  1417216        0             0 ipfs
Nov 18 16:06:22 ubuntu kernel: [63798.591031] [   1229]     0  1229     1707      350    45056        0             0 agetty
Nov 18 16:06:22 ubuntu kernel: [63798.591047] [   1230]     0  1230     1326      289    45056        0             0 agetty
Nov 18 16:06:22 ubuntu kernel: [63798.591063] [   1242]     0  1242     3012      731    61440        0         -1000 sshd
Nov 18 16:06:22 ubuntu kernel: [63798.591079] [   1257]     0  1257    58201      327    86016        0             0 polkitd
Nov 18 16:06:22 ubuntu kernel: [63798.591105] [   5130]     0  5130     2917      310    57344        0             0 sshd
Nov 18 16:06:22 ubuntu kernel: [63798.591120] [   5133]     0  5133     3012      506    61440        0             0 sshd
Nov 18 16:06:22 ubuntu kernel: [63798.591131] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=ipfs,pid=1224,uid=0
Nov 18 16:06:22 ubuntu kernel: [63798.591400] Out of memory: Killed process 1224 (ipfs) total-vm:755668kB, anon-rss:651000kB, file-rss:0kB, shmem-rss:0kB
Nov 18 16:06:22 ubuntu kernel: [63798.746898] oom_reaper: reaped process 1224 (ipfs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
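The "Tasks state" table in the log reports memory in pages. With 4 kB pages (consistent with this log, where active_anon:174961 pages equals 699844kB), the ipfs row's rss of 162750 pages is 162750 × 4 = 651000 kB, which matches the anon-rss:651000kB in the kill message. A sketch that does this conversion for every row and sorts by resident memory is shown below; `oom_task_rss_kb` is a hypothetical helper name, and it assumes 4 kB pages and process names without spaces.

```shell
# Print "rss_kB name" for each row of the OOM "Tasks state" table, largest first.
# (hypothetical helper name) Assumes 4 kB pages and the column layout shown in the log above.
oom_task_rss_kb() {
    # Rows look like: "... [timestamp] [  pid]  uid  tgid total_vm rss pgtables swapents adj name"
    # Columns are counted from the end, since the bracketed pid may split into a varying number of fields.
    awk '/\] \[ *[0-9]+\] +[0-9]+ +[0-9]+ / { print $(NF-4) * 4, $NF }' "$1" | sort -rn
}

# Example:
#   oom_task_rss_kb /var/log/kern.log | head -n 3
```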

When a process stops for no apparent reason, check whether it was killed by the OOM killer.