Ubuntu Troubleshooting - alx696/share GitHub Wiki
Too many open files
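By default, a process may have at most 1024 open files, which is far too few for database applications such as RocksDB or PostgreSQL. When you hit this error, check the current limits: the limit of the current shell, the limit of a particular systemd service (syncthing in this example), and the systemd-wide default: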
$ ulimit -n
$ systemctl --user show syncthing | grep LimitNOFILE
$ systemctl show -p DefaultLimitNOFILE
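These commands show the shell's limit and the values systemd is configured with. When a running process reports "Too many open files", it can be more direct to read the limits that process actually received from /proc/<pid>/limits and compare them with the number of descriptors it currently holds. A minimal bash sketch; the function name nofile_limits is illustrative, not an existing tool:

```shell
# Print the open-file limits of a process and how many descriptors it holds.
# Usage: nofile_limits [pid]   (defaults to the current shell)
nofile_limits() {
  local pid="${1:-$$}"
  # The "Max open files" row of /proc/<pid>/limits holds the soft and hard values
  awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/${pid}/limits"
  # Each entry in /proc/<pid>/fd is one open descriptor
  # (reading another user's fd directory requires root)
  echo "open=$(ls "/proc/${pid}/fd" | wc -l)"
}
```

For example, $ nofile_limits "$(pgrep -o -x syncthing)" checks the oldest syncthing process. If "open" is close to "soft", the process really is running out of descriptors.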
On Ubuntu 20.04, the systemd defaults should already be high enough:
u@u:~$ systemctl --user show syncthing | grep LimitNOFILE
LimitNOFILE=1048576
LimitNOFILESoft=1048576
u@u:~$ systemctl show -p DefaultLimitNOFILE
DefaultLimitNOFILE=524288
Setting ulimit
$ echo "* soft nofile 1048576" | sudo tee -a /etc/security/limits.conf
$ echo "* hard nofile 1048576" | sudo tee -a /etc/security/limits.conf
$ echo "root soft nofile 1048576" | sudo tee -a /etc/security/limits.conf
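$ echo "root hard nofile 1048576" | sudo tee -a /etc/security/limits.conf
The * wildcard in limits.conf does not apply to root, which is why root needs its own lines. These limits are applied by pam_limits when a session starts, so log out and back in, then confirm with:
$ ulimit -n
They do not affect services started by systemd; those are governed by the systemctl setting.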
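Without editing limits.conf, a shell may raise its own soft limit as far as its current hard limit, and no root is needed for that. This is handy for testing a program before making the change permanent. A minimal sketch; raise_nofile_soft is an illustrative name:

```shell
# Raise the soft open-file limit of the current shell to its hard limit.
# Call it directly in the current shell (not inside $(...) or a pipeline),
# so programs started from this shell afterwards inherit the new limit.
raise_nofile_soft() {
  ulimit -Sn "$(ulimit -Hn)"
  echo "soft nofile limit: $(ulimit -Sn)"
}
```

The hard limit itself can only be raised by root, which is what the limits.conf entries are for.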
Setting the systemd default (systemctl)
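$ echo "DefaultLimitNOFILE=1048576" | sudo tee -a /etc/systemd/system.conf
systemd only reads system.conf when it starts, so reboot or run:
$ sudo systemctl daemon-reexec
Services must then be restarted to pick up the new default. This covers system services; user services (systemctl --user) take their default from /etc/systemd/user.conf instead.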
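The global default affects every system service. To raise the limit for a single service only, systemd supports drop-in files that override individual settings of a unit. A hypothetical helper sketch; the function name set_service_nofile and the drop-in file name nofile.conf are illustrative choices, and the unit name is passed without the .service suffix:

```shell
# Give one systemd service its own open-file limit via a drop-in file,
# without changing the global DefaultLimitNOFILE.
# Usage: set_service_nofile <unit> <limit>
set_service_nofile() {
  local unit="$1" limit="$2"
  local dir="/etc/systemd/system/${unit}.service.d"
  sudo mkdir -p "$dir"
  # Settings in a drop-in are merged into the unit's [Service] section
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" | sudo tee "$dir/nofile.conf" >/dev/null
  sudo systemctl daemon-reload
  sudo systemctl restart "${unit}.service"
  # Show the value systemd now applies to the unit
  systemctl show "${unit}.service" -p LimitNOFILE
}
```

For example, $ set_service_nofile myservice 1048576, where myservice is a placeholder for the real unit name.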
Killed for running out of memory
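This is a self-protection mechanism in Linux, the OOM killer: when free memory runs out, the kernel kills a process to reclaim memory, usually the one using the most memory.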
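Strictly, the kernel picks the process with the highest /proc/<pid>/oom_score, a value derived mostly from the process's memory use and shifted by its oom_score_adj; an oom_score_adj of -1000 (as sshd and systemd-udevd have in the log below) exempts a process entirely. A minimal sketch that lists the current ranking; oom_candidates is an illustrative name:

```shell
# List running processes by OOM-killer preference, highest oom_score first.
# Output lines: "<oom_score> <pid> <command>"
oom_candidates() {
  local dir score name
  for dir in /proc/[0-9]*; do
    # A process may exit while we iterate; skip it if its files are gone
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    name=$(cat "$dir/comm" 2>/dev/null) || continue
    printf '%s %s %s\n' "$score" "${dir#/proc/}" "$name"
  done | sort -nr
}
```

For example, $ oom_candidates | head -n 5 shows the five processes most likely to be killed next.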
Log path: /var/log/kern.log. To find such events quickly:
$ grep "Out of memory" /var/log/kern.log
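The grep above prints whole log lines. A minimal sketch that reduces each kill to its time, PID, command name and anonymous memory; it assumes the "Out of memory: Killed process ..." wording used by the kernel in the log below (older kernels phrase it differently), and oom_kills is an illustrative name:

```shell
# Summarise OOM kills from kernel log lines read on stdin, one line per kill:
#   "<date time> pid=<pid> name=<command> anon-rss=<kB>kB"
oom_kills() {
  grep 'Out of memory: Killed process' |
    sed -E 's/^([A-Z][a-z]{2} +[0-9]+ [0-9:]+) .*Killed process ([0-9]+) \(([^)]+)\) .*anon-rss:([0-9]+)kB.*/\1 pid=\2 name=\3 anon-rss=\4kB/'
}
```

Usage: $ oom_kills < /var/log/kern.log, or on systems that only keep the journal, $ journalctl -k | oom_kills. For the log below it prints "Nov 18 16:06:22 pid=1224 name=ipfs anon-rss=651000kB".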
Below is the log of IPFS being killed for running out of memory on a Raspberry Pi 3:
Nov 18 16:06:22 ubuntu kernel: [63798.590156] sshd invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
Nov 18 16:06:22 ubuntu kernel: [63798.590181] CPU: 1 PID: 5133 Comm: sshd Tainted: G C E 5.3.0-1012-raspi2 #14-Ubuntu
Nov 18 16:06:22 ubuntu kernel: [63798.590186] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
Nov 18 16:06:22 ubuntu kernel: [63798.590191] Call trace:
Nov 18 16:06:22 ubuntu kernel: [63798.590208] dump_backtrace+0x0/0x190
Nov 18 16:06:22 ubuntu kernel: [63798.590217] show_stack+0x24/0x30
Nov 18 16:06:22 ubuntu kernel: [63798.590230] dump_stack+0xd0/0x11c
Nov 18 16:06:22 ubuntu kernel: [63798.590242] dump_header+0x48/0x1ec
Nov 18 16:06:22 ubuntu kernel: [63798.590251] oom_kill_process+0x19c/0x1a0
Nov 18 16:06:22 ubuntu kernel: [63798.590260] out_of_memory+0x1a4/0x2b8
Nov 18 16:06:22 ubuntu kernel: [63798.590269] __alloc_pages_slowpath+0x990/0xb68
Nov 18 16:06:22 ubuntu kernel: [63798.590276] __alloc_pages_nodemask+0x26c/0x2c8
Nov 18 16:06:22 ubuntu kernel: [63798.590283] __get_free_pages+0x30/0x58
Nov 18 16:06:22 ubuntu kernel: [63798.590292] __pud_alloc+0x30/0x120
Nov 18 16:06:22 ubuntu kernel: [63798.590301] __handle_mm_fault+0x244/0x320
Nov 18 16:06:22 ubuntu kernel: [63798.590310] handle_mm_fault+0xd4/0x1a0
Nov 18 16:06:22 ubuntu kernel: [63798.590318] __get_user_pages+0x148/0x3b0
Nov 18 16:06:22 ubuntu kernel: [63798.590326] get_user_pages_remote+0x134/0x250
Nov 18 16:06:22 ubuntu kernel: [63798.590335] copy_strings.isra.0+0x134/0x350
Nov 18 16:06:22 ubuntu kernel: [63798.590343] copy_strings_kernel+0x6c/0xc8
Nov 18 16:06:22 ubuntu kernel: [63798.590351] __do_execve_file.isra.0+0x448/0x798
Nov 18 16:06:22 ubuntu kernel: [63798.590359] __arm64_sys_execve+0x48/0x58
Nov 18 16:06:22 ubuntu kernel: [63798.590368] el0_svc_common.constprop.0+0xe0/0x1e8
Nov 18 16:06:22 ubuntu kernel: [63798.590374] el0_svc_handler+0x34/0xa0
Nov 18 16:06:22 ubuntu kernel: [63798.590382] el0_svc+0x10/0x14
Nov 18 16:06:22 ubuntu kernel: [63798.590388] Mem-Info:
Nov 18 16:06:22 ubuntu kernel: [63798.590415] active_anon:174961 inactive_anon:33 isolated_anon:0
Nov 18 16:06:22 ubuntu kernel: [63798.590415] active_file:311 inactive_file:428 isolated_file:32
Nov 18 16:06:22 ubuntu kernel: [63798.590415] unevictable:4250 dirty:0 writeback:0 unstable:0
Nov 18 16:06:22 ubuntu kernel: [63798.590415] slab_reclaimable:8852 slab_unreclaimable:16188
Nov 18 16:06:22 ubuntu kernel: [63798.590415] mapped:1959 shmem:914 pagetables:691 bounce:0
Nov 18 16:06:22 ubuntu kernel: [63798.590415] free:5693 free_pcp:39 free_cma:3471
Nov 18 16:06:22 ubuntu kernel: [63798.590435] Node 0 active_anon:699844kB inactive_anon:132kB active_file:1244kB inactive_file:1712kB unevictable:17000kB isolated(anon):0kB isolated(file):128kB mapped:7836kB dirty:0kB writeback:0kB shmem:3656kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Nov 18 16:06:22 ubuntu kernel: [63798.590461] DMA32 free:22772kB min:10324kB low:11116kB high:11908kB active_anon:699844kB inactive_anon:132kB active_file:1048kB inactive_file:1652kB unevictable:17000kB writepending:0kB present:970752kB managed:927952kB mlocked:17000kB kernel_stack:2784kB pagetables:2764kB bounce:0kB free_pcp:156kB local_pcp:0kB free_cma:13884kB
Nov 18 16:06:22 ubuntu kernel: [63798.590478] lowmem_reserve[]: 0 0 0
Nov 18 16:06:22 ubuntu kernel: [63798.590499] DMA32: 2002*4kB (UMEHC) 869*8kB (UMEHC) 274*16kB (UMHC) 76*32kB (UHC) 3*64kB (HC) 4*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22992kB
Nov 18 16:06:22 ubuntu kernel: [63798.590583] 3220 total pagecache pages
Nov 18 16:06:22 ubuntu kernel: [63798.590594] 0 pages in swap cache
Nov 18 16:06:22 ubuntu kernel: [63798.590605] Swap cache stats: add 0, delete 0, find 0/0
Nov 18 16:06:22 ubuntu kernel: [63798.590613] Free swap = 0kB
Nov 18 16:06:22 ubuntu kernel: [63798.590621] Total swap = 0kB
Nov 18 16:06:22 ubuntu kernel: [63798.590630] 242688 pages RAM
Nov 18 16:06:22 ubuntu kernel: [63798.590638] 0 pages HighMem/MovableOnly
Nov 18 16:06:22 ubuntu kernel: [63798.590647] 10700 pages reserved
Nov 18 16:06:22 ubuntu kernel: [63798.590655] 65536 pages cma reserved
Nov 18 16:06:22 ubuntu kernel: [63798.590664] Tasks state (memory values in pages):
Nov 18 16:06:22 ubuntu kernel: [63798.590673] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Nov 18 16:06:22 ubuntu kernel: [63798.590714] [ 692] 0 692 12638 554 159744 0 0 systemd-journal
Nov 18 16:06:22 ubuntu kernel: [63798.590731] [ 703] 0 703 5054 724 61440 0 -1000 systemd-udevd
Nov 18 16:06:22 ubuntu kernel: [63798.590750] [ 1046] 0 1046 3137 595 65536 0 0 wpa_supplicant
Nov 18 16:06:22 ubuntu kernel: [63798.590766] [ 1049] 101 1049 6400 664 73728 0 0 systemd-network
Nov 18 16:06:22 ubuntu kernel: [63798.590784] [ 1053] 0 1053 70024 4120 94208 0 -1000 multipathd
Nov 18 16:06:22 ubuntu kernel: [63798.590802] [ 1091] 100 1091 22364 451 73728 0 0 systemd-timesyn
Nov 18 16:06:22 ubuntu kernel: [63798.590818] [ 1093] 102 1093 5158 775 77824 0 0 systemd-resolve
Nov 18 16:06:22 ubuntu kernel: [63798.590834] [ 1160] 104 1160 55122 608 73728 0 0 rsyslogd
Nov 18 16:06:22 ubuntu kernel: [63798.590850] [ 1161] 0 1161 59341 703 98304 0 0 accounts-daemon
Nov 18 16:06:22 ubuntu kernel: [63798.590866] [ 1162] 0 1162 20184 516 61440 0 0 irqbalance
Nov 18 16:06:22 ubuntu kernel: [63798.590882] [ 1164] 0 1164 7468 2502 90112 0 0 networkd-dispat
Nov 18 16:06:22 ubuntu kernel: [63798.590898] [ 1165] 0 1165 3905 416 73728 0 0 systemd-logind
Nov 18 16:06:22 ubuntu kernel: [63798.590914] [ 1166] 103 1166 1810 674 57344 0 -900 dbus-daemon
Nov 18 16:06:22 ubuntu kernel: [63798.590929] [ 1167] 0 1167 3070 285 61440 0 0 wpa_supplicant
Nov 18 16:06:22 ubuntu kernel: [63798.590945] [ 1171] 0 1171 284903 2831 245760 0 -900 snapd
Nov 18 16:06:22 ubuntu kernel: [63798.590962] [ 1199] 0 1199 27312 2541 106496 0 0 unattended-upgr
Nov 18 16:06:22 ubuntu kernel: [63798.590984] [ 1207] 0 1207 2073 469 57344 0 0 cron
Nov 18 16:06:22 ubuntu kernel: [63798.591000] [ 1222] 0 1222 887 388 49152 0 0 atd
Nov 18 16:06:22 ubuntu kernel: [63798.591015] [ 1224] 0 1224 188917 162750 1417216 0 0 ipfs
Nov 18 16:06:22 ubuntu kernel: [63798.591031] [ 1229] 0 1229 1707 350 45056 0 0 agetty
Nov 18 16:06:22 ubuntu kernel: [63798.591047] [ 1230] 0 1230 1326 289 45056 0 0 agetty
Nov 18 16:06:22 ubuntu kernel: [63798.591063] [ 1242] 0 1242 3012 731 61440 0 -1000 sshd
Nov 18 16:06:22 ubuntu kernel: [63798.591079] [ 1257] 0 1257 58201 327 86016 0 0 polkitd
Nov 18 16:06:22 ubuntu kernel: [63798.591105] [ 5130] 0 5130 2917 310 57344 0 0 sshd
Nov 18 16:06:22 ubuntu kernel: [63798.591120] [ 5133] 0 5133 3012 506 61440 0 0 sshd
Nov 18 16:06:22 ubuntu kernel: [63798.591131] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=ipfs,pid=1224,uid=0
Nov 18 16:06:22 ubuntu kernel: [63798.591400] Out of memory: Killed process 1224 (ipfs) total-vm:755668kB, anon-rss:651000kB, file-rss:0kB, shmem-rss:0kB
Nov 18 16:06:22 ubuntu kernel: [63798.746898] oom_reaper: reaped process 1224 (ipfs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
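The memory values in the "Tasks state" table are in pages. On this board a page is 4 kB: the DMA32 free list starts at 4kB blocks, and 242688 pages RAM × 4 kB = 970752 kB, matching present:970752kB. So ipfs's rss of 162750 pages is 162750 × 4 = 651000 kB, exactly the anon-rss:651000kB in the "Killed process" line, and its total_vm of 188917 pages is 755668 kB, matching total-vm:755668kB. With Total swap = 0kB, ipfs alone held about 651 MB of the board's roughly 1 GB. A minimal sketch that ranks the table rows by resident memory; it assumes 4 kB pages unless told otherwise, and oom_top_rss is an illustrative name:

```shell
# Rank the tasks in an OOM "Tasks state" dump (kernel log lines on stdin)
# by resident memory, largest first, converting pages to kB.
# Usage: oom_top_rss [page_kb]   (page size in kB, default 4)
oom_top_rss() {
  awk -v page_kb="${1:-4}" '
    # Task rows end with: uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
    /\[ *[0-9]+\] +[0-9]/ {
      printf "%.0f kB pid=%s %s\n", $(NF-4) * page_kb, $(NF-6), $NF
    }' | sort -nr
}
```

Usage: $ grep -A 40 "Tasks state" /var/log/kern.log | oom_top_rss | head. For the log above, the first line is "651000 kB pid=1224 ipfs".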
When a process stops for no apparent reason, check whether it was killed by the OOM killer.
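For scripts or quick checks, a minimal sketch that answers this for one process name by searching the kernel log; was_oom_killed is an illustrative name, and the log file defaults to /var/log/kern.log (pass /var/log/kern.log.1 for the previous rotation). Note that the kernel logs command names truncated to 15 characters, as with systemd-timesyn in the log above, so pass the truncated name:

```shell
# Check whether the kernel logged an OOM kill of a process with the given name.
# Usage: was_oom_killed <name> [logfile]
# Prints the matching "Killed process" lines; exit status 0 if any were found.
was_oom_killed() {
  local name="$1" log="${2:-/var/log/kern.log}"
  # -F treats the name literally, so "." or "(" in it are not regex syntax
  grep 'Killed process' "$log" | grep -F "(${name})"
}
```

Usage: $ was_oom_killed ipfs && echo "ipfs was OOM-killed".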