Server crashed or became unresponsive with memory (physical and swap) fully utilised
Environment
- Red Hat Enterprise Linux (all versions)
Issue
- Server crashed with all memory fully consumed
- System became unresponsive, and vmcore analysis suggested the system ran out of memory
- Found server in hung/frozen state, SAR shows memory fully utilized
- Server rebooted due to OOM
- Host unavailable due to Out Of Memory
- System hung and could not be reached via ssh due to a memory issue
Resolution
- Complete utilization of memory and swap can cause a system to hang due to the increased I/O workload, or crash when essential system processes are killed by the Out Of Memory (OOM) killer.
- To resolve this issue, identify the reason for the excessive memory consumption and act accordingly. Possible actions, depending on the culprit, include:
- Add more RAM to the system.
- Increase the swap space available to the system, although this can further impact performance due to frequent swapping of data to and from disk.
- Tune applications to use less memory where possible.
- Reallocate the workload to another system.
Root Cause
- The system ran out of memory, and a large number of processes were simultaneously trying to reclaim it, leaving the system slow or unresponsive.
Diagnostic Steps
- Relevant sar performance data from the system is given below. It shows an increase in paging and swapping activity, with almost all of the physical memory and swap used up. The portion of memory used for buffers and cache is comparatively low.

      00:00:00     pswpin/s  pswpout/s
      11:40:59       244.37    3371.47

      00:00:00     pgpgin/s  pgpgout/s    fault/s   majflt/s
      11:40:59      1047.18   13686.11   14256.36      32.10

      00:00:00    kbmemfree  kbmemused  %memused  kbbuffers   kbcached  kbswpfree  kbswpused  %swpused   kbswpcad
      11:20:01     25136904    7694080     23.44     150976    5782872   16776792        160      0.00          0
      11:40:59       150428   32680556     99.54       2428      35976          0   16776952    100.00     243696
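The %memused figure above follows sar's classic definition, (total - free) / total. As a sketch (not from the original article), the same figures can be derived directly from /proc/meminfo when sysstat is not installed:

```shell
# Compute %memused and %swpused as sar reports them, from /proc/meminfo.
# Values in /proc/meminfo are in kB.
eval "$(awk '/^(MemTotal|MemFree|SwapTotal|SwapFree):/ {gsub(":",""); print $1"="$2}' /proc/meminfo)"
memused_pct=$(( (MemTotal - MemFree) * 100 / MemTotal ))
echo "%memused: $memused_pct"
# Guard against systems (or containers) with no swap configured.
if [ "${SwapTotal:-0}" -gt 0 ]; then
  echo "%swpused: $(( (SwapTotal - SwapFree) * 100 / SwapTotal ))"
fi
```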
- The vmcore analysis from a system that was manually crashed with 'sysrq+c' during a low-memory period shows:

      crash> kmem -i
                       PAGES        TOTAL      PERCENTAGE
          TOTAL MEM  8207746      31.3 GB         ----
               FREE    37508     146.5 MB    0% of TOTAL MEM
               USED  8170238      31.2 GB   99% of TOTAL MEM
             SHARED     8811      34.4 MB    0% of TOTAL MEM
            BUFFERS      109       436 KB    0% of TOTAL MEM
             CACHED     3053      11.9 MB    0% of TOTAL MEM
               SLAB    15598      60.9 MB    0% of TOTAL MEM

         TOTAL HIGH        0            0    0% of TOTAL MEM
          FREE HIGH        0            0    0% of TOTAL HIGH
          TOTAL LOW  8207746      31.3 GB  100% of TOTAL MEM
           FREE LOW    37508     146.5 MB    0% of TOTAL LOW   <--- No free memory

         TOTAL SWAP  4194238        16 GB         ----
          SWAP USED  4194237        16 GB   99% of TOTAL SWAP
          SWAP FREE        1         4 KB    0% of TOTAL SWAP  <--- No free swap

- There is very little memory in shared, buffers, cached, and slab, so the used memory is almost entirely userspace pages, which also explains why swap was consumed so readily.
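For reference, capturing a vmcore like the one analysed here requires kdump to be configured and the magic-sysrq facility enabled. A read-only readiness check is sketched below (the destructive trigger itself is left as a comment, since it deliberately crashes the machine):

```shell
# Check whether sysrq is enabled (0 disables it; non-zero enables some or
# all functions, depending on the bitmask).
sysrq=$(cat /proc/sys/kernel/sysrq)
echo "kernel.sysrq = $sysrq"
# With kdump active, a vmcore can then be forced (THIS CRASHES THE MACHINE):
#   echo c > /proc/sysrq-trigger
```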
- Load is very high at 205, but only 49 processes are in the uninterruptible state; that suggests the rest are runnable processes that do not have a CPU to run on.

      crash> sys | grep LOAD
           LOAD AVERAGE: 205.62, 158.06, 104.87
- There are 144 processes in the run queue:

      crash> ps | grep -c RU
      144
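On a running system (rather than a vmcore), the same picture can be sketched from /proc alone; this is an illustrative analogue, not part of the original diagnostics:

```shell
# Load average plus the count of tasks currently runnable (state R).
# /proc/loadavg fields: 1/5/15-minute load, then running/total tasks.
read load1 load5 load15 runtotal _ < /proc/loadavg
echo "load averages: $load1 $load5 $load15 (running/total: $runtotal)"
runnable=0
for f in /proc/[0-9]*/stat; do
  # The state is the field after the (comm) field; strip up to the closing
  # parenthesis first, since the command name may itself contain spaces.
  state=$(sed 's/^.*) //; s/ .*//' "$f" 2>/dev/null)
  [ "$state" = "R" ] && runnable=$((runnable + 1))
done
echo "tasks in R state: $runnable"
```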
- There are 56 processes running shrink_active_list to reclaim memory:

      crash> foreach RU bt > foreach_RU_bt.out
      crash> !grep -c shrink_active_list foreach_RU_bt.out
      56
- There are a lot of processes in cond_resched(), which allows a process to yield a CPU but does not change the process's state from runnable:

      crash> !grep -c __cond_resched foreach_RU_bt.out
      125
- There are 49 processes in uninterruptible "D" state:

      crash> ps | grep -c UN
      49
- There are 112 in an unknown state (i.e. crash cannot determine what state they are in):

      crash> ps | grep -c ??
      112
- What are they doing?

      PID: 1      TASK: ffff81082ffb37a0  CPU: 4   COMMAND: "init"
       #0 [ffff81082ffb9868] schedule at ffffffff80063035
       #1 [ffff81082ffb9940] __cond_resched at ffffffff8008bb2c
       #2 [ffff81082ffb9950] cond_resched at ffffffff8006316e
       #3 [ffff81082ffb9960] shrink_inactive_list at ffffffff800c7246
       #4 [ffff81082ffb9b50] shrink_zone at ffffffff80012d02
       #5 [ffff81082ffb9b90] try_to_free_pages at ffffffff800c8028
       #6 [ffff81082ffb9c10] __alloc_pages at ffffffff8000f271
       #7 [ffff81082ffb9c70] __do_page_cache_readahead at ffffffff80012a62
       #8 [ffff81082ffb9d60] filemap_nopage at ffffffff80013401
       #9 [ffff81082ffb9dc0] __handle_mm_fault at ffffffff80008863
      #10 [ffff81082ffb9e60] do_page_fault at ffffffff80066b9a
      #11 [ffff81082ffb9f50] error_exit at ffffffff8005dde9
          RIP: 0000000000406f5d  RSP: 00007fff3876c7e0  RFLAGS: 00010206
          RAX: 00007fff3876c980  RBX: 0000000000000008  RCX: 0000000000000000
          RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007fff3876cb00
          RBP: 00007fff3876c980   R8: 0000000000000000   R9: 0000000000300000
          R10: 0000000000000000  R11: 00000033c107a5d3  R12: 0000000000006ffc
          R13: 000000000e2ed1bc  R14: 0000000000000000  R15: 0000000000407779
          ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
- They are all in shrink_zone() trying to free up memory, and since their state never changes from runnable they are all contending for the same resources, which seems excessive.
- CPUs 0, 1, 3, 4, 5, 6 and 7 are all stuck on the same spinlock, which is held by CPU 2:

      PID: 5769   TASK: ffff81062d1cc7a0  CPU: 0   COMMAND: "corpact"
       #0 [ffffffff80428f30] crash_nmi_callback at ffffffff80079db6
       #1 [ffffffff80428f40] do_nmi at ffffffff80065942
       #2 [ffffffff80428f50] nmi at ffffffff80064fa7
          [exception RIP: .text.lock.spinlock+5]
          RIP: ffffffff80064cb7  RSP: ffff810157ce1910  RFLAGS: 00000286
          RAX: 0000000000000000  RBX: ffff8107fa8ff2e8  RCX: ffff810108f27088
          RDX: ffff8107fa8ff2e9  RSI: 0000000000000001  RDI: ffff8107fa8ff2e8
          RBP: 0000000000000000   R8: ffff81000002b600   R9: ffff81000002b600
          R10: 0000000057ce1bb8  R11: 0000000000000286  R12: 00000000000002bc
          R13: ffff81000002b600  R14: 0000000000000000  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #3 [ffff810157ce1910] .text.lock.spinlock at ffffffff80064cb7 (via _spin_lock)
       #4 [ffff810157ce1910] page_lock_anon_vma at ffffffff80021a5f
       #5 [ffff810157ce1920] page_referenced at ffffffff8003ba76
       #6 [ffff810157ce1960] shrink_inactive_list at ffffffff800c72c3
       #7 [ffff810157ce1b50] shrink_zone at ffffffff80012d02
       #8 [ffff810157ce1b90] try_to_free_pages at ffffffff800c8028
       #9 [ffff810157ce1c10] __alloc_pages at ffffffff8000f271
      #10 [ffff810157ce1c70] __do_page_cache_readahead at ffffffff80012a62
      #11 [ffff810157ce1d60] filemap_nopage at ffffffff80013401
      #12 [ffff810157ce1dc0] __handle_mm_fault at ffffffff80008863
      #13 [ffff810157ce1e60] do_page_fault at ffffffff80066b9a
      #14 [ffff810157ce1f50] error_exit at ffffffff8005dde9
          RIP: 00002b60f6adde23  RSP: 00000000416a7660  RFLAGS: 00010246
          RAX: 000000000041c9b0  RBX: 0000000000000000  RCX: 000000001110a008
          RDX: 00000000ffffffff  RSI: 00000000ffffffff  RDI: 00007fffb4277db0
          RBP: 00000000416a7680   R8: 0000000011109810   R9: 0000000000000001
          R10: 00000033c1351a30  R11: 0000000000000202  R12: 00007fffb4277dc0
          R13: 00007fffb4277db0  R14: 00000000416a76e0  R15: 00000000000001e0
          ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
- So this system is excessively busy but achieving very little, since only one CPU is really doing anything useful here.
- To see how many processes are performing reclaim on a zone, inspect the zone's reclaim_in_progress counter:

      NODE: 0  ZONE: 2  ADDR: ffff81000001b600  NAME: "Normal"
        SIZE: 1310720  PRESENT: 1292800  MIN/LOW/HIGH: 1799/2248/2698
        NR_ACTIVE: 680955  NR_INACTIVE: 538921  FREE: 1770

      crash> p -d ((struct zone *)0xffff81000001b600)->reclaim_in_progress
      $6 = {
        counter = 268
      }

- So in the above case, 268 processes are in page reclaim for the "Normal" zone.
- Below is another pattern seen with a similar memory issue.
- No explicit OOM kill, but constant order-0 (4 kB) page allocation failures:

      [1080682.276174] pool-ds_am: page allocation failure: order:0, mode:0x400000(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
      [1080682.281342] pool-ds_am: page allocation failure: order:0, mode:0x400000(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
      [1340364.651811] pool-ds_am: page allocation failure: order:0, mode:0x400000(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
      [1340364.999316] pool-ds_am: page allocation failure: order:0, mode:0x400000(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
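As a quick check (an illustrative sketch, not from the original article), such failures can be counted in the kernel log. On a live system the input would be `dmesg` or `journalctl -k`; here one of the sample lines above stands in so the pipeline is self-contained:

```shell
# Count order-0 page allocation failures in kernel log output.
sample='[1080682.276174] pool-ds_am: page allocation failure: order:0, mode:0x400000(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0'
failures=$(printf '%s\n' "$sample" | grep -c 'page allocation failure: order:0')
echo "order-0 allocation failures: $failures"
```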
- Many uninterruptible tasks, each blocked only briefly:

      crash> ps -S
        RU: 2
        UN: 235   <--
        IN: 735
        ID: 68
        WA: 1

      crash> foreach UN ps -m | tail
      ...
      [ 0 00:00:02.181] [UN]  PID: 2837459  TASK: ffff8fe76dfb8000  CPU: 1  COMMAND: "ds_nuagent"
      [ 0 00:00:02.284] [UN]  PID: 2798     TASK: ffff8fe738d14000  CPU: 1  COMMAND: "ds_nuagent"
      [ 0 00:00:02.216] [UN]  PID: 2020     TASK: ffff8fe702230000  CPU: 0  COMMAND: "node /data/env-"
      [ 0 00:00:02.265] [UN]  PID: 121778   TASK: ffff8fe6c20b8000  CPU: 0  COMMAND: "ds_nuagent"
      [ 0 00:00:02.475] [UN]  PID: 4540     TASK: ffff8fe73583c000  CPU: 1  COMMAND: "ds_nuagent"
- All of those tasks were blocked waiting for I/O to complete:

      crash> foreach UN bt | grep __lock_page_killable -c
      234

      PID: 4540   TASK: ffff8fe73583c000  CPU: 1   COMMAND: "ds_nuagent"
       #0 [ffffb71905f4bb48] __schedule at ffffffffb160e1f1
       #1 [ffffb71905f4bba0] schedule at ffffffffb160e7e5
       #2 [ffffb71905f4bbb8] io_schedule at ffffffffb160ec12
       #3 [ffffb71905f4bbc8] __lock_page_killable at ffffffffb0e9ed90
       #4 [ffffb71905f4bc60] filemap_fault at ffffffffb0ea19d1
       #5 [ffffb71905f4bd30] __xfs_filemap_fault at ffffffffc02716bd [xfs]
       #6 [ffffb71905f4bd80] __do_fault at ffffffffb0edc1e8
       #7 [ffffb71905f4bda0] handle_pte_fault at ffffffffb0ee16fd
       #8 [ffffb71905f4bde0] __handle_mm_fault at ffffffffb0ee4232
       #9 [ffffb71905f4be98] handle_mm_fault at ffffffffb0ee447a
      #10 [ffffb71905f4bed0] __do_page_fault at ffffffffb0c807c0
      #11 [ffffb71905f4bf20] do_page_fault at ffffffffb0c80a47
      #12 [ffffb71905f4bf50] page_fault at ffffffffb18011ae
          RIP: 00007fb2db958a17  RSP: 000000c0002dd1c8  RFLAGS: 00010202
          RAX: 000000000006cc80  RBX: 00000000fffffffa  RCX: 00007fb2dc914f00
          RDX: 0000000000000556  RSI: 0000000000004448  RDI: 00007fb2dc761500
          RBP: 000000c0002dd1d8   R8: 0000000000000557   R9: 00007fb2dc763fb8
          R10: 00000000001976e0  R11: 000000000000101f  R12: 000000c0002dd0c8
          R13: ffffffffffffffff  R14: 000000c0002d4000  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
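On a live system, the count of uninterruptible tasks can be read directly; this sketch (an illustration, not part of the original diagnostics) uses /proc/stat, where the kernel exports the current counts:

```shell
# procs_running / procs_blocked in /proc/stat are the current numbers of
# runnable tasks and tasks in uninterruptible (D) sleep - the live
# equivalent of the RU/UN counts shown by `crash> ps -S`.
running=$(awk '/^procs_running/ {print $2}' /proc/stat)
blocked=$(awk '/^procs_blocked/ {print $2}' /proc/stat)
echo "procs_running: $running, procs_blocked (D state): $blocked"
```

A sustained high procs_blocked count points at I/O, matching the __lock_page_killable stacks above.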
- The sshd process shows the same pattern, which is why ssh connections were unavailable:

      PID: 928    TASK: ffff8fe6c2b7c000  CPU: 1   COMMAND: "sshd"
       #0 [ffffb7190331fb48] __schedule at ffffffffb160e1f1
       #1 [ffffb7190331fba0] schedule at ffffffffb160e7e5
       #2 [ffffb7190331fbb8] io_schedule at ffffffffb160ec12
       #3 [ffffb7190331fbc8] __lock_page_killable at ffffffffb0e9ed90
       #4 [ffffb7190331fc60] filemap_fault at ffffffffb0ea19d1
       #5 [ffffb7190331fd30] __xfs_filemap_fault at ffffffffc02716bd [xfs]
       #6 [ffffb7190331fd80] __do_fault at ffffffffb0edc1e8
       #7 [ffffb7190331fda0] handle_pte_fault at ffffffffb0ee16fd
       #8 [ffffb7190331fde0] __handle_mm_fault at ffffffffb0ee4232
       #9 [ffffb7190331fe98] handle_mm_fault at ffffffffb0ee447a
      #10 [ffffb7190331fed0] __do_page_fault at ffffffffb0c807c0
      #11 [ffffb7190331ff20] do_page_fault at ffffffffb0c80a47
      #12 [ffffb7190331ff50] page_fault at ffffffffb18011ae
          RIP: 00007fd97866eab0  RSP: 00007ffc4d5a2798  RFLAGS: 00010246
          RAX: 0000000000000000  RBX: 0000000000000030  RCX: 00007ffc4d5a284c
          RDX: 0000000000000036  RSI: 00007fd97883e70b  RDI: 0000000000000030
          RBP: 0000000000000000   R8: 00007fd978ab7d00   R9: 0000000000000000
          R10: 000000007fffffff  R11: 0000000000000246  R12: 000055bd56573580
          R13: 0000000000000010  R14: 000055bd56573580  R15: 00007ffc4d5a284c
          ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
- This typically appears together with busy disk I/O request queues:

      crash> dev -D
      MAJOR  GENDISK           NAME  REQUEST_QUEUE     TOTAL  ASYNC  SYNC
          8  ffff8fe6c8b8f000  sdc   ffff8fe6c8b97488    239    239     0
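On a running system, per-device in-flight request counts are available without the crash utility; a minimal sketch (illustrative, not from the original article) using /proc/diskstats:

```shell
# Column 12 of /proc/diskstats is the number of I/O requests currently
# in flight for each device - the live analogue of `crash> dev -D`.
inflight=$(awk '$12 > 0 {busy = 1; print $3 ": " $12 " in flight"}
                END {if (!busy) print "no device has I/O in flight"}' /proc/diskstats)
echo "$inflight"
```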
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.