Backup server running RHEL 5 becomes unresponsive on a regular basis
Environment
- Red Hat Enterprise Linux 5.9 (RHEL)
Issue
- Our DL380 G6 regularly stops with hung task messages on the console.
Resolution
- Decrease the value of vm.dirty_ratio from 40 to 20 to initiate writeback sooner and to avoid congestion and potential stalls.
- Change the value in /etc/sysctl.conf to make the change permanent, using your favorite text editor, and activate the change by running sysctl -p.
- For testing purposes only, change the value at runtime using sysctl -w vm.dirty_ratio=20.
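A minimal sketch of both variants, using the value recommended above:

# For testing only (not persistent across reboots):
sysctl -w vm.dirty_ratio=20

# To make the change permanent, add (or update) this line in /etc/sysctl.conf:
#   vm.dirty_ratio = 20
# and activate the settings from the file:
sysctl -p

# Verify the value currently in effect:
sysctl vm.dirty_ratio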
Root Cause
- Writeback of dirty pages conflicts with the be2net driver's need for memory to work correctly. Because the writeback has to go to NFS, the system appears to end up in a deadlock: writeback needs the network (i.e. the be2net driver), but the be2net driver requires memory to keep working.
Diagnostic Steps
Configure the system to capture a kdump
- Configure the system to use kdump and force a crash using SysRq when the system is in an unresponsive state
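A minimal sketch of the kdump setup on RHEL 5 and of triggering the dump; the crashkernel size and file locations are examples and may need to be adapted:

# Install kexec-tools:
yum install kexec-tools

# Reserve memory for the capture kernel by appending, for example,
#   crashkernel=128M@16M
# to the kernel line in /boot/grub/grub.conf, then reboot.

# Enable and start the kdump service:
chkconfig kdump on
service kdump start

# Allow the SysRq trigger (persistently: kernel.sysrq = 1 in /etc/sysctl.conf):
echo 1 > /proc/sys/kernel/sysrq

# When the system becomes unresponsive, force a crash dump from a still-working
# shell (or use Alt+SysRq+c on the console, or the break sequence on a serial console):
echo c > /proc/sysrq-trigger

# After the reboot, the vmcore is written below /var/crash/ by default.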
Analyze vmcore
- The vmcore shows that the system is very low on free memory. Most memory is in use for caching file data, and the majority of these pages are allocated to only six (NFS) files. These six files have been opened by bptm processes. The following table gives a summary (FD = File Descriptor); a sketch of the crash commands used to collect this kind of data follows the table.
 'bptm'   | number   | file name and
 PID / FD | of pages | mount point
----------|----------|-----------------------------------------
          |          |
 12482 19 |  434437  | xy1234_1391574088_C2_F5.1391574088.img
          |          | /dstu_nfs_nf1/pav3 [*]
          |          |
  2392 19 |  194792  | yz2412_1391578204_C1_F1.1391578204.img
  2399 19 |          | /dstu_nfs_nf2/pav4 [**]
          |          |
  2389 19 |  192652  | yz2312_1391578201_C1_F1.1391578201.img
  2396 19 |          | /dstu_nfs_nf2/pav4 [**]
          |          |
  2385 19 |  158263  | yz2512_1391578200_C1_F1.1391578200.img
  2388 19 |          | /dstu_nfs_nf2/pav4 [**]
          |          |
  2381 19 |   85782  | yz9221_1391578202_C1_F1.1391578202.img
  2384 19 |          | /dstu_nfs_nf2/pav4 [**]
          |          |
  2393 19 |   85775  | yz6221_1391578205_C1_F1.1391578205.img
  2400 19 |          | /dstu_nfs_nf2/pav4 [**]
          |          |
----------|----------|-----------------------------------------
total pages 1151701
[*] nf1:/vol/nbu_pav3_binz_srv1234/q1
[**] nf2:/vol/nbu_pav4_binz_srv1234/q1
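The summary above can be reproduced from the vmcore with the crash utility; a minimal sketch, assuming the matching kernel-debuginfo package is installed (all paths and addresses below are examples only):

crash /usr/lib/debug/lib/modules/2.6.18-348.el5/vmlinux /var/crash/2014-02-05-10:00/vmcore

crash> kmem -i           # overall memory usage: free, cached, dirty pages, ...
crash> ps | grep bptm    # locate the bptm PIDs
crash> bt 12482          # stack trace of one bptm process
crash> files 12482       # open files of the process; note the INODE address of FD 19
crash> struct inode.i_mapping <inode address>          # address_space of that file
crash> struct address_space.nrpages <mapping address>  # number of cached pages of that file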
- The bptm processes (PIDs 12482, 2392, 2389, 2385, and 2393) are executing a write() system call and have the following kernel function call sequence in their stack traces.
...
nfs_write_begin
grab_cache_page_write_begin
__alloc_pages
try_to_free_pages
shrink_zone
throttle_vm_writeout
blk_congestion_wait
io_schedule_timeout
schedule_timeout
schedule
- It appears that these threads are executing a loop in throttle_vm_writeout().
for ( ; ; ) {
        get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
        ...
        dirty_thresh += dirty_thresh / 10;      /* wheeee... */
        if (global_page_state(NR_UNSTABLE_NFS) +
                global_page_state(NR_WRITEBACK) <= dirty_thresh)
                        break;
        blk_congestion_wait(WRITE, HZ/10);
}
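The two counters that this loop compares against dirty_thresh can also be observed on a live system; a quick, read-only check:

# Pages dirty, under writeback, or sent to the NFS server but not yet committed:
grep -E 'Dirty|Writeback|NFS_Unstable' /proc/meminfo

# The ratio that dirty_thresh is derived from:
cat /proc/sys/vm/dirty_ratio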
- Interestingly, the kernel events/2 thread appears to be executing the same loop, too. This thread performs a callback of the be_worker() function, which belongs to the be2net network driver. The driver, in turn, attempts to allocate memory and is stuck here.
PID: 28 TASK: ffff81019743d080 CPU: 2 COMMAND: "events/2"
#0 [ffff810197461a80] schedule at ffffffff80062fa0
#1 [ffff810197461b58] schedule_timeout at ffffffff8006389f
#2 [ffff810197461ba8] io_schedule_timeout at ffffffff80063241
#3 [ffff810197461bd8] blk_congestion_wait at ffffffff8003b871
#4 [ffff810197461c28] throttle_vm_writeout at ffffffff80053be0
#5 [ffff810197461c48] shrink_zone at ffffffff8001331f
#6 [ffff810197461c88] try_to_free_pages at ffffffff800cfd72
#7 [ffff810197461d18] __alloc_pages at ffffffff8000f604
#8 [ffff810197461d88] alloc_page_interleave at ffffffff80023d1a
#9 [ffff810197461da8] be_post_rx_frags at ffffffff882fa579 [be2net]
#10 [ffff810197461df8] be_worker at ffffffff882fe1a6 [be2net] <<-- be2net, be_worker function
#11 [ffff810197461e38] run_workqueue at ffffffff8004d903
#12 [ffff810197461e78] worker_thread at ffffffff8004a23c
#13 [ffff810197461ee8] kthread at ffffffff80032c2b
#14 [ffff810197461f48] kernel_thread at ffffffff8005dfc1
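The trace above is the output of bt 28 for the events/2 thread. To confirm how many tasks are spinning in the same throttling loop, all stack traces in the vmcore can be searched; a rough sketch using crash (the grep context of 25 lines is only a heuristic to reach each trace's PID header):

crash> foreach bt > /tmp/bt_all.txt
crash> !grep -B 25 throttle_vm_writeout /tmp/bt_all.txt | grep 'PID:'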
- According to this information, the system seems to be either in a kind of deadlock situation or in a dependency situation where NFS writeback is extremely slow (such that the system appears to be unresponsive).
- Writeback of dirty pages must occur to free and reclaim memory. However, writeback depends on the be2net driver because the page contents need to be transferred to the NFS server.
- The be2net driver depends on free memory. The fact that the driver is waiting for free memory may adversely impact the communication to the NFS server (and thus impact the ongoing writeback as well).
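This also shows why lowering vm.dirty_ratio helps: writers are throttled once roughly that percentage of memory is dirty, so a lower value starts writeback earlier and leaves more memory available to the be2net driver. A rough estimate of the threshold (the kernel's exact accounting is based on reclaimable memory, so this is only an approximation):

# Approximate number of kilobytes that may be dirtied before writers are throttled:
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
ratio=$(cat /proc/sys/vm/dirty_ratio)
echo "approx. dirty threshold: $(( total_kb * ratio / 100 )) kB"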