RHEL7: jbd2 deadlock occurs while checkpoint thread waits commit thread to finish.
Issue
- Some applications become unresponsive (hanging)
- A number of processes are in 'D' state (uninterruptible sleep)
-
In dmesg output and/or in /var/log/messages, we can see messages with a stack like this:
Apr 1 22:46:09 localhost kernel: INFO: task jbd2/dm-5-8:1484 blocked for more than 120 seconds. Apr 1 22:46:09 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 22:46:09 localhost kernel: jbd2/dm-5-8 D ffff8d52674b8fd0 0 1484 2 0x00000000 Apr 1 22:46:09 localhost kernel: Call Trace: Apr 1 22:46:09 localhost kernel: [<ffffffff9e768b69>] schedule_preempt_disabled+0x29/0x70 Apr 1 22:46:09 localhost kernel: [<ffffffff9e766ab7>] __mutex_lock_slowpath+0xc7/0x1d0 Apr 1 22:46:09 localhost kernel: [<ffffffff9e765e9f>] mutex_lock+0x1f/0x2f Apr 1 22:46:09 localhost kernel: [<ffffffffc018b7b8>] jbd2_update_log_tail+0x28/0x60 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffffc01845dd>] jbd2_journal_commit_transaction+0x155d/0x19b0 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffffc0189e89>] kjournald2+0xc9/0x260 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffff9e0c2d00>] ? wake_up_atomic_t+0x30/0x30 Apr 1 22:46:09 localhost kernel: [<ffffffffc0189dc0>] ? commit_timeout+0x10/0x10 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffff9e0c1c31>] kthread+0xd1/0xe0 Apr 1 22:46:09 localhost kernel: [<ffffffff9e0c1b60>] ? insert_kthread_work+0x40/0x40 Apr 1 22:46:09 localhost kernel: [<ffffffff9e774c37>] ret_from_fork_nospec_begin+0x21/0x21 Apr 1 22:46:09 localhost kernel: [<ffffffff9e0c1b60>] ? insert_kthread_work+0x40/0x40
or like this:
Apr 1 22:46:09 localhost kernel: INFO: task java:63838 blocked for more than 120 seconds. Apr 1 22:46:09 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 1 22:46:09 localhost kernel: java D ffff8d802b344f10 0 63838 1 0x00000000 Apr 1 22:46:09 localhost kernel: Call Trace: Apr 1 22:46:09 localhost kernel: [<ffffffff9e768b69>] schedule_preempt_disabled+0x29/0x70 Apr 1 22:46:09 localhost kernel: [<ffffffff9e766ab7>] __mutex_lock_slowpath+0xc7/0x1d0 Apr 1 22:46:09 localhost kernel: [<ffffffffc01813da>] ? jbd2_journal_stop+0x1ea/0x3d0 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffff9e765e9f>] mutex_lock+0x1f/0x2f Apr 1 22:46:09 localhost kernel: [<ffffffffc0186508>] __jbd2_log_wait_for_space+0xc8/0x1f0 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffffc01803d3>] add_transaction_credits+0x2d3/0x2f0 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffff9e27ad6f>] ? __block_write_begin_int+0x8f/0x5f0 Apr 1 22:46:09 localhost kernel: [<ffffffffc01805e1>] start_this_handle+0x1a1/0x430 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffff9e2af000>] ? dquot_file_open+0x40/0x50 Apr 1 22:46:09 localhost kernel: [<ffffffff9e21c0b2>] ? kmem_cache_alloc+0x1c2/0x1f0 Apr 1 22:46:09 localhost kernel: [<ffffffffc0180a93>] jbd2__journal_start+0xf3/0x1f0 [jbd2] Apr 1 22:46:09 localhost kernel: [<ffffffffc0378b0a>] ? ext4_dirty_inode+0x2a/0x60 [ext4] Apr 1 22:46:09 localhost kernel: [<ffffffffc03a4c59>] __ext4_journal_start_sb+0x69/0xe0 [ext4] Apr 1 22:46:09 localhost kernel: [<ffffffffc0378b0a>] ext4_dirty_inode+0x2a/0x60 [ext4] Apr 1 22:46:09 localhost kernel: [<ffffffff9e27012d>] __mark_inode_dirty+0x16d/0x270 Apr 1 22:46:09 localhost kernel: [<ffffffff9e25e229>] update_time+0x89/0xd0 Apr 1 22:46:09 localhost kernel: [<ffffffff9e25e6ff>] touch_atime+0xff/0x240 Apr 1 22:46:09 localhost kernel: [<ffffffffc036c0dd>] ext4_file_mmap+0x7d/0xa0 [ext4] Apr 1 22:46:09 localhost kernel: [<ffffffff9e1f25c5>] mmap_region+0x435/0x660 Apr 1 22:46:09 localhost kernel: [<ffffffff9e1f2b68>] do_mmap+0x378/0x530 Apr 1 22:46:09 localhost kernel: [<ffffffff9e1d6380>] vm_mmap_pgoff+0xd0/0x120 Apr 1 22:46:09 localhost kernel: [<ffffffff9e1f0b11>] SyS_mmap_pgoff+0x1f1/0x270 Apr 1 22:46:09 localhost kernel: [<ffffffff9e774d15>] ? system_call_after_swapgs+0xa2/0x146 Apr 1 22:46:09 localhost kernel: [<ffffffff9e774d21>] ? system_call_after_swapgs+0xae/0x146 Apr 1 22:46:09 localhost kernel: [<ffffffff9e030c22>] SyS_mmap+0x22/0x30 Apr 1 22:46:09 localhost kernel: [<ffffffff9e774ddb>] system_call_fastpath+0x22/0x27 Apr 1 22:46:09 localhost kernel: [<ffffffff9e774d21>] ? system_call_after_swapgs+0xae/0x146
-
Similar backtraces can be seen in /proc/<pid>/stack of the processes that are hanging (<pid> is the actual pid of the hanging process)
Environment
- Red Hat Enterprise Linux 7 (before kernel-3.10.0-1127.el7)
- Red Hat Enterprise Linux 8 (before kernel-4.18.0-147.el8)
- Ext4 / jbd2
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.