High load average, crmd reports "High CPU load" increasingly as time passes, hung-task backtraces stuck in XFS calls in the logs, lvm commands become blocked, and/or corosync using 100% CPU in a RHEL 7 High Availability cluster
Issue
- We found our server with hundreds or even thousands of stuck netstat commands and a load average in the thousands, but very little CPU being used
- Processes are getting stuck waiting in the XFS slab shrinker
- We've detected high load on a cluster node and couldn't log in to the system. It was still a member but was unresponsive on the console or over ssh
- corosync is using 100% CPU on only one node, load average is very high, and many processes like ps and netstat seem to be stuck
- While corosync seems to be hogging an entire CPU, there are hung-task warnings in /var/log/messages showing processes stuck waiting in XFS functions
- Why is corosync utilizing so much CPU on one of my nodes?
- We had applications get stuck after processes hung waiting on something, and captured a vmcore. A number of processes are stuck waiting in xfs_fs_free_cached_objects
- We are frequently seeing lvm commands block and LVM resource operations time out. corosync and clvmd both spin away with 100% CPU on one node in the cluster
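
To check whether a node shows the hung-task symptom described above, the kernel's hung-task reports in the system log can be scanned for XFS functions in their backtraces. The following is a minimal sketch, not a supported diagnostic tool. It assumes the standard kernel message format "INFO: task NAME:PID blocked for more than N seconds" and that backtrace lines follow each report. The function name summarize_hung_tasks is hypothetical and chosen only for illustration.

```python
import re

# Kernel hung-task detector message, for example:
#   "INFO: task netstat:4321 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Any xfs_* symbol appearing in a backtrace frame.
XFS_FRAME = re.compile(r"\bxfs_\w+")


def summarize_hung_tasks(lines):
    """Return a list of (command, pid, xfs_symbols) tuples, one per hung-task report.

    Hypothetical helper: every xfs_* symbol seen after a report and before the
    next report is attributed to that report, so unrelated log lines that
    mention xfs_* symbols would also be counted.
    """
    reports = []
    current = None
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            # Start a new report; symbols found later are appended to it.
            current = (match.group(1), int(match.group(2)), [])
            reports.append(current)
            continue
        if current is not None:
            for symbol in XFS_FRAME.findall(line):
                if symbol not in current[2]:
                    current[2].append(symbol)
    return reports
```

On an affected node this could be fed the contents of /var/log/messages, for example by passing an open file object as `lines`. Reports whose symbol list includes xfs_fs_free_cached_objects correspond to the symptom described in this article.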
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On