Node never fenced after a token loss and a kill caused by rejoining without a restart in RHEL 5.6
Issue
- A token loss occurred and the node was killed when it attempted to rejoin the cluster without a restart, but it was never fenced.
- gfs_controld on a remaining node in the cluster logs repeated cpg_mcast_joined retries after the removed node is never fenced:
Apr 30 22:04:58 node4 gfs_controld[10877]: cpg_mcast_joined retry 100 MSG_PLOCK
Apr 30 22:04:58 node4 gfs_controld[10877]: cpg_mcast_joined retry 200 MSG_PLOCK
- rgmanager becomes stuck waiting for the node to be fenced:
Apr 30 22:05:02 node4 clurgmgrd[13935]: <info> Waiting for node #1 to be fenced
- group_tool dump does not show groupd ever processing a confchg, as expected, following the node removal:
1335837897 cman: node 1 removed
1335837897 add_recovery_set_cman nodeid 1
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
- openais prior to release 0.80.6-28.el5_6.2 (RHEL 5.6), 0.80.6-30.el5_7.1 (RHEL 5.7), or 0.80.6-36.el5 (RHEL 5.8)
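The affected builds above are defined by openais version-release strings, so checking a node comes down to an RPM version comparison against the fixed build for its RHEL 5 minor release. The sketch below is a hypothetical helper, not a Red Hat tool: it assumes you pass the version-release portion of the installed package (for example "0.80.6-28.el5", taken from `rpm -q openais` with the name stripped) plus the RHEL minor release, and it uses a simplified rpmvercmp that ignores epochs and the "~"/"^" operators. Minor releases not listed above are not covered.

```python
import re

# Fixed openais builds from the Environment section, keyed by RHEL 5 minor release.
FIXED = {
    "5.6": "0.80.6-28.el5_6.2",
    "5.7": "0.80.6-30.el5_7.1",
    "5.8": "0.80.6-36.el5",
}


def rpmvercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp (assumption: no epochs, no '~' or '^').

    Returns 1 if a is newer, -1 if b is newer, 0 if equal.
    """
    seg_a = re.findall(r"\d+|[A-Za-z]+", a)
    seg_b = re.findall(r"\d+|[A-Za-z]+", b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() != y.isdigit():
            # A numeric segment sorts newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        if x.isdigit():
            cmp = (int(x) > int(y)) - (int(x) < int(y))
        else:
            cmp = (x > y) - (x < y)
        if cmp:
            return cmp
    # All shared segments equal: the string with segments left over is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def evr_cmp(a: str, b: str) -> int:
    """Compare 'version-release' strings: version first, then release."""
    ver_a, rel_a = a.rsplit("-", 1)
    ver_b, rel_b = b.rsplit("-", 1)
    return rpmvercmp(ver_a, ver_b) or rpmvercmp(rel_a, rel_b)


def is_affected(installed: str, rhel_minor: str) -> bool:
    """True if the installed openais build predates the fix for that minor release."""
    return evr_cmp(installed, FIXED[rhel_minor]) < 0


if __name__ == "__main__":
    for evr, minor in [("0.80.6-28.el5", "5.6"), ("0.80.6-36.el5", "5.8")]:
        print(evr, minor, "affected" if is_affected(evr, minor) else "fixed")
```

For example, a RHEL 5.6 node still running 0.80.6-28.el5 compares older than 0.80.6-28.el5_6.2 because its release string runs out of segments first, so it is reported as affected.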