orainstance resource fails status check and blocks indefinitely in recoverable state after su segfaults in a RHEL 6 High Availability cluster with rgmanager
Issue
- I ran "yum update" on a cluster member, and this node stopped the cluster resource which was running on it. I found the following log messages at the moment the node began stopping the resource:
Feb 4 08:51:36 node2 yum[53285]: Updated: elfutils-libs-0.158-3.2.el6.x86_64
Feb 4 08:51:36 node2 yum[53285]: Updated: libXinerama-1.1.3-2.1.el6.x86_64
Feb 4 08:51:36 node2 kernel: __ratelimit: 6 callbacks suppressed
Feb 4 08:51:36 node2 kernel: su[11208]: segfault at 968 ip 00007fe193ddb885 sp 00007fff59d62b10 error 4 in libpthread-2.12.so[7fe193dd6000+17000]
Feb 4 08:51:36 node2 abrt[11209]: Saved core dump of pid 11208 (/bin/su) to /var/spool/abrt/ccpp-2015-02-04-08:51:36-11208 (585728 bytes)
Feb 4 08:51:36 node2 abrtd: Directory 'ccpp-2015-02-04-08:51:36-11208' creation detected
Feb 4 08:51:36 node2 logger[11294]: Stopping service PRD06
Feb 4 08:51:36 node2 logger[11295]: Stopping Oracle DB PRD06 immediate
Feb 4 08:51:36 node2 yum[53285]: Updated: glib2-2.28.8-4.el6.x86_64
- The orainstance resource failed a status check after su segfaulted, and the service became stuck in a "recoverable" state. The last entry in the rgmanager log shows orainstance running a validation check:
Feb 04 08:51:36 rgmanager [fs] Checking fs "ora1", Level 0
Feb 04 08:51:36 rgmanager [orainstance] Validating configuration for ora1
Feb 04 08:51:36 rgmanager status on orainstance "ora1" returned 139 (unspecified)
Feb 04 08:51:36 rgmanager Stopping service service:oracle-svc-ora1
Feb 04 08:51:36 rgmanager [orainstance] Validating configuration for ora1
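The status value 139 in the log above is consistent with the su segfault, assuming the resource agent's exit status follows the common shell convention of 128 plus the number of the signal that killed the child process (139 - 128 = 11, which is SIGSEGV on Linux). The following is a minimal sketch of that arithmetic, not part of rgmanager; the helper name decode_exit_status is hypothetical.

```python
import signal


def decode_exit_status(status):
    """Return the signal name if status looks like 128 + signal number, else None.

    Assumption: the status was produced by a shell, which reports a child
    killed by signal N as exit status 128 + N.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # Not a signal number known on this platform
        return None


# 139 is the value rgmanager logged for the orainstance status check
print(decode_exit_status(139))  # SIGSEGV on Linux
```

An ordinary failure code such as 1 yields None here, which helps distinguish a crashed helper process from a status check that ran to completion and reported a failure.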
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- rgmanager
- One or more resources that use the orainstance resource agent