One node of Oracle RAC reboots often with the error "o2hb_write_timeout:172 ERROR: Heartbeat write timeout to device dm-5 after 60000 milliseconds"
Issue
One node of a two-node Oracle RAC cluster is rebooted frequently. The following messages appear in the logs just before each reboot:
Feb 19 06:16:19 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:0:6): Abort command issued -- 1 1991d49 2002.
Feb 19 06:22:29 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:1:7): Abort command issued -- 1 19b25d0 2002.
Feb 19 06:25:01 hostname kernel: (events/5,55,5):o2hb_write_timeout:172 ERROR: Heartbeat write timeout to device dm-5 after 60000 milliseconds
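To check whether the same pattern precedes other reboots, the two message types quoted above can be pulled out of a saved syslog file. The following is a minimal Python sketch, not a diagnostic tool: the function name scan_messages, the regular expressions, and the SAMPLE list are illustrative assumptions built only from the log lines shown in this article, and it only lists matching entries without determining a root cause.

```python
import re

# Patterns derived from the two message types quoted in this article.
# They are assumptions about the message layout, not an official format.
ABORT_RE = re.compile(r"qla2xxx \S+ scsi\((\d+:\d+:\d+)\): Abort command issued")
TIMEOUT_RE = re.compile(
    r"o2hb_write_timeout:\d+ ERROR: Heartbeat write timeout "
    r"to device (\S+) after (\d+) milliseconds"
)


def scan_messages(lines):
    """Return (aborts, timeouts) found in an iterable of syslog lines."""
    aborts, timeouts = [], []
    for line in lines:
        stamp = line[:15]  # classic syslog timestamp, e.g. "Feb 19 06:16:19"
        match = ABORT_RE.search(line)
        if match:
            aborts.append((stamp, match.group(1)))
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append((stamp, match.group(1), int(match.group(2))))
    return aborts, timeouts


# Sample input: the three log lines from the Issue section.
SAMPLE = [
    "Feb 19 06:16:19 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:0:6): "
    "Abort command issued -- 1 1991d49 2002.",
    "Feb 19 06:22:29 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:1:7): "
    "Abort command issued -- 1 19b25d0 2002.",
    "Feb 19 06:25:01 hostname kernel: (events/5,55,5):o2hb_write_timeout:172 "
    "ERROR: Heartbeat write timeout to device dm-5 after 60000 milliseconds",
]

if __name__ == "__main__":
    found_aborts, found_timeouts = scan_messages(SAMPLE)
    for stamp, target in found_aborts:
        print(f"{stamp}  SCSI abort on scsi({target})")
    for stamp, device, millis in found_timeouts:
        print(f"{stamp}  heartbeat write timeout on {device} after {millis} ms")
```

To run it against a real log, pass an open file object instead of SAMPLE, for example scan_messages(open(path)) where path points to a copy of the system log (often /var/log/messages on Red Hat Enterprise Linux 5, though the location depends on the syslog configuration).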
Environment
- Red Hat Enterprise Linux 5.5
- Oracle RAC cluster
- QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
- OCFS2 filesystem