FAQ on ROSA cluster notifications

Solution Verified

Environment

  • Red Hat OpenShift Service on AWS 4

Issue

  1. Does Red Hat provide a list of all ROSA notifications available somewhere?
  2. Is it possible to configure notification settings? For example, to receive only notifications with a severity above Warning.
  3. Are email notifications sent immediately when a service log is generated?
  4. What is the actual content of the notifications that are sent?

Resolution

Q1: Does Red Hat provide a list of all ROSA notifications available somewhere?
A: No such list is available. Some notifications are sent manually by the Red Hat team through service logs or email, while others are automated.

Q2: Is it possible to configure notification settings? For example, to receive only notifications with a severity above Warning.
A: No, it is not possible to configure notifications based on severity. However, the recipients (who gets the notifications) can be configured.
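
Because severity cannot be selected on the sending side, one option is to filter on the receiving side. The following is a minimal Python sketch, assuming the service log entries have already been exported as dictionaries with a "severity" field like the raw examples shown under Q4. The severity ordering, the helper name at_least, and the sample data are assumptions for illustration only and are not part of any Red Hat tool or API.

```python
# Hypothetical client-side filter; not part of any Red Hat tool or API.
# Assumed ordering of severities, from lowest to highest.
SEVERITY_ORDER = ["Debug", "Info", "Warning", "Error", "Fatal"]


def at_least(entries, minimum="Warning"):
    """Return the entries whose severity is at or above `minimum`."""
    threshold = SEVERITY_ORDER.index(minimum)
    kept = []
    for entry in entries:
        severity = entry.get("severity")
        # Keep entries with an unrecognized severity instead of silently dropping them.
        if severity not in SEVERITY_ORDER or SEVERITY_ORDER.index(severity) >= threshold:
            kept.append(entry)
    return kept


# Sample entries modeled on the raw examples in this article.
sample = [
    {"severity": "Info", "summary": "Control plane resized"},
    {"severity": "Warning", "summary": "Action required: Pod(s) preventing Node Drain"},
]
filtered = at_least(sample)
```

With the sample data above, only the Warning entry remains in filtered. Passing a different minimum, such as "Error", raises the threshold.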

Q3: Are email notifications sent immediately when a service log is generated?
A: Yes. An email should be sent as soon as a service log is generated. However, the service log itself may sometimes be delayed while Red Hat assesses the situation and decides on the best course of action, for example whether Red Hat should perform a recovery action or leave a service log message for the cluster administrator to act on.

Q4: What is the actual content of the notifications that are sent?
A: For cluster events such as Load Balancer quota updates and scheduled maintenance upgrades, users can see the details of the service log sent by Red Hat in Red Hat OpenShift Cluster Manager. Users who are subscribed to notifications also receive emails, as described in the Red Hat product documentation.

The content of each service log can differ. For example:

  • Example 1: A service log for a customer pod preventing a node drain could look like this in raw format:
"severity": "Warning",
 "summary": "Action required: Pod(s) preventing Node Drain",
 "description": "Your cluster is attempting to drain a node but there are pod(s) preventing the drain. The SRE team has identified the pod(s) as '${POD}' running in namespace(s) '${NAMESPACE}'. 

Please re-schedule the impacted pod(s) so that the node can drain.",
  • Example 2: When performance is degraded, the control plane may be resized depending on the situation. Red Hat then sends the following service log, in which ${JUSTIFICATION} could be replaced with "exhausted memory of control plane nodes".
"severity": "Info",
"summary": "Control plane resized",
"description" : "SRE has observed ${JUSTIFICATION}. 

SRE has resized the control plane nodes of your cluster to ${INSTANCE_TYPE} to accommodate cluster load"
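
The ${...} placeholders in the raw examples are filled in before the service log is sent. As an illustration of how the final text could look, the sketch below renders the Example 2 description with Python's string.Template, which happens to use the same ${NAME} syntax. How Red Hat actually renders these templates is not stated in this article, and the instance type value is a made-up assumption.

```python
from string import Template

# Description text copied from Example 2 above.
description = Template(
    "SRE has observed ${JUSTIFICATION}. \n\n"
    "SRE has resized the control plane nodes of your cluster "
    "to ${INSTANCE_TYPE} to accommodate cluster load"
)

# Fill in the placeholders; substitute() raises KeyError if one is missing.
rendered = description.substitute(
    JUSTIFICATION="exhausted memory of control plane nodes",
    INSTANCE_TYPE="m5.4xlarge",  # hypothetical value, for illustration only
)
print(rendered)
```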

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
