ETCD is degraded with NAME-PENDING error in RHOCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
Issue
- The etcd cluster operator is degraded with the following error message:

    ClusterMemberControllerDegraded: unhealthy members found during reconciling members
    EtcdMembersDegraded: 2 of 3 members are available, NAME-PENDING-x.x.x.x has not started
    reason: ClusterMemberController_SyncError::EtcdMembers_UnhealthyMembers
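The IP address embedded in the `NAME-PENDING-<ip>` token is the peer address of the member that has not started, and it is the value to compare against the node IPs during diagnosis. The following is a minimal sketch for extracting it. The function name `pending_member_ips` is hypothetical (it is not part of `oc` or `etcdctl`), and the `oc ... -o jsonpath` invocation in the comment is an assumed way to fetch the Degraded condition message.

```shell
# Hypothetical helper (not part of oc or etcdctl): print the IP address
# embedded in each "NAME-PENDING-<ip>" token found on stdin.
pending_member_ips() {
  # -n plus the trailing "p" prints only lines where the substitution matched.
  # The capture stops at the first space or comma after the address.
  # Only one token per input line is extracted (the last one on that line).
  sed -n 's/.*NAME-PENDING-\([^ ,]*\).*/\1/p'
}

# Assumed usage against a live cluster (requires cluster-admin access):
#   oc get co etcd \
#     -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}' \
#     | pending_member_ips
```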
Resolution
- Open a shell in the etcd pod on a healthy control plane node:

    $ oc rsh -n openshift-etcd etcd-ocp-xxx-master-1

- List the etcd members:

    $ etcdctl member list -w table
    +------------------+-----------+------------------+---------------------------+---------------------------+------------+
    |        ID        |  STATUS   |       NAME       |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
    +------------------+-----------+------------------+---------------------------+---------------------------+------------+
    | 2782xxxxxxxxx409 | unstarted |                  | https://xxx.xx.xx.70:2380 |                           | true       |
    | bc14xxxxxxxxx6ab | started   | ocp-xxx-master-2 | https://xxx.xx.xx.94:2380 | https://xxx.xx.xx.94:2379 | false      |
    | c7ef0xxxxxxx5881 | started   | ocp-xxx-master-1 | https://xxx.xx.xx.95:2380 | https://xxx.xx.xx.95:2379 | false      |
    +------------------+-----------+------------------+---------------------------+---------------------------+------------+

  The member with STATUS "unstarted", an empty NAME, and peer address xxx.xx.xx.70 is the unhealthy member reported as NAME-PENDING in the operator message.

- Follow the steps in the official documentation to remove the unhealthy member.
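Before starting the removal procedure, it helps to confirm which member ID is the unhealthy one. The following is a minimal sketch. It assumes the default (simple) output format of `etcdctl member list`, where fields are separated by `, ` and STATUS is the second field. The function name `unstarted_member_ids` is hypothetical. The function only prints candidate IDs and does not remove anything.

```shell
# Hypothetical helper: read the default output of "etcdctl member list"
# on stdin and print the ID (first field) of every member whose STATUS
# (second field) is "unstarted".
unstarted_member_ids() {
  awk -F', ' '$2 == "unstarted" { print $1 }'
}

# Assumed usage from a workstation that has awk installed. "oc exec" is used
# here instead of "oc rsh" so that no TTY is allocated for piped output.
# The container name "etcdctl" is an assumption about the etcd pod layout:
#   oc exec -n openshift-etcd -c etcdctl etcd-ocp-xxx-master-1 -- \
#     etcdctl member list | unstarted_member_ids
#
# Review the printed ID. Then remove the member only as part of the
# documented procedure, for example: etcdctl member remove <ID>
```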
Root Cause
- The peer IP address of one of the etcd members differs from the IP address of the control plane (master) node on which that member is deployed.
Diagnostic Steps
- Check the status of the etcd cluster operator:

    $ oc get co etcd -o yaml
    status:
      conditions:
      - lastTransitionTime: "2024-06-03T07:58:39Z"
        message: |-
          ClusterMemberControllerDegraded: unhealthy members found during reconciling members
          EtcdMembersDegraded: 2 of 3 members are available, NAME-PENDING-xxx.xx.xx.70 has not started
        reason: ClusterMemberController_SyncError::EtcdMembers_UnhealthyMembers
        status: "True"
        type: Degraded

- From a shell in the etcd pod on a healthy node, check the member list:

    $ etcdctl member list -w table
    +------------------+-----------+------------------+---------------------------+---------------------------+------------+
    |        ID        |  STATUS   |       NAME       |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
    +------------------+-----------+------------------+---------------------------+---------------------------+------------+
    | 2782xxxxxxxxx409 | unstarted |                  | https://xxx.xx.xx.70:2380 |                           | true       |
    | bc14xxxxxxxxx6ab | started   | ocp-xxx-master-2 | https://xxx.xx.xx.94:2380 | https://xxx.xx.xx.94:2379 | false      |
    | c7ef0xxxxxxx5881 | started   | ocp-xxx-master-1 | https://xxx.xx.xx.95:2380 | https://xxx.xx.xx.95:2379 | false      |
    +------------------+-----------+------------------+---------------------------+---------------------------+------------+

- Verify the node IP addresses. The peer IP address of each etcd member must match the internal IP address of the node it runs on. If they differ, fix the mismatch. In the `-o wide` output, INTERNAL-IP is the sixth column:

    $ oc get nodes -o wide | awk '{print $1"\t\t"$6}'
    NAME                INTERNAL-IP
    ocp-xxx-master-0    xxx.xx.xx.93
    ocp-xxx-master-1    xxx.xx.xx.94
    ocp-xxx-master-2    xxx.xx.xx.95

  In this example, the peer address of the unstarted member (xxx.xx.xx.70) does not match the internal IP of any node. Node ocp-xxx-master-0 (xxx.xx.xx.93) has no corresponding etcd member.
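The IP comparison in the last step can be scripted. The following is a minimal sketch and not Red Hat tooling. `check_etcd_peer_ips` is a hypothetical name. The script assumes IPv4 peer URLs and the default (simple) `etcdctl member list` output, and it matches by IP address only, not by member name. It prints a MISMATCH line for each member whose peer IP is not the internal IP of any listed node. It prints a NO_MEMBER line for each listed node that has no member at its IP.

```shell
# Hypothetical helper: cross-check etcd member peer IPs against node internal IPs.
#   $1: file of "NAME INTERNAL-IP" lines (whitespace-separated), one control
#       plane node per line. An assumed way to produce it:
#       oc get nodes -l node-role.kubernetes.io/master -o wide --no-headers \
#         | awk '{print $1, $6}'
#   $2: file holding the default (simple) output of "etcdctl member list",
#       for example: 2782xxxxxxxxx409, unstarted, , https://xxx.xx.xx.70:2380, , true
# Assumes IPv4 peer URLs such as https://10.0.0.1:2380 and a non-empty $1.
check_etcd_peer_ips() {
  awk -F', ' '
    # First file: map each node internal IP to its node name.
    NR == FNR { split($0, f, " "); node[f[2]] = f[1]; next }
    # Second file: reduce the peer URL (field 4) to its bare IP address.
    {
      ip = $4
      sub(/^https?:\/\//, "", ip)   # drop the scheme
      sub(/:[0-9]+$/, "", ip)       # drop the port
      seen[ip] = 1
      name = ($3 == "") ? "(unnamed)" : $3
      if (!(ip in node))
        printf "MISMATCH id=%s status=%s name=%s peer_ip=%s\n", $1, $2, name, ip
    }
    # Report nodes whose internal IP is not the peer IP of any member.
    END {
      for (ip in node)
        if (!(ip in seen))
          printf "NO_MEMBER node=%s ip=%s\n", node[ip], ip
    }
  ' "$1" "$2"
}
```

With the node and member data shown in this article saved to two files, the function should report the unstarted member at xxx.xx.xx.70 as MISMATCH and ocp-xxx-master-0 (xxx.xx.xx.93) as NO_MEMBER. The label selector in the comment limits the node list to control plane nodes. Without it, worker nodes would also be reported as NO_MEMBER.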
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.