Image-based upgrade for single-node OpenShift cluster with the Lifecycle Agent (Developer Preview)

The image-based upgrade is currently a Developer Preview feature for OpenShift Container Platform 4.15.

About Developer Preview features

Developer Preview features are not supported with Red Hat production service level agreements (SLAs) and are not functionally complete. Red Hat does not advise using them in a production setting. Developer Preview features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. These releases may not have any documentation, and testing is limited. Red Hat may provide ways to submit feedback on Developer Preview releases without an associated SLA.

About the image-based upgrade for single-node OpenShift cluster

From OpenShift Container Platform 4.14.7, the Lifecycle Agent 4.15 provides you with an alternative way to upgrade the platform version of a single-node OpenShift cluster. The image-based upgrade is faster than the standard upgrade method and allows you to directly upgrade from OpenShift Container Platform <4.y> to <4.y+2>, and <4.y.z> to <4.y.z+n>.

This upgrade method uses an OCI image that is generated from a dedicated seed cluster and installed on the target single-node OpenShift cluster as a new ostree stateroot. A seed cluster is a single-node OpenShift cluster deployed with the target OpenShift Container Platform version, Day 2 Operators, and configurations that are common to all target clusters.

You can use the seed image, which is generated from the seed cluster, to upgrade the platform version on any single-node OpenShift cluster that has the same combination of hardware, Day 2 Operators, and cluster configuration as the seed cluster.

IMPORTANT: The image-based upgrade uses custom images that are specific to the hardware platform that the clusters are running on. Each different hardware platform requires a separate seed image.

The Lifecycle Agent uses two custom resources (CRs) on the participating clusters to orchestrate the upgrade:

  • On the seed cluster, the SeedGenerator CR allows for the seed image generation. This CR specifies the repository to push the seed image to.

  • On the target cluster, the ImageBasedUpgrade CR specifies the seed container image for the upgrade of the target cluster and the backup configurations for your workloads.

Example SeedGenerator CR

apiVersion: lca.openshift.io/v1alpha1
kind: SeedGenerator
metadata:
  name: seedimage
spec:
  seedImage: <seed_container_image>

Example ImageBasedUpgrade CR

apiVersion: lca.openshift.io/v1alpha1
kind: ImageBasedUpgrade
metadata:
  name: example-upgrade
spec:
  stage: Idle
  seedImageRef:
    version: <target_version>
    image: <seed_container_image>
    pullSecretRef:
      name: <seed_pull_secret>
  additionalImages:
    name: ""
    namespace: ""
  autoRollbackOnFailure: {}
#    disabledForPostRebootConfig: "true"
#    disabledForUpgradeCompletion: "true"
#    disabledInitMonitor: "true"
#    initMonitorTimeoutSeconds: 1800
#  extraManifests:
#  - name: sno-extra-manifests
#    namespace: openshift-lifecycle-agent
  oadpContent:
    - name: oadp-cm-example
      namespace: openshift-adp
  • Define the desired stage for the ImageBasedUpgrade CR in the spec.stage field. The value can be Idle, Prep, Upgrade, or Rollback.
  • Define the target platform version, the seed image to be used, and the secret required to access the image in the spec.seedImageRef section.
  • Configure the automatic rollback in the spec.autoRollbackOnFailure section. By default, automatic rollback on failure is enabled throughout the upgrade.
  • Optional. If set to true, the autoRollbackOnFailure.disabledForPostRebootConfig field disables automatic rollback when the reconfiguration of the cluster fails upon the first reboot.
  • Optional. If set to true, the autoRollbackOnFailure.disabledForUpgradeCompletion field disables automatic rollback after the Lifecycle Agent reports a failed upgrade upon completion.
  • Optional. If set to true, the autoRollbackOnFailure.disabledInitMonitor field disables automatic rollback when the upgrade does not complete after reboot within the time frame specified in the initMonitorTimeoutSeconds field.
  • Optional. The autoRollbackOnFailure.initMonitorTimeoutSeconds field specifies the time frame in seconds. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
  • Optional. In the spec.extraManifests section, specify the list of ConfigMap resources that contain the additional extra manifests that you want to apply to the target cluster. You can also add your custom catalog sources that you want to retain after the upgrade.

After generating the seed image on the seed cluster, you can move through the stages on the target cluster by setting the spec.stage field to the following values in the ImageBasedUpgrade CR:

  • Idle

  • Prep

  • Upgrade

  • Rollback (Optional)
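
For example, to move from the default Idle stage to the Prep stage, you can patch the CR. This is a minimal sketch that assumes the CR is named example-upgrade, as in the example above; the procedures later in this document use the same pattern for each stage:

$ oc patch imagebasedupgrades.lca.openshift.io example-upgrade -p='{"spec": {"stage": "Prep"}}' --type=merge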

Idle stage

The Lifecycle Agent creates an ImageBasedUpgrade CR set to stage: Idle when the Operator is first deployed. This is the default stage: there is no ongoing upgrade, and the cluster is ready to move to the Prep stage.

After a successful upgrade or a rollback, you commit to the change by patching the stage field to Idle in the ImageBasedUpgrade CR. Changing to this stage ensures that the Lifecycle Agent cleans up resources, so the cluster is ready for upgrades again.

Prep stage

Note: You can complete this stage before a scheduled maintenance window.

During the Prep stage, you specify the following upgrade details in the ImageBasedUpgrade CR:

  • seed image to use
  • resources to back up
  • extra manifests to apply after the upgrade

Then, based on what you specify, the Lifecycle Agent prepares for the upgrade without impacting the current running version. This preparation includes ensuring that the target cluster is ready to proceed to the Upgrade stage by checking if it meets certain conditions and pulling the seed image to the target cluster with additional container images specified in the seed image.

You also prepare backup resources with the OADP Operator’s Backup and Restore CRs. These CRs are used in the Upgrade stage to reconfigure the cluster, register the cluster with RHACM, and restore application artifacts.

Important: The same version of the applications must function on both the current and the target release of OpenShift Container Platform.

In addition to the OADP Operator, the Lifecycle Agent uses the ostree versioning system to create a backup, which allows complete cluster reconfiguration after both upgrade and rollback.

You can stop the upgrade process at this point by moving to the Idle stage or you can start the upgrade by moving to the Upgrade stage in the ImageBasedUpgrade CR. If you stop, the Operator performs cleanup operations.

Upgrade stage

Just before the Lifecycle Agent starts the upgrade process, a backup of the cluster resources that you specified in the Prep stage is created on a compatible object storage solution. After the target cluster reboots with the new platform version, the Operator applies the cluster and application configurations defined in the Backup and Restore CRs, and applies any extra manifests that are specified in the referenced ConfigMap resource.

The Operator also regenerates the seed image’s cluster cryptography. This ensures that each single-node OpenShift cluster upgraded with the same seed image has unique and valid cryptographic objects.

Once you are satisfied with the changes, you can finalize the upgrade by moving to the Idle stage. If you encounter issues after the upgrade, you can move to the Rollback stage for a manual rollback.

(Optional) Rollback stage

The rollback stage can be initiated manually or automatically upon failure. During the Rollback stage, the Lifecycle Agent sets the original ostree stateroot as default. Then, the node reboots with the previous release of OpenShift Container Platform and application configurations.

By default, automatic rollback is enabled in the ImageBasedUpgrade CR. The Lifecycle Agent can initiate an automatic rollback if the upgrade fails or if the upgrade does not complete within the specified time limit. For more information about the automatic rollback configurations, see the (Optional) Initiating rollback of the single-node OpenShift cluster after an image-based upgrade or (Optional) Initiating rollback with TALM sections.

Warning: If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.

Installing the Lifecycle Agent by using the CLI

You can use the OpenShift CLI (oc) to install the Lifecycle Agent from the 4.15 Operator catalog on both the seed and target cluster.

Prerequisites

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a namespace for the Lifecycle Agent. Define the Namespace CR and save the YAML file, for example, lca-namespace.yaml:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-lifecycle-agent
      annotations:
        workload.openshift.io/allowed: management
    
    1. Create the Namespace CR:

      $ oc create -f lca-namespace.yaml
      
  2. Create an Operator group for the Lifecycle Agent. Define the OperatorGroup CR and save the YAML file, for example, lca-operatorgroup.yaml:

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-lifecycle-agent
      namespace: openshift-lifecycle-agent
    spec:
      targetNamespaces:
      - openshift-lifecycle-agent
    
    1. Create the OperatorGroup CR:

      $ oc create -f lca-operatorgroup.yaml
      
  3. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, lca-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-lifecycle-agent-subscription
        namespace: openshift-lifecycle-agent
      spec:
        channel: "alpha"
        name: lifecycle-agent
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      
    2. Create the Subscription CR by running the following command:

      $ oc create -f lca-subscription.yaml
      

Verification

  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-lifecycle-agent
    

    Example output

    NAME                            DISPLAY                     VERSION               REPLACES                           PHASE
    lifecycle-agent.v4.15.0         Openshift Lifecycle Agent   4.15.0                                                   Succeeded
    
  2. Verify that the Lifecycle Agent is up and running:

    $ oc get deploy -n openshift-lifecycle-agent
    

    Example output

    NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
    lifecycle-agent-controller-manager   1/1     1            1           14s
    

Installing the Lifecycle Agent by using the web console

You can use the OpenShift Container Platform web console to install the Lifecycle Agent from the 4.15 Operator catalog on both the seed and target cluster.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.

  2. Search for the Lifecycle Agent from the list of available Operators, and then click Install.

  3. On the Install Operator page, under A specific namespace on the cluster select openshift-lifecycle-agent.

  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators → Installed Operators page.

  2. Ensure that the Lifecycle Agent is listed in the openshift-lifecycle-agent project with a Status of InstallSucceeded.

    Note: During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

If the Operator is not installed successfully:

  1. Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.

  2. Go to the Workloads → Pods page and check the logs for pods in the openshift-lifecycle-agent project.

Sharing the container directory between `ostree` stateroots

You must apply a MachineConfig to both the seed and the target clusters at installation time to create a separate partition and to share the /var/lib/containers directory between the two ostree stateroots that are used during the upgrade process.

Sharing the container directory when using RHACM

When you are using RHACM, you must apply a MachineConfig to both the seed and target clusters.

Important: You must complete this procedure at installation time.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. Apply a MachineConfig to share the /var/lib/containers directory.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 98-var-lib-containers-partitioned
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          disks:
            - device: /dev/disk/by-id/wwn-<root_disk> <1>
              partitions:
                - label: varlibcontainers
                  startMiB: <start_of_partition> <2>
                  sizeMiB: <partition_size> <3>
          filesystems:
            - device: /dev/disk/by-partlabel/varlibcontainers
              format: xfs
              mountOptions:
                - defaults
                - prjquota
              path: /var/lib/containers
              wipeFilesystem: true
        systemd:
          units:
            - contents: |-
                # Generated by Butane
                [Unit]
                Before=local-fs.target
                Requires=systemd-fsck@dev-disk-by\x2dpartlabel-varlibcontainers.service
                After=systemd-fsck@dev-disk-by\x2dpartlabel-varlibcontainers.service
    
                [Mount]
                Where=/var/lib/containers
                What=/dev/disk/by-partlabel/varlibcontainers
                Type=xfs
                Options=defaults,prjquota
    
                [Install]
                RequiredBy=local-fs.target
              enabled: true
              name: var-lib-containers.mount
    
    • Specify the root disk in the storage.disks.device field.
    • Specify the start of the partition in MiB in the storage.disks.partitions.startMiB field. If the value is too small, the installation fails.
    • Specify the size of the partition in MiB in the storage.disks.partitions.sizeMiB field. If the value is too small, the deployments after installation fail.
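
    After installation, you can verify that the MachineConfig exists on the cluster. This is a minimal check; the resource name matches the metadata.name in the example above:

    $ oc get machineconfig 98-var-lib-containers-partitioned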

Sharing the container directory when using GitOps ZTP

When you are using the GitOps ZTP workflow, you can do the following procedure to create a separate disk partition on both the seed and target cluster and to share the /var/lib/containers directory.

Important: You must complete this procedure at installation time.

Prerequisites

  • Log in as a user with cluster-admin privileges.

  • Install Butane.

Procedure

  1. Create the storage.bu file.

    variant: fcos
    version: 1.3.0
    storage:
      disks:
      - device: /dev/disk/by-id/wwn-<root_disk>
        wipe_table: false
        partitions:
        - label: var-lib-containers
          start_mib: <start_of_partition>
          size_mib: <partition_size>
      filesystems:
        - path: /var/lib/containers
          device: /dev/disk/by-partlabel/var-lib-containers
          format: xfs
          wipe_filesystem: true
          with_mount_unit: true
          mount_options:
            - defaults
            - prjquota
    
    • Specify the root disk in the storage.disks.device field.
    • Specify the start of the partition in MiB in the storage.disks.partitions.start_mib field. If the value is too small, the installation fails.
    • Specify the size of the partition in MiB in the storage.disks.partitions.size_mib field. If the value is too small, the deployments after installation fail.
  2. Convert the storage.bu to an Ignition file.

    $ butane storage.bu
    

    Example output

    {"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}
    
  3. Copy the output into the .spec.clusters.nodes.ignitionConfigOverride field in the SiteConfig CR.

    [...]
    spec:
      clusters:
        - nodes:
            - ignitionConfigOverride: '{"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}'
    [...]
    

Verification

  1. During or after installation, verify on the hub cluster that the BareMetalHost object shows the annotation.

    $ oc get bmh -n my-sno-ns my-sno -ojson | jq '.metadata.annotations["bmac.agent-install.openshift.io/ignition-config-overrides"]'
    

    Example output

    "{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62\",\"partitions\":[{\"label\":\"var-lib-containers\",\"sizeMiB\":0,\"startMiB\":250000}],\"wipeTable\":false}],\"filesystems\":[{\"device\":\"/dev/disk/by-partlabel/var-lib-containers\",\"format\":\"xfs\",\"mountOptions\":[\"defaults\",\"prjquota\"],\"path\":\"/var/lib/containers\",\"wipeFilesystem\":true}]},\"systemd\":{\"units\":[{\"contents\":\"# Generated by Butane\\n[Unit]\\nRequires=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\nAfter=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\n\\n[Mount]\\nWhere=/var/lib/containers\\nWhat=/dev/disk/by-partlabel/var-lib-containers\\nType=xfs\\nOptions=defaults,prjquota\\n\\n[Install]\\nRequiredBy=local-fs.target\",\"enabled\":true,\"name\":\"var-lib-containers.mount\"}]}}"
    
  2. After installation, check the single-node OpenShift disk status:

    # lsblk
    

    Example output

    NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
    sda      8:0    0 446.6G  0 disk
    ├─sda1   8:1    0     1M  0 part
    ├─sda2   8:2    0   127M  0 part
    ├─sda3   8:3    0   384M  0 part /boot
    ├─sda4   8:4    0 243.6G  0 part /var
    │                                /sysroot/ostree/deploy/rhcos/var
    │                                /usr
    │                                /etc
    │                                /
    │                                /sysroot
    └─sda5   8:5    0 202.5G  0 part /var/lib/containers
    
    # df -h
    

    Example output

    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs        4.0M     0  4.0M   0% /dev
    tmpfs           126G   84K  126G   1% /dev/shm
    tmpfs            51G   93M   51G   1% /run
    /dev/sda4       244G  5.2G  239G   3% /sysroot
    tmpfs           126G  4.0K  126G   1% /tmp
    /dev/sda5       203G  119G   85G  59% /var/lib/containers
    /dev/sda3       350M  110M  218M  34% /boot
    tmpfs            26G     0   26G   0% /run/user/1000
    

Generating a seed image with the Lifecycle Agent

Use the Lifecycle Agent to generate the seed image with the SeedGenerator CR. The Operator checks for required system configurations, performs any necessary system cleanup before generating the seed image, and launches the image generation. The seed image generation includes the following tasks:

  • Stopping cluster operators

  • Preparing the seed image configuration

  • Generating and pushing the seed image to the image repository specified in the SeedGenerator CR

  • Restoring cluster operators

  • Expiring seed cluster certificates

  • Generating new certificates for the seed cluster

  • Restoring and updating the SeedGenerator CR on the seed cluster

Note: The generated seed image does not include any site-specific data.

Important: During the Developer Preview of this feature, when upgrading a cluster, any custom trusted certificates configured on the cluster will be lost. As a temporary workaround, you must use a seed image from a seed cluster that trusts the certificates to preserve them.

Prerequisites

  • Deploy a single-node OpenShift cluster with a DU profile.
  • Install the Lifecycle Agent on the seed cluster.
  • Install the OADP Operator on the seed cluster.
  • Log in as a user with cluster-admin privileges.
  • The seed cluster has the same CPU topology as the target cluster.
  • The seed cluster has the same IP version as the target cluster.

    Note: Dual-stack networking is not supported in this release.

  • If the target cluster has a proxy configuration, the seed cluster must also have a proxy configuration. The proxy configuration does not have to be the same.

  • The seed cluster is registered as a managed cluster.
  • The Lifecycle Agent deployed on the target cluster is compatible with the version in the seed image.
  • The seed cluster has a separate partition for the container images that will be shared between stateroots. For more information, see Sharing the container directory between ostree stateroots.

Warning: If the target cluster has multiple IPs and one of them belongs to the subnet that was used for creating the seed image, the upgrade fails if the target cluster's node IP does not belong to that subnet.

Procedure

  1. Detach the seed cluster from the hub cluster, either manually or, if you are using GitOps ZTP, by removing the SiteConfig CR from the kustomization.yaml file. This deletes any cluster-specific resources from the seed cluster that must not be in the seed image.

    1. If you are using RHACM, manually detach the seed cluster by running the following command:

      $ oc delete managedcluster sno-worker-example
      
    2. Wait until the ManagedCluster CR is removed. Once the CR is removed, create the proper SeedGenerator CR. The Lifecycle Agent cleans up the RHACM artifacts.
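
      For example, you can wait for the deletion to complete before you continue. This is a sketch; adjust the cluster name and timeout to your environment:

      $ oc wait managedcluster/sno-worker-example --for=delete --timeout=300s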

  2. If you are using GitOps ZTP, detach your cluster by removing the seed cluster’s SiteConfig CR from the kustomization.yaml:

    1. Remove your seed cluster’s SiteConfig CR from the kustomization.yaml.

      apiVersion: kustomize.config.k8s.io/v1beta1
      kind: Kustomization
      generators:
        #- example-seed-sno1.yaml
        - example-target-sno2.yaml
        - example-target-sno3.yaml
      
    2. Commit the kustomization.yaml changes in your Git repository and push the changes.

      The ArgoCD pipeline detects the changes and removes the managed cluster.

  3. Create the Secret.

    1. Create the authentication file by running the following commands:

      $ MY_USER=myuserid
      $ AUTHFILE=/tmp/my-auth.json
      $ podman login --authfile ${AUTHFILE} -u ${MY_USER} quay.io/${MY_USER}
      $ base64 -w 0 ${AUTHFILE} ; echo
      
    2. Copy the output into the seedAuth field in the Secret YAML file named seedgen in the openshift-lifecycle-agent namespace.

      apiVersion: v1
      kind: Secret
      metadata:
        name: seedgen
        namespace: openshift-lifecycle-agent
      type: Opaque
      data:
        seedAuth: <encoded_authfile>
      
      • The Secret resource must have the name: seedgen and namespace: openshift-lifecycle-agent fields.
      • In the data.seedAuth field, specify a base64-encoded authfile with write access to the registry where the generated seed images are pushed.
    3. Save the file, for example, as secretseedgenerator.yaml, and apply the Secret:

      $ oc apply -f secretseedgenerator.yaml
      
  4. Define the SeedGenerator CR and save the YAML file, for example, seedgenerator.yaml:

    apiVersion: lca.openshift.io/v1alpha1
    kind: SeedGenerator
    metadata:
      name: seedimage
    spec:
      seedImage: <seed_container_image>
    
    • The SeedGenerator CR must be named seedimage.
    • Specify the container image URL in the spec.seedImage field, for example, quay.io/example/seed-container-image:<tag>. It is recommended to use the <seed_cluster_name>:<ocp_version> format.
  5. Generate the seed image by running the following command:

    $ oc apply -f seedgenerator.yaml
    

Important: The cluster reboots and loses API capabilities while the Lifecycle Agent generates the seed image. Applying the SeedGenerator CR stops the kubelet and the CRI-O operations, then it starts the image generation.

Once the image generation is complete, the cluster can be reattached to the hub cluster, and you can access it through the API.

If you want to generate further seed images, you must provision a new seed cluster with the version you want to generate a seed image from.

Verification

  1. Once the cluster recovers and is available, check the status of the SeedGenerator CR:

    $ oc get seedgenerator -A -oyaml
    

    Example output for completed seed generation

    status:
      conditions:
        - lastTransitionTime: 2024-02-13T21:24:26Z
          message: Seed Generation completed
          observedGeneration: 1
          reason: Completed
          status: "False"
          type: SeedGenInProgress
        - lastTransitionTime: 2024-02-13T21:24:26Z
          message: Seed Generation completed
          observedGeneration: 1
          reason: Completed
          status: "True"
          type: SeedGenCompleted
      observedGeneration: 1
    
  2. Verify that the single-node OpenShift cluster is running and is attached to the RHACM hub cluster:

    $ oc get managedclusters sno-worker-example
    

    Example output

    NAME                 HUB ACCEPTED   MANAGED CLUSTER URLS                                  JOINED   AVAILABLE   AGE
    sno-worker-example   true           https://api.sno-worker-example.example.redhat.com     True     True        21h
    
    • The cluster is attached if you see that the value is True for both JOINED and AVAILABLE.

Note: The cluster requires time to recover after the kubelet restarts.

Preparing the single-node OpenShift cluster for the image-based upgrade

When you deploy the Lifecycle Agent on a cluster, an ImageBasedUpgrade CR is automatically created. You edit this CR to specify the image repository of the seed image and to move through the different stages on the target cluster.
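
For example, you can inspect the automatically created CR on the target cluster before editing it. The command uses the short resource name that also appears in the verification steps later in this document:

$ oc get ibu -A -oyaml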

Prerequisites

Warning: If the target cluster has multiple IPs and one of them belongs to the subnet that was used for creating the seed image, the upgrade fails if the target cluster's node IP does not belong to that subnet.

Procedure

This example procedure demonstrates how to back up and upgrade a cluster with applications that use persistent volumes.

Note: The target cluster does not need to be detached from the hub cluster.

  1. Create your OADP Backup and Restore CRs. For more information, see Creating a Backup CR and Creating a Restore CR.

    1. To back up specific CRs, use the lca.openshift.io/apply-label annotation in your Backup CRs. Based on the annotation, the Lifecycle Agent applies the lca.openshift.io/backup: <backup_name> label to the specified resources and adds the labelSelector.matchLabels.lca.openshift.io/backup: <backup_name> label selector when creating the Backup CRs.

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        name: backup-acm-klusterlet
        annotations:
          lca.openshift.io/apply-label: "apps/v1/deployments/open-cluster-management-agent/klusterlet,v1/secrets/open-cluster-management-agent/bootstrap-hub-kubeconfig,rbac.authorization.k8s.io/v1/clusterroles/klusterlet,v1/serviceaccounts/open-cluster-management-agent/klusterlet,rbac.authorization.k8s.io/v1/clusterroles/open-cluster-management:klusterlet-admin-aggregate-clusterrole,rbac.authorization.k8s.io/v1/clusterrolebindings/klusterlet,operator.open-cluster-management.io/v1/klusterlets/klusterlet,apiextensions.k8s.io/v1/customresourcedefinitions/klusterlets.operator.open-cluster-management.io,v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials" <1>
        labels:
          velero.io/storage-location: default
        namespace: openshift-adp
      spec:
        includedNamespaces:
        - open-cluster-management-agent
        includedClusterScopedResources:
        - klusterlets.operator.open-cluster-management.io
        - clusterclaims.cluster.open-cluster-management.io
        - clusterroles.rbac.authorization.k8s.io
        - clusterrolebindings.rbac.authorization.k8s.io
        includedNamespaceScopedResources:
        - deployments
        - serviceaccounts
        - secrets
      
      • The value of the lca.openshift.io/apply-label annotation must be a list of comma-separated objects in the group/version/resource/name format for cluster-scoped resources, or in the group/version/resource/namespace/name format for namespace-scoped resources. It must be attached to the related Backup CR.

      Note: Depending on the RHACM configuration, the v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials object must be backed up. Check if your MultiClusterHub CR has the spec.imagePullSecret field defined and if the secret exists in the open-cluster-management-agent namespace in your hub cluster. If the spec.imagePullSecret field does not exist, you can remove the v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials object from the lca.openshift.io/apply-label annotation.

      Important: To use the lca.openshift.io/apply-label annotation for backing up specific resources, the resources listed in the annotation must also be included in the spec section. If the lca.openshift.io/apply-label annotation is used in the Backup CR, only the resources listed in the annotation are backed up, even if other resource types are specified in the spec section.

    2. Define the restore order for the OADP Operator in the Restore CRs by using the lca.openshift.io/apply-wave annotation:

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: restore-acm-klusterlet
        namespace: openshift-adp
        labels:
          velero.io/storage-location: default
        annotations:
          lca.openshift.io/apply-wave: "1"
      spec:
        backupName: acm-klusterlet
      ---
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: restore-example-app
        namespace: openshift-adp
        labels:
          velero.io/storage-location: default
        annotations:
          lca.openshift.io/apply-wave: "2"
      spec:
        backupName: backup-example-app
      

      Note: If you do not define the lca.openshift.io/apply-wave annotation in the Backup and Restore CRs, they will be applied together.

    3. Create a kustomization.yaml file that adds the Backup and Restore CRs to a new ConfigMap:

      configMapGenerator:
      - name: oadp-cm-example
        namespace: openshift-adp
        files:
        - backup-acm-klusterlet.yaml
        - backup-example-app.yaml
        - restore-acm-klusterlet.yaml
        - restore-example-app.yaml
      generatorOptions:
        disableNameSuffixHash: true
      
      • When set to true, the generatorOptions.disableNameSuffixHash option disables appending a hash suffix to the ConfigMap name. This allows the ConfigMap to be overwritten when a new one is generated with the same name.
    4. Create the ConfigMap:

      $ kustomize build ./ -o oadp-cm-example.yaml
      

      Example output

      kind: ConfigMap
      metadata:
        name: oadp-cm-example
        namespace: openshift-adp
      [...]
      
    5. Apply the ConfigMap:

      $ oc apply -f oadp-cm-example.yaml
      
  2. (Optional) To keep your custom catalog sources after the upgrade, add them to the spec.extraManifests section in the ImageBasedUpgrade CR. For more information, see Catalog source.

  3. Edit the ImageBasedUpgrade CR:

    apiVersion: lca.openshift.io/v1alpha1
    kind: ImageBasedUpgrade
    metadata:
      name: example-upgrade
    spec:
      stage: Idle
      seedImageRef:
        version: 4.15.2
        image: <seed_container_image>
        pullSecretRef:
          name: <seed_pull_secret>
      additionalImages:
        name: ""
        namespace: ""
      autoRollbackOnFailure: {}
    #    disabledForPostRebootConfig: "true"
    #    disabledForUpgradeCompletion: "true"
    #    disabledInitMonitor: "true"
    #    initMonitorTimeoutSeconds: 1800
    #  extraManifests:
    #  - name: sno-extra-manifests
    #    namespace: openshift-lifecycle-agent
      oadpContent:
      - name: oadp-cm-example
        namespace: openshift-adp
    
    • Define the desired stage for the ImageBasedUpgrade CR in the spec.stage field. The value can be Idle, Prep, Upgrade, or Rollback.
    • Define the target platform version, the seed image to be used, and the secret required to access the image in the spec.seedImageRef section.
    • Configure the automatic rollback in the spec.autoRollbackOnFailure section. By default, automatic rollback on failure is enabled throughout the upgrade.
    • Optional. If set to true, the autoRollbackOnFailure.disabledForPostRebootConfig field disables automatic rollback when the reconfiguration of the cluster fails upon the first reboot.
    • Optional. If set to true, the autoRollbackOnFailure.disabledForUpgradeCompletion field disables automatic rollback after the Lifecycle Agent reports a failed upgrade upon completion.
    • Optional. If set to true, the autoRollbackOnFailure.disabledInitMonitor field disables automatic rollback when the upgrade does not complete after reboot within the time frame specified in the initMonitorTimeoutSeconds field.
    • Optional. The autoRollbackOnFailure.initMonitorTimeoutSeconds field specifies the time frame in seconds. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
    • Optional. In the spec.extraManifests section, specify the list of ConfigMap resources that contain the additional extra manifests that you want to apply to the target cluster. You can also add your custom catalog sources that you want to retain after the upgrade.
    • Specify the list of ConfigMap resources that contain the OADP Backup and Restore CRs in the spec.oadpContent section.
  4. Change the value of the stage field to Prep in the ImageBasedUpgrade CR:

    $ oc patch imagebasedupgrades.lca.openshift.io example-upgrade -p='{"spec": {"stage": "Prep"}}' --type=merge -n openshift-lifecycle-agent
    
  5. The Lifecycle Agent checks for the health of the cluster, creates a new ostree stateroot, and pulls the seed image to the target cluster. Then, the Operator precaches all the required images on the target cluster.

Verification

  1. Check the status of the ImageBasedUpgrade CR.

    $ oc get ibu -A -oyaml
    

    Example output

    status:
      conditions:
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: In progress
          observedGeneration: 2
          reason: InProgress
          status: "False"
          type: Idle
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: "Prep completed: total: 121 (pulled: 1, skipped: 120, failed: 0)"
          observedGeneration: 2
          reason: Completed
          status: "True"
          type: PrepCompleted
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: Prep completed
          observedGeneration: 2
          reason: Completed
          status: "False"
          type: PrepInProgress
      observedGeneration: 2
    

Upgrading the single-node OpenShift cluster with Lifecycle Agent

After you generate the seed image and complete the Prep stage, you can upgrade the target cluster. During the upgrade process, the OADP Operator creates a backup of the artifacts specified in the OADP CRs, then the Lifecycle Agent upgrades the cluster.

If the upgrade fails or stops, an automatic rollback is initiated. If you have an issue after the upgrade, you can initiate a manual rollback. For more information about rollbacks, see the (Optional) Initiating rollback of the single-node OpenShift clusters after an image-based upgrade or (Optional) Initiating rollback with TALM sections.

Important: During the Developer Preview of this feature, when upgrading a cluster, any custom trusted certificates configured on the cluster will be lost. As a temporary workaround, to preserve these certificates, you must use a seed image from a seed cluster that trusts the certificates.

Prerequisites

  • Complete the Prep stage.

Procedure

  1. When you are ready, move to the upgrade stage by changing the value of the stage field to Upgrade in the ImageBasedUpgrade CR.

    $ oc patch imagebasedupgrades.lca.openshift.io example-upgrade -p='{"spec": {"stage": "Upgrade"}}' --type=merge
    
  2. Check the status of the ImageBasedUpgrade CR:

    $ oc get ibu -A -oyaml
    

    Example output

    status:
      conditions:
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: In progress
          observedGeneration: 2
          reason: InProgress
          status: "False"
          type: Idle
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: "Prep completed: total: 121 (pulled: 1, skipped: 120, failed: 0)"
          observedGeneration: 2
          reason: Completed
          status: "True"
          type: PrepCompleted
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: Prep completed
          observedGeneration: 2
          reason: Completed
          status: "False"
          type: PrepInProgress
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: Upgrade completed
          observedGeneration: 3
          reason: Completed
          status: "True"
          type: UpgradeCompleted
    
  3. The OADP Operator creates a backup of the data specified in the OADP Backup and Restore CRs.

  4. The target cluster reboots.

  5. Monitor the status of the CR:

    $ oc get ibu -A -oyaml
    
  6. The cluster reboots.

  7. Once you are satisfied with the upgrade, commit to the changes by changing the value of the stage field to Idle in the ImageBasedUpgrade CR:

    $ oc patch imagebasedupgrades.lca.openshift.io example-upgrade -p='{"spec": {"stage": "Idle"}}' --type=merge
    

    Important: You cannot roll back the changes once you move to the Idle stage after an upgrade.

  8. The Lifecycle Agent deletes all resources created during the upgrade process.

Verification

  1. Check the status of the ImageBasedUpgrade CR:

    $ oc get ibu -A -oyaml
    

    Example output

    status:
      conditions:
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: In progress
          observedGeneration: 2
          reason: InProgress
          status: "False"
          type: Idle
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: "Prep completed: total: 121 (pulled: 1, skipped: 120, failed: 0)"
          observedGeneration: 2
          reason: Completed
          status: "True"
          type: PrepCompleted
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: Prep completed
          observedGeneration: 2
          reason: Completed
          status: "False"
          type: PrepInProgress
        - lastTransitionTime: 2024-01-01T09:00:00Z
          message: Upgrade completed
          observedGeneration: 3
          reason: Completed
          status: "True"
          type: UpgradeCompleted
    
  2. Check the status of the cluster restoration:

    $ oc get restores -n openshift-adp -o custom-columns=NAME:.metadata.name,Status:.status.phase,Reason:.status.failureReason
    

    Example output

    NAME             Status      Reason
    acm-klusterlet   Completed   <none>
    apache-app       Completed   <none>
    localvolume      Completed   <none>
    

(Optional) Initiating rollback of the single-node OpenShift cluster after an image-based upgrade

You can manually roll back the changes if you encounter unresolvable issues after an upgrade. By default, an automatic rollback is initiated on the following conditions:

  • If the reconfiguration of the cluster fails upon the first reboot.
  • If the Lifecycle Agent reports a failed upgrade.
  • If the upgrade does not complete within the time frame specified in the initMonitorTimeoutSeconds field after rebooting.

You can disable the automatic rollback configuration in the ImageBasedUpgrade CR at the Prep stage:

Example ImageBasedUpgrade CR

apiVersion: lca.openshift.io/v1alpha1
kind: ImageBasedUpgrade
metadata:
  name: example-upgrade
spec:
  stage: Idle
  seedImageRef:
    version: 4.15.2
    image: <seed_container_image>
  additionalImages:
    name: ""
    namespace: ""
  autoRollbackOnFailure: {}
#    disabledForPostRebootConfig: "true"
#    disabledForUpgradeCompletion: "true"
#    disabledInitMonitor: "true"
#    initMonitorTimeoutSeconds: 1800
[...]
  • Configure the automatic rollback in the spec.autoRollbackOnFailure section. By default, automatic rollback on failure is enabled throughout the upgrade.
  • Optional. If set to true, the autoRollbackOnFailure.disabledForPostRebootConfig field disables automatic rollback when the reconfiguration of the cluster fails upon the first reboot.
  • Optional. If set to true, the autoRollbackOnFailure.disabledForUpgradeCompletion field disables automatic rollback after the Lifecycle Agent reports a failed upgrade upon completion.
  • Optional. If set to true, the autoRollbackOnFailure.disabledInitMonitor field disables automatic rollback when the upgrade does not complete after reboot within the time frame specified in the initMonitorTimeoutSeconds field.
  • Optional. The autoRollbackOnFailure.initMonitorTimeoutSeconds field specifies the time frame in seconds. If not defined or set to 0, the default value of 1800 seconds (30 minutes) is used.
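
For example, to disable all automatic rollback checks, you can uncomment the fields shown above in the spec.autoRollbackOnFailure section. This is a sketch based on the commented example; the values are strings, as shown there:

spec:
  autoRollbackOnFailure:
    disabledForPostRebootConfig: "true"
    disabledForUpgradeCompletion: "true"
    disabledInitMonitor: "true"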

Prerequisites

  • Log in to the hub cluster as a user with cluster-admin privileges.

Procedure

  1. Move to the rollback stage by changing the value of the stage field to Rollback in the ImageBasedUpgrade CR.

    $ oc patch imagebasedupgrades.lca.openshift.io example-upgrade -p='{"spec": {"stage": "Rollback"}}' --type=merge
    
  2. The Lifecycle Agent reboots the cluster with the previously installed version of OpenShift Container Platform and restores the applications.

  3. Commit to the rollback by changing the value of the stage field to Idle in the ImageBasedUpgrade CR:

    $ oc patch imagebasedupgrades.lca.openshift.io example-upgrade -p='{"spec": {"stage": "Idle"}}' --type=merge -n openshift-lifecycle-agent
    

    Warning: If you move to the Idle stage after a rollback, the Lifecycle Agent cleans up resources that can be used to troubleshoot a failed upgrade.

Upgrading the single-node OpenShift cluster through GitOps ZTP

You can upgrade your managed single-node OpenShift cluster with the image-based upgrade through GitOps ZTP.

Important: During the Developer Preview of this feature, when upgrading a cluster, any custom trusted certificates configured on the cluster will be lost. As a temporary workaround, to preserve these certificates, you must use a seed image from a seed cluster that trusts the certificates.

Prerequisites

  • Install RHACM 2.9.2 or later.
  • Install TALM.
  • Update GitOps ZTP to the latest version.
  • Provision one or more managed clusters with GitOps ZTP.
  • Log in as a user with cluster-admin privileges.
  • You generated a seed image from a compatible seed cluster.
  • Install the OADP Operator on the target cluster. For more information, see About installing the OADP Operator.
  • Create an S3-compatible object storage solution and a ready-to-use bucket with proper credentials configured. For more information, see AWS S3 compatible backup storage providers.
  • Create a separate partition on the target cluster for the container images that is shared between stateroots. For more information, see Sharing the container directory between ostree stateroots.

Procedure

  1. Create a policy, named oadp-cm-common-policies, for the OADP ConfigMap (see the sketch after this step). For more information about how to create the ConfigMap, follow the first step in Preparing the single-node OpenShift cluster for the image-based upgrade.

    IMPORTANT: Depending on the RHACM configuration, the v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials object must be backed up. Check if your MultiClusterHub CR has the spec.imagePullSecret field defined and if the secret exists in the open-cluster-management-agent namespace in your hub cluster. If the spec.imagePullSecret field does not exist, you can remove the v1/secrets/open-cluster-management-agent/open-cluster-management-image-pull-credentials object from the lca.openshift.io/apply-label annotation.
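
    One possible sketch for wrapping the generated ConfigMap in a policy, assuming that your GitOps ZTP version supports user-provided CRs placed in a source-crs directory next to your PolicyGenTemplate files and that you copied the generated oadp-cm-example.yaml file there. The PolicyGenTemplate name and policyName combine into the oadp-cm-common-policies policy name that is referenced in the ClusterGroupUpgrade CRs later in this procedure; the binding rule, namespace, and mcp values match the other PolicyGenTemplate examples in this document:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: oadp-cm
      namespace: ztp-group
    spec:
      bindingRules:
        group-du-sno: ""
      mcp: master
      sourceFiles:
        - fileName: oadp-cm-example.yaml
          policyName: common-policies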

  2. (Optional) Create a policy for the ConfigMap of your user-specific extra manifests that are not part of the seed image. The Lifecycle Agent does not automatically extract these extra manifests from the seed cluster, so you can add a ConfigMap resource of your user-specific extra manifests in the spec.extraManifests field in the ImageBasedUpgrade CR.
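
    For reference, a minimal sketch of the ConfigMap that such a policy would wrap, assuming the sno-extra-manifests name and openshift-lifecycle-agent namespace from the commented example in the ImageBasedUpgrade CR, with a placeholder key for your own manifest content:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: sno-extra-manifests
      namespace: openshift-lifecycle-agent
    data:
      example-extra-manifest.yaml: |
        # Place the user-specific manifest that you want the Lifecycle Agent
        # to apply after the upgrade here.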

  3. (Optional) To keep your custom catalog sources after the upgrade, add them to the spec.extraManifests section in the ImageBasedUpgrade CR. For more information, see Catalog source.

  4. Create a PolicyGenTemplate CR that contains policies for the Prep and Upgrade stages.

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: group-ibu
      namespace: ztp-group
    spec:
      bindingRules:
        group-du-sno: ""
      mcp: master
      evaluationInterval:
        compliant: 10s
        noncompliant: 10s
      sourceFiles:
        - fileName: ImageBasedUpgrade.yaml
          policyName: prep-policy
          spec:
            stage: Prep
            seedImageRef:
              version: 4.15.0
              image: quay.io/user/lca-seed:4.15.0
              pullSecretRef:
                name: <seed_pull_secret>
            oadpContent:
              - name: oadp-cm-example
                namespace: openshift-adp
    #        extraManifests:
    #          - name: sno-extra-manifests
    #            namespace: openshift-lifecycle-agent
          status:
            conditions:
              - reason: Completed
                status: "True"
                type: PrepCompleted
        - fileName: ImageBasedUpgrade.yaml
          policyName: upgrade-policy
          spec:
            stage: Upgrade
          status:
            conditions:
              - reason: Completed
                status: "True"
                type: UpgradeCompleted
    
    • The spec.evaluationInterval field defines the policy evaluation interval for compliant and non-compliant policies. Set both to 10s to ensure that the policy status accurately reflects the current upgrade status.
    • Define the seed image, OpenShift Container Platform version, and pull secret for the upgrade in the spec.seedImageRef section at the Prep stage.
    • Define the OADP ConfigMap resources required for backup and restore in the spec.oadpContent section at the Prep stage.
    • Optional. Define the ConfigMap resource for your user-specific extra manifests in the spec.extraManifests section at the Prep stage.
  5. Create a PolicyGenTemplate CR for the default set of extra manifests:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: sno-ibu
    spec:
      bindingRules:
        sites: example-sno
        du-profile: 4.15.0
      mcp: master
      sourceFiles:
        - fileName: SriovNetwork.yaml
          policyName: config-policy
          metadata:
            name: sriov-nw-du-fh
            labels:
              lca.openshift.io/target-ocp-version: "4.15.0"
          spec:
            resourceName: du_fh
            vlan: 140
        - fileName: SriovNetworkNodePolicy.yaml
          policyName: config-policy
          metadata:
            name: sriov-nnp-du-fh
            labels:
              lca.openshift.io/target-ocp-version: "4.15.0"
          spec:
            deviceType: netdevice
            isRdma: false
            nicSelector:
              pfNames:
                - ens5f0
            numVfs: 8
            priority: 10
            resourceName: du_fh
        - fileName: SriovNetwork.yaml
          policyName: config-policy
          metadata:
            name: sriov-nw-du-mh
            labels:
              lca.openshift.io/target-ocp-version: "4.15.0"
          spec:
            resourceName: du_mh
            vlan: 150
        - fileName: SriovNetworkNodePolicy.yaml
          policyName: config-policy
          metadata:
            name: sriov-nnp-du-mh
            labels:
              lca.openshift.io/target-ocp-version: "4.15.0"
          spec:
            deviceType: vfio-pci
            isRdma: false
            nicSelector:
              pfNames:
                - ens7f0
            numVfs: 8
            priority: 10
            resourceName: du_mh
    

    Important: Ensure that the lca.openshift.io/target-ocp-version label matches the target OpenShift Container Platform version that is specified in the seedImageRef.version field of the ImageBasedUpgrade CR. The Lifecycle Agent only applies the CRs that match the specified version.

  6. Commit and push the created CRs to the GitOps ZTP Git repository.

    1. Verify that the policies are created:
    $ oc get policies -n spoke1 | grep -E "group-ibu"
    

    Example output

    ztp-group.group-ibu-prep-policy          inform               NonCompliant          31h
    ztp-group.group-ibu-upgrade-policy       inform               NonCompliant          31h
    
  7. To reflect the target platform version, update the du-profile or the corresponding policy-binding label in the SiteConfig CR.

    apiVersion: ran.openshift.io/v1
    kind: SiteConfig
    [...]
    spec:
      [...]
        clusterLabels:
          du-profile: "4.15.2"
    

    Important: Updating the labels to the target platform version unbinds the existing set of policies.

  8. Commit and push the updated SiteConfig CR to the GitOps ZTP Git repository.

  9. When you are ready to move to the Prep stage, create the ClusterGroupUpgrade CR with the Prep and OADP ConfigMap policies:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-ibu-prep
      namespace: default
    spec:
      clusters:
      - spoke1
      enable: true
      managedPolicies:
      - oadp-cm-common-policies
      - group-ibu-prep-policy
    #  - user-spec-extra-manifests
      remediationStrategy:
        canaries:
          - spoke1
        maxConcurrency: 1
        timeout: 240
    
  10. Apply the ClusterGroupUpgrade CR to start the Prep stage:

    $ oc apply -f cgu-ibu-prep.yml
    
    1. Monitor the status and wait for the cgu-ibu-prep ClusterGroupUpgrade to report Completed.

      $ oc get cgu -n default
      

      Example output

      NAME                    AGE   STATE       DETAILS
      cgu-ibu-prep            31h   Completed   All clusters are compliant with all the managed policies
      
  11. When you are ready to move to the Upgrade stage, create the ClusterGroupUpgrade CR that references the Upgrade policy:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-ibu-upgrade
      namespace: default
    spec:
      clusters:
      - spoke1
      enable: true
      managedPolicies:
      - group-ibu-upgrade-policy
      remediationStrategy:
        canaries:
          - spoke1
        maxConcurrency: 1
        timeout: 240
    
  12. Apply the ClusterGroupUpgrade CR to start the Upgrade stage:

    $ oc apply -f cgu-ibu-upgrade.yml
    
    1. Monitor the status and wait for the cgu-ibu-upgrade ClusterGroupUpgrade to report Completed.

      $ oc get cgu -n default
      

      Example output

      NAME                    AGE   STATE       DETAILS
      cgu-ibu-prep            31h   Completed   All clusters are compliant with all the managed policies
      cgu-ibu-upgrade         31h   Completed   All clusters are compliant with all the managed policies
      
  13. When you are satisfied with the changes and ready, create the PolicyGenTemplate to finalize the upgrade:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: group-ibu
      namespace: "ztp-group"
    spec:
      bindingRules:
        group-du-sno: ""
      mcp: "master"
      evaluationInterval:
        compliant: 10s
        noncompliant: 10s
      sourceFiles:
        - fileName: ImageBasedUpgrade.yaml
          policyName: "finalize-policy"
          spec:
            stage: Idle
          status:
            conditions:
              - status: "True"
                type: Idle
    
  14. Create a ClusterGroupUpgrade CR that references the policy that finalizes the upgrade:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-ibu-finalize
      namespace: default
    spec:
      clusters:
      - spoke1
      enable: true
      managedPolicies:
      - group-ibu-finalize-policy
      remediationStrategy:
        canaries:
          - spoke1
        maxConcurrency: 1
        timeout: 240
    
  15. Apply the ClusterGroupUpgrade CR to finalize the upgrade:

    $ oc apply -f cgu-ibu-finalize.yml
    
    1. Monitor the status and wait for the cgu-ibu-finalize ClusterGroupUpgrade to report Completed.

      $ oc get cgu -n default
      

      Example output

      NAME                    AGE   STATE       DETAILS
      cgu-ibu-finalize        30h   Completed   All clusters are compliant with all the managed policies
      cgu-ibu-prep            31h   Completed   All clusters are compliant with all the managed policies
      cgu-ibu-upgrade         31h   Completed   All clusters are compliant with all the managed policies
      

(Optional) Initiating rollback with TALM

By default, an automatic rollback is initiated on certain conditions. For more information about the automatic rollback configuration, see (Optional) Initiating rollback of the single-node OpenShift cluster after an image-based upgrade.

If you encounter an issue after the upgrade, you can start a manual rollback.

Procedure

  1. Update the du-profile or the corresponding policy-binding label with the original platform version in the SiteConfig CR:

    apiVersion: ran.openshift.io/v1
    kind: SiteConfig
    [...]
    spec:
      [...]
        clusterLabels:
          du-profile: "4.14.7"
    
  2. When you are ready to move to the Rollback stage, create a PolicyGenTemplate CR for the Rollback policies:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: group-ibu
      namespace: "ztp-group"
    spec:
      bindingRules:
        group-du-sno: ""
      mcp: "master"
      evaluationInterval:
        compliant: 10s
        noncompliant: 10s
      sourceFiles:
        - fileName: ImageBasedUpgrade.yaml
          policyName: "rollback-policy"
          spec:
            stage: Rollback
          status:
            conditions:
              - message: Rollback completed
                reason: Completed
                status: "True"
                type: RollbackCompleted
    
  3. Create a ClusterGroupUpgrade CR that references the Rollback policies:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-ibu-rollback
      namespace: default
    spec:
      clusters:
      - spoke1
      enable: true
      managedPolicies:
      - group-ibu-rollback-policy
      remediationStrategy:
        canaries:
          - spoke1
        maxConcurrency: 1
        timeout: 240
    
  4. Apply the ClusterGroupUpgrade CR to start the rollback:

    $ oc apply -f cgu-ibu-rollback.yml
    
  5. When you are satisfied with the changes and ready to finalize the rollback, create the PolicyGenTemplate CR:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: group-ibu
      namespace: "ztp-group"
    spec:
      bindingRules:
        group-du-sno: ""
      mcp: "master"
      evaluationInterval:
        compliant: 10s
        noncompliant: 10s
      sourceFiles:
        - fileName: ImageBasedUpgrade.yaml
          policyName: "finalize-policy"
          spec:
            stage: Idle
          status:
            conditions:
              - status: "True"
                type: Idle
    
  6. Create a ClusterGroupUpgrade CR that references the policy that finalizes the upgrade:

    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: cgu-ibu-finalize
      namespace: default
    spec:
      clusters:
      - spoke1
      enable: true
      managedPolicies:
      - group-ibu-finalize-policy
      remediationStrategy:
        canaries:
          - spoke1
        maxConcurrency: 1
        timeout: 240
    
  7. Apply the ClusterGroupUpgrade CR to finalize the rollback:

    $ oc apply -f cgu-ibu-finalize.yml
    
