# Upgrade Kubernetes cluster on AWS
## Pre-requisites
- **Install/upgrade CLI tools**

  Refer to Prerequisites on how to do this.

  **Important:** Ensure you have the latest version of `eksctl`, it matters.

- **Learn/recall how to generate an `eksctl` config file**

  When upgrading an EKS cluster, we will use `eksctl` extensively and reference a generated config file, `$CLUSTER_NAME.eksctl.yaml`. It's generated from the `$CLUSTER_NAME.jsonnet` file.

  ```bash
  # re-generate an eksctl config file for use with eksctl
  jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
  ```
## Cluster upgrade
### 1. Prepare two terminal sessions

Start two terminal sessions and set `CLUSTER_NAME=...` in both.

In the first terminal, set up your AWS credentials (according to this documentation) and change your working directory to the `eksctl` folder.

In the second terminal, set up k8s credentials with `deployer use-cluster-credentials $CLUSTER_NAME`.
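For example, the two terminals could be prepared like this (the cluster name, AWS profile, and path are illustrative, not prescriptive):

```bash
# terminal 1: AWS credentials, working directory set to the eksctl folder
export CLUSTER_NAME=openscapeshub    # example cluster name
export AWS_PROFILE=my-2i2c-profile   # or however you normally provide AWS credentials
cd eksctl

# terminal 2: k8s credentials via the deployer
export CLUSTER_NAME=openscapeshub
deployer use-cluster-credentials $CLUSTER_NAME
```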
**Note:** If you were to run `deployer use-cluster-credentials` in the terminal where you have set up your AWS credentials, you would no longer act as your AWS user but as the `hub-deployer-user`, which doesn't have the relevant permissions to work with `eksctl`.
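If you're ever unsure which identity a terminal is acting as, the AWS CLI can tell you:

```bash
# prints the account and IAM identity the current AWS credentials resolve to
aws sts get-caller-identity
```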
### 2. Check cluster status and activity

Before upgrading, get a sense of the cluster's current status and user activity in it.

```bash
# get nodes, their k8s version, and their node group names
# - what node groups are currently active?
kubectl get node --label-columns=alpha.eksctl.io/nodegroup-name

# get all pods
# - are there any non-running/non-ready pods with issues before upgrading?
kubectl get pod -A

# get users' server and dask cluster pods
# - how many user servers are currently running?
# - is this an acceptable time to upgrade?
kubectl get pod -A -l "component=singleuser-server"
kubectl get pod -A -l "app.kubernetes.io/component in (dask-scheduler, dask-worker)"
```
### 3. Notify in Slack

Notify others in 2i2c that you are starting this cluster upgrade in the `#maintenance-notices` Slack channel.
### 4. Upgrade the k8s control plane[2]

#### 4.1. Upgrade the k8s control plane one minor version

The k8s control plane can only be upgraded one minor version at a time,[1] so increment the version field in the `.jsonnet` file.
```jsonnet
{
    name: "openscapeshub",
    region: clusterRegion,
    version: "1.29",  # increment this
}
```
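To confirm the control plane's current version before incrementing it, you could ask EKS directly (a quick optional check, assuming the AWS CLI is set up and the EKS cluster name matches `$CLUSTER_NAME`):

```bash
# show the control plane's current k8s version
aws eks describe-cluster --name "$CLUSTER_NAME" --query cluster.version --output text
```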
Re-generate the `.yaml` file, and then perform the upgrade, which typically takes ~10 minutes. This upgrade is not expected to be disruptive for EKS clusters, as they come with multiple replicas of the k8s api-server.

```bash
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
```
**Note:** If you see the error `Error: the server has asked for the client to provide credentials`, don't worry; if you try again you will find that the cluster is now upgraded.
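To verify that the control plane is now on the new version (the nodes will keep reporting their old version until their node groups are upgraded), you can compare:

```bash
# the server version reflects the control plane; nodes still show their old version
kubectl version
kubectl get node
```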
#### 4.2. Upgrade EKS add-ons

As documented in `eksctl`'s documentation[2], we also need to upgrade the EKS add-ons. This upgrade is believed to very briefly disrupt networking.

```bash
# upgrade all EKS add-ons (takes up to a few minutes)
eksctl update addon --config-file=$CLUSTER_NAME.eksctl.yaml
```
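If you want to see which add-ons are installed and their versions, `eksctl` can list them (an optional check; run it from the terminal with AWS credentials):

```bash
# list the cluster's EKS add-ons and their versions
eksctl get addon --cluster=$CLUSTER_NAME
```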
**Note:** Since November 2024, the add-ons are installed as EKS-managed add-ons rather than self-managed, as they previously were when `eksctl` installed them without involving EKS.
#### 4.3. Repeat to upgrade multiple minor versions
If you need to upgrade multiple minor versions, repeat the previous steps starting with the minor version upgrade.
### 5. Upgrade node groups

All of the cluster's node groups should be upgraded. Strategies to upgrade nodes are introduced in About strategies to upgrade nodes.

- First upgrade the core node group with a rolling upgrade using drain.
- Then upgrade user node group(s) with rolling upgrades without using drain, or with re-creation upgrades if they aren't running or are empty.

Before committing to an upgrade strategy for a node group, get an overview of the currently active nodes:

```bash
kubectl get node --label-columns=alpha.eksctl.io/nodegroup-name
```
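You can also list the node groups as `eksctl` sees them, which is a useful cross-check against the generated config file (optional; run from the terminal with AWS credentials):

```bash
# list existing node groups for the cluster
eksctl get nodegroup --cluster=$CLUSTER_NAME
```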
#### Performing rolling upgrades (using drain or not)

**Create a new node group**

If you are going to use drain (core node groups), rename the node group you want to do a rolling upgrade on in the `.jsonnet` file, for example by adjusting a `nameSuffix` from `a` to `b`. If you aren't going to use drain (user node groups), add a duplicate but renamed entry instead.

Node group names are set by `namePrefix`, `nameSuffix`, and `nameIncludeInstanceType`. Example node group names are `core-b`, `nb-r5-xlarge`, `nb-r5-xlarge-b`, and `dask-r5-4xlarge`.

```bash
# create a new node group (takes ~5 min)
# IMPORTANT: review the --include flag's value
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-b"
```
**Optionally taint and wait**

If you aren't going to forcefully drain the old node group's nodes, you have to wait for the nodes to empty out before deleting the old node group.

First taint the old node group's nodes to stop new pods from scheduling on them. Doing this also prevents the cluster-autoscaler from scaling up new such nodes, at least until they have all scaled down.

```bash
kubectl taint node manual-phaseout:NoSchedule -l alpha.eksctl.io/nodegroup-name=core-b
```

Then add a comment in the `.jsonnet` file above the old node group saying:

```jsonnet
// FIXME: tainted, to be deleted when empty, replaced by equivalent during k8s upgrade
```

You can now commit changes and come back to the final deletion step at a later point in time.
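Before returning to delete the old node group, you can check whether its nodes have emptied out, for example with a small loop like this (a sketch, assuming the old node group is named as in the taint example above):

```bash
# list pods still running on each node of the old node group
for node in $(kubectl get node -l alpha.eksctl.io/nodegroup-name=core-b -o name); do
  echo "=== ${node}"
  kubectl get pod -A --field-selector spec.nodeName="${node#node/}"
done
```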
**Delete the old node group**

**Important:** Before draining and deleting the old node group, you need to update the `tigera-operator` deployment and force it to not use the old node group anymore. Otherwise, the drain process will get stuck.

This is because `tigera-operator` tolerates all taints by default, including `NoSchedule`, and it will recreate itself indefinitely on the node that's being drained even though it's tainted with `NoSchedule`.

To fix this, edit the `tigera-operator` deployment with:

```bash
kubectl edit deployment tigera-operator -n tigera-operator
```

then search for `Exists` and remove the following toleration from that entry:

```yaml
tolerations:
  - effect: NoSchedule
    operator: Exists
```
You can now proceed with the drain and delete process.
If you added a duplicate renamed node group, then first remove the old node group from the `.jsonnet` file.

```bash
# drains and deletes node groups not found in config (takes ~20s if the node groups have no running nodes)
# IMPORTANT: note that --approve is needed and commented out
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
eksctl delete nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --only-missing # --approve
```
**Note:** The `eksctl`-managed drain operation may get stuck, but you can help it along by finding the pods that fail to terminate with `kubectl get pod -A`, and then manually force-terminating them with `kubectl delete pod --force <...>`.
#### Performing re-creation upgrades

If you do maintenance on a cluster with no running user workloads (user servers, dask gateway workers), a quick and simple strategy is to delete all user node groups and then re-create them. The `eksctl [delete|create] nodegroup` commands can work with multiple node groups at a time by using wildcards in the `--include` flag.
**Delete node group(s)**

```bash
# delete the node groups (takes ~20s if the node groups have no running nodes)
# IMPORTANT: review the --include flag's value
# IMPORTANT: note that --approve is needed and commented out
eksctl delete nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="nb-*,dask-*,gpu-*" # --approve
```
**Wait ~60 seconds**

If we don't wait now, our create command may fail without an error and instead log a message like `# existing nodegroup(s) (…) will be excluded`. If that happens, you can just re-run the create command.
**Re-create node group(s)**

```bash
# re-create node group(s) (takes ~5 min)
# IMPORTANT: review the --include flag's value
eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="nb-*,dask-*,gpu-*"
```
### 6. Notify in Slack

Notify others in 2i2c that you've finalized the cluster upgrade in the `#maintenance-notices` Slack channel.
### 7. Commit changes

During this upgrade, the k8s version and possibly node group names might have been changed. Make sure you commit these changes after the upgrade is finished.
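A minimal commit could look like this (the file path and commit message are illustrative):

```bash
# commit the updated cluster config
git add eksctl/$CLUSTER_NAME.jsonnet
git commit -m "$CLUSTER_NAME: upgrade k8s control plane and node groups"
```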