# Upgrade Kubernetes cluster on AWS
**Warning**

This upgrade will cause disruptions for users and trigger alerts for Simple
HTTPS uptime checks. To help other engineers, communicate that you are
starting a cluster upgrade in the `#maintenance-notices` Slack channel and
set up a snooze.
**Warning**

We haven't yet established a policy for planning and communicating maintenance
procedures to users. For now, only perform a k8s cluster upgrade while the
cluster is unused, or after the maintenance has been communicated ahead of
time.
## Pre-requisites
**1. Install or upgrade CLI tools**

Install required tools as documented in Install required tools locally, and
ensure you have a recent version of `eksctl`.
**Warning**

Using a modern version of `eksctl` has proven important historically. Make
sure to use the latest version to avoid debugging an already fixed bug!
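For example, to check what you have installed locally:

```bash
# print the locally installed eksctl version
eksctl version
```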
**2. Consider changes to `template.jsonnet`**
The eksctl config jinja2 template `eksctl/template.jsonnet` was once used to
generate the jsonnet template `eksctl/$CLUSTER_NAME.jsonnet`, which in turn
is used to generate the actual eksctl config.

Before upgrading an EKS cluster, it is a good time to consider whether
`eksctl/template.jsonnet` has changed since this cluster's jsonnet template
was last generated, which happened initially as described in Generate cluster
files.

To do this, first ensure `git status` reports no changes, then generate new
cluster files using the deployer script, and finally restore changes to
everything but the `eksctl/$CLUSTER_NAME.jsonnet` file.

```bash
export CLUSTER_NAME=<cluster-name>
export CLUSTER_REGION=<cluster-region-like ca-central-1>
export HUB_TYPE=<hub-type-like-basehub>
```
```bash
# only continue below if git status reports a clean state
git status

# generates a few new files
deployer generate-aws-cluster --cluster-name=$CLUSTER_NAME --cluster-region=$CLUSTER_REGION --hub-type=$HUB_TYPE

# overview changed files
git status

# restore changes to all files but the .jsonnet files
git add *.jsonnet
git checkout .. # .. should be the git repo's root
git reset

# inspect changes
git diff
```
Finally, if you identify changes you think should be retained, add and commit
them. Discard the remaining changes with a `git checkout .` command.

**3. Learn how to generate an `eksctl` config file**

When upgrading an EKS cluster, we will use `eksctl` extensively and reference
a generated config file, `$CLUSTER_NAME.eksctl.yaml`. It is generated from
the `$CLUSTER_NAME.jsonnet` file.

If you update the .jsonnet file, make sure to re-generate the .yaml file
before using `eksctl`. Conversely, if you update the .yaml file directly,
remember to update the .jsonnet file as well.

```bash
# re-generate an eksctl config file for use with eksctl
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
```
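As a sanity check, a minimal sketch (assuming `jsonnet` and `diff` are on
your PATH) to spot drift between the two files:

```bash
# regenerate into a temporary file and compare with the committed config;
# any diff output means the .jsonnet and .yaml files have drifted apart
jsonnet $CLUSTER_NAME.jsonnet > /tmp/$CLUSTER_NAME.eksctl.yaml
diff /tmp/$CLUSTER_NAME.eksctl.yaml $CLUSTER_NAME.eksctl.yaml
```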
## Cluster upgrade
**1. Ensure in-cluster permissions**

The k8s api-server won't accept commands from you unless you have configured
a mapping between your AWS user and a k8s user, and `eksctl` needs to run
some commands behind the scenes.

This mapping is done via a ConfigMap in kube-system called `aws-auth`, and we
can use an `eksctl` command to influence it.

```bash
eksctl create iamidentitymapping \
    --cluster=$CLUSTER_NAME \
    --region=$CLUSTER_REGION \
    --arn=arn:aws:iam::<aws-account-id>:user/<iam-user-name> \
    --username=<iam-user-name> \
    --group=system:masters
```
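To confirm the mapping took effect, you can list the current identity
mappings:

```bash
# list identity mappings stored in the aws-auth ConfigMap
eksctl get iamidentitymapping --cluster=$CLUSTER_NAME --region=$CLUSTER_REGION
```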
**2. Acquire and configure AWS credentials**

Visit https://2i2c.awsapps.com/start#/ and acquire CLI credentials.

In case the AWS account isn't managed there, inspect
`config/$CLUSTER_NAME/cluster.yaml` to understand what AWS account number to
log in to at https://console.aws.amazon.com/.

Configure credentials like:

```bash
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```
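To verify the credentials work before proceeding:

```bash
# prints the AWS account id and user ARN these credentials resolve to
aws sts get-caller-identity
```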
**3. Upgrade the k8s control plane one minor version**

The k8s control plane can only be upgraded one minor version at a time.[1]
So, update the eksctl config's version field by one minor version.

Then, perform the upgrade, which typically takes ~10 minutes.

```bash
eksctl upgrade cluster --config-file=$CLUSTER_NAME.eksctl.yaml --approve
```
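Once the command finishes, you can confirm the control plane's version. A
minimal check, assuming the EKS cluster is named exactly `$CLUSTER_NAME`:

```bash
# report the control plane version as EKS sees it
aws eks describe-cluster --name=$CLUSTER_NAME --query=cluster.version --output=text
```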
**Note**

If you see the error `Error: the server has asked for the client to provide
credentials`, don't worry; if you try again you will find that the cluster is
now upgraded.

**4. Upgrade node groups up to two minor versions above the k8s control plane**
A node's k8s software (`kubelet`) can be up to two minor versions ahead of or
behind the control plane version.[1] Due to this, you can plan your cluster
upgrade to involve only one node group upgrade even if you increment the
control plane four minor versions.

So if you upgrade from k8s 1.21 to 1.24, you can for example upgrade the k8s
control plane from 1.21 to 1.22, then upgrade the node groups from 1.21 to
1.24, followed by upgrading the control plane two steps in a row.

To upgrade (unmanaged) node groups, you delete them and then add them back.
When adding them back, make sure your cluster config's k8s version is what
you want the node groups to be added back as.
**Update the k8s version in the config temporarily**

This is to influence the k8s software version only for the node groups we
create. We can choose a version up to two minor versions above the current
k8s control plane version.
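As a sketch of how to double-check the temporary version change (assuming the
version field sits under `metadata:` in the generated config, per `eksctl`'s
ClusterConfig schema):

```bash
# re-generate the eksctl config after editing the .jsonnet file,
# then eyeball the version field it now carries
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
grep -A 3 '^metadata:' $CLUSTER_NAME.eksctl.yaml
```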
**Add a new core node group (like `core-b`)**

Rename (part 1/3) the config file's entry for the core node group temporarily
when running this command, either from `core-a` to `core-b` or the other way
around.

```bash
eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-b"
```
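You can watch the new core nodes join and become Ready before moving on:

```bash
# list node groups eksctl knows about in this cluster
eksctl get nodegroup --cluster=$CLUSTER_NAME --region=$CLUSTER_REGION

# watch nodes until the new core nodes report Ready (Ctrl-C to stop)
kubectl get nodes --watch
```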
**Delete all old node groups (like `core-a,nb-*,dask-*`)**

Rename (part 2/3) the core node group again in the config to its previous
name, so the old node group can be deleted with the following command.

```bash
eksctl delete nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="core-a,nb-*,dask-*" --approve --drain=true
```

Rename (part 3/3) the core node group one final time in the config to its new
name, as that represents the state of the EKS cluster.
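To confirm the old node groups are gone and nothing is left unschedulable
after the drain, a quick check:

```bash
# the deleted node groups should no longer be listed
eksctl get nodegroup --cluster=$CLUSTER_NAME --region=$CLUSTER_REGION

# look for pods stuck in Pending after the drain
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```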
**Re-create all non-core node groups (like `nb-*,dask-*`)**

```bash
eksctl create nodegroup --config-file=$CLUSTER_NAME.eksctl.yaml --include="nb-*,dask-*"
```
**Restore the k8s version in the config**

We adjusted the k8s version in the config to influence the desired version of
our created node groups. Let's restore it to what the k8s control plane
currently has.
**5. Upgrade EKS add-ons (takes ~35s)**

As documented in `eksctl`'s documentation[1], we also need to upgrade three
EKS add-ons enabled by default, and one we have added manually.

```bash
# upgrade the kube-proxy daemonset
eksctl utils update-kube-proxy --config-file=$CLUSTER_NAME.eksctl.yaml --approve

# upgrade the aws-node daemonset
eksctl utils update-aws-node --config-file=$CLUSTER_NAME.eksctl.yaml --approve

# upgrade the coredns deployment
eksctl utils update-coredns --config-file=$CLUSTER_NAME.eksctl.yaml --approve

# upgrade the aws-ebs-csi-driver addon's deployment and daemonset
eksctl update addon --config-file=$CLUSTER_NAME.eksctl.yaml
```
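Afterwards, it's worth confirming that the kube-system workloads rolled out
cleanly:

```bash
# all kube-system pods should be Running, and daemonsets fully scheduled
kubectl get pods --namespace=kube-system
kubectl get daemonsets,deployments --namespace=kube-system
```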
**Note**

Common failures: the kube-proxy daemonset's pods may fail to pull the image.
To resolve this, visit the AWS EKS docs on managing kube-proxy to identify
the version to use, and update the kube-proxy daemonset's container image to
match it.

```bash
kubectl edit daemonset kube-proxy -n kube-system
```
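To inspect which image the daemonset currently runs (the jsonpath below
assumes kube-proxy is the pod's first container):

```bash
# print the image currently used by the kube-proxy daemonset
kubectl get daemonset kube-proxy --namespace=kube-system \
  --output=jsonpath='{.spec.template.spec.containers[0].image}'
```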
**6. Repeat steps 3 and 5 if needed**

If you upgrade k8s multiple minor versions, repeat steps 3 and 5,
incrementing one minor version at a time.