# New Kubernetes cluster on AWS
We use `eksctl` to provision our k8s clusters on AWS and `terraform` to provision supporting infrastructure, such as storage buckets.
## Install required tools locally
1. Install `kubectl`, `helm`, `sops`, etc.

   In *Setting up your local environment to work on this repo* you will find instructions on how to set up `sops` to encrypt and decrypt files.

2. Install `aws`.

   Verify the install and version with `aws --version`. You should have at least version 2.

3. Install or upgrade `eksctl`.

   Mac users with homebrew can run `brew install eksctl`.

   Verify the install and version with `eksctl version`. You typically need a very recent version of this CLI.

4. Install `jsonnet`.

   Mac users with homebrew can run `brew install jsonnet`.

   Verify the install and version with `jsonnet --version`.
## Create a new cluster
### Setup credentials
Depending on whether this project is using AWS SSO or not, you can use the following links to figure out how to authenticate to this project from your terminal.
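If the account uses AWS SSO, a minimal sketch of authenticating from the terminal might look like the following; the profile name here is an assumption, not something the tooling requires:

```bash
# Configure an SSO profile once, then log in with it (profile name is hypothetical)
aws configure sso
aws sso login --profile my-2i2c-sso-profile
export AWS_PROFILE=my-2i2c-sso-profile

# For accounts without SSO, exporting long-lived credentials works instead
# export AWS_ACCESS_KEY_ID=...
# export AWS_SECRET_ACCESS_KEY=...
```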
### Generate cluster files
We automatically generate the files required to set up a new cluster:

- A `.jsonnet` file for use with `eksctl`
- An ssh key for use with `eksctl`
- A `.tfvars` file for use with `terraform`
You can generate these with:
```bash
export CLUSTER_NAME=<cluster-name>
export CLUSTER_REGION=<cluster-region-like ca-central-1>
export HUB_TYPE=<hub-type-like-basehub>

deployer generate-aws-cluster --cluster-name=$CLUSTER_NAME --cluster-region=$CLUSTER_REGION --hub-type=$HUB_TYPE
```
This will generate the following files:

- `eksctl/$CLUSTER_NAME.jsonnet` with a default cluster configuration, deployed to `us-west-2`
- `eksctl/ssh-keys/secret/$CLUSTER_NAME.key`, a `sops`-encrypted ssh private key that can be used to ssh into the kubernetes nodes
- `eksctl/ssh-keys/$CLUSTER_NAME.pub`, an ssh public key used by `eksctl` to grant access to the private key
- `terraform/aws/projects/$CLUSTER_NAME.tfvars`, a terraform variables file that will set up most of the non-EKS infrastructure
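As a quick, optional sanity check, you can confirm these files exist and that the private key really is `sops`-encrypted before moving on:

```bash
# Paths match the list above; the private key should show sops metadata, not a plaintext key
ls eksctl/$CLUSTER_NAME.jsonnet eksctl/ssh-keys/secret/$CLUSTER_NAME.key terraform/aws/projects/$CLUSTER_NAME.tfvars
head eksctl/ssh-keys/secret/$CLUSTER_NAME.key
```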
### Create and render an eksctl config file
We use an eksctl config file in YAML to specify how our cluster should be built. Since it can get repetitive, we use jsonnet to declaratively specify this config. You can find the `.jsonnet` files for the current clusters in the `eksctl/` directory.
The previous step should've created a baseline `.jsonnet` file you can modify as you like. The eksctl docs have a reference for all the possible options. You'll want to make sure you change at least the following:
- Region / Zone - make sure you are creating your cluster in the correct region and verify that the suggested zones 1a, 1b, and 1c actually are available in that region.

  ```bash
  # a command to list availability zones, for example
  # ca-central-1 doesn't have 1c, but 1d instead
  aws ec2 describe-availability-zones --region=$CLUSTER_REGION
  ```

- Size of nodes in instance groups, for both notebook nodes and dask nodes. In particular, make sure you have enough quota to launch these instances in your selected region (see the quota-check sketch after this list).

- Kubernetes version - older `.jsonnet` files might be on older versions, but you should pick a newer version when you create a new cluster.
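For the quota check mentioned above, here is a rough sketch using the AWS CLI; the quota code is an assumption (commonly the one for Running On-Demand Standard instances) and worth verifying in the Service Quotas console for your account:

```bash
# L-1216C47A is assumed to be the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" quota
aws service-quotas get-service-quota \
  --region=$CLUSTER_REGION \
  --service-code ec2 \
  --quota-code L-1216C47A
```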
Once you have a `.jsonnet` file, you can render it into a config file that `eksctl` can read.

```bash
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
```
Tip
The `*.eksctl.yaml` files are git ignored as we can regenerate them, so work against the `*.jsonnet` file and regenerate the YAML file when needed by an `eksctl` command.
### Create the cluster
Now you’re ready to create the cluster!
```bash
eksctl create cluster --config-file=$CLUSTER_NAME.eksctl.yaml
```
Tip
Make sure to run this command inside the `eksctl` directory, otherwise it cannot discover the `ssh-keys` subfolder.
This might take a few minutes.
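Once creation finishes, or to check on progress from another terminal, you can ask eksctl what it sees; a small sketch using the same variables:

```bash
# List the cluster and its node groups as eksctl sees them
eksctl get cluster --region=$CLUSTER_REGION
eksctl get nodegroup --cluster=$CLUSTER_NAME --region=$CLUSTER_REGION
```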
If any errors are reported in the config (there is a schema validation step), fix them in the `.jsonnet` file, re-render the config, and try again.
Once it is done, you can test access to the new cluster with `kubectl`, after getting credentials via:

```bash
aws eks update-kubeconfig --name=$CLUSTER_NAME --region=$CLUSTER_REGION
```
`kubectl` should be able to find your cluster now! `kubectl get node` should show you at least one core node running.
## Deploy Terraform-managed infrastructure
Our AWS terraform code is now used to deploy supporting infrastructure for the EKS cluster, including:

- An IAM identity account for use with our CI/CD system
- Appropriately networked EFS storage to serve as an NFS server for hub home directories
- Optionally, a shared database
- Optionally, user buckets
We still store terraform state in GCP, so you also need to have `gcloud` set up and authenticated already.
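A minimal sketch of that setup, assuming you already have access to the GCP project that holds the state bucket:

```bash
# Authenticate your user and the application-default credentials that terraform uses
gcloud auth login
gcloud auth application-default login
```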
The steps in Generate cluster files will have created a default `.tfvars` file. This file can either be used as-is or edited to enable the optional features listed above.
Initialise terraform for use with AWS:

```bash
cd terraform/aws
terraform init
```

Create a new terraform workspace:

```bash
terraform workspace new $CLUSTER_NAME
```

Deploy the terraform-managed infrastructure:

```bash
terraform plan -var-file projects/$CLUSTER_NAME.tfvars
```

Observe the plan carefully, and accept it.

```bash
terraform apply -var-file projects/$CLUSTER_NAME.tfvars
```
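Once the apply finishes, you can list the outputs this workspace exposes; several of them are used in the next steps:

```bash
# Show all terraform outputs for the currently selected workspace
terraform output
```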
## Export account credentials with finely scoped permissions for automatic deployment
In the previous step, we will have created an AWS IAM user with just enough permissions for automatic deployment of hubs from CI/CD. Since these credentials are checked in to our git repository and made public, they should have the least amount of permissions possible.
First, make sure you are in the right terraform directory:
```bash
cd terraform/aws
```
Fetch credentials for automatic deployment:

```bash
mkdir -p ../../config/clusters/$CLUSTER_NAME
terraform output -raw continuous_deployer_creds > ../../config/clusters/$CLUSTER_NAME/deployer-credentials.secret.json
```

Encrypt the file storing the credentials:

```bash
sops --output ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json --encrypt ../../config/clusters/$CLUSTER_NAME/deployer-credentials.secret.json
```
Double check to make sure that the `config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json` file is actually encrypted by `sops` before checking it in to the git repo. Otherwise this can be a serious security leak!

```bash
cat ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
```
Grant the freshly created IAM user access to the kubernetes cluster. As this requires passing in some parameters that match the created cluster, we have a `terraform output` that can give you the exact command to run:

```bash
terraform output -raw eksctl_iam_command
```
Run the `eksctl create iamidentitymapping` command returned by `terraform output`. That should give the continuous deployer user access. The command should look like this:

```bash
eksctl create iamidentitymapping \
  --cluster $CLUSTER_NAME \
  --region $CLUSTER_REGION \
  --arn arn:aws:iam::<aws-account-id>:user/hub-continuous-deployer \
  --username hub-continuous-deployer \
  --group system:masters
```
Create a minimal `cluster.yaml` file (`config/clusters/$CLUSTER_NAME/cluster.yaml`), and provide enough information for the deployer to find the correct credentials:

```yaml
name: <your-cluster-name>
provider: aws
aws:
  key: enc-deployer-credentials.secret.json
  clusterType: eks
  clusterName: <your-cluster-name>
  region: <your-region>
hubs: []
```
Note

The `aws.key` file is defined relative to the location of the `cluster.yaml` file.

Test the access by running:

```bash
deployer use-cluster-credentials $CLUSTER_NAME
```

and running:

```bash
kubectl get node
```

It should show you the provisioned node on the cluster if everything works out ok.
## Grant `eksctl` access to other users
Note
This section is still required even if the account is managed by SSO, though a user could run `deployer use-cluster-credentials $CLUSTER_NAME` to gain access as well.
AWS EKS has a strange access control problem, where the IAM user who creates the cluster has full access without any visible settings changes, and nobody else does. You need to explicitly grant access to other users. Find the usernames of the 2i2c engineers on this particular AWS account, and run the following command to give them access:
Note
You can get this command by modifying the output of `terraform output -raw eksctl_iam_command`, as described in Export account credentials with finely scoped permissions for automatic deployment.
```bash
eksctl create iamidentitymapping \
  --cluster $CLUSTER_NAME \
  --region $CLUSTER_REGION \
  --arn arn:aws:iam::<aws-account-id>:user/<iam-user-name> \
  --username <iam-user-name> \
  --group system:masters
```
This gives all the users full access to the entire kubernetes cluster. After this step is done, they can fetch local config with:
```bash
aws eks update-kubeconfig --name=$CLUSTER_NAME --region=$CLUSTER_REGION
```
This should eventually be converted to use an IAM Role instead, so we need not grant each individual user access, but just grant access to the role, which users can then assume as needed.
## Export the EFS IP address for home directories
The terraform run in the previous step will have also created an EFS instance to store the hub home directories, and set up the network correctly to mount it.
Get the address a hub on this cluster should use for connecting to NFS with `terraform output nfs_server_dns`, and set it in the hub's config under `nfs.pv.serverIP` (nested under `basehub` when necessary) in the appropriate `<hub>.values.yaml` file.
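A short sketch of fetching that address, assuming you are still in `terraform/aws` with this cluster's workspace selected:

```bash
# Select the cluster's workspace if needed, then read the EFS DNS name
terraform workspace select $CLUSTER_NAME
terraform output -raw nfs_server_dns
```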
## Add the cluster to be automatically deployed
The `deploy-hubs` GitHub workflow has a job named `upgrade-support-and-staging` that contains the list of clusters being automatically deployed by our CI/CD system. Add an entry for the new cluster here.
## A note on the support chart for AWS clusters
Warning
When you deploy the support chart on an AWS cluster, you must enable the `cluster-autoscaler` sub-chart, otherwise the node groups will not automatically scale. Include the following in your `support.values.yaml` file:
```yaml
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: <cluster-name>
  awsRegion: <aws-region>
```