Our CI/CD system

Overview

We use GitHub Actions as our CI/CD provider, and all of our workflows are defined as YAML files in the .github/workflows folder of the infrastructure repo.

Automatic hub deployment

Further reading

You can learn more about this workflow in our blog post Multiple JupyterHubs, multiple clusters, one repository.

The best place to learn about the latest state of our automatic hub deployment is to look at the deploy-hubs.yaml GitHub Actions workflow file. This workflow depends on a locally defined action that sets up access to a given cluster, and it contains four main jobs, detailed below.

1. generate-jobs: Generate Helm upgrade jobs

The first job takes the list of files that have been added or modified as part of a Pull Request and pipes them into the generate-helm-upgrade-jobs sub-command of the deployer module. This sub-command uses a set of functions to calculate which hubs on which clusters require a helm upgrade, alongside whether the support chart and staging hub on that cluster should also be upgraded. If any production hubs require an upgrade, upgrading the staging hub on that cluster is a prerequisite.
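As a rough sketch, the heart of this job could look like the snippet below. The changed-files action shown and the exact deployer invocation are illustrative assumptions, not copied from deploy-hubs.yaml.

```yaml
# Illustrative sketch only; see deploy-hubs.yaml for the real definition
- uses: jitterbit/get-changed-files@v1  # hypothetical choice of changed-files action
  id: changed-files

- name: Generate helm upgrade jobs
  # assumed invocation of the deployer's sub-command
  run: |
    python deployer generate-helm-upgrade-jobs "${{ steps.changed-files.outputs.all }}"
```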

This job provides the following outputs:

  • Two JSON objects that later GitHub Actions jobs can read to define matrix jobs (a hypothetical example follows this list). These JSON objects detail: which clusters require their support chart and/or staging hub to be upgraded, and which production hubs require an upgrade.

  • The above JSON objects are also rendered as human-readable tables using rich.
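For illustration only, a YAML rendering of these JSON objects might look something like the following; the actual field and output names are whatever the deployer emits.

```yaml
# Hypothetical shape of the two matrix-defining objects
support-and-staging-matrix-jobs:
  - cluster_name: 2i2c
    provider: gke
    upgrade_support: true
    upgrade_staging: true
prod-hub-matrix-jobs:
  - cluster_name: 2i2c
    provider: gke
    hub_name: demo
```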

Some special-cased filepaths

While the aim of this workflow is to upgrade only the pieces of the infrastructure affected by a given change, some changes do require us to redeploy everything.

  • If a cluster’s cluster.yaml file has been modified, we upgrade the support chart and all hubs on that cluster. This is because we cannot tell what has been changed without inspecting the diff of the file.

  • If either the basehub or daskhub Helm charts have additions/modifications in their paths, we redeploy all hubs across all clusters.

  • If the support Helm chart has additions/modifications in its path, we redeploy the support chart on all clusters.

  • If the deployer module has additions/modifications in its path, then we redeploy all hubs on all clusters.

Note

Right now, we redeploy everything when the deployer changes, since the deployer undertakes some tasks that generate config related to authentication. This may change in the future as we move towards the deployer becoming a separable, stand-alone package.

2. upgrade-support-and-staging: Upgrade support and staging hub Helm charts on clusters that require it

The next job reads in one of the JSON objects detailed above that defines which clusters need their support chart and/or staging hub upgrading. Note that it is not a requirement for both the support chart and staging hub to be upgraded during this job. A matrix job is set up that parallelises over all the clusters defined in the JSON object. For each cluster, the support chart is first upgraded (if required) followed by the staging hub (if required).
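A minimal sketch of how such a matrix can be driven by the JSON from job 1 is below; the job, output, and field names are assumptions, and the real steps invoke the deployer rather than echo.

```yaml
upgrade-support-and-staging:
  needs: [generate-jobs]
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false  # one cluster failing should not cancel the others
    matrix:
      # each entry describes one cluster that needs upgrades
      jobs: ${{ fromJson(needs.generate-jobs.outputs.support-and-staging-matrix-jobs) }}
  steps:
    - name: Upgrade support chart (if required)
      if: matrix.jobs.upgrade_support
      run: echo "upgrade support chart on ${{ matrix.jobs.cluster_name }}"
    - name: Upgrade staging hub (if required)
      if: matrix.jobs.upgrade_staging
      run: echo "upgrade staging hub on ${{ matrix.jobs.cluster_name }}"
```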

Note

The 2i2c cluster is a special case here as it has two staging hubs: one running the basehub Helm chart, and the other running the daskhub Helm chart. We therefore run an extra step for the 2i2c cluster to upgrade the dask-staging hub (if required).

We use staging hubs as canary deployments and prevent deploying production hubs if a staging deployment fails. Hence, the last step of this job is to set an output variable that stores whether the job succeeded or failed.
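A hedged sketch of such a final step, using GitHub Actions' job.status context and the $GITHUB_OUTPUT file; the step and output names are illustrative.

```yaml
- name: Declare success or failure of this cluster's deployment
  id: declare-result
  if: always()  # run this step even when an earlier step failed
  run: |
    echo "failed=${{ job.status == 'failure' }}" >> "$GITHUB_OUTPUT"
```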

3. filter-generate-jobs: Filter out jobs for clusters whose support/staging job failed

This job is an optimisation job. While we do want to prevent all production hubs on Cluster X from being upgraded if its support/staging job fails, we don’t want to prevent the production hubs on Cluster Y from being upgraded because the support/staging job for Cluster X failed.

This job reads in the production hub job definitions generated in job 1 and the support/staging success/failure variables set in job 2, then filters out the production hub upgrade jobs that were due to run on a cluster whose support/staging job failed.
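Conceptually, this job needs an always() condition so that it still runs when some of job 2's matrix jobs have failed; a sketch with illustrative names:

```yaml
filter-generate-jobs:
  needs: [generate-jobs, upgrade-support-and-staging]
  if: always()  # run even when some support/staging matrix jobs failed
  runs-on: ubuntu-latest
  steps:
    - name: Drop prod hub jobs for clusters whose support/staging job failed
      run: echo "filter job 1's prod hub matrix using job 2's outputs"
```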

4. upgrade-prod-hubs: Upgrade Helm chart for production hubs in parallel

This last job upgrades, in parallel, all production hubs that require it on the clusters that successfully completed job 2.

Known issues with this workflow

Sometimes the generate-jobs job fails with the following message:

The head commit for this pull_request event is not ahead of the base commit. Please submit an issue on this action's GitHub repo.

This issue is tracked upstream and can be resolved by rebasing your branch against master.

Helm chart values and cluster config validation

All of our helm charts have associated values.schema.yaml files and we also maintain a custom cluster.schema.yaml file. Here is an example of a values.schema.yaml file from the basehub chart. These schemas explicitly list what values can be passed through our config, and what type these values should have. Therefore, we can use these schemas to validate all our config before making deploys, and catch bugs early.
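A minimal, hypothetical excerpt of such a schema (the real files are much larger) shows the idea: keys the schema does not list are rejected, and each expected value has a declared type.

```yaml
# Hypothetical values.schema.yaml excerpt, not copied from basehub
additionalProperties: false  # reject config keys the schema does not list
required:
  - jupyterhub
properties:
  jupyterhub:
    type: object
    properties:
      singleuser:
        type: object
```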

We have functions within the deployer that validate the cluster config, the support chart values, and the helm chart values for each hub against these schemas. We automatically run these functions in GitHub Actions configured by the validate-clusters.yaml workflow file. This workflow is only triggered when related configuration has changed.
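The trigger condition amounts to a paths filter along these lines; the exact globs live in validate-clusters.yaml, and the ones below are assumptions.

```yaml
on:
  pull_request:
    paths:
      - config/clusters/**
      - helm-charts/**
```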

Finding the right workflow runs

GitHub’s UI makes it hard to distinguish between workflows that ran on a Pull Request and workflows that ran on the merge commit to the default branch.

To help contributors to our infrastructure repository find the right workflow run, we have a GitHub Actions workflow that posts a comment on a just-merged Pull Request with a link to the GitHub Actions UI, filtered for the deploy-hubs.yaml workflow (described above) running on the default branch. This makes it much easier to find the deployments currently happening on master than navigating GitHub’s UI.
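A sketch of how such a workflow can be triggered only for merged Pull Requests (the comment-posting step itself is elided and the job name is illustrative):

```yaml
on:
  pull_request:
    types: [closed]

jobs:
  comment-deployment-link:
    # "closed" fires for both merged and abandoned PRs, so filter on merged
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - name: Link to deploy-hubs.yaml runs on the default branch
        run: echo "post a PR comment via the GitHub API here"
```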

Automatically bumping image tags and helm sub-chart versions

Throughout the infrastructure repo we have a few upstream dependencies. This section will focus on the images our JupyterHubs use to define environments and services, the sub-charts our helm charts are built on top of, and the process we have for automatically keeping these up-to-date with upstream releases.

Bumping image tags

To keep the tags of any images we use up-to-date with upstream container registries, we use this Action: sgibson91/bump-jhub-image-action in the bump-image-tags.yaml workflow file.

This workflow runs as a matrix where one matrix job relates to one config file. A config file might be a *.values.yaml file for a specific hub, or a values.yaml file for a helm chart. But all it really needs to contain is valid YAML!

Two inputs are required for this Action:

  1. The path to the config file as defined from the root of the repository, e.g., helm-charts/basehub/values.yaml

  2. A variable called images_info which is a list of dictionaries containing information about the images we wish to bump in the given config file. By providing a list in this way, we can choose to include/exclude images in the given config from being bumped.

Each dictionary in the images_info list must have a values_path key whose value is a valid JMESPath expression to the image we would like to bump. The example below would bump the singleuser image.

[{"values_path": ".singleuser.image"}]

Additionally, you can provide a regexpr key with a valid regular expression that will filter the tags available on the container registry. This is particularly useful if the image has several kinds of tags published, e.g., commit tags as well as date tags.
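Putting these inputs together, the matrix entries might look like the following; the input name config_path and the second file path are assumptions based on the inputs described above.

```yaml
strategy:
  matrix:
    include:
      # bump the singleuser image wherever the basehub chart pins it
      - config_path: helm-charts/basehub/values.yaml
        images_info: '[{"values_path": ".singleuser.image"}]'
      # hypothetical hub config; only consider date-like tags (YYYY.MM.DD)
      - config_path: config/clusters/example/hub.values.yaml
        images_info: '[{"values_path": ".singleuser.image", "regexpr": "^[0-9]{4}\\.[0-9]{2}\\.[0-9]{2}$"}]'
```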

More configuration options

Please see the project README for more information about configuring this Action.

When triggered, either on a schedule or by a workflow dispatch event, the Action will open a Pull Request for each item in the matrix, bumping the tags for the defined images in the defined config for each matrix job.

Warning

Currently this Action only works for images that are publicly available on either Docker Hub or quay.io.

  • To contribute support for other container registries, see this issue

  • To contribute support for authenticated calls to container registries, see this issue

Bumping helm sub-chart versions

To keep the versions of sub-charts (charts our helm charts depend on) up-to-date with upstream releases, we use this Action: sgibson91/bump-helm-deps-action in the bump-helm-versions.yaml workflow file.

This workflow runs as a matrix where one matrix job relates to one of our helm charts, e.g., basehub. A config file is where the dependencies for that helm chart are listed. This is usually in a Chart.yaml file, but has historically also been a requirements.yaml file. All it really needs to contain is valid YAML!

Two inputs are required for this Action:

  1. The path to the config file as defined from the root of the repository, e.g., helm-charts/basehub/Chart.yaml

  2. A variable called chart_urls which is a dictionary containing information about the sub-charts we wish to bump in the given config file. By providing a dictionary in this way, we can choose to include/exclude sub-charts in the given config from being bumped.

The chart_urls dictionary has the sub-charts we wish to bump as keys, and URLs where a list of published versions of those charts is available as values. The example below would bump the jupyterhub sub-chart of the basehub helm chart.

{"jupyterhub": "https://jupyterhub.github.io/helm-chart/index.yaml"}

Note that the URL is not the expected https://jupyterhub.github.io/helm-chart/. This is so the Action can pass the file contents directly to a YAML parser, rather than having to scrape the rendered site’s HTML.
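For reference, the dependency entry being bumped lives in the chart's Chart.yaml and looks roughly like this (the version number here is illustrative):

```yaml
dependencies:
  - name: jupyterhub
    version: "2.0.0"  # this field is what the Action bumps
    repository: https://jupyterhub.github.io/helm-chart/
```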

More configuration options

Please see the project README for more information about configuring this Action.

When triggered, either on a schedule or by a workflow dispatch event, the Action will open a Pull Request for each item in the matrix, bumping the versions for the defined sub-charts in the defined config for each matrix job.

Warning

Currently this Action only works for sub-charts that have a YAML formatted index of versions published at a URL that either:

  • contains /gh-pages/, or;

  • ends with index.yaml (or index.yml).

Other sources of version lists, such as GitHub Releases or HTML sites, will need support added upstream as the need arises.