Add support for daskhubs - 2i2c Infrastructure Guide

To an existing hub¶

A daskhub setup can also be enabled after a hub has been set up as a basehub.

To enable dask-gateway support on a hub, the following configuration changes need to be made to the hub’s values file:

set dask-gateway.enabled to true:
```
dask-gateway:
  enabled: true
```

enable authentication with JupyterHub:

```
dask-gateway:
  enabled: true
  gateway:
    auth:
      type: jupyterhub
      jupyterhub:
        jupyterhubServiceName: dask-gateway
```

set jupyterhub.custom.daskhubSetup.enabled to true:

```
jupyterhub:
  custom:
    daskhubSetup:
      enabled: true
```

grant some or all Hub users access to the dask-gateway service:

all users:

```
jupyterhub:
  hub:
    loadRoles:
      server:
        scopes:
        - self
        - access:services!service=dask-gateway
      user:
        scopes:
        - self
        - access:services!service=dask-gateway
```

some users (e.g. only the dask-access group and the user user-with-access will be able to access dask-gateway):

```
jupyterhub:
  hub:
    loadRoles:
      server:
        scopes:
        - self
        - access:services!service=dask-gateway
      dask-users:
        scopes:
          - access:services!service=dask-gateway
        groups:
          - dask-access
        users:
          - user-with-access
```

set jupyterhub.singleuser.cloudMetadata.blockWithIptables to false:
This avoids blocking access to the cloud provider’s metadata server. If access is blocked, the coupling between the cloud provider’s IAM permissions and the credentials provided to pods by mounting a k8s ServiceAccount with certain annotations breaks (AWS IRSA, GCP workload identity). As a result, users are unable to access AWS/GCP object storage buckets.
```
jupyterhub:
  singleuser:
    cloudMetadata:
      blockWithIptables: false
```
if binderhub is enabled to work against a private container registry:
Then dask-gateway’s scheduler and worker pods need to pull from that registry, so follow the final step in Setup the binderhub-service chart to set up permissions for that.
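
The checklist above can be sketched as a small script that inspects a hub's values once they are loaded into a Python dictionary. This is only an illustration: `check_daskhub_values` and `get_path` are hypothetical helpers, not part of the 2i2c tooling, and the sketch assumes the keys sit at the top level of the dictionary exactly as shown in the snippets above. It only sees values set explicitly in the dictionary it is given, so settings inherited from chart defaults or other values files would be reported as missing.

```python
from typing import Any

# Scope that grants access to the dask-gateway service, as used in the snippets above.
GATEWAY_SCOPE = "access:services!service=dask-gateway"


def get_path(values: dict, dotted: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path; return default if a key is missing."""
    node: Any = values
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def check_daskhub_values(values: dict) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if get_path(values, "dask-gateway.enabled") is not True:
        problems.append("dask-gateway.enabled is not true")
    if get_path(values, "dask-gateway.gateway.auth.type") != "jupyterhub":
        problems.append("dask-gateway.gateway.auth.type is not jupyterhub")
    if get_path(values, "jupyterhub.custom.daskhubSetup.enabled") is not True:
        problems.append("jupyterhub.custom.daskhubSetup.enabled is not true")
    if get_path(values, "jupyterhub.singleuser.cloudMetadata.blockWithIptables") is not False:
        problems.append("jupyterhub.singleuser.cloudMetadata.blockWithIptables is not false")

    # At least one role must carry the dask-gateway access scope.
    roles = get_path(values, "jupyterhub.hub.loadRoles") or {}
    if not isinstance(roles, dict):
        roles = {}
    granted = any(
        GATEWAY_SCOPE in (role.get("scopes") or [])
        for role in roles.values()
        if isinstance(role, dict)
    )
    if not granted:
        problems.append("no role in jupyterhub.hub.loadRoles grants " + GATEWAY_SCOPE)
    return problems


if __name__ == "__main__":
    # Example values mirroring the "all users" configuration above.
    example = {
        "dask-gateway": {
            "enabled": True,
            "gateway": {"auth": {"type": "jupyterhub"}},
        },
        "jupyterhub": {
            "custom": {"daskhubSetup": {"enabled": True}},
            "singleuser": {"cloudMetadata": {"blockWithIptables": False}},
            "hub": {"loadRoles": {"user": {"scopes": ["self", GATEWAY_SCOPE]}}},
        },
    }
    for problem in check_daskhub_values(example):
        print(problem)
```

To run it against a real values file, the file would first need to be parsed into a dictionary, for example with a YAML library.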

To an existing cluster¶

GCP¶

Setting up dask nodepools with terraform can be done by adding the following to the cluster’s terraform config file:

```
# Setup a single node pool for dask workers.
#
# A not yet fully established policy is being developed about using a single
# node pool, see https://github.com/2i2c-org/infrastructure/issues/2687.
#
dask_nodes = {
  "n2-highmem-16" : {
    min : 0,
    max : 100,
    machine_type : "n2-highmem-16",
  },
}
```

This provisions a single n2-highmem-16 nodepool. The reasons behind the choice of machine can be found in 2i2c-org/infrastructure#2687.

Tip

Don’t forget to run terraform plan and terraform apply for the new node pool to get created.

```
terraform plan -var-file projects/$CLUSTER_NAME.tfvars
terraform apply -var-file projects/$CLUSTER_NAME.tfvars
```

AWS¶

We use eksctl with jsonnet to provision our Kubernetes clusters on AWS, and we can configure a node group there for the dask pods to run on.

In the appropriate .jsonnet file, update the local daskNodes:
This is how it could look in a .jsonnet file after updating the local daskNodes = [] variable. Note that there needs to be one nodegroup dictionary per hub, and {{hub-name}} should be replaced with the actual name of the hub.
```
local daskNodes = [
  {
    namePrefix: "dask-{{hub-name}}",
    labels+: { "2i2c/hub-name": "{{hub-name}}" },
    tags+: { "2i2c:hub-name": "{{hub-name}}" },
    instancesDistribution+: { instanceTypes: ["r5.4xlarge"] }
  },
];
```
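
Because one nodegroup dictionary is needed per hub, the list can get repetitive on clusters with several hubs. As a rough sketch, the entries could be generated from a list of hub names and pasted into the .jsonnet file. The helpers `dask_nodegroup_entry` and `render_dask_nodes` and the hub names `staging` and `prod` are hypothetical and only for illustration. The instance type defaults to the `r5.4xlarge` used in the example above.

```python
import json


def dask_nodegroup_entry(hub_name: str, instance_type: str = "r5.4xlarge") -> str:
    """Render one daskNodes entry as jsonnet text for the given hub."""
    # json.dumps produces a double-quoted string, which is also a valid jsonnet literal.
    name = json.dumps(hub_name)
    prefix = json.dumps(f"dask-{hub_name}")
    itype = json.dumps(instance_type)
    lines = [
        "  {",
        f"    namePrefix: {prefix},",
        f'    labels+: {{ "2i2c/hub-name": {name} }},',
        f'    tags+: {{ "2i2c:hub-name": {name} }},',
        f"    instancesDistribution+: {{ instanceTypes: [{itype}] }},",
        "  },",
    ]
    return "\n".join(lines)


def render_dask_nodes(hub_names: list) -> str:
    """Render the whole `local daskNodes = [...]` block, one entry per hub."""
    entries = "\n".join(dask_nodegroup_entry(name) for name in hub_names)
    return "local daskNodes = [\n" + entries + "\n];"


if __name__ == "__main__":
    # Hypothetical hub names, for illustration only.
    print(render_dask_nodes(["staging", "prod"]))
```

The output is plain text meant to be reviewed and copied by hand. It does not modify the .jsonnet file itself.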

Render the .jsonnet file into a .yaml file that eksctl can use
```
export CLUSTER_NAME=<your_cluster>
jsonnet $CLUSTER_NAME.jsonnet > $CLUSTER_NAME.eksctl.yaml
```

Create the nodegroup
```
eksctl create nodegroup -f $CLUSTER_NAME.eksctl.yaml
```
This should create the nodegroup with 0 nodes in it, and the autoscaler should recognize it. eksctl will also set up the appropriate driver installer, so you won’t have to.