(features:binderhub-ui-hub)=
# Make binderhub-ui hub

We can support users who want to build, push and launch user images from open source GitHub repositories, using a UI similar to [mybinder.org](https://mybinder.org).

We call this a `binderhub-ui` style hub and the primary features offered would be:

- **(Optional) User authentication**
- **NO Persistent storage**
- **BinderHub style UI**
- **Two separate domains, one for binderhub UI and one for JupyterHub**

   Keeping them separate gives clean and correct sharing URLs that are not based on the `hub/services/:name` path.
- **A logged out homepage, and a logged-in homepage**

   See https://github.com/2i2c-org/infrastructure/issues/4168 for context on this decision.

## Generate a sample initial hub configuration

To get started with a very basic hub setup, either run the following `deployer` command directly, follow the steps in the [Initial Hub setup](hub-deployment-guide:runbooks:phase3.1) guide, or copy-paste the configuration of a similar hub. Then follow the steps below.

```bash
deployer generate hub-asset binderhub-ui-values-file
```

## Double-check the generated config

The sample config that has been generated by the deployer command needs to be checked to make sure that everything is as expected and nothing is missing.

Follow the checklist below before committing the hub values file to the infrastructure repository.

### I. General configuration

The following configuration applies to both authenticated and unauthenticated binderhubs.

#### 1. Check that some of the inherited configuration is emptied

Some of the configuration that gets inherited either from the `basehub` defaults or from the cluster's common values file (if that exists) needs to be cleared out, as it is not relevant for a binderhub style hub.

- disable `jupyterhub-home-nfs` (no persistent storage)

  ```yaml

  jupyterhub-home-nfs:
    enabled: false
  ```

- disable `jupyterhub.custom.singleuserAdmin.extraVolumeMounts` (no persistent storage so these don't make sense)

    ```yaml
    jupyterhub:
      custom:
        singleuserAdmin:
          extraVolumeMounts:
    ```
- on the singleuser server, disable storage and init containers and profile lists

    There will be no persistent storage, so disable it. Because of this, `singleuser.storage.extraVolumes`, `singleuser.storage.extraVolumeMounts` and `singleuser.initContainers` should also be emptied.

    Also make sure that the profileList is disabled in case it gets set in the common values file, as keeping it will make binderhub fail to launch a server.

    ```yaml
    jupyterhub:
      singleuser:
        storage:
          type: none
          extraVolumeMounts:
          extraVolumes:
        initContainers: []
        profileList: []
    ```
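
The checks in this step can be sketched as a small script. This is only a minimal sketch and not part of the deployer: `find_uncleared_settings` is a hypothetical helper, and it assumes the merged hub configuration has already been loaded into a plain Python dictionary.

```python
def find_uncleared_settings(values):
    """List inherited settings that a binderhub-ui hub should have cleared.

    `values` is assumed to be the merged hub configuration as a plain dict.
    """
    problems = []

    home_nfs = values.get("jupyterhub-home-nfs") or {}
    # Assumption: a missing key means the inherited default is still active
    if home_nfs.get("enabled", True):
        problems.append("jupyterhub-home-nfs.enabled should be false")

    jupyterhub = values.get("jupyterhub") or {}
    admin = (jupyterhub.get("custom") or {}).get("singleuserAdmin") or {}
    if admin.get("extraVolumeMounts"):
        problems.append(
            "jupyterhub.custom.singleuserAdmin.extraVolumeMounts should be empty"
        )

    singleuser = jupyterhub.get("singleuser") or {}
    storage = singleuser.get("storage") or {}
    if storage.get("type") != "none":
        problems.append("jupyterhub.singleuser.storage.type should be none")
    for key in ("extraVolumes", "extraVolumeMounts"):
        if storage.get(key):
            problems.append(f"jupyterhub.singleuser.storage.{key} should be empty")
    for key in ("initContainers", "profileList"):
        if singleuser.get(key):
            problems.append(f"jupyterhub.singleuser.{key} should be empty")

    return problems
```

An empty list means none of the settings covered in this step were left over from the inherited configuration.
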
#### 2. Check jupyterhub and binderhub domains setup

Having separate domains for jupyterhub and binderhub gives clean and correct sharing URLs that are not based on the `hub/services/:name` path.

Make sure ingress is set up correctly for both jupyterhub and binderhub, and **make sure that the `ingress.tls.secretName` values differ**. Both ingresses live in the same namespace, so giving the secrets the same name will make the setup of one of them fail.

```yaml
jupyterhub:
  ingress:
    hosts: [{{ jupyterhub_domain }}]
    tls:
      - hosts: [{{ jupyterhub_domain }}]
        secretName: https-auto-tls
  custom:
    binderhubUI:
      enabled: true

binderhub-service:
  ingress:
    enabled: true
    hosts: [{{ binderhub_domain }}]
    tls:
      - hosts: [{{ binderhub_domain }}]
        secretName: binder-https-auto-tls
```
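
The two requirements above (separate domains and distinct TLS secret names) can be checked mechanically. The following is a minimal sketch, assuming the values file has been loaded into a dictionary and the `{{ ... }}` placeholders have been filled in. `check_ingress` is a hypothetical helper, not an existing deployer command.

```python
def check_ingress(values):
    """Return problems found in the jupyterhub / binderhub ingress setup."""
    hub_ingress = values["jupyterhub"]["ingress"]
    binder_ingress = values["binderhub-service"]["ingress"]

    hub_secrets = {tls["secretName"] for tls in hub_ingress["tls"]}
    binder_secrets = {tls["secretName"] for tls in binder_ingress["tls"]}

    problems = []
    # Both ingresses live in the same namespace, so secret names must not clash
    shared_secrets = hub_secrets & binder_secrets
    if shared_secrets:
        problems.append(f"shared ingress.tls.secretName: {sorted(shared_secrets)}")
    # The hub and the binderhub UI are expected to live on separate domains
    shared_hosts = set(hub_ingress["hosts"]) & set(binder_ingress["hosts"])
    if shared_hosts:
        problems.append(f"shared ingress hosts: {sorted(shared_hosts)}")
    return problems
```
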

#### 3. Check that binderhubUI is enabled

Enable `jupyterhub.custom.binderhubUI`, which in turn enables the hub to use [BinderSpawnerMixin](https://github.com/jupyterhub/binderhub/blob/bd297b2c3f713cf46b0b22cfabc86d8140bbed41/helm-chart/binderhub/values.yaml#L115-L207), a mixin that converts JupyterHub container spawners into BinderHub spawners.

```yaml
jupyterhub:
  custom:
    binderhubUI:
      enabled: true
```

#### 4. Check that the binderhub-service chart and network policy are enabled

We use the [binderhub-service](https://github.com/2i2c-org/binderhub-service/) Helm chart to run BinderHub, the Python software, as a standalone service next to JupyterHub that builds and pushes images with [repo2docker](https://github.com/jupyterhub/repo2docker), so we need to enable it.

```yaml
binderhub-service:
  enabled: true
  networkPolicy:
    enabled: true
```

#### 5. Check that BinderHub is configured correctly

We need to configure BinderHub so that:

- it is not running in API-only mode
- it knows where the hub is running

```yaml
binderhub-service:
  config:
    BinderHub:
      base_url: /
      hub_url: https://<jupyterhub-public-url>.2i2c.cloud
      badge_base_url: https://<binderhub-public-url>.2i2c.cloud
      enable_api_only_mode: false
```

#### 6. Check that the builder, docker API and user pods are scheduled on the smallest available instance or on the hub dedicated instance type

In general, for GCP, they should run on `n2-highmem-4` and on AWS they should be placed on `r5.xlarge` machines. But it's best to double-check the cluster's terraform or eksctl configuration files to make sure this is the smallest instance and not another one.

If you are creating a separate nodepool for the binderhub, then you can set the instance type to be the one that is used for the hub of the desired size and type.


```yaml
binderhub-service:
  dockerApi:
    nodeSelector:
      # Schedule dockerApi pods to run on the smallest user nodes only
      # https://github.com/2i2c-org/infrastructure/issues/4241
      node.kubernetes.io/instance-type: n2-highmem-4
  config:
    KubernetesBuildExecutor:
      node_selector:
        # Schedule builder pods to run on the smallest user nodes only
        # https://github.com/2i2c-org/infrastructure/issues/4241
        node.kubernetes.io/instance-type: n2-highmem-4
jupyterhub:
  singleuser:
    nodeSelector:
      # Schedule users on the smallest instance
      # https://github.com/2i2c-org/infrastructure/issues/4241
      node.kubernetes.io/instance-type: n2-highmem-4
```
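
The instance type appears in three separate places in the configuration above, so it is easy to update one and forget the others. A minimal sketch of a consistency check follows, assuming the values are loaded into a dictionary. The helper names are hypothetical.

```python
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"


def instance_types(values):
    """Collect the instance type each kind of pod is scheduled on."""
    binder = values["binderhub-service"]
    build_selector = binder["config"]["KubernetesBuildExecutor"]["node_selector"]
    return {
        "dockerApi pods": binder["dockerApi"]["nodeSelector"][INSTANCE_TYPE_LABEL],
        "builder pods": build_selector[INSTANCE_TYPE_LABEL],
        "user pods": values["jupyterhub"]["singleuser"]["nodeSelector"][INSTANCE_TYPE_LABEL],
    }


def all_on_same_instance_type(values):
    """True when all three node selectors name the same instance type."""
    return len(set(instance_types(values).values())) == 1
```

This only verifies that the three selectors agree. Whether the chosen type really is the smallest one still has to be confirmed against the cluster's terraform or eksctl configuration.
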

#### 7. Check the binderhub extra env variables

These are needed by the JupyterHub software components that the BinderHub software uses.

```yaml
binderhub-service:
  extraEnv:
    - name: JUPYTERHUB_API_TOKEN
      valueFrom:
        # Any JupyterHub Services api_tokens are exposed in this k8s Secret
        secretKeyRef:
          name: hub
          key: hub.services.binder.apiToken
    - name: JUPYTERHUB_CLIENT_ID
      value: "service-binder"
    - name: JUPYTERHUB_API_URL
      value: "https://<hub-public-url>.2i2c.cloud/hub/api"
    # Without this, the redirect URL to /hub/api/... gets
    # appended to binderhub's URL instead of the hub's
    - name: JUPYTERHUB_BASE_URL
      value: "https://<hub-public-url>.2i2c.cloud/"
```
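
Both URL values above are derived from the hub's public domain. The sketch below shows that relationship, assuming the hub is served at the root of its domain as in the example. `hub_env_urls` is a hypothetical helper and `hub.example.org` is a placeholder domain.

```python
def hub_env_urls(hub_domain):
    """Build the hub-related URL env values from the hub's public domain."""
    base_url = f"https://{hub_domain}/"
    return {
        # The base URL ends in a slash, the API URL is the base plus hub/api
        "JUPYTERHUB_BASE_URL": base_url,
        "JUPYTERHUB_API_URL": f"{base_url}hub/api",
    }


urls = hub_env_urls("hub.example.org")
print(urls["JUPYTERHUB_API_URL"])  # https://hub.example.org/hub/api
```
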

#### 8. Set up logging of launch events to 2i2c

We are sending logs of launch events to a 2i2c managed GCP project to be able to
produce reports about usage in the future.

This requires an explicit opt-in in the deployment's chart config and credentials
for the 2i2c managed GCP project.

To opt in, configure the following:

```yaml
binderhub-service:
  custom:
    sendLogsOfLaunchEventsTo2i2c: true
```

To set up credentials, we can reuse a single GCP service account's key that is
already encrypted for other BinderHub UI enabled hubs. You can use `sops` to read
it from an existing hub, and then to write it to the new one.

```bash
# read from existing hub
sops config/clusters/2i2c/enc-binderhub-ui-demo.secret.values.yaml

# copy a section looking like this under binderhub-service
#
#   extraCredentials:
#       googleServiceAccountKey: |
#         ...
#         ...
#         ...
#

# write it to new hub by pasting it under binderhub-service
sops config/clusters/<cluster-name>/enc-<hubname>.secret.values.yaml
```

### II. Configuration specific to authenticated hubs

#### 1. Check that the simpler landing page is used

If users must log in before accessing binderhub, then the login page (the page where users land to log into the hub before they see the binderhub UI) must be updated to use a simpler version.

This is done by having the hub track the `no-homepage-subsection` branch of the default homepage repo:

```yaml
jupyterhub:
  custom:
    homepage:
      gitRepoBranch: "no-homepage-subsection"
```

#### 2. Make sure we don't redirect to singleuser server after login

After the user logs in, don't redirect them to their server, as we want them to go to the binderhub launch page to configure their image before launching it.

```yaml
jupyterhub:
  hub:
    redirectToServer: false
```

#### 3. Check the binder hub service

Set up `binder` as an externally managed jupyterhub service, making sure that redirection happens correctly after authentication with the OAuth provider and that users are not presented with extra prompts to log in.

```yaml
jupyterhub:
  hub:
    services:
      binder:
        oauth_redirect_uri: https://<binderhub-public-url>/oauth_callback
```

#### 4. Check the roles

Set up a `binder` and a `user` role, and make sure the correct permissions are assigned both to this new service and to the users, so that they can access the service.

```yaml
jupyterhub:
  hub:
    loadRoles:
      # The role binder allows service binder to start and stop servers
      # and read (but not modify) any user’s model
      binder:
        services:
          - binder
        scopes:
          - servers
          - read:users
      # The role user allows access to the user’s own resources and to access
      # only the binder service
      user:
        scopes:
          - self
          # Admin users will by default have access:services, so this is only
          # observed to be required for non-admin users.
          - access:services!service=binder
```

#### 5. Make sure servers are spawned only for authenticated hub users

Enable the authenticated binderhub spawner setup via `hub.config.BinderSpawnerMixin.auth_enabled`:

```yaml
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        auth_enabled: true
```

#### 6. Check the binderhub extra env variables

There is one extra env var that needs to be set if the hub is authenticated:

```yaml
binderhub-service:
  extraEnv:
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: "https://{{ binderhub_domain }}/oauth_callback"
```
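
In this guide, the callback URL set here and the `oauth_redirect_uri` configured for the `binder` service both point at `/oauth_callback` on the BinderHub domain. The following minimal sketch checks that the two stay in sync, assuming the values are loaded into a dictionary. `oauth_callback_is_consistent` is a hypothetical helper.

```python
def oauth_callback_is_consistent(values):
    """True when the hub service and binderhub agree on the OAuth callback URL."""
    services = values["jupyterhub"]["hub"]["services"]
    redirect_uri = services["binder"]["oauth_redirect_uri"]

    # extraEnv is a list of {name, value} or {name, valueFrom} entries;
    # entries that use valueFrom have no plain value and map to None here
    env = {
        entry["name"]: entry.get("value")
        for entry in values["binderhub-service"]["extraEnv"]
    }
    return env.get("JUPYTERHUB_OAUTH_CALLBACK_URL") == redirect_uri
```
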

### III. Configuration specific to non-authenticated hubs

#### 1. Check that the NullAuthenticator is used

This will disable the hub login page and allow binderhub to generate random usernames for user servers.

This also means that any configuration of the homepage (`gitRepoBranch` or `templateVars`) will just be ignored.
However, you need to disable `templateVars` configuration in order to pass the validation step.

```yaml
jupyterhub:
  custom:
    homepage:
      templateVars:
        enabled: false
  hub:
    config:
      JupyterHub:
        authenticator_class: "null"
```

#### 2. Check admins are disabled

No authentication, so no admins:

```yaml
jupyterhub:
  custom:
    2i2c:
      add_staff_user_ids_to_admin_users: false
```

#### 3. Check the roles

Set up a `binder` and a `user` role, and make sure the correct permissions are assigned both to this new service and to the users, so that they can access the service.

```yaml
jupyterhub:
  hub:
    loadRoles:
      # The role binder allows service binder to start and stop servers
      # and read (but not modify) any user’s model
      binder:
        services:
          - binder
        scopes:
          - servers
          - admin:users
      # The role user allows access to the user’s own resources and to access
      # only the binder service
      user:
        scopes:
          - self
          # Admin users will by default have access:services, so this is only
          # observed to be required for non-admin users.
          - access:services!service=binder
```

#### 5. Make sure servers aren't spawned just for authenticated hub users

Disable the authenticated binderhub spawner setup via `hub.config.BinderSpawnerMixin.auth_enabled`:

```yaml
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        auth_enabled: false
```

#### 6. Check the singleuser cmd that is used

If the binderhub will be unauthenticated, then the singleuser server must not be started with the `jupyterhub-singleuser` command. We replace it with `jupyter-lab` if available, or with `jupyter-notebook` otherwise.

Otherwise, the requests to authorize the user server get redirected to `/hub/login`, which always returns a `403` HTTP response code when using the null authenticator.

```yaml
jupyterhub:
  singleuser:
    cmd:
      - python3
      - "-c"
      - |
        import os
        import sys

        try:
            import jupyterlab
            import jupyterlab.labapp
            major = int(jupyterlab.__version__.split(".", 1)[0])
        except Exception as e:
            print(f"Failed to import jupyterlab: {e}", file=sys.stderr)
            have_lab = False
        else:
            have_lab = major >= 3

        if have_lab:
            # technically, we could accept another jupyter-server-based frontend
            print("Launching jupyter-lab", file=sys.stderr)
            exe = "jupyter-lab"
        else:
            print("jupyter-lab not found, launching jupyter-notebook", file=sys.stderr)
            exe = "jupyter-notebook"

        # launch the notebook server
        os.execvp(exe, sys.argv)
```

#### 7. Restrict the repositories that can be built

When deploying an unauthenticated binderhub, it's useful to restrict which repositories can be built, to avoid abuse. This can be achieved by setting `GitHubRepoProvider.allowed_specs` to a list of regular expressions:

```yaml
binderhub-service:
  config:
    GitHubRepoProvider:
      allowed_specs:
        - <some regex>
        - <another regex>
```
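
To get a feel for how such patterns behave before deploying them, they can be tried out locally. The sketch below rests on two assumptions that should be verified against the BinderHub version in use: that specs have the form `org/repo/ref`, and that each pattern is applied with `re.match` (anchored at the start of the spec) ignoring case. `spec_is_allowed` is a hypothetical helper and the pattern is only an illustration.

```python
import re


def spec_is_allowed(spec, allowed_specs):
    """True when at least one pattern matches the start of the spec."""
    return any(
        re.match(pattern, spec, re.IGNORECASE) is not None
        for pattern in allowed_specs
    )


# Hypothetical pattern: allow every repository in a single GitHub organisation
allowed = [r"^2i2c-org/.*"]

print(spec_is_allowed("2i2c-org/infrastructure/HEAD", allowed))  # True
print(spec_is_allowed("someone-else/repo/HEAD", allowed))  # False
```

Including the trailing `/` after the organisation name keeps the pattern from also matching organisations whose names merely start with the same prefix.
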

#### 8. Enabling CORS (optional)

Give access to the binder service from another resource, such as [live computation with a MyST website](https://mystmd.org/guide/integrating-jupyter#connecting-to-a-binder).

This requires enabling [Cross-Origin Resource Sharing (CORS)](https://binderhub.readthedocs.io/en/latest/cors.html) on both JupyterHub and BinderHub. You can restrict access to certain domains or allow access from any domain with the `'*'` wildcard.

```yaml
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        cors_allow_origin: '*'

binderhub-service:
  config:
    BinderHub:
      cors_allow_origin: '*'
```

## Manually handle registry creation and login

The image registry configuration is not *yet* generated by the deployer, so follow the steps below to set it up.

These steps require adding additional configuration to the sample file generated by the deployer.

```{important}
For clusters running on AWS, you can use the encrypted file located at `config/clusters/template/aws/enc-binder.secret.values.yaml` and follow the manual steps below to double-check that everything is there and correctly set up for your use case (it should be).
```

### 1. Set up the image registry

Follow the guide at [](howto:features:imagebuilding-hub:image-registry) of the imagebuilding hub.

### 2. Further configure the `binderhub-service` chart

The `binderhub-service` chart needs some more configuration; follow the guide at [](howto:features:imagebuilding-hub:configure-binderhub-service-chart).
Specifically, we need to:

- **Configure `binderhub-service.config.BinderHub.image_prefix`** so that BinderHub knows under which prefix to push the images to the registry
- **Set up the credentials used by the build pods to push the image to the registry** under `binderhub-service.buildPodsRegistryCredentials`
- **Set up `imagePullSecret` for pulling images from the registry** if using quay.io

### 3. Set up the credentials needed by the BinderHub software to check for and pull existing images in the registry

Without these credentials, images will be rebuilt unnecessarily since the BinderHub software does not have the appropriate permissions to check if an image exists in the registry.

We must pass information and credentials through `DockerRegistry` so that the BinderHub software can read from the registry.
You should have the username and password for the registry from a previous step, and `password` should be stored in the `enc-<hub>.secret.values.yaml` file.

```yaml
binderhub-service:
  config:
    DockerRegistry:
      # registry url address like https://quay.io or https://us-central1-docker.pkg.dev
      url: <url-address>
      username: <username>
      password: <password>
```

### 4. Sops-encrypt any credentials added to the `enc-<hub>.secret.values.yaml` file

This ensures they are not leaked.
See our [`sops` documentation](https://compass.2i2c.org/engineering/secrets/#sops-overview) for more info.

````{tip}
In setting up this config, we have repeated the username and password for the registry in a few places.
You can use [YAML anchors](https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/) to avoid this repetition like the example config below.
Anchors work for individual values as well as maps, and are preserved when sops-encrypted too!

```yaml
jupyterhub:
  imagePullSecret:
    password: &password <password>
binderhub-service:
  buildPodsRegistryCredentials:
    password: *password
  config:
    DockerRegistry:
      password: *password
```
````