Setup a database server per user#
We can support providing a single, isolated database server per user. This is extremely helpful for teaching cases, as users can do whatever they want with their server without affecting other users.
Setting up the backing disk for the database#
Each database server needs a backing disk to store its data. We have custom code in
our basehub/values.yaml
that creates arbitrary extra disks per user that can be used.
jupyterhub:
custom:
singleuser:
extraPVCs:
- name: postgres-{username}
class: standard
capacity: 1Gi
This will create an additional Kubernetes Persistent Volume Claim
per user, named postgres-<name-of-user>
, created with Kubernetes Storage
Class, and capacity of
1Gi
(remember it is 1Gi
, not 1G
!).
You can get the list of storageclasses available in the target cluster with a
kubectl get storageclass
. On AWS, the two options generally available are
gp2
(standard spinning disk) and ssd
(faster, more expensive SSD). On GKE, the two options
are standard
(spinning disk), standard-rwo
(“balanced” disk) and premium-rwo
(SSD disk).
capacity
can be increased later on if needed and kubernetes will
automatically resize the disk
when users start / stop the server. However, the storage class can not be changed
afterwards - so users will have to decide if they want spinning disks (slower, chepeaper)
or ssd disks (faster, much more expensive) at the start.
Setup the database server#
We can then run the database server (such as postgres) as a sidecar container. This will run one server per user with a config we provide, and use the backing disk we have created for the user in the previous section. The configuration for server needs to be specific to the kind of database we are running.
Postgres#
The following config can be used to setup a postgres db container as a sidecar per-user. Tweak the config as necessary for the specific hub’s needs.
jupyterhub:
singleuser:
initContainers:
# /var/lib/postgresql should be writeable by uid 1000, so students
# can blow out their db directories if need to.
# We have to chown /home/jovyan and /home/jovyan/shared-readwrite as well -
# since initContainers is a list, setting this here overwrites the chowning
# initContainer we have set in basehub/values.yaml
- name: volume-mount-ownership-fix
image: busybox:1.36.1
command:
- sh
- -c
- id && chown 1000:1000 /home/jovyan /home/jovyan/shared /var/lib/postgresql/data && ls -lhd /home/jovyan
securityContext:
runAsUser: 0
volumeMounts:
- name: home
mountPath: /home/jovyan
subPath: "{username}"
# Mounted without readonly attribute here,
# so we can chown it appropriately
- name: home
mountPath: /home/jovyan/shared
subPath: _shared
- name: postgres-db
mountPath: /var/lib/postgresql/data
# postgres recommends against mounting a volume directly here
# So we put data in a subpath
subPath: data
storage:
extraVolumes:
# We include the dev-shm extraVolume as the list of extraVolumes from base hub will be overwritten
- name: dev-shm
emptyDir:
medium: Memory
- name: postgres-db
persistentVolumeClaim:
# This should match what is set as `name` in the earlier step under `custom.singleuser.extraPVCs`
claimName: 'postgres-{username}'
extraVolumeMounts:
# We include the dev-shm extraVolumeMount as the list of extraVolumeMounts from base hub will be overwritten
- name: dev-shm
mountPath: /dev/shm
- name: postgres-db
mountPath: /var/lib/postgresql/data
# postgres recommends against mounting a volume directly here
# So we put data in a subpath
subPath: data
extraContainers:
- name: postgres
image: postgres:14.5 # use the latest version available at https://hub.docker.com/_/postgres/tags
args:
# Listen only on localhost, rather than on all interfaces
# This allows us to use passwordless login, as only the user notebook container can access this
- -c
- listen_addresses=127.0.0.1
resources:
limits:
# Best effort only. No more than 1 CPU, and if postgres uses more than 512M, restart it
memory: 512Mi
cpu: 1.0
requests:
# If we don't set requests, k8s sets requests == limits!
# So we set something tiny
memory: 64Mi
cpu: 0.01
env:
# Configured using the env vars documented in https://hub.docker.com/_/postgres/
# Postgres is only listening on localhost, so we can trust all connections that come to it
- name: POSTGRES_HOST_AUTH_METHOD
value: "trust"
# The name of the default user in our user image is jovyan, so the postgresql superuser
# should also be called that
- name: POSTGRES_USER
value: "jovyan"
securityContext:
runAsUser: 1000
volumeMounts:
# Mount the user homedirectory in the postgres db container as well, so postgres commands
# that load data into the db from disk work
- name: home
mountPath: /home/jovyan
subPath: "{username}"
- name: postgres-db
mountPath: /var/lib/postgresql/data
# postgres recommends against mounting a volume directly here
# So we put data in a subpath
subPath: data
Cleanup created disks after use#
If the per-user db is no longer needed, we should cleanup the created disks so we are no longer charged for them. Just deleting the Kubernetes PVCs created here should delete the underlying disks as well.
Warning
This is irreversible! Make sure the community is fully aware before doing this
Get the list of PVCs created with this feature:
kubectl -n <namespace> get pvc -l component=extra-pvc
Inspect it to make sure you are deleting the PVCs from the right hub from the right cluster.
Delete the PVCs:
kubectl -n <namespace> delete pvc -l component=extra-pvc
This might take a while. The backing disks will be deleted by Kubernetes as well, once any pods attached to them go down. This means it is safe to run this command while users are currently running - there will be no effects on any running user pods!