Move a Hub across clusters¶
Moving hubs between clusters is possible, but requires manual steps to ensure data is preserved.
Set up a new hub¶
Set up a new hub in the target cluster, mimicking the config of the old hub as much as possible.
Copy home directories¶
Next, copy home directory contents from the old cluster to the new cluster.
This might not entirely be necessary - if the source and target cluster are in the same GCP Project / AWS Account, we can just re-use the same home directory storage!
This process is primarily used with GKE right now.
SSH into both the target server and the source server, then switch to the `ubuntu` user on both with `sudo su ubuntu`. This ensures that the keys we are about to create are assigned to the correct user.
On the target NFS server, create a new SSH key pair with:

```bash
ssh-keygen -f nfs-transfer-key
```

Append the public key `nfs-transfer-key.pub` to the source NFS server's `/home/ubuntu/.ssh/authorized_keys` file. This way, the target NFS server will be able to open SSH connections to the source NFS server.
Copy the NFS home directories from the source NFS server to the target NFS server, making sure that the NFS export locations match up appropriately. For example, if the source NFS server stores the home directories for each hub in `/export/home-01/homes`, and the target NFS server also stores hub home directories under `/export/home-01/homes`, you can `scp` the contents across with:

```bash
scp -p -r -i nfs-transfer-key ubuntu@nfs-source-server-public-IP:/export/home-01/homes/<hub-name> /export/home-01/homes/<hub-name>
```
This makes sure the target is owned by the `ubuntu` user, which has uid `1000`. Our user pods run as uid `1000`, so this ensures they can mount their home directories.
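After the copy, it can be worth confirming that nothing under the hub's home directory ended up with the wrong owner. A small helper sketch (the path below is the placeholder from the example above):

```bash
# list_wrong_owner DIR UID - print every path under DIR not owned by UID.
# User pods run as uid 1000, so after the copy this should print nothing
# for the hub's home directory.
list_wrong_owner() {
    find "$1" ! -uid "$2"
}

# Hypothetical usage; any paths printed can be fixed with
# `sudo chown -R 1000:1000 <path>`:
#   list_wrong_owner /export/home-01/homes/<hub-name> 1000
```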
As an alternative to `scp`, you can use `rsync` as follows:

```bash
rsync -e 'ssh -i nfs-transfer-key' -rouglvhP ubuntu@nfs-source-server-public-IP:/export/home-01/homes/<hub-name>/ /export/home-01/homes/<hub-name>/
```
The trailing slashes are important: they copy the contents of the directory without copying the directory itself. See the `rsync` man page to understand these options.
We also use GCP Filestores as in-cluster NFS storage and can transfer the home directories between them in a similar fashion to the NFS servers described above.
The filestores must be mounted in order to be accessed.
Create VMs in the projects of the source and target filestores.
For both filestores, get the server address from the GCP console.
On each VM for the source and target filestores:
Install the NFS client:

```bash
sudo apt-get -y update && sudo apt-get -y install nfs-common
```

Create a mount point:

```bash
sudo mkdir -p /mnt/filestore
```

Mount the Filestore, including the name of its file share:

```bash
sudo mount SERVER_ADDRESS:/SHARE_NAME /mnt/filestore
```
The user directories can then be transferred in the same manner as for the NFS servers, with the locations updated to the following:

```bash
<your_scp_or_rsync_command> ubuntu@nfs-source-server-public-IP:/mnt/filestore/<hub-name> /mnt/filestore/<hub-name>
```
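Putting the Filestore steps together, here is a sketch that assembles the per-VM commands from placeholders and prints them for review rather than executing them. The server address, source VM address, and hub name below are all hypothetical:

```bash
# Assemble the Filestore mount and transfer commands from placeholder
# variables so they can be reviewed before running by hand on each VM.
server_address="10.0.0.2:/share"          # hypothetical Filestore IP and share
source_vm="nfs-source-server-public-IP"   # placeholder from the examples above
hub="example-hub"                         # hypothetical hub name

cat <<EOF
sudo mount ${server_address} /mnt/filestore
rsync -e 'ssh -i nfs-transfer-key' -rouglvhP ubuntu@${source_vm}:/mnt/filestore/${hub}/ /mnt/filestore/${hub}/
EOF
```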
AWS DataSync (docs) can copy files between EFS volumes in an AWS account. Once the source & destination EFS instances are created, create a DataSync instance in the VPC, Subnet and Security Group that have access to the EFS instances (you can find these details in the 'Network' tab of the EFS page in the AWS Console). Set the transfer schedule to hourly, but manually start the sync task immediately. Once the data is transferred and verified, switch the EFS used in the hub config. Remember to delete the DataSync instance soon after, or it might incur extra charges!
If you need to modify the directory structure on the EFS instance, use the SSH key provided to `eksctl` during cluster creation to SSH into any worker node, mount the EFS instance manually, and make your modifications. This avoids needing to create another EC2 instance just for this.
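A sketch of what that manual mount looks like from a worker node, printed for review rather than executed. The filesystem id and region are hypothetical, and the node's security group must allow NFS (port 2049) to the EFS mount target:

```bash
# Build the manual EFS mount commands from placeholder values so they can be
# reviewed before running them with sudo on the worker node.
efs_id="fs-0123456789abcdef0"   # hypothetical EFS filesystem id
region="us-east-2"              # hypothetical AWS region

cat <<EOF
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 ${efs_id}.efs.${region}.amazonaws.com:/ /mnt/efs
EOF
```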
Transfer the JupyterHub Database¶
This step is only required if users have been added to a hub manually, using the admin panel. In cases where the auth is handled by an external service, e.g. GitHub, the hub database is flexible enough to update itself with the new information.
This step preserves user information, since users might have been added via the admin UI.
Copy the `/srv/jupyterhub/jupyterhub.sqlite` file from the old hub pod locally:

```bash
kubectl --namespace OLD_NAMESPACE cp -c hub OLD_HUB_POD_NAME:/srv/jupyterhub/jupyterhub.sqlite ./
```
Transfer the local copy of the `jupyterhub.sqlite` file to the new hub pod:

```bash
kubectl --namespace NEW_NAMESPACE cp -c hub ./jupyterhub.sqlite NEW_HUB_POD_NAME:/srv/jupyterhub/jupyterhub.sqlite
```
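Before restarting the new hub against the copied database, a quick sanity check that the transfer produced an intact SQLite file can save debugging time. A minimal sketch, relying on the fact that every SQLite 3 database file begins with the header string `SQLite format 3`:

```bash
# check_sqlite FILE - succeed if FILE starts with the SQLite 3 magic header.
check_sqlite() {
    [ "$(head -c 15 "$1")" = "SQLite format 3" ]
}

# Run against the local copy made with `kubectl cp` above.
check_sqlite jupyterhub.sqlite \
    && echo "jupyterhub.sqlite header looks valid" \
    || echo "jupyterhub.sqlite is missing or not a SQLite database" >&2
```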
Move DNS from the old cluster to the new cluster, thus completing the move.