Transfer data between NFS servers on separate clusters

This guide covers how to transfer data between NFS servers running on different clusters in a cloud-agnostic way. For simplicity and reliability, it uses rsync over SSH to copy a filesystem securely between the two NFS servers.

Note

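rsync can be run in either a forward or a reverse direction. In the forward direction, rsync runs on the source and pushes data to the destination. In the reverse direction, it runs on the destination and pulls data from the source:

```
# Forward: run on src
rsync /foo user@dst:/bar/

# Reverse: run on dst
rsync user@src:/foo /bar/
```

For simplicity, we’ll rsync in the forward direction, running rsync from the source container.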

Setting up the destination server

Create a public-private key pair
To communicate securely between the two file servers, we must create a key pair:
```
ssh-keygen -N "" -t ed25519 -f key
```
This will create two files in the working directory: key (the private key) and key.pub (the public key). From here on, we’ll refer to the contents of key.pub as <PUBLIC-KEY-CONTENTS>.
Deploy an OpenSSH server container
The linuxserver/openssh-server Docker image is a well-tested image that ships with an OpenSSH server. For simplicity, we will deploy this image as a container on both the source and destination jupyterhub-home-nfs deployments.[1]
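We can start by adding this image as an entry in jupyterhub-home-nfs.extraContainers. The configuration for the destination deployment is shown in the following code block. Compared with the source deployment, it additionally declares the SSH port (2222), mounts the home directories read-write, and passes in the public key through the PUBLIC_KEY variable: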
```
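jupyterhub-home-nfs:
  extraContainers:
    - name: openssh-server
      image: linuxserver/openssh-server:latest
      # The image's SSH server listens on port 2222
      ports:
        - containerPort: 2222
      # Mount the NFS export read-write so that rsync can write into it
      volumeMounts:
        - mountPath: /export
          name: home-directories
      env:
        # UID and GID of the image's default user, linuxserver.io
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        # Authorise the key pair created above to log in over SSH
        - name: PUBLIC_KEY
          value: "<PUBLIC-KEY-CONTENTS>"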
```
This configuration can then be deployed by running deployer deploy <DEST-CLUSTER> <DEST-HUB>.
Install rsync
The container image described in the extraContainers configuration does not include the rsync utility. We can install it from a shell inside the container, opened with the following command:
```
kubectl -n <DEST-HUB> exec -it deploy/storage-quota-home-nfs -c openssh-server -- /bin/sh
```
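As this image is based on Alpine Linux, we can install rsync with apk. Packages installed this way live only in the running container, so this step must be repeated if the pod restarts: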
```
apk add rsync
```
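Because both the destination and the source containers need rsync, the two steps above can also be combined into a single non-interactive command. The helper below is a minimal sketch, assuming the deployment is named storage-quota-home-nfs as elsewhere in this guide; install_rsync is a hypothetical name, and --no-cache only stops apk from keeping its package index in the container.

```shell
# Hypothetical helper: install rsync in the openssh-server sidecar of a hub
# without opening an interactive shell. Assumes the deployment name used in
# this guide (storage-quota-home-nfs).
install_rsync() {
  ns="$1"  # hub namespace, e.g. the value of <DEST-HUB> or <SRC-HUB>
  kubectl -n "$ns" exec deploy/storage-quota-home-nfs -c openssh-server -- \
    apk add --no-cache rsync
}

# Usage: install_rsync <DEST-HUB>
```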
Expose the SSH server
To make the SSH server visible outside the cluster, we’ll need to expose the container via a service:
```
# Assume deployment is called storage-quota-home-nfs
kubectl -n <DEST-HUB> expose --type LoadBalancer deploy storage-quota-home-nfs --port=2222 --name openssh-service
```
We can now look up the external IP address, <SERVICE-IP>, that the cloud provider assigns to this service, and record it for later. The address may take a short while to appear:
```
kubectl -n <DEST-HUB> get service/openssh-service
```
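Rather than re-running the command above until an address appears, the IP can be extracted directly with a JSONPath query. The following is a minimal sketch (wait_for_service_ip is a hypothetical helper) that polls every 5 seconds, up to an assumed limit of 60 attempts. Some cloud providers publish a hostname rather than an IP address for load balancers; in that case the query would need to read .hostname instead of .ip.

```shell
# Hypothetical helper: poll until the LoadBalancer behind openssh-service
# reports an external IP, then print it. Only the .ip field is checked.
wait_for_service_ip() {
  ns="$1"           # hub namespace, e.g. the value of <DEST-HUB>
  tries="${2:-60}"  # assumed default: 60 attempts, 5 seconds apart
  while [ "$tries" -gt 0 ]; do
    ip="$(kubectl -n "$ns" get service/openssh-service \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    tries=$((tries - 1))
    sleep 5
  done
  echo "no external IP assigned yet" >&2
  return 1
}

# Usage: wait_for_service_ip <DEST-HUB>
```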

Setting up the source server

Deploy a file-transfer container
We can add the same image used in Setting up the destination server as an entry in jupyterhub-home-nfs.extraContainers. The configuration for the source deployment is shown in the following code block. Unlike the destination, it exposes no ports, needs no public key, and mounts the home directories read-only:
```
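jupyterhub-home-nfs:
  extraContainers:
    - name: openssh-server
      image: linuxserver/openssh-server:latest
      # No ports are declared: this container only makes outgoing SSH connections
      volumeMounts:
        - mountPath: /export
          name: home-directories
          # The source data is only read, so mount it read-only
          readOnly: true
      env:
        # UID and GID of the image's default user, linuxserver.io
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"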
```
This configuration can then be deployed by running deployer deploy <SRC-CLUSTER> <SRC-HUB>.
Install rsync
As we saw earlier, the container image described in the extraContainers configuration does not include the rsync utility. We can install it from a shell inside the source container, opened with the following command:
```
kubectl -n <SRC-HUB> exec -it deploy/storage-quota-home-nfs -c openssh-server -- /bin/sh
```
As before, we can install rsync with apk:
```
apk add rsync
```
Provision the SSH private key
For the sender to authenticate with the receiver, we’ll need to provide it with the private counterpart of the public key that we created earlier. We can do this by pasting the contents of the private key file (key) into a temporary file. In the existing shell in the source container, run the following command, paste the key with Ctrl+V (or your terminal’s paste shortcut), and then enter an EOF with Ctrl+D:
```
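cat > /tmp/key
# After Ctrl+D, restrict the key to the image's user (UID/GID 1000, i.e. linuxserver.io)
chown 1000:1000 /tmp/key
chmod 600 /tmp/key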
```
We must now define an SSH configuration entry with the appropriate IP address, port, and username. If you’re using the image configured in this guide, only the HostName needs changing: substitute the <SERVICE-IP> recorded earlier. First, switch to linuxserver.io, the image’s default user:
```
su linuxserver.io
```
Then we can set the SSH config:
```
cat > ~/.ssh/config <<'EOF'
Host receiver
    HostName <SERVICE-IP>
    Port 2222
    IdentityFile /tmp/key
    User linuxserver.io
    IdentitiesOnly yes
EOF
```
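Before starting the transfer, it can be useful to confirm that the receiver alias works, that the key is accepted, and that rsync is installed on the destination. The following is a minimal sketch, run as linuxserver.io in the source container; check_receiver is a hypothetical helper name. On the first connection, ssh will ask you to confirm the destination’s host key.

```shell
# Hypothetical helper: verify SSH access to the "receiver" alias defined in
# ~/.ssh/config and check that rsync is available on the destination.
check_receiver() {
  # "command -v rsync" succeeds on the remote side only if rsync is on its PATH
  if ssh receiver 'command -v rsync' >/dev/null; then
    echo "receiver reachable, rsync installed"
  else
    echo "cannot reach receiver, or rsync is missing there" >&2
    return 1
  fi
}

# Usage: check_receiver
```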

Performing the initial sync

Now we can run rsync in archive mode (-a), which recurses into directories and preserves symbolic links, permissions, timestamps and, where permitted, ownership, to sync with the remote. Substitute <SRC-HUB> and <DST-HUB> with the names of the source and destination hubs; these names determine the name of each hub’s home directory within the storage volume.

```
rsync -avh /export/<SRC-HUB>/ receiver:/export/<DST-HUB>/
```

Performing the final sync

Once the initial sync has been performed, we can make sure that we’ve captured the true state of the disk by performing a final reconciliation sync. Only do this once you are confident that there are no active sessions modifying the home storage, and that new sessions cannot be started. This can be ensured by:

- Disabling the source ingress with kubectl -n <SRC-HUB> delete ingress jupyterhub.
- Stopping existing user pods.
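One way to stop existing user pods from the command line is to delete them by label. This sketch assumes the hub is deployed with Zero to JupyterHub, which labels user server pods with component=singleuser-server; stop_user_pods is a hypothetical helper name.

```shell
# Hypothetical helper: delete all user server pods in the source hub's
# namespace, then list any that remain. Assumes the Zero to JupyterHub label
# component=singleuser-server on user pods.
stop_user_pods() {
  ns="$1"  # the source hub's namespace, e.g. the value of <SRC-HUB>
  kubectl -n "$ns" delete pods -l component=singleuser-server
  # Should list no pods once everything has stopped
  kubectl -n "$ns" get pods -l component=singleuser-server
}

# Usage: stop_user_pods <SRC-HUB>
```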

Now that we’ve cordoned off the storage, we can repeat the command from Performing the initial sync. rsync will only transfer files that have changed since the first pass.
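Repeating the command as-is will not remove files that users deleted on the source after the initial sync. If the destination should mirror the source exactly, rsync’s --delete flag removes such files from the destination; because it is destructive, previewing the changes with --dry-run first is prudent. The following is a minimal sketch, assuming the receiver alias from the SSH configuration above; final_sync is a hypothetical helper name.

```shell
# Hypothetical helper: final reconciliation pass that also deletes files on
# the destination that no longer exist on the source.
final_sync() {
  src_hub="$1"  # source hub name, as used for its directory under /export
  dst_hub="$2"  # destination hub name, as used for its directory under /export
  shift 2
  # Any extra arguments (e.g. --dry-run) are passed straight to rsync
  rsync -avh --delete "$@" "/export/${src_hub}/" "receiver:/export/${dst_hub}/"
}

# Preview first, then run for real:
#   final_sync <SRC-HUB> <DST-HUB> --dry-run
#   final_sync <SRC-HUB> <DST-HUB>
```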

Tearing down the transfer deployments

After copying the files between disks, we can now tear down the migration deployments.

First, delete the service created earlier by running the following against the destination hub’s namespace:
```
kubectl -n <DEST-HUB> delete service/openssh-service
```
Then, revert the changes made to the JupyterHub values.yaml in Setting up the destination server and Setting up the source server.
Finally, re-deploy both hubs.

Footnotes

[1] Although in forward mode we only need an SSH server on the destination side, this container also provides a generic shell environment that is nearly sufficient for performing an rsync: only rsync itself has to be installed.