Build a Docker Image remotely

Docker images with data science packages can be huge and very difficult to build locally! We run a remote docker-in-docker hack on our 2i2c cluster to make this a lot less painful. This document describes how you can use it to build docker images from your laptop much faster. This frees up your laptop's resources and gives you datacenter-scale upload / download speeds.

Building images remotely

  1. From a clone of the infrastructure repository, use the debug start-docker-proxy command.

    deployer debug start-docker-proxy

    This will forward your local computer's port 23760 to port 2376 inside the dind deployment in the default namespace. A docker daemon is listening on that port, so it essentially looks like you have a docker daemon running locally on your system at port 23760, but it is actually running in our kubernetes cluster!

    This command will block, and you will have to start it again if you have a network interruption.

  2. In another terminal, you need to tell tools to use this new port-forwarded port as the docker daemon. You can do this by setting the DOCKER_HOST environment variable.

    export DOCKER_HOST=tcp://localhost:23760
  3. Verify that the remote docker daemon is the one being used by running docker info. You should see something like Name: dind-<random-chars>, which verifies it's running on the remote cluster!

  4. Now you can run any tool (like repo2docker, chartpress or just docker build) as you wish, and they will all automatically talk to this remote docker instance! Hurrah!

  5. If using repo2docker, the following command approximates most of the settings we use with [repo2docker-action]:

    repo2docker --image-name='<some-name>' --user-id=1000 --user-name=jovyan --target-repo-dir /srv/repo .

    This should be run from within the directory you are trying to build into an image.
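Put together, a typical remote build session looks roughly like this (a sketch; my-test-image is a placeholder image name):

```shell
# Terminal 1: start the proxy. This blocks, and must be restarted
# after a network interruption.
deployer debug start-docker-proxy

# Terminal 2: point docker tooling at the forwarded port.
export DOCKER_HOST=tcp://localhost:23760

# Sanity check: the Name field should look like dind-<random-chars>.
docker info | grep Name

# Build with repo2docker from inside the repository to be imaged.
repo2docker --image-name='my-test-image' --user-id=1000 \
    --user-name=jovyan --target-repo-dir /srv/repo .
```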

Testing images remotely

Now that the image has been built remotely, how do you test it? Ideally, we would run a container with the built image, start jupyterlab inside it, and try things out. Can we still do that with the docker daemon running remotely?

Yes, we can!

  1. Open a new terminal, and make sure you are authenticated to the 2i2c cluster.

    deployer use-cluster-credentials 2i2c
  2. Now, let's assume the image you built and want to test is called test-image:v1. This can be any image name, including one pulled from a remote repository. Execute the following in your terminal:

    # The name of the image we want to test
    IMAGE_NAME=test-image:v1
    # The port jupyter lab will listen on inside the container;
    # if you get a port conflict, try a different port!
    PORT=9999
    # The DIND pod running in the default namespace
    DIND_POD_NAME=$(kubectl get pod -l app=dind -o name)
    # Now, we execute a docker commandline command from inside this dind pod!
    # We start jupyter lab, and forward the port from inside that container
    # (inside the docker daemon running inside the kubernetes pod) to just
    # inside the kubernetes pod. We've essentially peeled back one layer of
    # our container onion.
    kubectl exec -it \
         ${DIND_POD_NAME} \
         -- \
         /bin/sh -c \
         "DOCKER_HOST=tcp://localhost:2376 docker run -it -p ${PORT}:${PORT} ${IMAGE_NAME} jupyter lab --ip=0.0.0.0 --port=${PORT}"

    This should produce output that looks like:

        To access the notebook, open this file in a browser:
            ...
        Or copy and paste one of these URLs:
            ...
            ...

    The first line won't be useful, but the two URLs (which should be the same!) are what we need. However, one more step is needed before we can access them!

  3. Open yet another terminal, and make sure you are authenticated to the 2i2c cluster (follow step 1).

  4. Now in this terminal, run:

    # Should match the PORT set in step 2
    PORT=9999
    kubectl port-forward deployment/dind ${PORT}:${PORT}

    This establishes a port-forward from your local machine at port 9999 (or whatever PORT is) to the jupyter lab running inside the container in the kubernetes pod on the remote cluster!

  5. Now go to the URL you found in step (2), and it should give you a jupyter lab instance in the browser! RStudio, Linux Desktop, etc. should also work if they are present in the image.
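Condensed, the test loop above looks roughly like this (a sketch using the example values from the steps: image test-image:v1, port 9999, and the dind daemon on port 2376 inside the pod):

```shell
# Terminal A: authenticate, then start jupyter lab inside the remote container.
deployer use-cluster-credentials 2i2c
IMAGE_NAME=test-image:v1
PORT=9999   # if you get a port conflict, try a different port
DIND_POD_NAME=$(kubectl get pod -l app=dind -o name)
kubectl exec -it "${DIND_POD_NAME}" -- /bin/sh -c \
    "DOCKER_HOST=tcp://localhost:2376 docker run -it -p ${PORT}:${PORT} \
     ${IMAGE_NAME} jupyter lab --ip=0.0.0.0 --port=${PORT}"

# Terminal B: authenticate again, forward the pod's port to your laptop,
# then open the URL that jupyter lab printed in terminal A.
deployer use-cluster-credentials 2i2c
kubectl port-forward deployment/dind ${PORT}:${PORT}
```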


This does come with some limitations.

  1. When a server is started on JupyterHub, $HOME inside the container is overwritten by the user's persistent home directory. When testing with the method described here, the $HOME from the container image is used as-is. In some limited circumstances, this can make things work during testing that then fail on the hub. If $HOME is not empty when you inspect it this way, you should find out why and fix it so that it is!
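One quick way to check that $HOME in the image is empty is to list it through the remote daemon (a sketch; it assumes DOCKER_HOST is still pointing at the forwarded port from the build steps and that the image is the test-image:v1 example from above):

```shell
export DOCKER_HOST=tcp://localhost:23760
# An empty listing means $HOME in the image is clean;
# any output here deserves investigation.
docker run --rm test-image:v1 /bin/sh -c 'ls -A "$HOME"'
```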