Docker Containers

From AG Euler Wiki
Revision as of 10:10, 20 December 2023 by Joesterle (talk | contribs)

If you want to visit the old page: Docker (deprecated)

Instructions to build and run docker containers on cluster nodes

Accessing the cluster nodes

TLDR:

ssh username@172.25.250.112 -p 60222 -L 8888:172.29.0.xxx:pppp
ssh eulX

More info: To access the cluster nodes you need to establish an SSH connection. This can be done from the terminal (natively under Ubuntu, or using Cygwin under Windows). First ssh to the cluster head node (ssh 172.25.250.112 -p 60222). Authentication works with your LDAP/CIN user account and password. If your local username differs from your LDAP account name, you have to use ssh username@172.25.250.112 -p 60222. From there you can access the cluster nodes via ssh eulX (e.g. eul1).

If you want to run a docker container with a jupyter notebook server on a cluster node and access it from your browser the usual way (opening localhost:8888), you need an SSH tunnel to that cluster node. In that case, ssh to the cluster using the command ssh username@172.25.250.112 -p 60222 -L 8888:172.29.0.xx:pppp, where the second IP address is the one of the cluster node and pppp is the port number used by your docker container (see below):

Cluster node IPs:

  • eul1: 172.29.0.148
  • eul2: 172.29.0.149
  • eul3: 172.29.0.150
  • eul4: 172.29.0.184
  • eul5: 172.29.0.185
  • gpu4: 172.29.0.54 (Use only for GPU computation)
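Putting this together for eul1, a minimal sketch of the tunnel command (myuser and 8899 are placeholders for your LDAP account name and your container's jupyter port):

```shell
# Forward local port 8888 to the jupyter server of a container on eul1.
# "myuser" and 8899 are placeholders: substitute your LDAP account name
# and the --jupyterport you chose for your container.
ssh myuser@172.25.250.112 -p 60222 -L 8888:172.29.0.148:8899
```

With the tunnel open, localhost:8888 in your local browser reaches the notebook server on eul1.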


If you have trouble with the SSH key authentication, run

 ssh-keygen -R "hostname"

(with hostname = eulX) and accept the new key when you ssh to the respective node.

Tip: You can also find the node IP by calling ip a | grep 172.29 when you are on that node.

Building and running docker containers

Disclaimer: Our docker containers contain special functionality to integrate LDAP user accounts and our home directories. The following instructions therefore don't work out of the box with arbitrary docker containers.

Starting docker containers

TLDR: To start a new container call the following command replacing the arguments with meaningful values:

python agte-docker run -d --jupyterport pppp --jupytertoken some_password --name container_name_suffix -v /gpfs01/euler/data/Data:/gpfs01/euler/data/Data:ro -v /gpfs01/euler/data/Resources:/gpfs01/euler/data/Resources:ro berenslab/datajoint

More info: To start a docker container with a running jupyter notebook server and correct user permissions, we use one of the scripts agte-docker / agte-docker_gpu / agpb-docker. Note that only the second and third support GPU usage; for this reason, they also only work on nodes with GPUs. Use the script of the group you are in (AG Euler: agte_xx, AG Berens: agpb_xx). Calling the script will look something like this:

python agte-docker run -d --jupyterport pppp --jupytertoken some_password --name container_name_suffix berenslab/datajoint
  • The last argument (e.g. berenslab/datajoint) specifies the image the container will be based on.
    • You can list all available images calling docker images on a node. If no images are available you have to build them first (see below).
  • The argument after --name (e.g. container_name_suffix) should be a suffix that helps to identify specific containers for yourself.
    • The script will automatically add your username to the container name.
  • The argument after --jupyterport (e.g. pppp) is the port number used for the ssh tunnel that allows you to access the jupyter notebook server.
    • This should be a number well above 1000.
    • It has to be free, i.e. not used by another container. You can check this by running docker ps to list all running containers on that node.
  • The argument after --jupytertoken (e.g. some_password) can be used to specify a password to access the jupyter notebook server later.

If you want to use a GPU, make sure that you are the only user on a specific GPU and that your computations use only one of the node's two GPUs. To do so, run the script with

GPU=X python agte-docker run -d --jupyterport pppp --jupyterpass some_password --name container_name_suffix berenslab/datajoint

with either GPU=0 or GPU=1.

The container name is then automatically set as gpuX-username and everyone can see which GPUs are already taken.

To start a container in interactive mode with bash and no jupyter notebook server call

python agte-docker run -d --jupyterport pppp --name container_name berenslab/datajoint bash


Accessing additional directories

To work with datajoint and any experimental data, you need to make two additional directories available inside the docker container. To do so, include

-v /gpfs01/euler/data/Data:/gpfs01/euler/data/Data:ro -v /gpfs01/euler/data/Resources/Stimulus:/gpfs01/euler/data/Resources/Stimulus:ro

in the run command.


Limit the container to a certain number of CPUs

Often it is useful to constrain your container so that it doesn't use all CPUs; for example, anyone else who wants to use the GPUs needs at least one free CPU. Therefore, when starting a container, add the following to use, e.g., only the first 30 CPUs:

--cpuset-cpus 0-29

The whole command might look like this:

python agte-docker run -d --cpuset-cpus X-Y --jupyterport pppp --jupyterpass some_password --name container_name berenslab/datajoint
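Note that --cpuset-cpus takes an inclusive, zero-based range, so "the first 30 CPUs" is 0-29 rather than 0-30. A tiny sketch of the off-by-one (WANT is a hypothetical placeholder for the number of CPUs you want):

```shell
# Derive an inclusive --cpuset-cpus range for the first WANT CPUs.
# CPU numbering starts at 0, so the range ends at WANT - 1.
WANT=30
echo "--cpuset-cpus 0-$((WANT - 1))"   # prints: --cpuset-cpus 0-29
```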

Troubleshooting
  • Always make sure the jupyterport is not already used.
  • Always make sure the container_name is not already used.
  • Do not state parameters after the image name (only before).
  • Container stops immediately after creation:
    • If you create a new container with agte-docker run, it may happen that the container stops immediately because of an error. You will not find it with docker ps, only with docker ps -a. This can happen for several reasons; use docker container logs container_name to check the logs.

Interactive container mode

These commands start the docker container in detached mode, which means it runs in the background. If you additionally want to enter the docker container with a bash shell, you can call

docker exec -u username -ti container_name bash

Building a docker container

To check which docker images are available on a cluster node call docker images on that node. If you want to build an image that does not yet exist do the following:

  • Clone the https://github.com/eulerlab/docker (or https://github.com/berenslab/docker) GitHub repo into a folder your-docker-folder where you want your docker files stored.
    • This is a private repo, so you need to be a member of eulerlab (or berenslab).
      • The eulerlab repo is based on the berenslab repo.
    • You may want to fork the repo before you clone it. This might not be necessary for many users, though.
  • Go to your-docker-folder.
  • Run e.g. docker build -t berenslab/deeplearning:latest ./berenslab/deeplearning which builds the container called berenslab/deeplearning with the tag latest from the directory ./berenslab/deeplearning.
    • If you don't state a tag, it will default to latest which is fine, but if there are multiple similar images, please use tags.
  • Run docker images to see your docker image listed.

Some docker images depend on other local images, meaning that you have to build the images they depend on first. For example berenslab/datajoint requires an existing image berenslab/deeplearning.
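For that example, the build order would be as follows (a sketch, assuming you are in your-docker-folder and the directory layout mirrors the image names as described above):

```shell
# Build the base image first, then the image that depends on it.
docker build -t berenslab/deeplearning:latest ./berenslab/deeplearning
docker build -t berenslab/datajoint:latest ./berenslab/datajoint
```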

Otherwise, images (e.g. berenslab/deeplearning) typically depend on remote base images; in any case, you typically do not want to build an image completely from scratch. If you want to create a new docker file and its respective image, first have a look at how the files are structured in https://github.com/eulerlab/docker.

  • Then create a folder that is or includes your username within your-docker-folder, e.g. your-docker-folder/inewton
  • Within create a folder with a name that describes your image, e.g. your-docker-folder/inewton/deeplearning
  • To create the image from this dockerfile call docker build -t inewton/deeplearning ./inewton/deeplearning to have an image name everyone else on this node can interpret.
  • Sometimes you may want to save this image to a tar file on your harddrive for reproducibility, using docker save image_name > image_name.tar (see below).
  • If you just want to build the standard images (no debugging etc.), it makes sense to use --no-cache to save space on the node and avoid storing intermediate images from the build steps.

Creating an image from a container

Sometimes you install things not via a docker file but within a running container. While this should be avoided where possible, it is sometimes very useful. If you have done so, you may want to create an image from your container to be able to reuse it. Make sure to document which changes you have made to your container if they were anything non-trivial.

To create an image from the container, do the following:

  • docker commit your_container_name
  • docker tag id_of_the_just_created_image your_image_name
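Alternatively, docker commit accepts the target image name directly, so both steps can be combined into one (the names below are placeholders):

```shell
# Commit the container's current state straight into a named, tagged image.
docker commit your_container_name your_image_name:snapshot
```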

Saving an image to your harddrive

To save an image call: docker save some_image_name > some_filename.tar

This will create a tar file of your docker image.

Copying an image to another node

If you want to copy an image to another node you can do the following:

  • ssh to a node with an image you want
  • cd to some folder you created for docker images
  • docker save some_image_name > some_filename.tar
  • ssh to another node
  • cd to the same folder again with the docker images
  • docker load --input some_filename.tar
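The save/load steps above can also be done in one go by streaming the image over SSH, without an intermediate tar file (a sketch; eulY is a placeholder for the target node):

```shell
# Pipe the image from this node straight into docker on the other node.
docker save some_image_name | ssh eulY docker load
```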

Basic usage of containers

Install packages in running containers

Typically you want to add all (python etc.) packages you will need to your Dockerfile before creating your docker image. In practice, however, you might want to install packages in running containers to extend their functionality.

To do this, execute a shell in your docker container:

docker exec -u username -ti container_name bash

Inside your docker container, you almost always want to install packages using sudo, because then they will be installed in the container only, and not in your home directory, where they would be visible in other containers too, e.g.:

sudo pip install numpy

Troubleshooting

To check which python and pip versions you are using, type python -V and pip -V. The python version pip reports should match your python version; if not, set an alias or always use pip3 (not recommended).

Inside your notebooks you can run !python -V and check if the versions are matching.

pip list -v lists all packages and their installation locations. Verify that packages you only want inside your docker containers are not installed in your home directory.

pip check checks for broken dependencies.