Singularity containers

From Cheaha
Revision as of 22:27, 2 January 2020 by Louistw@uab.edu (talk | contribs) (Created page with "=What is a container= A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing...")


Attention: Research Computing Documentation has Moved
https://docs.rc.uab.edu/


Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.


As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.

Thank you,

The Research Computing Team

What is a container

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

Why containers

Containers package a program together with all of its dependencies, so that if you have the container you can run it on any Linux system with the container runtime installed. It does not matter whether the host runs Ubuntu, RedHat, or CentOS Linux: if the container runtime is available, the program runs identically on each, inside its container. This is great for distributing complex software with many dependencies and for reproducing experiments exactly; as long as you keep the container, you know you can reproduce your work. Also, since a container runs as a process on the host machine, it can very easily be run in a SLURM job.

Docker vs Singularity

Docker is the most popular and widely used container system in industry, but Singularity was built with HPC, i.e. a shared environment, in mind. Singularity is designed so that you can use it within SLURM jobs without violating security constraints on the cluster. Since Docker is very popular and many people already package their software as Docker images, Singularity maintains compatibility with Docker images; we will see this compatibility later on this page. Both Singularity and Docker maintain a hub where you can keep your container images remotely and pull them from anywhere. Here are links to both hubs:

Docker Hub

Singularity Hub
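As a sketch of the Docker compatibility mentioned above, Singularity can pull an image straight from Docker Hub using a docker:// URI. The image name below is just an illustration (not part of this guide), and the call is guarded so it only runs on a machine where Singularity is actually installed:

```shell
# Pull a Docker Hub image and convert it to a Singularity image.
# "docker://python:3.8-slim" is an arbitrary example image.
IMG_URI="docker://python:3.8-slim"
if command -v singularity >/dev/null 2>&1; then
    # Creates a .simg file; the exact output location depends on
    # your cache/pull-folder settings.
    singularity pull "$IMG_URI"
fi
```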

Singularity is already available on Cheaha. To list all available versions on Cheaha, run the command below:

module avail Singularity

Usage

Basic singularity command line functions

To see the basic functions and command line options, run help on singularity itself:

singularity --help

To get more information about a particular subcommand, use help in conjunction with that subcommand:

singularity pull --help

Download an Image

module load Singularity/2.6.1-GCC-5.4.0-2.26
singularity pull shub://vsoch/hello-world
singularity run -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg

Note: We have defined the SINGULARITY_CACHEDIR environment variable, so the image will be downloaded to this location: /data/user/$USER/.singularity
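On a system where SINGULARITY_CACHEDIR is not pre-defined, you can set it yourself. A minimal sketch (the $HOME fallback path is an assumption for illustration; on Cheaha the variable already points at /data/user/$USER/.singularity):

```shell
# Use the pre-defined cache directory if present, otherwise fall back to $HOME.
export SINGULARITY_CACHEDIR="${SINGULARITY_CACHEDIR:-$HOME/.singularity}"
mkdir -p "$SINGULARITY_CACHEDIR"
echo "Pulled images will be cached in: $SINGULARITY_CACHEDIR"
```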

Run container

There are a few ways to run a container:

run

It will run the pre-defined runscript inside the container.

singularity run -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg

exec

It will `exec` the command you give inside the container context.

singularity exec -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg cat /etc/os-release

shell

It will give you a shell within the container context.

singularity shell -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg

Namespace resolution within the container

Now, let's list the contents of your /data/user/$USER directory from within the container. We'll use the exec command for this.

singularity exec $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg ls $USER_DATA
> ls: cannot access /data/user/louistw: No such file or directory

Hmmm, an error. Remember, your Singularity container image doesn't know about the directories on your host machine. By default (in most containers), Singularity binds only your HOME and /tmp directories.

Now, all of our raw data generally lives under /data/user/$USER, so we really need access to that location if our container is to be useful. Thankfully, you can explicitly tell Singularity to bind a host directory into your container image: the -B parameter binds a path from your host machine into the container. Try the same command again, but with the bind parameter:

singularity exec -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg ls $USER_DATA
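The -B flag also accepts an explicit destination path and, in later Singularity releases, mount options. A sketch of the syntax (the /mnt destination is arbitrary, and the singularity call is guarded so it only runs where Singularity is available):

```shell
# -B <host>            -> bind at the same path inside the container
# -B <host>:<dest>     -> bind at a different path inside the container
# -B <host>:<dest>:ro  -> bind read-only (support depends on Singularity version)
BIND_SPEC="$HOME:/mnt"
if command -v singularity >/dev/null 2>&1; then
    singularity exec -B "$BIND_SPEC" \
        "$SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg" ls /mnt
fi
```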

As mentioned earlier in the security considerations of Singularity in an HPC environment, every Singularity run adheres to your user-level permissions from the host system, so you cannot read directories you don't have access to. Note also that -B $USER_DATA binds only your own directory, so other users' directories under /data/user do not even exist inside the container. In this example, trying to list William's directory fails:

singularity exec -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg ls /data/user/wsmonroe
> ls: cannot access /data/user/wsmonroe: No such file or directory

Example Job Script with containers

Using a Singularity container in a SLURM job script is very easy: the container runs as a process on the host machine, just like any other command in a batch script. You just need to load Singularity in your job script and run your command via singularity. Here's an example job script:

#!/bin/bash
#
#SBATCH --job-name=test-singularity
#SBATCH --output=res.out
#SBATCH --error=res.err
#
# Number of tasks needed for this job. Generally used with MPI jobs.
#SBATCH --ntasks=1
#SBATCH --partition=express
#
# Time format = HH:MM:SS, DD-HH:MM:SS
#SBATCH --time=10:00
#
# Number of CPUs allocated to each task. 
#SBATCH --cpus-per-task=1
#
# Minimum memory required per allocated CPU, in megabytes.
#SBATCH --mem-per-cpu=100
#
# Send mail to the given address when the job fails.
# Note: SLURM does not expand environment variables in #SBATCH lines,
# so put your actual address here.
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=your_username@uab.edu

#Set your environment here
module load Singularity/2.6.1-GCC-5.4.0-2.26

#Run your singularity or any other commands here
singularity exec -B $USER_DATA $SINGULARITY_CACHEDIR/vsoch-hello-world-master-latest.simg cat /etc/os-release
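To use the script above, save it to a file (the name singularity_job.sh below is just an illustration) and submit it with sbatch. The calls are guarded so the sketch only acts on a machine where SLURM is installed and the script file exists:

```shell
JOB_SCRIPT="singularity_job.sh"    # hypothetical filename for the script above
if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    sbatch "$JOB_SCRIPT"    # submit to the scheduler
    squeue -u "$USER"       # check the job's state; output lands in res.out / res.err
fi
```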