Slurm: Difference between revisions

Revision as of 17:26, 22 December 2015

SLURM (Simple Linux Utility for Resource Management) is a queue management system. It was developed at Lawrence Livermore National Laboratory and currently runs some of the largest compute clusters in the world. SLURM is the primary job manager on Cheaha (BigGreen, the new hardware), while GridEngine continues to be the job manager on the old hardware.

SLURM is similar in many ways to GridEngine and most other queue systems. You write a batch script and submit it to the queue manager (scheduler), which then schedules your job to run on the queue (or partition, in SLURM parlance) that you designate. Below we outline how to submit jobs to SLURM, how SLURM decides when to schedule your job, and how to monitor progress.

General SLURM Documentation

The primary source for documentation on SLURM usage and commands is the SLURM site. If you Google for SLURM questions, you'll often see the Lawrence Livermore pages as the top hits, but these tend to be outdated.

A great way to get details on the SLURM commands is the man pages available from the Cheaha cluster. For example, if you type the following command:

man sbatch

you'll get the manual page for the sbatch command.
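
The same approach works for the other SLURM commands. As a minimal sketch, the loop below checks which common SLURM commands are on your PATH and points to the matching manual page. The command list is an assumption chosen for illustration and is not specific to Cheaha.

```shell
#!/bin/bash
# Illustrative list of common SLURM commands (an assumption, not a Cheaha-specific list).
slurm_cmds=(sbatch srun squeue scancel sinfo sacct)

for c in "${slurm_cmds[@]}"; do
    if command -v "$c" >/dev/null 2>&1; then
        # The command is installed, so its manual page should be available too.
        echo "$c: installed, see 'man $c'"
    else
        echo "$c: not found on this machine"
    fi
done
```

On a machine without SLURM installed, every line reports "not found", which is a quick way to confirm you are actually logged in to the cluster.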

Logging on and Running Jobs from the command line

Once you've gone through the account setup procedure and obtained a suitable terminal application, you can log in to the Cheaha system via ssh:

 ssh blazerid@cheaha.rc.uab.edu

Cheaha (new hardware) runs the CentOS 7 version of the Linux operating system, and commands are run under the "bash" shell. There are a number of Linux and bash references, cheat sheets, and tutorials available on the web.

Typical Workflow

Stage data to $USER_SCRATCH (your scratch directory)
Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.
Identify the appropriate resources needed to run the job. The following resource requests are mandatory for all jobs on Cheaha:
- Number of processor cores required by the job
- Maximum memory (RAM) required per core
- Maximum runtime
Write a job script specifying queuing system parameters, resource requests, and the commands to run your program
Submit script to queuing system (sbatch script.job)
Monitor job (squeue)
Review the results and resubmit as necessary
Clean up the scratch directory by moving or deleting the data off of the cluster
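
The workflow above can be sketched as a short shell session. This is a minimal illustration under stated assumptions, not an official Cheaha template: the job name, the memory and time values, and the `srun hostname` payload are placeholders, and no partition is specified because partition names are site-specific. The three mandatory requests map to the standard sbatch options `--ntasks`, `--mem-per-cpu`, and `--time`.

```shell
#!/bin/bash
# Write a minimal job script named script.job in the current directory.
# All values below are hypothetical placeholders; adjust them to your job.
cat > script.job <<'EOF'
#!/bin/bash
# Job name shown by squeue (hypothetical)
#SBATCH --job-name=example
# Number of processor cores (tasks) required by the job
#SBATCH --ntasks=1
# Maximum memory per core, in megabytes
#SBATCH --mem-per-cpu=2048
# Maximum runtime (HH:MM:SS)
#SBATCH --time=01:00:00
# Output file; %j expands to the job ID
#SBATCH --output=example_%j.out

# Run from the scratch directory where the data was staged
cd "$USER_SCRATCH"

# Placeholder payload; replace with the command that runs your program
srun hostname
EOF

# Submit and monitor only where SLURM is actually available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch script.job      # submit the script to the queuing system
    squeue -u "$USER"      # list your pending and running jobs
else
    echo "sbatch not found; script.job was written but not submitted"
fi
```

A job that is no longer needed can be removed from the queue with `scancel` followed by the job ID that `sbatch` printed at submission.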

Interactive Session

The head node (the command-line interface you see after you log in to Cheaha) is meant for submitting jobs and for the lighter prep work required for job scripts. Do not run heavy computations on the head node. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other manipulations of data), or your compute application requires interactive control, you should request a dedicated interactive node for this work.

Interactive resources are requested by submitting an "interactive" job to the scheduler. An interactive job gives you a command line on a compute resource that you can use just as you would the command line on the head node. The difference is that the scheduler has dedicated the requested resources to your job, so you can run your interactive commands without worrying about impacting other users on the head node. Interactive jobs are requested with the srun command:

srun -n 1 -N 1 -t 01:00:00 --pty /bin/bash

This command requests 1 core (-n) on 1 node (-N) for 1 hour (-t).
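
The same flags can be adjusted for a larger interactive session. The sketch below builds a variant of the command above; the core count, duration, and per-core memory are hypothetical values chosen for illustration, and `--mem-per-cpu` is a standard srun option added here as an assumption about what a heavier session might need.

```shell
#!/bin/bash
# Hypothetical resource values; adjust them to your workload.
CORES=4
HOURS=2
MEM_MB=2048   # memory per core, in megabytes

# Build the srun command as an array so each argument stays intact.
cmd=(srun -n "$CORES" -N 1 --mem-per-cpu="$MEM_MB" \
     -t "$(printf '%02d:00:00' "$HOURS")" --pty /bin/bash)

# Show the command that would be run.
echo "${cmd[@]}"

# Start the interactive session only where SLURM is available.
if command -v srun >/dev/null 2>&1; then
    "${cmd[@]}"
fi
```

Typing `exit` in the interactive shell ends the session and releases the resources back to the scheduler.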

More advanced interactive scenarios to support graphical applications are available using VNC (https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session).

