Slurm: Difference between revisions

Latest revision as of 17:24, 30 August 2022

The Slurm documentation has moved to the new documentation site at https://uabrc.github.io.

The obsolete content for the original page can be accessed via Obsolete: Slum for historical reference.

@@ Line 1: / Line 1: @@
-[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. Slurm is now the primary job manager on Cheaha; it replaces Sun Grid Engine (SGE), the job manager used earlier.
+The Slurm documentation has moved to the new documentation site at https://uabrc.github.io.
-Slurm is similar in many ways to GridEngine or most other queue systems. You write a batch script then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we will provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress.
+The obsolete content for the original page can be accessed via [[Obsolete: Slum]] for historical reference.
-== General Slurm Documentation ==
-The primary source for documentation on Slurm usage and commands can be found at the [http://slurm.schedmd.com/ Slurm] site. If you Google for Slurm questions, you'll often see the Lawrence Livermore pages as the top hits, but these tend to be outdated.
-The [https://slurm.schedmd.com/quickstart.html SLURM QuickStart Guide] provides a very useful overview of how SLURM treats a cluster as a pool of resources which you can allocate to get your work done. The Example section on that page is a very useful orientation to SLURM environments.
-The [http://www.ceci-hpc.be/slurm_tutorial.html SLURM Tutorial at CECI], a European Consortium of HPC sites, provides a very good introduction to submitting single-threaded, multi-threaded, and MPI jobs.
-A great way to get details on the Slurm commands is the man pages available from the Cheaha cluster. For example, if you type the following command:
-<pre>
-man sbatch
-</pre>
-you'll get the manual page for the sbatch command.
-Cheatsheets for [https://github.com/wwarriner/slurm_cheatsheets/blob/master/sacct_cheat_sheet.pdf <code>sacct</code>] and [https://github.com/wwarriner/slurm_cheatsheets/blob/master/sbatch_cheat_sheet.pdf <code>sbatch</code>] are available at [https://github.com/wwarriner/slurm_cheatsheets GitHub]. These cheatsheets contain some of the more commonly used flags and parameters for the two commands.
-== Slurm Partitions ==
-Cheaha has the following Slurm partitions defined (they can be thought of as the equivalent of SGE queues). The lower the number, the higher the priority.
-'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource request (maximum runtime, number of compute nodes, etc.)
-{{Slurm_Partitions}}
-== Logging on and Running Jobs from the command line ==
-Once you've gone through the [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Access_.28Cluster_Account_Request.29 account setup procedure] and obtained a suitable [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Client_Configuration terminal application], you can log in to the Cheaha system via ssh:
-  ssh '''BLAZERID'''@cheaha.rc.uab.edu
-Alternatively, '''existing users''' could follow these [https://docs.uabgrid.uab.edu/wiki/SSH_Key_Authentication instructions to add SSH keys] and access the new system.
-Cheaha (new hardware) runs the CentOS 7 version of the Linux operating system, and commands are run under the "bash" shell (the default shell). There are a number of Linux and [http://www.gnu.org/software/bash/manual/bashref.html bash references], [http://cli.learncodethehardway.org/bash_cheat_sheet.pdf cheat sheets] and [http://www.tldp.org/LDP/Bash-Beginners-Guide/html/ tutorials] available on the web.
-== Typical Workflow ==
-* Stage data to $USER_SCRATCH (your scratch directory)
-* Determine how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.
-* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha:
-** Number of processor cores required by the job
-** Maximum memory (RAM) required per core
-** Maximum runtime
-* Write a job script specifying queuing system parameters, resource requests, and the commands to run your program
-* Submit script to queuing system (sbatch script.job)
-* Monitor job (squeue)
-* Review the results and resubmit as necessary
-* Clean up the scratch directory by moving or deleting the data off of the cluster
-== Slurm Job Types ==
-=== Jupyter Job ===
-Cheaha can be used with [[Jupyter]] notebooks.
-=== Batch Job ===
-'''TODO: ''' provide an explanation of what makes a batch job and why use that vs an interactive job
-For additional information on the '''sbatch''' command, execute '''man sbatch''' at the command line to view the manual.
-==== Example Batch Job Script ====
-A job consists of '''resource requests''' and '''tasks'''. The Slurm job scheduler interprets lines beginning with '''#SBATCH''' as Slurm arguments. In this example, the job requests 1 task with 1 CPU, 100 MB of memory per CPU, and a 10-minute time limit on the express partition.
-'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource request (maximum runtime, number of compute nodes, etc.)
-<pre>#!/bin/bash
-#
-#SBATCH --job-name=test
-#SBATCH --output=res.out
-#SBATCH --error=res.err
-#
-# Number of tasks needed for this job. Generally, used with MPI jobs
-#SBATCH --ntasks=1
-#SBATCH --partition=express
-#
-# Time format = MM:SS, HH:MM:SS, or DD-HH:MM:SS (10:00 below means 10 minutes)
-#SBATCH --time=10:00
-#
-# Number of CPUs allocated to each task.
-#SBATCH --cpus-per-task=1
-#
-# Minimum memory required per allocated CPU in megabytes.
-#SBATCH --mem-per-cpu=100
-#
-# Send mail to the email address when the job fails
-#SBATCH --mail-type=FAIL
-#SBATCH --mail-user=YOUR_EMAIL_ADDRESS
-#Set your environment here
-#Run your commands here
-srun hostname
-srun sleep 60
-</pre>
-[https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Sample_Job_Scripts Click here] for more example SLURM job scripts.
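The --time value in the example batch script, 10:00, is read by Slurm as minutes:seconds, that is, 10 minutes. As a rough illustration of the time formats Slurm accepts (minutes, MM:SS, HH:MM:SS, D-HH, D-HH:MM and D-HH:MM:SS), the bash sketch below converts a time string to seconds. The function name slurm_time_to_seconds is hypothetical and not part of Slurm, and the sketch does no input validation.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): convert a Slurm-style time
# string to seconds. Assumes well-formed input; no validation is done.
slurm_time_to_seconds() {
    local spec=$1 days=0 h=0 m=0 s=0
    if [[ $spec == *-* ]]; then
        # With a day count, the remaining fields are hours[:minutes[:seconds]].
        days=${spec%%-*}
        IFS=: read -r h m s <<< "${spec#*-}"
    else
        local a b c
        IFS=: read -r a b c <<< "$spec"
        if [[ -n $c ]]; then
            h=$a m=$b s=$c        # HH:MM:SS
        elif [[ -n $b ]]; then
            m=$a s=$b             # MM:SS
        else
            m=$a                  # minutes only
        fi
    fi
    # The 10# prefix forces base 10 so that values like "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
}

slurm_time_to_seconds 10:00        # prints 600
slurm_time_to_seconds 08:00:00     # prints 28800
slurm_time_to_seconds 2-00:00:00   # prints 172800
```

A check like this can help confirm that a requested runtime fits within a partition's maximum before submitting.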
-=== Interactive Job ===
-The login node (the host that you connected to when you set up the SSH connection to Cheaha) is intended for submitting jobs and for the lighter prep work required for job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g., compiling code or other manipulations of data) or your compute application requires interactive control, you should request a dedicated interactive node for this work.
-Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.
-Interactive jobs that can be run on the command line are requested with the '''srun''' command.
-<pre>
-srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash
-</pre>
-This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4 GB of RAM per CPU (--mem-per-cpu), for 8 hours (--time).
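Because --mem-per-cpu is a per-core figure, the total memory requested by the srun command above is the product of the task count, the cores per task, and the per-core memory. The bash sketch below works through that arithmetic; total_mem_mb is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/bash
# Hypothetical helper: total memory in MB implied by a request.
# Usage: total_mem_mb NTASKS CPUS_PER_TASK MEM_PER_CPU_MB
total_mem_mb() {
    echo $(( $1 * $2 * $3 ))
}

# Values from the srun example: 1 task, 4 CPUs per task, 4096 MB per CPU.
total_mem_mb 1 4 4096   # prints 16384 (16 GB)
```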
-More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling (for example, with [http://www.uab.edu/it/software X-Win32 2014 for Windows]).
-Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via '''Terminal''' in your VNC window.
-<pre>
-sinteractive --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME
-</pre>
-====Requesting for GPUs====
-To request an interactive session on one of the GPU nodes (c0089-c0092 with K80s and c0097-c0114 with P100s), add the --gres parameter to the 'srun' or 'sinteractive' command.
-<pre style="white-space: pre-wrap;" >
-srun --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096 --time=08:00:00 --partition=pascalnodes --job-name=JOB_NAME --gres=gpu:1 --pty /bin/bash
-</pre>
-<pre style="white-space: pre-wrap;" >
-sinteractive --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096 --time=08:00:00 --partition=pascalnodes --job-name=JOB_NAME --gres=gpu:1
-</pre>
-'''NOTE:'''
-* If you want to use more than one GPU on the node, increase the value in --gres=gpu:[1-4]
-* To use the P100s, set the partition to 'pascalnodes'; to use the K80s, use any of the express, short, medium or long partitions.
-* To request an interactive session using a single GPU, say for code development, you can use the following syntax:
-<pre>
- sinteractive --partition=pascalnodes --gres=gpu
-</pre>
-=== MPI Job ===
-'''TODO add MPI information and a job example'''
-=== OpenMP / SMP Job ===
-[https://en.wikipedia.org/wiki/OpenMP OpenMP / SMP] jobs are those that use multiple CPU cores on a single compute node.
-It is very important to properly structure an SMP job to ensure that the requested CPU cores are assigned to the same compute node. The following example requests 4 CPU cores by setting '''ntasks''' to '''1''' and '''cpus-per-task''' to '''4'''.
-For OpenMP you must ensure your OMP_NUM_THREADS environment variable is set and matches the --cpus-per-task value requested. This ensures that your reservation includes one dedicated core for each thread.
-<pre>
-export OMP_NUM_THREADS=4
-srun --partition=short \
-        --ntasks=1 \
-        --cpus-per-task=$OMP_NUM_THREADS \
-        --mem-per-cpu=1024 \
-        --time=5:00:00 \
-        --job-name=rsync \
-        --pty /bin/bash
-</pre>
-For additional examples of Slurm with SMP jobs, see [https://help.rc.ufl.edu/doc/Sample_SLURM_Scripts#Multi-Threaded_SMP_Job UFL's examples]. For a simple hello world SMP example app, see [https://www.geeksforgeeks.org/openmp-hello-world-program/ this code].
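Inside a job, one way to keep OMP_NUM_THREADS in step with the allocation is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is given. The bash sketch below assumes that variable is present in the job environment and falls back to a single thread otherwise; set_omp_threads is a hypothetical function name.

```shell
#!/bin/bash
# Hypothetical helper: match the OpenMP thread count to the cores Slurm
# allocated per task, defaulting to 1 when SLURM_CPUS_PER_TASK is unset
# (for example, when run outside of a Slurm job).
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Deriving the value this way avoids the thread count and the --cpus-per-task request drifting apart when one of them is edited.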
-=== Job Dependencies ===
-It is also possible to link job scripts using job dependencies. Visit the following git repository for more detailed information and sample scripts: https://gitlab.rc.uab.edu/rc-training-sessions/job-dependency
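As a small illustration of how a dependency is expressed, sbatch accepts an option of the form --dependency=TYPE:JOBID[:JOBID...], where a type such as afterok means "start only after the listed jobs finish successfully". The bash sketch below only builds that option string; dependency_arg is a hypothetical helper and the job IDs are made up.

```shell
#!/bin/bash
# Hypothetical helper: build an sbatch --dependency option.
# Usage: dependency_arg TYPE JOBID [JOBID...]
dependency_arg() {
    if [ "$#" -lt 2 ]; then
        echo "usage: dependency_arg TYPE JOBID [JOBID...]" >&2
        return 1
    fi
    local type=$1
    shift
    # With IFS set to ':', "$*" joins the remaining job IDs with colons.
    local IFS=:
    printf '%s\n' "--dependency=${type}:$*"
}

dependency_arg afterok 101 102   # prints --dependency=afterok:101:102
```

On the cluster, the first job's ID could be captured with sbatch --parsable and passed to the helper when submitting the second script; the script names used for that would be your own.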
-== Job Status ==
-=== SQUEUE ===
-To check your job status, you can use the following command:
-<pre>
-squeue -u $USER
-</pre>
-The following fields are displayed when you run '''squeue''':
-<pre style="white-space: pre-wrap;">
-JOBID - ID assigned to your job by Slurm scheduler
-PARTITION - Partition your job runs in; depends on the time requested: express (max 2 hrs), short (max 12 hrs), medium (max 50 hrs), long (max 150 hrs), sinteractive (0-2 hrs)
-NAME - JOB name given by user
-USER - User who started the job
-ST - State your job is in. The typical states are PENDING (PD), RUNNING (R), SUSPENDED (S), COMPLETING (CG), and COMPLETED (CD)
-TIME - Time for which your job has been running
-NODES - Number of nodes your job is running on
-NODELIST - Node on which the job is running
-</pre>
-For more details on '''squeue''', go [http://slurm.schedmd.com/squeue.html here].
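When reading squeue output in a script, it can help to expand the compact ST codes listed above into their full names. The bash sketch below covers only the five states named in this section; squeue_state_name is a hypothetical helper, and any other code is reported as unknown.

```shell
#!/bin/bash
# Hypothetical helper: expand a compact squeue state code (ST column)
# into its full name. Only the states listed in this section are handled.
squeue_state_name() {
    case $1 in
        PD) echo "PENDING" ;;
        R)  echo "RUNNING" ;;
        S)  echo "SUSPENDED" ;;
        CG) echo "COMPLETING" ;;
        CD) echo "COMPLETED" ;;
        *)  echo "UNKNOWN($1)" ;;
    esac
}

squeue_state_name PD   # prints PENDING
squeue_state_name R    # prints RUNNING
```

On the cluster, the codes could be taken from something like squeue -u $USER -h -o "%i %t", which prints the job ID and compact state without a header line.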
-=== SSTAT ===
-The '''sstat''' command shows status and metric information for a running job.
-'''NOTE: the job parts must be executed using ''srun'' otherwise ''sstat'' will not display useful output'''
-<pre style="white-space: pre-wrap;">
-[rcs@login001 ~]$ sstat 256483
-       JobID  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask  AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite
------------- ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------
-.0       1962728K          c0043              1   1960633K     91920K      c0043          3     91867K      67K        c0043              3        50K  00:00.000      c0043          0  00:00.000        8      1.20G       Unknown       Unknown       Unknown              0           1M           c0043               5           1M        0.34M            c0043                5        0.34M
-</pre>
-For more details on '''sstat''', go [http://slurm.schedmd.com/sstat.html here].
-=== SCONTROL ===
-<pre>
-$ scontrol show jobid -dd 123
-JobId=123 JobName=SLI
-   UserId=rcuser(1000) GroupId=rcuser(1000)
-   Priority=4294898073 Nice=0 Account=(null) QOS=normal
-   JobState=RUNNING Reason=None Dependency=(null)
-   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
-   DerivedExitCode=0:0
-   RunTime=06:27:02 TimeLimit=08:00:00 TimeMin=N/A
-   SubmitTime=2016-09-12T14:40:20 EligibleTime=2016-09-12T14:40:20
-   StartTime=2016-09-12T14:40:20 EndTime=2016-09-12T22:40:21
-   PreemptTime=None SuspendTime=None SecsPreSuspend=0
-   Partition=medium AllocNode:Sid=login001:123
-   ReqNodeList=(null) ExcNodeList=(null)
-   NodeList=c0003
-   BatchHost=c0003
-   NumNodes=1 NumCPUs=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
-   TRES=cpu=24,mem=10000,node=1
-   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
-     Nodes=c0003 CPU_IDs=0-23 Mem=10000
-   MinCPUsNode=1 MinMemoryNode=10000M MinTmpDiskNode=0
-   Features=(null) Gres=(null) Reservation=(null)
-   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
-   Command=/share/apps/rc/git/rc-sched-scripts/bin/_interactive
-   WorkDir=/scratch/user/rcuser/work/other/rhea/Gray/MERGED
-   StdErr=/dev/null
-   StdIn=/dev/null
-   StdOut=/dev/null
-   Power= SICP=0
-</pre>
-== Job History ==
-TODO: Provide some examples of using the '''sacct''' or our wrapper '''rc-sacct''' to view historical information.
-This example uses the rc-sacct wrapper script; for comparison, here is the equivalent sacct command:
-<pre>
-$ sacct --starttime 2016-08-30 \
-      --allusers \
-      --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
-</pre>
-<pre style="white-space: pre-wrap;">
-$ rc-sacct --allusers --starttime 2016-08-30
-     User        JobID    JobName  Partition      State  Timelimit               Start                 End    Elapsed     MaxRSS  MaxVMSize   NNodes      NCPUS        NodeList
---------- ------------ ---------- ---------- ---------- ---------- ------------------- ------------------- ---------- ---------- ---------- -------- ---------- ---------------
- kxxxxxxx 34308        Connectom+ interacti+    PENDING   08:00:00             Unknown             Unknown   00:00:00                              1          4   None assigned
- kxxxxxxx 34310        Connectom+ interacti+    PENDING   08:00:00             Unknown             Unknown   00:00:00                              1          4   None assigned
- dxxxxxxx 35927         PK_htseq1     medium  COMPLETED 2-00:00:00 2016-08-30T09:21:33 2016-08-30T10:06:25   00:44:52                              1          4       c0005
-.batch       batch             COMPLETED            2016-08-30T09:21:33 2016-08-30T10:06:25   00:44:52    307704K    718152K        1          4       c0005
- bxxxxxxx 35928                SI     medium    TIMEOUT   12:00:00 2016-08-30T09:36:04 2016-08-30T21:36:42   12:00:38                              1          1       c0006
-.batch       batch                FAILED            2016-08-30T09:36:04 2016-08-30T21:36:43   12:00:39     31400K    286532K        1          1       c0006
-.0        hostname             COMPLETED            2016-08-30T09:36:16 2016-08-30T09:36:17   00:00:01      1112K    207252K        1          1       c0006
-</pre>
-Additional information about the sacct command can be found by running '''man sacct''' or in the [http://slurm.schedmd.com/sacct.html online documentation].
-The rc-sacct wrapper script supports the following arguments:
-<pre>
-$ rc-sacct --help
-  Copyright (c) 2016 Mike Hanby, University of Alabama at Birmingham IT Research Computing.
-  rc-sacct - version 1.0.0
-  Run sacct to display history in a nicely formatted output.
-    -r, --starttime                  HH:MM[:SS] [AM|PM]
-                                     MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
-                                     MM/DD[/YY]-HH:MM[:SS]
-                                     YYYY-MM-DD[THH:MM[:SS]]
-    -a, --allusers                   Dispay hsitory for all users)
-    -u, --user user_list             Display hsitory for all users in the comma seperated user list
-    -f, --format a,b,c               Comma separated list of columns: i.e. --format jobid,elapsed,ncpus,ntasks,state
-        --debug                      Display additional output like internal structures
-    -?, -h, --help                   Display this help message
-</pre>
-== Slurm Variables ==
-The following is a list of useful Slurm environment variables (click here for the [http://slurm.schedmd.com/srun.html full list]):
-{{Slurm_Variables}}
-== SGE - Slurm ==
-This section shows equivalent SGE and Slurm commands:
-<pre>
-   SGE                   Slurm
----------             ------------
-  qsub                  sbatch
-  qlogin                sinteractive
-  qdel                  scancel
-  qstat                 squeue
-</pre>
-To get more info about individual commands, run '''man SLURM_COMMAND'''. For an extensive list of Slurm-SGE equivalent commands, go [https://docs.uabgrid.uab.edu/wiki/SGE-SLURM here] or see Slurm's official [http://slurm.schedmd.com/rosetta.pdf documentation].
