Cheaha Quick Start

NOTE: This page is still under development. Please refer to the Getting Started page for detailed documentation.

Cheaha is a shared cluster computing environment for UAB researchers. Cheaha offers a total of 110 TFLOPS of compute power, 4.7 PB of high-performance storage, and 20 TB of memory. See Hardware for more details on the compute platform, but first let's get started with an example and see how easy it is to use.

If you have any questions about Cheaha usage, please submit a request for support to the Research Computing team (log in to the support portal and click "Request this Service").

Logging In

More detailed login instructions are also available.

Most users will authenticate to Cheaha with their BlazerID and associated password using an SSH (Secure Shell) client. The basic syntax is as follows:

ssh BLAZERID@cheaha.rc.uab.edu

Hello Cheaha!

A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster and provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application code, are specified in a single file commonly referred to as a 'job script'. The following simple job script prints the assigned compute node's hostname and then sleeps for a minute:

#!/bin/bash
#
# Job name and the file that captures the job's output
#SBATCH --job-name=test
#SBATCH --output=res.txt
# Resource requests: one task on the express partition,
# a 10-minute time limit, and 100 MB of memory per CPU
#SBATCH --ntasks=1
#SBATCH --partition=express
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
# Email YOUR_EMAIL_ADDRESS if the job fails
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS

# The commands below run on the assigned compute node
srun hostname
srun sleep 60


Lines starting with '#SBATCH' have a special meaning in the Slurm world: Slurm-specific configuration options follow the '#SBATCH' characters. The options above are useful for most job scripts; for additional options, refer to the Slurm command manuals (e.g. man sbatch). A job script is submitted to the cluster using Slurm-specific commands. Many commands are available, but the following three are the most common:

  • sbatch - submit a job
  • scancel - cancel a job
  • squeue - view job status

We can submit the above job script using the sbatch command:

$ sbatch HelloCheaha.sh
Submitted batch job 52707
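
You can check on or cancel the submitted job using the number that sbatch reports:

$ squeue -u $USER        # list your pending and running jobs
$ scancel 52707          # cancel the job with the given number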

When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the example above). The job number is available inside the job script through the environment variable $SLURM_JOB_ID, which can be used to create job-related directory structures or file names.
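
For example, a job script might use $SLURM_JOB_ID to give each run its own working directory (a minimal sketch; the directory layout is illustrative):

#!/bin/bash
#SBATCH --job-name=jobdir
#SBATCH --ntasks=1
#SBATCH --partition=express
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

# Slurm sets $SLURM_JOB_ID for every job; use it to build a per-job path
WORKDIR="$HOME/jobs/$SLURM_JOB_ID"   # illustrative location
mkdir -p "$WORKDIR"
cd "$WORKDIR"
srun hostname > hostname.txt         # output lands in the per-job directory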

Software

Cheaha's software stack includes many popular scientific computing packages.

These packages can be included in a job environment using environment modules. Environment modules make modifying environment variables easy and repeatable.
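
A typical module workflow looks like this (the package name here is illustrative; module avail shows what is actually installed):

$ module avail          # list software installed on the cluster
$ module load MATLAB    # add a package to your job environment (illustrative name)
$ module list           # show the modules currently loaded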

Storage

During 2016, as part of an Alabama Innovation Fund grant and in partnership with numerous departments, 6.6 PB of raw GPFS storage on DDN SFA12KX hardware was added to meet the growing data needs of UAB researchers.

More details can be found on the Storage Resources page.

Graphical Interface

Some applications use a graphical interface for certain actions (e.g. submit buttons, file selection dialogs). Cheaha supports graphical applications through an interactive X-Windows session started with Slurm's sinteractive command. This allows you to run graphical applications like MATLAB or AFNI on Cheaha. Refer to Interactive Resources for details on running graphical X-Windows applications.
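
A typical graphical session might look like the following (this assumes your local machine runs an X server; the package and application names are illustrative):

$ ssh -X BLAZERID@cheaha.rc.uab.edu   # -X enables X11 forwarding
$ sinteractive                        # start an interactive session on a compute node
$ module load MATLAB                  # illustrative package name
$ matlab                              # launch the graphical application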

Scheduling Policies and Partitions

The primary job scheduler on Cheaha is Slurm.

The following partitions (queues, in SGE terms) are currently available on Cheaha via Slurm:

       express (default partition): Priority 2 :: Max Runtime 2 hours
       short: Priority 2 :: Max Runtime 12 hours
       medium: Priority 4 :: Max Runtime 50 hours
       long: Priority 6 :: Max Runtime 159 hours (6 days 6 hours)
       interactive: Priority 10 :: Max Runtime 2 hours 

In order to run a job in a partition other than "express", you'll need to request it explicitly using the --partition argument (e.g. --time=48:00:00 --partition=medium).
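
For example, to submit the earlier job script to the medium partition with a 48-hour limit, the flags can be given on the command line instead of in the script:

$ sbatch --time=48:00:00 --partition=medium HelloCheaha.sh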

Graphical jobs can be run as interactive jobs using the sinteractive command.

Support

If you have any questions about our documentation or need any help with Cheaha, please submit a request for support (log in to the support portal and click "Request this Service").

Cheaha is maintained by UAB IT's Research Computing team.