Cheaha - User contributions [en] (MediaWiki 1.38.2 feed for Mhanby@uab.edu, retrieved 2024-03-19T13:10:49Z)
https://docs.uabgrid.uab.edu/w/api.php?action=feedcontributions&user=Mhanby%40uab.edu&feedformat=atom

Simplified MATLAB Install (Mhanby@uab.edu, 2021-06-25T18:20:11Z)
https://docs.uabgrid.uab.edu/w/index.php?title=Simplified_MATLAB_Install&diff=6198
<hr />
<div>{{MatlabAppPage}}<br />
<br />
== MATLAB Simplified Install ==<br />
<br />
===Download and Installation===<br />
Follow the instructions detailed in [[Downloading and Installing MATLAB]]. An outline of the installation steps is as follows:<br />
# [http://www.mathworks.com/accesslogin/createProfile.do Create an account at the Mathworks site] using your campus @uab.edu email address. Please do not share your mathworks account ''username'' or ''password'' with anyone as this account will be associated with the UAB TAH license.<br />
#Request an [[Get a UAB Mathworks key |activation key]] from the [http://www.uab.edu/it/software/index.php UAB software library page]. Please make sure to request the appropriate key (faculty/staff or student), as the two groups are covered by different licenses.<br />
#[[Downloading_and_Installing_MATLAB#Associate_with_the_UAB_TAH_license|Associate your Mathworks account]] with the campus-wide MATLAB license using your activation key.<br />
#[[Downloading_and_Installing_MATLAB#Download_Matlab|Download the software]] from the [http://www.mathworks.com/downloads/web_downloads/agent_check?s_cid=mwa-cmlndedl&mode=gwylf&refer=mwa mathworks download site] and [[Downloading_and_Installing_MATLAB#Install_Matlab|install MATLAB]]<br />
<br />
===Activation===<br />
Activate the software: add the following content to the network.lic file in the installation directory, then start MATLAB.<br />
SERVER lmgr.rc.uab.edu 27000<br />
USE_SERVER<br />
<br />
By default, the license file location is $MATLAB\licenses (where $MATLAB is the installation directory) for all platforms, and the license file name will be network.lic.<br />
<br />
On Macs, this file is located at /Applications/MATLAB_R<VERSION>.app/licenses/network.lic<br />
<br />
On Linux systems it is usually located at /usr/local/matlabR<VERSION>/licenses/network.lic (assuming the install location is /usr/local)<br />
<br />
'''NOTE:''' If the network.lic license file does not already exist on your system, you must create the network.lic file described above. <br />
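<br />
A minimal sketch for creating this file on Linux from a terminal (the '''R2021a''' version folder and '''/usr/local''' prefix below are only examples; adjust the path to match your actual MATLAB version and install location):<br />
<pre><br />
# create network.lic pointing at the UAB license server (root needed for a system-wide install)<br />
sudo tee /usr/local/matlabR2021a/licenses/network.lic > /dev/null << 'EOF'<br />
SERVER lmgr.rc.uab.edu 27000<br />
USE_SERVER<br />
EOF<br />
</pre><br />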
<br />
<br />
'''NOTE:''' Please engage in the [[Talk:Simplified MATLAB Install|Discussion]] to improve documentation of the above steps.<br />
<br />
===Installation Help===<br />
If you get stuck or have problems with the ''self-service'' installation, you may contact support@listserv.uab.edu for installation or usage issues, or [mailto:support@mathworks.com support@mathworks.com]. Mathworks will want your license number: 678600. <br />
<br />
You must be faculty, staff, post-doc, or a grad student using Mathworks software for UAB research activities to activate this software.<br />
<br />
== Common Issues ==<br />
=== libXp.so.6 Error when starting MATLAB on a Red Hat / Fedora workstation ===<br />
When executing 'matlab' on a Red Hat / Fedora workstation you may receive the following error:<br />
<br />
'''bin/glnxa64/MATLAB: error while loading shared libraries: libXp.so.6: cannot open shared object file: No such file or directory'''<br />
<br />
The solution is to install the libXp package (this will require root privileges)<br />
$ sudo yum install libXp<br />
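<br />
On newer Fedora and RHEL-based releases that use '''dnf''' instead of '''yum''', the equivalent command (assuming the libXp package is still available in your configured repositories) is:<br />
 $ sudo dnf install libXp<br />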
<br />
<br />
{{MATLAB Support}}<br />
<br />
[[Category:MATLAB]][[Category:MATLAB installation]]</div>

Anaconda (Mhanby@uab.edu, 2020-12-22T17:11:30Z)
https://docs.uabgrid.uab.edu/w/index.php?title=Anaconda&diff=6116
<hr />
<div>[https://conda.io/docs/user-guide/overview.html Conda] is a powerful package manager and environment manager. Conda allows you to maintain distinct environments for your different projects, with dependency packages defined and installed for each project.<br />
<br />
===Creating a Conda virtual environment===<br />
As a first step, direct conda to store its files in $USER_DATA to avoid filling up $HOME. Create the '''$HOME/.condarc''' file by running the following code:<br />
<pre><br />
cat << "EOF" > ~/.condarc<br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
EOF<br />
</pre><br />
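<br />
Once you have loaded an Anaconda module (next step), you can confirm that conda picked up the new locations by checking the '''package cache''' and '''envs directories''' entries in the output of:<br />
<pre><br />
$ conda info<br />
</pre><br />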
<br />
Load one of the conda environments available on Cheaha (Note, starting with Anaconda 2018.12, Anaconda releases changed to using YYYY.MM format for version numbers):<br />
<pre><br />
$ module -t avail Anaconda<br />
...<br />
Anaconda3/5.3.0<br />
Anaconda3/5.3.1<br />
Anaconda3/2019.10<br />
<br />
</pre><br />
<pre><br />
$ module load Anaconda3/2019.10 <br />
</pre><br />
<br />
Once you have loaded Anaconda, you can create an environment using the following command (change '''test_env''' to whatever you want to name your environment):<br />
<pre><br />
$ conda create --name test_env<br />
<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: ~/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- setuptools<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
python-3.7.0 | h6e4f718_3 30.6 MB<br />
wheel-0.32.1 | py37_0 35 KB<br />
setuptools-40.4.3 | py37_0 556 KB<br />
------------------------------------------------------------<br />
Total: 31.1 MB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
<br />
Downloading and Extracting Packages<br />
python-3.7.0 | 30.6 MB | ########################################################################### | 100%<br />
wheel-0.32.1 | 35 KB | ########################################################################### | 100%<br />
setuptools-40.4.3 | 556 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
#<br />
# To activate this environment, use:<br />
# > source activate test_env<br />
#<br />
# To deactivate an active environment, use:<br />
# > source deactivate<br />
#<br />
</pre><br />
<br />
You can also specify the packages that you want to install in the conda virtual environment:<br />
<pre><br />
$ conda create --name test_env PACKAGE_NAME<br />
</pre><br />
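<br />
For example (the package names and the Python version below are purely illustrative), you can pin a Python version and request several packages in one step; conda will resolve and install their dependencies into the new environment:<br />
<pre><br />
$ conda create --name test_env python=3.7 numpy pandas<br />
</pre><br />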
<br />
===Listing all your conda virtual environments===<br />
In case you forget the names of your virtual environments, you can list them all by running '''conda env list'''<br />
<pre><br />
$ conda env list<br />
# conda environments:<br />
#<br />
jupyter_test ~/.conda/envs/jupyter_test<br />
modeller ~/.conda/envs/modeller<br />
psypy3 ~/.conda/envs/psypy3<br />
test ~/.conda/envs/test<br />
test_env ~/.conda/envs/test_env<br />
test_pytorch ~/.conda/envs/test_pytorch<br />
tomopy ~/.conda/envs/tomopy<br />
base * /share/apps/rc/software/Anaconda3/5.2.0<br />
DeepNLP /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP<br />
ubrite-jupyter-base-1.0 /share/apps/rc/software/Anaconda3/5.2.0/envs/ubrite-jupyter-base-1.0<br />
</pre><br />
NOTE: The virtual environment with the asterisk (*) next to it is the one that is currently active.<br />
<br />
===Activating a conda virtual environment===<br />
You can activate your virtual environment for use by running '''source activate''' followed by '''conda activate ENV_NAME'''<br />
<br />
<pre><br />
$ source activate<br />
$ conda activate test_env<br />
(test_env) $<br />
</pre><br />
<br />
NOTE: Your shell prompt will also include the name of the virtual environment that you activated.<br />
<br />
<br />
'''IMPORTANT!'''<br />
<br />
The following only applies to versions prior to 2019.10. '''source activate <env>''' is not idempotent. Using it twice with the same environment in a given session can lead to unexpected behavior. The recommended workflow is to use '''source activate''' to source the '''conda activate''' script, followed by '''conda activate <env>'''.<br />
<br />
From version 2019.10 and on, simply use '''conda activate <env>'''.<br />
<br />
===Locate and install packages===<br />
Conda allows you to search for packages that you want to install:<br />
<pre><br />
(test_env) $ conda search BeautifulSoup4<br />
Loading channels: done<br />
# Name Version Build Channel<br />
beautifulsoup4 4.4.0 py27_0 pkgs/free<br />
beautifulsoup4 4.4.0 py34_0 pkgs/free<br />
beautifulsoup4 4.4.0 py35_0 pkgs/free<br />
...<br />
beautifulsoup4 4.6.3 py35_0 pkgs/main<br />
beautifulsoup4 4.6.3 py36_0 pkgs/main<br />
beautifulsoup4 4.6.3 py37_0 pkgs/main<br />
(test_env) $<br />
</pre><br />
NOTE: Search is case-insensitive<br />
<br />
You can install packages into the active conda environment using '''conda install''':<br />
<pre><br />
(test_env) $ conda install beautifulsoup4<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: ~/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- beautifulsoup4<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
beautifulsoup4-4.6.3 | py37_0 138 KB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
<br />
Proceed ([y]/n)? y<br />
<br />
<br />
Downloading and Extracting Packages<br />
beautifulsoup4-4.6.3 | 138 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
(test_env) $<br />
</pre><br />
<br />
===Deactivating your virtual environment===<br />
You can deactivate your virtual environment using '''source deactivate'''<br />
<pre><br />
(test_env) $ source deactivate<br />
$<br />
</pre><br />
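<br />
On Anaconda 2019.10 and later (which ship a recent version of conda), the corresponding command is '''conda deactivate'''; a minimal example:<br />
<pre><br />
(test_env) $ conda deactivate<br />
$<br />
</pre><br />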
<br />
===Sharing an environment===<br />
You may want to share your environment with someone for testing or other purposes. Sharing the environment file for your virtual environment is the most straightforward method, and it allows the other person to quickly create an environment identical to yours.<br />
====Export environment====<br />
* Activate the virtual environment that you want to export.<br />
* Export an environment.yml file<br />
<pre><br />
conda env export -n test_env > environment.yml<br />
</pre><br />
* Now you can send the recently created environment.yml file to the other person.<br />
<br />
====Create a virtual environment using environment.yml====<br />
<pre><br />
conda env create -f environment.yml -n test_env<br />
</pre><br />
<br />
===Delete a conda virtual environment===<br />
You can use conda's '''remove''' subcommand to delete a conda virtual environment that you no longer need:<br />
<pre><br />
$ conda remove --name test_env --all<br />
<br />
Remove all packages in environment ~/.conda/envs/test_env:<br />
<br />
<br />
## Package Plan ##<br />
<br />
environment location: ~/.conda/envs/test_env<br />
<br />
<br />
The following packages will be REMOVED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
</pre><br />
<br />
===Moving conda directory===<br />
As you build new conda environments, you may find that they take up a lot of space in your $HOME directory. Here are two methods for relocating them:<br />
<br />
Method 1: Move a pre-existing conda directory and create a symlink<br />
<pre><br />
cd ~<br />
mv ~/.conda $USER_DATA/<br />
ln -s $USER_DATA/.conda .conda<br />
</pre><br />
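<br />
You can confirm that the symlink is in place (the link target will reflect your own $USER_DATA path):<br />
<pre><br />
ls -ld ~/.conda<br />
</pre><br />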
<br />
Method 2: Create a "$HOME/.condarc" file in the $HOME directory by running the following code<br />
<pre><br />
cat << "EOF" > ~/.condarc<br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
EOF<br />
</pre></div>

Python Virtual Environment (Mhanby@uab.edu, 2020-12-18T19:28:36Z)
https://docs.uabgrid.uab.edu/w/index.php?title=Python_Virtual_Environment&diff=6115
<hr />
<div>A Python virtual environment is an isolated environment for Python projects. It enables each project to have its own dependencies, regardless of what dependencies every other project has. To read more about Python virtual environments, click [https://docs.python.org/3/tutorial/venv.html here].<br />
<br />
===Creating a Python Virtual Environment===<br />
Load one of the Python modules available on Cheaha in your environment.<br />
<pre><br />
[snoopy@c1 ~]$ module avail Python<br />
<br />
-------------------------- /share/apps/rc/modules/all --------------------------<br />
Python/2.7.10-goolf-1.7.20 Python/2.7.13-intel-2017a<br />
Python/2.7.10-intel-2015b Python/2.7.3-foss-2016a<br />
Python/2.7.11-foss-2016a Python/2.7.3-goolf-1.7.20<br />
Python/2.7.11-foss-2016b Python/2.7.5-goolf-1.7.20<br />
Python/2.7.11-goolf-1.7.20 Python/2.7.8-intel-2015b<br />
Python/2.7.11-intel-2015b Python/2.7.9-goolf-1.7.20<br />
Python/2.7.11-intel-2016a Python/2.7.9-intel-2015b<br />
Python/2.7.12-foss-2016a Python/3.2.3-goolf-1.7.20<br />
Python/2.7.12-foss-2016b Python/3.5.1-foss-2016a<br />
Python/2.7.12-intel-2015b Python/3.5.1-intel-2016a<br />
Python/2.7.12-intel-2016a Python/3.6.1-intel-2017a<br />
Python/2.7.13-GCCcore-6.3.0-bare Python/3.6.3-intel-2017a<br />
</pre><br />
<br />
Once you have loaded Python, use '''virtualenv''' to create and manage virtual environments.<br />
<pre><br />
[snoopy@c1 Python_Environments]$ module load Python/3.6.3-intel-2017a <br />
[snoopy@c1 Python_Environments]$ virtualenv test_environment<br />
Using base prefix '/share/apps/rc/software/Python/3.6.3-intel-2017a'<br />
New python executable in /data/user/snoopy/Python_Environments/test_environment/bin/python<br />
Installing setuptools, pip, wheel...done.<br />
[snoopy@c1 Python_Environments]$<br />
</pre><br />
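<br />
The prompts above assume a working directory set aside for virtual environments ('''/data/user/snoopy/Python_Environments''' is just an example path); a minimal sketch for setting one up under '''$USER_DATA''':<br />
<pre><br />
mkdir -p $USER_DATA/Python_Environments<br />
cd $USER_DATA/Python_Environments<br />
</pre><br />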
<br />
===Activating a Virtual Environment===<br />
Once a virtual environment has been created, you need to activate it in order to work inside it.<br />
<pre><br />
[snoopy@c1 Python_Environments]$ source test_environment/bin/activate<br />
(test_environment) [snoopy@c1 Python_Environments]$<br />
</pre><br />
Activating the virtual environment changes your shell's prompt to show which virtual environment you're using ('''test_environment''' in the case above) and modifies your environment so that Python packages are installed into that particular environment.<br />
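<br />
A quick way to confirm that the environment is active is to check which Python interpreter is on your PATH; it should resolve to the environment's own '''bin''' directory (the path below matches the example environment created above and will differ on your system):<br />
<pre><br />
(test_environment) [snoopy@c1 Python_Environments]$ which python<br />
/data/user/snoopy/Python_Environments/test_environment/bin/python<br />
</pre><br />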
<br />
===Maintaining a Virtual Environment===<br />
After this, you can install the packages that you would like for this environment using '''pip'''. [https://pip.pypa.io/en/stable/ pip] is a package management system used to install and manage software packages written in Python.<br />
<pre><br />
(test_environment) [snoopy@c1 Python_Environments]$ pip install numpy<br />
Collecting numpy<br />
Downloading numpy-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (17.2MB)<br />
100% |████████████████████████████████| 17.2MB 77kB/s <br />
Installing collected packages: numpy<br />
Successfully installed numpy-1.14.0<br />
(test_environment) [snoopy@c1 Python_Environments]$ ls test_environment/lib/python3.6/site-packages/<br />
easy_install.py pip-9.0.1.dist-info setuptools-38.4.0.dist-info<br />
numpy pkg_resources wheel<br />
numpy-1.14.0.dist-info __pycache__ wheel-0.30.0.dist-info<br />
pip setuptools<br />
(test_environment) [snoopy@c1 Python_Environments]$<br />
</pre><br />
<br />
You can use this method to install a Python application alongside all the dependencies that it requires. <br />
<br />
===Deactivating a Virtual Environment===<br />
After you are done using the virtual environment, you can use the '''deactivate''' command to go back to your bash shell environment.<br />
<pre><br />
(test_environment) [snoopy@c1 Python_Environments]$ deactivate <br />
[snoopy@c1 Python_Environments]$<br />
</pre><br />
This will change your shell's prompt back, removing the name of the virtual environment that you were in.<br />
<br />
===Sharing a virtual environment===<br />
You can use '''pip freeze''' to list all the packages in a virtual environment and save the list to a '''requirements.txt''' file:<br />
<pre><br />
pip freeze > requirements.txt<br />
</pre><br />
<br />
Now you (or a collaborator) can create a new virtualenv and, after activating that virtual environment, install all the packages using the following command.<br />
<pre><br />
pip install -r requirements.txt<br />
</pre></div>

Collaborator Account (Mhanby@uab.edu, 2020-12-07T21:32:48Z: Added XIAS VPN and Duo instructions)
https://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=6114
<hr />
<div>This page describes the process for a UAB employee to request a Cheaha account for an external collaborator (i.e. a person who does not have a UAB BlazerID).<br />
<br />
==Create XIAS Account==<br />
XIAS accounts, or external access accounts, allow UAB employees to sponsor external collaborators to use some of the UAB resources to which the sponsoring user has been granted access. Sponsoring a XIAS account is a self-service process: you can sponsor and create an account for your collaborator at the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''By going through the sponsorship process, you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
The [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website] has two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage '''Projects/sites'''<br />
*Click '''New''' to create a new site<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date <br />
** '''End date''' is the expiration date for the site; the users added in the next section cannot have an expiration date beyond the site's '''End date'''. The dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers, then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access. Add the following for Cheaha access:<br />
*** '''https://rc.uab.edu'''<br />
* Click on '''Add''' button to create the site<br />
<br />
=== Create a user ===<br />
Once the new site has been created:<br />
<br />
* Click [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel<br />
* In the drop-down select your XIAS site<br />
* Click the '''Register''' button to add new users<br />
** Enter an end date for the new site user(s) in the format '''YYYY-MM-DD'''. The date cannot extend past the site's end date!<br />
** Enter the collaborator's email address in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator ===<br />
Inform the collaborator(s) to expect emails containing instructions from the following addresses:<br />
* '''UAB Identity Management''' (userservices@uab.edu) <br />
* '''UAB External ID admin''' (ph-admin@uab.edu)<br />
<br />
They will need to complete the process within '''72 hours''' of receipt of the email(s)!<br />
<br />
==Request an account on Cheaha==<br />
Once the steps above for adding/sponsoring a XIAS account for your collaborator have been completed, send us an email at support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and the email address you used to create their XIAS account, as it will become their username on [[Cheaha]].<br />
<br />
==UAB VPN Access==<br />
If the collaborators will need to access the UAB VPN, you will need to add the following to your project's '''URIs''' list:<br />
<br />
* '''https://vpn.ad.uab.edu/'''<br />
<br />
The account used to log in to the UAB VPN uses a different format than their email address. It can be found by going to '''Manage Users''', selecting your '''Project Site''', and clicking '''List'''. The VPN account names are listed under the column '''AD account''' and use the syntax '''XXXX-XXXXX-X''', e.g. '''xias-jdoe-3'''.<br />
<br />
===UAB 2 Factor Authentication (2FA)===<br />
UAB VPN now requires the use of UAB 2FA (provided by Duo). The collaborator will need to call AskIT at '''(205) 996-5555''' to register a device for Duo before they'll be able to connect to VPN.<br />
<br />
After they've been registered with Duo, the collaborator will need to install the Cisco VPN client: [https://www.uab.edu/it/home/tech-solutions/network/vpn]<br />
<br />
Once installed, they'll launch '''Cisco AnyConnect''', enter '''vpn.uab.edu''' in the server field, and click '''Connect'''. Their user name / account will be in the previously mentioned '''XXXX-XXXXX-X''' format, not their email address (they should have received an email with this account ID).</div>

SSH Key Authentication (Mhanby@uab.edu, 2020-05-28T15:29:45Z)
https://docs.uabgrid.uab.edu/w/index.php?title=SSH_Key_Authentication&diff=6085
<hr />
<div>== SSH Key Generation ==<br />
These instructions assist new Cheaha users to access the cluster using an SSH client.<br />
<br />
===Mac OS X===<br />
<br />
* On your Mac, open the '''Terminal''' application. <br />
* Run the following command in your '''terminal''': <br />
<pre><br />
ssh-keygen -t rsa<br />
</pre> <br />
* You can put a passphrase for your SSH key (''' Not mandatory but highly recommended''')<br />
* An '''id_rsa.pub''' file will have been created (in '''~/.ssh/''' by default).<br />
* Open the file by running '''less .ssh/id_rsa.pub''' and copy the content.<br />
* Press '''q''' to exit out of the file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account and paste the content into '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''log out''' of cheaha.rc.uab.edu and log in again. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time setup; from then on you should be able to directly run '''ssh blazerid@cheaha.rc.uab.edu'''.<br />
<br />
===Linux===<br />
<br />
* On your Linux machine, open the '''Terminal''' application. <br />
* Run the following command in your '''terminal''': <br />
<pre><br />
ssh-keygen -t rsa<br />
</pre> <br />
* You can put a passphrase for your SSH key (''' Not mandatory but highly recommended''')<br />
* An '''id_rsa.pub''' file will have been created (in '''~/.ssh/''' by default).<br />
* Open the file by running '''less .ssh/id_rsa.pub''' and copy the content.<br />
* Press '''q''' to exit out of the file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account and paste the content into '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''log out''' of cheaha.rc.uab.edu and log in again. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time setup; from then on you should be able to directly run '''ssh blazerid@cheaha.rc.uab.edu'''.<br />
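<br />
If your system provides the '''ssh-copy-id''' utility (standard with OpenSSH on most Linux distributions and recent macOS releases), the copy-and-paste steps above can be replaced with a single command that appends your public key to '''~/.ssh/authorized_keys''' on Cheaha. If key authentication still prompts for a password afterwards, make sure the permissions on the remote side are strict enough for sshd to accept the key:<br />
<pre><br />
# run on your local machine<br />
ssh-copy-id blazerid@cheaha.rc.uab.edu<br />
<br />
# run on Cheaha if you are still prompted for a password<br />
chmod 700 ~/.ssh<br />
chmod 600 ~/.ssh/authorized_keys<br />
</pre><br />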
<br />
===Windows===<br />
<br />
====Putty====<br />
<br />
You will need a tool called '''puttygen''' to generate an SSH key pair. You can download it [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html here]. Once you have downloaded and installed '''putty''' and '''puttygen''', follow these instructions:<br />
<br />
* Launch PuTTY Key Generator.<br />
* Click the Generate button and move the mouse around the blank area of the window to generate randomness for the key.<br />
* Enter a unique key passphrase in the Key passphrase and Confirm passphrase fields.<br />
* Save the public and private keys by clicking the Save public key and Save private key buttons.<br />
* Right click the field '''Public key for pasting into OpenSSH authorized_keys file''', choose '''Select All''', right click again and select Copy<br />
* Now open application '''Putty'''.<br />
* Set up your session for '''cheaha.rc.uab.edu''' in PuTTy. (If you don't know how, follow these [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#PuTTY instructions]).<br />
* Login to your Cheaha account.<br />
* Paste the content of the '''Public key''' that you previously copied to the clip board in '''Puttygen''' into the '''~/.ssh/authorized_keys''' file using your favorite editor.<br />
* Now select your saved session for '''cheaha.rc.uab.edu'''.<br />
* Click '''Connection > SSH > Auth''' in the left-hand navigation pane and configure the private key to use by clicking Browse under Private key file for authentication.<br />
* Navigate to the location where you saved your private key earlier, select the file, and click Open.<br />
* The private key path is now displayed in the Private key file for authentication field.<br />
* Click Session in the left-hand navigation pane and click '''Save''' in the Load, save or delete a stored session section.<br />
* Click Open to begin your session with the server. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time setup; from then on you should be able to directly use your saved '''cheaha.rc.uab.edu''' profile.<br />
<br />
====SSH Secure Shell Client====<br />
<br />
* In SSH Secure Shell, from the '''Edit''' menu, select '''Settings...''' <br />
* In the window that opens, select '''Global Settings''', then '''User Authentication''', and then '''Keys'''.<br />
* Under "Key pair management", click Generate New.... In the window that appears, click Next.<br />
* In the Key Generation window that appears:<br />
** From the drop-down list next to '''Key Type:''', select from the following:<br />
***If you want to take less time to initially generate the key, select '''DSA'''.<br />
*** If you want to take less time during each connection for the server to verify your key, select '''RSA'''.<br />
** From the drop-down list next to '''Key Length:''', select at least '''1024'''. You may choose a greater key length, but the time it takes to generate the key, as well as the time it takes to authenticate using it, will go up.<br />
* Click '''Next'''. The key generation process will start. When it's complete, click Next again.<br />
* In the '''File Name:''' field, enter a name for the file where SSH Secure Shell will store your '''private key'''. Your '''public key''' will be stored in a file with the same name, plus a '''.pub extension'''. <br />
** '''Important:''' You can put a passphrase for your SSH key ( Not mandatory but highly recommended)<br />
* To complete the key generation process, click '''Next''', and then '''Finish'''.<br />
* At the '''Settings''' screen, click '''OK'''.<br />
* Copy the content of .pub file generated.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, following the instructions [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#SSH_Secure_Shell_Client here] , and paste the content in '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''exit/logout''' from your account on '''cheaha.rc.uab.edu''' and log in again. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time setup; from then on you should be able to directly use your saved '''cheaha.rc.uab.edu''' profile.<br />
<br />
== SSH Passphrases ==<br />
It is highly recommended that users protect their SSH key by using a passphrase (see above for SSH key generation instructions). This section explains how to use the '''ssh-add''' command to avoid having to type your passphrase each time you use SSH to connect to Cheaha.<br />
<br />
=== Linux and Mac ===<br />
Open the terminal application and run the following command (make sure to use the backtick (`) not single quotes (')). We wrap this inside of an ''if statement'' to avoid starting more than one '''ssh-agent'''; you only need one!<br />
<pre><br />
if [[ "$(pgrep -U $USER ssh-agent)" == "" ]]; then eval `ssh-agent`; fi<br />
</pre><br />
<br />
Then run the ssh-add command to load your SSH key (if you have multiple keys, you can specify the specific key to use by providing the path and file name: '''ssh-add ~/.ssh/id_rsa''')<br />
<pre><br />
ssh-add<br />
</pre><br />
<br />
You can list the SSH public keys that are currently represented by the agent by running this command.<br />
<pre><br />
ssh-add -L<br />
</pre><br />
<br />
Enter your private key passphrase. Now your passphrase is stored and you'll be able to SSH to Cheaha without being prompted for the passphrase.<br />
<pre><br />
ssh cheaha.rc.uab.edu<br />
</pre></div>

Welcome (Mhanby@uab.edu, 2020-05-26T17:47:55Z)
https://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=6082
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
'''docs''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date, please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes, and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Data Backups ==<br />
<br />
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.<br />
<br />
{{ClusterDataBackup}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance compute (HPC) and storage resources for investigators. The Cheaha compute cluster provides over 3744 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network, and provides 528 TFLOP/s of aggregate theoretical peak performance. A high performance, 6.6PB (raw) GPFS file system on a DDN SFA14KX cluster, with site replication to a DDN SFA12KX cluster, is also connected to the compute nodes via an InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with both a web-based interface and a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. The local compute pool provides access to three generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The second generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and EDR InfiniBand interconnect. The third generation is composed of 35 nodes with EDR InfiniBand interconnect: 2x12 core (840 cores total) 2.60GHz Intel Xeon Gold 6126 compute nodes with 21 compute nodes at 192GB RAM, 10 nodes at 768GB RAM and 4 nodes at 1.5TB of RAM. The compute nodes combine to provide over 528 TFLOP/s of dedicated computing power.<br />
<br />
In addition UAB researchers also have access to regional and national HPC resources such as Alabama Supercomputer Authority (ASA), XSEDE and Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by high performance, 6.6PB raw GPFS storage on DDN SFA14KX hardware connected via an InfiniBand fabric. The non-scratch files on the GPFS cluster are replicated to 6.6PB raw storage on a DDN SFA12KX located in another building to provide site redundancy. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed, secure connectivity between the nodes connected to it for high speed file transfer of very large data sets without the concern of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network consists of a secure Science DMZ with data transfer nodes (DTNs), perfSONAR measurement nodes, and a Bro security node connected directly to the border router, which together provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, as well as to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect-manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
{{Grant_Ack}}</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=6081Welcome2020-05-26T17:44:45Z<p>Mhanby@uab.edu: /* Description of Cheaha for Grants (short) */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes, and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, Simulink, and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Data Backups ==<br />
<br />
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.<br />
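<br />
As one possible approach (the destination host and paths below are purely illustrative, not a service provided by Research Computing), important directories can be copied to a system you control with a standard tool such as rsync:<br />
<pre><br />
# Copy a project directory from Cheaha to an external backup host (illustrative hostname and paths)<br />
$ rsync -av /data/user/$USER/my_project/ username@backup.example.edu:/backups/my_project/<br />
</pre><br />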
<br />
{{ClusterDataBackup}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance computing (HPC) and storage resources for investigators. The Cheaha compute cluster provides over 3744 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network, delivering 528 TFLOP/s of aggregate theoretical peak performance. A high performance GPFS file system with 6.6PB of raw storage on a DDN SFA14KX cluster, with site replication to a DDN SFA12KX cluster, is also connected to the compute nodes via an InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with both a web-based interface and a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. The local compute pool provides access to three generations of compute hardware based on the x86 64-bit architecture. The first generation includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with an FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The second generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and an EDR InfiniBand interconnect. The third generation is composed of 35 nodes with an EDR InfiniBand interconnect: 2x12 core (840 cores total) 2.60GHz Intel Xeon Gold 6126 compute nodes, with 21 nodes at 192GB RAM, 10 nodes at 768GB RAM, and 4 nodes at 1.5TB of RAM. The compute nodes combine to provide over 528 TFLOP/s of dedicated computing power.<br />
<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by a high-performance GPFS file system with 6.6PB of raw storage on DDN SFA14KX hardware, connected via an InfiniBand fabric. The non-scratch files on the GPFS cluster are replicated to 6.6PB of raw storage on a DDN SFA12KX system located in another building to provide site redundancy. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed, secure connectivity between the nodes connected to it for high speed file transfer of very large data sets without the concern of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network consists of a secure Science DMZ with data transfer nodes (DTNs), perfSONAR measurement nodes, and a Bro security node connected directly to the border router, which together provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, as well as to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect-manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
{{Grant_Ack}}</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=6080Welcome2020-05-26T17:40:55Z<p>Mhanby@uab.edu: /* Compute Resources */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes, and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, Simulink, and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Data Backups ==<br />
<br />
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.<br />
<br />
{{ClusterDataBackup}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance computing and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network, delivering 468 TFLOP/s of aggregate theoretical peak performance. A high-performance GPFS file system with 6.6PB of raw storage on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with both a web-based interface and a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. The local compute pool provides access to three generations of compute hardware based on the x86 64-bit architecture. The first generation includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with an FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The second generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and an EDR InfiniBand interconnect. The third generation is composed of 35 nodes with an EDR InfiniBand interconnect: 2x12 core (840 cores total) 2.60GHz Intel Xeon Gold 6126 compute nodes, with 21 nodes at 192GB RAM, 10 nodes at 768GB RAM, and 4 nodes at 1.5TB of RAM. The compute nodes combine to provide over 528 TFLOP/s of dedicated computing power.<br />
<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by a high-performance GPFS file system with 6.6PB of raw storage on DDN SFA14KX hardware, connected via an InfiniBand fabric. The non-scratch files on the GPFS cluster are replicated to 6.6PB of raw storage on a DDN SFA12KX system located in another building to provide site redundancy. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed, secure connectivity between the nodes connected to it for high speed file transfer of very large data sets without the concern of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network consists of a secure Science DMZ with data transfer nodes (DTNs), perfSONAR measurement nodes, and a Bro security node connected directly to the border router, which together provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, as well as to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect-manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
{{Grant_Ack}}</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=6079Welcome2020-05-26T17:40:00Z<p>Mhanby@uab.edu: /* Storage Resources */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes, and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, Simulink, and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Data Backups ==<br />
<br />
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.<br />
<br />
{{ClusterDataBackup}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance computing and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network, delivering 468 TFLOP/s of aggregate theoretical peak performance. A high-performance GPFS file system with 6.6PB of raw storage on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with both a web-based interface and a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. The local compute pool provides access to three generations of compute hardware based on the x86 64-bit architecture. The first generation includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with an FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The second generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and an EDR InfiniBand interconnect. The third generation is composed of 35 nodes: 2x12 core (840 cores total) 2.60GHz Intel Xeon Gold 6126 compute nodes, with 21 nodes at 192GB RAM, 10 nodes at 768GB RAM, and 4 nodes at 1.5TB of RAM. The compute nodes combine to provide over 528 TFLOP/s of dedicated computing power.<br />
<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by a high-performance GPFS file system with 6.6PB of raw storage on DDN SFA14KX hardware, connected via an InfiniBand fabric. The non-scratch files on the GPFS cluster are replicated to 6.6PB of raw storage on a DDN SFA12KX system located in another building to provide site redundancy. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed, secure connectivity between the nodes connected to it for high speed file transfer of very large data sets without the concern of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network consists of a secure Science DMZ with data transfer nodes (DTNs), perfSONAR measurement nodes, and a Bro security node connected directly to the border router, which together provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, as well as to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect-manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
{{Grant_Ack}}</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=6078Welcome2020-05-26T17:36:34Z<p>Mhanby@uab.edu: /* Compute Resources */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes, and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, Simulink, and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Data Backups ==<br />
<br />
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.<br />
<br />
{{ClusterDataBackup}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance computing and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network, delivering 468 TFLOP/s of aggregate theoretical peak performance. A high-performance GPFS file system with 6.6PB of raw storage on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with both a web-based interface and a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. The local compute pool provides access to three generations of compute hardware based on the x86 64-bit architecture. The first generation includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with an FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The second generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and an EDR InfiniBand interconnect. The third generation is composed of 35 nodes: 2x12 core (840 cores total) 2.60GHz Intel Xeon Gold 6126 compute nodes, with 21 nodes at 192GB RAM, 10 nodes at 768GB RAM, and 4 nodes at 1.5TB of RAM. The compute nodes combine to provide over 528 TFLOP/s of dedicated computing power.<br />
<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by a high-performance GPFS file system with 6.6PB of raw storage on DDN SFA12KX hardware, connected via the InfiniBand fabric. An expansion of the GPFS fabric will double the capacity and is scheduled to be online in Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed, secure connectivity between the nodes connected to it for high speed file transfer of very large data sets without the concern of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network consists of a secure Science DMZ with data transfer nodes (DTNs), perfSONAR measurement nodes, and a Bro security node connected directly to the border router, which together provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, as well as to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect-manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
{{Grant_Ack}}</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Anaconda&diff=6051Anaconda2020-03-30T15:49:15Z<p>Mhanby@uab.edu: /* Creating a Conda virtual environment */</p>
<hr />
<div>[https://conda.io/docs/user-guide/overview.html Conda] is a powerful package manager and environment manager. Conda allows you to maintain distinct environments for your different projects, with dependency packages defined and installed for each project.<br />
<br />
===Creating a Conda virtual environment===<br />
As a first step, direct conda to store its package and environment files in $USER_DATA to avoid filling up $HOME. Create the '''$HOME/.condarc''' file by running the following commands:<br />
<pre><br />
cat << "EOF" > ~/.condarc<br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
EOF<br />
</pre><br />
<br />
Load one of the Anaconda modules available on Cheaha (note: starting with Anaconda 2018.12, Anaconda releases changed to a YYYY.MM version number format):<br />
<pre><br />
$ module -t avail Anaconda<br />
...<br />
Anaconda3/5.3.0<br />
Anaconda3/5.3.1<br />
Anaconda3/2019.10<br />
<br />
</pre><br />
<pre><br />
$ module load Anaconda3/2019.10 <br />
</pre><br />
<br />
Once you have loaded Anaconda, you can create an environment using the following command (change '''test_env''' to whatever you want to name your environment):<br />
<pre><br />
$ conda create --name test_env<br />
<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- setuptools<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
python-3.7.0 | h6e4f718_3 30.6 MB<br />
wheel-0.32.1 | py37_0 35 KB<br />
setuptools-40.4.3 | py37_0 556 KB<br />
------------------------------------------------------------<br />
Total: 31.1 MB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
<br />
Downloading and Extracting Packages<br />
python-3.7.0 | 30.6 MB | ########################################################################### | 100%<br />
wheel-0.32.1 | 35 KB | ########################################################################### | 100%<br />
setuptools-40.4.3 | 556 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
#<br />
# To activate this environment, use:<br />
# > source activate test_env<br />
#<br />
# To deactivate an active environment, use:<br />
# > source deactivate<br />
#<br />
</pre><br />
<br />
You can also specify the packages that you want to install in the conda virtual environment:<br />
<pre><br />
$ conda create --name test_env PACKAGE_NAME<br />
</pre><br />
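<br />
For example, to create an environment pinned to a specific Python version with a couple of packages in one step (the package names and versions here are purely illustrative):<br />
<pre><br />
$ conda create --name test_env python=3.7 numpy pandas<br />
</pre><br />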
<br />
===Listing all your conda virtual environments===<br />
In case you forget the names of your virtual environments, you can list them all by running '''conda env list'''<br />
<pre><br />
$ conda env list<br />
# conda environments:<br />
#<br />
jupyter_test /home/ravi89/.conda/envs/jupyter_test<br />
modeller /home/ravi89/.conda/envs/modeller<br />
psypy3 /home/ravi89/.conda/envs/psypy3<br />
test /home/ravi89/.conda/envs/test<br />
test_env /home/ravi89/.conda/envs/test_env<br />
test_pytorch /home/ravi89/.conda/envs/test_pytorch<br />
tomopy /home/ravi89/.conda/envs/tomopy<br />
base * /share/apps/rc/software/Anaconda3/5.2.0<br />
DeepNLP /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP<br />
ubrite-jupyter-base-1.0 /share/apps/rc/software/Anaconda3/5.2.0/envs/ubrite-jupyter-base-1.0<br />
</pre><br />
NOTE: The virtual environment with an asterisk (*) next to it is the one that's currently active.<br />
<br />
===Activating a conda virtual environment===<br />
You can activate your virtual environment for use by running '''source activate''' followed by '''conda activate ENV_NAME'''<br />
<br />
<pre><br />
$ source activate<br />
$ conda activate test_env<br />
(test_env) $<br />
</pre><br />
<br />
NOTE: Your shell prompt will also include the name of the virtual environment that you activated.<br />
<br />
<br />
'''IMPORTANT!'''<br />
<br />
'''source activate <env>''' is not idempotent. Using it twice with the same environment in a given session can lead to unexpected behavior. The recommended workflow is to use '''source activate''' to source the '''conda activate''' script, followed by '''conda activate <env>'''.<br />
<br />
===Locate and install packages===<br />
Conda allows you to search for packages that you want to install:<br />
<pre><br />
(test_env) $ conda search BeautifulSoup4<br />
Loading channels: done<br />
# Name Version Build Channel<br />
beautifulsoup4 4.4.0 py27_0 pkgs/free<br />
beautifulsoup4 4.4.0 py34_0 pkgs/free<br />
beautifulsoup4 4.4.0 py35_0 pkgs/free<br />
...<br />
beautifulsoup4 4.6.3 py35_0 pkgs/main<br />
beautifulsoup4 4.6.3 py36_0 pkgs/main<br />
beautifulsoup4 4.6.3 py37_0 pkgs/main<br />
(test_env) $<br />
</pre><br />
NOTE: Search is case-insensitive<br />
<br />
You can install packages into the conda environment using<br />
<pre><br />
(test_env) $ conda install beautifulsoup4<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- beautifulsoup4<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
beautifulsoup4-4.6.3 | py37_0 138 KB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
<br />
Proceed ([y]/n)? y<br />
<br />
<br />
Downloading and Extracting Packages<br />
beautifulsoup4-4.6.3 | 138 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
(test_env) $<br />
</pre><br />
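<br />
If you need a particular release, conda also accepts a version specification on the package name (the version shown here is just one of those returned by the search above):<br />
<pre><br />
(test_env) $ conda install beautifulsoup4=4.6.3<br />
</pre><br />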
<br />
===Deactivating your virtual environment===<br />
You can deactivate your virtual environment using '''source deactivate'''<br />
<pre><br />
(test_env) $ source deactivate<br />
$<br />
</pre><br />
<br />
===Sharing an environment===<br />
You may want to share your environment with someone for testing or other purposes. Sharing the environment file for your virtual environment is the most straightforward method, as it allows another person to quickly create an environment identical to yours.<br />
====Export environment====<br />
* Activate the virtual environment that you want to export.<br />
* Export an environment.yml file<br />
<pre><br />
conda env export -n test_env > environment.yml<br />
</pre><br />
* Now you can send the recently created environment.yml file to the other person.<br />
<br />
====Create a virtual environment using environment.yml====<br />
<pre><br />
conda env create -f environment.yml -n test_env<br />
</pre><br />
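<br />
For reference, the exported environment.yml is a plain YAML file. A minimal, purely illustrative example looks roughly like this; the name, channels, and dependencies will reflect whatever was in the exported environment:<br />
<pre><br />
name: test_env<br />
channels:<br />
  - defaults<br />
dependencies:<br />
  - python=3.7<br />
  - beautifulsoup4=4.6.3<br />
</pre><br />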
<br />
===Delete a conda virtual environment===<br />
You can use the '''remove''' parameter of conda to delete a conda virtual environment that you don't need:<br />
<pre><br />
$ conda remove --name test_env --all<br />
<br />
Remove all packages in environment /home/ravi89/.conda/envs/test_env:<br />
<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
<br />
The following packages will be REMOVED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
</pre><br />
<br />
===Moving conda directory===<br />
As you build new conda environments, you may find that they are taking up a lot of space in your $HOME directory. Here are two methods to avoid that:<br />
<br />
Method 1: Move a pre-existing conda directory and create a symlink<br />
<pre><br />
cd ~<br />
mv ~/.conda $USER_DATA/<br />
ln -s $USER_DATA/.conda<br />
</pre><br />
<br />
Method 2: Create a "$HOME/.condarc" file by running the following code<br />
<pre><br />
cat << "EOF" > ~/.condarc<br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
EOF<br />
</pre></div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Anaconda&diff=6039Anaconda2020-03-18T15:28:37Z<p>Mhanby@uab.edu: </p>
<hr />
<div>[https://conda.io/docs/user-guide/overview.html Conda] is a powerful package manager and environment manager. Conda allows you to maintain distinct environments for your different projects, with dependency packages defined and installed for each project.<br />
<br />
===Creating a Conda virtual environment===<br />
First, direct conda to store its files in $USER_DATA to avoid filling up $HOME. Create the '''$HOME/.condarc''' file by running the following code:<br />
<pre><br />
cat << "EOF" > ~/.condarc<br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
EOF<br />
</pre><br />
<br />
Load one of the conda environments available on Cheaha:<br />
<pre><br />
$ module avail Anaconda<br />
<br />
--------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------<br />
Anaconda2/4.0.0 Anaconda2/4.2.0 Anaconda3/4.4.0 Anaconda3/5.0.1 Anaconda3/5.1.0 Anaconda3/5.2.0<br />
</pre><br />
<pre><br />
$ module load Anaconda3/5.3.1 <br />
</pre><br />
Once you have loaded Anaconda, you can create an environment using the following command:<br />
<pre><br />
$ conda create --name test_env<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- setuptools<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
python-3.7.0 | h6e4f718_3 30.6 MB<br />
wheel-0.32.1 | py37_0 35 KB<br />
setuptools-40.4.3 | py37_0 556 KB<br />
------------------------------------------------------------<br />
Total: 31.1 MB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
<br />
Downloading and Extracting Packages<br />
python-3.7.0 | 30.6 MB | ########################################################################### | 100%<br />
wheel-0.32.1 | 35 KB | ########################################################################### | 100%<br />
setuptools-40.4.3 | 556 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
#<br />
# To activate this environment, use:<br />
# > source activate test_env<br />
#<br />
# To deactivate an active environment, use:<br />
# > source deactivate<br />
#<br />
</pre><br />
<br />
You can also specify the packages that you want to install in the conda virtual environment:<br />
<pre><br />
$ conda create --name test_env PACKAGE_NAME<br />
</pre><br />
<br />
===Listing all your conda virtual environments===<br />
In case you forget the names of your virtual environments, you can list them all by running '''conda env list'''<br />
<pre><br />
$ conda env list<br />
# conda environments:<br />
#<br />
jupyter_test /home/ravi89/.conda/envs/jupyter_test<br />
modeller /home/ravi89/.conda/envs/modeller<br />
psypy3 /home/ravi89/.conda/envs/psypy3<br />
test /home/ravi89/.conda/envs/test<br />
test_env /home/ravi89/.conda/envs/test_env<br />
test_pytorch /home/ravi89/.conda/envs/test_pytorch<br />
tomopy /home/ravi89/.conda/envs/tomopy<br />
base * /share/apps/rc/software/Anaconda3/5.2.0<br />
DeepNLP /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP<br />
ubrite-jupyter-base-1.0 /share/apps/rc/software/Anaconda3/5.2.0/envs/ubrite-jupyter-base-1.0<br />
</pre><br />
NOTE: The virtual environment with an asterisk (*) next to it is the one that's currently active.<br />
<br />
===Activating a conda virtual environment===<br />
You can activate your virtual environment for use by running '''source activate ENV_NAME'''<br />
<pre><br />
$ source activate test_env<br />
(test_env) $<br />
</pre><br />
NOTE: Your shell prompt will also include the name of the virtual environment that you activated.<br />
<br />
===Locate and install packages===<br />
Conda allows you to search for packages that you want to install:<br />
<pre><br />
(test_env) $ conda search BeautifulSoup4<br />
Loading channels: done<br />
# Name Version Build Channel<br />
beautifulsoup4 4.4.0 py27_0 pkgs/free<br />
beautifulsoup4 4.4.0 py34_0 pkgs/free<br />
beautifulsoup4 4.4.0 py35_0 pkgs/free<br />
...<br />
beautifulsoup4 4.6.3 py35_0 pkgs/main<br />
beautifulsoup4 4.6.3 py36_0 pkgs/main<br />
beautifulsoup4 4.6.3 py37_0 pkgs/main<br />
(test_env) $<br />
</pre><br />
NOTE: Search is case-insensitive<br />
<br />
You can install packages into the conda environment using<br />
<pre><br />
(test_env) $ conda install beautifulsoup4<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- beautifulsoup4<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
beautifulsoup4-4.6.3 | py37_0 138 KB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
<br />
Proceed ([y]/n)? y<br />
<br />
<br />
Downloading and Extracting Packages<br />
beautifulsoup4-4.6.3 | 138 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
(test_env) $<br />
</pre><br />
<br />
===Deactivating your virtual environment===<br />
You can deactivate your virtual environment using '''source deactivate'''<br />
<pre><br />
(test_env) $ source deactivate<br />
$<br />
</pre><br />
<br />
===Sharing an environment===<br />
You may want to share your environment with someone for testing or other purposes. Sharing the environment file for your virtual environment is the most straightforward method, as it allows another person to quickly create an environment identical to yours.<br />
====Export environment====<br />
* Activate the virtual environment that you want to export.<br />
* Export an environment.yml file<br />
<pre><br />
conda env export -n test_env > environment.yml<br />
</pre><br />
* Now you can send the recently created environment.yml file to the other person.<br />
<br />
====Create a virtual environment using environment.yml====<br />
<pre><br />
conda env create -f environment.yml -n test_env<br />
</pre><br />
<br />
===Delete a conda virtual environment===<br />
You can use the '''remove''' parameter of conda to delete a conda virtual environment that you don't need:<br />
<pre><br />
$ conda remove --name test_env --all<br />
<br />
Remove all packages in environment /home/ravi89/.conda/envs/test_env:<br />
<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
<br />
The following packages will be REMOVED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
</pre><br />
<br />
===Moving conda directory===<br />
As you build new conda environments, you may find that they are taking up a lot of space in your $HOME directory. Here are two methods to avoid that:<br />
<br />
Method 1: Move a pre-existing conda directory and create a symlink<br />
<pre><br />
cd ~<br />
mv ~/.conda $USER_DATA/<br />
ln -s $USER_DATA/.conda<br />
</pre><br />
<br />
Method 2: Create a "$HOME/.condarc" file by running the following code<br />
<pre><br />
cat << "EOF" > ~/.condarc<br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
EOF<br />
</pre></div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Anaconda&diff=6038Anaconda2020-03-18T15:07:31Z<p>Mhanby@uab.edu: /* Moving conda directory */</p>
<hr />
<div>[https://conda.io/docs/user-guide/overview.html Conda] is a powerful package manager and environment manager. Conda allows you to maintain distinct environments for your different projects, with dependency packages defined and installed for each project.<br />
<br />
===Creating a Conda virtual environment===<br />
Load one of the conda environments available on Cheaha:<br />
<pre><br />
ravi89 @ c0066 ➜ ~ module avail Anaconda<br />
<br />
--------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------<br />
Anaconda2/4.0.0 Anaconda2/4.2.0 Anaconda3/4.4.0 Anaconda3/5.0.1 Anaconda3/5.1.0 Anaconda3/5.2.0<br />
</pre><br />
<pre><br />
bkomal96 @ c0003 ➜ ~module load Anaconda3/5.3.1 <br />
</pre><br />
Once you have loaded Anaconda, you can create an environment using the following command:<br />
<pre><br />
ravi89 @ c0066 ➜ ~ conda create --name test_env<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- setuptools<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
python-3.7.0 | h6e4f718_3 30.6 MB<br />
wheel-0.32.1 | py37_0 35 KB<br />
setuptools-40.4.3 | py37_0 556 KB<br />
------------------------------------------------------------<br />
Total: 31.1 MB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
<br />
<br />
Downloading and Extracting Packages<br />
python-3.7.0 | 30.6 MB | ########################################################################### | 100%<br />
wheel-0.32.1 | 35 KB | ########################################################################### | 100%<br />
setuptools-40.4.3 | 556 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
#<br />
# To activate this environment, use:<br />
# > source activate test_env<br />
#<br />
# To deactivate an active environment, use:<br />
# > source deactivate<br />
#<br />
</pre><br />
<br />
You can also specify the packages that you want to install in the conda virtual environment:<br />
<pre><br />
ravi89 @ c0066 ➜ ~ conda create --name test_env PACKAGE_NAME<br />
</pre><br />
<br />
===Listing all your conda virtual environments===<br />
In case you forget the names of your virtual environments, you can list them all by running '''conda env list'''<br />
<pre><br />
ravi89 @ c0066 ➜ ~ conda env list<br />
# conda environments:<br />
#<br />
jupyter_test /home/ravi89/.conda/envs/jupyter_test<br />
modeller /home/ravi89/.conda/envs/modeller<br />
psypy3 /home/ravi89/.conda/envs/psypy3<br />
test /home/ravi89/.conda/envs/test<br />
test_env /home/ravi89/.conda/envs/test_env<br />
test_pytorch /home/ravi89/.conda/envs/test_pytorch<br />
tomopy /home/ravi89/.conda/envs/tomopy<br />
base * /share/apps/rc/software/Anaconda3/5.2.0<br />
DeepNLP /share/apps/rc/software/Anaconda3/5.2.0/envs/DeepNLP<br />
ubrite-jupyter-base-1.0 /share/apps/rc/software/Anaconda3/5.2.0/envs/ubrite-jupyter-base-1.0<br />
<br />
ravi89 @ c0066 ➜ ~<br />
</pre><br />
NOTE: The virtual environment with an asterisk (*) next to it is the one that's currently active.<br />
<br />
===Activating a conda virtual environment===<br />
You can activate your virtual environment for use by running '''source activate ENV_NAME'''<br />
<pre><br />
ravi89 @ c0066 ➜ ~ source activate test_env<br />
(test_env) ravi89 @ c0066 ➜ ~<br />
</pre><br />
NOTE: Your shell prompt will also include the name of the virtual environment that you activated.<br />
<br />
===Locate and install packages===<br />
Conda allows you to search for packages that you want to install:<br />
<pre><br />
(test_env) ravi89 @ c0066 ➜ ~ conda search BeautifulSoup4<br />
Loading channels: done<br />
# Name Version Build Channel<br />
beautifulsoup4 4.4.0 py27_0 pkgs/free<br />
beautifulsoup4 4.4.0 py34_0 pkgs/free<br />
beautifulsoup4 4.4.0 py35_0 pkgs/free<br />
beautifulsoup4 4.4.1 py27_0 pkgs/free<br />
beautifulsoup4 4.4.1 py34_0 pkgs/free<br />
beautifulsoup4 4.4.1 py35_0 pkgs/free<br />
beautifulsoup4 4.5.1 py27_0 pkgs/free<br />
beautifulsoup4 4.5.1 py34_0 pkgs/free<br />
beautifulsoup4 4.5.1 py35_0 pkgs/free<br />
beautifulsoup4 4.5.1 py36_0 pkgs/free<br />
beautifulsoup4 4.5.3 py27_0 pkgs/free<br />
beautifulsoup4 4.5.3 py34_0 pkgs/free<br />
beautifulsoup4 4.5.3 py35_0 pkgs/free<br />
beautifulsoup4 4.5.3 py36_0 pkgs/free<br />
beautifulsoup4 4.6.0 py27_0 pkgs/free<br />
beautifulsoup4 4.6.0 py27_1 pkgs/main<br />
beautifulsoup4 4.6.0 py27h3f86ba9_1 pkgs/main<br />
beautifulsoup4 4.6.0 py34_0 pkgs/free<br />
beautifulsoup4 4.6.0 py35_0 pkgs/free<br />
beautifulsoup4 4.6.0 py35h442a8c9_1 pkgs/main<br />
beautifulsoup4 4.6.0 py36_0 pkgs/free<br />
beautifulsoup4 4.6.0 py36_1 pkgs/main<br />
beautifulsoup4 4.6.0 py36h49b8c8c_1 pkgs/main<br />
beautifulsoup4 4.6.0 py37_1 pkgs/main<br />
beautifulsoup4 4.6.1 py27_0 pkgs/main<br />
beautifulsoup4 4.6.1 py35_0 pkgs/main<br />
beautifulsoup4 4.6.1 py36_0 pkgs/main<br />
beautifulsoup4 4.6.1 py37_0 pkgs/main<br />
beautifulsoup4 4.6.3 py27_0 pkgs/main<br />
beautifulsoup4 4.6.3 py35_0 pkgs/main<br />
beautifulsoup4 4.6.3 py36_0 pkgs/main<br />
beautifulsoup4 4.6.3 py37_0 pkgs/main<br />
(test_env) ravi89 @ c0066 ➜ ~<br />
</pre><br />
NOTE: Search is case-insensitive<br />
<br />
You can install packages into the conda environment using<br />
<pre><br />
(test_env) ravi89 @ c0066 ➜ ~ conda install beautifulsoup4<br />
Solving environment: done<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
added / updated specs:<br />
- beautifulsoup4<br />
<br />
<br />
The following packages will be downloaded:<br />
<br />
package | build<br />
---------------------------|-----------------<br />
beautifulsoup4-4.6.3 | py37_0 138 KB<br />
<br />
The following NEW packages will be INSTALLED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
<br />
Proceed ([y]/n)? y<br />
<br />
<br />
Downloading and Extracting Packages<br />
beautifulsoup4-4.6.3 | 138 KB | ########################################################################### | 100%<br />
Preparing transaction: done<br />
Verifying transaction: done<br />
Executing transaction: done<br />
(test_env) ravi89 @ c0066 ➜ ~<br />
</pre><br />
<br />
===Deactivating your virtual environment===<br />
You can deactivate your virtual environment using '''source deactivate'''<br />
<pre><br />
(test_env) ravi89 @ c0066 ➜ ~ source deactivate<br />
ravi89 @ c0066 ➜ ~<br />
</pre><br />
<br />
===Sharing an environment===<br />
You may want to share your environment with someone for testing or other purposes. Sharing the environment file for your virtual environment is the most straightforward method, as it allows another person to quickly create an environment identical to yours.<br />
====Export environment====<br />
* Activate the virtual environment that you want to export.<br />
* Export an environment.yml file<br />
<pre><br />
conda env export -n test_env > environment.yml<br />
</pre><br />
* Now you can send the recently created environment.yml file to the other person.<br />
<br />
====Create a virtual environment using environment.yml====<br />
<pre><br />
conda env create -f environment.yml -n test_env<br />
</pre><br />
<br />
===Delete a conda virtual environment===<br />
You can use the remove parameter of conda to delete a conda virtual environment that you don't need:<br />
<pre><br />
ravi89 @ c0066 ➜ ~ conda remove --name test_env --all<br />
<br />
Remove all packages in environment /home/ravi89/.conda/envs/test_env:<br />
<br />
<br />
## Package Plan ##<br />
<br />
environment location: /home/ravi89/.conda/envs/test_env<br />
<br />
<br />
The following packages will be REMOVED:<br />
<br />
beautifulsoup4: 4.6.3-py37_0<br />
ca-certificates: 2018.03.07-0<br />
certifi: 2018.8.24-py37_1<br />
libedit: 3.1.20170329-h6b74fdf_2<br />
libffi: 3.2.1-hd88cf55_4<br />
libgcc-ng: 8.2.0-hdf63c60_1<br />
libstdcxx-ng: 8.2.0-hdf63c60_1<br />
ncurses: 6.1-hf484d3e_0<br />
openssl: 1.0.2p-h14c3975_0<br />
pip: 10.0.1-py37_0<br />
python: 3.7.0-h6e4f718_3<br />
readline: 7.0-h7b6447c_5<br />
setuptools: 40.4.3-py37_0<br />
sqlite: 3.25.2-h7b6447c_0<br />
tk: 8.6.8-hbc83047_0<br />
wheel: 0.32.1-py37_0<br />
xz: 5.2.4-h14c3975_4<br />
zlib: 1.2.11-ha838bed_2<br />
<br />
Proceed ([y]/n)? y<br />
<br />
ravi89 @ c0066 ➜ ~<br />
</pre><br />
<br />
===Moving conda directory===<br />
As you build new conda environments, you may find that they are taking up a lot of space in your $HOME directory. Here are two methods to avoid that:<br />
<br />
Method 1: Move a pre-existing conda directory and create a symlink<br />
<pre><br />
cd ~<br />
mv ~/.conda $USER_DATA/<br />
ln -s $USER_DATA/.conda<br />
</pre><br />
<br />
Method 2: Create a ".condarc" file in the $HOME directory containing the following<br />
<pre><br />
pkgs_dirs:<br />
- $USER_DATA/.conda/pkgs<br />
envs_dirs:<br />
- $USER_DATA/.conda/envs<br />
</pre></div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=6033Collaborator Account2020-02-27T16:34:21Z<p>Mhanby@uab.edu: </p>
<hr />
<div>This page describes the process for a UAB employee to request a Cheaha account for an external collaborator (i.e., a person who does not have a UAB BlazerID).<br />
<br />
==Create XIAS Account==<br />
XIAS accounts, or external access accounts, allow UAB employees to sponsor external collaborators to utilize some of the UAB resources for which access has been granted. Creating a XIAS account is a self-service process that allows you to sponsor and create an account for your collaborator at the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''By going through the sponsorship process, you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
The [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website] has two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage '''Projects/sites'''<br />
*Click '''New''' to create a new site<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date <br />
** '''End date''' is the expiration date for the site; the users added in the next section cannot have an expiration date beyond the site's '''End date'''. The dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access. Add the following for Cheaha access:<br />
*** '''VPN.DPO.UAB.EDU'''<br />
*** '''rc.uab.edu'''<br />
* Click the '''Add''' button to create the site<br />
<br />
=== Create a user ===<br />
Once the new site has been created:<br />
<br />
* Click [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel<br />
* In the drop-down select your XIAS site<br />
* Click the '''Register''' button to add new users<br />
** Enter an end date for the new site user(s) in the format '''YYYY-MM-DD'''. The date cannot extend past the site's end date!<br />
** Enter the collaborator's email address in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator ===<br />
Inform the collaborator(s) to expect an email from '''UAB Identity Management''' (userservices@uab.edu) containing instructions to complete their registration.<br />
<br />
They will need to complete the process within '''72 hours''' of receipt of the email!<br />
<br />
==Request an account on Cheaha==<br />
Once the steps above for adding/sponsoring a XIAS account for your collaborator have been completed, send an email to support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and the email address which you used to create their XIAS account, as it will become their username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=6032Collaborator Account2020-02-27T16:28:52Z<p>Mhanby@uab.edu: /* Collaborator */</p>
<hr />
<div>This page lays out the steps for you, as a UAB employee, to request a Cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS accounts, or external access accounts, allow UAB employees to sponsor external collaborators to utilize some of the UAB resources for which access has been granted. Creating a XIAS account is a self-service process that allows you to sponsor and create an account for your collaborator at the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''By going through the sponsorship process, you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
The [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website] has two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage '''Projects/sites'''<br />
*Click '''New''' to create a new site<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date <br />
** '''End date''' is the expiration date for the site; the users added in the next section cannot have an expiration date beyond the site's '''End date'''. The dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access. Add the following for Cheaha access:<br />
*** '''VPN.DPO.UAB.EDU'''<br />
*** '''rc.uab.edu'''<br />
* Click the '''Add''' button to create the site<br />
<br />
=== Create a user===<br />
Once the new site has been created:<br />
<br />
* Click [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel<br />
* In the drop-down select your XIAS site<br />
* Click the '''Register''' button to add new users<br />
** Enter an end date for the new site user(s) in the format '''YYYY-MM-DD'''. The date cannot extend past the site's end date!<br />
** Enter the collaborator's email address in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator===<br />
Inform the collaborator(s) to expect an email from '''UAB Identity Management''' (userservices@uab.edu) containing instructions to complete their registration. The email will contain a registration code that will '''expire in 72 hours'''!<br />
<br />
'''NOTE:''' It can take a few hours for the email notification to be sent to the user.<br />
<br />
==Request an account on Cheaha==<br />
Once you have completed the steps of adding/sponsoring XIAS account for your collaborator, send us an email on support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and email address which you used to create their XIAS account, as it would become their Username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=6031Collaborator Account2020-02-27T16:27:52Z<p>Mhanby@uab.edu: /* Create XIAS Account */</p>
<hr />
<div>This page lays out the steps for you, as a UAB employee, to request a Cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS accounts, or external access accounts, allow UAB employees to sponsor external collaborators to utilize some of the UAB resources for which access has been granted. Creating a XIAS account is a self-service process that allows you to sponsor and create an account for your collaborator at the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''By going through the sponsorship process, you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
The [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website] has two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage '''Projects/sites'''<br />
*Click '''New''' to create a new site<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date <br />
** '''End date''' is the expiration date for the site; the users added in the next section cannot have an expiration date beyond the site's '''End date'''. The dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access. Add the following for Cheaha access:<br />
*** '''VPN.DPO.UAB.EDU'''<br />
*** '''rc.uab.edu'''<br />
* Click the '''Add''' button to create the site<br />
<br />
=== Create a user===<br />
Once the new site has been created:<br />
<br />
* Click [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel<br />
* In the drop-down select your XIAS site<br />
* Click the '''Register''' button to add new users<br />
** Enter an end date for the new site user(s) in the format '''YYYY-MM-DD'''. The date cannot extend past the site's end date!<br />
** Enter the collaborator's email address in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator===<br />
Inform the collaborator(s) to expect an email from '''UAB Identity Management''' <userservices@uab.edu> containing instructions to complete their registration. The email will contain a registration code that will '''expire in 72 hours'''!<br />
<br />
'''NOTE:''' It can take a few hours for the email notification to be sent to the user.<br />
<br />
==Request an account on Cheaha==<br />
Once you have completed the steps of adding/sponsoring XIAS account for your collaborator, send us an email on support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and email address which you used to create their XIAS account, as it would become their Username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=6028Collaborator Account2020-02-12T18:16:19Z<p>Mhanby@uab.edu: /* Create a site */</p>
<hr />
<div>This page lays out the steps for you, as a UAB employee, to request a Cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS accounts, or external access accounts, allow UAB employees to sponsor external collaborators to utilize some of the UAB resources for which access has been granted. Creating a XIAS account is a self-service process that allows you to sponsor and create an account for your collaborator at the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''When you go through the sponsorship process, you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
When you go to the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website], you'll notice two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage Projects/sites.<br />
* There, click '''New''' to create a new site.<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date. <br />
** Remember that your users cannot have an '''End date''' beyond your site's '''End date'''.<br />
** Note that the start and end dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access.<br />
* In the URIs section: fill out '''VPN.DPO.UAB.EDU''', '''rc.uab.edu''' and '''cheaha.rc.uab.edu''' <br />
* Click the '''Add''' button to create the site.<br />
<br />
=== Create a user===<br />
<br />
* Now choose [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel.<br />
* In the drop-down select your XIAS site.<br />
* To add new users click the '''Register''' button. To review the users already there and change their end date click the '''List''' button.<br />
* To register new user(s) enter an end date for that user’s access. <br />
** The date must be on or before the end date for the site and in the format YYYY-MM-DD<br />
* Enter the email addresses of the user(s) (your collaborator's email) in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator===<br />
Once you have gone through the above steps, your collaborator should receive an automated email from XIAS with a code that they can use to complete their registration.<br />
<br />
'''NOTE:''' It can take up to 4 hours for new account creation to complete and for the email notification to be sent<br />
<br />
==Request an account on Cheaha==<br />
Once you have completed the steps of adding/sponsoring XIAS account for your collaborator, send us an email on support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and email address which you used to create their XIAS account, as it would become their Username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5933Welcome2019-10-01T16:23:00Z<p>Mhanby@uab.edu: </p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Data Backups ==<br />
<br />
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.<br />
<br />
To restate, the Cheaha file systems are NOT backed up!<br />
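<br />
One common approach, sketched here with illustrative paths that you should adapt to your own account and directories, is to periodically pull copies of important data to a machine you control using rsync:<br />
<pre><br />
# Run from your local workstation, not from Cheaha; replace blazerid and the paths<br />
rsync -avP blazerid@cheaha.rc.uab.edu:/data/user/blazerid/important_project/ ~/cheaha-backups/important_project/<br />
</pre><br />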
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB raw GPFS storage system on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric will double the capacity and is scheduled to be on-line Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed secure connectivity between nodes connected to it for high speed file transfer of very large data sets without the concern of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network also consists of a secure Science DMZ with data transfer nodes (DTNs), Perfsonar measurement nodes, and a Bro security node connected directly to the border router that provides a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA, and UAH, the other two University of Alabama System institutions, and the Alabama Supercomputer Center. UAB is also connected to other universities and schools through Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, and includes an HPC Architect-Manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5916Cheaha GettingStarted2019-05-09T15:47:18Z<p>Mhanby@uab.edu: /* Client Configuration */</p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[Cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
'''NOTE:'''<br />
The email you send to support@listserv.uab.edu will trigger a '''confirmation email which must be acknowledged''' in order to submit the account request.<br />
This additional step is meant to cut down on spam and is only needed for the initial account creation request sent to the support list. <br />
<br />
Usage of Cheaha is governed by [https://www.uab.edu/policies/content/Pages/UAB-IT-POL-0000004.aspx UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
=== External Collaborator===<br />
To request an account for an external collaborator, please follow the steps [https://docs.uabgrid.uab.edu/wiki/Collaborator_Account here.]<br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [https://www.uab.edu/policies/content/Pages/UAB-IT-POL-0000004.aspx UAB's Acceptable Use Policy (AUP)] for computer and network resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path or customize your SSH configuration. On Unix-derived systems (Linux, Mac) you can edit your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file)<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Alternatively, you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
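<br />
With that alias in place, you can then connect using just the short name (replace ''blazerid'' with your own ID):<br />
<pre><br />
ssh blazerid@cheaha<br />
</pre><br />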
<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and most feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to Cheaha, the first time you connect, the SSH client should display a message asking if you would like to import the host's public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, you may see the following message. '''Just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
The Cheaha Compute Platform includes commodity compute hardware, totaling 2800 compute cores and over 4.7PB of usable storage (6.6PB raw capacity). The following descriptions highlight the current hardware profile that provides an aggregate theoretical peak performance of 468 teraflops.<br />
<br />
* Compute <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 18 Compute Nodes with two 14 core processors (Intel Xeon E5-2680 v4 2.4GHz) with 256GB DDR4 RAM, four NVIDIA Tesla P100 16GB GPUs, EDR InfiniBand and 10GigE network cards<br />
<br />
* Networking<br />
**FDR and EDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
<br />
* Storage -- DDN SFA12KX with GPFS<br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Management <br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to '''our queuing system ([[Slurm]])'''. A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based on resource requests (the following are commonly requested resources; there are many more available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (also known as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at Lawrence Livermore National Laboratory and currently runs on some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used earlier. Instructions for using Slurm and writing Slurm scripts for job submission on Cheaha can be found '''[[Slurm | here]]'''.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
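<br />
Put together, a typical session might look something like the following sketch (the directory, script, and data names here are placeholders, not real paths on Cheaha):<br />
<pre><br />
# Stage input data to your scratch directory<br />
mkdir -p $USER_SCRATCH/my_project<br />
cp -r ~/my_inputs $USER_SCRATCH/my_project/<br />
<br />
# Submit the job script and note the job ID it prints<br />
sbatch my_project.job<br />
<br />
# Monitor your queued and running jobs<br />
squeue -u $USER<br />
<br />
# When the job is done and the results are copied off the cluster, clean up scratch<br />
rm -r $USER_SCRATCH/my_project<br />
</pre><br />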
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express (max 2 hrs), short (max 12 hrs), medium (max 50 hrs), long (max 150 hrs), sinteractive (0-2 hrs). Several of these options are combined in the example below.<br />
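<br />
For example, the following command line (a sketch; the script name is a placeholder) submits a job asking for one task with 4 CPUs, 2 GB of RAM per CPU, and a 4 hour limit on the short partition. The same options can equally be written as #SBATCH lines inside the script:<br />
<pre><br />
sbatch --partition=short --time=04:00:00 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=2048 my_analysis.job<br />
</pre><br />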
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha by using the "sbatch" command. The full manual for sbatch is available by running the following command<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands; the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, whereas UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [http://www.geany.org/ Geany] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''', click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [http://notepad-plus.sourceforge.net/uk/site.htm Notepad++] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [http://www.textpad.com/ TextPad] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster and provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application commands, are specified in a single file commonly referred to as a 'job script'. The following is a simple job script that prints the name of the compute node it runs on and then sleeps for a minute.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource requests (maximum runtime, number of compute nodes, etc...)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional options, refer to the sbatch manual. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
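<br />
You can then check the status of the submitted job with squeue, or remove it from the queue with scancel, using the job number reported by sbatch (52707 in this example):<br />
<pre><br />
$ squeue -u $USER<br />
$ scancel 52707<br />
</pre><br />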
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script via the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
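<br />
For example, a job script might use the job number to keep the output of each run separate (a minimal sketch; ''myprogram'' and the directory layout are placeholders):<br />
<pre><br />
# Create a per-job results directory under scratch and send output there<br />
OUTDIR=$USER_SCRATCH/results_$SLURM_JOB_ID<br />
mkdir -p $OUTDIR<br />
myprogram > $OUTDIR/run.log<br />
</pre><br />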
<br />
=== Interactive Resources ===<br />
The login node (the host that you connected to when you set up the SSH connection to Cheaha) is intended to be used for submitting jobs and/or lighter prep work required for the job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other manipulations of data) or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that can be run on the command line are requested with the '''srun''' command.<br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4 GB of RAM per CPU (--mem-per-cpu), for 8 hours (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' on your VNC session.<br />
<pre><br />
sinteractive --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME<br />
</pre><br />
Please note, sinteractive starts your shell in a screen session. Screen is a terminal emulator that is designed to make it possible to detach and reattach a session. This feature can mostly be ignored. If your application uses `ctrl-a` as a special command sequence (e.g. Emacs), however, you may find the application doesn't receive this special character. When using screen, you need to type `ctrl-a a` (ctrl-a followed by a single "a" key press) to send a ctrl-a to your application. Screen uses ctrl-a as its own command character, so this special sequence issues the command to screen to "send ctrl-a to my app". Learn more about [https://www.gnu.org/software/screen/manual/html_node/Overview.html#Overview screen from its documentation].<br />
<br />
== Storage ==<br />
=== Privacy ===<br />
{{SensitiveInformation}}<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or otherwise). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non scratch data.<br />
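<br />
To see how much of the 10GB home directory allowance you are currently using, you can run, for example:<br />
<pre><br />
du -sh $HOME<br />
</pre><br />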
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a GPFS high performance file system providing roughly 4.7PB of usable storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network accessible scratch within your job. For example, step 1 in the job could copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 could run the code, and step 3 could move the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
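<br />
For example (''myproject'' is a placeholder for whatever directory you create):<br />
<pre><br />
cd $USER_SCRATCH/myproject        # preferred: uses the environment variable<br />
cd /data/scratch/$USER/myproject  # works, but breaks if the underlying path ever changes<br />
</pre><br />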
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
The $LOCAL_SCRATCH is a special temporary directory and it's important to note that this directory is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or other location prior to the job exiting.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored called $SHARE_PROJECT. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_PROJECT.<br />
<br />
=== Uploading Data ===<br />
{{SensitiveInformation}}<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available from the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_PROJECT rather than duplicating the data for each user.<br />
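<br />
For example, to push a data set from your desktop directly to your network scratch directory with scp (a sketch; replace ''blazerid'' and the directory name with your own):<br />
<pre><br />
scp -r ./my_dataset blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/<br />
</pre><br />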
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and log back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
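<br />
For example:<br />
<pre><br />
module load GCC          # loads the default (latest) version, currently GCC/4.9.3-2.25<br />
module load GCC/4.8.4    # loads a specific older version<br />
</pre><br />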
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` to your job script before any module load commands. The -V job argument will cause a conflict with the module function used in your script.<br />
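<br />
A minimal sketch of where that workaround goes in a job script (the module shown is just one of the modules listed earlier on this page):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=express<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
<br />
# Clear the broken exported module function before loading any modules<br />
unset module<br />
module load R/3.2.0-goolf-1.7.20<br />
</pre><br />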
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts, please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set the --time option to an appropriate runtime limit and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate jobs submission with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, i.e. it doesn't depend on the data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution to processing lots of data when each command works on a single piece of data. For example, running the same conversion on 100s of images.<br />
<br />
Here we show how to create job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. Follow these steps to create a script, turn it into a command, and run the command.<br />
1. Create the file:<br />
<pre><br />
$ vim helloworld.sh<br />
</pre><br />
<br />
2. Write into "helloworld.sh" file (To write in vim editor: press '''shift + I''' )<br />
<pre><br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
</pre><br />
<br />
3. Save the file by pressing the '''esc''' key, then type the following<br />
<pre><br />
:wq<br />
</pre><br />
<br />
4. Make the "helloworld.sh" file executable<br />
<pre><br />
$ chmod +x helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
1. Create the JOB file:<br />
<pre><br />
$ vim helloworld.job<br />
</pre><br />
<br />
2. Write into "helloworld.job" file (To write in vim editor: press '''shift + I''' )<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
./helloworld.sh<br />
</pre><br />
<br />
3. Save the file by pressing the '''esc''' key, then type the following<br />
<pre><br />
:wq<br />
</pre><br />
<br />
* Submit the job to Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating processes.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also covers compiling the code and submitting the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this example of MPI-enabled Hello World includes a 3 minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h><br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
  int rank, size;<br />
<br />
  int i, j;<br />
  float f;<br />
<br />
  MPI_Init(&argc,&argv);<br />
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
  MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
  printf("Hello World from process %d of %d.\n", rank, size);<br />
  /* sleep and busy-loop so the job stays in the queue long enough to observe */<br />
  sleep(180);<br />
  for (j=0; j<=100000; j++)<br />
    for(i=0; i<=100000; i++)<br />
      f=i*2.718281828*i+i+i*3.141592654;<br />
<br />
  MPI_Finalize();<br />
  return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 cpu slots and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS ./helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command): <br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command as "./helloworld", where the "./" prefix means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
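<br />
For example, to add the current directory to your PATH for the current shell session:<br />
<pre><br />
export PATH=$PATH:.<br />
./helloworld      # still works<br />
helloworld        # now also works without the ./ prefix<br />
</pre><br />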
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=short<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we need up to 10 hours<br />
#<br />
#SBATCH --time=10:00:00<br />
#SBATCH --mem-per-cpu=2048<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
mpirun -np $SLURM_NTASKS $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that will use an array of 10 tasks (--array=1-10), where each task has a maximum runtime of 10 minutes and will use no more than 256 MB of RAM.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Mhanby@uab.edu
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[Cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
'''NOTE:'''<br />
The email you send to support@listserv.uab.edu will send back a '''confirmation email which must be acknowledged''' in order to submit the account request.<br />
This additional step is meant to cut down on spam to the support list and is only needed for the initial account creation request sent to the support list. <br />
<br />
Usage of Cheaha is governed by [https://www.uab.edu/policies/content/Pages/UAB-IT-POL-0000004.aspx UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
=== External Collaborator===<br />
To request an account for an external collaborator, please follow the steps [https://docs.uabgrid.uab.edu/wiki/Collaborator_Account here.]<br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [https://www.uab.edu/policies/content/Pages/UAB-IT-POL-0000004.aspx UAB's Acceptable Use Policy (AUP)] for computer and network resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Profession version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to the Cheaha, the first time you connect, the SSH client should display a message asking if you would like to import the hosts public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, You may see the following message '''just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
The Cheaha Compute Platform includes commodity compute hardware, totaling 2800 compute cores and over 4.7PB of usable storage (6.6PB raw capacity). The following descriptions highlight the current hardware profile that provides an aggregate theoretical peak performance of 468 teraflops.<br />
<br />
* Compute <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 18 Compute Nodes with two 14 core processors (Intel Xeon E5-2680 v4 2.4GHz) with 256GB DDR4 RAM, four NVIDIA Tesla P100 16GB GPUs, EDR InfiniBand and 10GigE network cards<br />
<br />
* Networking<br />
**FDR and EDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
<br />
* Storage -- DDN SFA12KX with GPFS <br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Management <br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to '''our queuing system ([[Slurm]])'''. A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based on resource requests (the following are commonly requested resources; many more are available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (also known as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system; the name stands for Simple Linux Utility for Resource Management. Slurm was developed at Lawrence Livermore National Laboratory and currently runs on some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used previously. Instructions for using Slurm and writing Slurm job submission scripts on Cheaha can be found '''[[Slurm | here]]'''.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of <number> tasks and to provide for sufficient resources. The default is one task per node.<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express (max 2 hrs), short (max 12 hrs), medium (max 50 hrs), long (max 150 hrs), sinteractive (0-2 hrs)<br />
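<br />
For example, these resource request options are typically combined at the top of a job script. The values below are illustrative placeholders only, not recommendations:<br />
<pre><br />
#SBATCH --partition=short      # partition with a 12 hour limit<br />
#SBATCH --time=02:00:00        # 2 hour maximum run time<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=4<br />
#SBATCH --mem-per-cpu=2048     # 2048 MB of RAM per CPU<br />
</pre><br />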
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha using the "sbatch" command. The full manual for sbatch is available by running the following command:<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands; the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, where UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [http://www.geany.org/ Geany] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''', click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [http://notepad-plus.sourceforge.net/uk/site.htm Notepad++] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [http://www.textpad.com/ TextPad] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster and to provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about a job's resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application commands, are specified in a single file commonly referred to as a 'job script'. The following is a simple job script that prints the hostname of the compute node it runs on.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource requests (maximum runtime, number of compute nodes, etc.)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional configuration options refer to the Slurm commands manual. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script via the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
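<br />
For example, a job script could use the job number to keep each run's output separate (a minimal sketch; the results directory name is arbitrary):<br />
<pre><br />
# Create a per-job working directory under network scratch<br />
mkdir -p $USER_SCRATCH/results/$SLURM_JOB_ID<br />
cd $USER_SCRATCH/results/$SLURM_JOB_ID<br />
</pre><br />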
<br />
=== Interactive Resources ===<br />
The login node (the host that you connected to when you set up the SSH connection to Cheaha) is intended for submitting jobs and light preparation work for job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other data manipulation) or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command. <br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4GB of RAM per CPU (--mem-per-cpu), for 8 hours (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' in your VNC session.<br />
<pre><br />
sinteractive --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME<br />
</pre><br />
Please note, sinteractive starts your shell in a screen session. Screen is a terminal multiplexer designed to make it possible to detach and reattach a session. This feature can mostly be ignored. If your application uses `ctrl-a` as a special command sequence (e.g. Emacs), however, you may find the application doesn't receive this special character. When using screen, you need to type `ctrl-a a` (ctrl-a followed by a single "a" key press) to send a ctrl-a to your application. Screen uses ctrl-a as its own command character, so this special sequence issues the command to screen to "send ctrl-a to my app". Learn more about [https://www.gnu.org/software/screen/manual/html_node/Overview.html#Overview screen from its documentation].<br />
<br />
== Storage ==<br />
=== Privacy ===<br />
{{SensitiveInformation}}<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or anywhere else). All data backup is managed by you. If you are not managing a data backup process, then you have no backup.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non scratch data.<br />
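<br />
To check how much space your home directory is currently using (the output below is only an example), run:<br />
<pre><br />
$ du -sh $HOME<br />
8.2G    /home/joeuser<br />
</pre><br />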
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a GPFS high performance file system providing roughly 4.7PB of usable storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, you must move any data between local scratch and your network-accessible scratch within your job. For example, step 1 in the job could copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 runs the code, and step 3 moves the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
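<br />
For example, in a job script (the myproject directory is a hypothetical example):<br />
<pre><br />
# Preferred: use the environment variable<br />
cd $USER_SCRATCH/myproject<br />
<br />
# Avoid hard coding the underlying path, e.g. /data/scratch/$USER/myproject<br />
</pre><br />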
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
$LOCAL_SCRATCH is a special temporary directory that is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or another location before the job exits.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the results out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored called $SHARE_PROJECT. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_PROJECT.<br />
<br />
=== Uploading Data ===<br />
{{SensitiveInformation}}<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available from the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_PROJECT rather than duplicating the data for each user.<br />
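<br />
For example, to push a directory of input files from your desktop to your network scratch directory (a minimal sketch; mydata and BLAZERID are placeholders):<br />
<pre><br />
# Copy a local directory to network scratch on Cheaha<br />
scp -r mydata BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/<br />
<br />
# Or use rsync, which can resume interrupted transfers<br />
rsync -av mydata/ BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/mydata/<br />
</pre><br />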
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
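<br />
For example, using the GCC versions listed above:<br />
<pre><br />
# Loads the latest installed version (equivalent to GCC/4.9.3-2.25 above)<br />
module load GCC<br />
<br />
# Loads a specific version<br />
module load GCC/4.8.4<br />
</pre><br />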
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` near the top of your job script, before any module load commands. The -V job argument will cause a conflict with the module function used in your script.<br />
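<br />
For example, near the top of the job script (a minimal sketch using the R module shown elsewhere on this page):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=example<br />
<br />
unset module<br />
module load R/3.2.0-goolf-1.7.20<br />
</pre><br />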
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts. Please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set --time to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate jobs submission with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, i.e. it doesn't depend on data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution for processing lots of data when each command works on a single piece of data, for example running the same conversion on hundreds of images.<br />
<br />
Here we show how to create job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. The following steps create the script and turn it into an executable command.<br />
1. Create the file:<br />
<pre><br />
$ vim helloworld.sh<br />
</pre><br />
<br />
2. Write into "helloworld.sh" file (To write in vim editor: press '''shift + I''' )<br />
<pre><br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
</pre><br />
<br />
3. To save the file, press the '''esc''' key, then type the following:<br />
<pre><br />
:wq<br />
</pre><br />
<br />
4. Make the "helloworld.sh" file executable:<br />
<pre><br />
$ chmod +x helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
1. Create the JOB file:<br />
<pre><br />
$ vim helloworld.job<br />
</pre><br />
<br />
2. Write into "helloworld.job" file (To write in vim editor: press '''shift + I''' )<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
./helloworld.sh<br />
</pre><br />
<br />
3. To save the file, press the '''esc''' key, then type the following:<br />
<pre><br />
:wq<br />
</pre><br />
<br />
* Submit the job to Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating processes.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also includes compiling the code and submitting the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this example of MPI-enabled Hello World includes a 3 minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h>  /* for sleep() */<br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
int rank, size;<br />
<br />
int i, j;<br />
float f;<br />
<br />
MPI_Init(&argc,&argv);<br />
MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
printf("Hello World from process %d of %d.\n", rank, size);<br />
sleep(180);<br />
for (j=0; j<=100000; j++)<br />
for(i=0; i<=100000; i++)<br />
f=i*2.718281828*i+i+i*3.141592654;<br />
<br />
MPI_Finalize();<br />
return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 cpu slots and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs.<br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command): <br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command with the simple prefix "./helloworld", which means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
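<br />
For example, to add the current directory to your PATH for the current shell session and then run the command without the "./" prefix:<br />
<pre><br />
$ export PATH="$PWD:$PATH"<br />
$ helloworld<br />
Hello from login001<br />
</pre><br />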
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=short<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we need 10 hours<br />
#<br />
#SBATCH --time=10:00:00<br />
#SBATCH --mem-per-cpu=2048<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
mpirun -np $SLURM_NTASKS $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that will use an array of 10 tasks (--array=1-10); each task has a maximum runtime of 10 minutes and will use no more than 256 MB of RAM.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
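<br />
Note that the script above changes into a rep$SLURM_ARRAY_TASK_ID directory for each task, so those directories (and the rscript.R they contain) must already exist. A minimal sketch of preparing them, assuming you have an rscript.R to copy in:<br />
<pre><br />
$ cd ~/jobs/ArrayExample<br />
$ for i in $(seq 1 10); do mkdir -p rep$i; cp rscript.R rep$i/; done<br />
</pre><br />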
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=5910Collaborator Account2019-04-12T17:07:26Z<p>Mhanby@uab.edu: /* Collaborator */</p>
<hr />
<div>This page lays out the steps for you, as a UAB employee, to request a Cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS accounts, or external access accounts, allow UAB employees to sponsor external collaborators to utilize some of the UAB resources for which the sponsored user has been granted access. Creating a XIAS account is a self-service process that allows you to sponsor and create an account for your collaborator at the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''When you go through the sponsorship process you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
When you go to the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website], you'll notice two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage Projects/sites.<br />
* There, click '''New''' to create a new site.<br />
* Fill in all the information, i.e. Short Description, Long Description, Start date, and End date. <br />
** Remember that your users cannot have '''End date''' beyond your sites '''End date'''.<br />
** Note that the start and end dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access.<br />
* In the URIs section: fill out '''VPN.DPO.UAB.EDU''' and '''cheaha.rc.uab.edu''' <br />
* Click the '''Add''' button to create the site.<br />
<br />
=== Create a user===<br />
<br />
* Now choose [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel.<br />
* In the drop-down select your XIAS site.<br />
* To add new users click the '''Register''' button. To review the users already there and change their end date click the '''List''' button.<br />
* To register new user(s) enter an end date for that user’s access. <br />
** The date must be on or before the end date for the site and in the format YYYY-MM-DD<br />
* Enter the email addresses of the user(s) (your collaborator's email) in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator===<br />
Once you have gone through the above steps, your collaborator should receive an automated email from XIAS with a code that they can use to complete their registration.<br />
<br />
'''NOTE:''' It can take up to 4 hours for the new account creation to complete and the email notification to be sent.<br />
<br />
==Request an account on Cheaha==<br />
Once you have completed the steps of adding/sponsoring a XIAS account for your collaborator, send us an email at support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and the email address you used to create their XIAS account, as it will become their username on [[Cheaha]].</div>Mhanby@uab.edu
<hr />
<div>This Page lays down the step for you as a UAB employee to request for a cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS Accounts, or external access account. allows UAB employees to sponsor external collaborators to utilize some of the UAB resources for which the user has been granted access. Creating XIAS account is a self-service interface which allows you to sponsor and create an account for your collaborator at [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''When you go through the sponsorship process you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
When you go to the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website], you'll notice two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage Projects/sites.<br />
* Over there click on '''New''' to create a new site.<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date. <br />
** Remember that your users cannot have '''End date''' beyond your sites '''End date'''.<br />
** Note that the start and end dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access.<br />
* In the URIs section: fill out '''VPN.DPO.UAB.EDU''' and '''cheaha.rc.uab.edu''' <br />
* Click on '''Add''' button to create the site.<br />
<br />
=== Create a user===<br />
<br />
* Now choose [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel.<br />
* In the drop-down select your XIAS site.<br />
* To add new users click the '''Register''' button. To review the users already there and change their end date click the '''List''' button.<br />
* To register new user(s) enter an end date for that user’s access. <br />
** The date must be on or before the end date for the site and in the format YYYY-MM-DD<br />
* Enter the email addresses of the user(s) (your collaborator's email) in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator===<br />
Once you have gone through the above steps, your collaborator should receive an automated email from XIAS with a code that they can use to complete their registration.<br />
<br />
==Request an account on cheaha==<br />
Once you have completed the steps of adding/sponsoring XIAS account for your collaborator, send us an email on support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and email address which you used to create their XIAS account, as it would become their Username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=5908Collaborator Account2019-04-12T17:06:30Z<p>Mhanby@uab.edu: /* Create a user */</p>
<hr />
<div>This Page lays down the step for you as a UAB employee to request for a cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS Accounts, or external access account. allows UAB employees to sponsor external collaborators to utilize some of the UAB resources for which the user has been granted access. Creating XIAS account is a self-service interface which allows you to sponsor and create an account for your collaborator at [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''When you go through the sponsorship process you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
When you go to the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website], you'll notice two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage Projects/sites.<br />
* Over there click on '''New''' to create a new site.<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date. <br />
** Remember that your users cannot have '''End date''' beyond your sites '''End date'''.<br />
** Note that the start and end dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access.<br />
* In the URIs section: fill out '''VPN.DPO.UAB.EDU''' and '''cheaha.rc.uab.edu''' <br />
* Click on '''Add''' button to create the site.<br />
<br />
=== Create a user===<br />
<br />
* Now choose [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel.<br />
* In the drop-down select your XIAS site.<br />
* To add new users click the '''Register''' button. To review the users already there and change their end date click the '''List''' button.<br />
* To register new user(s) enter an end date for that user’s access. <br />
** The date must be on or before the end date for the site and in the format YYYY-MM-DD<br />
* Enter the email addresses of the user(s) (your collaborator's email) in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
'''Note:''' It can take up to 4 hours for new account create completion and the email notification to be sent<br />
<br />
=== Collaborator===<br />
Once you have gone through the above steps, your collaborator should receive an automated email from XIAS with a code that they can use to complete their registration.<br />
<br />
==Request an account on cheaha==<br />
Once you have completed the steps of adding/sponsoring XIAS account for your collaborator, send us an email on support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and email address which you used to create their XIAS account, as it would become their Username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=5907Collaborator Account2019-04-12T14:40:42Z<p>Mhanby@uab.edu: /* Create XIAS Account */</p>
<hr />
<div>This Page lays down the step for you as a UAB employee to request for a cheaha account for your collaborator.<br />
<br />
==Create XIAS Account==<br />
XIAS Accounts, or external access account. allows UAB employees to sponsor external collaborators to utilize some of the UAB resources for which the user has been granted access. Creating XIAS account is a self-service interface which allows you to sponsor and create an account for your collaborator at [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website].<br />
<br />
For additional information, see the [https://apps.idm.uab.edu/xias/top XIAS help page].<br />
<br />
'''When you go through the sponsorship process you are stating that you know the individual(s) and are responsible for their actions while they are using the XIAS accounts.'''<br />
<br />
===Create a site===<br />
When you go to the [https://idm.uab.edu/cgi-cas/xrmi/sites XIAS website], you'll notice two options on the left-hand panel: [https://idm.uab.edu/cgi-cas/xrmi/sites Manage Projects/Sites] and [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.]<br />
<br />
* Choose Manage Projects/sites.<br />
* Over there click on '''New''' to create a new site.<br />
* Fill in all the information i.e. Short Description, Long Description, Start date and End date. <br />
** Remember that your users cannot have '''End date''' beyond your sites '''End date'''.<br />
** Note that the start and end dates should be in the format '''YYYY-MM-DD'''<br />
** URIs are the resources that the sponsored users should have access to. If the resources are applications or servers then the manager of that resource must do what is necessary within that resource to authorize the external users to gain access.<br />
* In the URIs section: fill out '''VPN.DPO.UAB.EDU''' and '''cheaha.rc.uab.edu''' <br />
* Click on '''Add''' button to create the site.<br />
<br />
=== Create a user===<br />
<br />
* Now choose [https://idm.uab.edu/cgi-cas/xrmi/users Manage Users.] from the left hand panel.<br />
* In the drop-down select your XIAS site.<br />
* To add new users click the '''Register''' button. To review the users already there and change their end date click the '''List''' button.<br />
* To register new user(s) enter an end date for that user’s access. <br />
** The date must be on or before the end date for the site and in the format YYYY-MM-DD<br />
* Enter the email addresses of the user(s) (your collaborator's email) in the box under the end date. You can add multiple users by putting each on a separate line.<br />
<br />
=== Collaborator===<br />
Once you have gone through the above steps, your collaborator should receive an automated email from XIAS with a code that they can use to complete their registration.<br />
<br />
==Request an account on cheaha==<br />
Once you have completed the steps of adding/sponsoring XIAS account for your collaborator, send us an email on support@listserv.uab.edu with information about the collaborator. Please don't forget to include their PrimaryID and email address which you used to create their XIAS account, as it would become their Username on [[Cheaha]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Collaborator_Account&diff=5906Collaborator Account2019-04-12T14:34:53Z<p>Mhanby@uab.edu: /* Collaborator */</p>
<hr />
<div>The [[wikipedia:Cyberinfrastructure|Cyberinfrastructure]] supporting UAB investigators includes high performance computing clusters, high-speed storage systems, campus, state-wide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
[[Cheaha]] is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing Services group (UAB ITRCS) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRCS in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
The facilities available to UAB researchers are described below. If you would like an account on the HPC system, please {{CheahaAccountRequest}} and provide a short statement on your intended use of the resources and your affiliation with the university.<br />
<br />
== UAB High Performance Computing (HPC) Clusters ==<br />
<br />
=== Compute Resources ===<br />
<br />
The current compute fabric for this system is anchored by the [[Cheaha]] cluster, a commodity cluster with 2800 cores connected by low-latency Fourteen Data Rate (FDR) and Enhanced Data Rate (EDR) InfiniBand networks. <br />
<br />
A historical description of the different hardware generations is given in the following list:<br />
* Gen7: 18 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect (supported by UAB, 2017).<br />
* Gen6: 96 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs (supported by UAB, 2015/2016).<br />
* Gen5: 12 2x8 core (192 cores total) 2.0 GHz Intel Xeon E2650 nodes with 96GB RAM per node and 10 Gbps interconnect dedicated to OpenStack and Ceph (supported by UAB IT, 2012).<br />
* Gen4: 3 2x8 core (48 cores total) 2.70 GHz Intel Xeon compute nodes with 384GB RAM per node (24GB per core), QDR InfiniBand interconnect (supported by Section on Statistical Genetics, School of Public Health, 2012).<br />
* Gen3: 48 2x6 core (576 cores total) 2.66 GHz Intel Xeon compute nodes with 48GB RAM per node (4GB per core), QDR InfiniBand interconnect (supported by NIH grant S10RR026723-01, 2010).<br />
* Gen2: 24 2x4 core (192 cores total) 3.0 GHz Intel Xeon compute nodes with 16GB RAM per node (2GB per core), DDR InfiniBand interconnect (supported by UAB IT, 2008).<br />
* Gen1: 60 2-core (120 cores total) AMD 1.6GHz Opteron 64-bit compute nodes with 2GB RAM per node (1GB per core), and Gigabit Ethernet connectivity between the nodes (supported by Alabama EPSCoR Research Infrastructure Initiative, NSF EPS-0091853, 2005). <br />
<br />
{{CheahaTflops}}<br />
<br />
=== Storage Resources ===<br />
<br />
In 2009, annual investment funds were directed toward establishing a fully connected dual data rate InfiniBand network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity with an additional 120TB of high performance Lustre storage on DDN hardware (retired in 2016). In Fall 2013, UAB IT Research Computing acquired an OpenStack cloud and Ceph storage software fabric through a partnership between Dell and Inktank in order to extend cloud-computing solutions to researchers at UAB and enhance the interfacing capabilities for HPC. This storage system provided an aggregate of half a petabyte of raw storage distributed across 12 compute nodes, each node having 16 cores, 96GB RAM, and 36TB of storage, connected together with 10 Gigabit Ethernet networking (pilot implementation retired in Spring 2017). During 2016, as part of the Alabama Innovation Fund grant working in partnership with numerous departments, 6.6PB of raw GPFS storage on DDN SFA12KX hardware was added to meet the growing data needs of UAB researchers. In Fall 2018, UAB IT Research Computing upgraded the 6PB GPFS storage backend to the next generation DDN SFA14KX. This hardware improved HPC performance by increasing the speed at which research applications can access their data sets. In 2019, the SFA12KX was moved to a remote data center to act as a replication pair for the SFA14KX.<br />
<br />
=== Network Resources ===<br />
<br />
==== Research Network ====<br />
'''UAB 10GigE Research Network''' The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network supports very high speed, secure connectivity between the nodes connected to it for high speed file transfer of very large data sets without interfering with other traffic on the campus backbone, which ensures predictable latencies.<br />
<br />
==== Campus Network ====<br />
'''Campus High Speed Network Connectivity''' The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second backplanes on the core L2/L3 switch/routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single-mode optical fiber. Within multi-floor buildings, a gigabit Ethernet building backbone over multimode optical fiber is used, and Category 5 or better unshielded twisted pair wiring connects desktops to the network. Computer server clusters are connected to the building entrance using Gigabit Ethernet. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
==== Regional Networks ====<br />
'''Off-campus Network Connections''' UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
UAB was awarded the NSF CC*DNI Networking Infrastructure grant ([http://www.nsf.gov/awardsearch/showAward?AWD_ID=1541310 CC-NIE-1541310]) in Fall 2016 to establish a dedicated high-speed research network (UAB Science DMZ) that establishes a 40Gbps networking core and provides researchers at UAB with 10Gbps connections from selected computers to the shared computational facility.<br />
<br />
== Regional and National Resources ==<br />
<br />
=== Alabama Supercomputing Center (ASC) ===<br />
<br />
The Alabama Supercomputer Center (ASC) (http://www.asc.edu) is a state-wide resource located in Huntsville, Alabama. The ASC provides UAB investigators with access to a variety of high performance computing resources. These resources include:<br />
* The SGI UV (ULTRAVIOLET) has 256 Xeon E5-4640 CPU cores operating at 2.4 GHz and 4 TB of shared memory, and 182 terabytes in the GPFS storage cluster.<br />
* A Dense Memory Cluster (DMC) HPC system has 2216 CPU cores and 16 terabytes of distributed memory. Each compute node has a local disk (up to 1.9 terabytes of which are accessible as /tmp). Also attached to the DMC is a high performance GPFS storage cluster, which has 45 terabytes of high performance storage accessible as /scratch from each node. Home directories as well as third party applications use a separate GPFS volume and share 137 terabytes of storage. The machine is physically configured as a cluster of 8, 16, or 20 CPU core SMP boards. Ninety-six nodes have 2.26 GHz Intel quad-core Nehalem processors and 24 gigabytes of memory. Forty nodes have 2.3 GHz AMD 8-core Opteron Magny-Cours processors and 128 gigabytes of memory. Forty nodes have 2.5 GHz Intel 10-core Xeon Ivy Bridge processors and 128 gigabytes of memory.<br />
* A large number of software packages are installed supporting a variety of analyses including programs for Computational Structural Analysis, Design Analysis, Quantum Chemistry, Molecular Mechanics/Dynamics, Crystallography, Fluid Dynamics, Statistics, Visualization, and Bioinformatics.<br />
<br />
=== Open Science Grid ===<br />
<br />
UAB is a member of the SURAgrid Virtual Organization (SGVO) on the Open Science Grid (OSG) (http://opensciencegrid.org).<br />
This is a national compute network consisting of nearly 80,000 compute cores aggregated across national facilities and contributing member sites. The OSG provides operational support for the interconnection middleware and facilitates research and operational engagement between members.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Setting_Up_VNC_Session&diff=5894Setting Up VNC Session2019-02-14T00:49:02Z<p>Mhanby@uab.edu: /* Start your VNC Desktop */</p>
<hr />
<div>[[wikipedia:Virtual_Network_Computing|Virtual Network Computing (VNC)]] is a cross-platform desktop sharing system to interact with a remote system's desktop using a graphical interface. This page covers basic instructions to access a desktop on [[Cheaha]] using VNC. These basic instructions support a variety of use-cases where access to graphical applications on the cluster is helpful or required. If you are interested in knowing more options or detailed technical information, then please take a look at man pages of specified commands.<br />
<br />
== One Time Setup ==<br />
VNC use on Cheaha requires a one-time setup to configure settings for starting the virtual desktop. These instructions will configure the VNC server to use the Gnome desktop environment, the default desktop environment on the cluster. (Alternatively, you can run the vncserver command without this configuration and get a very basic, but harder to use, desktop environment.) To get started, [[Cheaha_GettingStarted#Login | log in to cheaha via ssh.]]<br />
<br />
=== Set VNC Session Password ===<br />
You must maintain a password for your VNC server sessions using the vncpasswd command. The password is validated each time a connection comes in, so it can be changed on the fly with the vncpasswd command at any later time. '''Remember this password as you will be prompted for it when you access your cluster desktop'''. By default, the command stores an obfuscated version of the password in the file $HOME/.vnc/passwd.<br />
<br />
<pre><br />
$ vncpasswd <br />
</pre><br />
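<br />
The command prompts interactively for the new password. A typical run looks roughly like the following (the exact prompts may differ depending on the VNC software version installed on the cluster):<br />
<pre><br />
$ vncpasswd<br />
Password:<br />
Verify:<br />
</pre><br />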
<br />
=== Configure the Cluster Desktop ===<br />
The vncserver command relies on a configuration script to start your virtual desktop environment. The [[wikipedia:GNOME|GNOME2]] desktop provides a familiar desktop experience and can be selected by creating the following vncserver startup script (~/.vnc/xstartup).<br />
<br />
<pre><br />
mkdir $HOME/.vnc<br />
<br />
cat > $HOME/.vnc/xstartup <<\EOF<br />
#!/bin/sh<br />
<br />
# Start up the standard system desktop<br />
unset SESSION_MANAGER<br />
unset DBUS_SESSION_BUS_ADDRESS<br />
<br />
/usr/bin/mate-session<br />
<br />
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup<br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources<br />
x-window-manager &<br />
<br />
EOF<br />
</pre><br />
<br />
By default, a VNC server displays its graphical environment using a bare tab window manager. If the above xstartup file is absent, a file with the default tab-window-manager settings will be created by the vncserver command during startup. If you want to switch to the GNOME desktop, simply replace this default file with the settings above.<br />
<br />
This completes the one-time setup on the cluster for creating a VNC server password and selecting the preferred desktop environment.<br />
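<br />
Depending on the VNC server version in use, the xstartup script may also need to be marked executable before it will be run. If your desktop comes up with only the bare default environment even though the file above exists, making the script executable is a reasonable first thing to try:<br />
<pre><br />
chmod u+x $HOME/.vnc/xstartup<br />
</pre><br />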
<br />
=== Select a VNC Client ===<br />
You will also need a VNC client on your personal desktop in order to remotely access your cluster desktop. <br />
<br />
Mac OS comes with a native VNC client so you don't need to use any third-party software. Chicken of the VNC is a popular alternative on Mac OS to the native VNC client, especially for older Mac OS, pre-10.7.<br />
<br />
Most Linux systems have the VNC software installed so you can simply use the vncviewer command to access a VNC server. <br />
<br />
If you use MS Windows, you will need to install a VNC client. Here is a list of VNC clients; you can use any one of them to access the VNC server.<br />
* http://www.tightvnc.com/ (Mac, Linux and Windows)<br />
* http://www.realvnc.com/ (Mac, Linux and Windows)<br />
* http://sourceforge.net/projects/cotvnc/ (Mac)<br />
<br />
== Start your VNC Desktop == <br />
Your VNC desktop must be started before you can connect to it. To start the VNC desktop you need to log into cheaha using a [[Cheaha_GettingStarted#Login|standard SSH connection]]. The VNC server is started by executing the vncserver command after you log in to cheaha. It will run in the background and continue running even after you log out of the SSH session that was used to run the vncserver command.<br />
<br />
To start the VNC desktop run the vncserver command. You will see a short message like the following from the vncserver before it goes into the background. You will need this information to connect to your desktop.<br />
<pre><br />
$ vncserver <br />
New 'login001:24 (blazer)' desktop is login001:24<br />
<br />
Starting applications specified in /home/blazer/.vnc/xstartup<br />
Log file is /home/blazer/.vnc/login001:24.log<br />
</pre><br />
<br />
The above command output indicates that a VNC server is started on VNC X-display number 24, which translates to system port 5924. The vncserver automatically selects this port from a list of available ports.<br />
<br />
The actual system port on which the VNC server is listening for connections is obtained by adding the VNC base port (default: port 5900) and the VNC X-display number (24 in the above case). Alternatively, you can specify a high-numbered system port directly (e.g. 5927) using the '-rfbport <port-number>' option, and the vncserver will try to use it if it is available. See vncserver's man page for details.<br />
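<br />
For example, the following illustrates the port arithmetic and shows how a specific port could be requested (the display and port numbers here are only an illustration; your own session will report its own values):<br />
<pre><br />
# display :24  ->  5900 + 24 = port 5924<br />
echo $((5900 + 24))<br />
<br />
# request a specific port instead; vncserver uses it only if it is free<br />
vncserver -rfbport 5927<br />
</pre><br />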
<br />
Please note that the vncserver will continue to run in the background on the head node until it is explicitly stopped. This allows you to reconnect to the same desktop session without having to first start the vncserver, leaving all your desktop applications active. When you no longer need your desktop, simply log out of your desktop using the desktop's log out menu option, or explicitly end the server with the 'vncserver -kill :<display>' command described below.<br />
<br />
=== Alternate Cluster Desktop Sizes ===<br />
The default size of your cluster desktop is 1024x768 pixels. If you want to start your desktop with an alternate geometry to match your application, personal desktop environment, or other preferences, simply add a "-geometry widthxheight" argument to your vncserver command. For example, if you want a wide-screen geometry popular with laptops, you might start the VNC server with:<br />
<pre><br />
vncserver -geometry 1280x800<br />
</pre><br />
<br />
== Stop your VNC Desktop == <br />
Stopping the VNC process is done using the ''vncserver -kill'' command. The command takes a single argument, the display port.<br />
<br />
The VNC server display port can be found using the following command (display port format is a ''':''' followed by 1 or more digits):<br />
<pre><br />
vncserver -list<br />
<br />
X DISPLAY # PROCESS ID<br />
:4 52904<br />
</pre><br />
<br />
In the above example, the VNC display port is ''':4'''. Terminating the VNC desktop can now be done via:<br />
<pre><br />
vncserver -kill :4<br />
</pre><br />
<br />
== Establish a Network Connection to your VNC Server ==<br />
<br />
As indicated in the output from the vncserver command, the VNC desktop is listening for connections on a higher numbered port. This port isn't directly accessible from the internet. Hence, we need to use SSH local port forwarding to connect to this server.<br />
<br />
This SSH session provides the connection to your VNC desktop and must remain active while you use the desktop. You can disconnect and reconnect to your desktop by establishing this SSH session whenever you need to access your desktop. In other words, your desktop remains active across your connections to it. This supports a mobile work environment.<br />
<br />
=== Port-forwarding from Linux or Mac Systems ===<br />
Set up SSH port forwarding using the native SSH command. <br />
<pre><br />
# ssh -L <local-port>:<remote-system-host>:<remote-system-port> USERID@<SSH-server-host><br />
$ ssh -L 5924:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />
The above command forwards connections made to local port 5924 on to port 5924 on the remote system (which here is the same as the SSH server host, Cheaha, hence localhost).<br />
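<br />
The two port numbers simply need to match the port reported by your vncserver. As an illustration, if your desktop had started on the hypothetical display :11, the tunnel would instead be:<br />
<pre><br />
# display :11  ->  5900 + 11 = port 5911<br />
ssh -L 5911:localhost:5911 USERID@cheaha.rc.uab.edu<br />
</pre><br />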
<br />
=== Port-forwarding from Windows Systems ===<br />
Windows users need to establish the connection using whatever SSH software they commonly use. The following is an example configuration using the PuTTY client on Windows. Be sure to press the "Add" button to save the configuration with the session and ensure the tunnel is opened when the connection is established.<br />
<br />
[[File:Putty-SSH-Tunnel.png]]<br />
<br />
== Access your Cluster Desktop ==<br />
<br />
With the network connection to the VNC server established, you can access your cluster desktop using your preferred VNC client. When you access your cluster desktop you will be prompted for the VNC password you created during the one time setup above.<br />
<br />
The VNC client will actually connect to your local machine, e.g. "localhost", because it relies on the SSH port forwarding to connect to the VNC server on the cluster. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
You can access the VNC server using the following connection scenarios based on your personal desktop environment.<br />
<br />
==== From Mac ====<br />
<br />
'''For Mac OSX 10.8 and higher'''<br />
Mac users can use the default VNC client and start it from Finder. Press '''cmd+k''' to bring up the "connect to server" window. Enter the following connection string in Finder: <br />
<pre>vnc://localhost:5924 </pre><br />
The connection string pattern is "vnc://<vnc-server>:<vnc-port>". Adjust your port setting for the specific value of your cluster desktop given when you run vncserver above.<br />
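<br />
For instance, continuing the hypothetical display :11 example from the port-forwarding section above, the Finder connection string would be:<br />
<pre>vnc://localhost:5911</pre><br />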
<br />
'''For Mac OSX 10.7 and lower'''<br />
Download and install Chicken of the VNC from [http://sourceforge.net/projects/cotvnc/ SourceForge].<br />
Start COTVNC, enter the following in the host window, and provide the VNC password you created during setup when prompted:<br />
<pre>localhost:5924</pre><br />
<br />
<br />
==== From Linux ====<br />
Linux users can use the command<br />
<pre><br />
vncviewer :24 <br />
</pre><br />
<br />
===== Shortcut for Linux Users =====<br />
Linux users can optionally skip the explicit SSH tunnel setup described above by using the -via argument to the vncviewer command. The "-via <gateway>" will set up the SSH tunnel implicitly. For the above example, the following command would be used:<br />
<pre><br />
vncviewer -via cheaha.rc.uab.edu :24<br />
</pre><br />
This option is preferred since it will also establish VNC settings that are more efficient for slow networks. See the man page for vncviewer for details on other encodings.<br />
<br />
==== From Windows ====<br />
Windows users should use whatever connection string is applicable to their VNC client. <br />
<br />
Remember to use "localhost" as the host address in your VNC client. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
== Using your Desktop ==<br />
Once you have a VNC session established with the Gnome desktop environment, you can use it to launch any graphical application on Cheaha or to open a GUI (X11) enabled SSH session with a remote system in the cluster.<br />
<br />
VNC can be particularly useful when you are trying to access an X Window System application from MS Windows, as a native X11 setup on Windows is typically more involved than the VNC setup above. For example, it is much easier to start an X11-based SSH session with a remote system in the cluster from the above Gnome desktop environment than to do a full X11 setup on Windows.<br />
<pre> <br />
$ ssh -X $USER@172.x.x.x<br />
</pre><br />
<br />
=== Performance Considerations for Slow Networks ===<br />
<br />
If the network you are using to connect to your VNC session is slow (e.g. wifi or off campus), you may be able to improve the responsiveness of the VNC session by adjusting simple desktop settings in your VNC desktop. The VNC screen needs to be repainted every time your desktop is modified, e.g. when opening or moving a window. Any bit of data you don't have to send will improve the drawing speed. Most modern desktops default to a pretty picture. While nice to look at, these pictures contain lots of data. If you set your desktop background to a solid color (no gradients), the screen refresh will be much quicker (see System->Preferences->Desktop Background). Also, changing to a basic windowing theme will speed screen refreshes (see System->Preferences->Themes->Mist).</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Setting_Up_VNC_Session&diff=5892Setting Up VNC Session2019-02-13T19:12:50Z<p>Mhanby@uab.edu: /* Configure the Cluster Desktop */ Updated xstartup to reflect current Cheaha running state</p>
<hr />
<div>[[wikipedia:Virtual_Network_Computing|Virtual Network Computing (VNC)]] is a cross-platform desktop sharing system to interact with a remote system's desktop using a graphical interface. This page covers basic instructions to access a desktop on [[Cheaha]] using VNC. These basic instructions support a variety of use-cases where access to graphical applications on the cluster is helpful or required. If you are interested in knowing more options or detailed technical information, then please take a look at man pages of specified commands.<br />
<br />
== One Time Setup ==<br />
VNC use on Cheaha requires a one-time setup to configure settings for starting the virtual desktop. These instructions will configure the VNC server to use the Gnome desktop environment, the default desktop environment on the cluster. (Alternatively, you can run the vncserver command without this configuration and get a very basic, but harder to use, desktop environment.) To get started, [[Cheaha_GettingStarted#Login | log in to cheaha via ssh.]]<br />
<br />
=== Set VNC Session Password ===<br />
You must maintain a password for your VNC server sessions using the vncpasswd command. The password is validated each time a connection comes in, so it can be changed on the fly using vncpasswd command anytime later. '''Remember this password as you will be prompted for it when you access your cluster desktop'''. By default, the command stores an obfuscated version of the password in the file $HOME/.vnc/passwd.<br />
<br />
<pre><br />
$ vncpasswd <br />
</pre><br />
<br />
=== Configure the Cluster Desktop ===<br />
The vncserver command relies on a configuration script to start your virtual desktop environment. The [[wikipedia:GNOME|GNOME]]2 desktop provides a familiar desktop experience and can be selected by creating the following vncserver startup script (~/.vnc/xstartup).<br />
<br />
<pre><br />
mkdir $HOME/.vnc<br />
<br />
cat > $HOME/.vnc/xstartup <<\EOF<br />
#!/bin/sh<br />
<br />
# Start up the standard system desktop<br />
unset SESSION_MANAGER<br />
unset DBUS_SESSION_BUS_ADDRESS<br />
<br />
/usr/bin/mate-session<br />
<br />
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup<br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources<br />
x-window-manager &<br />
<br />
EOF<br />
</pre><br />
<br />
By default a VNC server displays graphical environment using a tab-window-manager. If the above xstartup file is absent, then a file with the default tab-window-manager settings will be created by the vncserver command during startup. If you want to switch to the GNOME desktop, simply replace this default file with the settings above. <br />
<br />
This completes the one-time setup on the cluster for creating a VNC server password and selecting the preferred desktop environment.<br />
<br />
=== Select a VNC Client ===<br />
You will also need a VNC client on your personal desktop in order to remotely access your cluster desktop. <br />
<br />
Mac OS comes with a native VNC client so you don't need to use any third-party software. Chicken of the VNC is a popular alternative on Mac OS to the native VNC client, especially for older Mac OS, pre-10.7.<br />
<br />
Most Linux systems have the VNC software installed so you can simply use the vncviewer command to access a VNC server. <br />
<br />
If you use MS Windows then you will need to install a VNC client. Here is a list of VNC clients; you can use any one of them to access the VNC server. <br />
* http://www.tightvnc.com/ (Mac, Linux and Windows)<br />
* http://www.realvnc.com/ (Mac, Linux and Windows)<br />
* http://sourceforge.net/projects/cotvnc/ (Mac)<br />
<br />
== Start your VNC Desktop == <br />
Your VNC desktop must be started before you can connect to it. To start the VNC desktop you need to log into cheaha using a [[Cheaha_GettingStarted#Login|standard SSH connection]]. The VNC server is started by executing the vncserver command after you log in to cheaha. It will run in the background and continue running even after you log out of the SSH session that was used to run the vncserver command.<br />
<br />
To start the VNC desktop run the vncserver command. You will see a short message like the following from the vncserver before it goes into the background. You will need this information to connect to your desktop.<br />
<pre><br />
$ vncserver <br />
New 'login001:24 (blazer)' desktop is login001:24<br />
<br />
Starting applications specified in /home/blazer/.vnc/xstartup<br />
Log file is /home/blazer/.vnc/login001:24.log<br />
</pre><br />
<br />
The above command output indicates that a VNC server is started on VNC X-display number 24, which translates to system port 5924. The vncserver automatically selects this port from a list of available ports.<br />
<br />
The actual system port on which the VNC server is listening for connections is obtained by adding the VNC base port (default: port 5900) and the VNC X-display number (24 in the above case). Alternatively, you can specify a high numbered system port directly (e.g. 5927) using the '-rfbport <port-number>' option and the vncserver will try to use it if it's available. See vncserver's man page for details.<br />
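<br />
For example, the display and port numbers from the session above relate as follows (a sketch; your own display number will differ):<br />
<pre><br />
# display :24  ->  system port 5900 + 24 = 5924<br />
# request a specific system port (and therefore display :27) explicitly<br />
$ vncserver -rfbport 5927<br />
</pre><br />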
<br />
Please note that the vncserver will continue to run in the background on the head node until it is explicitly stopped. This allows you to reconnect to the same desktop session without having to first start the vncserver, leaving all your desktop applications active. When you no longer need your desktop, simply log out of your desktop using the desktop's log out menu option or explicitly end the server with the 'vncserver -kill :<display-number>' command.<br />
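<br />
For example, to stop the desktop session started above on display :24 (substitute your own display number):<br />
<pre><br />
$ vncserver -kill :24<br />
</pre><br />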
<br />
=== Alternate Cluster Desktop Sizes ===<br />
The default size of your cluster desktop is 1024x768 pixels. If you want to start your desktop with an alternate geometry to match your application, personal desktop environment, or other preferences, simply add a "-geometry widthxheight" argument to your vncserver command. For example, if you want a wide screen geometry popular with laptops, you might start the VNC server with:<br />
<pre><br />
vncserver -geometry 1280x800<br />
</pre><br />
<br />
== Establish a Network Connection to your VNC Server ==<br />
<br />
As indicated in the output from the vncserver command, the VNC desktop is listening for connections on a higher numbered port. This port isn't directly accessible from the internet. Hence, we need to use SSH local port forwarding to connect to this server.<br />
<br />
This SSH session provides the connection to your VNC desktop and must remain active while you use the desktop. You can disconnect and reconnect to your desktop by establishing this SSH session whenever you need to access your desktop. In other words, your desktop remains active across your connections to it. This supports a mobile work environment.<br />
<br />
=== Port-forwarding from Linux or Mac Systems ===<br />
Set up SSH port forwarding using the native SSH command. <br />
<pre><br />
# ssh -L <local-port>:<remote-system-host>:<remote-system-port> USERID@<SSH-server-host><br />
$ ssh -L 5924:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />
The above command forwards connections on local port 5924 to port 5924 on the remote system (which is the SSH server host Cheaha itself, hence localhost).<br />
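<br />
The local and remote ports do not have to match. For example, assuming the VNC server from above is listening on port 5924 and you would rather use local port 5901 (local display :1) on your workstation:<br />
<pre><br />
$ ssh -L 5901:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />
Your VNC client would then connect to localhost:5901 (or display :1) instead of localhost:5924.<br />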
<br />
=== Port-forwarding from Windows Systems ===<br />
Windows users need to establish the connection using whatever SSH software they commonly use. The following is an example configuration using the PuTTY client on Windows. Be sure to press the "Add" button to save the configuration with the session and ensure the tunnel is opened when the connection is established.<br />
<br />
[[File:Putty-SSH-Tunnel.png]]<br />
<br />
== Access your Cluster Desktop ==<br />
<br />
With the network connection to the VNC server established, you can access your cluster desktop using your preferred VNC client. When you access your cluster desktop you will be prompted for the VNC password you created during the one time setup above.<br />
<br />
The VNC client will actually connect to your local machine, e.g. "localhost", because it relies on the SSH port forwarding to connect to the VNC server on the cluster. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
You can access the VNC server using the following connection scenarios based on your personal desktop environment.<br />
<br />
==== From Mac ====<br />
<br />
'''For Mac OSX 10.8 and higher'''<br />
Mac users can use the default VNC client and start it from Finder. Press '''cmd+k''' to bring up the "connect to server" window. Enter the following connection string in Finder: <br />
<pre>vnc://localhost:5924 </pre><br />
The connection string pattern is "vnc://<vnc-server>:<vnc-port>". Adjust your port setting for the specific value of your cluster desktop given when you run vncserver above.<br />
<br />
'''For Mac OSX 10.7 and lower'''<br />
Download and install Chicken of the VNC from [http://sourceforge.net/projects/cotvnc/ SourceForge].<br />
Start COTVNC and enter the following in the host window and provide the VNC password you created during set up when prompted:<br />
<pre>localhost:5924</pre><br />
<br />
<br />
==== From Linux ====<br />
Linux users can use the command<br />
<pre><br />
vncviewer :24 <br />
</pre><br />
<br />
===== Shortcut for Linux Users =====<br />
Linux users can optionally skip the explicit SSH tunnel setup described above by using the -via argument to the vncviewer command. The "-via <gateway>" option will set up the SSH tunnel implicitly. For the above example, the following command would be used:<br />
<pre><br />
vncviewer -via cheaha.rc.uab.edu :24<br />
</pre><br />
This option is preferred since it will also establish VNC settings that are more efficient for slow networks. See the man page for vncviewer for details on other encodings.<br />
<br />
==== From Windows ====<br />
Windows users should use whatever connection string is applicable to their VNC client. <br />
<br />
Remember to use "localhost" as the host address in your VNC client. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
== Using your Desktop ==<br />
Once a VNC session with the Gnome desktop environment is established, you can use it to launch any graphical application on Cheaha or to open a GUI (X11) enabled SSH session to a remote system in the cluster. <br />
<br />
VNC can be particularly useful when you are trying to access an X Windows application from MS Windows, as native X11 setup on Windows is typically more involved than the VNC setup above. For example, it's much easier to start an X11 based SSH session to a remote system in the cluster from the above Gnome desktop environment than to do a full X11 setup on Windows.<br />
<pre> <br />
$ ssh -X $USER@172.x.x.x<br />
</pre><br />
<br />
=== Performance Considerations for Slow Networks ===<br />
<br />
If the network you are using to connect to your VNC session is slow (e.g. wifi or off campus), you may be able to improve the responsiveness of the VNC session by adjusting simple desktop settings in your VNC desktop. The VNC screen needs to be repainted every time your desktop changes, e.g. when opening or moving a window. Any data you don't have to send will improve the drawing speed. Most modern desktops default to a pretty background picture; while nice to look at, these pictures contain a lot of data. If you set your desktop background to a solid color (no gradients), the screen refresh will be much quicker (see System->Preferences->Desktop Background). Changing to a basic windowing theme will also speed up screen refreshes (see System->Preferences->Themes->Mist).</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=MATLAB_DCS&diff=5891MATLAB DCS2019-02-11T19:41:48Z<p>Mhanby@uab.edu: </p>
<hr />
<div>{{Matlab_deprecated}}<br />
<br />
<br />
The [http://www.mathworks.com/products/distriben MATLAB Distributed Computing Server (MATLAB DCS)] is a parallel computing extension to MATLAB that enables processing to be spread across a large number of worker nodes, accelerating the speed at which compute intensive operations can complete.<br />
<br />
UAB IT Research Computing maintains a 128 worker node license for the [[Cheaha]] computing platform. In order to use DCS on Cheaha, you will need to [[MATLAB#Using_MATLAB|use a MATLAB instance]] with the Parallel Computing Toolbox installed. <br />
<br />
In order to leverage the MATLAB worker nodes on [[Cheaha]] the Parallel Computing Toolbox will need to be configured to submit compute tasks to [[Cheaha]] by following the steps in this document.<br />
<br />
{{MatlabAppPage}}<br />
<br />
== Overview ==<br />
<br />
The following outline highlights the steps involved in configuring your MatLab install and writing programs that submit tasks to the worker nodes of the Distributed Computing Server on Cheaha:<br />
<br />
* Configure the Task Submit Environment (One-time Setup)<br />
** [[MatLab|Install MatLab]] with the Parallel Computing Toolbox on your Windows / Linux / Mac workstation<br />
** Download and extract the MatLab task submission functions to your workstation MatLab environment<br />
** Define the "cheaha" parallel configuration in your workstation MatLab environment to submit tasks to Cheaha<br />
** Run the validation tests to ensure your "cheaha" parallel configuration works<br />
* Develop and Run Parallel Computing Applications<br />
** Write, test and debug your parallel code on your local workstation using the default "local" parallel configuration<br />
** Once your code works, select the "cheaha" parallel configuration to submit tasks to the Cheaha cluster. Note: your workstation MatLab application does not need to keep running after the tasks are submitted.<br />
** You will receive an email when the tasks you submitted are complete<br />
** Use your workstation MatLab application to retrieve the results<br />
** When you are finished with your job contexts, clean up the job related content to free disk space<br />
<br />
== Using MATLAB DCS ==<br />
<br />
The MATLAB Distributed Computing Services (DCS) are accessed via the Parallel Computing Toolbox (PCT) which is installed as part of your desktop [[MATLAB|MATLAB installation]]. The PCT allows MATLAB running on your workstation to send MATLAB code and data (tasks) to the cluster directly from the comfort of your familiar MATLAB environment on your desktop. This makes the expanded compute power of [[Cheaha]] available to process workloads that exceed the capabilities of your desktop computer. Once your tasks are submitted to [[Cheaha]], your desktop MATLAB is also free to move on to other tasks or be closed completely, freeing your desktop or laptop for your other activities.<br />
<br />
Configuring the Parallel Computing Toolbox involves three steps documented below:<br />
# install MATLAB submit functions on your workstation<br />
# configure the "cheaha" parallel computing target to which PCT tasks can be submitted<br />
# run the validation tests to confirm a working installation.<br />
<br />
This page documents the DCS configuration for MATLAB 2010b and later. For DCS configuration instructions on previous versions of MATLAB, please see the page [[MatLab DCS R2010a and Earlier]] <br />
<br />
Using MATLAB DCS requires that you have a cluster account on [[Cheaha]]. Please request an account by sending an email to [mailto:support@listserv.uab.edu support@listserv.uab.edu] and include your campus affiliation and a brief statement of your research interests for using the cluster. <br />
<br />
=== MATLAB DCS from Your Desktop ===<br />
<br />
==== MATLAB Submit Functions ====<br />
<br />
The MATLAB submit functions create a cluster job context for your code and are responsible for transferring your code and the data it analyzes to the cluster for processing.<br />
<br />
These submit functions must be installed on your computer and must be accessible to MATLAB via the MATLAB PATH environment. The easiest way to accomplish this is to copy the submit functions to the default directory created by MATLAB. These directories on the respective operating systems are listed below.<br />
<br />
All operating systems (Windows, Linux and Mac) are supported by the same set of submit functions. The functions are written in MATLAB making them cross-platform and only dependent on the version of MATLAB in use.<br />
<br />
# Download the MATLAB submit functions<br />
#* [http://projects.uabgrid.uab.edu/matlab/browser/trunk/distributables/matlab-R2013a-nonshared.zip?format=raw Submit Functions for MATLAB R2013a] -(updated 09/04/2013)<br />
#* [http://projects.uabgrid.uab.edu/matlab/browser/trunk/distributables/matlab-R2012a-nonshared.zip?format=raw Submit Functions for MATLAB R2012a] -(updated 03/07/2012)<br />
#* [http://projects.uabgrid.uab.edu/matlab/browser/trunk/distributables/matlab-R2011b-nonshared.zip?format=raw Submit Functions for MATLAB R2010b, R2011a, R2011b] -(updated 02/21/2011)<br />
# Unzip the files to a directory included in your MATLAB PATH setting. Recommended locations are:<br />
#* Windows: <pre>My Documents\MATLAB</pre><br />
#* Linux: <pre>$HOME/Documents/MATLAB</pre><br />
#* Mac: <pre>$HOME/Documents/MATLAB</pre><br />
<br />
Once the submit function files have been downloaded and unzipped in the above paths, restart MATLAB to ensure they are properly loaded in your environment.<br />
<br />
NOTE: If you choose not to use the above path recommendations, your MATLAB PATH may be viewed/altered by starting the MATLAB client on your workstation and clicking File -> Set Path and adding the path in which you unpacked the submit functions.<br />
<br />
==== Parallel Computing Toolbox Configuration ====<br />
<br />
The Parallel Computing Toolbox (PCT) enables language extensions in MATLAB that support dividing your application into tasks that can be executed in parallel. By default, all of these tasks will run on your local workstation using the pre-defined "local" PCT configuration.<br />
<br />
To run these tasks on the Cheaha compute cluster, a new configuration for the PCT must be defined. In this section we will create the "cheaha" configuration and run a quick validation test to confirm its operation.<br />
<br />
'''Prior to continuing''', make sure you:<br />
* can establish an SSH connection to Cheaha<br />
* have followed the steps in the previous section<br />
<br />
===== Create the "cheaha" PCT Configuration =====<br />
Download and save the Cheaha cluster configuration file for your MATLAB version<br />
# [http://projects.uabgrid.uab.edu/matlab/browser/trunk/parallel-configs/cheaha-R2011b.mat?format=raw R2010b, R2011a, R2011b] cluster configuration file<br />
#* Start MATLAB on your workstation<br />
#* Click the "Parallel" menu<br />
#* Click "Manage Configurations"<br />
#* In the "Configurations Manager" window, click "File -> Import"<br />
#* Browse to the location where you saved the '''cheaha-R2011b.mat''' file, select it, and click "Open"<br />
# [http://projects.uabgrid.uab.edu/matlab/browser/trunk/parallel-configs/cheaha-R2012a.settings?format=raw R2012a] cluster configuration file<br />
#* Start MATLAB R2012a on your workstation<br />
#* Click the "Parallel" menu<br />
#* Click "Manage Cluster Profiles"<br />
#* In the "Cluster Profile Manager" window, click the "Import" button on the toolbar<br />
#* Browse to the location where you saved the '''cheaha-R2012a.settings''' file, select it, and click "Open"<br />
<br />
The Configuration Manager for R2011b and prior should now list a new entry named "cheaha" as shown in the following image; R2012a and later will also have a new entry in the Cluster Profile list:<br />
[[Image:2011_config_mngr.png|none|x400px]]<br />
<br />
===== Personalize the "cheaha" PCT Configuration -2011b and earlier =====<br />
<br />
#Double click on cheaha in the Configuration Manager window to open the configuration editor. (Note: stretch the "Generic Scheduler Configuration Properties" window to the right so that you can view all of the text in the fields making it easier to read and edit correctly.) <br />
# Edit the following fields to use your personal data directories<br />
#* '''ClusterMatlabRoot''': Make sure that the Root directory of MATLAB installation for workers matches the exact version of MATLAB you are using on your workstation. In this example '''/share/apps/mathworks/R2011a''' matches a MATLAB R2011a workstation install. Change the "R2011a" to match your workstation MATLAB version.<br />
#* '''DataLocation''' : Change the directory path where job data is stored to an existing directory on your workstation where MATLAB can stage job files.<br />
#* '''ParallelSubmitFcn''': Change the text "YOURUSERID" to your login id on Cheaha<br />
#* '''SubmitFcn''' : Change the text "YOURUSERID" to your login id on Cheaha<br />
# Click 'OK' to save the configuration<br />
# SSH to cheaha and make sure to create the $USER_SCRATCH/matlab directory. If this directory does not exist, the parallel computing toolbox jobs will fail.<br />
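<br />
A minimal command sequence for this last step might look like the following (replace BLAZERID with your Cheaha login id and use the cluster host name you normally connect to):<br />
<pre><br />
$ ssh BLAZERID@cheaha.rc.uab.edu<br />
$ mkdir -p $USER_SCRATCH/matlab<br />
</pre><br />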
<br />
The initial configuration will look similar to this screen shot. You will need to edit the fields as described in the preceding steps before you can use the configuration. '''NOTE: be sure to replace the template user name settings "YOURUSERNAME" with the appropriate settings for your desktop and cluster account.''' <br />
<br />
[[Image:Cheaha_parallel_config.png|none|x650px]]<br />
<br />
===== Personalize the "cheaha" PCT Configuration -2012a =====<br />
<br />
#Double click on cheaha in the Configuration Manager window to open the configuration editor. (Note: stretch the "Generic Scheduler Configuration Properties" window to the right so that you can view all of the text in the fields making it easier to read and edit correctly.) <br />
# Edit the following fields to use your personal data directories<br />
#* '''ClusterMatlabRoot''': Make sure that the Root directory of MATLAB installation for workers matches the exact version of MATLAB you are using on your workstation. In this example '''/share/apps/mathworks/R2012a''' matches a MATLAB R2012a workstation install. Change the "R2012a" to match your workstation MATLAB version.<br />
#* '''DataLocation''' : Change the directory path where job data is stored to an existing directory on your workstation where MATLAB can stage job files.<br />
#* '''independentSubmitFcn''': Change the text "YOURUSERID" to your login id on Cheaha<br />
#* '''communicatingSubmitFcn''' : Change the text "YOURUSERID" to your login id on Cheaha<br />
# Click 'OK' to save the configuration<br />
# SSH to cheaha and make sure to create the $USER_SCRATCH/matlab directory. If this directory does not exist, the parallel computing toolbox jobs will fail.<br />
<br />
The initial configuration will look similar to this screen shot. You will need to edit the fields as described in the preceding steps before you can use the configuration. '''NOTE: be sure to replace the template user name settings "YOURUSERNAME" and "YOURUSERID" with the appropriate settings for your desktop and cluster account.''' <br />
<br />
[[Image:MATLAB_R2012a_configuration.png|none|x650px]]<br />
<br />
===== Validate the "cheaha" PCT Configuration =====<br />
<br />
# Before starting validation, please make sure the directory '''/scratch/user/YOURUSERID/matlab''' (preferred) exists in the scratch space on Cheaha; if your settings still point to the older 'lustre/scratch/YOURUSERID/matlab' location, please convert them to the new preferred location. If the directory does not exist, please SSH into Cheaha and create it before proceeding.<br />
# Select Cheaha on the configuration manager page and click 'Start Validation'<br />
# Wait for the validation to complete. This might take a few minutes and you will be asked for your user credentials on Cheaha. All tests other than 'Matlabpool' validate on Cheaha and the output is as shown.<br />
<br />
[[Image:Validation.png|none|x400px]]<br />
<br />
===== Begin Using MATLAB DCS from your Desktop =====<br />
<br />
The MATLAB DCS is now configured for Desktop usage. A simple parallel wave job "rParforWave" to test the configuration is described in [[MatLab_DCS_Examples]].<br />
<br />
A summary of the above steps is available at [[MATLAB_workshop_2011]] with additional examples and submit scripts available in the [[MATLAB_workshop_2011#Workshop_Demo.27s | workshop demo ]] section.<br />
<br />
=== MATLAB DCS from Cheaha ===<br />
<br />
MATLAB can be started interactively from [[Cheaha]] via an SSH session using the [[MatLab CLI|command line]], X windows forwarding, or VNC. This is very similar to using MATLAB from your desktop: in order to leverage the compute power of the cluster the Parallel Computing Toolbox must be configured to send tasks to the cluster scheduler.<br />
<br />
If you do not follow these configuration steps, parallel tasks will be executed locally on the cluster login node (head node). This will negatively impact your own and others' interactive use of this login node and may lead to your computations being stopped administratively.<br />
<br />
==== MATLAB Submit Functions ====<br />
<br />
The MATLAB submit functions create a cluster job context for your code. When running MATLAB from Cheaha, the submit functions are already installed and no additional actions are required by the user for this step.<br />
<br />
==== Parallel Computing Toolbox Configuration ====<br />
<br />
The Parallel Computing Toolbox (PCT) for your copy of MATLAB running in your cluster account must be configured to submit tasks to the compute nodes of the cluster. Keep in mind that running MATLAB interactively on the cluster is the same as running it from your own desktop: MATLAB runs all tasks on the machine on which it is running unless it is told to send the work to another computer. This condition holds even "inside" the Cheaha cluster: the cluster compute nodes are physically separate computers and MATLAB must request access via the scheduler just like any other job running on the cluster.<br />
<br />
Configuring the PCT when running MATLAB interactively from Cheaha is just like the configuration when running MATLAB from your desktop with two exceptions:<br />
* you must transfer any code or data to your Cheaha account explicitly outside of MATLAB using standard cluster procedures, aka SSH.<br />
* when MATLAB submits your tasks to the compute nodes, it benefits from the shared storage on the cluster and does not need to further copy your code and data to the compute nodes<br />
<br />
To address these differences, follow the [[#Parallel Computing Toolbox Configuration|PCT instructions above]] and when [[#Personalize the "cheaha" PCT Configuration|editing the "cheaha" configuration]] modify the steps for the following fields:<br />
# '''Folder where job data is stored (DataLocation)''': specify a directory in your personal Cheaha account<br />
# '''Function called when submitting parallel jobs (ParallelSubmitFnc)''': change the value to "{@ParallelSubmitFnc}"<br />
# '''Function called when submitting distributed jobs (DistributedSubmitFnc)''': change the value to "{@DistributedSubmitFnc}"<br />
# '''Job data location is accessible from both client and cluster nodes''': change this value to "True"<br />
<br />
Now save the "cheaha" configuring by clicking OK and proceeding to the [[#Validate the "cheaha" PCT Configuration|validation tests described above]]. Note: when running MATLAB interactively on Cheaha MatlabPool works and the validation tests are expected to pass (you'll see a green checkmark).<br />
<br />
== References ==<br />
* [http://www.mathworks.com/help/toolbox/distcomp/ Parallel Computing Toolbox on-line documentation]<br />
* [http://www.mathworks.com/access/helpdesk/help/pdf_doc/distcomp/distcomp.pdf Parallel Computing Toolbox User's Guide (pdf)] - The 655 page MATLAB User Guide for the Parallel Computing Toolbox and is recommended reading!<br />
<br />
{{MATLAB Support}}<br />
<br />
[[Category:MATLAB]][[Category:MATLAB installation]]</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_Quick_Start_(Deprecated)&diff=5890Cheaha Quick Start (Deprecated)2019-02-11T19:40:34Z<p>Mhanby@uab.edu: </p>
<hr />
<div>'''NOTE: This page is still under development. Please refer to [[Cheaha_GettingStarted]] page for detailed documentation.'''<br />
<br />
Cheaha is a shared cluster computing environment for UAB researchers. Cheaha offers a total of 8.75 TFLOPS of compute power, 200 TB of high-performance storage and 2.8 TB of memory. See [[Cheaha_Quick_Start_Hardware]] for more details on the compute platform, but first let's get started with an example and see how easy it is to use.<br />
<br />
If you have any questions about Cheaha usage then please contact the Research Computing team at support@listserv.uab.edu.<br />
<br />
== Logging In ==<br />
More [[Cheaha_GettingStarted#Login|detailed login instructions]] are also available.<br />
<br />
Most users will authenticate to Cheaha using their BlazerID and associated password using an SSH (Secure Shell) client. The basic syntax is as follows:<br />
<br />
<pre><br />
ssh BLAZERID@cheaha.uabgrid.uab.edu<br />
</pre><br />
<br />
== Hello Cheaha! ==<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster to provide optimal resource sharing among users. Cheaha uses a job scheduling system called SGE to schedule and manage jobs. A user needs to tell SGE about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application code, can be specified in a single file commonly referred to as a 'job script'. Following is a simple job script that prints the job number and hostname.<br />
<br />
<pre><br />
#!/bin/bash<br />
#<br />
# Define the shell used by your compute job<br />
#$ -S /bin/bash<br />
#<br />
# Run in the current directory from where you submit the job<br />
#$ -cwd<br />
#<br />
# Set the maximum runtime for the job (ex: 10 minutes)<br />
#$ -l h_rt=00:10:00<br />
# Set the maximum amount of RAM needed per slot (ex: 512 MB's)<br />
#$ -l vf=512M<br />
#<br />
# Your email address<br />
#$ -M YOUR_EMAIL_ADDRESS<br />
#<br />
# Notification Options:<br />
# b Mail is sent at the beginning of the job<br />
# e Mail is sent at the end of the job<br />
# a Mail is sent when the job is aborted or rescheduled<br />
# s Mail is sent when the job is suspended<br />
# n No mail is sent<br />
#$ -m eas<br />
#<br />
# Use the environment from your current shell<br />
#<br />
#$ -V<br />
<br />
echo "The job $JOB_ID is running on $HOSTNAME"<br />
<br />
</pre><br />
<br />
Lines starting with '#$' have a special meaning in the SGE world. SGE specific configuration options are specified after the '#$' characters. The above configuration options are useful for most job scripts; for additional configuration options, refer to the SGE manual pages. A job script is submitted to the cluster using SGE specific commands. There are many commands available, but the following three are the most common:<br />
* qsub - to submit job<br />
* qdel - to delete job<br />
* qstat - to view job status<br />
<br />
We can submit the above job script using the qsub command:<br />
<pre><br />
$ qsub HelloCheaha.sh<br />
Your job 9043385 (HelloCheaha.sh) has been submitted<br />
</pre><br />
<br />
When the job script is submitted, SGE queues it up and assigns it a job number (e.g. 9043385 in the above example). The job number is available inside the job script via the environment variable $JOB_ID. This variable can be used inside the job script to create job related directory structures or file names. [[Cheaha_Quick_Start_Job_Script_Examples]] provides more job script examples and [[Cheaha_Quick_Start_SGE_Commands]] provides more information about SGE commands.<br />
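<br />
For example, you can check on the job or cancel it with the other two commands listed above (the job number shown is the one from the example submission):<br />
<pre><br />
$ qstat -u $USER        # show the status of your queued and running jobs<br />
$ qdel 9043385          # delete a job using its job number<br />
</pre><br />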
<br />
== Software ==<br />
Cheaha's software stack includes many scientific computing packages. Below is a list of popular software available on Cheaha:<br />
* [[Amber]]<br />
* [[FFTW]]<br />
* [[Gromacs]]<br />
* [[GSL]]<br />
* [[NAMD]]<br />
* [[VMD]]<br />
* [[Intel Compilers]]<br />
* [[GNU Compilers]]<br />
* [[Java]]<br />
* [[R]]<br />
* [[OpenMPI]]<br />
* [[MATLAB]]<br />
<br />
This software can be included in a job environment using [http://modules.sourceforge.net/ environment modules]. Environment modules make environment variable modification easy and repeatable. Please refer to the [[Cheaha_Quick_Start_Softwares]] page for more details.<br />
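<br />
For example, the following commands show how modules are typically used in a login session or job script (the module name is only an example; run ''module avail'' to see the names actually installed on Cheaha):<br />
<pre><br />
$ module avail          # list the available modules<br />
$ module load R         # add a package (e.g. R) to your environment<br />
$ module list           # show the modules currently loaded<br />
</pre><br />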
<br />
== Storage ==<br />
A non-trivial analysis requires a good storage backend that supports large file staging, access control and performance. Cheaha storage fabric includes a high-performance parallel file system called [http://wiki.lustre.org/index.php/Main_Page Lustre] which handles large files efficiently. It's available for all Cheaha users and specific details are covered on [[Cheaha_Quick_Start_Storage]] page.<br />
<br />
== Graphical Interface ==<br />
Some applications use a graphical interface to perform certain actions (e.g. submit buttons, file selection, etc.). Cheaha supports graphical applications using an interactive X-Windows session with SGE's qrsh command. This allows you to run graphical applications like MATLAB or AFNI on Cheaha. Refer to [[Cheaha_Quick_Start_Interactive_Jobs]] for details on running graphical X-Windows applications.<br />
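<br />
A minimal sketch of such a session, assuming you logged in with X11 forwarding enabled (the resource requests are only examples; adjust them to your needs):<br />
<pre><br />
$ ssh -X BLAZERID@cheaha.uabgrid.uab.edu<br />
$ qrsh -l h_rt=01:00:00,vf=1G<br />
</pre><br />
Once the interactive session starts on a compute node, graphical applications launched from it will display on your workstation.<br />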
<br />
== Scheduling Policies ==<br />
<br />
== Support ==<br />
If you have any questions about our documentation or need any help with Cheaha then please contact us at support@listserv.uab.edu. Cheaha is maintained by [[About_Research_Computing|UAB IT's Research Computing team]].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=UnderstandingDocs&diff=5889UnderstandingDocs2019-02-11T19:39:13Z<p>Mhanby@uab.edu: </p>
<hr />
<div>'''Docs''' is a wiki platform for documentation built on the popular Mediawiki platform, the same platform used to run Wikipedia. The docs wiki platform is hosted on our nascent Research Computing System, a campus cloud platform that hosts a growing number of research-related applications and services.<br />
<br />
For more information, please contact [mailto:support@listserv.uab.edu support@listserv.uab.edu].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha&diff=5888Cheaha2019-02-11T19:38:04Z<p>Mhanby@uab.edu: Changed support@vo.uabgrid.uab.edu to support@listserv.uab.edu</p>
<hr />
<div>{{Main_Banner}}<br />
'''Cheaha''' is a campus resource dedicated to enhancing research computing productivity at UAB. [http://cheaha.uabgrid.uab.edu Cheaha] is managed by [http://www.uab.edu/it UAB Information Technology's Research Computing group (UAB ITRC)] and is available to members of the UAB community in need of increased computational capacity. Cheaha supports [http://en.wikipedia.org/wiki/High-performance_computing high-performance computing (HPC)] and [http://en.wikipedia.org/wiki/High-throughput_computing high throughput computing (HTC)] paradigms.<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a [[Setting_Up_VNC_Session|cluster desktop]]. The local compute pool provides access to compute hardware based on the [http://en.wikipedia.org/wiki/X86_64 x86-64 64-bit architecture]. The compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, [[ Resources |a commodity cluster with approximately 2400 cores]] connected by low-latency Fourteen Data Rate (FDR) InfiniBand networks. The compute nodes are backed by 6.6PB raw GPFS storage on DDN SFA12KX hardware, an additional 20TB available for home directories on a traditional Hitachi SAN, and other ancillary services. The compute nodes combine to provide over 110TFlops of dedicated computing power. <br />
<br />
Cheaha is composed of resources that span data centers located in the UAB Shared Computing facility UAB 936 Building and the RUST Computer Center. Resource design and development is led by UAB IT Research Computing in open collaboration with community members. Operational [mailto:support@listserv.uab.edu support] is provided by UAB IT's Research Computing group.<br />
<br />
Cheaha is named in honor of [http://en.wikipedia.org/wiki/Cheaha_Mountain Cheaha Mountain], the highest peak in the state of Alabama. Cheaha is a popular destination whose summit offers clear vistas of the surrounding landscape. (Cheaha Mountain photo-streams on [http://www.flickr.com/search/?q=cheaha Flickr] and [http://picasaweb.google.com/lh/view?q=cheaha&psc=G&filter=1# Picasa]).<br />
<br />
== Using ==<br />
<br />
=== Getting Started ===<br />
<br />
For information on getting an account, logging in, and running a job, please see [[Cheaha2_GettingStarted|Getting Started]].<br />
<br />
== History ==<br />
<br />
[[Image:Research-computing-platform.png|right|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== 2005 ===<br />
<br />
In 2002 UAB was awarded an infrastructure development grant through the NSF EPSCoR program. This led to the 2005 acquisition of a 64 node compute cluster with two AMD Opteron 242 1.6GHz CPUs per node (128 total cores). This cluster was named Cheaha. Cheaha expanded the compute capacity available at UAB and was the first general-access resource for the community. It led to expanded roles for UAB IT in research computing support through the development of the UAB Shared HPC Facility in BEC and provided further engagement in Globus-based grid computing resource development on campus via UABgrid and regionally via [http://www.suragrid.org SURAgrid].<br />
<br />
=== 2008 ===<br />
<br />
In 2008, money was allocated by UAB IT for hardware upgrades which led to the acquisition of an additional 192 cores based on a Dell clustering solution with Intel Quad-Core E5450 3.0GHz CPUs in August of 2008. This upgrade migrated Cheaha's core infrastructure to the Dell blade clustering solution. It provided a 3 fold increase in processor density over the original hardware and enabled more computing power to be located in the same physical space with room for expansion, an important consideration in light of the continued growth in processing demand. This hardware represented a major technology upgrade that included space for additional expansion to address overall capacity demand and enable resource reservation. <br />
<br />
The 2008 upgrade began a continuous resource improvement plan that includes a phased development approach for Cheaha with on-going increases in capacity and feature enhancements being brought into production via an [http://projects.uabgrid.uab.edu/cheaha open community process].<br />
<br />
Software improvements rolled into the 2008 upgrade included grid computing services to access distributed compute resources and orchestrate jobs using the [http://www.gridway.org GridWay] meta-scheduler. An initial 10Gigabit Ethernet link establishing the UABgrid Research Network was designed to support high speed data transfers between clusters connected to this network.<br />
<br />
=== 2009 ===<br />
<br />
In 2009, annual investment funds were directed toward establishing a fully connected dual data rate Infiniband network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. The Infiniband and storage fabrics were designed to support significant increases in research data sets and their associated analytical demand.<br />
<br />
=== 2010 ===<br />
<br />
In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity. The grant funds were combined with the annual investment funds adding 576 cores (48 nodes) based on the Intel Westmere 2.66 GHz CPU, a quad data rate Infiniband fabric with 32 uplinks, an additional 120 TB of storage for the DDN fabric, and additional hardware to improve reliability. Additional improvements to the research compute platform involved extending the UAB Research Network to link the BEC and RUST data centers and adding 20TB of user and ancillary services storage.<br />
<br />
=== 2012 ===<br />
<br />
In 2012, UAB IT Research Computing invested in the foundation hardware to expand long term storage and virtual machine capabilities with the acquisition of 12 Dell 720xd systems, each containing 16 cores, 96GB RAM, and 36TB of storage, creating a 192 core and 432TB virtual compute and storage fabric.<br />
<br />
Additionally, hardware investment by the School of Public Health's Section on Statistical Genetics added three 384GB large-memory nodes and an additional 48 cores to the QDR Infiniband fabric.<br />
<br />
=== 2013 ===<br />
<br />
In 2013, UAB IT Research Computing acquired an [http://blogs.uabgrid.uab.edu/jpr/2013/03/were-going-with-openstack/ OpenStack cloud and Ceph storage software fabric] through a partnership between Dell and Inktank in order to [http://dev.uabgrid.uab.edu extend cloud computing solutions] to the researchers at UAB and enhance the interfacing capabilities for HPC.<br />
<br />
=== 2015 === <br />
<br />
UAB IT received $500,000 from the university’s Mission Support Fund for a compute cluster seed expansion of 48 teraflops. This added 936 cores across 40 nodes (2x12 core 2.5 GHz Intel Xeon E5-2680 v3 compute nodes) with FDR InfiniBand interconnect.<br />
<br />
UAB received a $500,000 grant from the Alabama Innovation Fund for a three petabyte research storage array. This funding with additional matching from UAB provided a multi-petabyte [https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] parallel file system to the cluster which went live in 2016.<br />
<br />
=== 2016 ===<br />
<br />
In 2016 UAB IT Research Computing received additional funding from the Deans of CAS, Engineering, and Public Health to grow the compute capacity provided by the prior year's seed funding. This added additional compute nodes, providing researchers at UAB with 96 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. More information can be found at [[Resources]]. <br />
<br />
In addition to the compute, the six petabyte GPFS file system came online. This file system provided each user five terabytes of personal space, additional space for shared projects, and greatly expanded scratch storage, all in a single file system.<br />
<br />
The 2015 and 2016 investments combined to provide a completely new core for the Cheaha cluster, allowing the retirement of earlier compute generations.<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. If you are using Cheaha for grant funded research please send information about your grant (funding source and grant number), a statement of intent for the research project and a list of the applications you are using to UAB IT Research Computing. If you are using Cheaha for exploratory research, please send a similar note on your research interest. Finally, any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. Please note, your acknowledgment may also need to include an additional statement acknowledging grant-funded hardware. We also ask that you send any references to publications based on your use of Cheaha compute resources.<br />
<br />
=== Description of Cheaha for Grants (short) ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel 64-bit x86 CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware is also connected to these compute nodes via the Infiniband fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. Research Computing in open collaboration with the campus research community is leading the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition UAB researchers also have access to regional and national HPC resources such as Alabama Supercomputer Authority (ASA), XSEDE and Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric will double the capacity and is scheduled to be on-line Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed secure connectivity between nodes connected to it for high speed file transfer of very large data sets without the concerns of interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network also consists of a secure Science DMZ with data transfer nodes (DTNs), Perfsonar measurement nodes, and a Bro security node connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA, and UAH, the other two University of Alabama System institutions, and the Alabama Supercomputer Center. UAB is also connected to other universities and schools through Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10 led by the Assistant Vice President for Research Computing and includes an HPC Architect-Manager, four software developers, two scientists, a system administrator and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.<br />
<br />
== System Profile ==<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes three generations of commodity compute hardware, totaling 868 compute cores, 2.8TB of RAM, and over 200TB of storage.<br />
<br />
The hardware is grouped into generations designated gen1, gen2, and gen3 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 1 (gen1) -- 64 2-CPU AMD 1.6 GHz compute nodes with Gigabit interconnect. This is the original hardware collection purchased with NSF EPSCoR funds in 2005, approx $150K. These nodes are sometimes called the "Verari" nodes. These nodes are tagged as "verari-compute-#-#" in the ROCKS naming convention.<br />
* Generation 2 (gen2) -- 24 2x4 core (196 cores total) Intel 3.0 GHz Intel compute nodes with dual data rate Infiniband interconnect and the initial high-perf storage implementation using 60TB DDN. This is the hardware collection purchased exclusively with the annual VPIT funds allocation, approx $150K/yr for the 2008 and 2009 fiscal years. These nodes are sometimes confusingly called "cheaha2" or "cheaha" nodes. These nodes are tagged as "cheaha-compute-#-#" in the ROCKS naming convention. <br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node. <br />
* Generation 4 (gen4) -- 3 16 core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Dr. Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention. <br />
* Generation 6 (gen6) -- <br />
** 44 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards (4 nodes with NVIDIA K80 GPUs and 4 nodes with Intel Xeon Phi 7120P accelerators)<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network card<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24G of RAM per core or 384GB total<br />
* gen3 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8Gb RAM per core or 96GB total<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4Gb RAM per core or 48GB total<br />
* gen2 is 192 cores of [http://ark.intel.com/products/33083/Intel-Xeon-Processor-E5450-12M-Cache-3_00-GHz-1333-MHz-FSB 3.0GHz quad-core Intel Xeon E5450 processors] with 2Gb RAM per core<br />
* gen1 is 100 cores of 1.6GHz AMD Opteron 242 processors with 1Gb RAM per core <br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| default || 44 || 24 || 128G<br />
|-<br />
|gen6|| default || 38 || 24 || 256G<br />
|-<br />
|gen6|| default || 14 || 24 || 384G<br />
|-<br />
|gen5||Ceph/OpenStack|| 12 || 20 || 96G<br />
|-<br />
|gen4||ssg||3||16||385G<br />
|-<br />
|gen3||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Software ===<br />
<br />
Details of the software available on Cheaha can be found on the [https://docs.uabgrid.uab.edu/wiki/Cheaha_Software Installed software page], an overview follows.<br />
<br />
Cheaha uses [http://modules.sourceforge.net/ Environment Modules] to support account configuration. Please follow these [http://me.eng.uab.edu/wiki/index.php?title=Cheaha#Environment_Modules specific steps for using environment modules].<br />
<br />
Cheaha's software stack is built with the [http://www.brightcomputing.com Bright Cluster Manager]. Cheaha's operating system is CentOS with the following major cluster components:<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
A brief summary of the some of the available computational software and tools available includes:<br />
* Amber<br />
* FFTW<br />
* Gromacs<br />
* GSL<br />
* NAMD<br />
* VMD<br />
* Intel Compilers<br />
* GNU Compilers<br />
* Java<br />
* R<br />
* OpenMPI<br />
* MATLAB<br />
<br />
=== Network ===<br />
<br />
Cheaha is connected to the UAB Research Network which provides a dedicated 10Gbs networking backplane between clusters located in the 936 data center and the campus network core. Data transfer rates of almost 8Gbps between these hosts have been demonstrated using Grid FTP, a multi-channel file transfer service that is used to move data between clusters as part of the job management operations. This performance promises very efficient job management and the seamless integration of other clusters as connectivity to the research network is expanded.<br />
<br />
=== Benchmarks ===<br />
<br />
The continuous resource improvement process involves collecting benchmarks of the system. One of the measures of greatest interest to users of the system are benchmarks of specific application codes. The following benchmarks have been performed on the system and will be further expanded as additional benchmarks are performed.<br />
<br />
* [[Cheaha-BGL_Comparison|Cheaha-BGL Comparison]]<br />
<br />
* [[Gromacs_Benchmark|Gromacs]]<br />
<br />
* [[NAMD_Benchmarks|NAMD]]<br />
<br />
=== Cluster Usage Statistics ===<br />
<br />
Cheaha uses Bright Cluster Manager to report cluster performance data. This information provides a helpful overview of the current and historical operating stats for Cheaha. You can access the status monitoring page [https://cheaha-master01.rc.uab.edu/userportal/ here] (accessible only on the UAB network or through VPN).<br />
<br />
== Availability ==<br />
<br />
Cheaha is a general-purpose computer resource made available to the UAB community by UAB IT. As such, it is available for legitimate research and educational needs and is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
Many software packages commonly used across UAB are available via Cheaha.<br />
<br />
To request access to Cheaha, please [mailto:support@listserv.uab.edu send a request] to the cluster support group.<br />
<br />
Cheaha's intended use implies broad access to the community, however, no guarantees are made that specific computational resources will be available to all users. Availability guarantees can only be made for reserved resources.<br />
<br />
=== Secure Shell Access ===<br />
<br />
Please configure your client secure shell software to use the official host name to access Cheaha:<br />
<br />
<pre><br />
cheaha.rc.uab.edu<br />
</pre><br />
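<br />
For example, from a Linux or Mac terminal (replace BLAZERID with your own BlazerID):<br />
<pre><br />
$ ssh BLAZERID@cheaha.rc.uab.edu<br />
</pre><br />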
<br />
== Scheduling Framework ==<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine (SGE), the job manager used earlier.<br />
<br />
Slurm is similar in many ways to GridEngine or most other queue systems. You write a batch script then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we will provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress.<br />
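<br />
A minimal sketch of a Slurm batch script and its submission follows; the partition name and resource values are examples only, see the [[Slurm]] page for the options appropriate for your jobs:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=hello<br />
#SBATCH --ntasks=1<br />
#SBATCH --mem-per-cpu=512M<br />
#SBATCH --time=00:10:00<br />
#SBATCH --partition=express      # example partition name; pick one available on Cheaha<br />
<br />
echo "Job $SLURM_JOB_ID is running on $HOSTNAME"<br />
</pre><br />
Submit the script with:<br />
<pre><br />
$ sbatch hello.sh<br />
</pre><br />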
<br />
== Support ==<br />
<br />
Operational support for Cheaha is provided by the Research Computing group in UAB IT. For questions regarding the operational status of Cheaha please send your request to [mailto:support@listserv.uab.edu support@listserv.uab.edu]. As a user of Cheaha you will automatically be subscribed to the hpc-announce email list. This subscription is mandatory for all users of Cheaha. It is our way of communicating important information regarding Cheaha to you. The traffic on this list is restricted to official communication and has a very low volume.<br />
<br />
We have limited capacity, however, to support non-operational issues like "How do I write a job script" or "How do I compile a program". For such requests, you may find it more fruitful to send your questions to the hpc-users email list and request help from your peers in the HPC community at UAB. As with all mailing lists, please observe [http://lifehacker.com/5473859/basic-etiquette-for-email-lists-and-forums common mailing list etiquette].<br />
<br />
Finally, please remember that just as you learned about HPC from others, it becomes part of your responsibility to help others in turn. You can do this by updating this documentation or responding to the mailing list requests of others. <br />
<br />
You can subscribe to hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''subscribe hpc-users''].<br />
<br />
You can unsubscribe from hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=unsubscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''unsubscribe hpc-users''].<br />
<br />
You can review archives of the list in the [http://vo.uabgrid.uab.edu/sympa/arc/hpc-users hpc-users web archives].<br />
<br />
If you need help using the list service please send an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=help sympa@vo.uabgrid.uab.edu with the subject ''help'']<br />
<br />
If you have questions about the operation of the list itself, please send an email to the owners of the list:<br />
<br />
[mailto:hpc-users-request@vo.uabgrid.uab.edu hpc-users-request@vo.uabgrid.uab.edu with a subject relevant to your issue with the list]<br />
<br />
If you are interested in contributing to the enhancement of HPC features at UAB or would like to talk to other cluster administrators, [mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-dev please join the hpc developers community at UAB].</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=SSH_Key_Authentication&diff=5817SSH Key Authentication2018-08-28T18:02:50Z<p>Mhanby@uab.edu: /* Putty */</p>
<hr />
<div>These instructions assist existing users of Cheaha in setting up SSH key authentication for password-less access to the new Cheaha.<br />
<br />
<br />
===Mac OS X===<br />
<br />
* On your Mac, open the '''Terminal''' application. <br />
* Run the following command in your '''terminal''': <br />
<pre><br />
ssh-keygen -t rsa<br />
</pre> <br />
* You can put a passphrase for your SSH key (''' Not mandatory but highly recommended''')<br />
* An '''id_rsa.pub''' file will have been created in your '''.ssh''' directory.<br />
* Open the file by running '''less .ssh/id_rsa.pub''' and copy the content.<br />
* Press '''q''' to exit out of the file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, and paste the content into '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''log out''' from cheaha.rc.uab.edu and log in again. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for the first-time setup; afterwards you can simply run '''ssh blazerid@cheaha.rc.uab.edu''' to log in.<br />
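<br />
If you prefer a single command over manually pasting the key, the same result can usually be achieved with '''ssh-copy-id''' (shipped with most OpenSSH installations) or a short one-liner; this is a sketch of the equivalent steps, not a required part of the procedure:<br />
<pre><br />
# copies your public key into ~/.ssh/authorized_keys on Cheaha (if ssh-copy-id is available)<br />
$ ssh-copy-id blazerid@cheaha.rc.uab.edu<br />
<br />
# manual equivalent of the copy-and-paste steps above<br />
$ cat ~/.ssh/id_rsa.pub | ssh blazerid@cheaha.rc.uab.edu 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'<br />
</pre><br />
The same commands also work from the Linux instructions below.<br />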
<br />
===Linux===<br />
<br />
* On your Linux machine, open a '''Terminal''' application. <br />
* Run the following command in your '''terminal''': <br />
<pre><br />
ssh-keygen -t rsa<br />
</pre> <br />
* You can put a passphrase for your SSH key (''' Not mandatory but highly recommended''')<br />
* An '''id_rsa.pub''' file will have been created in your '''.ssh''' directory.<br />
* Open the file by running '''less .ssh/id_rsa.pub''' and copy the content.<br />
* Press '''q''' to exit out of the file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, and paste the content into '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''log out''' from cheaha.rc.uab.edu and log in again. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for the first-time setup; afterwards you can simply run '''ssh blazerid@cheaha.rc.uab.edu''' to log in.<br />
<br />
===Windows===<br />
<br />
====Putty====<br />
<br />
You will need a tool called '''puttygen''' to generate the SSH key pair. You can download it [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html here]. Once you have downloaded and installed '''putty''' and '''puttygen''', follow these instructions:<br />
<br />
* Launch PuTTY Key Generator.<br />
* Click the Generate button and move your mouse randomly over the blank area to aid in generating the key.<br />
* Enter a unique key passphrase in the Key passphrase and Confirm passphrase fields.<br />
* Save the public and private keys by clicking the Save public key and Save private key buttons.<br />
* Right click the field '''Public key for pasting into OpenSSH authorized_keys file''', choose '''Select All''', right click again and select Copy<br />
* Now open application '''Putty'''.<br />
* Set up your session for '''cheaha.rc.uab.edu''' in PuTTy. (If you don't know how, follow these [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#PuTTY instructions]).<br />
* Login to your Cheaha account.<br />
* Paste the content of the '''Public key''' that you previously copied to the clipboard in '''Puttygen''' into the '''~/.ssh/authorized_keys''' file using your favorite editor.<br />
* Now select your saved session for '''cheaha.rc.uab.edu'''.<br />
* Click '''Connection > SSH > Auth''' in the left-hand navigation pane and configure the private key to use by clicking Browse under Private key file for authentication.<br />
* Navigate to the location where you saved your private key earlier, select the file, and click Open.<br />
* The private key path is now displayed in the Private key file for authentication field.<br />
* Click Session in the left-hand navigation pane and click '''Save''' in the Load, save or delete a stored session section.<br />
* Click Open to begin your session with the server. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for the first-time setup; afterwards you can simply open your saved '''cheaha.rc.uab.edu''' profile to log in.<br />
<br />
====SSH Secure Shell Client====<br />
<br />
* In SSH Secure Shell, from the '''Edit''' menu, select '''Settings...''' <br />
* In the window that opens, select '''Global Settings''', then '''User Authentication''', and then '''Keys'''.<br />
* Under "Key pair management", click Generate New.... In the window that appears, click Next.<br />
* In the Key Generation window that appears:<br />
** From the drop-down list next to '''Key Type:''', select from the following:<br />
***If you want to take less time to initially generate the key, select '''DSA'''.<br />
*** If you want to take less time during each connection for the server to verify your key, select '''RSA'''.<br />
** From the drop-down list next to '''Key Length:''', select at least '''1024'''. You may choose a greater key length, but the time it takes to generate the key, as well as the time it takes to authenticate using it, will go up.<br />
* Click '''Next'''. The key generation process will start. When it's complete, click Next again.<br />
* In the '''File Name:''' field, enter a name for the file where SSH Secure Shell will store your '''private key'''. Your '''public key''' will be stored in a file with the same name, plus a '''.pub extension'''. <br />
** '''Important:''' You can put a passphrase for your SSH key ( Not mandatory but highly recommended)<br />
* To complete the key generation process, click '''Next''', and then '''Finish'''.<br />
* At the '''Settings''' screen, click '''OK'''.<br />
* Copy the content of .pub file generated.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, following the instructions [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#SSH_Secure_Shell_Client here] , and paste the content in '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''exit/logout''' from your account on '''cheaha.rc.uab.edu''' and log in again. You shouldn't be prompted for a password and will be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for the first-time setup; afterwards you can simply open your saved '''cheaha.rc.uab.edu''' profile to log in.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Galaxy&diff=5680Galaxy2017-12-13T14:48:29Z<p>Mhanby@uab.edu: /* Support */</p>
<hr />
<div>__TOC__<br />
<br />
= Overview =<br />
The UAB Galaxy platform for experimental biology and comparative genomics is designed to help you analyze multiple alignments, compare genomic annotations, profile metagenomic samples and more from your web browser. This platform is built on [http://main.g2.bx.psu.edu/ Galaxy], backed by the [http://docs.uabgrid.uab.edu/wiki/Cheaha Cheaha compute cluster], and powered by [http://uabgrid.uab.edu/ UABgrid]. <br />
<br />
The primary uses of UAB Galaxy are to provide a simple web interface for NGS (short read sequencing) analysis for genomic and transcriptomic datasets, using tools like BWA, Bowtie, Tophat and Cufflinks, as well as simple sequence manipulation via the EMBOSS toolkit.<br />
<br />
== Using Galaxy / [[UAB Galaxy Workshop Tutorial|Tutorials]] ==<br />
<br />
There are numerous [http://wiki.g2.bx.psu.edu/Learn/Screencasts general tutorials] online at the [http://main.g2.bx.psu.edu/ Penn State public Galaxy site] that are worth looking at.<br />
<br />
There are also several [[UAB Galaxy Workshop Tutorial|UAB tutorials on NGS Analysis with Galaxy]], created for [[2011_HPC_Boot_Camp|HPC Boot Camp 2011]] and a nice talk by Jeremy Goecks during [[2011|Research Computing Day 2011.]]<br />
<br />
== Support == <br />
UAB galaxy-users list-serv: [https://listserv.uab.edu/scgi-bin/wa?SUBED1=GALAXY-HELP&A=1 subscribe] [https://listserv.uab.edu/scgi-bin/wa?SUBED1=GALAXY-HELP&A=1 search]. <br />
<br />
UAB galaxy-help list-serv: [mailto:galaxy-help@listserv.uab.edu galaxy-help@listserv.uab.edu] to contact the admins of the UAB galaxy instance.<br />
<br />
== Privacy ==<br />
<br />
Note that your data will be stored on the cluster filesystem, and while not accessible to ordinary users, it can be easily accessed by any of the galaxy or cluster administrators. It is not encrypted. Do not store sensitive information in this system.<br />
<br />
= Galaxy@UAB =<br />
The UAB Galaxy instance can be accessed at https://galaxy.uabgrid.uab.edu using BlazerID credentials. No account on the cluster is needed. <br />
However, the tools installed for galaxy (BWA, etc) can be accessed via the command line if you have an account on the cluster.<br />
<br />
== Loading Data == <br />
See [[Galaxy_File_Uploads]].<br />
<br />
== Available Tools == <br />
Following is a partial list highlighting some of the important tools available. Additional tools can be installed upon request. To search for tools already integrated into the Galaxy system, see the [http://toolshed.g2.bx.psu.edu/ Galaxy ToolShed].<br />
<br />
<br />
{| border="1"<br />
|+ <br />
! Software !! Version !! Information<br />
|-<br />
! bwa<br />
| 0.5.9-r26 || Align genomic short reads to a reference genome<br />
|-<br />
! bowtie <br />
| 0.12.7 || Align genomic short reads to a reference genome<br />
|-<br />
! tophat<br />
| 1.4.0 || Align transcriptome short reads to a reference genome<br />
|-<br />
! cufflinks, cuffdiff, cuffcompare<br />
| 1.3.0 || Reconstruct and quantify transcript levels from tophat alignments.<br />
|-<br />
! samtools<br />
| 0.1.12a || Alignment (SAM/BAM file) manipulations<br />
|-<br />
! velvet<br />
| 1.1.03 || Denovo Assembly<br />
|-<br />
! [http://en.wikipedia.org/wiki/EMBOSS EMBOSS]<br />
| 6.3.1 || European Molecular Biology Open Software Suite - sequence manipulation and format conversion<br />
|-<br />
|}<br />
<br />
== Installed Genome Indexes ==<br />
<br />
You can always use your own genome by uploading the .fasta into your history, but alignments against installed (pre-indexed) genomes run much more quickly. If you need an additional genome installed, please contact [mailto:galaxy-help@vo.uabgrid.uab.edu galaxy-help@vo.uabgrid.uab.edu].<br />
{| border="1"<br />
|+ <br />
! dbkey !! Genome !! Accessions<br />
|-<br />
| hg19 || Human Feb. 2009 (GRCh37/hg19) (hg19)<br />
|-<br />
| hg18 || Human Mar. 2006 (NCBI36/hg18) (hg18)<br />
|-<br />
| hg17 || Human May 2004 (NCBI35/hg17) (hg17)<br />
|-<br />
| hg16 || Human July 2003 (NCBI34/hg16) (hg16)<br />
|-<br />
| mm10 || Mouse Dec. 2011 (GRCm38/mm10) (mm10)<br />
|-<br />
| mm9 || Mouse July 2007 (NCBI37/mm9) (mm9)<br />
|-<br />
| mm8 || Mouse Feb. 2006 (NCBI36/mm8) (mm8)<br />
|-<br />
| mm7 || Mouse Aug. 2005 (NCBI35/mm7) (mm7)<br />
|-<br />
| mm6 <br />
|-<br />
| mm5<br />
|-<br />
|sacCer3 || S. cerevisiae Apr. 2011 (SacCer_Apr2011/sacCer3) (sacCer3)<br />
|-<br />
|sacCer2 || S. cerevisiae June 2008 (SGD/sacCer2) (sacCer2)<br />
|-<br />
|ce10 || C. elegans Oct. 2010 (WS220/ce10) (ce10)<br />
|-<br />
|rn5 || Rat Mar. 2012 (RGSC 5.0/rn5) (rn5)<br />
|-<br />
|rn4 || Rat Nov. 2004 (Baylor 3.4/rn4) (rn4)<br />
|-<br />
|danRer7 || Zebrafish Jul. 2010 (Zv9/danRer7) (danRer7)<br />
|-<br />
|eschColi_APEC_O1 || Escherichia coli APEC O1 || chr=5082025<br />
|-<br />
|eschColi_CFT073 || Escherichia coli CFT073 || chr=5231428<br />
|-<br />
|eschColi_EC4115 || Escherichia coli EC4115 || chr=5572075,plasmid_pO157=94644,plasmid_pEC4115=37452<br />
|-<br />
|eschColi_K12 || Escherichia coli K12 || chr=4639675<br />
|-<br />
|eschColi_EDL993 || Escherichia coli O157:H7 EDL933 || NC_007414=92077,NC_002655=5528445<br />
|-<br />
|eschColi_O157H7 || Escherichia coli O157:H7 EDL933 || NC_007414=92077,NC_002655=5528445<br />
|-<br />
|eschColi_TW14359 || Escherichia coli TW14359 || chr=5528136,plasmid_pO157=94601<br />
|-<br />
|}<br />
<br />
== Additional Genomes that can be quickly installed ==<br />
These are pre-indexed genomes we can easily download from Penn State's [http://wiki.galaxyproject.org/Admin/Data%20Integration Galaxy Data-Cache]<br />
<br />
=== Organisms ===<br />
* AaegL1<br />
* Acropora_digitifera<br />
* AgamP3<br />
* Arabidopsis_thaliana_TAIR10<br />
* Arabidopsis_thaliana_TAIR9<br />
* Araly1<br />
* Bombyx_mori_p50T_2.0<br />
* CpipJ1<br />
* Homo_sapiens_AK1<br />
* Homo_sapiens_nuHg19_mtrCRS<br />
* Hydra_JCVI<br />
* IscaW1<br />
* PhumU1<br />
* Physcomitrella_patens_patens<br />
* Ptrichocarpa_156<br />
* Saccharomyces_cerevisiae_S288C_SGD2010<br />
* Schizosaccharomyces_pombe_1.1<br />
* Spur_v2.6<br />
* Sscrofa9.58<br />
* Tcacao_1.0<br />
* Tcas_3.0<br />
* Theobroma_cocoa<br />
* Zea_mays_B73_RefGen_v2<br />
* ailMel1<br />
* anoCar1<br />
* anoCar2<br />
* anoGam1<br />
* apiMel1<br />
* apiMel2<br />
* apiMel3<br />
* apiMel4.5<br />
* aplCal1<br />
* bighorn_sheep<br />
* borEut13<br />
* bosTau2<br />
* bosTau3<br />
* bosTau4<br />
* bosTau5<br />
* bosTau6<br />
* bosTau7<br />
* bosTauMd3<br />
* braFlo1<br />
* caeJap1<br />
* caePb1<br />
* caePb2<br />
* caeRem2<br />
* caeRem3<br />
* calJac1<br />
* calJac3<br />
* canFam1<br />
* canFam2<br />
* canFam3<br />
* cavPor2<br />
* cavPor3<br />
* cb3<br />
* ce10<br />
* ce2<br />
* ce3<br />
* ce4<br />
* ce5<br />
* ce6<br />
* ce7<br />
* ce8<br />
* ce9<br />
* choHof1<br />
* chrPic1<br />
* ci2<br />
* danRer2<br />
* danRer3<br />
* danRer4<br />
* danRer5<br />
* danRer6<br />
* danRer7<br />
* dasNov1<br />
* dasNov2<br />
* dipOrd1<br />
* dm1<br />
* dm2<br />
* dm3<br />
* dp3<br />
* dp4<br />
* droAna1<br />
* droAna2<br />
* droAna3<br />
* droEre1<br />
* droEre2<br />
* droGri1<br />
* droGri2<br />
* droMoj1<br />
* droMoj2<br />
* droMoj3<br />
* droPer1<br />
* droSec1<br />
* droSim1<br />
* droVir1<br />
* droVir2<br />
* droVir3<br />
* droWil1<br />
* droYak1<br />
* droYak2<br />
* echTel1<br />
* emf<br />
* equCab1<br />
* equCab2<br />
* equCab2_chrM<br />
* eriEur1<br />
* felCat3<br />
* felCat4<br />
* fr1<br />
* fr2<br />
* fr3<br />
* galGal2<br />
* galGal3<br />
* galGal4<br />
* gasAcu1<br />
* geoFor1<br />
* gorGor1<br />
* gorGor3<br />
* hetGla1<br />
* hetGla2<br />
* hg16<br />
* hg17<br />
* hg18<br />
* hg19<br />
* hg_g1k_v37<br />
* lMaj5<br />
* lengths<br />
* loxAfr3<br />
* loxAfr4<br />
* macEug1<br />
* melGal1<br />
* melUnd1<br />
* micMur1<br />
* mm10<br />
* mm5<br />
* mm6<br />
* mm7<br />
* mm8<br />
* mm9<br />
* monDom4<br />
* monDom5<br />
* myoLuc1<br />
* myoLuc2<br />
* nomLeu1<br />
* nomLeu2<br />
* ochPri2<br />
* ornAna1<br />
* oryCun1<br />
* oryCun2<br />
* oryLat1<br />
* oryLat2<br />
* oryza_sativa_japonica_nipponbare_IRGSP4.0<br />
* otoGar1<br />
* oviAri1<br />
* pUC18<br />
* panTro1<br />
* panTro2<br />
* panTro3<br />
* papHam1<br />
* petMar1<br />
* phiX<br />
* ponAbe2<br />
* priPac1<br />
* rheMac2<br />
* rheMac3<br />
* rn3<br />
* rn4<br />
* rn5<br />
* sacCer1<br />
* sacCer2<br />
* sacCer3<br />
* sarHar1<br />
* sorAra1<br />
* strPur2<br />
* strPur3<br />
* susScr2<br />
* susScr3<br />
* taeGut1<br />
* tarSyr1<br />
* tetNig1<br />
* tetNig2<br />
* triCas2<br />
* tupBel1<br />
* venter1<br />
* xenTro1<br />
* xenTro2<br />
* xenTro3<br />
<br />
=== Microbes ===<br />
<br />
* Staphylococcus_aureus_aureus_USA300_FPR3757<br />
* Xanthomonas_oryzae_PXO99A<br />
* acidBact_ELLIN345<br />
* acidCell_11B<br />
* acidCryp_JF_5<br />
* acidJS42<br />
* acinSp_ADP1<br />
* actiPleu_L20<br />
* aerPer1<br />
* aeroHydr_ATCC7966<br />
* alcaBork_SK2<br />
* alkaEhrl_MLHE_1<br />
* anabVari_ATCC29413<br />
* anaeDeha_2CP_C<br />
* anapMarg_ST_MARIES<br />
* aquiAeol<br />
* archFulg1<br />
* arthFB24<br />
* azoaSp_EBN1<br />
* azorCaul2<br />
* baciAnth_AMES<br />
* baciHalo<br />
* baciSubt<br />
* bactThet_VPI_5482<br />
* bartHens_HOUSTON_1<br />
* baumCica_HOMALODISCA<br />
* bdelBact<br />
* bifiLong<br />
* blocFlor<br />
* bordBron<br />
* borrBurg<br />
* bradJapo<br />
* brucMeli<br />
* buchSp<br />
* burk383<br />
* burkCeno_AU_1054<br />
* burkCeno_HI2424<br />
* burkCepa_AMMD<br />
* burkMall_ATCC23344<br />
* burkPseu_1106A<br />
* burkThai_E264<br />
* burkViet_G4<br />
* burkXeno_LB400<br />
* caldMaqu1<br />
* caldSacc_DSM8903<br />
* campFetu_82_40<br />
* campJeju<br />
* campJeju_81_176<br />
* campJeju_RM1221<br />
* candCars_RUDDII<br />
* candPela_UBIQUE_HTCC1<br />
* carbHydr_Z_2901<br />
* caulCres<br />
* chlaPneu_CWL029<br />
* chlaTrac<br />
* chloChlo_CAD3<br />
* chloTepi_TLS<br />
* chroSale_DSM3043<br />
* chroViol<br />
* clavMich_NCPPB_382<br />
* colwPsyc_34H<br />
* coryEffi_YS_314<br />
* coxiBurn<br />
* cytoHutc_ATCC33406<br />
* dechArom_RCB<br />
* dehaEthe_195<br />
* deinGeot_DSM11300<br />
* deinRadi<br />
* desuHafn_Y51<br />
* desuPsyc_LSV54<br />
* desuRedu_MI_1<br />
* desuVulg_HILDENBOROUG<br />
* dichNodo_VCS1703A<br />
* ehrlRumi_WELGEVONDEN<br />
* ente638<br />
* enteFaec_V583<br />
* erwiCaro_ATROSEPTICA<br />
* erytLito_HTCC2594<br />
* eschColi_APEC_O1<br />
* eschColi_CFT073<br />
* eschColi_EC4115<br />
* eschColi_EDL933<br />
* eschColi_K12<br />
* eschColi_MG1655<br />
* eschColi_O157H7<br />
* eschColi_TW14359<br />
* flavJohn_UW101<br />
* franCcI3<br />
* franTula_TULARENSIS<br />
* fusoNucl<br />
* geobKaus_HTA426<br />
* geobMeta_GS15<br />
* geobSulf<br />
* geobTher_NG80_2<br />
* geobUran_RF4<br />
* gloeViol<br />
* glucOxyd_621H<br />
* gramFors_KT0803<br />
* granBeth_CGDNIH1<br />
* haemInfl_KW20<br />
* haemSomn_129PT<br />
* haheChej_KCTC_2396<br />
* halMar1<br />
* haloHalo1<br />
* haloHalo_SL1<br />
* haloWals1<br />
* heliAcin_SHEEBA<br />
* heliHepa<br />
* heliPylo_26695<br />
* heliPylo_HPAG1<br />
* heliPylo_J99<br />
* hermArse<br />
* hypeButy1<br />
* hyphNept_ATCC15444<br />
* idioLoih_L2TR<br />
* jannCCS1<br />
* lactLact<br />
* lactPlan<br />
* lactSali_UCC118<br />
* lawsIntr_PHE_MN1_00<br />
* legiPneu_PHILADELPHIA<br />
* leifXyli_XYLI_CTCB0<br />
* leptInte<br />
* leucMese_ATCC8293<br />
* listInno<br />
* magnMC1<br />
* magnMagn_AMB_1<br />
* mannSucc_MBEL55E<br />
* mariAqua_VT8<br />
* mariMari_MCS10<br />
* mculMari1<br />
* mesoFlor_L1<br />
* mesoLoti<br />
* metAce1<br />
* metMar1<br />
* metaSedu<br />
* methAeol1<br />
* methBark1<br />
* methBoon1<br />
* methBurt2<br />
* methCaps_BATH<br />
* methFlag_KT<br />
* methHung1<br />
* methJann1<br />
* methKand1<br />
* methLabrZ_1<br />
* methMari_C5_1<br />
* methMari_C7<br />
* methMaze1<br />
* methPetr_PM1<br />
* methSmit1<br />
* methStad1<br />
* methTher1<br />
* methTherPT1<br />
* methVann1<br />
* moorTher_ATCC39073<br />
* mycoGeni<br />
* mycoTube_H37RV<br />
* myxoXant_DK_1622<br />
* nanEqu1<br />
* natrPhar1<br />
* neisGono_FA1090_1<br />
* neisMeni_FAM18_1<br />
* neisMeni_MC58_1<br />
* neisMeni_Z2491_1<br />
* neorSenn_MIYAYAMA<br />
* nitrEuro<br />
* nitrMult_ATCC25196<br />
* nitrOcea_ATCC19707<br />
* nitrWino_NB_255<br />
* nocaFarc_IFM10152<br />
* nocaJS61<br />
* nostSp<br />
* novoArom_DSM12444<br />
* oceaIhey<br />
* oenoOeni_PSU_1<br />
* onioYell_PHYTOPLASMA<br />
* orieTsut_BORYONG<br />
* paraDeni_PD1222<br />
* paraSp_UWE25<br />
* pastMult<br />
* pediPent_ATCC25745<br />
* peloCarb<br />
* peloLute_DSM273<br />
* peloTher_SI<br />
* photLumi<br />
* photProf_SS9<br />
* picrTorr1<br />
* pireSp<br />
* polaJS66<br />
* polyQLWP<br />
* porpGing_W83<br />
* procMari_CCMP1375<br />
* propAcne_KPA171202<br />
* pseuAeru<br />
* pseuHalo_TAC125<br />
* psycArct_273_4<br />
* psycIngr_37<br />
* pyrAby1<br />
* pyrAer1<br />
* pyrFur2<br />
* pyrHor1<br />
* pyroArse1<br />
* pyroCali1<br />
* pyroIsla1<br />
* ralsEutr_JMP134<br />
* ralsSola<br />
* rhizEtli_CFN_42<br />
* rhodPalu_CGA009<br />
* rhodRHA1<br />
* rhodRubr_ATCC11170<br />
* rhodSpha_2_4_1<br />
* rickBell_RML369_C<br />
* roseDeni_OCH_114<br />
* rubrXyla_DSM9941<br />
* saccDegr_2_40<br />
* saccEryt_NRRL_2338<br />
* saliRube_DSM13855<br />
* saliTrop_CNB_440<br />
* salmEnte_PARATYPI_ATC<br />
* salmTyph<br />
* salmTyph_TY2<br />
* shewANA3<br />
* shewAmaz<br />
* shewBalt<br />
* shewDeni<br />
* shewFrig<br />
* shewLoihPV4<br />
* shewMR4<br />
* shewMR7<br />
* shewOnei<br />
* shewPutrCN32<br />
* shewW318<br />
* shigFlex_2A<br />
* siliPome_DSS_3<br />
* sinoMeli<br />
* sodaGlos_MORSITANS<br />
* soliUsit_ELLIN6076<br />
* sphiAlas_RB2256<br />
* stapAure_MU50<br />
* stapMari1<br />
* streCoel<br />
* strePyog_M1_GAS<br />
* sulSol1<br />
* sulfAcid1<br />
* sulfToko1<br />
* symbTher_IAM14863<br />
* synePCC6<br />
* syneSp_WH8102<br />
* syntAcid_SB<br />
* syntFuma_MPOB<br />
* syntWolf_GOETTINGEN<br />
* therAcid1<br />
* therElon<br />
* therFusc_YX<br />
* therKoda1<br />
* therMari<br />
* therPend1<br />
* therPetr_RKU_1<br />
* therTeng<br />
* therTher_HB27<br />
* therTher_HB8<br />
* therVolc1<br />
* thioCrun_XCL_2<br />
* thioDeni_ATCC25259<br />
* thioDeni_ATCC33889<br />
* trepPall<br />
* tricEryt_IMS101<br />
* tropWhip_TW08_27<br />
* uncuMeth_RCI<br />
* ureaUrea<br />
* vermEise_EF01_2<br />
* vibrChol1<br />
* vibrChol_O395_1<br />
* vibrFisc_ES114_1<br />
* vibrPara1<br />
* vibrVuln_CMCP6_1<br />
* vibrVuln_YJ016_1<br />
* wiggBrev<br />
* wolbEndo_OF_DROSOPHIL<br />
* woliSucc<br />
* xantCamp<br />
* xyleFast<br />
* yersPest_CO92<br />
* zymoMobi_ZM4<br />
<br />
<br />
[[Category:Software]][[Category:Bioinformatics]][[Category:NGS]]</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Setting_Up_VNC_Session&diff=5578Setting Up VNC Session2017-07-31T16:11:41Z<p>Mhanby@uab.edu: /* Configure the Cluster Desktop */</p>
<hr />
<div>[[wikipedia:Virtual_Network_Computing|Virtual Network Computing (VNC)]] is a cross-platform desktop sharing system to interact with a remote system's desktop using a graphical interface. This page covers basic instructions to access a desktop on [[Cheaha]] using VNC. These basic instructions support a variety of use-cases where access to graphical applications on the cluster is helpful or required. If you are interested in more options or detailed technical information, then please take a look at the man pages of the specified commands.<br />
<br />
== One Time Setup ==<br />
VNC use on Cheaha requires a one-time setup to configure the settings for starting the virtual desktop. These instructions will configure the VNC server to use the Gnome desktop environment, the default desktop environment on the cluster. (Alternatively, you can run the vncserver command without this configuration and start a very basic (but harder to use) desktop environment.) To get started [[Cheaha_GettingStarted#Login | log in to cheaha via ssh.]]<br />
<br />
=== Set VNC Session Password ===<br />
You must maintain a password for your VNC server sessions using the vncpasswd command. The password is validated each time a connection comes in, so it can be changed on the fly with the vncpasswd command at any later time. '''Remember this password as you will be prompted for it when you access your cluster desktop'''. By default, the command stores an obfuscated version of the password in the file $HOME/.vnc/passwd.<br />
<br />
<pre><br />
$ vncpasswd <br />
</pre><br />
<br />
=== Configure the Cluster Desktop ===<br />
The vncserver command relies on a configuration script to start your virtual desktop environment. The [[wikipedia:GNOME|GNOME]] desktop provides a familiar desktop experience and can be selected by creating the following vncserver startup script (~/.vnc/xstartup).<br />
<br />
<pre><br />
mkdir $HOME/.vnc<br />
<br />
cat > $HOME/.vnc/xstartup <<\EOF<br />
#!/bin/sh<br />
<br />
# Start up the standard system desktop<br />
unset SESSION_MANAGER<br />
unset DBUS_SESSION_BUS_ADDRESS<br />
<br />
#exec /etc/X11/xinit/xinitrc<br />
/usr/bin/mate-session<br />
<br />
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup<br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources<br />
xsetroot -solid grey<br />
vncconfig -iconic &<br />
x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &<br />
x-window-manager &<br />
<br />
EOF<br />
<br />
chmod +x $HOME/.vnc/xstartup<br />
</pre><br />
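<br />
To confirm the setup, you can check that the startup script was created and is executable (a quick sanity check, not a required extra step):<br />
<pre><br />
# the script should exist and carry the execute permission added by chmod +x above<br />
$ ls -l $HOME/.vnc/xstartup<br />
</pre><br />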
<br />
By default a VNC server displays a graphical environment using a basic tab window manager. If the above xstartup file is absent, then a file with the default tab-window-manager settings will be created by the vncserver command during startup. If you want to switch to the GNOME desktop, simply replace this default file with the settings above. <br />
<br />
This completes the one-time setup on the cluster for creating a VNC server password and selecting the preferred desktop environment.<br />
<br />
=== Select a VNC Client ===<br />
You will also need a VNC client on your personal desktop in order to remotely access your cluster desktop. <br />
<br />
Mac OS comes with a native VNC client so you don't need to use any third-party software. Chicken of the VNC is a popular alternative on Mac OS to the native VNC client, especially for older Mac OS, pre-10.7.<br />
<br />
Most Linux systems have the VNC software installed so you can simply use the vncviewer command to access a VNC server. <br />
<br />
If you use MS Windows then you will need to install a VNC client. Here is a list of VNC client software; you can use any one of them to access the VNC server. <br />
* http://www.tightvnc.com/ (Mac, Linux and Windows)<br />
* http://www.realvnc.com/ (Mac, Linux and Windows)<br />
* http://sourceforge.net/projects/cotvnc/ (Mac)<br />
<br />
== Start your VNC Desktop == <br />
Your VNC desktop must be started before you can connect to it. To start the VNC desktop you need to log into cheaha using a [[Cheaha_GettingStarted#Login|standard SSH connection]]. The VNC server is started by executing the vncserver command after you log in to cheaha. It will run in the background and continue running even after you log out of the SSH session that was used to run the vncserver command.<br />
<br />
To start the VNC desktop run the vncserver command. You will see a short message like the following from the vncserver before it goes into the background. You will need this information to connect to your desktop.<br />
<pre><br />
$ vncserver <br />
New 'login001:24 (blazer)' desktop is login001:24<br />
<br />
Starting applications specified in /home/blazer/.vnc/xstartup<br />
Log file is /home/blazer/.vnc/login001:24.log<br />
</pre><br />
<br />
The above command output indicates that a VNC server is started on VNC X-display number 24, which translates to system port 5924. The vncserver automatically selects this port from a list of available ports.<br />
<br />
The actual system port on which the VNC server is listening for connections is obtained by adding the VNC base port (default: port 5900) and the VNC X-display number (24 in the above case). Alternatively you can specify a high-numbered system port directly (e.g. 5927) using the '-rfbport <port-number>' option, and vncserver will try to use it if it is available. See vncserver's man page for details.<br />
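<br />
For example (an optional sketch; port 5927 is an arbitrary choice), you can request a specific port rather than letting vncserver pick one:<br />
<pre><br />
# ask the VNC server to listen on system port 5927 if it is free (see 'man vncserver')<br />
$ vncserver -rfbport 5927<br />
</pre><br />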
<br />
Please note that the vncserver will continue to run in the background on the head node until it is explicitly stopped. This allows you to reconnect to the same desktop session without having to first start the vncserver, leaving all your desktop applications active. When you no longer need your desktop, simply log out of your desktop using the desktop's log out menu option or by explicitly ending the vncserver with the 'vncserver -kill' command.<br />
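<br />
For example, to stop the desktop started on display :24 in the output above (substitute your own display number):<br />
<pre><br />
$ vncserver -kill :24<br />
</pre><br />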
<br />
=== Alternate Cluster Desktop Sizes ===<br />
The default size of your cluster desktop is 1024x768 pixels. If you want to start your desktop with an alternate geometry to match your application, personal desktop environment, or other preferences, simply add a "-geometry widthxheight" argument to your vncserver command. For example, if you want a wide screen geometry popular with laptops, you might start the VNC server with:<br />
<pre><br />
vncserver -geometry 1280x800<br />
</pre><br />
<br />
== Establish a Network Connection to your VNC Server ==<br />
<br />
As indicated in the output from the vncserver command, the VNC desktop is listening for connections on a higher numbered port. This port isn't directly accessible from the internet. Hence, we need to use SSH local port forwarding to connect to this server.<br />
<br />
This SSH session provides the connection to your VNC desktop and must remain active while you use the desktop. You can disconnect and reconnect to your desktop by establishing this SSH session whenever you need to access your desktop. In other words, your desktop remains active across your connections to it. This supports a mobile work environment.<br />
<br />
=== Port-forwarding from Linux or Mac Systems ===<br />
Set up SSH port forwarding using the native SSH command. <br />
<pre><br />
# ssh -L <local-port>:<remote-system-host>:<remote-system-port> USERID@<SSH-server-host><br />
$ ssh -L 5924:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />
The above command forwards connections on local port 5924 to port 5924 on the remote system (here the same host as the SSH server, Cheaha, hence localhost).<br />
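<br />
If you only want the port forwarding and no remote shell, an optional variant (just a sketch) adds the -N flag so the SSH session does nothing but hold the tunnel open:<br />
<pre><br />
# forward local port 5924 to Cheaha's port 5924 without opening a login shell<br />
$ ssh -N -L 5924:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />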
<br />
=== Port-forwarding from Windows Systems ===<br />
Windows users need to establish the connection using whatever SSH software they commonly use. The following is an example configuration using the PuTTY client on Windows.<br />
<br />
[[File:Putty-SSH-Tunnel.png]]<br />
<br />
== Access your Cluster Desktop ==<br />
<br />
With the network connection to the VNC server established, you can access your cluster desktop using your preferred VNC client. When you access your cluster desktop you will be prompted for the VNC password you created during the one time setup above.<br />
<br />
The VNC client will actually connect to your local machine, eg. "localhost", because it relies on the SSH port forwarding to connect to the VNC server on the cluster. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
You can access the VNC server using the following connection scenarios based on your personal desktop environment.<br />
<br />
==== From Mac ====<br />
<br />
'''For Mac OSX 10.8 and higher'''<br />
Mac users can use the default VNC client and start it from Finder. Press '''cmd+k''' to bring up the "connect to server" window. Enter the following connection string in Finder: <br />
<pre>vnc://localhost:5924 </pre><br />
The connection string pattern is "vnc://<vnc-server>:<vnc-port>". Adjust your port setting for the specific value of your cluster desktop given when you run vncserver above.<br />
<br />
'''For Mac OSX 10.7 and lower'''<br />
Download and install Chicken of the VNC from [http://sourceforge.net/projects/cotvnc/ SourceForge].<br />
Start COTVNC and enter the following in the host window and provide the VNC password you created during set up when prompted:<br />
<pre>localhost:5924</pre><br />
<br />
<br />
==== From Linux ====<br />
Linux users can use the command<br />
<pre><br />
vncviewer :24 <br />
</pre><br />
<br />
===== Shortcut for Linux Users =====<br />
Linux users can optionally skip the explicit SSH tunnel setup described above by using the -via argument to the vncviewer command. The "-via <gateway>" will set up the SSH tunnel implicitly. For the above example, the following command would be used:<br />
<pre><br />
vncviewer -via cheaha.rc.uab.edu :24<br />
</pre><br />
This option is preferred since it will also establish VNC settings that are more efficient for slow networks. See the man page for vncviewer for details on other encodings.<br />
<br />
==== From Windows ====<br />
Windows users should use whatever connection string is applicable to their VNC client. <br />
<br />
Remember to use "localhost" as the host address in your VNC client. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
== Using your Desktop ==<br />
Once we have a VNC session established with the Gnome desktop environment, we can use it to launch any graphical application on Cheaha, or to open a GUI (X11) enabled SSH session with another system in the cluster. <br />
<br />
VNC can be particularly useful when you are trying to access an X Windows application from MS Windows, as a native X11 setup on Windows is typically more involved than the VNC setup above. For example, it is much easier to start an X11-based SSH session with a remote system on the cluster from the Gnome desktop environment above than to do the full X11 setup on Windows.<br />
<pre> <br />
$ ssh -X $USER@172.x.x.x<br />
</pre><br />
<br />
=== Performance Considerations for Slow Networks ===<br />
<br />
If the network you are using to connect to your VNC session is slow (eg. wifi or off campus), you may be able to improve the responsiveness of the VNC session by adjusting simple desktop settings in your VNC desktop. The VNC screen needs to be repainted every time your desktop is modified, eg. opening or moving a window. Any bit of data you don't have to send will improve the drawing speed. Most modern desktops default to a pretty picture. While nice to look at these pictures contain lots data. If you set your desktop background to a solid color (no gradients) the screen refresh will be much quicker (see System->Preferences->Desktop Background). Also, if you change to a basic windowing theme it will also speed screen refreshes (see System->Preferences->Themes->Mist).</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5564Cheaha GettingStarted2017-05-19T20:17:29Z<p>Mhanby@uab.edu: /* Project Storage */ fixed typo in SHARE_PROJECT</p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path. On Unix-derived systems (Linux, Mac) you can edit your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file)<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Or you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
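<br />
With that entry in place the short name works directly with your SSH client, for example:<br />
<pre><br />
$ ssh blazerid@cheaha<br />
</pre><br />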
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and most feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to Cheaha, the first time you connect the SSH client should display a message asking if you would like to import the host's public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, you may see the following message. '''Just press ENTER for the next three prompts; don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes several generations of commodity compute hardware, totaling 2340 compute cores, 20 TB of RAM, and over 4.7PB of storage.<br />
<br />
The hardware is grouped into generations designated gen3, gen4, gen5 and gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node.<br />
<br />
* Generation 4 (gen4) -- 3 16 core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention.<br />
<br />
* DDN GPFS storage cluster<br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Generation 6 (gen6) -- <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
In summary, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24GB of RAM per core, or 384GB per node<br />
* gen3.1 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB of RAM per core, or 96GB per node<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB of RAM per core, or 48GB per node <br />
<br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| ?? || 36 || 24 || 128G<br />
|-<br />
|gen6|| ?? || 38 || 24 || 256G<br />
|-<br />
|gen6|| ?? || 14 || 24 || 384G<br />
|-<br />
|gen5||openstack(?)|| ? || ? || ?G<br />
|-<br />
|gen4||ssg||3||16||385G<br />
|-<br />
|gen3.1||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to our queuing system ([[Slurm]]). A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based using resource requests (the following are commonly requested resources, there are many more that are available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (Also know as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used previously.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster (a command-line sketch of this workflow follows below)<br />
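<br />
The following is a compressed command-line sketch of that workflow (the data paths and ''script.job'' are placeholders):<br />
<pre><br />
# stage input data to your scratch space<br />
$ cp -r ~/mydata $USER_SCRATCH/<br />
<br />
# submit the job script and note the job number it reports<br />
$ sbatch script.job<br />
<br />
# monitor your jobs while they are queued or running<br />
$ squeue -u $USER<br />
<br />
# when finished, move results off of scratch to longer-term storage<br />
$ mv $USER_SCRATCH/results /path/to/long-term/storage<br />
</pre><br />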
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node.<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express (max 2 hrs), short (max 12 hrs), medium (max 50 hrs), long (max 150 hrs), sinteractive (0-2 hrs)<br />
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha using the "sbatch" command. The full manual for sbatch is available by running the following command<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing system, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands; the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, whereas UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [http://www.geany.org/ Geany] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''' click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [http://notepad-plus.sourceforge.net/uk/site.htm Notepad++] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [http://www.textpad.com/ TextPad] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster to provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application code, can be specified in a single file commonly referred to as a 'job script'. Following is a simple job script that prints the hostname of the compute node.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource requests (maximum runtime, number of compute nodes, etc...)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional configuration options refer to the Slurm commands manual. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three commands are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
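<br />
After submitting, you can check on or cancel the job with the other two commands using its job number (52707 is the job number from the example above; yours will differ):<br />
<pre><br />
# check the status of your queued and running jobs<br />
$ squeue -u $USER<br />
<br />
# cancel the job if needed (replace 52707 with your own job number)<br />
$ scancel 52707<br />
</pre><br />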
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script using the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
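<br />
For example, a job script might use $SLURM_JOB_ID to create a unique output directory under scratch (a minimal sketch; the ''results'' path is just an illustration):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=test<br />
#SBATCH --partition=express<br />
#SBATCH --ntasks=1<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
<br />
# create a per-job output directory named after the Slurm job number<br />
OUTDIR=$USER_SCRATCH/results/job_$SLURM_JOB_ID<br />
mkdir -p $OUTDIR<br />
<br />
srun hostname > $OUTDIR/hostname.txt<br />
</pre><br />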
<br />
=== Interactive Resources ===<br />
Login Node (the host that you connected to when you setup the SSH connection to Cheaha) is supposed to be used for submitting jobs and/or lighter prep work required for the job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (eg. compiling code or other manipulations of data) or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command. <br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4GB of RAM per CPU (--mem-per-cpu), for 8 hrs (--time).<br />
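<br />
When the allocation is granted you are placed in a shell on a compute node; run your work there and type exit when you are done to release the resources. A typical session might look like this (the node name shown is just an example):<br />
<pre><br />
$ hostname<br />
c0099<br />
<br />
# ... run your interactive commands here ...<br />
<br />
$ exit<br />
</pre><br />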
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' in your VNC session.<br />
<br />
== Storage ==<br />
<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or any other location). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup of your data.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non scratch data.<br />
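<br />
To check whether your home directory is under the 10GB limit, you can ask for its total size with du (the size shown is just an example):<br />
<pre><br />
$ du -sh $HOME<br />
8.2G    /home/joeuser<br />
</pre><br />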
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a Lustre high performance file system providing roughly 240TB of storage. This should be your jobs primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network accessible scratch within your job. For example, step 1 in the job could be to copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 code execution, step 3 move the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
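<br />
For example, in a job script (the ''my_project'' directory is just a placeholder):<br />
<pre><br />
# preferred: use the environment variable<br />
cd $USER_SCRATCH/my_project<br />
<br />
# avoid hard coding the equivalent path<br />
# cd /data/scratch/joeuser/my_project<br />
</pre><br />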
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
The $LOCAL_SCRATCH is a special temporary directory and it's important to note that this directory is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or other location prior to the job exiting.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)"<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "Copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)"<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)"<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)"<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored, called $SHARE_PROJECT. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_PROJECT.<br />
<br />
=== Uploading Data ===<br />
<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available from the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_SCRATCH rather than duplicating the data for each user.<br />
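<br />
For example, to push a data set from your desktop directly into your scratch directory with SCP (run from your local machine; the file name is just a placeholder and ''blazerid'' is your own user ID):<br />
<pre><br />
# run this on your desktop, not on Cheaha<br />
scp mydata.tar.gz blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/<br />
</pre><br />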
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
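<br />
For example, either of the following works; the first always tracks the newest installed GCC, the second pins an exact version:<br />
<pre><br />
# load the default (latest) GCC<br />
module load GCC<br />
<br />
# load a specific GCC version<br />
module load GCC/4.8.2<br />
</pre><br />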
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` to your job script before any module load commands. The -V job argument can conflict with the module function used in your script.<br />
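<br />
For example, near the top of the job script, before any module commands (a minimal sketch of the workaround described above):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=express<br />
#SBATCH --ntasks=1<br />
<br />
# work around the exported module function issue<br />
unset module<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
</pre><br />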
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts. Please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set --time to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate job submission with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, ie. it doesn't depend on the data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution to processing lots of data when each command works on a single piece of data. For example, running the same conversion on 100s of images.<br />
<br />
Here we show how to create job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. Run this command to create a script, turn it into a command, and run the command (just copy and paste the following onto the command line).<br />
<pre><br />
cat > helloworld.sh << EOF<br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
EOF<br />
chmod +x helloworld.sh<br />
./helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=$USER@uab.edu<br />
<br />
./helloworld.sh<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating process.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also shows how to compile the code and submit the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this example of MPI-enabled Hello World includes a 3 minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h><br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
    int rank, size;<br />
    int i, j;<br />
    float f;<br />
<br />
    MPI_Init(&argc, &argv);<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
    MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
    printf("Hello World from process %d of %d.\n", rank, size);<br />
    /* sleep so the job stays visible in the queue long enough to observe it */<br />
    sleep(180);<br />
    for (j=0; j<=100000; j++)<br />
        for (i=0; i<=100000; i++)<br />
            f = i*2.718281828*i + i + i*3.141592654;<br />
<br />
    MPI_Finalize();<br />
    return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 cpu slots and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS ./helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command): <br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute it as "./helloworld"; the "./" prefix means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
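<br />
For example, to add the current directory to your PATH for the current shell session (one possible approach; the change only lasts until you log out):<br />
<pre><br />
$ export PATH=$PATH:.<br />
$ helloworld<br />
Hello from login001<br />
</pre><br />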
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=4<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
srun mpirun $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that will use an array of 10 tasks (--array=1-10); each task has a max runtime of 10 minutes and will use no more than 256 MB of RAM per task.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
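<br />
Note that the job script above changes into a per-task rep directory (rep1 through rep10), so those directories, each containing an rscript.R, need to exist before you submit. One way to set them up, assuming you already have an rscript.R in the ArrayExample directory:<br />
<pre><br />
cd ~/jobs/ArrayExample<br />
for i in $(seq 1 10); do<br />
  mkdir -p rep$i<br />
  cp rscript.R rep$i/<br />
done<br />
</pre><br />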
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5559Cheaha GettingStarted2017-05-11T17:23:21Z<p>Mhanby@uab.edu: /* Home directories */</p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path. On Unix-derived systems (Linux, Mac) you can edit your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file)<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Or you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to the Cheaha, the first time you connect, the SSH client should display a message asking if you would like to import the hosts public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, You may see the following message '''just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes several generations of commodity compute hardware, totaling 2340 compute cores, 20 TB of RAM, and over 4.7PB of storage.<br />
<br />
The hardware is grouped into generations designated gen3, gen4, gen5 and gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node.<br />
<br />
* Generation 4 (gen4) -- 3 16 core (48 cores total) compute nodes. This hardware collection purchase by [http://www.soph.uab.edu/ssg/people/tiwari Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention.<br />
<br />
* DDN GPFS storage cluster<br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Generation 6 (gen6) -- <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24GB of RAM per core or 384GB per node<br />
* gen3.1 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB RAM per core or 96GB per node<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB RAM per core or 48GB per node<br />
<br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| ?? || 36 || 24 || 128G<br />
|-<br />
|gen6|| ?? || 38 || 24 || 256G<br />
|-<br />
|gen6|| ?? || 14 || 24 || 384G<br />
|-<br />
|gen5||openstack(?)|| ? || ? || ?G<br />
|-<br />
|gen4||ssg||3||16||384G<br />
|-<br />
|gen3.1||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to our queuing system ([[Slurm]]). A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based on resource requests (the following are commonly requested resources; there are many more available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (Also know as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used earlier.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in megabytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express(max 2 hrs), short(max 12 hrs), medium(max 50 hrs), long(max 150 hrs), sinteractive(0-2 hrs)<br />
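<br />
These options can be given on the sbatch command line as well as in #SBATCH lines inside a job script. For example (''myscript.job'' is a placeholder for your own job script):<br />
<pre><br />
sbatch --partition=short --time=04:00:00 --ntasks=1 --mem-per-cpu=2048 myscript.job<br />
</pre><br />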
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha using the "sbatch" command. The full manual for sbatch is available by running the following command<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands, the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, where UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [http://www.geany.org/ Geany] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''' click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [http://notepad-plus.sourceforge.net/uk/site.htm Notepad++] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [http://www.textpad.com/ TextPad] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster and provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application commands, can be specified in a single file commonly referred to as a 'job script'. The following is a simple job script that prints the hostname of the compute node it runs on.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource requests (maximum runtime, number of compute nodes, etc...)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional configuration options refer to the Slurm commands manual. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three commands are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script using the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
<br />
=== Interactive Resources ===<br />
Login Node (the host that you connected to when you setup the SSH connection to Cheaha) is supposed to be used for submitting jobs and/or lighter prep work required for the job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (eg. compiling code or other manipulations of data) or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command. <br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4GB of RAM per CPU (--mem-per-cpu), for 8 hrs (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' in your VNC session.<br />
<br />
== Storage ==<br />
<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or any other location). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup of your data.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non scratch data.<br />
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a Lustre high-performance file system providing roughly 240TB of storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network-accessible scratch within your job. For example, step 1 in the job could be to copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 the code execution, and step 3 moving the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
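<br />
For example, in a job script (the subdirectory name here is just a placeholder):<br />
<pre><br />
# preferred: reference scratch through the environment variable<br />
cd $USER_SCRATCH/my_project<br />
<br />
# avoid: hard coding the underlying path<br />
# cd /data/scratch/$USER/my_project<br />
</pre><br />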
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
The $LOCAL_SCRATCH is a special temporary directory and it's important to note that this directory is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or another location before the job exits.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir -p $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored, called $SHARE_SCRATCH. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_SCRATCH.<br />
<br />
=== Uploading Data ===<br />
<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available by the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_SCRATCH rather than duplicating the data for each user.<br />
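<br />
For example (a quick sketch; the file names, URL, and BLAZERID are placeholders):<br />
<pre><br />
# push a file from your desktop to your network scratch space on Cheaha<br />
scp bigdata.tar.gz BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/<br />
<br />
# or, once logged in to Cheaha, pull a data set directly into scratch<br />
cd $USER_SCRATCH<br />
wget http://example.org/dataset.tar.gz<br />
</pre><br />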
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and log back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
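<br />
For example, using the GCC module files listed above:<br />
<pre><br />
# loads the newest GCC available (equivalent to GCC/4.9.3-2.25 in the list above)<br />
module load GCC<br />
<br />
# loads a specific version<br />
module load GCC/4.8.2<br />
</pre><br />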
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` to your job script before any module load commands. The -V job argument causes a conflict with the module function used in your script.<br />
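<br />
A minimal sketch of where the workaround fits in a job script (the module name is just an example):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
<br />
# work around the exported-function error before loading any modules<br />
unset module<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
</pre><br />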
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts. Please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set --time to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate job submission, with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, i.e. it does not depend on data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution for processing lots of data when each command works on a single piece of data, for example, running the same conversion on hundreds of images.<br />
<br />
Here we show how to create a job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. Run this command to create a script, turn it into a command, and run the command (just copy and paste the following onto the command line).<br />
<pre><br />
cat > helloworld.sh << EOF<br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
EOF<br />
chmod +x helloworld.sh<br />
./helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=$USER@uab.edu<br />
<br />
./helloworld.sh<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating processes.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also includes compiling the code and submitting the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this MPI-enabled Hello World example includes a 3-minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h>   /* for sleep() */<br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
    int rank, size;<br />
    int i, j;<br />
    float f;<br />
<br />
    MPI_Init(&argc, &argv);<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
    MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
    printf("Hello World from process %d of %d.\n", rank, size);<br />
<br />
    /* keep the job busy for a few minutes so it is easy to observe in the queue */<br />
    sleep(180);<br />
    for (j = 0; j <= 100000; j++)<br />
        for (i = 0; i <= 100000; i++)<br />
            f = i * 2.718281828 * i + i + i * 3.141592654;<br />
<br />
    MPI_Finalize();<br />
    return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded, then loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 CPU cores and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS ./helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command): <br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command with the simple prefix "./helloworld", which means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
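<br />
For example, to add the current directory to your PATH for the current shell session:<br />
<pre><br />
export PATH="$PATH:."<br />
helloworld          # now found without the ./ prefix<br />
</pre><br />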
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=4<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
srun mpirun $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that runs an array of 10 tasks (--array=1-10); each task has a maximum runtime of 10 minutes and will use no more than 256 MB of RAM.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
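<br />
The job script below expects subdirectories named rep1 through rep10 under ~/jobs/ArrayExample, each containing an R script named rscript.R. One way to create that layout (the source path for rscript.R is a placeholder for your own script):<br />
<pre><br />
$ cd ~/jobs/ArrayExample<br />
$ # /path/to/rscript.R below is a placeholder for your own R script<br />
$ for i in $(seq 1 10); do mkdir -p rep$i; cp /path/to/rscript.R rep$i/; done<br />
</pre><br />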
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=SSH_Key_Authentication&diff=5524SSH Key Authentication2017-03-03T14:55:59Z<p>Mhanby@uab.edu: /* Putty */</p>
<hr />
<div>These instructions assist existing users of Cheaha in getting access to the new Cheaha system.<br />
<br />
<br />
===Mac OS X===<br />
<br />
* On your Mac open '''Terminal''' application. <br />
* Run the following command in your '''terminal''':<br />
<pre><br />
ssh-keygen -t rsa<br />
</pre> <br />
* You can put a passphrase for your SSH key (''' Not mandatory but highly recommended''')<br />
* An '''id_rsa.pub''' file will have been created in '''~/.ssh'''.<br />
* Open the file by running '''less .ssh/id_rsa.pub''' and copy the content.<br />
* Press '''q''' to exit out of the file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, and paste the content into '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''log out''' from cheaha.rc.uab.edu and log in again. You should not be prompted for a password and should be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time access; from then on you can simply run '''ssh blazerid@cheaha.rc.uab.edu'''.<br />
<br />
===Linux===<br />
<br />
* On your linux machine open '''Terminal''' application. <br />
* Run the following command in your '''terminal''':<br />
<pre><br />
ssh-keygen -t rsa<br />
</pre> <br />
* You can put a passphrase for your SSH key (''' Not mandatory but highly recommended''')<br />
* An '''id_rsa.pub''' file will have been created in '''~/.ssh'''.<br />
* Open the file by running '''less .ssh/id_rsa.pub''' and copy the content.<br />
* Press '''q''' to exit out of the file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, and paste the content into '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''log out''' from cheaha.rc.uab.edu and log in again. You should not be prompted for a password and should be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time access; from then on you can simply run '''ssh blazerid@cheaha.rc.uab.edu'''.<br />
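<br />
On Mac OS X and Linux, where the '''ssh-copy-id''' utility is available, it can append your public key to '''~/.ssh/authorized_keys''' on Cheaha for you instead of pasting it by hand (you will be prompted for your password once):<br />
<pre><br />
ssh-copy-id blazerid@cheaha.rc.uab.edu<br />
</pre><br />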
<br />
===Windows===<br />
<br />
====Putty====<br />
<br />
You will need a tool called '''puttygen''' to generate SSH keys for the pairing purpose. You can download it [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html here]. Once you have downloaded and installed '''putty''' and '''puttygen''', follow these instructions:<br />
<br />
* Launch PuTTY Key Generator.<br />
<br />
* Launch the program, click the Generate button. The program generates the keys for you.<br />
<br />
* Enter a unique key passphrase in the Key passphrase and Confirm passphrase fields. You will be prompted for that passphrase whenever you log in to a server with this key. ('''Not mandatory, but highly recommended''')<br />
<br />
* Save the public and private keys by clicking the Save public key and Save private key buttons.<br />
<br />
* Copy the content from the public file that you just generated by right clicking in the text field labeled '''Public key for pasting into OpenSSH authorized_keys file''' and choose '''Select All''', right click again and select Copy<br />
<br />
* Now open application '''Putty'''.<br />
<br />
* Set up your session for '''cheaha.rc.uab.edu''' in PuTTy. (If you don't know how, follow these [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#PuTTY instructions]).<br />
<br />
* Login to your Cheaha account.<br />
<br />
* Paste the content of the '''Public key''' that you generated using '''Puttygen''' in '''~/.ssh/authorized_keys''' using your favorite editor.<br />
<br />
* Now select your saved session for '''cheaha.rc.uab.edu'''.<br />
<br />
* Click '''Connection > SSH > Auth''' in the left-hand navigation pane and configure the private key to use by clicking Browse under Private key file for authentication.<br />
<br />
* Navigate to the location where you saved your private key earlier, select the file, and click Open.<br />
<br />
* The private key path is now displayed in the Private key file for authentication field.<br />
<br />
* Click Session in the left-hand navigation pane and click '''Save''' in the Load, save or delete a stored session section.<br />
<br />
* Click Open to begin your session with the server. You should not be prompted for a password and should be logged in directly.<br />
<br />
'''Note:''' You only need to perform these steps for first-time access; from then on you can simply open your saved '''cheaha.rc.uab.edu''' session.<br />
<br />
====SSH Secure Shell Client====<br />
<br />
* In SSH Secure Shell, from the '''Edit''' menu, select '''Settings...''' <br />
* In the window that opens, select '''Global Settings''', then '''User Authentication''', and then '''Keys'''.<br />
* Under "Key pair management", click Generate New.... In the window that appears, click Next.<br />
* In the Key Generation window that appears:<br />
** From the drop-down list next to '''Key Type:''', select from the following:<br />
***If you want to take less time to initially generate the key, select '''DSA'''.<br />
*** If you want to take less time during each connection for the server to verify your key, select '''RSA'''.<br />
** From the drop-down list next to '''Key Length:''', select at least '''1024'''. You may choose a greater key length, but the time it takes to generate the key, as well as the time it takes to authenticate using it, will go up.<br />
* Click '''Next'''. The key generation process will start. When it's complete, click Next again.<br />
* In the '''File Name:''' field, enter a name for the file where SSH Secure Shell will store your '''private key'''. Your '''public key''' will be stored in a file with the same name, plus a '''.pub extension'''. <br />
** '''Important:''' You can put a passphrase for your SSH key ( Not mandatory but highly recommended)<br />
* To complete the key generation process, click '''Next''', and then '''Finish'''.<br />
* At the '''Settings''' screen, click '''OK'''.<br />
* Copy the content of the generated .pub file.<br />
* Now SSH to your '''cheaha.rc.uab.edu''' account, following the instructions [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#SSH_Secure_Shell_Client here] , and paste the content in '''~/.ssh/authorized_keys''' using your favorite editor.<br />
* Now '''exit/log out''' from your account on '''cheaha.rc.uab.edu''' and log in again. You should not be prompted for a password and should be logged in directly.<br />
<br />
'''Note:''' You need to perform these steps just for the first time access, you should be able to directly run your '''cheaha.rc.uab.edu''' profile from next time.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Slurm&diff=5515Slurm2017-02-16T18:55:47Z<p>Mhanby@uab.edu: Added OpenMP / SMP job example</p>
<hr />
<div>[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. Slurm is now the primary job manager on Cheaha, it replaces SUN Grid Engine (SGE) the job manager used earlier.<br />
<br />
Slurm is similar in many ways to GridEngine or most other queue systems. You write a batch script then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we will provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress.<br />
<br />
<br />
== General Slurm Documentation ==<br />
The primary source for documentation on Slurm usage and commands can be found at the [http://slurm.schedmd.com/ Slurm] site. If you Google for Slurm questions, you'll often see the Lawrence Livermore pages as the top hits, but these tend to be outdated.<br />
<br />
A great way to get details on the Slurm commands is the man pages available from the Cheaha cluster. For example, if you type the following command:<br />
<br />
<pre><br />
man sbatch<br />
</pre><br />
you'll get the manual page for the sbatch command.<br />
<br />
== Slurm Partitions ==<br />
Cheaha has the following Slurm partitions (can also be thought of in terms of SGE queues) defined (the lower the number the higher the priority).<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (e.g. ''--partition=short'') to satisfy the job's resource request (maximum runtime, number of compute nodes, etc.).<br />
{{Slurm_Partitions}}<br />
<br />
== Logging on and Running Jobs from the command line ==<br />
Once you've gone through the [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Access_.28Cluster_Account_Request.29 account setup procedure] and obtained a suitable [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Client_Configuration terminal application], you can login to the Cheaha system via ssh<br />
<br />
ssh '''BLAZERID'''@cheaha.rc.uab.edu<br />
<br />
Alternatively, '''existing users''' could follow these [https://docs.uabgrid.uab.edu/wiki/SSH_Key_Authentication instructions to add SSH keys] and access the new system.<br />
<br />
Cheaha (new hardware) runs the CentOS 7 version of the Linux operating system and commands are run under the "bash" shell (the default shell). There are a number of Linux and [http://www.gnu.org/software/bash/manual/bashref.html bash references], [http://cli.learncodethehardway.org/bash_cheat_sheet.pdf cheat sheets] and [http://www.tldp.org/LDP/Bash-Beginners-Guide/html/ tutorials] available on the web.<br />
<br />
== Typical Workflow ==<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Determine how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha:<br />
** Number of processor cores required by the job<br />
** Maximum memory (RAM) required per core<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests, and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
<br />
== Slurm Job Types ==<br />
=== Batch Job ===<br />
A batch job is a script of commands that the scheduler runs unattended on compute resources when they become available. Batch jobs are the preferred way to run work that does not need interactive control: you submit the script with sbatch, log off if you wish, and collect the output files when the job finishes. If your work requires interactive control, see the Interactive Job section below.<br />
<br />
For additional information on the '''sbatch''' command execute '''man sbatch''' at the command line to view the manual.<br />
<br />
==== Example Batch Job Script ====<br />
A job consists of '''resource requests''' and '''tasks'''. The Slurm job scheduler interprets lines beginning with '''#SBATCH''' as Slurm arguments. In this example, the job requests to run a single task.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (e.g. ''--partition=short'') to satisfy the job's resource request (maximum runtime, number of compute nodes, etc.).<br />
<pre>#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#<br />
# Time format = HH:MM:SS, DD-HH:MM:SS<br />
#<br />
#SBATCH --time=10:00<br />
#<br />
# Minimum memory required per allocated CPU, in megabytes. <br />
#<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
[https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Sample_Job_Scripts Click here] for more example SLURM job scripts.<br />
<br />
=== Interactive Job ===<br />
The login node (the host you connected to when you set up the SSH connection to Cheaha) should only be used for submitting jobs and for the lighter prep work that job scripts require. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other manipulation of data), or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command. <br />
<br />
<pre><br />
srun --ntasks=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--ntasks), each with 4 GB of RAM (--mem-per-cpu), for 8 hours (--time) on the medium partition.<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, run from a '''Terminal''' in your VNC session.<br />
<br />
<pre><br />
sinteractive --ntasks=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME <br />
</pre><br />
<br />
=== MPI Job ===<br />
'''TODO: ''' add more detailed MPI information.<br />
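<br />
A minimal sketch of an MPI batch script, modeled on the MPI Hello World example in [[Cheaha_GettingStarted#Sample_Job_Scripts]] (the module name and program name are placeholders; adjust them for your application):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=mpi_example<br />
#SBATCH --partition=express<br />
#SBATCH --ntasks=8<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#SBATCH --error=mpi_example.err<br />
#SBATCH --output=mpi_example.out<br />
<br />
# my_mpi_program is a placeholder for your MPI-enabled executable<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS ./my_mpi_program<br />
</pre><br />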
<br />
=== OpenMP / SMP Job ===<br />
[https://en.wikipedia.org/wiki/OpenMP OpenMP / SMP] jobs are those that use multiple CPU cores on a single compute node.<br />
<br />
It is very important to properly structure an SMP job to ensure that the requested CPU cores are assigned to the same compute node. The following example requests 4 CPU cores by setting '''ntasks''' to '''1''' and '''cpus-per-task''' to '''4'''<br />
<br />
<pre><br />
srun --partition=short \<br />
--ntasks=1 \<br />
--cpus-per-task=4 \<br />
--mem-per-cpu=1024 \<br />
--time=5:00:00 \<br />
--job-name=rsync \<br />
--pty /bin/bash<br />
</pre><br />
<br />
== Job Status ==<br />
<br />
=== SQUEUE ===<br />
To check your job status, you can use the following command<br />
<pre><br />
squeue -u $USER<br />
</pre><br />
<br />
Following fields are displayed when you run '''squeue'''<br />
<pre><br />
JOBID - ID assigned to your job by Slurm scheduler<br />
PARTITION - Partition your job gets, depends upon time requested (express(max 2 hrs), short(max 12 hrs), medium(max 50 hrs), long(max 150 hrs), sinteractive(0-2 hrs))<br />
NAME - JOB name given by user<br />
USER - User who started the job<br />
ST - State your job is in. The typical states are PENDING (PD), RUNNING(R), SUSPENDED(S), COMPLETING(CG), and COMPLETED(CD)<br />
TIME - Time for which your job has been running<br />
NODES - Number of nodes your job is running on<br />
NODELIST - Node on which the job is running<br />
</pre><br />
<br />
For more details on '''squeue''', go [http://slurm.schedmd.com/squeue.html here].<br />
<br />
=== SSTAT ===<br />
The '''sstat''' command shows status and metric information for a running job.<br />
<br />
'''NOTE: the job parts must be executed using ''srun'' otherwise ''sstat'' will not display useful output'''<br />
<pre><br />
[rcs@login001 ~]$ sstat 256483<br />
JobID MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS MaxRSSNode MaxRSSTask AveRSS MaxPages MaxPagesNode MaxPagesTask AvePages MinCPU MinCPUNode MinCPUTask AveCPU NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy MaxDiskRead MaxDiskReadNode MaxDiskReadTask AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite <br />
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------ <br />
256483.0 1962728K c0043 1 1960633K 91920K c0043 3 91867K 67K c0043 3 50K 00:00.000 c0043 0 00:00.000 8 1.20G Unknown Unknown Unknown 0 1M c0043 5 1M 0.34M c0043 5 0.34M <br />
<br />
</pre><br />
<br />
For more details on '''sstat''', go [http://slurm.schedmd.com/sstat.html here].<br />
<br />
=== SCONTROL ===<br />
<br />
<pre><br />
$ scontrol show jobid -dd 123<br />
<br />
JobId=123 JobName=SLI<br />
UserId=rcuser(1000) GroupId=rcuser(1000)<br />
Priority=4294898073 Nice=0 Account=(null) QOS=normal<br />
JobState=RUNNING Reason=None Dependency=(null)<br />
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0<br />
DerivedExitCode=0:0<br />
RunTime=06:27:02 TimeLimit=08:00:00 TimeMin=N/A<br />
SubmitTime=2016-09-12T14:40:20 EligibleTime=2016-09-12T14:40:20<br />
StartTime=2016-09-12T14:40:20 EndTime=2016-09-12T22:40:21<br />
PreemptTime=None SuspendTime=None SecsPreSuspend=0<br />
Partition=medium AllocNode:Sid=login001:123<br />
ReqNodeList=(null) ExcNodeList=(null)<br />
NodeList=c0003<br />
BatchHost=c0003<br />
NumNodes=1 NumCPUs=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*<br />
TRES=cpu=24,mem=10000,node=1<br />
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*<br />
Nodes=c0003 CPU_IDs=0-23 Mem=10000<br />
MinCPUsNode=1 MinMemoryNode=10000M MinTmpDiskNode=0<br />
Features=(null) Gres=(null) Reservation=(null)<br />
Shared=OK Contiguous=0 Licenses=(null) Network=(null)<br />
Command=/share/apps/rc/git/rc-sched-scripts/bin/_interactive<br />
WorkDir=/scratch/user/rcuser/work/other/rhea/Gray/MERGED<br />
StdErr=/dev/null<br />
StdIn=/dev/null<br />
StdOut=/dev/null<br />
Power= SICP=0<br />
</pre><br />
<br />
== Job History ==<br />
The '''sacct''' command, or our wrapper '''rc-sacct''', can be used to view historical information about jobs, as shown in the examples below.<br />
<br />
The example below uses the rc-sacct wrapper script; for comparison, here is the equivalent sacct command:<br />
<pre><br />
$ sacct --starttime 2016-08-30 \<br />
--allusers \<br />
--format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist<br />
</pre><br />
<pre><br />
$ rc-sacct --allusers --starttime 2016-08-30<br />
<br />
User JobID JobName Partition State Timelimit Start End Elapsed MaxRSS MaxVMSize NNodes NCPUS NodeList<br />
--------- ------------ ---------- ---------- ---------- ---------- ------------------- ------------------- ---------- ---------- ---------- -------- ---------- ---------------<br />
kxxxxxxx 34308 Connectom+ interacti+ PENDING 08:00:00 Unknown Unknown 00:00:00 1 4 None assigned<br />
kxxxxxxx 34310 Connectom+ interacti+ PENDING 08:00:00 Unknown Unknown 00:00:00 1 4 None assigned<br />
dxxxxxxx 35927 PK_htseq1 medium COMPLETED 2-00:00:00 2016-08-30T09:21:33 2016-08-30T10:06:25 00:44:52 1 4 c0005<br />
35927.batch batch COMPLETED 2016-08-30T09:21:33 2016-08-30T10:06:25 00:44:52 307704K 718152K 1 4 c0005<br />
bxxxxxxx 35928 SI medium TIMEOUT 12:00:00 2016-08-30T09:36:04 2016-08-30T21:36:42 12:00:38 1 1 c0006<br />
35928.batch batch FAILED 2016-08-30T09:36:04 2016-08-30T21:36:43 12:00:39 31400K 286532K 1 1 c0006<br />
35928.0 hostname COMPLETED 2016-08-30T09:36:16 2016-08-30T09:36:17 00:00:01 1112K 207252K 1 1 c0006<br />
<br />
</pre><br />
<br />
Additional information about the sacct command can be found by running '''man sacct''' or [http://slurm.schedmd.com/sacct.html found here]<br />
<br />
The rc-sacct wrapper script supports the following arguments:<br />
<pre><br />
$ rc-sacct --help<br />
<br />
Copyright (c) 2016 Mike Hanby, University of Alabama at Birmingham IT Research Computing.<br />
<br />
rc-sacct - version 1.0.0<br />
<br />
Run sacct to display history in a nicely formatted output.<br />
<br />
-r, --starttime HH:MM[:SS] [AM|PM]<br />
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]<br />
MM/DD[/YY]-HH:MM[:SS]<br />
YYYY-MM-DD[THH:MM[:SS]]<br />
-a, --allusers Dispay hsitory for all users)<br />
-u, --user user_list Display hsitory for all users in the comma seperated user list<br />
-f, --format a,b,c Comma separated list of columns: i.e. --format jobid,elapsed,ncpus,ntasks,state<br />
--debug Display additional output like internal structures<br />
-?, -h, --help Display this help message<br />
<br />
</pre><br />
<br />
<br />
== Slurm Variables ==<br />
The following is a list of useful Slurm environment variables (click here for the [http://slurm.schedmd.com/srun.html full list]):<br />
{{Slurm_Variables}}<br />
<br />
== SGE - Slurm ==<br />
<br />
This section shows Slurm and SGE equivalent commands<br />
<br />
<pre><br />
SGE Slurm <br />
--------- ------------<br />
qsub sbatch <br />
qlogin sinteractive<br />
qdel scancel<br />
qstat squeue<br />
<br />
</pre><br />
<br />
To get more info about individual commands, run : '''man SLURM_COMMAND''' . For an extensive list of Slurm-SGE equivalent commands, go [https://docs.uabgrid.uab.edu/wiki/SGE-SLURM here] or Slurm's official [http://slurm.schedmd.com/rosetta.pdf documentation]</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Setting_Up_VNC_Session&diff=5487Setting Up VNC Session2016-12-21T16:09:05Z<p>Mhanby@uab.edu: changed userId in examples to blazer</p>
<hr />
<div>[[wikipedia:Virtual_Network_Computing|Virtual Network Computing (VNC)]] is a cross-platform desktop sharing system for interacting with a remote system's desktop through a graphical interface. This page covers basic instructions for accessing a desktop on [[Cheaha]] using VNC. These basic instructions support a variety of use cases where access to graphical applications on the cluster is helpful or required. If you are interested in more options or detailed technical information, please take a look at the man pages of the commands shown.<br />
<br />
== One Time Setup ==<br />
VNC use on Cheaha requires a one-time setup to configure settings for starting the virtual desktop. These instructions will configure the VNC server to use the Gnome desktop environment, the default desktop environment on the cluster. (Alternatively, you can run the vncserver command without this configuration and start a very basic, but harder to use, desktop environment.) To get started [[Cheaha_GettingStarted#Login | log in to cheaha via ssh.]]<br />
<br />
=== Set VNC Session Password ===<br />
You must maintain a password for your VNC server sessions using the vncpasswd command. The password is validated each time a connection comes in, and it can be changed on the fly by running vncpasswd again at any time. '''Remember this password as you will be prompted for it when you access your cluster desktop'''. By default, the command stores an obfuscated version of the password in the file $HOME/.vnc/passwd.<br />
<br />
<pre><br />
$ vncpasswd <br />
</pre><br />
<br />
=== Configure the Cluster Desktop ===<br />
The vncserver command relies on a configuration script to start your virtual desktop environment. The [[wikipedia:GNOME|GNOME]] desktop provides a familiar desktop experience and can be selected by creating the following vncserver startup script (~/.vnc/xstartup).<br />
<br />
<pre><br />
mkdir $HOME/.vnc<br />
<br />
cat > $HOME/.vnc/xstartup <<\EOF<br />
#!/bin/sh<br />
<br />
# Start up the standard system desktop<br />
unset SESSION_MANAGER<br />
exec /etc/X11/xinit/xinitrc<br />
<br />
EOF<br />
<br />
chmod +x $HOME/.vnc/xstartup<br />
</pre><br />
<br />
By default a VNC server displays its graphical environment using a basic tab window manager. If the above xstartup file is absent, a file with the default tab-window-manager settings will be created by the vncserver command during startup. If you want to switch to the GNOME desktop, simply replace this default file with the settings above. <br />
<br />
This completes the one-time setup on the cluster for creating a VNC server password and selecting the preferred desktop environment.<br />
<br />
=== Select a VNC Client ===<br />
You will also need a VNC client on your personal desktop in order to remotely access your cluster desktop. <br />
<br />
Mac OS comes with a native VNC client so you don't need to use any third-party software. Chicken of the VNC is a popular alternative on Mac OS to the native VNC client, especially for older Mac OS, pre-10.7.<br />
<br />
Most Linux systems have the VNC software installed so you can simply use the vncviewer command to access a VNC server. <br />
<br />
If you use MS Windows then you will need to install a VNC client. Here is a list of VNC client software; you can use any one of them to access the VNC server. <br />
* http://www.tightvnc.com/ (Mac, Linux and Windows)<br />
* http://www.realvnc.com/ (Mac, Linux and Windows)<br />
* http://sourceforge.net/projects/cotvnc/ (Mac)<br />
<br />
== Start your VNC Desktop == <br />
Your VNC desktop must be started before you can connect to it. To start the VNC desktop you need to log into cheaha using a [[Cheaha_GettingStarted#Login|standard SSH connection]]. The VNC server is started by executing the vncserver command after you log in to cheaha. It will run in the background and continue running even after you log out of the SSH session that was used to run the vncserver command.<br />
<br />
To start the VNC desktop run the vncserver command. You will see a short message like the following from the vncserver before it goes into the background. You will need this information to connect to your desktop.<br />
<pre><br />
$ vncserver <br />
New 'login001:24 (blazer)' desktop is login001:24<br />
<br />
Starting applications specified in /home/blazer/.vnc/xstartup<br />
Log file is /home/blazer/.vnc/login001:24.log<br />
</pre><br />
<br />
The above command output indicates that a VNC server is started on VNC X-display number 24, which translates to system port 5924. The vncserver automatically selects this port from a list of available ports.<br />
<br />
The actual system port on which VNC server is listening for connections is obtained by adding a VNC base port (default: port 5900) and a VNC X-display number (24 in above case). Alternatively you can specify a high numbered system port directly (e.g. 5927) using '-rfbport <port-number>' option and the vncserver will try to use it if it's available. See vncserver's man page for details.<br />
<br />
Please note that the vncserver will continue to run in the background on the head node until it is explicitly stopped. This allows you to reconnect to the same desktop session without having to first start the vncserver, leaving all your desktop applications active. When you no longer need your desktop, simply log out of your desktop using the desktop's log out menu option, or explicitly end the vncserver with the 'vncserver -kill :<display>' command (e.g. 'vncserver -kill :24').<br />
<br />
=== Alternate Cluster Desktop Sizes ===<br />
The default size of your cluster desktop is 1024x768 pixels. If you want to start your desktop with an alternate geometry to match your application, personal desktop environment, or other preferences, simply add a "-geometry <width>x<height>" argument to your vncserver command. For example, if you want a wide screen geometry popular with laptops, you might start the VNC server with:<br />
<pre><br />
vncserver -geometry 1280x800<br />
</pre><br />
<br />
== Establish a Network Connection to your VNC Server ==<br />
<br />
As indicated in the output from the vncserver command, the VNC desktop is listening for connections on a higher numbered port. This port isn't directly accessible from the internet. Hence, we need to use SSH local port forwarding to connect to this server.<br />
<br />
This SSH session provides the connection to your VNC desktop and must remain active while you use the desktop. You can disconnect and reconnect to your desktop by establishing this SSH session whenever you need to access your desktop. In other words, your desktop remains active across your connections to it. This supports a mobile work environment.<br />
<br />
=== Port-forwarding from Linux or Mac Systems ===<br />
Set up SSH port forwarding using the native SSH command. <br />
<pre><br />
# ssh -L <local-port>:<remote-system-host>:<remote-system-port> USERID@<SSH-server-host><br />
$ ssh -L 5924:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />
The above command forwards connections on local port 5924 to port 5924 on the remote system (here the remote system is the SSH server host, Cheaha, hence localhost).<br />
<br />
=== Port-forwarding from Windows Systems ===<br />
Windows users need to establish the connection using whatever SSH software they commonly use. The following is an example configuration using Putty client on Windows.<br />
<br />
[[File:Putty-SSH-Tunnel.png]]<br />
<br />
== Access your Cluster Desktop ==<br />
<br />
With the network connection to the VNC server established, you can access your cluster desktop using your preferred VNC client. When you access your cluster desktop you will be prompted for the VNC password you created during the one time setup above.<br />
<br />
The VNC client will actually connect to your local machine, eg. "localhost", because it relies on the SSH port forwarding to connect to the VNC server on the cluster. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
You can access the VNC server using the following connection scenarios based on your personal desktop environment.<br />
<br />
==== From Mac ====<br />
<br />
'''For Mac OSX 10.8 and higher'''<br />
Mac users can use the default VNC client and start it from Finder. Press '''cmd+k''' to bring up the "connect to server" window. Enter the following connection string in Finder: <br />
<pre>vnc://localhost:5924 </pre><br />
The connection string pattern is "vnc://<vnc-server>:<vnc-port>". Adjust your port setting for the specific value of your cluster desktop given when you run vncserver above.<br />
<br />
'''For Mac OSX 10.7 and lower'''<br />
Download and install Chicken of the VNC from [http://sourceforge.net/projects/cotvnc/ sourceforge].<br />
Start COTVNC, enter the following in the host window, and provide the VNC password you created during setup when prompted:<br />
<pre>localhost:5924</pre><br />
<br />
<br />
==== From Linux ====<br />
Linux users can use the command<br />
<pre><br />
vncviewer :24 <br />
</pre><br />
<br />
===== Shortcut for Linux Users =====<br />
Linux users can optionally skip the explicit SSH tunnel setup described above by using the -via argument to the vncviewer command. The "-via <gateway>" will set up the SSH tunnel implicitly. For the above example, the following command would be used:<br />
<pre><br />
vncviewer -via cheaha.rc.uab.edu :24<br />
</pre><br />
This option is preferred since it will also establish VNC settings that are more efficient for slow networks. See the man page for vncviewer for details on other encodings.<br />
<br />
==== From Windows ====<br />
Windows users should use whatever connection string is applicable to their VNC client. <br />
<br />
Remember to use "localhost" as the host address in your VNC client. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
== Using your Desktop ==<br />
Once we have a VNC session established with the Gnome desktop environment, we can use it to launch any graphical application on Cheaha or to open a GUI (X11) enabled SSH session with a remote system in the cluster. <br />
<br />
VNC can be particularly useful when you are trying to access an X Window application from MS Windows, as a native X11 setup on Windows is typically more involved than the VNC setup above. For example, it's much easier to start an X11-based SSH session with a remote system on the cluster from the Gnome desktop environment above than to do a full X11 setup on Windows.<br />
<pre> <br />
$ ssh -X $USER@172.x.x.x<br />
</pre><br />
<br />
=== Performance Considerations for Slow Networks ===<br />
<br />
If the network you are using to connect to your VNC session is slow (eg. wifi or off campus), you may be able to improve the responsiveness of the VNC session by adjusting simple desktop settings in your VNC desktop. The VNC screen needs to be repainted every time your desktop is modified, eg. opening or moving a window. Any bit of data you don't have to send will improve the drawing speed. Most modern desktops default to a pretty picture. While nice to look at, these pictures contain lots of data. If you set your desktop background to a solid color (no gradients) the screen refresh will be much quicker (see System->Preferences->Desktop Background). Also, if you change to a basic windowing theme it will speed up screen refreshes (see System->Preferences->Themes->Mist).</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5449Cheaha GettingStarted2016-10-31T19:57:48Z<p>Mhanby@uab.edu: </p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please [mailto:support@vo.uabgrid.uab.edu submit an authorization request to the IT Research Computing staff]. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine simply as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path or customize your SSH configuration. On Unix-derived systems (Linux, Mac) you can edit your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file)<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Or you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and the most feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to Cheaha, the first time you connect, the SSH client should display a message asking if you would like to import the host's public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, you may see the following message; '''just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes several generations of commodity compute hardware, totaling 2340 compute cores, 20 TB of RAM, and over 4.7 PB of storage.<br />
<br />
The hardware is grouped into generations designated gen3, gen4, gen5 and gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node.<br />
<br />
* Generation 4 (gen4) -- 3 16-core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention.<br />
<br />
* DDN GPFS storage cluster<br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Generation 6 (gen6) -- <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24G of RAM per core or 384GB total<br />
* gen3.1 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB RAM per core or 96GB total<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB RAM per core or 48GB total <br />
<br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| ?? || 36 || 24 || 128G<br />
|-<br />
|gen6|| ?? || 38 || 24 || 256G<br />
|-<br />
|gen6|| ?? || 14 || 24 || 384G<br />
|-<br />
|gen5||openstack(?)|| ? || ? || ?G<br />
|-<br />
|gen4||ssg||3||16||384G<br />
|-<br />
|gen3.1||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to our queuing system ([[Slurm]]). A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based on resource requests (the following are commonly requested resources, there are many more that are available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (Also known as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([[https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]]), the job manager used previously.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For Array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express (max 2 hrs), short (max 12 hrs), medium (max 50 hrs), long (max 150 hrs), sinteractive (0-2 hrs)<br />
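<br />
As an illustration, the following hypothetical job script header combines several of the options above; the partition, time, and memory values are placeholders and should be adjusted to your actual workload:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=short        # partition with a 12 hour limit<br />
#SBATCH --time=04:00:00          # maximum run time of 4 hours<br />
#SBATCH --ntasks=1               # a single task<br />
#SBATCH --cpus-per-task=4        # 4 processors for that task<br />
#SBATCH --mem-per-cpu=2048       # 2 GB of RAM per allocated CPU<br />
<br />
srun ./my_program                # my_program is a placeholder for your application<br />
</pre><br />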
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha by using the "sbatch" command. The full manual for sbatch is available by running the following command<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands; the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, where UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [[http://www.geany.org/ Geany]] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''' click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [[http://notepad-plus.sourceforge.net/uk/site.htm Notepad++]] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [[http://www.textpad.com/ TextPad]] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster and to provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about the job's resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application commands, can be specified in a single file commonly referred to as a 'job script'. The following is a simple job script that prints the hostname of the node it runs on.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource request (maximum runtime, number of compute nodes, etc.)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
<br />
<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional options, refer to the Slurm command manuals. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script through the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
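For example, the job submitted above can be monitored or cancelled with the other two commands (52707 is the job number that sbatch reported):<br />
<pre><br />
$ squeue -u $USER      # show the status of your queued and running jobs<br />
$ scancel 52707        # delete the job if it is no longer needed<br />
</pre><br />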
<br />
=== Interactive Resources ===<br />
The login node (the host that you connected to when you set up the SSH connection to Cheaha) is intended for submitting jobs and/or the lighter prep work required for your job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (eg. compiling code or other manipulations of data) or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command.<br />
<br />
<pre><br />
srun --ntasks=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--ntasks), with each task requesting 4 GB of RAM, for 8 hrs (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' in your VNC window.<br />
<br />
== Storage ==<br />
<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or anything else). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup data.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries.<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets or request shared scratch space for shared data sets.<br />
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a Lustre high-performance file system providing roughly 240TB of storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network-accessible scratch within your job. For example, step 1 in the job could be to copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 code execution, step 3 move the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
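For example, a job script can stage its work under network scratch through the variable rather than the full path (the directory name here is just an illustration):<br />
<pre><br />
mkdir -p $USER_SCRATCH/myproject<br />
cd $USER_SCRATCH/myproject<br />
</pre><br />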
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
The $LOCAL_SCRATCH directory is a special temporary directory and it's important to note that it is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or another location before the job exits.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location called $SHARE_SCRATCH where shared data can be stored. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_SCRATCH.<br />
<br />
=== Uploading Data ===<br />
<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available by the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_SCRATCH rather than duplicating the data for each user.<br />
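<br />
For example, a data set could be pushed from a Linux or Mac workstation with scp, or pulled from a data provider with wget once logged into the cluster (the file names and URL below are placeholders):<br />
<pre><br />
# Push from your workstation to your network scratch directory on Cheaha<br />
$ scp mydata.tar.gz BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/<br />
<br />
# Or pull a file from a remote site while logged into Cheaha<br />
$ wget http://example.org/dataset.tar.gz -P $USER_SCRATCH<br />
</pre><br />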
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
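For example (a sketch using the versions listed above):<br />
<pre><br />
module load GCC              # loads the default, currently GCC/4.9.3-2.25<br />
module load GCC/4.8.4        # or load one specific version instead<br />
</pre><br />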
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` before loading your modules. The -V job argument will cause a conflict with the module function used in your script.<br />
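A minimal sketch of where the workaround fits in a job script (the module and script names are just examples reused from elsewhere on this page):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=express<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
<br />
unset module                       # work around the BASH_FUNC_module error<br />
module load R/3.2.0-goolf-1.7.20<br />
srun R CMD BATCH rscript.R<br />
</pre><br />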
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts; please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set the --time option to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate jobs submission with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, ie. it doesn't depend on the data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution to processing lots of data when each command works on a single piece of data. For example, running the same conversion on 100's of images.<br />
<br />
Here we show how to create job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. Run this command to create a script, turn it into a command, and run the command (just copy and paste the following onto the command line).<br />
<pre><br />
cat > helloworld.sh << EOF<br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
EOF<br />
chmod +x helloworld.sh<br />
./helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
./helloworld.sh<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating process.<br />
<br />
Here is a simple parallel Slurm job script for running commands the rely on MPI. This example also includes the example of compiling the code and submitting the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this example of MPI enabled Hello World includes a 3 minute sleep to ensure the job runs for several minutes, a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h>  /* for sleep() */<br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
    int node;<br />
<br />
    int i, j;<br />
    float f;<br />
<br />
    MPI_Init(&argc, &argv);<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &node);<br />
<br />
    printf("Hello World from thread %d.\n", node);<br />
    sleep(180);<br />
    for (j=0; j<=100000; j++)<br />
        for (i=0; i<=100000; i++)<br />
            f = i*2.718281828*i + i + i*3.141592654;<br />
<br />
    MPI_Finalize();<br />
    return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 cpu slots and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=4<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun ./helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from thread 1.<br />
Hello World from thread 3.<br />
Hello World from thread 4.<br />
Hello World from thread 7.<br />
Hello World from thread 5.<br />
Hello World from thread 6.<br />
Hello World from thread 0.<br />
Hello World from thread 2.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, eg. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash line as the first line, and make it executable with the following command (note: those are single quotes in the echo command): <br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command with the simple prefix "./helloworld", which means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
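For example, for the current shell session (a sketch; you could also add the export line to your ~/.bashrc):<br />
<pre><br />
export PATH=$PATH:.<br />
helloworld<br />
</pre><br />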
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=4<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
mpirun $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that will use an array of 10 tasks (--array=1-10); each task has a max runtime of 10 minutes and will use no more than 256 MB of RAM.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u BLAZERID<br />
<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=ANSYS&diff=5448ANSYS2016-10-31T19:56:10Z<p>Mhanby@uab.edu: </p>
<hr />
<div>'''Note: ''' ANSYS on Cheaha is only available for School of Engineering users. The license also cannot be used for commercial purposes!<br />
<br />
The license only supports serial and SMP threaded work and doesn't support parallel (MPI) jobs.<br />
<br />
'''Project website:''' http://www.ansys.com/<br />
<br />
__TOC__<br />
<br />
==Load ANSYS module==<br />
To load ANSYS into your environment, use the following module command:<br />
<pre><br />
module load ansys/ansys-14.0 <br />
</pre><br />
<br />
==Running ANSYS Interactively==<br />
<br />
If you are trying to run ANSYS interactively, you can reserve interactive compute resources with the command "qlogin -l h_rt=04:00:00,vf=2G" once you are logged into the cluster via SSH. Please do not run computations on the head node. You may need to adjust the h_rt (runtime) and vf (RAM) requirements if the analysis is expected to run more than 4 hours or use more than 2 gigabytes of RAM. After gaining access to the interactive session you can issue the commands "module load ansys/ansys-14.5" and "ansys145" to run ANSYS interactively.<br />
<br />
==Running ANSYS as an SMP Job==<br />
<br />
Running ANSYS tools via batch mode may also be possible. A quick Google turns up these [http://wiki.crc.nd.edu/wiki/index.php/Submitting_a_Fluent_Job_to_SGE sample job scripts] for running fluent via SGE. You may find the non-interactive "Method 2" example could work well on our cluster, since we also offer the module load ansys option. Combining this example with our [[Cheaha_GettingStarted#Sample_Job_Scripts|provided template]] could lead to the following script that may work for you:<br />
<br />
#$ -S /bin/bash<br />
#$ -cwd<br />
#<br />
#$ -N ANSYS<br />
#$ -pe smp 8<br />
#$ -l h_rt=00:10:00,s_rt=0:08:00,vf=1G<br />
#$ -j y<br />
#<br />
#$ -M YOUR_EMAIL_ADDRESS<br />
#$ -m eas<br />
#<br />
# Export the submission environment to the job<br />
#$ -V<br />
#<br />
# Load the appropriate module files<br />
module load openmpi/openmpi-gnu<br />
module load ansys/ansys-14.5<br />
# turn off the shell's "noclobber" option, which prevents data files from being<br />
# overwritten. Fluent jobs are typically set up to overwrite data files.<br />
set +o noclobber<br />
# Run the ansys job; <br />
ansys < file.jou >& logfile.txt<br />
<br />
Note, you will likely need to adjust the h_rt= (run time), vf= (RAM) and -pe smp 8 (core count) values to suit your needs. You should start small and scale up when you are confident in the successful outcome. The [http://148.204.81.206/Ansys/150/Running%20ANSYS%20Fluent%20under%20SGE.pdf Running ANSYS on SGE reference] may provide additional useful information for accomplishing your ANSYS goals.<br />
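<br />
Once saved (the filename ansys-smp.job is only an illustrative assumption), the script is submitted and monitored with the usual SGE commands:<br />
<pre><br />
$ qsub ansys-smp.job<br />
$ qstat -u BLAZERID<br />
</pre><br />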
<br />
<br />
[[Category:Software]]</div>Mhanby@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=NAMD_GPU&diff=5447NAMD GPU2016-10-31T19:54:16Z<p>Mhanby@uab.edu: </p>
<hr />
<div>== CUDA GPU Acceleration ==<br />
<br />
NAMD only uses the GPU for nonbonded force evaluation. Energy evaluation<br />
is done on the CPU. To benefit from GPU acceleration you should set<br />
outputEnergies to 100 or higher in the simulation config file. Some<br />
features are unavailable in CUDA builds, including alchemical free<br />
energy perturbation.<br />
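<br />
For example, in the simulation config file (100 is simply the minimum value suggested above):<br />
<pre><br />
# write energies less frequently so the CPU-side energy evaluation<br />
# does not become the bottleneck of a GPU-accelerated run<br />
outputEnergies    100<br />
</pre><br />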
<br />
As this is a new feature you are encouraged to test all simulations<br />
before beginning production runs. Forces evaluated on the GPU differ<br />
slightly from a CPU-only calculation, an effect more visible in reported<br />
scalar pressure values than in energies.<br />
<br />
To benefit from GPU acceleration you will need a CUDA build of NAMD<br />
and a recent high-end NVIDIA video card. CUDA builds will not function<br />
without a CUDA-capable GPU. You will also need to be running the<br />
NVIDIA Linux driver version 195.17 or newer (released Linux binaries<br />
are built with CUDA 2.3, but can be built with newer versions as well).<br />
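<br />
One way to check the installed driver version on a node (nvidia-smi ships with the NVIDIA driver; neither command is NAMD- or Cheaha-specific):<br />
<pre><br />
$ nvidia-smi<br />
$ cat /proc/driver/nvidia/version<br />
</pre><br />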
<br />
Finally, the libcudart.so.2 included with the binary (the one copied from<br />
the version of CUDA it was built with) must be in a directory in your<br />
LD_LIBRARY_PATH before any other libcudart.so libraries. For example:<br />
<br />
setenv LD_LIBRARY_PATH ".:$LD_LIBRARY_PATH"<br />
(or LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"; export LD_LIBRARY_PATH)<br />
./namd2 +idlepoll <configfile><br />
./charmrun ++local +p4 ./namd2 +idlepoll <configfile><br />
<br />
When running CUDA NAMD always add +idlepoll to the command line. This<br />
is needed to poll the GPU for results rather than sleeping while idle.<br />
<br />
Each namd2 process can use only one GPU. Therefore you will need to run<br />
at least one process for each GPU you want to use. Multiple processes<br />
can share a single GPU, usually with an increase in performance. NAMD<br />
will automatically distribute processes equally among the GPUs on a node.<br />
Specific GPU device IDs can be requested via the +devices argument on<br />
the namd2 command line, for example:<br />
<br />
./charmrun ++local +p4 ./namd2 +idlepoll +devices 0,2 <configfile><br />
<br />
Devices are selected cyclically from those available, so in the above<br />
example processes 0 and 2 will share device 0 and processes 1 and 3 will<br />
share device 2. One could also specify +devices 0,0,2,2 to cause device<br />
0 to be shared by processes 0 and 1, etc. GPUs with two or fewer<br />
multiprocessors are ignored unless specifically requested with +devices.<br />
GPUs of compute capability 1.0 are no longer supported and are ignored.<br />
<br />
While charmrun with ++local will preserve LD_LIBRARY_PATH, normal<br />
charmrun does not. You can use charmrun ++runscript to add the namd2<br />
directory to LD_LIBRARY_PATH with the following executable runscript:<br />
<br />
#!/bin/csh<br />
setenv LD_LIBRARY_PATH "${1:h}:$LD_LIBRARY_PATH"<br />
$*<br />
<br />
For example:<br />
<br />
./charmrun ++runscript ./runscript +p8 ./namd2 +idlepoll <configfile><br />
<br />
An InfiniBand network is highly recommended when running CUDA-accelerated<br />
NAMD across multiple nodes. You will need either an ibverbs NAMD binary<br />
(available for download) or an MPI NAMD binary (must build Charm++ and<br />
NAMD as described below) to make use of the InfiniBand network.<br />
<br />
The CUDA (NVIDIA's graphics processor programming platform) code in<br />
NAMD is completely self-contained and does not use any of the CUDA<br />
support features in Charm++. When building NAMD with CUDA support<br />
you should use the same Charm++ you would use for a non-CUDA build.<br />
Do NOT add the cuda option to the Charm++ build command line. The<br />
only changes to the build process needed are to add --with-cuda and<br />
possibly --cuda-prefix ... to the NAMD config command line.<br />
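<br />
A minimal sketch of such a config invocation (the build target, Charm++ architecture, and CUDA path are illustrative assumptions, not Cheaha-specific values):<br />
<pre><br />
# run from the top of the NAMD source tree, after building Charm++ as usual (without the cuda option)<br />
./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64 --with-cuda --cuda-prefix /usr/local/cuda<br />
cd Linux-x86_64-g++<br />
make<br />
</pre><br />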
<br />
<br />
== CUDA NAMD on Cheaha ==<br />
<br />
=== Download and build CUDA NAMD ===<br />
Download and build NAMD with NVIDIA CUDA Acceleration from the NAMD Download <br />
page: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD<br />
<br />
The following example assumes NAMD was built in the $USER_SCRATCH/NAMD directory.<br />
<br />
=== Running NAMD on Cheaha CUDA BLADE === <br />
The CUDA-enabled blade on Cheaha is cheaha-compute-1-9.<br />
SSH to this host to work with CUDA:<br />
<pre> ssh cheaha-compute-1-9 </pre><br />
<br />
==== Load CUDA module ====<br />
<pre><br />
module load cuda/cuda-4    # load the CUDA toolkit module<br />
</pre><br />
<br />
==== CUDA commands ==== <br />
<pre><br />
deviceQuery      # check the status of the CUDA device<br />
<br />
bandwidthTest    # test the bandwidth for data transfer to and from the device<br />
</pre><br />
<br />
If the above tests fail, please contact Mike Hanby.<br />
<br />
==== Export Path ====<br />
<pre> <br />
export LD_LIBRARY_PATH=$USER_SCRATCH/NAMD/NAMD_2.8_Linux-x86_64-CUDA/bin/:$LD_LIBRARY_PATH<br />
export LD_LIBRARY_PATH=$USER_SCRATCH/NAMD/NAMD_2.8_Linux-x86_64-CUDA/:$LD_LIBRARY_PATH<br />
</pre><br />
<br />
==== NAMD2 ====<br />
Run namd2 from the $USER_SCRATCH/NAMD directory (which contains the CUDA build of NAMD):<br />
<pre><br />
./charmrun ++local +p12 ./namd2 +idlepoll /LOCATION/OF/CONFIGURATION/FILE/*.conf > OUTPUT_FILE<br />
</pre><br />
<br />
where:<br />
* ++local makes use of the processors only on the compute node<br />
* +p12 tells NAMD to make use of the 12 processor cores on the compute node</div>Mhanby@uab.edu