https://docs.uabgrid.uab.edu/w/api.php?action=feedcontributions&user=Tanthony%40uab.edu&feedformat=atomCheaha - User contributions [en]2024-03-28T10:03:33ZUser contributionsMediaWiki 1.38.2https://docs.uabgrid.uab.edu/w/index.php?title=Template:Main_Banner&diff=5883Template:Main Banner2019-01-28T20:46:35Z<p>Tanthony@uab.edu: Added Matlab License renewal instructions</p>
<hr />
<div><!-- MAIN PAGE BANNER --><br />
<table id="mp-banner" style="width: 100%; margin:4px 0 0 0; background:none; border-spacing: 0px;"><br />
<tr><td class="MainPageBG" style="text-align:center; padding:0.2em; background-color:#cef2e0; border:2px solid #f2e0ce; color:#000; font-size:100%;"><br />
<br />
<span style="color:#009000"> '''<big></big>''' </span><br />
<br />
[[Image:information.png|left|link=]]<br />
<span><big>'''Winter 2018 Maintenance Complete'''</big> <br />
<br />
The HPC 2018 Winter Maintenance has been completed. <br />
<br />
[[Winter2018Maintenance| Click here for more information]]<br />
</span><br />
<br><br />
<span><big>'''MATLAB License Renewal Instructions'''</big> <br />
To update the license, open MATLAB and click Help > Licensing > Update Current License. <br />
</span><br />
</td><br />
</tr><br />
</table></div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5871Welcome2019-01-23T14:28:30Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants (short) */ added EDR in short description</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor for collaboratively writing and sharing documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date, please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]], or you can install your own (see the example job script below) <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
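For readers new to the Data Analysis workflow, the sketch below shows the general shape of a batch job submitted to Cheaha's Slurm scheduler. It is a minimal example only; the partition name, module name, and resource requests are illustrative placeholders rather than the cluster's actual configuration, so consult the [[Cheaha2_GettingStarted|Getting Started]] page for current values.<br />
<pre>
#!/bin/bash
#SBATCH --job-name=example-analysis
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # CPU cores for this job
#SBATCH --mem=8G                   # memory for the job
#SBATCH --time=02:00:00            # wall-clock limit (HH:MM:SS)
#SBATCH --partition=express        # placeholder partition name

# Load an application from the module system (module name is illustrative)
module load R

# Run the analysis script staged in the submission directory
Rscript my_analysis.R
</pre>
Submitting the script with <code>sbatch my_job.sh</code> queues it for execution on the cluster.<br />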
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you easily incorporate our services into your research processes, along with this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application services for research. RCS is built on the same technology foundations used by major cloud vendors and on decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. The event started as CyberInfrastructure (CI) Days in 2007 and was renamed [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus-wide symposia, including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform (a sample cluster invocation is sketched below).<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from MathWorks for MATLAB, Simulink, and 42 toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
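As a minimal sketch of running MATLAB on the cluster rather than on a laptop, the commands below assume MATLAB is provided through the cluster's environment module system; the exact module name is an illustrative placeholder and may differ.<br />
<pre>
# From a cluster session (interactive job or batch script), load MATLAB
# and run a script without the desktop GUI. Module name is illustrative.
module load matlab
matlab -nodisplay -nosplash -r "run('my_script.m'); exit"
</pre>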
<br />
The UAB IT Research Computing group, the CCTS BMI, and the [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing to analysis to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end for running analyses on the cluster fabric, currently focused on NGS (Next-Generation Sequencing) analysis support for biology. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high-performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network, delivering 468 TFLOP/s of aggregate theoretical peak performance. A high-performance GPFS file system with 6.6 PB of raw storage on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20 TB of traditional SAN storage is available for home directories. This general-access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters; storage; campus, statewide, and regional high-bandwidth networks; and conditioned space for hosting and operating HPC systems, research applications, and network equipment. <br />
<br />
==== Cheaha HPC system ====<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high-throughput computing (HTC) paradigms. Cheaha is composed of resources that span the UAB IT data centers in the 936 Building and the RUST Computer Center. Research Computing, in open collaboration with the campus research community, leads the design and development of these resources.<br />
<br />
==== Compute Resources ====<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12-core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14-core (504 cores total) 2.4 GHz Intel Xeon E5-2680 v4 compute nodes with 256 GB RAM, four NVIDIA Tesla P100 16 GB GPUs per node, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition, UAB researchers have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
==== Storage Resources ====<br />
<br />
The compute nodes on Cheaha are backed by a high-performance GPFS file system with 6.6 PB of raw storage on DDN SFA12KX hardware, connected via the InfiniBand fabric. An expansion of the GPFS fabric will double the capacity and is scheduled to be online in Fall 2018. An additional 20 TB of traditional SAN storage is also available for home directories.<br />
<br />
==== Network Resources ====<br />
<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data-intensive research facilities directly with the high performance computing services of the Research Computing System. This network supports very high-speed, secure connectivity between connected nodes for transferring very large data sets without interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network also consists of a secure Science DMZ with data transfer nodes (DTNs), perfSONAR measurement nodes, and a Bro security node connected directly to the border router, providing a "friction-free" pathway to external data repositories as well as computational resources.<br />
<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 switch/routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100 Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
==== Personnel ====<br />
<br />
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect/manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grant No. OAC-1541310, by the University of Alabama at Birmingham, and by the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Winter2018Maintenance&diff=5853Winter2018Maintenance2018-12-17T16:34:53Z<p>Tanthony@uab.edu: </p>
<hr />
<div>'''The HPC 2018 Winter Maintenance is scheduled from Sunday December 16 through Saturday December 22, 2018. This maintenance requires user action to preserve files in /data/scratch.''' <br />
<br />
Please review the details below to determine if this affects your data.<br />
During the maintenance, job execution will be suspended and any jobs remaining in the queue at the start of the maintenance will be removed to allow for service and upgrades to the cluster.<br />
<br />
This maintenance involves service to the cluster storage to add capacity and increase performance. The /data/user and /data/project storage locations will be preserved. However, DATA IN /data/scratch WILL NOT BE PRESERVED. Please arrange to move data you wish to preserve to your /data/user, /data/project or other off-cluster storage. <br />
<br />
Users are reminded that /data/scratch is a location for temporary file storage during computation and provides no assurances of long-term availability. Data that must remain available beyond job execution time frames should be moved to /data/user or /data/project.<br />
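For example, a user could preserve a scratch directory before the maintenance window with a copy along the following lines; the directory names below are placeholders, and users should substitute their own paths.<br />
<pre>
# Copy a directory from scratch to project (or user) storage before maintenance.
# Paths are examples only; replace them with your own directories.
rsync -av /data/scratch/$USER/my_results/ /data/project/my_lab/my_results/

# Verify the copy completed before relying on it
diff -r /data/scratch/$USER/my_results /data/project/my_lab/my_results
</pre>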
<br />
As always, we will work to maintain access to the login node and file system so that data access operations are minimally impacted. If possible, we will also reduce the period of time that compute nodes are unavailable. Our goal is to complete these updates with minimal disruption. Unfortunately, some steps still require user-visible restarts to systems and services.<br />
<br />
As the maintenance time approaches, only jobs that can complete before the maintenance time will be queued and initiated. This is intended to ensure no pending jobs can remain in the queue during the maintenance window.<br />
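In practice this means a job's requested wall-clock limit must fit entirely before the maintenance start. For example, a submission made shortly before the window might look like the following; the script name and time limit are illustrative.<br />
<pre>
# Request a wall-clock limit short enough to finish before the maintenance
# window opens; jobs that cannot finish in time will not be started.
sbatch --time=08:00:00 my_job.sh
</pre>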
<br />
'''When:''' December 16 through 22<br />
<br />
'''What:'''<br />
* Update the cluster management stack from Bright Cluster Manager 8.0 to 8.1<br />
* Upgrade the operating system to RHEL 7.6<br />
* Update the Slurm job scheduler from 17.02.2 to 17.11.8 (or possibly 18.02.x)<br />
* Enable pam_slurm.so to limit SSH access to compute nodes to users with active job(s) on the node<br />
* Add a Slurm epilog script to report job resource utilization<br />
* BETA - Roll out the Open OnDemand portal - <LINK TO MORE INFORMATION><br />
* Possibly move from tmod to lmod if we can confirm a seamless transition (see the module usage sketch after this list)<br />
* Update CUDA versions<br />
* Update Mellanox OFED versions<br />
* Migrate /data/user and /data/project to new GPFS storage<br />
* Upgrade firmware on hardware<br />
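For users, the tmod-to-lmod change is intended to be transparent because both implement the same <code>module</code> command interface; a typical session, with an illustrative module name, would continue to look like this:<br />
<pre>
module avail            # list software available through the module system
module load matlab      # load an application environment (name illustrative)
module list             # show currently loaded modules
module purge            # unload everything when finished
</pre>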
<br />
'''<br />
Please contact support@listserv.uab.edu with any questions or concerns.'''</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Template:Main_Banner&diff=5851Template:Main Banner2018-12-17T16:32:15Z<p>Tanthony@uab.edu: </p>
<hr />
<div><!-- MAIN PAGE BANNER --><br />
<table id="mp-banner" style="width: 100%; margin:4px 0 0 0; background:none; border-spacing: 0px;"><br />
<tr><td class="MainPageBG" style="text-align:center; padding:0.2em; background-color:#cef2e0; border:2px solid #f2e0ce; color:#000; font-size:100%;"><br />
<br />
<span style="color:#009000"> '''<big></big>''' </span><br />
<br />
[[Image:information.png|left|link=]]<br />
<span><big>'''Winter 2018 Maintenance - Dec 16-22, 2018'''</big> <br />
<br />
The HPC 2018 Winter Maintenance is scheduled from Sunday December 16 through Saturday December 22, 2018. <br />
<br />
[[Winter2018Maintenance| Click here for more information]]<br />
<br />
</span><br />
</td><br />
</tr><br />
</table></div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2018&diff=5842RCDay20182018-11-05T14:13:07Z<p>Tanthony@uab.edu: /* Agenda */ changed topic of Glenn's talk</p>
<hr />
<div>=== Fall 2018 Research Computing Day -- Use Cases and Strategic Engagement ===<br />
<br />
Date: November 7, 2018<br />
<br />
Venue: Hill Student Center, Alumni Theater<br />
<br />
Open to all UAB faculty, staff, and students. Registration is free, but we ask that you register early so we can get a head count for lunch; please register '''[https://www.eventbrite.com/e/research-computing-day-2018-tickets-51526010685 here]''' to attend.<br />
<br />
=== Agenda ===<br />
<br />
{| class="wikitable" border="1"<br />
<br />
|9:00 am – 9:15 am ||'''Intro and Welcome''' <br/><br />
Curtis A. Carver Jr., PhD<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
<br />
|-<br />
|9:15 am – 9:45 am ||'''Walkabout Summary''' <br/><br />
Ralph Zottola, PhD<br/><br />
Assistant Vice President Research Computing<br/><br />
University of Alabama at Birmingham<br />
<br />
|- <br />
|9:45 am – 10:30 am ||'''Research Computing: From Laptops to Leadership''' <br/><br />
Glenn Brook, PhD<br/><br />
NICS<br/><br />
Oak Ridge National Lab<br />
<br />
|- <br />
|10:30 am – 10:45 am ||'''Break''' <br/><br />
<br />
|- <br />
|10:45 am – 11:45 am ||'''Research Computing Use Cases''' <br/><br />
Data Science: UBRITE - Jelai Wang<br /><br />
Lowering Barriers: Open on Demand - John-Paul Robinson<br/><br />
<br />
|-<br />
|11:45 am – 1:00 pm ||'''Lunch'''<br />
<br />
|-<br />
|1:00 pm – 2:00 pm ||'''Research Data Landscape''' <br/><br />
Research Data Management - Ralph Zottola, PhD <br/><br />
Security/Regulations - Steven Osborne <br/><br />
Storage - John-Paul Robinson<br />
<br />
<br />
|-<br />
|2:00 pm – 2:45 pm ||'''Research Engagement “Beyond HPC” [Panel]'''<br/><br />
Moderator: Ralph Zottola<br/><br />
Software Development<br/><br />
Data Science<br/><br />
Research Engagement<br/><br />
Cloud<br />
<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Wrap-Up'''<br />
<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=UAB_Research_Computing_Day&diff=5825UAB Research Computing Day2018-09-05T19:06:43Z<p>Tanthony@uab.edu: added 2017</p>
<hr />
<div>Research Computing Day is a dialog within the UAB research community about leveraging the power of computers to grow the depth of our investigation into the nature of the world that surrounds us. The annual event welcomes discussions on science, engineering, the arts and humanities focused on the drive to open new research frontiers with advances in technology.<br />
<br />
Whether computers are used to increase the accuracy of a model, interpret the ever-growing stream of data from new image collections and instruments, or engage with peers around the globe, UAB’s status as a leading research community depends on the ability to incorporate these capabilities into the research process. By participating in the dialog of Research Computing Day at UAB, researchers can share how they are using these methods to enhance their research, gain new insights from peers, and contribute their voices to the growth of research at UAB.<br />
<br />
== Research Computing Day 2017 ==<br />
<br />
[[RCDay2017|Research Computing Day 2017]] was held October 13, 2017 from 10:30am to 4:00pm at the Hill Student Center, Alumni Theater.<br />
<br />
== Background ==<br />
<br />
Since 2007, The [http://www.uab.edu/it Office of the Vice President for Information Technology] has sponsored an annual dialog on the role of technology in research. These events joined UAB with [https://www.nsf.gov/awardsearch/showAward?AWD_ID=0956272 national dialogs on the role of Cyberinfrastructure in research] held at campuses across the country.<br />
<br />
== Previous UAB Research Computing Days ==<br />
<br />
* 2007 -- Co-hosted along with the [http://asc.edu ASA] site visit, providing an overview of new services and upcoming launch of the UABgrid pilot. (No web record)<br />
* 2008 -- Focus on grid computing and collaboration technologies, in particular the caBIG program with guest speakers from Booz Allen Hamilton who managed the NCI caBIG program and SURA (agenda currently offline) <br />
* 2010 -- Featured introduction to Galaxy platform for genetic sequencing by Dell staff scientist (agenda currently offline)<br />
* [[2011]] -- Understanding growth of research computing support at peer institutions UNC and Emory <br />
* [[2012]] -- Growing data sciences at UAB<br />
* [[2013]] -- OpenStack at UAB<br />
* [[RCDay2016|2016]] -- HPC Expansion<br />
* [[RCDay2017|2017]] -- GPU expansion<br />
<br />
[[Category:RCDay]]</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha&diff=5797Cheaha2018-07-13T14:46:27Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants (short) */</p>
<hr />
<div>{{Main_Banner}}<br />
'''Cheaha''' is a campus resource dedicated to enhancing research computing productivity at UAB. [http://cheaha.uabgrid.uab.edu Cheaha] is managed by [http://www.uab.edu/it UAB Information Technology's Research Computing group (UAB ITRC)] and is available to members of the UAB community in need of increased computational capacity. Cheaha supports [http://en.wikipedia.org/wiki/High-performance_computing high-performance computing (HPC)] and [http://en.wikipedia.org/wiki/High-throughput_computing high throughput computing (HTC)] paradigms.<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a [[Setting_Up_VNC_Session|cluster desktop]]. The local compute pool provides access to compute hardware based on the [http://en.wikipedia.org/wiki/X86_64 x86-64 64-bit architecture]. The compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, [[ Resources |a commodity cluster with approximately 2400 cores]] connected by low-latency Fourteen Data Rate (FDR) InfiniBand networks. The compute nodes are backed by 6.6 PB of raw GPFS storage on DDN SFA12KX hardware, an additional 20 TB available for home directories on a traditional Hitachi SAN, and other ancillary services. The compute nodes combine to provide over 110 TFLOP/s of dedicated computing power. <br />
<br />
Cheaha is composed of resources that span data centers located in the UAB Shared Computing Facility in the UAB 936 Building and the RUST Computer Center. Resource design and development is led by UAB IT Research Computing in open collaboration with community members. Operational [mailto:support@vo.uabgrid.uab.edu support] is provided by UAB IT's Research Computing group.<br />
<br />
Cheaha is named in honor of [http://en.wikipedia.org/wiki/Cheaha_Mountain Cheaha Mountain], the highest peak in the state of Alabama. Cheaha is a popular destination whose summit offers clear vistas of the surrounding landscape. (Cheaha Mountain photo-streams on [http://www.flickr.com/search/?q=cheaha Flickr] and [http://picasaweb.google.com/lh/view?q=cheaha&psc=G&filter=1# Picasa]).<br />
<br />
== Using ==<br />
<br />
=== Getting Started ===<br />
<br />
For information on getting an account, logging in, and running a job, please see [[Cheaha2_GettingStarted|Getting Started]].<br />
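As a quick orientation before the full guide, access to the cluster is over SSH to the login node, and work is then handed to the scheduler rather than run on the login node itself. The hostname and account name below are placeholders chosen for illustration; use the values given on the [[Cheaha2_GettingStarted|Getting Started]] page.<br />
<pre>
# Connect to the cluster login node (hostname and username are placeholders)
ssh your_blazerid@cheaha.uabgrid.uab.edu

# From the login node, submit work to the scheduler instead of running
# heavy computations directly on the login node
sbatch my_job.sh
</pre>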
<br />
== History ==<br />
<br />
[[Image:Research-computing-platform.png|right|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== 2005 ===<br />
<br />
In 2002, UAB was awarded an infrastructure development grant through the NSF EPSCoR program. This led to the 2005 acquisition of a 64-node compute cluster with two AMD Opteron 242 1.6 GHz CPUs per node (128 total cores). This cluster was named Cheaha. Cheaha expanded the compute capacity available at UAB and was the first general-access resource for the community. It led to expanded roles for UAB IT in research computing support through the development of the UAB Shared HPC Facility in BEC and provided further engagement in Globus-based grid computing resource development on campus via UABgrid and regionally via [http://www.suragrid.org SURAgrid].<br />
<br />
=== 2008 ===<br />
<br />
In 2008, money was allocated by UAB IT for hardware upgrades, which led to the acquisition in August 2008 of an additional 192 cores based on a Dell clustering solution with Intel quad-core E5450 3.0 GHz CPUs. This upgrade migrated Cheaha's core infrastructure to the Dell blade clustering solution. It provided a threefold increase in processor density over the original hardware and enabled more computing power to be located in the same physical space with room for expansion, an important consideration in light of the continued growth in processing demand. This hardware represented a major technology upgrade that included space for additional expansion to address overall capacity demand and enable resource reservation. <br />
<br />
The 2008 upgrade began a continuous resource improvement plan that includes a phased development approach for Cheaha with on-going increases in capacity and feature enhancements being brought into production via an [http://projects.uabgrid.uab.edu/cheaha open community process].<br />
<br />
Software improvements rolled into the 2008 upgrade included grid computing services to access distributed compute resources and orchestrate jobs using the [http://www.gridway.org GridWay] meta-scheduler. An initial 10Gigabit Ethernet link establishing the UABgrid Research Network was designed to support high-speed data transfers between clusters connected to this network.<br />
<br />
=== 2009 ===<br />
<br />
In 2009, annual investment funds were directed toward establishing a fully connected dual data rate InfiniBand network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. The InfiniBand and storage fabrics were designed to support significant increases in research data sets and their associated analytical demand.<br />
<br />
=== 2010 ===<br />
<br />
In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity. The grant funds were combined with the annual investment funds, adding 576 cores (48 nodes) based on the Intel Westmere 2.66 GHz CPU, a quad data rate InfiniBand fabric with 32 uplinks, an additional 120 TB of storage for the DDN fabric, and additional hardware to improve reliability. Additional improvements to the research compute platform involved extending the UAB Research Network to link the BEC and RUST data centers and adding 20TB of user and ancillary services storage.<br />
<br />
=== 2012 ===<br />
<br />
In 2012, UAB IT Research Computing invested in the foundation hardware to expand long-term storage and virtual machine capabilities with the acquisition of 12 Dell 720xd systems, each containing 16 cores, 96GB RAM, and 36TB of storage, creating a 192-core and 432TB virtual compute and storage fabric.<br />
<br />
Additional hardware investment by the School of Public Health's Section on Statistical Genetics added three 384GB large-memory nodes and an additional 48 cores to the QDR InfiniBand fabric.<br />
<br />
=== 2013 ===<br />
<br />
In 2013, UAB IT Research Computing acquired an [http://blogs.uabgrid.uab.edu/jpr/2013/03/were-going-with-openstack/ OpenStack cloud and Ceph storage software fabric] through a partnership between Dell and Inktank in order to [http://dev.uabgrid.uab.edu extend cloud computing solutions] to the researchers at UAB and enhance the interfacing capabilities for HPC.<br />
<br />
=== 2015 === <br />
<br />
UAB IT received $500,000 from the university’s Mission Support Fund for a compute cluster seed expansion of 48 teraflops. This added 936 cores across 40 nodes, each with two 12-core 2.5 GHz Intel Xeon E5-2680 v3 processors, connected by an FDR InfiniBand interconnect.<br />
<br />
UAB received a $500,000 grant from the Alabama Innovation Fund for a three-petabyte research storage array. This funding, with additional matching from UAB, provided a multi-petabyte [https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] parallel file system for the cluster, which went live in 2016.<br />
<br />
=== 2016 ===<br />
<br />
In 2016, UAB IT Research Computing received additional funding from the Deans of CAS, Engineering, and Public Health to grow the compute capacity provided by the prior year's seed funding. This added more compute nodes, providing researchers at UAB with 96 2x12-core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. More information can be found at [[Resources]]. <br />
<br />
In addition to the compute capacity, the six-petabyte GPFS file system came online. This file system provided each user with five terabytes of personal space, additional space for shared projects, and greatly expanded scratch storage, all in a single file system.<br />
<br />
The 2015 and 2016 investments combined to provide a completely new core for the Cheaha cluster, allowing the retirement of earlier compute generations.<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. If you are using Cheaha for grant-funded research, please send information about your grant (funding source and grant number), a statement of intent for the research project, and a list of the applications you are using to UAB IT Research Computing. If you are using Cheaha for exploratory research, please send a similar note on your research interest. Finally, any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. Please note, your acknowledgment may also need to include an additional statement acknowledging grant-funded hardware. We also ask that you send us references to any publications based on your use of Cheaha compute resources.<br />
<br />
=== Description of Cheaha for Grants (short) ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance 6.6PB raw GPFS file system on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12-core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14-core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
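<br />
The quoted aggregate peak can be roughly cross-checked with a back-of-the-envelope estimate (our sketch, not an official benchmark), assuming 16 double-precision FLOP per cycle per CPU core (AVX2 fused multiply-add, so cores × clock in GHz × 16 gives GFLOP/s) and approximate published double-precision peaks of about 4.7 TFLOP/s per P100, 2.9 TFLOP/s per K80 board, and 2.7 TFLOP/s per Xeon Phi 7210:<br />
<math>\underbrace{2304 \cdot 2.5 \cdot 16}_{\approx 92\,\mathrm{TF}} + \underbrace{504 \cdot 2.4 \cdot 16}_{\approx 19\,\mathrm{TF}} + \underbrace{72 \cdot 4.7}_{\approx 338\,\mathrm{TF}} + \underbrace{4 \cdot 2.9 + 4 \cdot 2.7}_{\approx 22\,\mathrm{TF}} \approx 472\ \mathrm{TFLOP/s},</math><br />
which is consistent with the stated figure of over 468 TFLOP/s.<br />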
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data-intensive research facilities directly with the high performance computing services of the Research Computing System. This network supports very high speed, secure connectivity between the nodes connected to it, so that very large data sets can be transferred at high speed without interfering with other traffic on the campus backbone, ensuring predictable latencies. In addition, the network also includes a secure science DMZ with DTNs and perfSONAR nodes connected directly to the border router, providing a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 switch/routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing and including an HPC architect/manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grant No. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.<br />
<br />
== System Profile ==<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes multiple generations of commodity compute hardware, summarized by generation below.<br />
<br />
The hardware is grouped into generations designated gen1 through gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 1 (gen1) -- 64 2-CPU AMD 1.6 GHz compute nodes with Gigabit interconnect. This is the original hardware collection purchased with NSF EPSCoR funds in 2005, approx $150K. These nodes are sometimes called the "Verari" nodes. These nodes are tagged as "verari-compute-#-#" in the ROCKS naming convention.<br />
* Generation 2 (gen2) -- 24 2x4-core (192 cores total) 3.0 GHz Intel compute nodes with dual data rate Infiniband interconnect and the initial high-perf storage implementation using 60TB DDN. This is the hardware collection purchased exclusively with the annual VPIT funds allocation, approx $150K/yr for the 2008 and 2009 fiscal years. These nodes are sometimes confusingly called "cheaha2" or "cheaha" nodes. These nodes are tagged as "cheaha-compute-#-#" in the ROCKS naming convention. <br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node. <br />
* Generation 4 (gen4) -- 3 16-core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Dr. Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention. <br />
* Generation 6 (gen6) -- <br />
** 44 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards (4 nodes with NVIDIA K80 GPUs and 4 nodes with Intel Xeon Phi 7120P accelerators)<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24GB of RAM per core, or 384GB total per node<br />
* gen3 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB RAM per core, or 96GB total per node<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB RAM per core, or 48GB total per node<br />
* gen2 is 192 cores of [http://ark.intel.com/products/33083/Intel-Xeon-Processor-E5450-12M-Cache-3_00-GHz-1333-MHz-FSB 3.0GHz quad-core Intel Xeon E5450 processors] with 2GB RAM per core<br />
* gen1 is 100 cores of 1.6GHz AMD Opteron 242 processors with 1GB RAM per core <br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| default || 44 || 24 || 128G<br />
|-<br />
|gen6|| default || 38 || 24 || 256G<br />
|-<br />
|gen6|| default || 14 || 24 || 384G<br />
|-<br />
|gen5||Ceph/OpenStack|| 12 || 20 || 96G<br />
|-<br />
|gen4||ssg||3||16||385G<br />
|-<br />
|gen3||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Software ===<br />
<br />
Details of the software available on Cheaha can be found on the [https://docs.uabgrid.uab.edu/wiki/Cheaha_Software Installed software page]; an overview follows.<br />
<br />
Cheaha uses [http://modules.sourceforge.net/ Environment Modules] to manage the software environment for user sessions and job scripts. Please follow these [http://me.eng.uab.edu/wiki/index.php?title=Cheaha#Environment_Modules specific steps for using environment modules].<br />
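<br />
For example, a typical module workflow looks like the following (a minimal sketch; the package name and version shown are illustrative, so run ''module avail'' to see what is actually installed):<br />
<pre><br />
# List the software modules available on the cluster<br />
module avail<br />
<br />
# Load a package into your environment (name/version are placeholders)<br />
module load R/3.3.1<br />
<br />
# Show what is currently loaded, then clear everything when finished<br />
module list<br />
module purge<br />
</pre><br />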
<br />
Cheaha's software stack is built with the [http://www.brightcomputing.com Bright Cluster Manager]. Cheaha's operating system is CentOS with the following major cluster components:<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
A brief summary of some of the available computational software and tools includes:<br />
* Amber<br />
* FFTW<br />
* Gromacs<br />
* GSL<br />
* NAMD<br />
* VMD<br />
* Intel Compilers<br />
* GNU Compilers<br />
* Java<br />
* R<br />
* OpenMPI<br />
* MATLAB<br />
<br />
=== Network ===<br />
<br />
Cheaha is connected to the UAB Research Network, which provides a dedicated 10Gbps networking backplane between clusters located in the 936 data center and the campus network core. Data transfer rates of almost 8Gbps between these hosts have been demonstrated using GridFTP, a multi-channel file transfer service that is used to move data between clusters as part of the job management operations. This performance promises very efficient job management and the seamless integration of other clusters as connectivity to the research network is expanded.<br />
<br />
=== Benchmarks ===<br />
<br />
The continuous resource improvement process involves collecting benchmarks of the system. One of the measures of greatest interest to users of the system is benchmarks of specific application codes. The following benchmarks have been performed on the system and will be further expanded as additional benchmarks are performed.<br />
<br />
* [[Cheaha-BGL_Comparison|Cheaha-BGL Comparison]]<br />
<br />
* [[Gromacs_Benchmark|Gromacs]]<br />
<br />
* [[NAMD_Benchmarks|NAMD]]<br />
<br />
=== Cluster Usage Statistics ===<br />
<br />
Cheaha uses Bright Cluster Manager to report cluster performance data. This information provides a helpful overview of the current and historical operating stats for Cheaha. You can access the status monitoring page [https://cheaha-master01.rc.uab.edu/userportal/ here] (accessible only on the UAB network or through VPN).<br />
<br />
== Availability ==<br />
<br />
Cheaha is a general-purpose computer resource made available to the UAB community by UAB IT. As such, it is available for legitimate research and educational needs and is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
Many software packages commonly used across UAB are available via Cheaha.<br />
<br />
To request access to Cheaha, please [mailto:support@vo.uabgrid.uab.edu send a request] to the cluster support group.<br />
<br />
Cheaha's intended use implies broad access to the community; however, no guarantees are made that specific computational resources will be available to all users. Availability guarantees can only be made for reserved resources.<br />
<br />
=== Secure Shell Access ===<br />
<br />
Please configure your secure shell (SSH) client software to use the official host name to access Cheaha:<br />
<br />
<pre><br />
cheaha.rc.uab.edu<br />
</pre><br />
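<br />
For example, from a terminal on your workstation you might connect as follows (a minimal sketch; replace the placeholder with your own cluster account name):<br />
<pre><br />
# BLAZERID is a placeholder for your cluster user name<br />
ssh BLAZERID@cheaha.rc.uab.edu<br />
</pre><br />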
<br />
== Scheduling Framework ==<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system; the name stands for Simple Linux Utility for Resource Management. Slurm was developed at Lawrence Livermore National Laboratory and currently runs on some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine (SGE), the job manager used previously.<br />
<br />
Slurm is similar in many ways to GridEngine and most other queue systems: you write a batch script and then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress; a minimal example follows.<br />
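<br />
As an illustration of that workflow, a minimal batch script might look like the following (a sketch only; the partition name, module name, and resource limits are placeholders, so consult the [[Slurm]] page and [[Cheaha2_GettingStarted|Getting Started]] for values appropriate to your work):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=example          # name shown in the queue<br />
#SBATCH --partition=express         # partition (queue) to run in; placeholder name<br />
#SBATCH --ntasks=1                  # number of tasks (processes)<br />
#SBATCH --cpus-per-task=1           # cores per task<br />
#SBATCH --mem-per-cpu=2G            # memory per core<br />
#SBATCH --time=00:10:00             # wall-clock limit (HH:MM:SS)<br />
#SBATCH --output=example-%j.out     # output file; %j expands to the job ID<br />
<br />
module load R                       # load your software (module name is illustrative)<br />
Rscript my_analysis.R               # replace with your own command<br />
</pre><br />
Save the script (for example as ''example.job''), submit it with ''sbatch example.job'', and monitor it with ''squeue -u $USER''.<br />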
<br />
== Support ==<br />
<br />
Operational support for Cheaha is provided by the Research Computing group in UAB IT. For questions regarding the operational status of Cheaha, please send your request to [mailto:support@vo.uabgrid.uab.edu support@vo.uabgrid.uab.edu]. As a user of Cheaha you will automatically be subscribed to the hpc-announce email list. This subscription is mandatory for all users of Cheaha. It is our way of communicating important information regarding Cheaha to you. The traffic on this list is restricted to official communication and has a very low volume.<br />
<br />
We have limited capacity, however, to support non-operational issues like "How do I write a job script?" or "How do I compile a program?". For such requests, you may find it more fruitful to send your questions to the hpc-users email list and request help from your peers in the HPC community at UAB. As with all mailing lists, please observe [http://lifehacker.com/5473859/basic-etiquette-for-email-lists-and-forums common mailing list etiquette].<br />
<br />
Finally, please remember that as you learned about HPC from others, it becomes part of your responsibility to help others on their quest. You should update this documentation or respond to the mailing list requests of others. <br />
<br />
You can subscribe to hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''subscribe hpc-users''].<br />
<br />
You can unsubscribe from hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=unsubscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''unsubscribe hpc-users''].<br />
<br />
You can review archives of the list in the [http://vo.uabgrid.uab.edu/sympa/arc/hpc-users hpc-users web archives].<br />
<br />
If you need help using the list service please send an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=help sympa@vo.uabgrid.uab.edu with the subject ''help'']<br />
<br />
If you have questions about the operation of the list itself, please send an email to the owners of the list:<br />
<br />
[mailto:hpc-users-request@vo.uabgrid.uab.edu hpc-users-request@vo.uabgrid.uab.edu with a subject relevant to your issue with the list]<br />
<br />
If you are interested in contributing to the enhancement of HPC features at UAB or would like to talk to other cluster administrators, [mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-dev please join the hpc developers community at UAB].</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5795Welcome2018-07-13T14:45:35Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants (Detailed) */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from MathWorks for MATLAB, Simulink and 42 toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short) ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an InfiniBand network, delivering 468 TFLOP/s of aggregate theoretical peak performance. A high-performance GPFS file system with 6.6 PB of raw storage on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20 TB of traditional SAN storage is available for home directories. This general-access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, high-bandwidth networks connected at the campus, statewide, and regional levels, and conditioned space for hosting and operating HPC systems, research applications, and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports both high-performance computing (HPC) and high-throughput computing (HTC) paradigms. Cheaha is composed of resources that span the UAB IT data centers in the 936 Building and the RUST Computer Center. UAB ITRC, in open collaboration with community members, leads the design and development of these resources, and UAB IT's Infrastructure Services group provides their operational support and maintenance.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment and access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the 64-bit x86 architecture. The first generation includes 96 nodes with dual 12-core 2.5 GHz Intel Xeon E5-2680 v3 processors (2304 cores total) and an FDR InfiniBand interconnect. Of these 96 compute nodes, 36 have 128 GB RAM, 38 have 256 GB RAM, and 14 have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. The newest generation is composed of 18 nodes with dual 14-core 2.4 GHz Intel Xeon E5-2680 v4 processors (504 cores total), 256 GB RAM, four NVIDIA Tesla P100 16 GB GPUs each, and an EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition, UAB researchers have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
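<br />
The 468 TFLOP/s figure can be roughly reproduced from the hardware counts above. The back-of-the-envelope estimate below is a sketch, not an official specification: the 16 double-precision FLOPs per core per cycle assumed for these AVX2/FMA-capable Xeons, and the per-accelerator double-precision peaks, are approximations taken from vendor datasheets rather than from this page.<br />
&lt;pre&gt;
# Back-of-the-envelope estimate of Cheaha's aggregate theoretical peak (double precision).
# ASSUMPTIONS: 16 DP FLOPs/cycle/core for the AVX2+FMA Xeons, and approximate
# vendor-reported DP peaks for the accelerators.

def cpu_peak_tflops(cores, ghz, flops_per_cycle=16):
    """Theoretical peak of a CPU pool, in TFLOP/s."""
    return cores * ghz * flops_per_cycle / 1000.0

gen1_cpu = cpu_peak_tflops(2304, 2.5)   # 96 x Xeon E5-2680 v3 nodes, about 92 TFLOP/s
gen2_cpu = cpu_peak_tflops(504, 2.4)    # 18 x Xeon E5-2680 v4 nodes, about 19 TFLOP/s

# Accelerators (approximate DP peak per device, vendor-reported):
p100 = 72 * 4.7   # NVIDIA Tesla P100, ~4.7 TFLOP/s each (total ~338 TFLOP/s)
k80  = 4 * 2.9    # NVIDIA K80 boards, ~2.9 TFLOP/s each (total ~12 TFLOP/s)
phi  = 4 * 2.7    # Intel Xeon Phi 7210, ~2.7 TFLOP/s each (total ~11 TFLOP/s)

total = gen1_cpu + gen2_cpu + p100 + k80 + phi
print(f"Estimated aggregate peak: {total:.0f} TFLOP/s")  # about 472, close to the quoted 468
&lt;/pre&gt;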
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by a high-performance GPFS file system with 6.6 PB of raw storage on DDN SFA12KX hardware, connected via the InfiniBand fabric. An expansion of the GPFS fabric that will double its capacity is currently under way and should be online by Fall 2018. An additional 20 TB of traditional SAN storage is available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40 GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center, creating a multi-site facility that houses the Research Computing System and leverages the network to connect its storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and can connect data-intensive research facilities directly to the high performance computing services of the Research Computing System. It provides very high-speed, secure connectivity between the nodes attached to it, enabling transfer of very large data sets with predictable latencies and without interfering with other traffic on the campus backbone. In addition, the network includes a secure Science DMZ with data transfer nodes (DTNs) and perfSONAR measurement nodes connected directly to the border router, providing a "friction-free" pathway to external data repositories and computational resources.<br />
The campus network backbone is based on a redundant 40-gigabit Ethernet network with 480 gigabit/second backplanes on the core L2/L3 switch/routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single-mode optical fiber, and desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100 Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Georgia. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect-manager, four software developers, two scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grant No. OAC-1541310, by the University of Alabama at Birmingham, and by the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.edu
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an out growth of the UABgrid pilot, launched in September 2007 which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2900 conventional INTEL CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUS's) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware is also connected to these compute nodes via the Infiniband fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TLOP/s of dedicated computing power.<br />
In addition UAB researchers also have access to regional and national HPC resources such as Alabama Supercomputer Authority (ASA), XSEDE and Open Science Grid (SG).<br />
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverage the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed secure connectivity between nodes connected to it for high speed file transfer of very large data sets without the concerns of interfering with other traffic on the campus backbone ensures predictable latencies. In addition, the network also consist of a secure science DMZ with DTN's and perfsonar nodes connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabits/second speed. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA, and UAH, the other two University of Alabama System institutions, and the Alabama Supercomputer Center. UAB is also connected to other universities and schools through Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10 lead by the Assistant Vice President for Research Computing and includes a HPC Architect-Manager, 4 Software developers, 2 Scientists, a System Administrator and a Project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5793Welcome2018-07-13T14:41:48Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an out growth of the UABgrid pilot, launched in September 2007 which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants (short)===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2800 conventional INTEL CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUS's) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware is also connected to these compute nodes via the Infiniband fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TLOP/s of dedicated computing power.<br />
In addition UAB researchers also have access to regional and national HPC resources such as Alabama Supercomputer Authority (ASA), XSEDE and Open Science Grid (SG).<br />
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverage the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed secure connectivity between nodes connected to it for high speed file transfer of very large data sets without the concerns of interfering with other traffic on the campus backbone ensures predictable latencies. In addition, the network also consist of a secure science DMZ with DTN's and perfsonar nodes connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabits/second speed. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA, and UAH, the other two University of Alabama System institutions, and the Alabama Supercomputer Center. UAB is also connected to other universities and schools through Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10 lead by the Assistant Vice President for Research Computing and includes a HPC Architect-Manager, 4 Software developers, 2 Scientists, a System Administrator and a Project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5792Welcome2018-07-13T14:41:21Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants (Detailed) */</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an out growth of the UABgrid pilot, launched in September 2007 which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2800 conventional INTEL CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUS's) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware is also connected to these compute nodes via the Infiniband fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TLOP/s of dedicated computing power.<br />
In addition UAB researchers also have access to regional and national HPC resources such as Alabama Supercomputer Authority (ASA), XSEDE and Open Science Grid (SG).<br />
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverage the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed secure connectivity between nodes connected to it for high speed file transfer of very large data sets without the concerns of interfering with other traffic on the campus backbone ensures predictable latencies. In addition, the network also consist of a secure science DMZ with DTN's and perfsonar nodes connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabits/second speed. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA, and UAH, the other two University of Alabama System institutions, and the Alabama Supercomputer Center. UAB is also connected to other universities and schools through Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10 lead by the Assistant Vice President for Research Computing and includes a HPC Architect-Manager, 4 Software developers, 2 Scientists, a System Administrator and a Project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5791Welcome2018-07-13T14:25:54Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants */ added detailed description</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an out growth of the UABgrid pilot, launched in September 2007 which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research, see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2800 conventional INTEL CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUS's) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware is also connected to these compute nodes via the Infiniband fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TLOP/s of dedicated computing power.<br />
In addition UAB researchers also have access to regional and national HPC resources such as Alabama Supercomputer Authority (ASA), XSEDE and Open Science Grid (SG).<br />
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System, which leverage the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network can support very high speed secure connectivity between nodes connected to it for high speed file transfer of very large data sets without the concerns of interfering with other traffic on the campus backbone ensures predictable latencies. In addition, the network also consist of a secure science DMZ with DTN's and perfsonar nodes connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabits/second speed. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA, and UAH, the other two University of Alabama System institutions, and the Alabama Supercomputer Center. UAB is also connected to other universities and schools through Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10 lead by the Assistant Vice President for Research Computing and includes an HPC architect, 4 Software developers, 2 Scientists, a System Administrator and a Project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Welcome&diff=5790Welcome2018-07-13T14:25:15Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants */ matching other pages</p>
<hr />
<div>{{Main_Banner}}<br />
Welcome to the '''Research Computing System'''<br />
<br />
The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe. Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.<br />
<br />
''''docs'''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.) You can learn more about '''docs''' on the page [[UnderstandingDocs]]. The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System. If you see information that looks out of date please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.<br />
<br />
The Research Computing System is designed to provide services to researchers in three core areas:<br />
<br />
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]] or you can install your own <br />
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas<br />
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research<br />
<br />
== Support and Development ==<br />
<br />
The Research Computing System is developed and supported by UAB IT's Research Computing Group. We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly. You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].<br />
<br />
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research. RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].<br />
<br />
== Outreach ==<br />
<br />
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS. Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS). <br />
<br />
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research. IT Research Computing also participates in other campus wide symposiums including UAB Research Core Day.<br />
<br />
== Featured Research Applications ==<br />
<br />
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.<br />
<br />
{{abox<br />
| UAB MATLAB Information |<br />
In January 2011, UAB acquired a site license from Mathworks for MATLAB, SimuLink and 42 Toolboxes. <br />
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]<br />
* Learn more about the [[UAB TAH license|UAB Mathworks Site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]<br />
}}<br />
<br />
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB. Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.<br />
<br />
{{abox<br />
|'''Galaxy'''|<br />
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support. <br />
* [[Galaxy|Galaxy Project Home]]<br />
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]<br />
}}<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. We also request that you send us a list of publications based on your use of Cheaha resources.<br />
<br />
=== Description of Cheaha for Grants ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2800 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB (raw) GPFS file system on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha&diff=5789Cheaha2018-07-13T14:15:02Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants (Detailed) */</p>
<hr />
<div>{{Main_Banner}}<br />
'''Cheaha''' is a campus resource dedicated to enhancing research computing productivity at UAB. [http://cheaha.uabgrid.uab.edu Cheaha] is managed by [http://www.uab.edu/it UAB Information Technology's Research Computing group (UAB ITRC)] and is available to members of the UAB community in need of increased computational capacity. Cheaha supports [http://en.wikipedia.org/wiki/High-performance_computing high-performance computing (HPC)] and [http://en.wikipedia.org/wiki/High-throughput_computing high throughput computing (HTC)] paradigms.<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a [[Setting_Up_VNC_Session|cluster desktop]]. The local compute pool provides access to compute hardware based on the [http://en.wikipedia.org/wiki/X86_64 x86-64 64-bit architecture]. The compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, [[ Resources |a commodity cluster with approximately 2400 cores]] connected by low-latency Fourteen Data Rate (FDR) InfiniBand networks. The compute nodes are backed by 6.6PB raw GPFS storage on DDN SFA12KX hardware, an additional 20TB available for home directories on a traditional Hitachi SAN, and other ancillary services. The compute nodes combine to provide over 110 TFLOPS of dedicated computing power. <br />
<br />
Cheaha is composed of resources that span data centers located in the UAB Shared Computing Facility (UAB 936 Building) and the RUST Computer Center. Resource design and development is led by UAB IT Research Computing in open collaboration with community members. Operational [mailto:support@vo.uabgrid.uab.edu support] is provided by UAB IT's Research Computing group.<br />
<br />
Cheaha is named in honor of [http://en.wikipedia.org/wiki/Cheaha_Mountain Cheaha Mountain], the highest peak in the state of Alabama. Cheaha is a popular destination whose summit offers clear vistas of the surrounding landscape. (Cheaha Mountain photo-streams on [http://www.flickr.com/search/?q=cheaha Flickr] and [http://picasaweb.google.com/lh/view?q=cheaha&psc=G&filter=1# Picasa]).<br />
<br />
== Using ==<br />
<br />
=== Getting Started ===<br />
<br />
For information on getting an account, logging in, and running a job, please see [[Cheaha2_GettingStarted|Getting Started]].<br />
<br />
== History ==<br />
<br />
[[Image:Research-computing-platform.png|right|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== 2005 ===<br />
<br />
In 2002 UAB was awarded an infrastructure development grant through the NSF EPSCoR program. This led to the 2005 acquisition of a 64 node compute cluster with two AMD Opteron 242 1.6GHz CPUs per node (128 total cores). This cluster was named Cheaha. Cheaha expanded the compute capacity available at UAB and was the first general-access resource for the community. It led to expanded roles for UAB IT in research computing support through the development of the UAB Shared HPC Facility in BEC and provided further engagement in Globus-based grid computing resource development on campus via UABgrid and regionally via [http://www.suragrid.org SURAgrid].<br />
<br />
=== 2008 ===<br />
<br />
In 2008, money was allocated by UAB IT for hardware upgrades, which led to the acquisition of an additional 192 cores based on a Dell clustering solution with Intel Quad-Core E5450 3.0GHz CPUs in August of 2008. This upgrade migrated Cheaha's core infrastructure to the Dell blade clustering solution. It provided a three-fold increase in processor density over the original hardware, enabling more computing power to be located in the same physical space with room for expansion, an important consideration in light of the continued growth in processing demand. This hardware represented a major technology upgrade that included space for additional expansion to address overall capacity demand and enable resource reservation. <br />
<br />
The 2008 upgrade began a continuous resource improvement plan that includes a phased development approach for Cheaha with on-going increases in capacity and feature enhancements being brought into production via an [http://projects.uabgrid.uab.edu/cheaha open community process].<br />
<br />
Software improvements rolled into the 2008 upgrade included grid computing services to access distributed compute resources and orchestrate jobs using the [http://www.gridway.org GridWay] meta-scheduler. An initial 10Gigabit Ethernet link establishing the UABgrid Research Network was designed to support high speed data transfers between clusters connected to this network.<br />
<br />
=== 2009 ===<br />
<br />
In 2009, annual investment funds were directed toward establishing a fully connected dual data rate InfiniBand network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. The InfiniBand and storage fabrics were designed to support significant increases in research data sets and their associated analytical demands.<br />
<br />
=== 2010 ===<br />
<br />
In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity. The grant funds were combined with the annual investment funds, adding 576 cores (48 nodes) based on the Intel Westmere 2.66 GHz CPU, a quad data rate InfiniBand fabric with 32 uplinks, an additional 120 TB of storage for the DDN fabric, and additional hardware to improve reliability. Additional improvements to the research compute platform involved extending the UAB Research Network to link the BEC and RUST data centers, adding 20TB of user and ancillary services storage.<br />
<br />
=== 2012 ===<br />
<br />
In 2012, UAB IT Research Computing invested in the foundation hardware to expand long term storage and virtual machine capabilities with the acquisition of 12 Dell 720xd systems, each containing 16 cores, 96GB RAM, and 36TB of storage, creating a 192 core and 432TB virtual compute and storage fabric.<br />
<br />
Additional hardware investment by the School of Public Health's Section on Statistical Genetics added three 384GB large memory nodes and an additional 48 cores to the QDR InfiniBand fabric.<br />
<br />
=== 2013 ===<br />
<br />
In 2013, UAB IT Research Computing acquired an [http://blogs.uabgrid.uab.edu/jpr/2013/03/were-going-with-openstack/ OpenStack cloud and Ceph storage software fabric] through a partnership between Dell and Inktank in order to [http://dev.uabgrid.uab.edu extend cloud computing solutions] to the researchers at UAB and enhance the interfacing capabilities for HPC.<br />
<br />
=== 2015 === <br />
<br />
UAB IT received $500,000 from the university’s Mission Support Fund for a compute cluster seed expansion of 48 teraflops. This added 936 cores across 40 nodes with 2x12 core 2.5 GHz Intel Xeon E5-2680 v3 compute nodes and FDR InfiniBand interconnect.<br />
<br />
UAB received a $500,000 grant from the Alabama Innovation Fund for a three petabyte research storage array. This funding with additional matching from UAB provided a multi-petabyte [https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] parallel file system to the cluster which went live in 2016.<br />
<br />
=== 2016 ===<br />
<br />
In 2016, UAB IT Research Computing received additional funding from the Deans of CAS, Engineering, and Public Health to grow the compute capacity provided by the prior year's seed funding. This added additional compute nodes, providing researchers at UAB with 96 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. More information can be found at [[Resources]]. <br />
<br />
In addition to the compute expansion, the six petabyte GPFS file system came online. This file system provided each user five terabytes of personal space, additional space for shared projects, and a greatly expanded scratch storage, all in a single file system.<br />
<br />
The 2015 and 2016 investments combined to provide a completely new core for the Cheaha cluster, allowing the retirement of earlier compute generations.<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. If you are using Cheaha for grant funded research, please send information about your grant (funding source and grant number), a statement of intent for the research project, and a list of the applications you are using to UAB IT Research Computing. If you are using Cheaha for exploratory research, please send a similar note on your research interest. Finally, any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. Please note, your acknowledgment may also need to include an additional statement acknowledging grant-funded hardware. We also ask that you send any references to publications based on your use of Cheaha compute resources.<br />
<br />
=== Description of Cheaha for Grants (short) ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2800 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB (raw) GPFS file system on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system'''<br />
<br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
'''Compute Resources:''' Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
'''Storage Resources:''' The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
'''Network Resources''': The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network supports very high speed, secure connectivity between the nodes connected to it, enabling high speed transfer of very large data sets without interfering with other traffic on the campus backbone and ensuring predictable latencies. In addition, the network also consists of a secure science DMZ with DTNs and perfSONAR nodes connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
'''Personnel:''' UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC architect, 4 software developers, 2 scientists, a system administrator, and a project coordinator.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grant No. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.<br />
<br />
== System Profile ==<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes three generations of commodity compute hardware, totaling 868 compute cores, 2.8TB of RAM, and over 200TB of storage.<br />
<br />
The hardware is grouped into generations designated gen1 through gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 1 (gen1) -- 64 2-CPU AMD 1.6 GHz compute nodes with Gigabit interconnect. This is the original hardware collection purchased with NSF EPSCoR funds in 2005, approx $150K. These nodes are sometimes called the "Verari" nodes. These nodes are tagged as "verari-compute-#-#" in the ROCKS naming convention.<br />
* Generation 2 (gen2) -- 24 2x4 core (192 cores total) 3.0 GHz Intel compute nodes with dual data rate InfiniBand interconnect and the initial high-perf storage implementation using 60TB DDN. This is the hardware collection purchased exclusively with the annual VPIT funds allocation, approx $150K/yr for the 2008 and 2009 fiscal years. These nodes are sometimes confusingly called "cheaha2" or "cheaha" nodes. These nodes are tagged as "cheaha-compute-#-#" in the ROCKS naming convention. <br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node. <br />
* Generation 4 (gen4) -- 3 16 core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Dr. Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention. <br />
* Generation 6 (gen6) -- <br />
** 44 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards (4 nodes with NVIDIA K80 GPUs and 4 nodes with Intel Xeon Phi 7120P accelerators)<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network card<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24GB of RAM per core or 384GB per node<br />
* gen3 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB RAM per core or 96GB per node<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB RAM per core or 48GB per node<br />
* gen2 is 192 cores of [http://ark.intel.com/products/33083/Intel-Xeon-Processor-E5450-12M-Cache-3_00-GHz-1333-MHz-FSB 3.0GHz quad-core Intel Xeon E5450 processors] with 2GB RAM per core<br />
* gen1 is 100 cores of 1.6GHz AMD Opteron 242 processors with 1GB RAM per core <br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| default || 44 || 24 || 128G<br />
|-<br />
|gen6|| default || 38 || 24 || 256G<br />
|-<br />
|gen6|| default || 14 || 24 || 384G<br />
|-<br />
|gen5||Ceph/OpenStack|| 12 || 20 || 96G<br />
|-<br />
|gen4||ssg||3||16||384G<br />
|-<br />
|gen3||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Software ===<br />
<br />
Details of the software available on Cheaha can be found on the [https://docs.uabgrid.uab.edu/wiki/Cheaha_Software Installed software page]; an overview follows.<br />
<br />
Cheaha uses [http://modules.sourceforge.net/ Environment Modules] to support account configuration. Please follow these [http://me.eng.uab.edu/wiki/index.php?title=Cheaha#Environment_Modules specific steps for using environment modules].<br />
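<br />
As a quick illustration, a typical module workflow on the login node looks like the following (the module name shown is only an example; run the first command to see the names actually installed on Cheaha):<br />
<br />
<pre><br />
# List the software modules available on the cluster<br />
module avail<br />
<br />
# Load a module into your environment (name is illustrative; use one listed by "module avail")<br />
module load R<br />
<br />
# Show what is currently loaded, then remove a module when finished<br />
module list<br />
module unload R<br />
</pre><br />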
<br />
Cheaha's software stack is built with the [http://www.brightcomputing.com Bright Cluster Manager]. Cheaha's operating system is CentOS with the following major cluster components:<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
A brief summary of some of the available computational software and tools includes:<br />
* Amber<br />
* FFTW<br />
* Gromacs<br />
* GSL<br />
* NAMD<br />
* VMD<br />
* Intel Compilers<br />
* GNU Compilers<br />
* Java<br />
* R<br />
* OpenMPI<br />
* MATLAB<br />
<br />
=== Network ===<br />
<br />
Cheaha is connected to the UAB Research Network which provides a dedicated 10Gbps networking backplane between clusters located in the 936 data center and the campus network core. Data transfer rates of almost 8Gbps between these hosts have been demonstrated using GridFTP, a multi-channel file transfer service used to move data between clusters as part of job management operations. This performance promises very efficient job management and the seamless integration of other clusters as connectivity to the research network is expanded.<br />
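<br />
For reference, GridFTP transfers are typically driven with the globus-url-copy client. The sketch below is illustrative only; the host names and paths are placeholders, not actual UAB endpoints:<br />
<br />
<pre><br />
# Illustrative only: copy a large file between two GridFTP endpoints using 4 parallel<br />
# TCP streams (-p 4) and verbose performance reporting (-vb); replace hosts and paths.<br />
globus-url-copy -vb -p 4 gsiftp://cluster-a.example.edu/data/big.tar gsiftp://cluster-b.example.edu/scratch/big.tar<br />
</pre><br />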
<br />
=== Benchmarks ===<br />
<br />
The continuous resource improvement process involves collecting benchmarks of the system. The measures of greatest interest to users of the system are benchmarks of specific application codes. The following benchmarks have been performed on the system and will be further expanded as additional benchmarks are performed.<br />
<br />
* [[Cheaha-BGL_Comparison|Cheaha-BGL Comparison]]<br />
<br />
* [[Gromacs_Benchmark|Gromacs]]<br />
<br />
* [[NAMD_Benchmarks|NAMD]]<br />
<br />
=== Cluster Usage Statistics ===<br />
<br />
Cheaha uses Bright Cluster Manager to report cluster performance data. This information provides a helpful overview of the current and historical operating stats for Cheaha. You can access the status monitoring page [https://cheaha-master01.rc.uab.edu/userportal/ here] (accessible only on the UAB network or through VPN).<br />
<br />
== Availability ==<br />
<br />
Cheaha is a general-purpose computer resource made available to the UAB community by UAB IT. As such, it is available for legitimate research and educational needs and is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
Many software packages commonly used across UAB are available via Cheaha.<br />
<br />
To request access to Cheaha, please [mailto:support@vo.uabgrid.uab.edu send a request] to the cluster support group.<br />
<br />
Cheaha's intended use implies broad access to the community; however, no guarantees are made that specific computational resources will be available to all users. Availability guarantees can only be made for reserved resources.<br />
<br />
=== Secure Shell Access ===<br />
<br />
Please configure your client secure shell software to use the official host name to access Cheaha:<br />
<br />
<pre><br />
cheaha.rc.uab.edu<br />
</pre><br />
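<br />
For example, from a terminal on your workstation the connection would look like the following (USERNAME is a placeholder for your own cluster account name):<br />
<br />
<pre><br />
# Connect to the Cheaha login node; replace USERNAME with your account name<br />
ssh USERNAME@cheaha.rc.uab.edu<br />
</pre><br />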
<br />
== Scheduling Framework ==<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system; the name stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha, replacing Sun Grid Engine (SGE), the job manager used previously.<br />
<br />
Slurm is similar in many ways to GridEngine or most other queue systems. You write a batch script then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we will provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress.<br />
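<br />
As a minimal sketch of that workflow (the partition name, resource requests, module, and program below are illustrative placeholders, not recommended Cheaha settings), a batch script might look like:<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=example_job    # name shown in the queue<br />
#SBATCH --partition=express       # partition (queue); placeholder, use a real Cheaha partition<br />
#SBATCH --ntasks=1                # number of tasks<br />
#SBATCH --cpus-per-task=4         # CPU cores for the task<br />
#SBATCH --mem=8G                  # memory for the job<br />
#SBATCH --time=01:00:00           # walltime limit (HH:MM:SS)<br />
#SBATCH --output=example_%j.out   # output file; %j expands to the job id<br />
<br />
# Load the software the job needs, then run it (module and script are examples)<br />
module load R<br />
Rscript my_analysis.R<br />
</pre><br />
<br />
The script is then submitted with sbatch and monitored with squeue:<br />
<br />
<pre><br />
sbatch example_job.sh    # submit the batch script to Slurm<br />
squeue -u $USER          # show your pending and running jobs<br />
</pre><br />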
<br />
== Support ==<br />
<br />
Operational support for Cheaha is provided by the Research Computing group in UAB IT. For questions regarding the operational status of Cheaha, please send your request to [mailto:support@vo.uabgrid.uab.edu support@vo.uabgrid.uab.edu]. As a user of Cheaha you will automatically be subscribed to the hpc-announce email list. This subscription is mandatory for all users of Cheaha. It is our way of communicating important information regarding Cheaha to you. The traffic on this list is restricted to official communication and has a very low volume.<br />
<br />
We have limited capacity, however, to support non-operational issues like "How do I write a job script" or "How do I compile a program". For such requests, you may find it more fruitful to send your questions to the hpc-users email list and request help from our peers in the HPC community at UAB. As with all mailing lists, please observe [http://lifehacker.com/5473859/basic-etiquette-for-email-lists-and-forums common mailing list etiquette].<br />
<br />
Finally, please remember that as you learned about HPC from others, it becomes part of your responsibility to help others on their quest. You should update this documentation or respond to mailing list requests from others. <br />
<br />
You can subscribe to hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''subscribe hpc-users''].<br />
<br />
You can unsubscribe from hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=unsubscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''unsubscribe hpc-users''].<br />
<br />
You can review archives of the list in the [http://vo.uabgrid.uab.edu/sympa/arc/hpc-users web hpc-archives].<br />
<br />
If you need help using the list service please send an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=help sympa@vo.uabgrid.uab.edu with the subject ''help'']<br />
<br />
If you have questions about the operation of the list itself, please send an email to the owners of the list:<br />
<br />
[mailto:hpc-users-request@vo.uabgrid.uab.edu hpc-users-request@vo.uabgrid.uab.edu with a subject relevant to your issue with the list]<br />
<br />
If you are interested in contributing to the enhancement of HPC features at UAB or would like to talk to other cluster administrators, [mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-dev please join the hpc developers community at UAB].</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha&diff=5788Cheaha2018-07-13T14:05:31Z<p>Tanthony@uab.edu: /* Description of Cheaha for Grants */ added the Detailed description</p>
<hr />
<div>{{Main_Banner}}<br />
'''Cheaha''' is a campus resource dedicated to enhancing research computing productivity at UAB. [http://cheaha.uabgrid.uab.edu Cheaha] is managed by [http://www.uab.edu/it UAB Information Technology's Research Computing group (UAB ITRC)] and is available to members of the UAB community in need of increased computational capacity. Cheaha supports [http://en.wikipedia.org/wiki/High-performance_computing high-performance computing (HPC)] and [http://en.wikipedia.org/wiki/High-throughput_computing high throughput computing (HTC)] paradigms.<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a [[Setting_Up_VNC_Session|cluster desktop]]. The local compute pool provides access to compute hardware based on the [http://en.wikipedia.org/wiki/X86_64 x86-64 64-bit architecture]. The compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, [[ Resources |a commodity cluster with approximately 2400 cores]] connected by low-latency Fourteen Data Rate (FDR) InfiniBand networks. The compute nodes are backed by 6.6PB raw GPFS storage on DDN SFA12KX hardware, an additional 20TB available for home directories on a traditional Hitachi SAN, and other ancillary services. The compute nodes combine to provide over 110 TFLOPS of dedicated computing power. <br />
<br />
Cheaha is composed of resources that span data centers located in the UAB Shared Computing Facility (UAB 936 Building) and the RUST Computer Center. Resource design and development is led by UAB IT Research Computing in open collaboration with community members. Operational [mailto:support@vo.uabgrid.uab.edu support] is provided by UAB IT's Research Computing group.<br />
<br />
Cheaha is named in honor of [http://en.wikipedia.org/wiki/Cheaha_Mountain Cheaha Mountain], the highest peak in the state of Alabama. Cheaha is a popular destination whose summit offers clear vistas of the surrounding landscape. (Cheaha Mountain photo-streams on [http://www.flickr.com/search/?q=cheaha Flickr] and [http://picasaweb.google.com/lh/view?q=cheaha&psc=G&filter=1# Picasa]).<br />
<br />
== Using ==<br />
<br />
=== Getting Started ===<br />
<br />
For information on getting an account, logging in, and running a job, please see [[Cheaha2_GettingStarted|Getting Started]].<br />
<br />
== History ==<br />
<br />
[[Image:Research-computing-platform.png|right|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== 2005 ===<br />
<br />
In 2002 UAB was awarded an infrastructure development grant through the NSF EPSCoR program. This led to the 2005 acquisition of a 64 node compute cluster with two AMD Opteron 242 1.6GHz CPUs per node (128 total cores). This cluster was named Cheaha. Cheaha expanded the compute capacity available at UAB and was the first general-access resource for the community. It led to expanded roles for UAB IT in research computing support through the development of the UAB Shared HPC Facility in BEC and provided further engagement in Globus-based grid computing resource development on campus via UABgrid and regionally via [http://www.suragrid.org SURAgrid].<br />
<br />
=== 2008 ===<br />
<br />
In 2008, money was allocated by UAB IT for hardware upgrades, which led to the acquisition of an additional 192 cores based on a Dell clustering solution with Intel Quad-Core E5450 3.0GHz CPUs in August of 2008. This upgrade migrated Cheaha's core infrastructure to the Dell blade clustering solution. It provided a three-fold increase in processor density over the original hardware, enabling more computing power to be located in the same physical space with room for expansion, an important consideration in light of the continued growth in processing demand. This hardware represented a major technology upgrade that included space for additional expansion to address overall capacity demand and enable resource reservation. <br />
<br />
The 2008 upgrade began a continuous resource improvement plan that includes a phased development approach for Cheaha with on-going increases in capacity and feature enhancements being brought into production via an [http://projects.uabgrid.uab.edu/cheaha open community process].<br />
<br />
Software improvements rolled into the 2008 upgrade included grid computing services to access distributed compute resources and orchestrate jobs using the [http://www.gridway.org GridWay] meta-scheduler. An initial 10Gigabit Ethernet link establishing the UABgrid Research Network was designed to support high speed data transfers between clusters connected to this network.<br />
<br />
=== 2009 ===<br />
<br />
In 2009, annual investment funds were directed toward establishing a fully connected dual data rate InfiniBand network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. The InfiniBand and storage fabrics were designed to support significant increases in research data sets and their associated analytical demands.<br />
<br />
=== 2010 ===<br />
<br />
In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity. The grant funds were combined with the annual investment funds, adding 576 cores (48 nodes) based on the Intel Westmere 2.66 GHz CPU, a quad data rate InfiniBand fabric with 32 uplinks, an additional 120 TB of storage for the DDN fabric, and additional hardware to improve reliability. Additional improvements to the research compute platform involved extending the UAB Research Network to link the BEC and RUST data centers, adding 20TB of user and ancillary services storage.<br />
<br />
=== 2012 ===<br />
<br />
In 2012, UAB IT Research Computing invested in the foundation hardware to expand long term storage and virtual machine capabilities with the acquisition of 12 Dell 720xd systems, each containing 16 cores, 96GB RAM, and 36TB of storage, creating a 192 core and 432TB virtual compute and storage fabric.<br />
<br />
Additional hardware investment by the School of Public Health's Section on Statistical Genetics added three 384GB large memory nodes and an additional 48 cores to the QDR InfiniBand fabric.<br />
<br />
=== 2013 ===<br />
<br />
In 2013, UAB IT Research Computing acquired an [http://blogs.uabgrid.uab.edu/jpr/2013/03/were-going-with-openstack/ OpenStack cloud and Ceph storage software fabric] through a partnership between Dell and Inktank in order to [http://dev.uabgrid.uab.edu extend cloud computing solutions] to the researchers at UAB and enhance the interfacing capabilities for HPC.<br />
<br />
=== 2015 === <br />
<br />
UAB IT received $500,000 from the university’s Mission Support Fund for a compute cluster seed expansion of 48 teraflops. This added 936 cores across 40 nodes with 2x12 core 2.5 GHz Intel Xeon E5-2680 v3 compute nodes and FDR InfiniBand interconnect.<br />
<br />
UAB received a $500,000 grant from the Alabama Innovation Fund for a three petabyte research storage array. This funding with additional matching from UAB provided a multi-petabyte [https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] parallel file system to the cluster which went live in 2016.<br />
<br />
=== 2016 ===<br />
<br />
In 2016, UAB IT Research Computing received additional funding from the Deans of CAS, Engineering, and Public Health to grow the compute capacity provided by the prior year's seed funding. This added additional compute nodes, providing researchers at UAB with 96 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. More information can be found at [[Resources]]. <br />
<br />
In addition to the compute expansion, the six petabyte GPFS file system came online. This file system provided each user five terabytes of personal space, additional space for shared projects, and a greatly expanded scratch storage, all in a single file system.<br />
<br />
The 2015 and 2016 investments combined to provide a completely new core for the Cheaha cluster, allowing the retirement of earlier compute generations.<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. If you are using Cheaha for grant funded research, please send information about your grant (funding source and grant number), a statement of intent for the research project, and a list of the applications you are using to UAB IT Research Computing. If you are using Cheaha for exploratory research, please send a similar note on your research interest. Finally, any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. Please note, your acknowledgment may also need to include an additional statement acknowledging grant-funded hardware. We also ask that you send any references to publications based on your use of Cheaha compute resources.<br />
<br />
=== Description of Cheaha for Grants (short) ===<br />
<br />
UAB IT Research Computing maintains high performance compute and storage resources for investigators. The Cheaha compute cluster provides over 2800 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance, 6.6PB (raw) GPFS file system on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.<br />
<br />
<br />
=== Description of Cheaha for Grants (Detailed) ===<br />
<br />
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. <br />
<br />
'''Cheaha HPC system''' <br />
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (UAB ITRC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span data centers located in the UAB IT Data Centers in the 936 Building and the RUST Computer Center. UAB ITRC in open collaboration with community members is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.<br />
<br />
Compute Resources<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternatively, users of graphical applications can start a cluster desktop. The local compute pool provides access to two generations of compute hardware based on the x86 64-bit architecture. It includes 96 nodes: 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs. The newest generation is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect. The compute nodes combine to provide over 468 TFLOP/s of dedicated computing power.<br />
In addition, UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).<br />
<br />
Storage Resources<br />
The compute nodes on Cheaha are backed by high-performance, 6.6PB raw GPFS storage on DDN SFA12KX hardware connected via the Infiniband fabric. An expansion of the GPFS fabric is currently ongoing to double the capacity and should be online by Fall 2018. An additional 20TB of traditional SAN storage is also available for home directories.<br />
<br />
Network Resources<br />
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. This network supports very high speed, secure connectivity between the nodes connected to it, enabling high speed transfer of very large data sets without interfering with other traffic on the campus backbone and ensuring predictable latencies. In addition, the network also consists of a secure science DMZ with DTNs and perfSONAR nodes connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.<br />
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas and most academic office buildings.<br />
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grant No. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.<br />
<br />
== System Profile ==<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes three generations of commodity compute hardware, totaling 868 compute cores, 2.8TB of RAM, and over 200TB of storage.<br />
<br />
The hardware is grouped into generations designated gen1 through gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 1 (gen1) -- 64 2-CPU AMD 1.6 GHz compute nodes with Gigabit interconnect. This is the original hardware collection purchased with NSF EPSCoR funds in 2005, approx $150K. These nodes are sometimes called the "Verari" nodes. These nodes are tagged as "verari-compute-#-#" in the ROCKS naming convention.<br />
* Generation 2 (gen2) -- 24 2x4 core (192 cores total) 3.0 GHz Intel compute nodes with dual data rate InfiniBand interconnect and the initial high-perf storage implementation using 60TB DDN. This is the hardware collection purchased exclusively with the annual VPIT funds allocation, approx $150K/yr for the 2008 and 2009 fiscal years. These nodes are sometimes confusingly called "cheaha2" or "cheaha" nodes. These nodes are tagged as "cheaha-compute-#-#" in the ROCKS naming convention. <br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node. <br />
* Generation 4 (gen4) -- 3 16 core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Dr. Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention. <br />
* Generation 6 (gen6) -- <br />
** 44 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards (4 nodes with NVIDIA K80 GPUs and 4 nodes with Intel Xeon Phi 7120P accelerators)<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network card<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24GB of RAM per core or 384GB per node<br />
* gen3 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB RAM per core or 96GB per node<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB RAM per core or 48GB per node<br />
* gen2 is 192 cores of [http://ark.intel.com/products/33083/Intel-Xeon-Processor-E5450-12M-Cache-3_00-GHz-1333-MHz-FSB 3.0GHz quad-core Intel Xeon E5450 processors] with 2GB RAM per core<br />
* gen1 is 100 cores of 1.6GHz AMD Opteron 242 processors with 1GB RAM per core <br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| default || 44 || 24 || 128G<br />
|-<br />
|gen6|| default || 38 || 24 || 256G<br />
|-<br />
|gen6|| default || 14 || 24 || 384G<br />
|-<br />
|gen5||Ceph/OpenStack|| 12 || 20 || 96G<br />
|-<br />
|gen4||ssg||3||16||384G<br />
|-<br />
|gen3||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Software ===<br />
<br />
Details of the software available on Cheaha can be found on the [https://docs.uabgrid.uab.edu/wiki/Cheaha_Software Installed software page]; an overview follows.<br />
<br />
Cheaha uses [http://modules.sourceforge.net/ Environment Modules] to support account configuration. Please follow these [http://me.eng.uab.edu/wiki/index.php?title=Cheaha#Environment_Modules specific steps for using environment modules].<br />
<br />
Cheaha's software stack is built with the [http://www.brightcomputing.com Bright Cluster Manager]. Cheaha's operating system is CentOS with the following major cluster components:<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
A brief summary of some of the available computational software and tools includes:<br />
* Amber<br />
* FFTW<br />
* Gromacs<br />
* GSL<br />
* NAMD<br />
* VMD<br />
* Intel Compilers<br />
* GNU Compilers<br />
* Java<br />
* R<br />
* OpenMPI<br />
* MATLAB<br />
<br />
=== Network ===<br />
<br />
Cheaha is connected to the UAB Research Network which provides a dedicated 10Gbps networking backplane between clusters located in the 936 data center and the campus network core. Data transfer rates of almost 8Gbps between these hosts have been demonstrated using GridFTP, a multi-channel file transfer service used to move data between clusters as part of job management operations. This performance promises very efficient job management and the seamless integration of other clusters as connectivity to the research network is expanded.<br />
<br />
=== Benchmarks ===<br />
<br />
The continuous resource improvement process involves collecting benchmarks of the system. The measures of greatest interest to users of the system are benchmarks of specific application codes. The following benchmarks have been performed on the system and will be further expanded as additional benchmarks are performed.<br />
<br />
* [[Cheaha-BGL_Comparison|Cheaha-BGL Comparison]]<br />
<br />
* [[Gromacs_Benchmark|Gromacs]]<br />
<br />
* [[NAMD_Benchmarks|NAMD]]<br />
<br />
=== Cluster Usage Statistics ===<br />
<br />
Cheaha uses Bright Cluster Manager to report cluster performance data. This information provides a helpful overview of the current and historical operating stats for Cheaha. You can access the status monitoring page [https://cheaha-master01.rc.uab.edu/userportal/ here] (accessible only on the UAB network or through VPN).<br />
<br />
== Availability ==<br />
<br />
Cheaha is a general-purpose computer resource made available to the UAB community by UAB IT. As such, it is available for legitimate research and educational needs and is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
Many software packages commonly used across UAB are available via Cheaha.<br />
<br />
To request access to Cheaha, please [mailto:support@vo.uabgrid.uab.edu send a request] to the cluster support group.<br />
<br />
Cheaha's intended use implies broad access for the community; however, no guarantees are made that specific computational resources will be available to all users. Availability guarantees can only be made for reserved resources.<br />
<br />
=== Secure Shell Access ===<br />
<br />
Please configure your secure shell client software to use the official host name to access Cheaha:<br />
<br />
<pre><br />
cheaha.rc.uab.edu<br />
</pre><br />
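<br />
For example, from a Linux or Mac OS X terminal (replace '''blazerid''' with your own user ID):<br />
<pre><br />
ssh blazerid@cheaha.rc.uab.edu<br />
</pre><br />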
<br />
== Scheduling Framework ==<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine (SGE), the job manager used previously.<br />
<br />
Slurm is similar in many ways to GridEngine or most other queue systems. You write a batch script then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we will provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress.<br />
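<br />
At a glance, the basic cycle looks like the following (a minimal sketch; ''myjob.sh'' stands in for your own batch script):<br />
<pre><br />
sbatch myjob.sh     # submit the batch script to the scheduler<br />
squeue -u $USER     # monitor your pending and running jobs<br />
scancel JOBID       # cancel a job if needed, using the job ID shown by squeue<br />
</pre><br />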
<br />
== Support ==<br />
<br />
Operational support for Cheaha is provided by the Research Computing group in UAB IT. For questions regarding the operational status of Cheaha please send your request to [mailto:support@vo.uabgrid.uab.edu support@vo.uabgrid.uab.edu]. As a user of Cheaha you will automatically be subscribed to the hpc-announce email list. This subscription is mandatory for all users of Cheaha. It is our way of communicating important information regarding Cheaha to you. The traffic on this list is restricted to official communication and has a very low volume.<br />
<br />
We have limited capacity, however, to support non-operational issues like "How do I write a job script?" or "How do I compile a program?". For such requests, you may find it more fruitful to send your questions to the hpc-users email list and request help from your peers in the HPC community at UAB. As with all mailing lists, please observe [http://lifehacker.com/5473859/basic-etiquette-for-email-lists-and-forums common mailing list etiquette].<br />
<br />
Finally, please remember that as you learned about HPC from others, it becomes part of your responsibility to help others on their quest. You should update this documentation or respond to the mailing list requests of others.<br />
<br />
You can subscribe to hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''subscribe hpc-users''].<br />
<br />
You can unsubscribe from hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=unsubscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''unsubscribe hpc-users''].<br />
<br />
You can review archives of the list in the [http://vo.uabgrid.uab.edu/sympa/arc/hpc-users hpc-users web archives].<br />
<br />
If you need help using the list service please send an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=help sympa@vo.uabgrid.uab.edu with the subject ''help'']<br />
<br />
If you have questions about the operation of the list itself, please send an email to the owners of the list:<br />
<br />
[mailto:hpc-users-request@vo.uabgrid.uab.edu hpc-users-request@vo.uabgrid.uab.edu with a subject relevant to your issue with the list]<br />
<br />
If you are interested in contributing to the enhancement of HPC features at UAB or would like to talk to other cluster administrators, [mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-dev please join the hpc developers community at UAB].</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=UserSession5_22_18&diff=5742UserSession5 22 182018-05-21T16:59:22Z<p>Tanthony@uab.edu: Created page with "HPC User session 5/22/2018 The CAFÉ – The Edge of Chaos 1.00 p.m. – 2.30 p.m. AGENDA 1. Introduction a. Cheaha b. Hardware - Compute & storage 2. Logging on – MA..."</p>
<hr />
<div>HPC User session 5/22/2018<br />
The CAFÉ – The Edge of Chaos 1.00 p.m. – 2.30 p.m.<br />
<br />
<br />
AGENDA<br />
<br />
1. Introduction<br />
a. Cheaha<br />
b. Hardware - Compute & storage <br />
<br />
2. Logging on – Mac/Linux and Windows (PuTTY)<br />
a. LOGIN NODE – Do not use<br />
<br />
3. Storage types and moving data – <br />
a. $HOME - 20GB<br />
b. /data/user/$USER - 5TB<br />
c. /data/project/group_name - 50TB<br />
d. /data/scratch/$USER -500TB (scratch)<br />
<br />
4. Partitions – <br />
a. policies, time limits<br />
b. types of jobs – sbatch, sinteractive, srun, salloc<br />
c. types of nodes – regular/ GPU<br />
<br />
5. Software <br />
a. Modules <br />
b. Requesting installs<br />
c. Self install in $HOME<br />
<br />
6. Demo <br />
a. Sbatch script <br />
b. Sinteractive<br />
<br />
7. If time permits<br />
a. Jupyter notebook<br />
b. R Studio <br />
<br />
8. Requesting support – docs wiki, email<br />
<br />
9. Questions <br />
<br />
<br />
<br />
===Logging ON===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built-in SSH client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Data_Movement&diff=5740Data Movement2018-05-03T15:52:21Z<p>Tanthony@uab.edu: added section Privacy</p>
<hr />
<div>'''NOTE: This page is under construction.'''<br />
<br />
There are various Linux native commands that you can use to move your data within the HPC cluster, such as [https://linux.die.net/man/1/mv mv], [https://linux.die.net/man/1/cp cp], [https://linux.die.net/man/1/scp scp] etc. One of the most powerful tools for data movement on Linux is [https://linux.die.net/man/1/rsync rsync], which we'll be using in our examples below. <br />
<br />
'''rsync''' and '''scp''' can also be used for moving data from a local storage to Cheaha.<br />
<br />
==General Usage==<br />
To find out more information such as flags, usage etc. about any of the above mentioned tools, you can use '''man TOOL_NAME'''.<br />
<pre><br />
[build@c0051 ~]$ man rsync<br />
<br />
NAME<br />
rsync - a fast, versatile, remote (and local) file-copying tool<br />
<br />
SYNOPSIS<br />
Local: rsync [OPTION...] SRC... [DEST]<br />
<br />
Access via remote shell:<br />
Pull: rsync [OPTION...] [USER@]HOST:SRC... [DEST]<br />
Push: rsync [OPTION...] SRC... [USER@]HOST:DEST<br />
<br />
Access via rsync daemon:<br />
Pull: rsync [OPTION...] [USER@]HOST::SRC... [DEST]<br />
rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]<br />
Push: rsync [OPTION...] SRC... [USER@]HOST::DEST<br />
rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST<br />
<br />
Usages with just one SRC arg and no DEST arg will list the source files<br />
instead of copying.<br />
<br />
DESCRIPTION<br />
.<br />
.<br />
.<br />
</pre><br />
<br />
If you are interested in the various methods and tools available for moving data, this page provides a very good guide: [http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html How to transfer large amounts of data via network].<br />
<br />
== Privacy ==<br />
{{SensitiveInformation}}<br />
<br />
==Jobs==<br />
<br />
If the data that you are moving is large, then you should always use either an interactive session or a job script for your data movement. This ensures that the transfer process isn't tying up and slowing down the login nodes for a long time, and instead performs these operations on a compute node.<br />
<br />
===Interactive session===<br />
<br />
* Start an interactive session using srun<br />
<pre><br />
srun --ntasks=1 --mem-per-cpu=1024 --time=08:00:00 --partition=medium --job-name=DATA_TRANSFER --pty /bin/bash<br />
</pre><br />
'''NOTE:''' Please change the time required and the corresponding [https://docs.uabgrid.uab.edu/wiki/SLURM#Slurm_Partitions partition] according to your need.<br />
<br />
* Start an rsync process to begin the transfer once you have moved from login001 to a c00XX compute node:<br />
<pre><br />
[build@c0051 Salmon]$ rsync -aP SOURCE_PATH DESTINATION_PATH<br />
</pre><br />
<br />
===Job Script===<br />
<pre>#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#<br />
# Time format = HH:MM:SS, DD-HH:MM:SS<br />
#<br />
#SBATCH --time=10:00<br />
#<br />
# Minimum memory required per allocated CPU in MegaBytes.<br />
#<br />
#SBATCH --mem-per-cpu=2048<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
rsync -aP SOURCE_PATH DESTINATION_PATH<br />
</pre><br />
<br />
'''NOTE:''' <br />
* Please change the time required and the corresponding [https://docs.uabgrid.uab.edu/wiki/SLURM#Slurm_Partitions partition] according to your need.<br />
* After modifications to the given job script, submit it using : '''sbatch JOB_SCRIPT'''<br />
<br />
==Moving data from Lustre to GPFS Storage==<br />
<br />
'''SGE and Lustre will be taken offline December 18 2016 and decommissioned. All data remaining on Lustre after this date will be deleted.'''<br />
<br />
Instructions for migrating data to /data/scratch/$USER location:<br />
* Log in to the new hardware (hostname: cheaha.rc.uab.edu). Instructions to log in can be found [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Overview here].<br />
* You will notice that your /scratch/user/$USER is also mounted on the new hardware. It’s a read-only mount, and it is there to help you move your data.<br />
* Start an rsync process using: '''rsync -aP /scratch/user/$USER/ /data/scratch/$USER'''. If the data that you are transferring is large, then either start an [https://docs.uabgrid.uab.edu/wiki/Data_Movement#Interactive_session interactive session] for this task or create a [https://docs.uabgrid.uab.edu/wiki/Data_Movement#Job_Script job script].<br />
<br />
Data in /home or /rstore isn’t affected and remains the same on both new and old hardware, hence you don’t need to move that data.<br />
<br />
==Examples==<br />
This section provides various use cases where you would need to move your data.<br />
<br />
===Moving data from local storage to HPC===<br />
\\TODO<br />
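<br />
In the meantime, a minimal sketch using rsync or scp from your local machine ('''blazerid''' and the local paths are placeholders):<br />
<pre><br />
# push a local directory to your network scratch space on Cheaha<br />
rsync -aP /path/to/local/data blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/<br />
<br />
# or copy a single file with scp<br />
scp results.tar.gz blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/<br />
</pre><br />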
<br />
===Moving data from rstore to /data/scratch===<br />
\\TODO</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5739Cheaha GettingStarted2018-05-03T15:50:38Z<p>Tanthony@uab.edu: /* Storage */ changed to section Privacy</p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[Cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path. On Unix-derived systems (Linux, Mac) you can edit your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file):<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Or you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built-in SSH client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and most feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to Cheaha, the first time you connect the SSH client should display a message asking if you would like to import the host's public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, You may see the following message '''just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
The Cheaha Compute Platform includes commodity compute hardware, totaling 2800 compute cores and over 4.7PB of usable storage (6.6PB raw capacity). The following descriptions highlight the current hardware profile that provides an aggregate theoretical peak performance of 468 teraflops.<br />
<br />
* Compute <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 18 Compute Nodes with two 14 core processors (Intel Xeon E5-2680 v4 2.4GHz)with 256GB DDR4 RAM, four NVIDIA Tesla P100 16GB GPUs, EDR InfiniBand and 10GigE network cards<br />
<br />
* Networking<br />
**FDR and EDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
<br />
* Storage -- DDN SFA12KX (with GPFS)<br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Management <br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to '''our queuing system ([[Slurm]])'''. A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based on resource requests (the following are commonly requested resources; many more are available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (also known as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used previously. Instructions for using Slurm and writing Slurm scripts for job submission on Cheaha can be found '''[[Slurm | here]]'''.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For Array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
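<br />
For example, both mandatory requests appear as #SBATCH directives near the top of a job script (a minimal sketch; the values are only illustrative):<br />
<pre><br />
#SBATCH --time=02:00:00       # maximum run time of 2 hours<br />
#SBATCH --mem-per-cpu=2048    # 2048 MB of RAM per allocated CPU<br />
</pre><br />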
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express (max 2 hrs), short (max 12 hrs), medium (max 50 hrs), long (max 150 hrs), sinteractive (0-2 hrs)<br />
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha by using the "sbatch" command. The full manual for sbatch is available by running the following command:<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands; the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, where UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [[http://www.geany.org/ Geany]] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''' click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [[http://notepad-plus.sourceforge.net/uk/site.htm Notepad++]] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [[http://www.textpad.com/ TextPad]] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster to provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application commands, can be specified in a single file commonly referred to as a 'job script'. The following is a simple job script that prints the hostname of the compute node it runs on.<br />
<br />
'''Note:'''Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the jobs resource request (maximum runtime, number of compute nodes, etc...)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world; Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; refer to the Slurm command manuals (e.g. man sbatch) for additional options. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit above job script using sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script via the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names, as in the sketch below.<br />
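<br />
A minimal sketch (assuming the job writes its results under $USER_SCRATCH):<br />
<pre><br />
# inside a job script: keep each run's output in its own directory<br />
OUTDIR=$USER_SCRATCH/results/$SLURM_JOB_ID<br />
mkdir -p $OUTDIR<br />
cd $OUTDIR<br />
</pre><br />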
<br />
=== Interactive Resources ===<br />
The login node (the host that you connect to when you set up the SSH connection to Cheaha) is intended to be used for submitting jobs and/or the lighter prep work required for job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other manipulations of data) or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command.<br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4GB of RAM per CPU (--mem-per-cpu), for 8 hours (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' in your VNC window.<br />
<pre><br />
sinteractive --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME<br />
</pre><br />
Please note, sinteractive starts your shell in a screen session. Screen is a terminal emulator that is designed to make it possible to detach and reattach a session. This feature can mostly be ignored. If your application uses `ctrl-a` as a special command sequence (e.g. Emacs), however, you may find the application doesn't receive this special character. When using screen, you need to type `ctrl-a a` (ctrl-a followed by a single "a" key press) to send a ctrl-a to your application. Screen uses ctrl-a as its own command character, so this special sequence issues the command to screen to "send ctrl-a to my app". Learn more about [https://www.gnu.org/software/screen/manual/html_node/Overview.html#Overview screen from its documentation].<br />
<br />
== Storage ==<br />
=== Privacy ===<br />
{{SensitiveInformation}}<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or otherwise). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup of your data.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non-scratch data.<br />
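<br />
To check how much space your home directory is currently using (a quick sanity check, not an official quota tool):<br />
<pre><br />
du -sh $HOME<br />
</pre><br />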
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a GPFS high performance file system providing roughly 4.7PB of usable storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network-accessible scratch within your job. For example, step 1 in the job could be to copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 run the code, and step 3 move the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
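<br />
For example, in a job script or interactive session:<br />
<pre><br />
echo $USER_SCRATCH      # prints the path, e.g. /data/scratch/$USER<br />
cd $USER_SCRATCH        # work from network scratch rather than $HOME<br />
</pre><br />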
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of local scratch space available on each node is approximately 800GB.<br />
<br />
$LOCAL_SCRATCH is a special temporary directory; it is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or another location prior to the job exiting.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)"<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "Copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)"<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)"<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)"<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored called $SHARE_PROJECT. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_PROJECT.<br />
<br />
=== Uploading Data ===<br />
{{SensitiveInformation}}<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available by the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_PROJECT rather than duplicating the data for each user.<br />
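<br />
A minimal sketch of both directions ('''blazerid''', the file names, and the download URL are placeholders):<br />
<pre><br />
# push from your desktop to your scratch directory on Cheaha<br />
scp dataset.tar.gz blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/<br />
<br />
# or, once logged in to Cheaha, pull data from a provider's URL into scratch<br />
cd $USER_SCRATCH && wget http://example.org/dataset.tar.gz<br />
</pre><br />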
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
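<br />
For example, with the GCC module files listed above:<br />
<pre><br />
# load a specific version:<br />
module load GCC/4.8.4<br />
# or load the default (newest) version, currently GCC/4.9.3-2.25:<br />
module load GCC<br />
</pre><br />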
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` before loading your module files. The -V job argument will cause a conflict with the module function used in your script.<br />
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts; please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set --time to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate job submission, with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, i.e. it doesn't depend on the data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution for processing lots of data when each command works on a single piece of data. For example, running the same conversion on 100s of images.<br />
<br />
Here we show how to create job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. Run this command to create a script, turn it into a command, and run the command (just copy and paste the following onto the command line).<br />
<pre><br />
cat > helloworld.sh << 'EOF'<br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
EOF<br />
chmod +x helloworld.sh<br />
./helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when you job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
./helloworld.sh<br />
</pre><br />
* Submit the job to the Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating processes.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also shows how to compile the code and submit the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this example of MPI-enabled Hello World includes a 3-minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h><br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
int rank, size;<br />
<br />
int i, j;<br />
float f;<br />
<br />
MPI_Init(&argc,&argv);<br />
MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
printf("Hello World from process %d of %d.\n", rank, size);<br />
sleep(180);<br />
for (j=0; j<=100000; j++)<br />
for(i=0; i<=100000; i++)<br />
f=i*2.718281828*i+i+i*3.141592654;<br />
<br />
 MPI_Finalize();<br />
 return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 cpu slots and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification if your job fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS ./helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to the Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the file as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command):<br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command with the simple prefix "./helloworld", which means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out<br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load down the login node with your development work.<br />
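<br />
A minimal sketch of the PATH approach, assuming the helloworld command above lives in ~/jobs/helloworld (add the export line to your ~/.bashrc if you want it to persist across sessions):<br />
<pre><br />
export PATH=$HOME/jobs/helloworld:$PATH<br />
helloworld<br />
</pre><br />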
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=short<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we need a maximum of 10 hours<br />
#<br />
#SBATCH --time=10:00:00<br />
#SBATCH --mem-per-cpu=2048<br />
#<br />
# Set your email address and request notification if your job fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
mpirun -np $SLURM_NTASKS $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
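To use this script, stage your .tpr file in ${USER_SCRATCH}/jobs/gromacs, save the script to a file (the name gromacs.job below is only an example), and submit it with sbatch:<br />
<pre><br />
$ sbatch gromacs.job<br />
$ squeue -u $USER<br />
</pre><br />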
<br />
=== R ===<br />
<br />
The following is an example job script that will use an array of 10 tasks (--array=1-10); each task has a max runtime of 10 minutes and will use no more than 256 MB of RAM per task.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.%A_%a.err<br />
#SBATCH --output=R_array_job.%A_%a.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification if your job fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
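The script above expects one sub-directory per array task, named rep1 through rep10, each containing an rscript.R. A minimal setup sketch (the rscript.R written here is only a placeholder; use your own analysis script):<br />
<pre><br />
$ cd ~/jobs/ArrayExample<br />
$ mkdir -p rep{1..10}<br />
$ for i in {1..10}; do echo 'print(paste("task", Sys.getenv("SLURM_ARRAY_TASK_ID")))' > rep$i/rscript.R; done<br />
</pre><br />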
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Galaxy_File_Uploads&diff=5738Galaxy File Uploads2018-05-03T15:49:41Z<p>Tanthony@uab.edu: added section privacy with Sensitive information template</p>
<hr />
<div>[https://galaxy.uabgrid.uab.edu UAB Galaxy] supports data import in three ways:<br />
<br />
{| border="1"<br />
|+ <br />
! Method !! Limitation <br />
|-<br />
| Direct file uploads using a web browser<br />
| only files < 2G<br />
|-<br />
| Fetching data from external URLs through Galaxy (ftp/http)<br />
| can't access some password protected sites, such as the HudsonAlpha GSL<br />
|-<br />
| Importing files via the Cheaha file system<br />
| requires an [[Cheaha_GettingStarted#Access|account]] on Cheaha, but the command line can be avoided<br />
|-<br />
|}<br />
<br />
==Privacy==<br />
{{SensitiveInformation}}<br />
<br />
==Direct file uploads using a web browser==<br />
Web browser based file upload is a convenient approach, but it is not recommended for files larger than 2 GB because of browser limitations. Also, web browser based upload in Galaxy doesn't provide any feedback on upload progress and can be an unreliable operation. Hence, it's recommended to stage data on a Galaxy-accessible file system and then import it into Galaxy.<br />
<br />
==Importing files via the Cheaha file system==<br />
The UAB Galaxy instance is configured to look for files in the '/scratch/importfs/galaxy/$USER' and '/scratch/user/$USER' directories on Cheaha. Data files can be copied to Cheaha using [[Wikipedia:Secure_copy|scp]] or downloaded using tools like wget, curl, or ftp. A nice Windows-friendly drag-and-drop tool is [http://winscp.net/eng/download.php#download2 WinSCP]. Please refer to the [[Cheaha_GettingStarted#Access]] page for getting access to Cheaha.<br />
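<br />
For example, to stage a data file from your own machine into the Galaxy import directory on Cheaha, a minimal sketch (substitute your BlazerID, your file name, and the Cheaha hostname you normally use):<br />
<pre><br />
scp mydata.fastq BLAZERID@cheaha.rc.uab.edu:/scratch/importfs/galaxy/BLAZERID/<br />
</pre><br />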
<br />
The following sections provide an overview of the UAB Galaxy import methods. <br />
<br />
# importfs or file drop-off mode: the UAB Galaxy platform is configured to import files from the $GALAXY_IMPORTFS directory on Cheaha (/scratch/importfs/galaxy/$USER). The Galaxy application 'moves' files from the import directory to its internal datasets directory. See the [[Galaxy_Importfs]] page for more details on this upload method.<br />
# Data Library: Galaxy has a concept of 'Data Libraries', which is a data container for organizing files in a hierarchical manner, similar to directories on a desktop. Data libraries provide other features for data organization and sharing as well. Data libraries support file uploads using a web browser, fetching from external URLs, and also copying existing directories in a file system. The file-system copy is similar to the importfs option described above; however, it copies files to the internal datasets directory rather than moving them. The UAB Galaxy platform is configured to copy files from the $USER_SCRATCH (/scratch/user/$USER) directory. See the [[Galaxy_Data_Libraries]] page for more details on data libraries.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=FAQ&diff=5737FAQ2018-05-03T15:48:16Z<p>Tanthony@uab.edu: /* What high-speed data transfer software can keep up? */</p>
<hr />
<div>A FAQ for things you might like to know<br />
<br />
== Networking Questions ==<br />
<br />
=== General ===<br />
<br />
==== What type of networking is used on campus? ====<br />
<br />
The campus network is an Ethernet packet-based network.<br />
<br />
==== What is Ethernet? ====<br />
<br />
Ethernet is a family of [[wikipedia:network packet|packet]]-based [[wikipedia:computer network|computer networking]] technologies for [[wikipedia:local area network|local area]] and [[wikipedia:wide area network|wide area network]]s (LANs and WANs). Most [[wikipedia:laptop|laptop]]s, [[wikipedia:desktop computer|desktop computer]]s, [[wikipedia:server (computing)|server computer]]s, [[wikipedia:cable modem|cable modem]]s and [[wikipedia:DSL modem|DSL modem]]s have built-in support for Ethernet networks. For more information and history, read the [[wikipedia:Ethernet|Wikipedia entry on Ethernet]].<br />
<br />
(Credits [[Wikipedia:Ethernet]] April 08, 2011)<br />
<br />
==== What is the recommended configuration for a researcher's network connection? ====<br />
<br />
It depends on the work that you do. If your work frequently involves<br />
moving data sets to and from your computer for visualization, analysis,<br />
or collaboration, you should seriously consider a 100Mbs full-duplex<br />
network connection as your baseline.<br />
<br />
==== What is the difference between Mbs and MBs? ====<br />
<br />
"Mbs" stands for "megabits per second". "MBs" stands for "megabytes per<br />
second". A lower-case "b" designates bits (1's and 0's) and an<br />
upper-case "B" designates bytes. 1 byte equals 8 bits.<br />
<br />
Bits are used to measure network data transfer rates in seconds and<br />
bytes are used to measure data storage sizes. When stored data is moved<br />
across a network, however, it is convenient to consider transfer times<br />
measured in the number of bytes of stored data moved in one second.<br />
<br />
==== What do 10Mbs, 100Mbs, and 1Gbs mean? ====<br />
<br />
Network speeds are listed by the number of bits (1's and 0's) they can<br />
transfer in one second. Modern networks transfer millions of bits per<br />
second, designated "Mbs" and read "mega-bits per second". Common<br />
network speeds are 10Mbs, 100Mbs, and 1000Mbs. 1000 megabits are equal<br />
to 1 gigabit, and 1000Mbs is typically written "1Gbs" and read "one<br />
gigabit per second" (1 billion bits per second).<br />
<br />
==== How fast are 10Mbs, 100Mbs, and 1Gbs networks? ====<br />
<br />
To get a sense for the performance of different network speeds, it's<br />
easiest to use the following rules of thumb for comparing network speeds<br />
to data set sizes and their transfer time:<br />
<br />
* 10Mbs can transfer 1MBs<br />
* 100Mbs can transfer 10MBs<br />
* 1000Mbs (1Gbs) can transfer 100MBs<br />
<br />
A CDROM can hold 700MB of data. Transferring this much data would take<br />
about 7 seconds on a 1Gbs network, 70 seconds (more than 1 minute) to<br />
transfer on a 100Mbs network, and 700 seconds (more than 10 minutes) to<br />
transfer on a 10Mbs network.<br />
<br />
==== What's the justification for this transfer rate rule of thumb? ====<br />
<br />
The logic for this metric is that a 10Mbs (10 mega-bit per second)<br />
network connection will move 10 million bits per second. Data is<br />
measured in 8-bit bytes and the rule of thumb for Ethernet is that<br />
performance peaks at 80% capacity. This provides the easy conversion<br />
factor of 10Mbs=1MBs. Note that the lower-case "b" means "bits" and<br />
upper-case "B" means bytes, ie. 8 bits. The network speeds scale up<br />
easily by factors of 10. So a 100 megabit per second connection is<br />
capable of transferring 10 megabytes per second, and a 1000 megabit per<br />
second is capable of transferring 100 megabytes per second.<br />
<br />
Theoretically, a 100Mbs connection will transfer 100 million bits in one<br />
second, or about 10 megabytes (MB) per second. This means you would be<br />
able to transfer a CD's worth of data (about 700MB) in about 70 seconds,<br />
about 1 minute. (Compare this to a 10x slower connection of 10Mbs and<br />
it would take 700 seconds.)<br />
<br />
=== Network Structure ===<br />
<br />
==== How much network bandwidth is available on campus? ====<br />
<br />
Individual network connections at 10Mbs, 100Mbs, or 1Gbs speeds can be<br />
delivered to any location on the campus network at standard rates.<br />
Additionally, wireless network connectivity is available across campus.<br />
<br />
==== What does the campus network look like? ====<br />
<br />
The campus network can be visualized as a collection of network trees,<br />
roughly one per building, with the root of each tree connecting to an<br />
expandable high bandwidth core network backplane (currently running at<br />
10Gbs).<br />
<br />
The depth of each individual tree is determined by the physical layout<br />
of and number of network ports in each building. Each tree is typically<br />
no more than three layers deep, including the leaf nodes. The leaf nodes<br />
are the end-user connections, i.e. wired wall ports or wifi connections.<br />
The internal nodes of each tree are network switches and the switches<br />
are connected to the next layer via fast connections (currently running<br />
at 1Gbs).<br />
<br />
Each tree (each building) connects to the core network backplane via a<br />
fast connection (currently running at 1Gbs). At this core network<br />
connection, the data packets are routed to their final destination on-<br />
or off-campus.<br />
<br />
==== How is the campus network connected to off-campus networks? ====<br />
<br />
The campus core network backplane is connected to off-campus networks<br />
like the commercial Internet (Google, Facebook, Amazon) and national<br />
high bandwidth research networks (Internet2 and NLR) which provide high<br />
speed connections to research institutions and labs across the country.<br />
The fastest network route to a specific off-campus destination is<br />
chosen automatically as the network packets move off-campus.<br />
<br />
Custom configurations to meet unique research needs or specific<br />
performance targets can be designed. This requires advanced planning<br />
and an understanding of the proposed research workloads and workflow.<br />
Please contact Research Computing. The cost for these customizations<br />
can often be included in research proposals.<br />
<br />
=== Ordering Information ===<br />
<br />
==== How do I order or upgrade a network connection? ====<br />
<br />
Computer data connections are ordered from [http://www.comm.uab.edu/commweb/default.aspx UAB IT Telecommunications Services] via their [https://commservices.comm.uab.edu/ServiceRequest/login.aspx service request form].<br />
<br />
To place an order you will need to provide a general ledger account number for billing and identify the location (building address) of the service request. The wall-jack identification number for the network connection will be needed to complete the service request and can be entered on the form.<br />
<br />
If you have questions please contact UABCOMM@uab.edu or call 4-0503.<br />
<br />
==== Who pays for my network connection? ====<br />
<br />
You do. <br />
<br />
Network connections are accounted for via a federally<br />
regulated service center run by UAB IT. The rates are set based on the<br />
cost to deliver the service. Money to pay for network <br />
connectivity can come from any legitimate source: directly through <br />
grants, indirect grant funds routed to departments, or other <br />
departmental or research support funds.<br />
<br />
==== How much do network connections cost? ====<br />
<br />
Standard service center rates apply to all network connections (10Mbs, 100Mbs, and 1Gbs). Discounted rates for upgrading existing connections to higher data rates are available. Additionally, network switches can be ordered at a fixed lease rate to supply many network connections to an area.<br />
<br />
Please contact UAB IT Telecommunications for rates at UABCOMM@uab.edu or call 4-0503.<br />
<br />
=== Network Performance ===<br />
<br />
==== How do I measure my network connection speed? ====<br />
<br />
Accessing the [http://speedtest.dpo.uab.edu UAB IT SpeedTest server speedtest.dpo.uab.edu] from your web browser will allow you to run a data transfer test from your computer to the SpeedTest server and assess the general performance of your network connection. The reported performance is a good gauge of your maximum achievable network performance across the campus network. If you are off campus, it is also a great way to measure your data transfer rates across the Internet to UAB.<br />
<br />
Please note that the SpeedTest server reports your bandwidth in megabits per second (Mbps). If you are transferring data, you are most likely interested in knowing file transfer speeds in megabytes per second (i.e. how long it takes to transfer a file that is X megabytes large). A reasonably accurate conversion from bits to bytes is to simply divide the reported megabits per second number from the SpeedTest by 10 to get megabytes per second.<br />
<br />
Also, keep in mind that your actual data transfer speeds depend on at least three factors: 1) the speed of your computer's network connection, 2) any network devices (like firewalls) between your computer and the destination, and 3) the network connection speed of the computer you are transferring data to or from. You can help ensure the best possible experience by provisioning a high-speed network connection for your computer.<br />
<br />
==== How do I measure my network bandwidth from my computer to Cheaha? ====<br />
<br />
You can measure the data transfer performance between your computer and Cheaha by using [[wikipedia:iperf|iperf]]. <br />
<br />
To run an iperf test you will need to install iperf on your desktop and have a public IP address for your computer. Iperf is readily available for [https://publishing.ucf.edu/sites/itr/cst/Pages/IPerf.aspx Windows], Mac, and Linux. It is also already installed on Cheaha.<br />
<br />
To run a 30 second data transfer test from Cheaha to your computer using the iperf test follow these steps:<br />
# Start iperf from a command shell on your desktop in "server" mode. This mode causes iperf to listen on TCP port 5001 for incoming data from Cheaha.<br />
iperf -s -i 1<br />
# Log into your [[Cheaha]] account<br />
# Start iperf from the command line on Cheaha in "client" mode. This mode causes iperf to send data to TCP port 5001 at the provided IP address (ie. the public IP address of your computer).<br />
 /opt/iperf/bin/iperf -c <ip-of-your-computer> -t 30 -i 1<br />
<br />
The iperf program output on Cheaha will display the current data transfer rate to your computer once per second for 30 seconds.<br />
<br />
Keep in mind, this test requires that your computer have a public IP address that is reachable from Cheaha since test data is sent from Cheaha to your computer. The iperf server on your computer listens on port 5001 by default. Your computer should allow incoming connections at this port from cheaha.uabgrid.uab.edu. You may need to update your firewall rules to allow access. If you are running a Linux system, you can use the following iptables command to append a rule to open port 5001 for incoming tcp connections from cheaha. Consult your system and local network documentation for details. <br />
 # sudo /sbin/iptables -A INPUT -p tcp -s 164.111.161.10 --dport 5001 -j ACCEPT<br />
<br />
The data transfer rates reported by iperf reflect your speed for data transferred from Cheaha to your computer. This should provide a reasonable estimate for data transferred to Cheaha as well.<br />
<br />
==== What factors impact the actual speeds I can expect in the real world? ====<br />
<br />
The actual transfer rates you get depend on three factors: software,<br />
hardware, and other users.<br />
<br />
Data transfer software and computer hardware can significantly impact<br />
real world transfer rates. If you are transferring lots of data, you<br />
will see your best performance with software that can keep the network<br />
full, computer hardware that is not slower than the data network, and a<br />
network connection sized for your data sets and patience.<br />
<br />
==== How does my copying software impact my transfer speeds? ====<br />
<br />
The software you use to transfer data is the most important factor in<br />
maximizing data throughput. Most traditional copy methods move data in<br />
a single-file line. Modern computer hardware hides this software<br />
inefficiency and can easily keep a 10Mbs connection full and can do ok<br />
with a 100Mbs connection. If you are moving lots of data or using a<br />
1Gbs network, you need to use software tuned for high-speed data transfer.<br />
<br />
High speed data transfer software uses multiple single-file lines in<br />
parallel to improve network throughput. This software must be used at both<br />
ends of the data transfer in order to coordinate the parallel transfer<br />
streams. You won't get very far if you are smart but your peer is not.<br />
<br />
==== What high-speed data transfer software can keep up? ====<br />
{{SensitiveInformation}}<br />
<br /><br />
It is important to use improved data transfer software that can move data efficiently. There are inherent limitations to performance when data is transferred serially (one bit after the other). The most familiar tools like FTP and enhanced SCP peak at around 1Gigabyte/sec ([http://www.es.net/assets/pubs_presos/20130113-tierney-Science-DMZ-DTNs.pdf some performance data (PDF)]). More advanced software for very high speed networks will support parallel data transfers of single files. Some data providers also offer special data transfer tools to maximize your performance. You can learn more about high speed data transfers and maximized network performance from the [http://fasterdata.es.net/science-dmz/DTN/ Science DMZ project of the Energy Sciences Network].<br />
<br />
==== How does my computer hardware impact my transfer speeds? ====<br />
<br />
Computer hardware also impacts transfer speeds. Your slowest piece of<br />
hardware will dictate your maximum data transfer rate. If you have a<br />
slow disk (you should read that as "an external USB hard drive"), you<br />
will be limited by its data transfer speeds.<br />
<br />
Additionally, your computer may be fast but it still has to manage your<br />
workload and coordinate use of all the devices in your computer,<br />
including the network connection. If you are crunching numbers or doing<br />
heavy visualizations at the same time you are trying to transfer data,<br />
your computer may not be able to keep up. Note, that this scenario is<br />
common when you are reading data for your visualization off a file<br />
server. Sometimes you need to move your data before you can use it.<br />
<br />
==== How do I measure my off-campus network connection speed? ====<br />
<br />
The [http://speedtest.net SpeedTest.net] service can be used to measure your connection to key points on the Internet. To run this test, choose the Atlanta, GA connection point. This will run a data transfer test from your computer, off-campus to the SpeedTest.net server hosted by Comcast in Atlanta, GA. The test will rate the performance of a data transfer. <br />
<br />
Atlanta is a good test destination because this is where UAB's Internet-bound traffic actually connects to the commodity Internet. This test will show the network performance to our nearest off-campus neighbor. If you want to share the results of this test with others, please be sure to click the "Share this Test" and then "Copy" buttons. This will provide you a URL to a PNG image capturing the results of this test that anyone can load in their browser.<br />
<br />
==== What factors impact my off-campus network connection speed? ====<br />
<br />
It is important to understand that Internet traffic speeds are highly variable. Transfer speed depends heavily on the network capacity and use along the entire path from your desktop to the location with which you are exchanging data. It also depends on the capabilities of your desktop and the server that is the target of your data transfer. If the networks or remote sites are overloaded or have insufficient bandwidth, then your data transfer speeds will be limited by those conditions.<br />
<br />
As an example, you can try a speed test to a network destination other than Atlanta, GA or a speed test hosted by a network provider other than Comcast. The speed tests from [http://www.ookla.com/speedtest.php Ookla.net] and [http://speakeasy.net/speedtest/ Speakeasy.net] may show different performance for the selected destinations. You may also find the information at [http://speedtest.org SpeedTest.org] informative.<br />
<br />
== Internet Questions ==<br />
<br />
=== Can I make up a host name for my computer for use on the Internet? ===<br />
<br />
No. The Internet relies on a host name look-up service called DNS (Domain Name System). Host names must be registered in the DNS in order to use them on the Internet.<br />
<br />
=== What is DNS? ===<br />
<br />
DNS is the Domain Name System. It is the address look-up service for the Internet. It is the system that allows all computers to know the correct address for a particular name. The DNS has certain rules to follow for registering a public name. The main rule is that you can only name things in your own domain. For example, you can't register a name like mycomputer.google.com, because only Google has the right to use the google.com domain name.<br />
<br />
For a basic introduction to the DNS please see these helpful links:<br />
* http://www.howstuffworks.com/dns.htm/printable<br />
* http://en.wikipedia.org/wiki/Domain_Name_System<br />
<br />
=== Can I use an "_" (underscore) in my host name? ===<br />
<br />
No. The DNS system does not support using the "_" in host names.<br />
<br />
=== But can't I just call my host whatever I want? ===<br />
<br />
Yes, you can. But you need to understand that all host naming on the Internet is defined from the perspective of whatever computer you are on at the moment. If you make up your own host name for some computer and record it locally, you can certainly use that host name from your local computer, however, you will be the only person who knows about the name. <br />
<br />
In order to let anyone know the name and reach the same computer, you need to register your host name in a public database used by all computers on the Internet. That database is the DNS. It is the only common reference point for name-to-IP mappings on the Internet. In order to register this public name you need to follow the rules for assigning names in the DNS.<br />
<br />
== Storage Questions ==<br />
<br />
=== Is there storage space for research data? ===<br />
<br />
The rapidly growing demand for research storage is clearly recognized. Solutions for hosting research data are under active development (and funding discussions) as part of the [[UABgrid FAQ|UABgrid Pilot]]. Currently, research storage is only available through the traditional compute cluster interface of [[Cheaha]].<br />
<br />
=== How can I contribute to the development of research storage? ===<br />
<br />
The best way to contribute to the development of research storage is to share your storage requirements.<br />
<br />
# How much data do you currently store?<br />
# How are you solving your research data problem today?<br />
# How much do you expect your data to grow in the next year?<br />
# Are you building an analysis pipeline that has known storage expectations?<br />
# Do you need to archive your data? How long?<br />
# Do you need to keep all your data on-line?<br />
# Do you ever delete your data?<br />
# How expensive is it for you to recreate derived data products?<br />
<br />
=== How can I use the existing research storage on Cheaha? ===<br />
<br />
The generally available research storage on the cluster is designated to support storage requirements for the construction of data analysis pipelines where data needs to be shared by multiple users on the cluster.<br />
<br />
=== What best-practices exist for storing my research data? ===<br />
<br />
There are many solutions for storing your research data. Simply keeping it on your desktop is one option. As data grows it is often necessary to move it off your system. Most people find some form of USB Drive to be an acceptable solution. One solution that has become popular is the use of DroboFS.<br />
<br />
Note: No endorsements are made of any product or of the fitness of any solution.<br />
<br />
== Cheaha Cluster ==<br />
<br />
=== How do I get an account to use cluster computing on Cheaha? ===<br />
<br />
Please {{CheahaAccountRequest}}. Include your UAB BlazerID and some information about which group you are a part of here on campus and what your plans are for using the cluster.<br />
<br />
=== How do I get started using the cluster after I have an account? ===<br />
<br />
A basic [[Cheaha_GettingStarted|getting started guide]] is available and should answer questions about how to log in to Cheaha and submit a batch job.<br />
<br />
=== How do I cut-and-paste into a terminal window, ctrl+c always exits my commands? ===<br />
<br />
Using a terminal window for an SSH session from your desktop, you can cut-n-paste into that terminal window from your desktop, e.g. you may want to copy the example job commands in the [[Cheaha_GettingStarted|getting started guide]]. The exact key combination varies depending on the terminal program you use but it is often Shift+Ctrl+C. On Macs, the normal command+c keystroke often works since it does not generate the ctrl+c character sequence.<br />
<br />
=== How can I view HTML files on the cluster without transferring them to my desktop? ===<br />
<br />
If you need to view files that are formatted using HTML, e.g. documentation for some tool you are using or HTML formatted output produced by your job, an easy way to view that content is the elinks command. [http://elinks.or.cz/ ELinks] is a terminal-based web browser that you can use directly from your SSH terminal session. Simply enter the command <tt>elinks filename.html</tt> and it will display a text-only rendering of the HTML content. ELinks is also a convenient choice for accessing regular web sites, for example <tt>elinks http://google.com</tt>.<br />
<br />
More advanced options for viewing HTML files include starting your SSH session with X-forwarding, eg. <tt>ssh -X</tt>, and launching Firefox to display on your desktop. Your desktop needs to support X11 and should be on-campus (due to network traffic load) to use this option. <br />
<br />
Other options not documented here include launching a VNC session to display Firefox, which will work better for off-campus access, or to use a file system client like SSHFS to mount your home directory on your desktop and then use your desktop web browser to load the HTML files.<br />
<br />
=== How can I run graphical applications on the cluster? ===<br />
<br />
There are two options for doing this:<br />
<br />
# If you are on a Mac or Linux machine, you can simply type `ssh -X cheaha` when you connect to Cheaha. This will open a connection for graphical applications on the cluster to your local desktop. When you start a graphical application on cheaha, it will get displayed on your local desktop. You can do this from a Windows machine as well, but you need to first install an X Windows program.<br />
# Alternatively, you can set up a [[Setting_Up_VNC_Session|cluster desktop]] and display your graphical application there. This option uses the VNC display protocol to connect to your cluster desktop. This is often easier to use from Windows since you only need to install one VNC program, rather than a complex software system as is the case with X Windows on Windows.<br />
<br />
In both cases, you should not do any heavy compute processing on the cheaha head node. This means you should [[Cheaha_GettingStarted#Interactive_Resources|log into an interactive compute node]] using qlogin after you connect to cheaha using option 1 or option 2 above and start your graphical application on the assigned interactive compute node.<br />
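<br />
A minimal sketch of option 1 from a Mac or Linux desktop (assuming an X server is running locally and using Firefox as the example graphical application):<br />
<pre><br />
ssh -X BLAZERID@cheaha.uabgrid.uab.edu<br />
qlogin<br />
firefox &<br />
</pre><br />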
<br />
<br />
=== Why is the 'top' command showing that there isn't enough RAM? ===<br />
<br />
The following link should give you an idea as to why [http://www.linuxatemyram.com/ Linux ate your RAM]<br />
<br />
== Desktop Questions ==<br />
<br />
=== How do I install Ubuntu Linux along side Windows? ===<br />
<br />
Ubuntu has a good community of documentation writers. Ubuntu's Win7 dual boot instructions provide solid advice:<br />
<br />
https://help.ubuntu.com/community/WindowsDualBoot<br />
<br />
Ubuntu has an option to install side-by-side with windows and allow dual boot selection at boot. When you choose a size for Ubuntu you could split the space evenly between the two so you have ample room for data on either system (Ubuntu only needs about 20GB for the system and apps, the rest would be for your personal data). If this is your main work box you might leave more room for Windows. Looks like it will be very straight forward.<br />
<br />
== Collaboration Tools ==<br />
<br />
=== How do I edit a wiki page on docs? ===<br />
<br />
Users are encouraged to create original content and improve existing content on the docs wiki. Please see the [[Documentation#Editing_Docs|introduction to docs]] for more guidance on editing wiki pages.<br />
<br />
=== How do I link to a file on docs with alternate text? ===<br />
<br />
There are two ways to link to a file uploaded to docs and provide [http://www.mediawiki.org/wiki/Help:Images#Linking_to_an_image_without_displaying_it alternate text]:<br />
# Link to the file summary page from which the file can then be downloaded. Alternate text can be provided by prefixing the File namespace with a colon and using the vertical bar to separate the text:<br />
<pre><br />
[[:File:name-of-file.jpg|link text for file]]<br />
</pre><br />
# Link directly to the file so it is immediately available to the client web browser<br />
<pre><br />
[[Media:name-of-file.jpg|link text for file]]<br />
</pre><br />
<br />
More information on these methods and other file and image link syntax can be found on the [http://www.mediawiki.org/wiki/Help:Images MediaWiki Help page for Images].<br />
<br />
=== Why is the wiki markup syntax different between my project space and the docs wiki? ===<br />
<br />
The "Projects" wikis are implemented using a tool called [http://www.edgewall.com/trac Trac] and follow a formatting convention popularized by earlier wikis mainly [http://moinmo.in/ MoinMoin]. The "Docs" wiki is implemented using a tool called [http://www.mediawiki.org MediaWiki] and follows a formatting convention popularized by Wikipedia. Because these communities have focused on addressing specific use cases, software developers in the case of Trac and document writers in the case of Mediawiki, there formatting conventions have differ significantly in their details.<br />
<br />
Section heading markup (using '=' to designate section headings) and external urls (typing in a bare URL like http://google.com) are typically portable between the two wikis, but details like table layout vary widely.<br />
<br />
An easy option is to leave pages in place and reference them by name from the Projects or Docs wikis.<br />
<br />
=== Should I post XYZ to the list/group/forum? ===<br />
<br />
If you participate in an on-line discussion group and are asking yourself if you should post some sort of content to that group, thank you! Asking this question shows self restraint and consideration for others. These are the core tenets of on-line etiquette, or netiquette. [[wikipedia:Netiquette|Netiquette]] is the term used to describe rules of behavior for on-line discourse. The good news is that netiquette rules are pretty much the same as the basic rules of human interaction you learned as a child, so they should be really familiar to you by now. Respect others, and they will respect you.<br />
<br />
There is one primary additional consideration to keep in mind when participating in on-line discussions. On-line discussions should generally be considered public because you are communicating with more than one person at a time. This means that whatever you say and do on-line is amplified across all the people who will read your comments. This simple fact provides solid guidance for how to act in a forum and what information to post:<br />
# Your post will be seen by many people. Make sure it's relevant to the discussions that are typical of the group to which you are sending it.<br />
# Your email will be received by many people. You should think of "email" as a primitive "copy" command. For example, sending your email to 100 people will make 100 copies of the email and all the documents you have attached. Make sure the information you are including in your email or attaching to it is really worth each person having their own copy. There are completely legitimate reasons to share information and email is a powerful copy command; however, you should use that power wisely and follow the conventions of the group with whom you are communicating. A simple heads up: most groups will frown on attaching large files to messages sent to a mailing list.<br />
<br />
If you want to learn more about netiquette or need more guidance here are some links that might be helpful: [http://www.albion.com/netiquette/book/0963702513p65.html Netiquette Book], [http://linux.sgms-centre.com/misc/netiquette.php Mailing List and Newsgroup netiquette], and [http://lowendmac.com/lists/netiquette.shtml more Mailing List and Newsgroup netiquette].<br />
<br />
== Application Questions ==<br />
<br />
=== How do I build software on the cluster? ===<br />
<br />
Please follow the [[AppBuildBestPractices]] when building software on the cluster. <br />
<br />
Each upstream software project will have their own requirements for building software and you will need to work within those requirements. You should also consider your local build practices and work to create a consistent experience on the Research Computing System so your applications adapt well to the culture and best practices. The [[AppBuildBestPractices]] are designed to help you do that.<br />
<br />
=== Why am I getting an SMTP 421 error when my App delivers to UAB mail servers? ===<br />
<br />
If you have an App that sends email messages (e.g. directly via SMTP, or you have an App with an embedded host SMTP server) and you repeatedly receive the [http://www.greenend.org.uk/rjk/tech/smtpreplies.html SMTP 421 error code] when connecting to ppagent1 or ppagent2 to deliver email to @uab.edu addresses, you may have run afoul of the Proofpoint SPAM detection software and your machine running the App has been greylisted. One easy test is to run ''telnet ppagent1.ad.uab.edu 25'' from the machine running your App. If you get an immediate 421 response and the connection closes, you are almost certainly greylisted, especially if you get the standard 220 SMTP response when you run the same command from a different machine. Proofpoint could greylist for sending just 10 similar messages in under 1 hour. This could easily happen if your App has an "invite user" feature that sends individualized messages to a collection of users.<br />
<br />
To fix this error you need to contact askit@uab.edu and request that the host for your App be removed from the greylist of Proofpoint.<br />
<br />
=== How fast will my NAMD job run on Cheaha ===<br />
<br />
Take a look at our [[NAMD_Benchmarks]] documentation. We compared a number of different compute fabrics: gen3 [[Cheaha]] hardware (the sipsey.q), the ASA DMC nodes, and the NIH Biowulf cluster. We based our benchmarks on the approach followed by Biowulf. Our [[NAMD]] page also has good general guidance for setting up jobs on the cluster.<br />
<br />
Our fabric scales well up to 512 processes, but the compute efficiency starts to taper above 256 slots. For a "[[NAMD_Benchmarks#Actual_job_benchmarks_.28Segrest_job.29|real world job]]" of 246K atoms we were able to produce about 0.0982 days/ns, which works out to about 10 ns/day for this example model.<br />
<br />
You will want to run your model for a short duration at a given slot count to get accurate scaling numbers that you can use to predict the run time for your full model. By measuring in days/ns, as in our benchmarks, you will be able to predict longer run times well. days/ns * ns = days-for-job.<br />
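For example, at the benchmark rate above, a 50 ns run would need roughly 0.0982 days/ns * 50 ns, or about 5 days of wall time.<br />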
<br />
You'll likely have a hard time getting 256 slots so you are better off maxing out at 128 slots, to ensure your job doesn't wait in the queue forever. You can see the number of currently available slots on the sipsey.q with the `qstat -g c -q sipsey.q` command. <br />
<br />
$ qstat -g c -q sipsey.q<br />
CLUSTER QUEUE CQLOAD USED RES AVAIL TOTAL aoACDS cdsuE <br />
--------------------------------------------------------------------------------<br />
sipsey.q 0.49 291 0 255 576 0 36<br />
<br />
If the cluster is lightly loaded, grabbing 128 slots for a while shouldn't be too greedy; however, you will share resources more equitably if you run your model for shorter durations over multiple jobs, i.e. if your full run would take 20 days, create four 5-day jobs.<br />
<br />
This is about being courteous to your fellow HPC users, so they get a chance to compute as well. <br />
<br />
You can accomplish this with dependencies between jobs, so later model steps wait on earlier jobs, using the [https://wiki.duke.edu/display/SCSC/SGE+Job+Dependencies -hold_jid parameter to qsub].<br />
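<br />
A minimal sketch of chaining two such jobs (the job script names are only examples; the -terse option makes qsub print just the job ID so it can be captured):<br />
<pre><br />
JOB1=$(qsub -terse model_days_1-5.job)<br />
qsub -hold_jid $JOB1 model_days_6-10.job<br />
</pre><br />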
<br />
== Security Questions ==<br />
<br />
=== What kind of security environment do you provide? ===<br />
<br />
The Research Computing System (RCS) is built on top of the [[wikipedia:Linux|Linux]] kernel and [[wikipedia:GNU|GNU]] system platform. Linux is a Unix-like environment. This means that we provide an environment that builds on top of the file-process abstraction that is inherent in all Unix-like environments. The ownership and permissions of any resource (file, group of files, or processes) can be configured to allow only authorized access to the resource. Linux supports a large collection of security features and others can be added if needed. If you can think it, you can build it.<br />
<br />
Each user of the Research Computing System is assigned a unique identity that is used to control access to resources in the system. Your access rights are determined by your affiliations and the interfaces through which you access the system.<br />
<br />
=== What interfaces are available to access the Research Computing System? ===<br />
<br />
The Research Computing System can be accessed via the web, a command line interface (SSH), and desktop file shares (CIFS). Access via the Open Science Grid is under development.<br />
<br />
=== What are the security features of the command line interface? ===<br />
<br />
Access to the command line interface is provided by SSH via [[Cheaha]]. SSH requires you to use your system username and password. It grants you access to processes and files owned by you. SSH provides programmatic control of your files and processes. SSH is most common with users and developers of high-performance computing (HPC). By default, you are assigned a personal directory (i.e. your home directory) and a scratch directory (temporary, high-speed storage for large files on which you are computing). These are the only storage locations to which you have write access. All commands you execute (processes you run) will operate under your user identity and be restricted by file access permissions. Please visit [[Cheaha_GettingStarted]] for information on access and use of this interface.<br />
<br />
=== What is the security configuration for the desktop file sharing interface? ===<br />
<br />
Access to the desktop file sharing interface is provided by CIFS, ie. standard Microsoft Windows file sharing that is available on all computing platforms (Linux, Mac, Windows). Access is restricted to on-campus (or VPN) clients. Access requires using your system username and password. By default, this interface grants you access to your personal directory (i.e. your home directory). You may also have access to shared group storage for groups to which you belong or special read-only storage resources available to any client. All access is limited to manipulating files and restricted by the ownership of the files. Desktop file sharing can be used to create a seamless user experience between your desktop and the command-line interface. It also enables you to build storage solutions for your research needs.<br />
<br />
=== What is the security configuration for the web interface? ===<br />
<br />
Access to the web interface of the Research Computing System depends on the web applications implementing specific features. Some web applications may restrict access only to a specific user or groups of users authorized to use the application or access the content it makes available. When required, access to these features requires authentication with your Research Computing System account. Generally, all access to modify content is restricted to authenticated users. Depending on the application, anonymous web access is possible, typically in read-only mode.<br />
<br />
=== Why can't I manage my own affiliations? ===<br />
<br />
Our goal is to provide a comprehensive, integrated, user-managed affiliation and permissions system. Today you can self-manage your affiliations to the degree supported by the interfaces and tools you use, however, coordination of these settings across tools is not universal.<br />
<br />
=== Can I analyze my research data on the cluster? ===<br />
<br />
Yes, that is what the cluster is designated for. Depending on the nature of your research data you may be required to control who has access to that data. The system access controls are designed to allow explicit control over who can and cannot access your research data.<br />
<br />
=== How do FISMA, HIPAA, FERPA, IRB, etc impact my ability to use the Research Computing System? ===<br />
<br />
Depending on the nature of your research, you may work with regulated information types that have specific audit requirements. We are working to address these audit requirements within the standard Research Computing System. This does not limit your ability to use the system for your research today, however, it does require explicit documentation for your use case. <br />
<br />
To learn more about data security and the documentation we are developing please review our [[Information Security Guide]] and contact data security with specific questions.<br />
<br />
== Misc ==<br />
<br />
== UABgrid ==<br />
<br />
UABgrid is an infrastructure pilot of UAB IT Research Computing. More information can be found in the [[UABgrid FAQ]] though this information may be dated.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Data_Movement&diff=5735Data Movement2018-05-03T15:28:50Z<p>Tanthony@uab.edu: added Sensitiveinformation Template</p>
<hr />
<div>'''NOTE: This page is under construction.'''<br />
{{SensitiveInformation}}<br />
<br />
There are various Linux native commands that you can use to move your data within the HPC cluster, such as [https://linux.die.net/man/1/mv mv], [https://linux.die.net/man/1/cp cp], [https://linux.die.net/man/1/scp scp] etc. One of the most powerful tools for data movement on Linux is [https://linux.die.net/man/1/rsync rsync], which we'll be using in our examples below. <br />
<br />
'''rsync''' and '''scp''' can also be used for moving data from local storage to Cheaha.<br />
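<br />
For example, to push a local directory to your scratch space on Cheaha from your own machine, a minimal sketch (substitute your BlazerID and paths):<br />
<pre><br />
rsync -aP ./my_dataset/ BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/my_dataset/<br />
</pre><br />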
<br />
==General Usage==<br />
To find more information, such as flags and usage, about any of the above-mentioned tools, you can use '''man TOOL_NAME'''.<br />
<pre><br />
[build@c0051 ~]$ man rsync<br />
<br />
NAME<br />
rsync - a fast, versatile, remote (and local) file-copying tool<br />
<br />
SYNOPSIS<br />
Local: rsync [OPTION...] SRC... [DEST]<br />
<br />
Access via remote shell:<br />
Pull: rsync [OPTION...] [USER@]HOST:SRC... [DEST]<br />
Push: rsync [OPTION...] SRC... [USER@]HOST:DEST<br />
<br />
Access via rsync daemon:<br />
Pull: rsync [OPTION...] [USER@]HOST::SRC... [DEST]<br />
rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]<br />
Push: rsync [OPTION...] SRC... [USER@]HOST::DEST<br />
rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST<br />
<br />
Usages with just one SRC arg and no DEST arg will list the source files<br />
instead of copying.<br />
<br />
DESCRIPTION<br />
.<br />
.<br />
.<br />
</pre><br />
<br />
If you are interested in finding out about various methods of moving data and various tools which can be used to achieve that aim, this page provides a very good description/guide : [http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html How to transfer large amounts of data via network.].<br />
<br />
==Jobs==<br />
<br />
If the data that you are moving is large, then you should always use either an interactive session or a job script for your data movement. This ensures that your data-movement process isn't tying up and slowing the login nodes for a long time, and instead performs these operations on a compute node.<br />
<br />
===Interactive session===<br />
<br />
* Start an interactive session using srun<br />
<pre><br />
srun --ntasks=1 --mem-per-cpu=1024 --time=08:00:00 --partition=medium --job-name=DATA_TRANSFER --pty /bin/bash<br />
</pre><br />
'''NOTE:''' Please change the time required and the corresponding [https://docs.uabgrid.uab.edu/wiki/SLURM#Slurm_Partitions partition] according to your need.<br />
<br />
* Start an rsync process to begin the transfer once you have moved from login001 to a c00XX compute node:<br />
<pre><br />
[build@c0051 Salmon]$ rsync -aP SOURCE_PATH DESTINATION_PATH<br />
</pre><br />
<br />
===Job Script===<br />
<pre>#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#<br />
# Time format = HH:MM:SS, DD-HH:MM:SS<br />
#<br />
#SBATCH --time=10:00<br />
#<br />
# Minimum memory required per allocated CPU, in megabytes.<br />
#<br />
#SBATCH --mem-per-cpu=2048<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
rsync -aP SOURCE_PATH DESTINATION_PATH<br />
</pre><br />
<br />
'''NOTE:''' <br />
* Please change the time required and the corresponding [https://docs.uabgrid.uab.edu/wiki/SLURM#Slurm_Partitions partition] according to your need.<br />
* After modifications to the given job script, submit it using : '''sbatch JOB_SCRIPT'''<br />
<br />
==Moving data from Lustre to GPFS Storage==<br />
<br />
'''SGE and Lustre will be taken offline December 18 2016 and decommissioned. All data remaining on Lustre after this date will be deleted.'''<br />
<br />
Instructions for migrating data to /data/scratch/$USER location:<br />
* Login to the new hardware (hostname:cheaha.rc.uab.edu). Instructions to login can be found [https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Overview here].<br />
* You will notice that your /scratch/user/$USER is also mounted on the new hardware. It’s a read-only mount, and it is there to help you move your data.<br />
* Start an rsync process using: '''rsync -aP /scratch/user/$USER/ /data/scratch/$USER'''. If the data you are transferring is large, either start an [https://docs.uabgrid.uab.edu/wiki/Data_Movement#Interactive_session interactive session] for this task or create a [https://docs.uabgrid.uab.edu/wiki/Data_Movement#Job_Script job script].<br />
<br />
Data in /home or /rstore isn’t affected and remains the same on both new and old hardware, hence you don’t need to move that data.<br />
<br />
==Examples==<br />
This section provides various use cases where you would need to move your data.<br />
<br />
===Moving data from local storage to HPC===<br />
\\TODO<br />
<br />
===Moving data from rstore to /data/scratch===<br />
\\TODO</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=FAQ&diff=5734FAQ2018-05-03T15:27:28Z<p>Tanthony@uab.edu: /* What high-speed data transfer software can keep up? */ added sensitive information template</p>
<hr />
<div>A FAQ for things you might like to know<br />
<br />
== Networking Questions ==<br />
<br />
=== General ===<br />
<br />
==== What type of networking is used on campus? ====<br />
<br />
The campus network is an Ethernet packet-based network.<br />
<br />
==== What is Ethernet? ====<br />
<br />
Ethernet is a family of [[wikipedia:network packet|packet]]-based [[wikipedia:computer network|computer networking]] technologies for [[wikipedia:local area network|local area]] and [[wikipedia:wide area network|wide area network]]s (LANs and WANs). Most [[wikipedia:laptop|laptop]]s, [[wikipedia:desktop computer|desktop computer]]s, [[wikipedia:server (computing)|server computer]]s, [[wikipedia:cable modem|cable modem]]s and [[wikipedia:DSL modem|DSL modem]]s have built-in support for Ethernet networks. For more information and history, read the [[wikipedia:Ethernet|Wikipedia entry on Ethernet]].<br />
<br />
(Credits [[Wikipedia:Ethernet]] April 08, 2011)<br />
<br />
==== What is the recommended configuration for a researcher's network connection? ====<br />
<br />
It depends on the work that you do. If your work frequently involves<br />
moving data sets to and from your computer for visualization, analysis,<br />
or collaboration, you should seriously consider a 100Mbs full-duplex<br />
network connection as your baseline.<br />
<br />
==== What is the difference between Mbs and MBs? ====<br />
<br />
"Mbs" stands for "megabits per second". "MBs" stands for "megabytes per<br />
second". A lower-case "b" designates bits (1's and 0's) and an<br />
upper-case "B" designates bytes. 1 byte equals 8 bits.<br />
<br />
Bits are used to measure network data transfer rates in seconds and<br />
bytes are used to measure data storage sizes. When stored data is moved<br />
across a network, however, it is convenient to consider transfer times<br />
measured in the number of bytes of stored data moved in one second.<br />
<br />
==== What do 10Mbs, 100Mbs, and 1Gbs mean? ====<br />
<br />
Network speeds are listed by the number of bits (1's and 0's) they can<br />
transfer in one second. Modern networks transfer millions of bits per<br />
second, designated "Mbs" and read "mega-bits per second". Common<br />
network speeds are 10Mbs, 100Mbs, and 1000Mbs. 1000 megabits are equal<br />
to 1 gigabit, and 1000Mbs is typically written "1Gbs" and read "one<br />
gigabit per second" (1 billion bits per second).<br />
<br />
==== How fast are 10Mbs, 100Mbs, and 1Gbs networks? ====<br />
<br />
To get a sense for the performance of different network speeds, it's<br />
easiest to use the following rules of thumb for comparing network speeds<br />
to data set sizes and their transfer time:<br />
<br />
* 10Mbs can transfer 1MBs<br />
* 100Mbs can transfer 10MBs<br />
* 1000Mbs (1Gbs) can transfer 100MBs<br />
<br />
A CDROM can hold 700MB of data. Transferring this much data would take<br />
about 7 seconds on a 1Gbs network, 70 seconds (more than 1 minute) to<br />
transfer on a 100Mbs network, and 700 seconds (more than 10 minutes) to<br />
transfer on a 10Mbs network.<br />
<br />
==== What's the justification for this transfer rate rule of thumb? ====<br />
<br />
The logic for this metric is that a 10Mbs (10 mega-bit per second)<br />
network connection will move 10 million bits per second. Data is<br />
measured in 8-bit bytes and the rule of thumb for Ethernet is that<br />
performance peaks at 80% capacity. This provides the easy conversion<br />
factor of 10Mbs=1MBs. Note that the lower-case "b" means "bits" and<br />
upper-case "B" means bytes, ie. 8 bits. The network speeds scale up<br />
easily by factors of 10. So 100 megabit per second connection is<br />
capable of transferring 10 megabytes per second, and a 1000 megabit per<br />
second is capable of transferring 100 megabytes per second.<br />
<br />
Theoretically, a 100Mbs connection will transfer 100 million bits in one<br />
second, or about 10 megabytes (MB) per second. This means you would be<br />
able to transfer a CD's worth of data (about 700MB) in about 70 seconds,<br />
about 1 minute. (Compare this to a 10x slower connection of 10Mbs, where<br />
it would take 700 seconds, more than 10 minutes.)<br />
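<br />
If you want to apply this arithmetic to your own data, a quick back-of-the-envelope calculation can be done in the shell; the 700MB size below is the CDROM example from above, so substitute your own numbers:<br />
<pre><br />
# transfer time in seconds is roughly size_in_MB / (speed_in_Mbs / 10)<br />
SIZE_MB=700      # data set size in megabytes<br />
SPEED_MBS=100    # network connection speed in megabits per second<br />
echo $(( SIZE_MB / (SPEED_MBS / 10) ))   # prints 70 (seconds)<br />
</pre><br />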
<br />
=== Network Structure ===<br />
<br />
==== How much network bandwidth is available on campus? ====<br />
<br />
Individual network connections at 10Mbs, 100Mbs, or 1Gbs speeds can be<br />
delivered to any location on the campus network at standard rates.<br />
Additionally, wireless network connectivity is available across campus.<br />
<br />
==== What does the campus network look like? ====<br />
<br />
The campus network can be visualized as a collection of network trees,<br />
roughly one per building, with the root of each tree connecting to an<br />
expandable high bandwidth core network backplane (currently running at<br />
10Gbs).<br />
<br />
The depth of each individual tree is determined by the physical layout<br />
of and number of network ports in each building. Each tree is typically<br />
no more than three layers deep, including the leaf nodes. The leaf nodes<br />
are the end-user connections, i.e. wired wall ports or wifi connections.<br />
The internal nodes of each tree are network switches and the switches<br />
are connected to the next layer via fast connections (currently running<br />
at 1Gbs).<br />
<br />
Each tree (each building) connects to the core network backplane via a<br />
fast connection (currently running at 1Gbs). At this core network<br />
connection, the data packets are routed to their final destination on-<br />
or off-campus.<br />
<br />
==== How is the campus network connected to off-campus networks? ====<br />
<br />
The campus core network backplane is connected to off-campus networks<br />
like the commercial Internet (Google, Facebook, Amazon) and national<br />
high bandwidth research networks (Internet2 and NLR) which provide high<br />
speed connections to research institutions and labs across the country.<br />
The fastest network route to a specific off-campus destination is<br />
chosen automatically as the network packets move off-campus.<br />
<br />
Custom configurations to meet unique research needs or specific<br />
performance targets can be designed. This requires advanced planning<br />
and an understanding of the proposed research workloads and workflow.<br />
Please contact Research Computing. The cost for these customizations<br />
can often be included in research proposals.<br />
<br />
=== Ordering Information ===<br />
<br />
==== How do I order or upgrade a network connection? ====<br />
<br />
Computer data connections are ordered from [http://www.comm.uab.edu/commweb/default.aspx UAB IT Telecommunications Services] via their [https://commservices.comm.uab.edu/ServiceRequest/login.aspx service request form].<br />
<br />
To place an order you will need to provide a general ledger account number for billing and identify the location (building address) of the service request. The wall-jack identification number for the network connection will be needed to complete the service request and can be entered on the form.<br />
<br />
If you have questions please contact UABCOMM@uab.edu or call 4-0503.<br />
<br />
==== Who pays for my network connection? ====<br />
<br />
You do. <br />
<br />
Network connections are accounted for via a federally<br />
regulated service center run by UAB IT. The rates are set based on the<br />
cost to deliver the service. Money to pay for network <br />
connectivity can come from any legitimate source: directly through <br />
grants, indirect grant funds routed to departments, or other <br />
departmental or research support funds.<br />
<br />
==== How much do network connections cost? ====<br />
<br />
Standard service center rates apply to all network connections (10Mbs, 100Mbs, and 1Gbs). Discounted rates for upgrading existing connections to higher data rates are available. Additionally, network switches can be ordered at a fixed lease rate to supply many network connections to an area.<br />
<br />
Please contact UAB IT Telecommunications for rates at UABCOMM@uab.edu or call 4-0503.<br />
<br />
=== Network Performance ===<br />
<br />
==== How do I measure my network connection speed? ====<br />
<br />
Accessing the [http://speedtest.dpo.uab.edu UAB IT SpeedTest server speedtest.dpo.uab.edu] from your web browser will allow you to run a data transfer test from your computer to the SpeedTest server and assess the general performance of your network connection. The reported performance is a good gauge of your maximum achievable network performance across the campus network. If you are off campus, it is also a great way to measure your data transfer rates across the Internet to UAB.<br />
<br />
Please note that the SpeedTest server reports your bandwidth in megabits per second (Mbps). If you are transferring data, you are most likely interested in knowing file transfer speeds in megabytes per second (i.e. how long it takes to transfer a file that is X megabytes large). A reasonably accurate conversion from bits to bytes is to simply divide the reported megabits per second number from the SpeedTest by 10 to get megabytes per second.<br />
<br />
Also, keep in mind that your actual data transfer speeds depend on at least three factors: 1) the speed of your computer's network connection, 2) any network devices (like firewalls) between your computer and the destination, and 3) the network connection speed of the computer you are transferring data to or from. You can help ensure the best possible experience by provisioning a high-speed network connection for your computer.<br />
<br />
==== How do I measure my network bandwidth from my computer to Cheaha? ====<br />
<br />
You can measure the data transfer performance between your computer and Cheaha by using [[wikipedia:iperf|iperf]]. <br />
<br />
To run an iperf test you will need to install iperf on your desktop and have a public IP address for your computer. Iperf is readily available for [https://publishing.ucf.edu/sites/itr/cst/Pages/IPerf.aspx Windows], Mac, and Linux. It is also already installed on Cheaha.<br />
<br />
To run a 30 second data transfer test from Cheaha to your computer using the iperf test follow these steps:<br />
# Start iperf from a command shell on your desktop in "server" mode. This mode causes iperf to listen on TCP port 5001 for incoming data from Cheaha.<br />
iperf -s -i 1<br />
# Log into your [[Cheaha]] account<br />
# Start iperf from the command line on Cheaha in "client" mode. This mode causes iperf to send data to TCP port 5001 at the provided IP address (ie. the public IP address of your computer).<br />
 /opt/iperf/bin/iperf -c <ip-of-your-computer> -t 30 -i 1<br />
<br />
The iperf program output on Cheaha will display the current data transfer rate to your computer once per second for 30 seconds.<br />
<br />
Keep in mind, this test requires that your computer have a public IP address that is reachable from Cheaha since test data is sent from Cheaha to your computer. The iperf server on your computer listens on port 5001 by default. Your computer should allow incoming connections at this port from cheaha.uabgrid.uab.edu. You may need to update your firewall rules to allow access. If you are running a Linux system, you can use the following iptables command to append a rule that opens port 5001 for incoming TCP connections from cheaha. Consult your system and local network documentation for details. <br />
 sudo /sbin/iptables -A INPUT -p tcp -s 164.111.161.10 --dport 5001 -j ACCEPT <br />
<br />
The data transfer rates reported by iperf reflect your speed for data transferred from Cheaha to your computer. This should provide a reasonable estimate for data transferred to Cheaha as well.<br />
<br />
==== What factors impact the actual speeds I can expect in the real world? ====<br />
<br />
The actual transfer rates you get depend on three factors: software,<br />
hardware, and other users.<br />
<br />
Data transfer software and computer hardware can significantly impact<br />
real world transfer rates. If you are transferring lots of data, you<br />
will see your best performance with software that can keep the network<br />
full, computer hardware that is not slower than the data network, and a<br />
network connection sized for your data sets and patience.<br />
<br />
==== How does my copying software impact my transfer speeds? ====<br />
<br />
The software you use to transfer data is the most important factor in<br />
maximizing data throughput. Most traditional copy methods move data in<br />
a single-file line. Modern computer hardware hides this software<br />
inefficiency and can easily keep a 10Mbs connection full and can do ok<br />
with a 100Mbs connection. If you are moving lots of data or using a<br />
1Gbs network, you need to use software tuned for high-speed data transfer.<br />
<br />
High speed data transfer software uses multiple single-file lines in<br />
parallel to improve network throughput. This software must be used at both<br />
ends of the data transfer in order coordinate the parallel transfer<br />
streams. You won't get very far if you are smart but your peer is not.<br />
<br />
==== What high-speed data transfer software can keep up? ====<br />
{{SensitiveInformation}}<br />
It is important to use improved data transfer software that can move data efficiently. There are inherent limitations to performance when data is transferred serially (one bit after the other). The most familiar tools like FTP and enhanced SCP peak at around 1Gigabyte/sec ([http://www.es.net/assets/pubs_presos/20130113-tierney-Science-DMZ-DTNs.pdf some performance data (PDF)]). More advanced software for very high speed networks will support parallel data transfers of single files. Some data providers also offer special data transfer tools to maximize your performance. You can learn more about high speed data transfers and maximized network performance from the [http://fasterdata.es.net/science-dmz/DTN/ Science DMZ project of the Energy Sciences Network].<br />
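<br />
As a rough illustration of the parallel idea using everyday tools (not one of the specialized transfer programs mentioned above), you can run several rsync streams at once, one per top-level subdirectory of your data set; the host name, paths, and "blazerid" below are hypothetical placeholders, and directory names are assumed to contain no spaces:<br />
<pre><br />
# Create the destination directory once, then launch up to 4 parallel rsync streams,<br />
# one per top-level subdirectory of the local data set<br />
ssh blazerid@cheaha.rc.uab.edu "mkdir -p /data/scratch/blazerid/dataset"<br />
cd /path/to/local/dataset<br />
ls -d */ | xargs -P 4 -I{} rsync -a {} blazerid@cheaha.rc.uab.edu:/data/scratch/blazerid/dataset/{}<br />
</pre><br />
Both ends still need enough disk and network bandwidth to benefit; a single stream is often sufficient on a 100Mbs connection, as noted above.<br />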
<br />
==== How does my computer hardware impact my transfer speeds? ====<br />
<br />
Computer hardware also impacts transfer speeds. Your slowest piece of<br />
hardware will dictate your maximum data transfer rate. If you have a<br />
slow disk (you should read that as "an external USB hard drive"), you<br />
will be limited by its data transfer speeds.<br />
<br />
Additionally, your computer may be fast but it still has to manage your<br />
workload and coordinate use of all the devices in your computer,<br />
including the network connection. If you are crunching numbers or doing<br />
heavy visualizations at the same time you are trying to transfer data,<br />
your computer may not be able to keep up. Note, that this scenario is<br />
common when you are reading data for your visualization off a file<br />
server. Sometimes you need to move your data before you can use it.<br />
<br />
==== How do I measure my off-campus network connection speed? ====<br />
<br />
The [http://speedtest.net SpeedTest.net] service can be used to measure your connection to key points on the Internet. To run this test, choose the Atlanta, GA connection point. This will run a data transfer test from your computer, off-campus to the SpeedTest.net server hosted by Comcast in Atlanta, GA. The test will rate the performance of a data transfer. <br />
<br />
Atlanta is a good test destination because this is where UAB's Internet-bound traffic actually connects to the commodity Internet. This test will show the network performance to our nearest off-campus neighbor. If you want to share the results of this test with others, please be sure to click the "Share this Test" and then "Copy" buttons. This will provide you a URL to a PNG image capturing the results of this test that anyone can load in their browser.<br />
<br />
==== What factors impact my off-campus network connection speed? ====<br />
<br />
It is important to understand that Internet traffic speeds are highly variable. Transfer speed depends heavily on the network capacity and use along the entire path from your desktop to the location with which you are exchanging data. It also depends on the capabilities of your desktop and the server that is the target of your data transfer. If the networks or remote sites are overloaded or have insufficient bandwidth, then your data transfer speeds will be limited by those conditions.<br />
<br />
As an example, you can try a speed test to a network destination other than Atlanta, GA or a speed test hosted by a network provider other than Comcast. The speed tests from [http://www.ookla.com/speedtest.php Ookla] and [http://speakeasy.net/speedtest/ Speakeasy.net] may show different performance for the selected destinations. You may also find the information at [http://speedtest.org SpeedTest.org] informative.<br />
<br />
== Internet Questions ==<br />
<br />
=== Can I make up a host name for my computer for use on the Internet? ===<br />
<br />
No. The Internet relies on a host name look-up service called DNS (Domain Name System). Host names must be registered in the DNS in order to use them on the Internet.<br />
<br />
=== What is DNS? ===<br />
<br />
DNS is the Domain Name System. It is the address look-up service for the Internet. It is the system that allows all computers to know the correct address for a particular name. The DNS has certain rules to follow for registering a public name. The main rule is that you can only name things in your own domain. For example, you can't register a name like mycomputer.google.com, because only Google has the right to use the google.com domain name.<br />
<br />
For a basic introduction to the DNS please see these helpful links:<br />
* http://www.howstuffworks.com/dns.htm/printable<br />
* http://en.wikipedia.org/wiki/Domain_Name_System<br />
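<br />
To see what the DNS actually returns for a registered name, you can query it directly from a terminal; both commands below are standard on Linux and Mac systems:<br />
<pre><br />
# Ask the DNS for the address registered for a host name<br />
dig +short cheaha.rc.uab.edu<br />
nslookup cheaha.rc.uab.edu<br />
</pre><br />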
<br />
=== Can I use an "_" (underscore) in my host name? ===<br />
<br />
No. The DNS system does not support using the "_" in host names.<br />
<br />
=== But can't I just call my host whatever I want? ===<br />
<br />
Yes, you can. But you need to understand that all host naming on the Internet is defined from the perspective of whatever computer you are on at the moment. If you make up your own host name for some computer and record it locally, you can certainly use that host name from your local computer, however, you will be the only person who knows about the name. <br />
<br />
In order to let anyone know the name and reach the same computer, you need to register your host name in a public database used by all computers on the Internet. That database is the DNS. It is the only common reference point for name-to-IP mappings on the Internet. In order to register this public name you need to follow the rules for assigning names in the DNS.<br />
<br />
== Storage Questions ==<br />
<br />
=== Is there storage space for research data? ===<br />
<br />
The rapidly growing demand for research storage is clearly recognized. Solutions for hosting research data are under active development (and funding discussions) as part of the [[UABgrid FAQ|UABgrid Pilot]]. Currently, research storage is only available through the traditional compute cluster interface of [[Cheaha]].<br />
<br />
=== How can I contribute to the development of research storage? ===<br />
<br />
The best way to contribute to the development of research storage is to share your storage requirements.<br />
<br />
# How much data do you currently store?<br />
# How are you solving your research data problem today?<br />
# How much do you expect your data to grow in the next year?<br />
# Are you building an analysis pipeline that has known storage expectations?<br />
# Do you need to archive your data? How long?<br />
# Do you need to keep all your data on-line?<br />
# Do you ever delete your data?<br />
# How expensive is it for you to recreate derived data products?<br />
<br />
=== How can I use the existing research storage on Cheaha? ===<br />
<br />
The generally available research storage on the cluster is designated to support storage requirements for the construction of data analysis pipelines where data needs to be shared by multiple users on the cluster.<br />
<br />
=== What best-practices exist for storing my research data? ===<br />
<br />
There are many solutions for storing your research data. Simply keeping it on your desktop is one option. As data grows it is often necessary to move it off your system. Most people find some form of USB Drive to be an acceptable solution. One solution that has become popular is the use of DroboFS.<br />
<br />
Note: No endorsements are made of any product or of the fitness of any solution.<br />
<br />
== Cheaha Cluster ==<br />
<br />
=== How do I get an account to use cluster computing on Cheaha? ===<br />
<br />
Please {{CheahaAccountRequest}}. Include your UAB BlazerID and some information about which group you are a part of here on campus and what your plans are for using the cluster.<br />
<br />
=== How do I get started using the cluster after I have an account? ===<br />
<br />
A basic [[Cheaha_GettingStarted|getting started guide]] is available and should answer questions about how to log in to Cheaha and submit a batch job.<br />
<br />
=== How do I cut-and-paste into a terminal window, ctrl+c always exits my commands? ===<br />
<br />
Using a terminal window for an SSH session from your desktop, you can cut-and-paste into that terminal window from your desktop, e.g. you may want to copy the example job commands in the [[Cheaha_GettingStarted|getting started guide]]. The exact key combination varies depending on the terminal program you use, but it is often Shift+Ctrl+C for copy and Shift+Ctrl+V for paste. On Macs, the normal command+c keystroke often works since it does not generate the ctrl+c character sequence.<br />
<br />
=== How can I view HTML files on the cluster without transferring them to my desktop? ===<br />
<br />
If you need to view files that are formatted using HTML, e.g. documentation for some tool you are using or HTML formatted output produced by your job, an easy way to view that content is the elinks command. [http://elinks.or.cz/ ELinks] is a terminal-based web browser that you can use directly from your SSH terminal session. Simply enter the command <tt>elinks filename.html</tt> and it will display a text-only rendering of the HTML content. ELinks is also a convenient choice for accessing regular web sites, for example <tt>elinks http://google.com</tt>.<br />
<br />
More advanced options for viewing HTML files include starting your SSH session with X-forwarding, eg. <tt>ssh -X</tt>, and launching Firefox to display on your desktop. Your desktop needs to support X11 and should be on-campus (due to network traffic load) to use this option. <br />
<br />
Other options not documented here include launching a VNC session to display Firefox, which will work better for off-campus access, or to use a file system client like SSHFS to mount your home directory on your desktop and then use your desktop web browser to load the HTML files.<br />
<br />
=== How can I run graphical applications on the cluster? ===<br />
<br />
There are two options for doing this:<br />
<br />
# If you are on a Mac or Linux machine, you can simply type `ssh -X cheaha` when you connect to Cheaha. This will open a connection for graphical applications on the cluster to your local desktop. When you start a graphical application on cheaha, it will get displayed on your local desktop. You can do this from a Windows machine as well, but you need to first install an X Windows program.<br />
# Alternatively, you can set up a [[Setting_Up_VNC_Session|cluster desktop]] and display your graphical application there. This option uses the VNC display protocol to connect to your cluster desktop. This is often easier to use from Windows since you only need to install one VNC program, rather than a complex software system as is the case with X Windows on Windows.<br />
<br />
In both cases, you should not do any heavy compute processing on the cheaha head node. This means you should [[Cheaha_GettingStarted#Interactive_Resources|log into an interactive compute node]] using qlogin after you connect to cheaha using option 1 or option 2 above and start your graphical application on the assigned interactive compute node.<br />
<br />
<br />
=== Why is the 'top' command showing that there isn't enough RAM? ===<br />
<br />
The following link should give you an idea as to why [http://www.linuxatemyram.com/ Linux ate your RAM]<br />
<br />
== Desktop Questions ==<br />
<br />
=== How do I install Ubuntu Linux along side Windows? ===<br />
<br />
Ubuntu has a good community of documentation writers. Ubuntu's Win7 dual boot instructions provide solid advice:<br />
<br />
https://help.ubuntu.com/community/WindowsDualBoot<br />
<br />
Ubuntu has an option to install side-by-side with Windows and allow dual boot selection at boot. When you choose a size for Ubuntu you could split the space evenly between the two so you have ample room for data on either system (Ubuntu only needs about 20GB for the system and apps; the rest would be for your personal data). If this is your main work box, you might leave more room for Windows. The process should be very straightforward.<br />
<br />
== Collaboration Tools ==<br />
<br />
=== How do I edit a wiki page on docs? ===<br />
<br />
Users are encouraged to create original content and improve existing content on the docs wiki. Please see the [[Documentation#Editing_Docs|introduction to docs]] for more guidance on editing wiki pages.<br />
<br />
=== How do I link to a file on docs with alternate text? ===<br />
<br />
There are two ways to link to a file uploaded to docs and provide [http://www.mediawiki.org/wiki/Help:Images#Linking_to_an_image_without_displaying_it alternate text]:<br />
# Link to the file summary page from which the file can then be downloaded. Alternate text can be provided by prefixing the File namespace with a colon and using the vertical bar to separate the text:<br />
<pre><br />
[[:File:name-of-file.jpg|link text for file]]<br />
</pre><br />
# Link directly to the file so it is immediately available to the client web browser<br />
<pre><br />
[[Media:name-of-file.jpg|link text for file]]<br />
</pre><br />
<br />
More information on these methods and other file and image link syntax can be found on the [http://www.mediawiki.org/wiki/Help:Images MediaWiki Help page for Images].<br />
<br />
=== Why is the wiki markup syntax different between my project space and the docs wiki? ===<br />
<br />
The "Projects" wikis are implemented using a tool called [http://www.edgewall.com/trac Trac] and follow a formatting convention popularized by earlier wikis mainly [http://moinmo.in/ MoinMoin]. The "Docs" wiki is implemented using a tool called [http://www.mediawiki.org MediaWiki] and follows a formatting convention popularized by Wikipedia. Because these communities have focused on addressing specific use cases, software developers in the case of Trac and document writers in the case of Mediawiki, there formatting conventions have differ significantly in their details.<br />
<br />
Section heading markup (using '=' to designate section headings) and external urls (typing in a bare URL like http://google.com) are typically portable between the two wikis, but details like table layout vary widely.<br />
<br />
An easy option is to leave pages in place and reference them by name from the Projects or Docs wikis.<br />
<br />
=== Should I post XYZ to the list/group/forum? ===<br />
<br />
If you participate in an on-line discussion group and are asking yourself if you should post some sort of content to that group, thank you! Asking this question shows self-restraint and consideration for others. These are the core tenets of on-line etiquette, or netiquette. [[wikipedia:Netiquette|Netiquette]] is the term used to describe rules of behavior for on-line discourse. The good news is that netiquette rules are pretty much the same as the basic rules of human interaction you learned as a child, so they should be really familiar to you by now. Respect others, and they will respect you.<br />
<br />
There is one primary additional consideration to keep in mind when participating in on-line discussions. On-line discussions should generally be considered public because you are communicating with more than one person at a time. This means that whatever you say and do on-line is amplified across all the people who will read your comments. This simple fact provides solid guidance for how to act in a forum and what information to post:<br />
# Your post will be seen by many people. Make sure it's relevant to the discussions that are typical of the group to which you are sending it.<br />
# Your email will be received by many people. You should think of "email" as a primitive "copy" command. For example, sending your email to 100 people will make 100 copies of the email and all the documents you have attached. Make sure the information you are including in your email or attaching to it is really worth each person having their own copy. There are completely legitimate reasons to share information and email is a powerful copy command; however, you should use that power wisely and follow the conventions of the group with whom you are communicating. A simple heads up: most groups will frown on attaching large files to messages sent to a mailing list.<br />
<br />
If you want to learn more about netiquette or need more guidance here are some links that might be helpful: [http://www.albion.com/netiquette/book/0963702513p65.html Netiquette Book], [http://linux.sgms-centre.com/misc/netiquette.php Mailing List and Newsgroup netiquette], and [http://lowendmac.com/lists/netiquette.shtml more Mailing List and Newsgroup netiquette].<br />
<br />
== Application Questions ==<br />
<br />
=== How do I build software on the cluster? ===<br />
<br />
Please follow the [[AppBuildBestPractices]] when building software on the cluster. <br />
<br />
Each upstream software project will have its own requirements for building software and you will need to work within those requirements. You should also consider your local build practices and work to create a consistent experience on the Research Computing System so your applications adapt well to the culture and best practices. The [[AppBuildBestPractices]] are designed to help you do that.<br />
<br />
=== Why am I getting an SMTP 421 error when my App delivers to UAB mail servers? ===<br />
<br />
If you have an App that sends email messages (e.g. directly via SMTP, or an App with an embedded host SMTP server) and you repeatedly receive the [http://www.greenend.org.uk/rjk/tech/smtpreplies.html SMTP 421 error code] when connecting to ppagent1 or ppagent2 to deliver email to @uab.edu addresses, you may have run afoul of the Proofpoint SPAM detection software and the machine running your App has been greylisted. One easy test is to run ''telnet ppagent1.ad.uab.edu 25'' from the machine running your App. If you get an immediate 421 response and the connection closes, you are almost certainly greylisted, especially if you get the standard 220 SMTP response when you run the same command from a different machine. Proofpoint can greylist for sending just 10 similar messages in under 1 hour. This could easily happen if your App has an "invite user" feature that sends individualized messages to a collection of users.<br />
<br />
To fix this error you need to contact askit@uab.edu and request that the host for your App be removed from the greylist of Proofpoint.<br />
<br />
=== How fast will my NAMD job run on Cheaha? ===<br />
<br />
Take a look at our [[NAMD_Benchmarks]] documentation. We compared a number of different compute fabrics: gen3 [[Cheaha]] hardware (the sipsey.q), the ASA DMC nodes, and the NIH Biowulf cluster. We based our benchmarks on the approach followed by Biowulf. Our [[NAMD]] page also has good general guidance for setting up jobs on the cluster.<br />
<br />
Our fabric scales well up to 512 processes, but the compute efficiency starts to taper above 256 slots. For a "[[NAMD_Benchmarks#Actual_job_benchmarks_.28Segrest_job.29|real world job]]" of 246K atoms we were able to produce about 0.0982 days/ns. This example model would produce about 10ns/day at 256K atoms.<br />
<br />
You will want to run your model for a short duration at a given slot count to get accurate scaling numbers that you can use to predict the run time for your full model. By measuring in days/ns, as in our benchmarks, you will be able to predict longer run times well: days/ns * ns = days-for-job. For example, at 0.0982 days/ns a 50ns run would take about 4.9 days.<br />
<br />
You'll likely have a hard time getting 256 slots so you are better off maxing out at 128 slots, to ensure your job doesn't wait in the queue forever. You can see the number of currently available slots on the sipsey.q with the `qstat -g c -q sipsey.q` command. <br />
<br />
$ qstat -g c -q sipsey.q<br />
CLUSTER QUEUE CQLOAD USED RES AVAIL TOTAL aoACDS cdsuE <br />
--------------------------------------------------------------------------------<br />
sipsey.q 0.49 291 0 255 576 0 36<br />
<br />
If the cluster is lightly loaded, grabbing 128 slots for a while shouldn't be too greedy; however, you will share resources more equitably if you run your model for shorter durations over multiple jobs, ie. if your full run would take 20 days, create four 5-day jobs. <br />
<br />
This is about being courteous to your fellow HPC users, so they get a chance to compute as well. <br />
<br />
You can accomplish this with dependencies between jobs so later model steps wait on earlier jobs using the [https://wiki.duke.edu/display/SCSC/SGE+Job+Dependencies -hold_jid parameter to qsub], as sketched below.<br />
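<br />
A minimal sketch of chaining jobs this way is shown below; the job script names are hypothetical, and qsub's -terse option is used so that only the job ID is printed and can be captured:<br />
<pre><br />
# Submit the first segment and capture its job ID<br />
JOB1=$(qsub -terse namd_step1.job)<br />
# The second segment will not start until the first one completes<br />
qsub -hold_jid $JOB1 namd_step2.job<br />
</pre><br />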
<br />
== Security Questions ==<br />
<br />
=== What kind of security environment do you provide? ===<br />
<br />
The Research Computing System (RCS) is built on top of the [[wikipedia:Linux|Linux]] kernel and [[wikipedia:GNU|GNU]] system platform. Linux is a Unix-like environment. This means that we provide an environment that builds on top of the file-process abstraction that is inherent in all Unix-like environments. The ownership and permissions of any resource (file, group of files, or processes) can be configured to allow only authorized access to the resource. Linux supports a large collection of security features and others can be added if needed. If you can think it, you can build it.<br />
<br />
Each user of the Research Computing System is assigned a unique identity that is used to control access to resources in the system. Your access rights are determined by your affiliations and the interfaces through which you access the system.<br />
<br />
=== What interfaces are available to access the Research Computing System? ===<br />
<br />
The Research Computing System can be accessed via the web, a command line interface (SSH), and desktop file shares (CIFS). Access via the Open Science Grid is under development.<br />
<br />
=== What are the security features of the command line interface? ===<br />
<br />
Access to the command line interface is provided by SSH via [[Cheaha]]. SSH requires you to use your system username and password. It grants you access to processes and files owned by you. SSH provides programmatic control of your files and processes. SSH is most common with users and developers of high-performance computing (HPC). By default, you are assigned a personal directory (i.e. your home directory) and a scratch directory (temporary, high-speed storage for large files on which you are computing). These are the only storage locations to which you have write access. All commands you execute (processes you run) will operate under your user identity and be restricted by file access permissions. Please visit [[Cheaha_GettingStarted]] for information on access and use of this interface.<br />
<br />
=== What is the security configuration for the desktop file sharing interface? ===<br />
<br />
Access to the desktop file sharing interface is provided by CIFS, ie. standard Microsoft Windows file sharing that is available on all computing platforms (Linux, Mac, Windows). Access is restricted to on-campus (or VPN) clients. Access requires using your system username and password. By default, this interface grants you access to your personal directory (i.e. your home directory). You may also have access to shared group storage for groups to which you belong or special read-only storage resources available to any client. All access is limited to manipulating files and restricted by the ownership of the files. Desktop file sharing can be used to create a seamless user experience between your desktop and the command-line interface. It also enables you to build storage solutions for your research needs.<br />
<br />
=== What is the security configuration for the web interface? ===<br />
<br />
Access to the web interface of the Research Computing System depends on the web applications implementing specific features. Some web applications may restrict access only to a specific user or groups of users authorized to use the application or access the content it makes available. When required, access to these features requires authentication with your Research Computing System account. Generally, all access to modify content is restricted to authenticated users. Depending on the application, anonymous web access is possible, typically in read-only mode.<br />
<br />
=== Why can't I manage my own affiliations? ===<br />
<br />
Our goal is to provide a comprehensive, integrated, user-managed affiliation and permissions system. Today you can self-manage your affiliations to the degree supported by the interfaces and tools you use, however, coordination of these settings across tools is not universal.<br />
<br />
=== Can I analyze my research data on the cluster? ===<br />
<br />
Yes, that is what the cluster is designated for. Depending on the nature of your research data you may be required to control who has access to that data. The system access controls are designed to allow explicit control over who can and cannot access your research data.<br />
<br />
=== How do FISMA, HIPAA, FERPA, IRB, etc impact my ability to use the Research Computing System? ===<br />
<br />
Depending on the nature of your research, you may work with regulated information types that have specific audit requirements. We are working to address these audit requirements within the standard Research Computing System. This does not limit your ability to use the system for your research today, however, it does require explicit documentation for your use case. <br />
<br />
To learn more about data security and the documentation we are developing please review our [[Information Security Guide]] and contact data security with specific questions.<br />
<br />
== Misc ==<br />
<br />
== UABgrid ==<br />
<br />
UABgrid is an infrastructure pilot of UAB IT Research Computing. More information can be found in the [[UABgrid FAQ]] though this information may be dated.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5733Cheaha GettingStarted2018-05-03T15:26:24Z<p>Tanthony@uab.edu: /* Uploading Data */ added Sensitiveinformation Template</p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[Cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, ie. your lab or affiliation.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine simply as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path or customize your SSH configuration (both options are shown below). On Unix-derived systems (Linux, Mac) you can add the search path by editing your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file)<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Or you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and the most feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to Cheaha, the first time you connect, the SSH client should display a message asking if you would like to import the host's public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, You may see the following message '''just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
The Cheaha Compute Platform includes commodity compute hardware, totaling 2800 compute cores and over 4.7PB of usable storage (6.6PB raw capacity). The following descriptions highlight the current hardware profile that provides an aggregate theoretical peak performance of 468 teraflops.<br />
<br />
* Compute <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 18 Compute Nodes with two 14 core processors (Intel Xeon E5-2680 v4 2.4GHz) with 256GB DDR4 RAM, four NVIDIA Tesla P100 16GB GPUs, EDR InfiniBand and 10GigE network cards<br />
<br />
* Networking<br />
**FDR and EDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
<br />
* Storage -- DDN SFA12KX with GPFS) <br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Management <br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to '''our queuing system ([[Slurm]])'''. A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based using resource requests (the following are commonly requested resources, there are many more that are available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (also known as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used earlier. Instructions for using Slurm and writing Slurm scripts for job submission on Cheaha can be found '''[[Slurm | here]]'''.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
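<br />
A condensed sketch of this workflow from the command line is shown below; the directory and file names are hypothetical, and writing the job script itself is covered in the following sections:<br />
<pre><br />
# Stage input data to your scratch directory<br />
cp -r ~/projects/mydata $USER_SCRATCH/<br />
<br />
# Submit the job script and note the job number it reports<br />
sbatch myjob.sh<br />
<br />
# Monitor your jobs until they finish<br />
squeue -u $USER<br />
<br />
# Review the results, then clean up the scratch directory<br />
cp -r $USER_SCRATCH/results ~/projects/<br />
rm -rf $USER_SCRATCH/mydata<br />
</pre><br />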
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For Array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express(max 2 hrs), short(max 12 hrs), medium(max 50 hrs), long(max 150 hrs), sinteractive(0-2 hrs)<br />
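<br />
As an illustration of combining these requests, the same options can also be supplied on the sbatch command line instead of inside the job script; the script name my_analysis.sh and the specific values are hypothetical:<br />
<pre><br />
sbatch --partition=short --time=04:00:00 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=2048 my_analysis.sh<br />
</pre><br />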
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha by using the "sbatch" command. The full manual for sbatch is available by running the following command<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands, the $ represents the command prompt and should not be typed!<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, where UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [http://www.geany.org/ Geany] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Document''', click '''Set Line Endings''' and then '''Convert and Set to LF (Unix)'''<br />
* [http://notepad-plus.sourceforge.net/uk/site.htm Notepad++] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX click '''Format''' and then click '''Convert to UNIX Format'''<br />
* [http://www.textpad.com/ TextPad] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster to provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application code, can be specified in a single file commonly referred to as a 'Job Script/File'. Following is a simple job script that prints the hostname of the assigned compute node.<br />
<br />
'''Note:'''Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the jobs resource request (maximum runtime, number of compute nodes, etc...)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional configuration options refer to the Slurm commands manual. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script via the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
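For example, here is a minimal sketch (the directory and file names are hypothetical) that uses $SLURM_JOB_ID to keep each run's output separate:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=jobid-demo<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
<br />
# Create a per-job working directory named after the Slurm job number<br />
WORKDIR=$USER_SCRATCH/run_$SLURM_JOB_ID<br />
mkdir -p $WORKDIR<br />
cd $WORKDIR<br />
<br />
# Write results into a file that also carries the job number<br />
hostname > results_$SLURM_JOB_ID.txt<br />
</pre><br />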
<br />
=== Interactive Resources ===<br />
The login node (the host that you connected to when you set up the SSH connection to Cheaha) is intended for submitting jobs and/or lighter prep work required for your job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other manipulations of data), or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command. <br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4GB of RAM per CPU (--mem-per-cpu), for 8 hours (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling [http://www.uab.edu/it/software X-Win32 2014 for Windows]<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via '''Terminal''' on your VNC window.<br />
<pre><br />
sinteractive --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME<br />
</pre><br />
Please note, sinteractive starts your shell in a screen session. Screen is a terminal emulator that is designed to make it possible to detach and reattach a session. This feature can mostly be ignored. If your application uses `ctrl-a` as a special command sequence (e.g. Emacs), however, you may find the application doesn't receive this special character. When using screen, you need to type `ctrl-a a` (ctrl-a followed by a single "a" key press) to send a ctrl-a to your application. Screen uses ctrl-a as its own command character, so this special sequence issues the command to screen to "send ctrl-a to my app". Learn more about [https://www.gnu.org/software/screen/manual/html_node/Overview.html#Overview screen from its documentation].<br />
<br />
== Storage ==<br />
{{SensitiveInformation}}<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or anywhere else). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup data.<br />
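One way to manage your own backups is to periodically pull important results to a machine you control. A minimal sketch using rsync (the local destination path and the project directory name are hypothetical):<br />
<pre><br />
# Run this from your own workstation, not from Cheaha.<br />
# Copies the contents of a results directory on Cheaha to a local backup folder.<br />
rsync -av BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/myproject/results/ ~/cheaha-backup/myproject/<br />
</pre><br />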
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non scratch data.<br />
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a GPFS high performance file system providing roughly 4.7PB of usable storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network-accessible scratch within your job. For example, step 1 in the job could be to copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 could execute the code, and step 3 could move the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
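For example, a short job-script fragment that builds its working directory from the environment variable (the ''myproject'' directory name is hypothetical):<br />
<pre><br />
# Preferred: use the environment variable<br />
mkdir -p $USER_SCRATCH/myproject<br />
cd $USER_SCRATCH/myproject<br />
<br />
# Avoid hard coding the equivalent path, e.g. /data/scratch/$USER/myproject<br />
</pre><br />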
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
The $LOCAL_SCRATCH is a special temporary directory and it is important to note that this directory is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or another location prior to the job exiting.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)"<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "Copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)"<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)"<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)"<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored called $SHARE_PROJECT. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_PROJECT.<br />
<br />
=== Uploading Data ===<br />
{{SensitiveInformation}}<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and depend on how the data is made available by the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_PROJECT rather than duplicating the data for each user.<br />
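A couple of hypothetical examples (the file names and URL are placeholders), one pushed from your desktop and one pulled from a cluster login session:<br />
<pre><br />
# Push a data set from your desktop to your scratch directory on Cheaha<br />
scp bigdata.tar.gz BLAZERID@cheaha.rc.uab.edu:/data/scratch/BLAZERID/<br />
<br />
# Or, once logged into Cheaha, pull a file from a provider's URL straight into scratch<br />
cd $USER_SCRATCH<br />
wget http://example.org/datasets/bigdata.tar.gz<br />
</pre><br />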
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allows you to dynamically configure your environment without having to log out and log back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
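For example (a short sketch; the versions shown are taken from the listing above):<br />
<pre><br />
# Loads whichever GCC version is the default (the latest, per the note above)<br />
module load GCC<br />
<br />
# Loads a specific, pinned version<br />
module load GCC/4.8.4<br />
</pre><br />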
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` to your job script before any module commands. The -V job argument will cause a conflict with the module function used in your script.<br />
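A minimal sketch of where the workaround sits in a job script (the module shown is just one from the listing above; substitute your own):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=express<br />
#SBATCH --ntasks=1<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
<br />
# Work around the exported-function error before any module commands<br />
unset module<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
# ... rest of your job commands ...<br />
</pre><br />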
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts. Please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set the --time option to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate job submission, with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, i.e. it doesn't depend on the data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution for processing lots of data when each command works on a single piece of data, for example running the same conversion on hundreds of images.<br />
<br />
Here we show how to create a job script for one simple command. Running more than one command just requires submitting more jobs.<br />
<br />
* Create your hello world application. Run this command to create a script, turn it into a command, and run the command (just copy and paste the following onto the command line).<br />
<pre><br />
cat > helloworld.sh << EOF<br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
EOF<br />
chmod +x helloworld.sh<br />
./helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
./helloworld.sh<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating processes.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also shows how to compile the code and submit the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this example of MPI-enabled Hello World includes a 3 minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h><br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
  int rank, size;<br />
  int i, j;<br />
  float f;<br />
<br />
  MPI_Init(&argc, &argv);<br />
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
  MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
  printf("Hello World from process %d of %d.\n", rank, size);<br />
  /* sleep and busy-work keep the job alive long enough to watch it in the queue */<br />
  sleep(180);<br />
  for (j=0; j<=100000; j++)<br />
    for (i=0; i<=100000; i++)<br />
      f = i*2.718281828*i + i + i*3.141592654;<br />
<br />
  MPI_Finalize();<br />
  return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 cpu slots and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command): <br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command with the simple prefix "./helloworld", which means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
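For example, a quick sketch of adding the current directory to your PATH for the current shell session (add the export line to your ~/.bash_profile if you want it to persist; that file name is an assumption about your shell setup):<br />
<pre><br />
# Append the current directory to the search path for this session<br />
export PATH=$PATH:.<br />
<br />
# Now the wrapper can be run without the ./ prefix<br />
helloworld<br />
</pre><br />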
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=short<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we need 10 hours<br />
#<br />
#SBATCH --time=10:00:00<br />
#SBATCH --mem-per-cpu=2048<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
mpirun -np $SLURM_NTASKS $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that will use an array of 10 tasks (--array=1-10); each task has a max runtime of 10 minutes and will use no more than 256 MB of RAM per task.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
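The script above expects per-task working directories named rep1 through rep10, each containing an rscript.R file. A minimal, hypothetical way to create them before submitting (the R one-liner is just a placeholder for your own analysis script):<br />
<pre><br />
cd ~/jobs/ArrayExample<br />
for i in $(seq 1 10); do<br />
    mkdir -p rep$i<br />
    echo 'print(paste("Hello from replicate", Sys.getenv("SLURM_ARRAY_TASK_ID")))' > rep$i/rscript.R<br />
done<br />
</pre><br />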
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Galaxy_File_Uploads&diff=5732Galaxy File Uploads2018-05-03T15:25:26Z<p>Tanthony@uab.edu: added Sensitiveinformation Template</p>
<hr />
<div>{{SensitiveInformation}}<br />
[https://galaxy.uabgrid.uab.edu UAB Galaxy] supports data import in three ways:<br />
<br />
{| border="1"<br />
|+ <br />
! Method !! Limitation <br />
|-<br />
| Direct file uploads using a web browser<br />
| only files < 2G<br />
|-<br />
| Fetching data from external URLs through Galaxy (ftp/http)<br />
| can't access some password protected sites, such as the HudsonAlpha GSL<br />
|-<br />
| Importing files via the Cheaha file system<br />
| requires an [[Cheaha_GettingStarted#Access|account]] on Cheaha, but the command line can be avoided<br />
|-<br />
|}<br />
<br />
==Direct file uploads using a web browser==<br />
Web browser based file upload is a convenient approach, but it is not recommended for files larger than 2 GB in size because of browser limitations. Also, web browser based upload in Galaxy doesn't provide any feedback on upload progress and it can be an unreliable operation. Hence, it's recommended to stage data on a Galaxy-accessible file system and then import it into Galaxy.<br />
<br />
==Importing files via the Cheaha file system==<br />
The UAB Galaxy instance is configured to look for files in the '/scratch/importfs/galaxy/$USER' and '/scratch/user/$USER' directories on Cheaha. Data files can be copied to Cheaha using [[Wikipedia:Secure_copy|scp]] or they can be downloaded using tools like wget, curl or ftp. A nice Windows-friendly drag-and-drop tool is [http://winscp.net/eng/download.php#download2 WinSCP]. Please refer to the [[Cheaha_GettingStarted#Access]] page for getting access to Cheaha.<br />
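For example, a hypothetical scp command (run from your desktop; the file name is a placeholder) that drops a file where Galaxy's importfs mode will find it:<br />
<pre><br />
# Copy a data file into the Galaxy import directory on Cheaha<br />
scp mysample.fastq.gz BLAZERID@cheaha.rc.uab.edu:/scratch/importfs/galaxy/BLAZERID/<br />
</pre><br />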
<br />
The following sections provide an overview of the UAB Galaxy import methods. <br />
<br />
# importfs or file drop-off mode: The UAB Galaxy platform is configured to import files from the $GALAXY_IMPORTFS directory on Cheaha (/scratch/importfs/galaxy/$USER). The Galaxy application 'moves' files from the import directory to its internal datasets directory. See the [[Galaxy_Importfs]] page for more details on this upload method.<br />
# Data Library: Galaxy has a concept of 'Data Libraries', which are data containers that organize files in a hierarchical manner, similar to directories on a desktop. Data libraries provide other features for data organization and sharing as well. Data libraries support file uploads using a web browser, fetching from external URLs, and also copying existing directories in a file system. The file-system copy is similar to the importfs option described above; however, it copies files to the internal datasets directory rather than moving them. The UAB Galaxy platform is configured to copy files from the $USER_SCRATCH (/scratch/user/$USER) directory. See the [[Galaxy_Data_Libraries]] page for more details on data libraries.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha_GettingStarted&diff=5731Cheaha GettingStarted2018-05-03T15:22:58Z<p>Tanthony@uab.edu: /* Storage */ added Sensitiveinformation Template</p>
<hr />
<div>Cheaha is a cluster computing environment for UAB researchers. Information about the history and future plans for Cheaha is available on the [[Cheaha]] page.<br />
<br />
== Access (Cluster Account Request) ==<br />
<br />
To request an account on [[Cheaha]], please {{CheahaAccountRequest}}. Please include some background information about the work you plan on doing on the cluster and the group you work with, i.e. your lab or affiliation.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
The official DNS name of Cheaha's frontend machine is ''cheaha.rc.uab.edu''. If you want to refer to the machine as ''cheaha'', you'll have to either add "rc.uab.edu" to your computer's DNS search path. On Unix-derived systems (Linux, Mac) you can edit your computer's /etc/resolv.conf as follows (you'll need administrator access to edit this file)<br />
<pre><br />
search rc.uab.edu<br />
</pre><br />
Or you can customize your SSH configuration to use the short name "cheaha" as a connection name. On systems using OpenSSH you can add the following to your ~/.ssh/config file<br />
<br />
<pre><br />
Host cheaha<br />
Hostname cheaha.rc.uab.edu<br />
</pre><br />
<br />
== Login ==<br />
===Overview===<br />
Once your account has been created, you'll receive an email containing your user ID, generally your Blazer ID. Logging into Cheaha requires an SSH client. Most UAB Windows workstations already have an SSH client installed, possibly named '''SSH Secure Shell Client''' or [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY]. Linux and Mac OS X systems should have an SSH client installed by default.<br />
<br />
Usage of Cheaha is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources.<br />
<br />
===Client Configuration===<br />
This section will cover steps to configure Windows, Linux and Mac OS X clients to connect to Cheaha.<br />
====Linux====<br />
Linux systems, regardless of the flavor (RedHat, SuSE, Ubuntu, etc...), should already have an SSH client on the system as part of the default install.<br />
# Start a terminal (on RedHat click Applications -> Accessories -> Terminal, on Ubuntu Ctrl+Alt+T)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Mac OS X====<br />
Mac OS X is a Unix operating system (BSD) and has a built in ssh client.<br />
# Start a terminal (click Finder, type Terminal and double click on Terminal under the Applications category)<br />
# At the prompt, enter the following command to connect to Cheaha ('''Replace blazerid with your Cheaha userid''')<br />
ssh '''blazerid'''@cheaha.rc.uab.edu<br />
<br />
====Windows====<br />
There are many SSH clients available for Windows, some commercial and some that are free (GPL). This section will cover two clients that are commonly found on UAB Windows systems.<br />
=====MobaXterm=====<br />
[http://mobaxterm.mobatek.net/ MobaXterm] is a free (also available for a price in a Professional version) suite of SSH tools. Of the Windows clients we've used, MobaXterm is the easiest to use and most feature complete. [http://mobaxterm.mobatek.net/features.html Features] include (but are not limited to):<br />
* SSH client (in a handy web browser like tabbed interface)<br />
* Embedded Cygwin (which allows Windows users to run many Linux commands like grep, rsync, sed)<br />
* Remote file system browser (graphical SFTP)<br />
* X11 forwarding for remotely displaying graphical content from Cheaha<br />
* Installs without requiring Windows Administrator rights<br />
<br />
Start MobaXterm and click the Session toolbar button (top left). Click SSH for the session type, enter the following information and click OK. Once finished, double click cheaha.rc.uab.edu in the list of Saved sessions under PuTTY sessions:<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Remote host'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|}<br />
<br />
=====PuTTY=====<br />
[http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] is a free suite of SSH and telnet tools written and maintained by [http://www.pobox.com/~anakin/ Simon Tatham]. PuTTY supports SSH, secure FTP (SFTP), and X forwarding (XTERM) among other tools.<br />
<br />
* Start PuTTY (Click START -> All Programs -> PuTTY -> PuTTY). The 'PuTTY Configuration' window will open<br />
* Use these settings for each of the clusters that you would like to configure<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host Name (or IP address)'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Saved Sessions'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|}<br />
* Click '''Save''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start PuTTY, simply double click on the cluster name under the 'Saved Sessions' list<br />
<br />
=====SSH Secure Shell Client=====<br />
SSH Secure Shell is a commercial application that is installed on many Windows workstations on campus and can be configured as follows:<br />
* Start the program (Click START -> All Programs -> SSH Secure Shell -> Secure Shell Client). The 'default - SSH Secure Shell' window will open<br />
* Click File -> Profiles -> Add Profile to open the 'Add Profile' window<br />
* Type in the name of the cluster (for example: cheaha) in the field and click 'Add to Profiles'<br />
* Click File -> Profiles -> Edit Profiles to open the 'Profiles' window<br />
* Single click on your new profile name<br />
* Use these settings for the clusters<br />
{| border="1" cellpadding="5"<br />
!Field<br />
!Cheaha Settings<br />
|-<br />
|'''Host name'''<br />
|cheaha.rc.uab.edu<br />
|-<br />
|'''User name'''<br />
|blazerid (insert your blazerid here)<br />
|-<br />
|'''Port'''<br />
|22<br />
|-<br />
|'''Protocol'''<br />
|SSH<br />
|-<br />
|'''Encryption algorithm'''<br />
|<Default><br />
|-<br />
|'''MAC algorithm'''<br />
|<Default><br />
|-<br />
|'''Compression'''<br />
|<None><br />
|-<br />
|'''Terminal answerback'''<br />
|vt100<br />
|-<br />
|}<br />
* Leave 'Connect through firewall' and 'Request tunnels only' unchecked<br />
* Click '''OK''' to save the configuration, repeat the previous steps for the other clusters<br />
* The next time you start SSH Secure Shell, click 'Profiles' and click the cluster name<br />
<br />
=== Logging in to Cheaha ===<br />
No matter which client you use to connect to Cheaha, the first time you connect, the SSH client should display a message asking if you would like to import the host's public key. Answer '''Yes''' to this question.<br />
<br />
* Connect to Cheaha using one of the methods listed above<br />
* Answer '''Yes''' to import the cluster's public key<br />
** Enter your BlazerID password<br />
<br />
* After successfully logging in for the first time, you may see the following message. '''Just press ENTER for the next three prompts, don't type any passphrases!'''<br />
<br />
It doesn't appear that you have set up your ssh key.<br />
This process will make the files:<br />
/home/joeuser/.ssh/id_rsa.pub<br />
/home/joeuser/.ssh/id_rsa<br />
/home/joeuser/.ssh/authorized_keys<br />
<br />
Generating public/private rsa key pair.<br />
Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):<br />
** Enter file in which to save the key (/home/joeuser/.ssh/id_rsa):'''Press Enter'''<br />
** Enter passphrase (empty for no passphrase):'''Press Enter'''<br />
** Enter same passphrase again:'''Press Enter'''<br />
Your identification has been saved in /home/joeuser/.ssh/id_rsa.<br />
Your public key has been saved in /home/joeuser/.ssh/id_rsa.pub.<br />
The key fingerprint is:<br />
f6:xx:xx:xx:xx:dd:9a:79:7b:83:xx:f9:d7:a7:d6:27 joeuser@cheaha.rc.uab.edu<br />
<br />
==== Users without a blazerid (collaborators from other universities) ====<br />
** If you were issued a temporary password, enter it (Passwords are CaSE SensitivE!!!) You should see a message similar to this<br />
You are required to change your password immediately (password aged)<br />
<br />
WARNING: Your password has expired.<br />
You must change your password now and login again!<br />
Changing password for user joeuser.<br />
Changing password for joeuser<br />
(current) UNIX password:<br />
*** (current) UNIX password: '''Enter your temporary password at this prompt and press enter'''<br />
*** New UNIX password: '''Enter your new strong password and press enter'''<br />
*** Retype new UNIX password: '''Enter your new strong password again and press enter'''<br />
*** After you enter your new password for the second time and press enter, the shell may exit automatically. If it doesn't, type exit and press enter<br />
*** Log in again, this time use your new password<br />
<br />
Congratulations, you should now have a command prompt and be ready to start [[Cheaha_GettingStarted#Example_Batch_Job_Script | submitting jobs]]!!!<br />
<br />
== Hardware ==<br />
[[Image:Chehah2_2016.png|center|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
The Cheaha Compute Platform includes commodity compute hardware, totaling 2800 compute cores and over 4.7PB of usable storage (6.6PB raw capacity). The following descriptions highlight the current hardware profile that provides an aggregate theoretical peak performance of 468 teraflops.<br />
<br />
* Compute <br />
** 36 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Nvidia Tesla K80 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 4 Compute Nodes with Intel Phi coprocessor SE10/7120 and two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 18 Compute Nodes with two 14 core processors (Intel Xeon E5-2680 v4 2.4GHz)with 256GB DDR4 RAM, four NVIDIA Tesla P100 16GB GPUs, EDR InfiniBand and 10GigE network cards<br />
<br />
* Networking<br />
**FDR and EDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
<br />
* Storage -- DDN SFA12KX with GPFS) <br />
** 2 x 12KX40D-56IB controllers<br />
** 10 x SS8460 disk enclosures<br />
** 825 x 4K SAS drives<br />
<br />
* Management <br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses<br />
<br />
== Cluster Software ==<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
== Queuing System ==<br />
All work on Cheaha must be submitted to '''our queuing system ([[Slurm]])'''. A common mistake made by new users is to run 'jobs' on the login node. This section gives a basic overview of what a queuing system is and why we use it.<br />
=== What is a queuing system? ===<br />
* Software that gives users fair allocation of the cluster's resources<br />
* Schedules jobs based on resource requests (the following are commonly requested resources; there are many more available)<br />
** Number of processors (often referred to as "slots")<br />
** Maximum memory (RAM) required per slot<br />
** Maximum run time<br />
* Common queuing systems:<br />
** '''[[Slurm]]'''<br />
** Sun Grid Engine (also known as SGE, OGE, GE)<br />
** OpenPBS<br />
** Torque<br />
** LSF (load sharing facility)<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine ([https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted_deprecated SGE]), the job manager used earlier. Instructions for using Slurm and writing Slurm scripts for job submission on Cheaha can be found '''[[Slurm | here]]'''.<br />
<br />
=== Typical Workflow ===<br />
* Stage data to $USER_SCRATCH (your scratch directory)<br />
* Research how to run your code in "batch" mode. Batch mode typically means the ability to run it from the command line without requiring any interaction from the user.<br />
* Identify the appropriate resources needed to run the job. The following are mandatory resource requests for all jobs on Cheaha<br />
** Maximum memory (RAM) required per slot<br />
** Maximum runtime<br />
* Write a job script specifying queuing system parameters, resource requests and commands to run program<br />
* Submit script to queuing system (sbatch script.job)<br />
* Monitor job (squeue)<br />
* Review the results and resubmit as necessary<br />
* Clean up the scratch directory by moving or deleting the data off of the cluster<br />
<br />
=== Resource Requests ===<br />
Accurate resource requests are extremely important to the health of the overall cluster. In order for Cheaha to operate properly, the queuing system must know how much runtime and RAM each job will need.<br />
<br />
==== Mandatory Resource Requests ====<br />
<br />
* -t, --time=<time><br />
Set a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).<br />
** For array jobs, this represents the maximum run time for each task<br />
** For serial or parallel jobs, this represents the maximum run time for the entire job<br />
<br />
* --mem-per-cpu=<MB><br />
Minimum memory required per allocated CPU in MegaBytes.<br />
<br />
==== Other Common Resource Requests ====<br />
* -N, --nodes=<minnodes[-maxnodes]><br />
Request that a minimum of minnodes nodes be allocated to this job. A maximum node count may also be specified with maxnodes. If only one number is specified, this is used as both the minimum and maximum node count.<br />
<br />
* -n, --ntasks=<number><br />
sbatch does not launch tasks, it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node<br />
<br />
* --mem=<MB><br />
Specify the real memory required per node in MegaBytes.<br />
<br />
* -c, --cpus-per-task=<ncpus><br />
Advise the Slurm controller that ensuing job steps will require ncpus number of processors per task. Without this option, the controller will just try to allocate one processor per task.<br />
<br />
* -p, --partition=<partition_names><br />
Request a specific partition for the resource allocation. Available partitions are: express(max 2 hrs), short(max 12 hrs), medium(max 50 hrs), long(max 150 hrs), sinteractive(0-2 hrs)<br />
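These options can be given on the sbatch command line as well as in #SBATCH directives inside the script. A small sketch (''myscript.sh'' is a placeholder for your own job script):<br />
<pre><br />
# Request a 2 hour limit, 2 GB of RAM per CPU, and the short partition at submit time<br />
sbatch --time=02:00:00 --mem-per-cpu=2048 --partition=short myscript.sh<br />
</pre><br />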
<br />
=== Submitting Jobs ===<br />
Batch jobs are submitted on Cheaha by using the "sbatch" command. The full manual for sbatch is available by running the following command<br />
man sbatch<br />
<br />
==== Job Script File Format ====<br />
To submit a job to the queuing systems, you will first define your job in a script (a text file) and then submit that script to the queuing system.<br />
<br />
The script file needs to be '''formatted as a UNIX file''', not a Windows or Mac text file. In geek speak, this means that the end of line (EOL) character should be a line feed (LF) rather than a carriage return line feed (CRLF) for Windows or carriage return (CR) for Mac.<br />
<br />
If you submit a job script formatted as a Windows or Mac text file, your job will likely fail with misleading messages, for example that the path specified does not exist.<br />
<br />
Windows '''Notepad''' does not have the ability to save files using the UNIX file format. Do NOT use Notepad to create files intended for use on the clusters. Instead use one of the alternative text editors listed in the following section.<br />
<br />
===== Converting Files to UNIX Format =====<br />
====== Dos2Unix Method ======<br />
The lines below that begin with $ are commands; the $ represents the command prompt and should not be typed.<br />
<br />
The dos2unix program can be used to convert Windows text files to UNIX files with a simple command. After you have copied the file to your home directory on the cluster, you can identify that the file is a Windows file by executing the following (Windows uses CR LF as the line terminator, where UNIX uses only LF and Mac uses only CR):<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text, with CRLF line terminators<br />
</pre><br />
<br />
Now, convert the file to UNIX<br />
<pre><br />
$ dos2unix testfile.txt<br />
<br />
dos2unix: converting file testfile.txt to UNIX format ...<br />
</pre><br />
<br />
Verify the conversion using the file command<br />
<pre><br />
$ file testfile.txt<br />
<br />
testfile.txt: ASCII text<br />
</pre><br />
<br />
====== Alternative Windows Text Editors ======<br />
There are many good text editors available for Windows that have the capability to save files using the UNIX file format. Here are a few:<br />
* [http://www.geany.org/ Geany] is an excellent free text editor for Windows and Linux that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX, click '''Document''', then '''Set Line Endings''', and then '''Convert and Set to LF (Unix)'''<br />
* [http://notepad-plus.sourceforge.net/uk/site.htm Notepad++] is a great free Windows text editor that supports Windows, UNIX and Mac file formats, syntax highlighting and many programming features. To convert from Windows to UNIX, click '''Format''' and then '''Convert to UNIX Format'''<br />
* [http://www.textpad.com/ TextPad] is another excellent Windows text editor. TextPad is not free, however.<br />
<br />
==== Example Batch Job Script ====<br />
A shared cluster environment like Cheaha uses a job scheduler to run tasks on the cluster and provide optimal resource sharing among users. Cheaha uses a job scheduling system called Slurm to schedule and manage jobs. A user needs to tell Slurm about resource requirements (e.g. CPU, memory) so that it can schedule jobs effectively. These resource requirements, along with the actual application commands, are specified in a single file commonly referred to as a 'job script'. The following is a simple job script that prints the hostname of the compute node the job runs on.<br />
<br />
'''Note:''' Jobs '''must request''' the appropriate partition (ex: ''--partition=short'') to satisfy the job's resource requests (maximum runtime, number of compute nodes, etc...)<br />
<pre><br />
#!/bin/bash<br />
#<br />
#SBATCH --job-name=test<br />
#SBATCH --output=res.txt<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=10:00<br />
#SBATCH --mem-per-cpu=100<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
srun hostname<br />
srun sleep 60<br />
</pre><br />
<br />
Lines starting with '#SBATCH' have a special meaning in the Slurm world. Slurm-specific configuration options are specified after the '#SBATCH' characters. The configuration options above are useful for most job scripts; for additional configuration options refer to the Slurm commands manual. A job script is submitted to the cluster using Slurm-specific commands. There are many commands available, but the following three are the most common:<br />
* sbatch - to submit job<br />
* scancel - to delete job<br />
* squeue - to view job status<br />
<br />
We can submit the above job script using the sbatch command:<br />
<pre><br />
$ sbatch HelloCheaha.sh<br />
Submitted batch job 52707<br />
</pre><br />
<br />
When the job script is submitted, Slurm queues it up and assigns it a job number (e.g. 52707 in the above example). The job number is available inside the job script via the environment variable $SLURM_JOB_ID. This variable can be used inside the job script to create job-related directory structures or file names.<br />
<br />
=== Interactive Resources ===<br />
The login node (the host that you connected to when you set up the SSH connection to Cheaha) is intended for submitting jobs and/or lighter prep work required for your job scripts. '''Do not run heavy computations on the login node'''. If you have a heavier workload to prepare for a batch job (e.g. compiling code or other manipulations of data), or your compute application requires interactive control, you should request a dedicated interactive node for this work.<br />
<br />
Interactive resources are requested by submitting an "interactive" job to the scheduler. Interactive jobs will provide you a command line on a compute resource that you can use just like you would the command line on the login node. The difference is that the scheduler has dedicated the requested resources to your job and you can run your interactive commands without having to worry about impacting other users on the login node.<br />
<br />
Interactive jobs that run on the command line are requested with the '''srun''' command. <br />
<br />
<pre><br />
srun --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME --pty /bin/bash<br />
</pre><br />
<br />
This command requests 4 cores (--cpus-per-task) for a single task (--ntasks), with 4GB of RAM per CPU (--mem-per-cpu), for 8 hours (--time).<br />
<br />
More advanced interactive scenarios to support graphical applications are available using [https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session VNC] or X11 tunneling (e.g. [http://www.uab.edu/it/software X-Win32 2014 for Windows]).<br />
<br />
Interactive jobs that require running a graphical application are requested with the '''sinteractive''' command, via a '''Terminal''' in your VNC session.<br />
<pre><br />
sinteractive --ntasks=1 --cpus-per-task=4 --mem-per-cpu=4096 --time=08:00:00 --partition=medium --job-name=JOB_NAME<br />
</pre><br />
Please note that sinteractive starts your shell in a screen session. Screen is a terminal multiplexer that is designed to make it possible to detach and reattach a session. This feature can mostly be ignored. If your application uses `ctrl-a` as a special command sequence (e.g. Emacs), however, you may find the application doesn't receive this special character. When using screen, you need to type `ctrl-a a` (ctrl-a followed by a single "a" key press) to send a ctrl-a to your application. Screen uses ctrl-a as its own command character, so this special sequence issues the command to screen to "send ctrl-a to my app". Learn more about [https://www.gnu.org/software/screen/manual/html_node/Overview.html#Overview screen from its documentation].<br />
<br />
== Storage ==<br />
{{SensitiveInformation}}<br />
=== No Automatic Backups ===<br />
<br />
There is no automatic backup of any data on the cluster (home, scratch, or otherwise). All data backup is managed by you. If you aren't managing a data backup process, then you have no backup.<br />
<br />
=== Home directories ===<br />
<br />
Your home directory on Cheaha is NFS-mounted to the compute nodes as /home/$USER or $HOME. It is acceptable to use your home directory as a location to store job scripts, custom code, and libraries. You are responsible for keeping your home directory under 10GB in size!<br />
<br />
'''The home directory must not be used to store large amounts of data.''' Please use $USER_SCRATCH <br />
for actively used data sets and $USER_DATA for storage of non-scratch data.<br />
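<br />
A quick way to check how much space your home directory and scratch area are currently using:<br />
<pre><br />
du -sh $HOME<br />
du -sh $USER_SCRATCH<br />
</pre><br />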
<br />
=== Scratch ===<br />
Research Computing policy requires that all bulky input and output must be located on the scratch space. The home directory is intended to store your job scripts, log files, libraries and other supporting files.<br />
<br />
'''Important Information:'''<br />
* Scratch space (network and local) '''is not backed up'''.<br />
* Research Computing expects each user to keep their scratch areas clean. The cluster scratch areas are not to be used for archiving data.<br />
<br />
Cheaha has two types of scratch space, network mounted and local.<br />
* Network scratch ($USER_SCRATCH) is available on the login node and each compute node. This storage is a high-performance GPFS file system providing roughly 4.7PB of usable storage. This should be your job's primary working directory, unless the job would benefit from local scratch (see below).<br />
* Local scratch is physically located on each compute node and is not accessible to the other nodes (including the login node). This space is useful if the job performs a lot of file I/O. Most of the jobs that run on our clusters do not fall into this category. Because the local scratch is inaccessible outside the job, it is important to note that you must move any data between local scratch and your network-accessible scratch within your job. For example, step 1 in the job could be to copy the input from $USER_SCRATCH to $LOCAL_SCRATCH, step 2 run the code, and step 3 move the results back to $USER_SCRATCH.<br />
<br />
==== Network Scratch ====<br />
Network scratch is available using the environment variable $USER_SCRATCH or directly by /data/scratch/$USER<br />
<br />
It is advisable to use the environment variable whenever possible rather than the hard coded path.<br />
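<br />
For example, to set up a working area for a project entirely through the variable (the project name is just a placeholder):<br />
<pre><br />
mkdir -p $USER_SCRATCH/myproject<br />
cd $USER_SCRATCH/myproject<br />
</pre><br />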
<br />
==== Local Scratch ====<br />
Each compute node has a local scratch directory that is accessible via the variable '''$LOCAL_SCRATCH'''. If your job performs a lot of file I/O, the job should use $LOCAL_SCRATCH rather than $USER_SCRATCH to prevent bogging down the network scratch file system. The amount of scratch space available is approximately 800GB.<br />
<br />
The $LOCAL_SCRATCH is a special temporary directory and it's important to note that this directory is deleted when the job completes, so the job script has to move the results to $USER_SCRATCH or another location prior to the job exiting.<br />
<br />
Note that $LOCAL_SCRATCH is only useful for jobs in which all processes run on the same compute node, so MPI jobs are not candidates for this solution.<br />
<br />
The following is an array job example that uses $LOCAL_SCRATCH by transferring the inputs into $LOCAL_SCRATCH at the beginning of the script and the result out of $LOCAL_SCRATCH at the end of the script.<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes and the appropriate partition<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
<br />
echo "TMPDIR: $LOCAL_SCRATCH"<br />
<br />
cd $LOCAL_SCRATCH<br />
# Create a working directory under the special scheduler local scratch directory<br />
# using the array job's taskID<br />
mkdir -p $SLURM_ARRAY_TASK_ID<br />
cd $SLURM_ARRAY_TASK_ID<br />
<br />
# Next copy the input data to the local scratch<br />
echo "Copying input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
# The input data in this case has a numerical file extension that<br />
# matches $SLURM_ARRAY_TASK_ID<br />
cp -a $USER_SCRATCH/GeneData/INP*.$SLURM_ARRAY_TASK_ID ./<br />
echo "copied input data from network scratch to $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID - $(date)<br />
<br />
someapp -S 1 -D 10 -i INP*.$SLURM_ARRAY_TASK_ID -o geneapp.out.$SLURM_ARRAY_TASK_ID<br />
<br />
# Lastly copy the results back to network scratch<br />
echo "Copying results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
cp -a geneapp.out.$SLURM_ARRAY_TASK_ID $USER_SCRATCH/GeneData/<br />
echo "Copied results from local $LOCAL_SCRATCH/$SLURM_ARRAY_TASK_ID to network - $(date)<br />
<br />
</pre><br />
<br />
=== Project Storage ===<br />
Cheaha has a location where shared data can be stored called $SHARE_PROJECT. As with user scratch, this area '''is not backed up'''!<br />
<br />
This is helpful if a team of researchers must access the same data. Please open a help desk ticket to request a project directory under $SHARE_PROJECT.<br />
<br />
=== Uploading Data ===<br />
<br />
Data can be moved onto the cluster (pushed) from a remote client (i.e. your desktop) via SCP or SFTP. Data can also be downloaded to the cluster (pulled) by issuing transfer commands once you are logged into the cluster. Common transfer methods are `wget <URL>`, FTP, or SCP, and the choice depends on how the data is made available by the data provider.<br />
<br />
Large data sets should be staged directly to your $USER_SCRATCH directory so as not to fill up $HOME. If you are working on a data set shared with multiple users, it's preferable to request space in $SHARE_PROJECT rather than duplicating the data for each user.<br />
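<br />
For example, a push with scp from your desktop and a pull with wget from a login node might look like the following sketch (the login host name, file names, and URL are placeholders):<br />
<pre><br />
# From your desktop: push a data set into your network scratch directory<br />
scp mydata.tar.gz BLAZERID@CHEAHA_LOGIN_HOST:/data/scratch/BLAZERID/<br />
<br />
# From a Cheaha login node: pull a public data set directly into scratch<br />
cd $USER_SCRATCH<br />
wget https://example.org/public/dataset.tar.gz<br />
</pre><br />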
<br />
== Environment Modules ==<br />
[http://modules.sourceforge.net/ Environment Modules] is installed on Cheaha and should be used when constructing your job scripts if an applicable module file exists. Using the module command you can easily configure your environment for specific software packages without having to know the specific environment variables and values to set. Modules allow you to dynamically configure your environment without having to log out and back in for the changes to take effect.<br />
<br />
If you find that specific software does not have a module, please submit a [http://etlab.eng.uab.edu/ helpdesk ticket] to request the module.<br />
<br />
* Cheaha supports bash completion for the module command. For example, type 'module' and press the TAB key twice to see a list of options:<br />
<pre><br />
module TAB TAB<br />
<br />
add display initlist keyword refresh switch use <br />
apropos help initprepend list rm unload whatis <br />
avail initadd initrm load show unuse <br />
clear initclear initswitch purge swap update<br />
</pre><br />
<br />
* To see the list of available modulefiles on the cluster, run the '''module avail''' command (note the example list below may not be complete!) or '''module load ''' followed by two tab key presses:<br />
<pre><br />
module avail<br />
<br />
----------------------------------------------------------------------------------------- /cm/shared/modulefiles -----------------------------------------------------------------------------------------<br />
acml/gcc/64/5.3.1 acml/open64-int64/mp/fma4/5.3.1 fftw2/openmpi/gcc/64/float/2.1.5 intel-cluster-runtime/ia32/3.8 netcdf/gcc/64/4.3.3.1<br />
acml/gcc/fma4/5.3.1 blacs/openmpi/gcc/64/1.1patch03 fftw2/openmpi/open64/64/double/2.1.5 intel-cluster-runtime/intel64/3.8 netcdf/open64/64/4.3.3.1<br />
acml/gcc/mp/64/5.3.1 blacs/openmpi/open64/64/1.1patch03 fftw2/openmpi/open64/64/float/2.1.5 intel-cluster-runtime/mic/3.8 netperf/2.7.0<br />
acml/gcc/mp/fma4/5.3.1 blas/gcc/64/3.6.0 fftw3/openmpi/gcc/64/3.3.4 intel-tbb-oss/ia32/44_20160526oss open64/4.5.2.1<br />
acml/gcc-int64/64/5.3.1 blas/open64/64/3.6.0 fftw3/openmpi/open64/64/3.3.4 intel-tbb-oss/intel64/44_20160526oss openblas/dynamic/0.2.15<br />
acml/gcc-int64/fma4/5.3.1 bonnie++/1.97.1 gdb/7.9 iozone/3_434 openmpi/gcc/64/1.10.1<br />
acml/gcc-int64/mp/64/5.3.1 cmgui/7.2 globalarrays/openmpi/gcc/64/5.4 lapack/gcc/64/3.6.0 openmpi/open64/64/1.10.1<br />
acml/gcc-int64/mp/fma4/5.3.1 cuda75/blas/7.5.18 globalarrays/openmpi/open64/64/5.4 lapack/open64/64/3.6.0 pbspro/13.0.2.153173<br />
acml/open64/64/5.3.1 cuda75/fft/7.5.18 hdf5/1.6.10 mpich/ge/gcc/64/3.2 puppet/3.8.4<br />
acml/open64/fma4/5.3.1 cuda75/gdk/352.79 hdf5_18/1.8.16 mpich/ge/open64/64/3.2 rc-base<br />
acml/open64/mp/64/5.3.1 cuda75/nsight/7.5.18 hpl/2.1 mpiexec/0.84_432 scalapack/mvapich2/gcc/64/2.0.2<br />
acml/open64/mp/fma4/5.3.1 cuda75/profiler/7.5.18 hwloc/1.10.1 mvapich/gcc/64/1.2rc1 scalapack/openmpi/gcc/64/2.0.2<br />
acml/open64-int64/64/5.3.1 cuda75/toolkit/7.5.18 intel/compiler/32/15.0/2015.5.223 mvapich/open64/64/1.2rc1 sge/2011.11p1<br />
acml/open64-int64/fma4/5.3.1 default-environment intel/compiler/64/15.0/2015.5.223 mvapich2/gcc/64/2.2b slurm/15.08.6<br />
acml/open64-int64/mp/64/5.3.1 fftw2/openmpi/gcc/64/double/2.1.5 intel-cluster-checker/2.2.2 mvapich2/open64/64/2.2b torque/6.0.0.1<br />
<br />
---------------------------------------------------------------------------------------- /share/apps/modulefiles -----------------------------------------------------------------------------------------<br />
rc/BrainSuite/15b rc/freesurfer/freesurfer-5.3.0 rc/intel/compiler/64/ps_2016/2016.0.047 rc/matlab/R2015a rc/SAS/v9.4<br />
rc/cmg/2012.116.G rc/gromacs-intel/5.1.1 rc/Mathematica/10.3 rc/matlab/R2015b<br />
rc/dsistudio/dsistudio-20151020 rc/gtool/0.7.5 rc/matlab/R2012a rc/MRIConvert/2.0.8<br />
<br />
--------------------------------------------------------------------------------------- /share/apps/rc/modules/all ---------------------------------------------------------------------------------------<br />
AFNI/linux_openmp_64-goolf-1.7.20-20160616 gperf/3.0.4-intel-2016a MVAPICH2/2.2b-GCC-4.9.3-2.25<br />
Amber/14-intel-2016a-AmberTools-15-patchlevel-13-13 grep/2.15-goolf-1.4.10 NASM/2.11.06-goolf-1.7.20<br />
annovar/2016Feb01-foss-2015b-Perl-5.22.1 GROMACS/5.0.5-intel-2015b-hybrid NASM/2.11.08-foss-2015b<br />
ant/1.9.6-Java-1.7.0_80 GSL/1.16-goolf-1.7.20 NASM/2.11.08-intel-2016a<br />
APBS/1.4-linux-static-x86_64 GSL/1.16-intel-2015b NASM/2.12.02-foss-2016a<br />
ASHS/rev103_20140612 GSL/2.1-foss-2015b NASM/2.12.02-intel-2015b<br />
Aspera-Connect/3.6.1 gtool/0.7.5_linux_x86_64 NASM/2.12.02-intel-2016a<br />
ATLAS/3.10.1-gompi-1.5.12-LAPACK-3.4.2 guile/1.8.8-GNU-4.9.3-2.25 ncurses/5.9-foss-2015b<br />
Autoconf/2.69-foss-2016a HAPGEN2/2.2.0 ncurses/5.9-GCC-4.8.4<br />
Autoconf/2.69-GCC-4.8.4 HarfBuzz/1.2.7-intel-2016a ncurses/5.9-GNU-4.9.3-2.25<br />
Autoconf/2.69-GNU-4.9.3-2.25 HDF5/1.8.15-patch1-intel-2015b ncurses/5.9-goolf-1.4.10<br />
. <br />
.<br />
.<br />
.<br />
</pre><br />
<br />
Some software packages have multiple module files, for example:<br />
* GCC/4.7.2 <br />
* GCC/4.8.1 <br />
* GCC/4.8.2 <br />
* GCC/4.8.4 <br />
* GCC/4.9.2 <br />
* GCC/4.9.3 <br />
* GCC/4.9.3-2.25 <br />
<br />
In this case, the GCC module will always load the latest version, so loading this module is equivalent to loading GCC/4.9.3-2.25. If you always want to use the latest version, use this approach. If you want to use a specific version, use the module file containing the appropriate version number.<br />
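<br />
For example:<br />
<pre><br />
# Load the default (most recent) GCC, equivalent to GCC/4.9.3-2.25 in the list above<br />
module load GCC<br />
<br />
# Load a specific older version instead<br />
module load GCC/4.8.4<br />
</pre><br />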
<br />
Some modules, when loaded, will actually load other modules. For example, the ''GROMACS/5.0.5-intel-2015b-hybrid '' module will also load ''intel/2015b'' and other related tools.<br />
<br />
* To load a module, ex: for a GROMACS job, use the following '''module load''' command in your job script:<br />
<pre><br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
</pre><br />
<br />
* To see a list of the modules that you currently have loaded use the '''module list''' command<br />
<pre><br />
module list<br />
<br />
Currently Loaded Modulefiles:<br />
1) slurm/15.08.6 9) impi/5.0.3.048-iccifort-2015.3.187-GNU-4.9.3-2.25 17) Tcl/8.6.3-intel-2015b<br />
2) rc-base 10) iimpi/7.3.5-GNU-4.9.3-2.25 18) SQLite/3.8.8.1-intel-2015b<br />
3) GCC/4.9.3-binutils-2.25 11) imkl/11.2.3.187-iimpi-7.3.5-GNU-4.9.3-2.25 19) Tk/8.6.3-intel-2015b-no-X11<br />
4) binutils/2.25-GCC-4.9.3-binutils-2.25 12) intel/2015b 20) Python/2.7.9-intel-2015b<br />
5) GNU/4.9.3-2.25 13) bzip2/1.0.6-intel-2015b 21) Boost/1.58.0-intel-2015b-Python-2.7.9<br />
6) icc/2015.3.187-GNU-4.9.3-2.25 14) zlib/1.2.8-intel-2015b 22) GROMACS/5.0.5-intel-2015b-hybrid<br />
7) ifort/2015.3.187-GNU-4.9.3-2.25 15) ncurses/5.9-intel-2015b<br />
8) iccifort/2015.3.187-GNU-4.9.3-2.25 16) libreadline/6.3-intel-2015b<br />
</pre><br />
<br />
* A module can be removed from your environment by using the '''module unload''' command:<br />
<pre><br />
module unload GROMACS/5.0.5-intel-2015b-hybrid<br />
</pre><br />
<br />
* The definition of a module can also be viewed using the '''module show''' command, revealing what a specific module will do to your environment:<br />
<pre><br />
module show GROMACS/5.0.5-intel-2015b-hybrid <br />
-------------------------------------------------------------------<br />
/share/apps/rc/modules/all/GROMACS/5.0.5-intel-2015b-hybrid:<br />
<br />
module-whatis GROMACS is a versatile package to perform molecular dynamics,<br />
i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. - Homepage: http://www.gromacs.org <br />
conflict GROMACS <br />
prepend-path CPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/include <br />
prepend-path LD_LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path LIBRARY_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64 <br />
prepend-path MANPATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/share/man <br />
prepend-path PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/bin <br />
prepend-path PKG_CONFIG_PATH /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/lib64/pkgconfig <br />
setenv EBROOTGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid <br />
setenv EBVERSIONGROMACS 5.0.5 <br />
setenv EBDEVELGROMACS /share/apps/rc/software/GROMACS/5.0.5-intel-2015b-hybrid/easybuild/GROMACS-5.0.5-intel-2015b-hybrid-easybuild-devel <br />
-------------------------------------------------------------------<br />
</pre><br />
<br />
=== Error Using Modules from a Job Script ===<br />
<br />
If you are using modules and the command your job executes runs fine from the command line but fails when you run it from the job, you may be having an issue with the script initialization. If you see this error in your job error output file<br />
<pre><br />
-bash: module: line 1: syntax error: unexpected end of file<br />
-bash: error importing function definition for `BASH_FUNC_module'<br />
</pre><br />
Add the command `unset module` before calling your module files. The -V job argument will cause a conflict with the module function used in your script.<br />
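<br />
A minimal sketch of where the workaround fits in a job script (the R module and script shown are just one example):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --ntasks=1<br />
#SBATCH --partition=express<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
<br />
# Clear the exported module function before loading any modules<br />
unset module<br />
<br />
module load R/3.2.0-goolf-1.7.20<br />
srun R CMD BATCH rscript.R<br />
</pre><br />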
<br />
== Sample Job Scripts ==<br />
The following are sample job scripts. Please be careful to edit these for your environment (i.e. replace <font color="red">YOUR_EMAIL_ADDRESS</font> with your real email address), set --time to an appropriate runtime limit, and modify the job name and any other parameters.<br />
<br />
'''Hello World''' is the classic example used throughout programming. We don't want to buck the system, so we'll use it as well to demonstrate job submission with one minor variation: our hello world will send us a greeting using the name of whatever machine it runs on. For example, when run on the Cheaha login node, it would print "Hello from login001".<br />
<br />
=== Hello World (serial) ===<br />
<br />
A serial job is one that can run independently of other commands, i.e. it doesn't depend on data from other jobs running simultaneously. You can run many serial jobs in any order. This is a common solution for processing lots of data when each command works on a single piece of data. For example, running the same conversion on 100s of images.<br />
<br />
Here we show how to create a job script for one simple command. Running more than one command just requires submitting more jobs (a bulk-submission sketch follows this example).<br />
<br />
* Create your hello world application. Run this command to create a script, turn it into a command, and run the command (just copy and paste the following onto the command line).<br />
<pre><br />
cat > helloworld.sh << EOF<br />
#!/bin/bash<br />
echo Hello from `hostname`<br />
EOF<br />
chmod +x helloworld.sh<br />
./helloworld.sh<br />
</pre><br />
<br />
* Create the Slurm job script that will request 256 MB RAM and a maximum runtime of 10 minutes.<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld.err<br />
#SBATCH --output=helloworld.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=$USER@uab.edu<br />
<br />
./helloworld.sh<br />
</pre><br />
* Submit the job to the Slurm scheduler and check the status using squeue<br />
<pre><br />
$ sbatch helloworld.job<br />
Submitted batch job 52888<br />
</pre><br />
* When the job completes, you should have output files named helloworld.out and helloworld.err <br />
<pre><br />
$ cat helloworld.out <br />
Hello from c0003<br />
</pre><br />
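<br />
As mentioned above, processing many data files just means submitting many of these serial jobs. One possible way to do that in bulk is with sbatch's --wrap option; the images directory and convert.sh script below are placeholders for your own data and command:<br />
<pre><br />
# Submit one serial job per input file<br />
for f in $USER_SCRATCH/images/*.png; do<br />
    sbatch --job-name=convert --partition=express --ntasks=1 \<br />
           --time=00:10:00 --mem-per-cpu=256 \<br />
           --wrap="./convert.sh $f"<br />
done<br />
</pre><br />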
<br />
=== Hello World (parallel with MPI) ===<br />
<br />
MPI is used to coordinate the activity of many computations occurring in parallel. It is commonly used in simulation software for molecular dynamics, fluid dynamics, and similar domains where there is significant communication (data) exchanged between cooperating processes.<br />
<br />
Here is a simple parallel Slurm job script for running commands that rely on MPI. This example also shows how to compile the code and submit the job script to the Slurm scheduler.<br />
<br />
* First, create a directory for the Hello World jobs<br />
<pre><br />
$ mkdir -p ~/jobs/helloworld<br />
$ cd ~/jobs/helloworld<br />
</pre><br />
* Create the Hello World code written in C (this MPI-enabled Hello World example includes a 3 minute sleep to ensure the job runs for several minutes; a normal hello world example would run in a matter of seconds).<br />
<pre><br />
$ vi helloworld-mpi.c<br />
</pre><br />
<pre><br />
#include <stdio.h><br />
#include <unistd.h>  /* for sleep() */<br />
#include <mpi.h><br />
<br />
int main(int argc, char **argv)<br />
{<br />
    int rank, size;<br />
<br />
    int i, j;<br />
    float f;<br />
<br />
    MPI_Init(&argc, &argv);<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br />
    MPI_Comm_size(MPI_COMM_WORLD, &size);<br />
<br />
    printf("Hello World from process %d of %d.\n", rank, size);<br />
    sleep(180);<br />
    /* Burn a little CPU so each rank stays busy for a while */<br />
    for (j = 0; j <= 100000; j++)<br />
        for (i = 0; i <= 100000; i++)<br />
            f = i * 2.718281828 * i + i + i * 3.141592654;<br />
<br />
    MPI_Finalize();<br />
    return 0;<br />
}<br />
</pre><br />
* Compile the code, first purging any modules you may have loaded followed by loading the module for OpenMPI GNU. The mpicc command will compile the code and produce a binary named helloworld_gnu_openmpi<br />
<pre><br />
$ module purge<br />
$ module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
$ mpicc helloworld-mpi.c -o helloworld_gnu_openmpi<br />
</pre><br />
* Create the Slurm job script that will request 8 tasks (CPU cores) and a maximum runtime of 10 minutes<br />
<pre><br />
$ vi helloworld.job<br />
</pre><br />
<pre><br />
#!/bin/bash<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=helloworld_mpi<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=helloworld_mpi.err<br />
#SBATCH --output=helloworld_mpi.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
mpirun -np $SLURM_NTASKS ./helloworld_gnu_openmpi<br />
</pre><br />
* Submit the job to the Slurm scheduler and check the status using squeue -u $USER<br />
<pre><br />
$ sbatch helloworld.job<br />
<br />
Submitted batch job 52893<br />
<br />
$ squeue -u BLAZERID<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
52893 express hellowor BLAZERID R 2:07 2 c[0005-0006]<br />
<br />
</pre><br />
* When the job completes, you should have output files named helloworld_mpi.out and helloworld_mpi.err<br />
<pre><br />
$ cat helloworld_mpi.out<br />
<br />
Hello World from process 1 of 8.<br />
Hello World from process 3 of 8.<br />
Hello World from process 4 of 8.<br />
Hello World from process 7 of 8.<br />
Hello World from process 5 of 8.<br />
Hello World from process 6 of 8.<br />
Hello World from process 0 of 8.<br />
Hello World from process 2 of 8.<br />
</pre><br />
<br />
=== Hello World (serial) -- revisited ===<br />
<br />
The job submit scripts (sbatch scripts) are actually bash shell scripts in their own right. The reason for using the funky #SBATCH prefix in the scripts is so that bash interprets any such line as a comment and won't execute it. Because the # character starts a comment in bash, we can weave the Slurm scheduler directives (the #SBATCH lines) into standard bash scripts. This lets us build scripts that we can execute locally and then easily run the same script on a cluster node by calling it with sbatch. This can be used to our advantage to create a more fluid experience in moving between development and production job runs. <br />
<br />
The following example is a simple variation on the serial job above. All we will do is convert our Slurm job script into a command called helloworld that calls the helloworld.sh command.<br />
<br />
If the first line of a file is #!/bin/bash and that file is executable, the shell will automatically run the command as if it were any other system command, e.g. ls. That is, the ".sh" extension on our helloworld.sh script is completely optional and is only meaningful to the user.<br />
<br />
Copy the serial helloworld.job script to a new file, add the special #!/bin/bash as the first line, and make it executable with the following command (note: those are single quotes in the echo command):<br />
<pre><br />
echo '#!/bin/bash' | cat - helloworld.job > helloworld ; chmod +x helloworld<br />
</pre><br />
<br />
Our sbatch script has now become a regular command. We can now execute the command with the simple prefix "./helloworld", which means "execute this file in the current directory":<br />
<pre><br />
./helloworld<br />
Hello from login001<br />
</pre><br />
Or if we want to run the command on a compute node, replace the "./" prefix with "sbatch ":<br />
<pre><br />
$ sbatch helloworld<br />
Submitted batch job 53001<br />
</pre><br />
And when the cluster run is complete you can look at the content of the output:<br />
<pre><br />
$ cat helloworld.out<br />
Hello from c0003<br />
</pre><br />
<br />
You can use this approach of treating your sbatch files as command wrappers to build a collection of commands that can be executed locally or via sbatch. The other examples can be restructured similarly.<br />
<br />
To avoid having to use the "./" prefix, just add the current directory to your PATH. Also, if you plan to do heavy development using this feature on the cluster, please be sure to run [https://docs.uabgrid.uab.edu/wiki/Slurm#Interactive_Session sinteractive] first so you don't load the login node with your development work.<br />
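<br />
For example, from the directory containing the helloworld command:<br />
<pre><br />
export PATH="$PWD:$PATH"<br />
helloworld<br />
</pre><br />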
<br />
=== Gromacs ===<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --partition=short<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=test_gromacs<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=test_gromacs.err<br />
#SBATCH --output=test_gromacs.out<br />
#SBATCH --ntasks=8<br />
#<br />
# Tell the scheduler we need up to 10 hours<br />
#<br />
#SBATCH --time=10:00:00<br />
#SBATCH --mem-per-cpu=2048<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load OpenMPI/1.8.8-GNU-4.9.3-2.25<br />
<br />
module load GROMACS/5.0.5-intel-2015b-hybrid <br />
<br />
# Change directory to the job working directory if not already there<br />
cd ${USER_SCRATCH}/jobs/gromacs<br />
<br />
# Single precision<br />
MDRUN=mdrun_mpi<br />
<br />
# Enter your tpr file over here<br />
export MYFILE=example.tpr<br />
<br />
mpirun -np $SLURM_NTASKS $MDRUN -v -s $MYFILE -o $MYFILE -c $MYFILE -x $MYFILE -e $MYFILE -g ${MYFILE}.log<br />
<br />
</pre><br />
<br />
=== R ===<br />
<br />
The following is an example job script that will run an array of 10 tasks (--array=1-10); each task has a max runtime of 10 minutes and will use no more than 256 MB of RAM per task.<br />
<br />
Create a working directory and the job submission script<br />
<pre><br />
$ mkdir -p ~/jobs/ArrayExample<br />
$ cd ~/jobs/ArrayExample<br />
$ vi R-example-array.job<br />
</pre><br />
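<br />
The job script below changes into a rep1 ... rep10 sub-directory for each array task, so those directories, each containing an rscript.R, need to exist before the job is submitted. A minimal sketch of that setup with a placeholder R script:<br />
<pre><br />
# Create one sub-directory per array task with a trivial R script;<br />
# replace the echo'd one-liner with your real analysis<br />
for i in $(seq 1 10); do<br />
    mkdir -p ~/jobs/ArrayExample/rep$i<br />
    echo 'print(paste("Hello from replicate", Sys.getenv("SLURM_ARRAY_TASK_ID")))' \<br />
        > ~/jobs/ArrayExample/rep$i/rscript.R<br />
done<br />
</pre><br />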
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --array=1-10<br />
#SBATCH --share<br />
#SBATCH --partition=express<br />
#<br />
# Name your job to make it easier for you to track<br />
#<br />
#SBATCH --job-name=R_array_job<br />
#<br />
# Set your error and output files<br />
#<br />
#SBATCH --error=R_array_job.err<br />
#SBATCH --output=R_array_job.out<br />
#SBATCH --ntasks=1<br />
#<br />
# Tell the scheduler we only need 10 minutes<br />
#<br />
#SBATCH --time=00:10:00<br />
#SBATCH --mem-per-cpu=256<br />
#<br />
# Set your email address and request notification when your job is complete or if it fails<br />
#<br />
#SBATCH --mail-type=FAIL<br />
#SBATCH --mail-user=YOUR_EMAIL_ADDRESS<br />
<br />
module load R/3.2.0-goolf-1.7.20 <br />
cd ~/jobs/ArrayExample/rep$SLURM_ARRAY_TASK_ID<br />
srun R CMD BATCH rscript.R<br />
</pre><br />
<br />
Submit the job to the Slurm scheduler and check the status of the job using the squeue command<br />
<pre><br />
$ sbatch R-example-array.job<br />
$ squeue -u $USER<br />
</pre><br />
<br />
== Installed Software ==<br />
<br />
A partial list of installed software with additional instructions for their use is available on the [[Cheaha Software]] page.</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Template:SensitiveInformation&diff=5730Template:SensitiveInformation2018-05-03T15:21:35Z<p>Tanthony@uab.edu: Created Template for sensitive information</p>
<hr />
<div><br />
'''<big>Do not store sensitive information on this filesystem.</big>''' <br />
'''It is not encrypted.'''<br />
Note that your data will be stored on the cluster filesystem, and while not<br />
accessible to ordinary users it could be accessible to the cluster administrator(s).</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=MATLAB&diff=5713MATLAB2018-01-17T18:00:45Z<p>Tanthony@uab.edu: </p>
<hr />
<div>{{Main_Banner}}<br />
<br />
<br />
<br />
[[Matlab_OSX_JAVA | '''Mac OSX USERS click here to view any outstanding issues with installing MATLAB on OSX''']]<br />
<br />
<br />
'''[[wikipedia:MATLAB|MATLAB]]''' ('''mat'''rix '''lab'''oratory) is a [[wikipedia:Numerical analysis|numerical computing]] environment and [[wikipedia:fourth-generation programming language|fourth-generation programming language]]. Developed by [[wikipedia:MathWorks|Mathworks]], MATLAB allows [[wikipedia:matrix (mathematics)|matrix]] manipulations, plotting of [[wikipedia:function (mathematics)|functions]] and data, implementation of [[wikipedia:algorithm|algorithm]]s, creation of [[wikipedia:user interface|user interface]]s, and interfacing with programs written in other languages, including [[wikipedia:C (programming language)|C]], [[wikipedia:C++|C++]], and [[wikipedia:Fortran|Fortran]]. An additional package, [[wikipedia:Simulink|Simulink]], adds graphical multi-domain simulation and [[wikipedia:Model based design|Model-Based Design]] for [[wikipedia:dynamical system|dynamic]] and [[wikipedia:embedded systems|embedded systems]].<br />
<br />
MATLAB can be used on personal computers and powerful server systems, including the [[Cheaha]] compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally this toolbox supports offloading computationally intensive workloads to [[Cheaha]] the campus compute cluster.<br />
<br />
In January 2011, UAB acquired a site license for MATLAB that allows faculty, staff, post-docs, and graduate students to use MATLAB, Simulink, and 42 toolboxes (including the parallel toolbox) for research activities on campus and personal systems. Additionally, from January 2012 MATLAB is available to students on campus and personal computer systems.<br />
<br />
== MATLAB Versions ==<br />
<br />
Mathworks has two annual releases of Matlab: the "a" release in the spring and the "b" release in the fall. Each release gets tagged with the current year and "a" or "b". For example, "Matlab 2013a" is the spring release for 2013. <br />
<br />
If you are using Matlab in an isolated environment like on your laptop or desktop, you can generally install the most recent release available from Mathworks. <br />
<br />
If you plan to use specific features of Matlab, however, like running computations on the [[Cheaha]] cluster or using a network install, you should install our recommended release of Matlab that we know works with our services.<br />
<br />
'''The current recommended release is Matlab 2013a.'''<br />
<br />
In UAB IT Research Computing, we update our services to work with the latest Matlab release a month or so after the general release of that product. This gives us time to try out the latest release, get feedback from other early adopters, and update services like the Distributed Computing Toolbox, license server and our documentation.<br />
<br />
Note: you can always install whichever Matlab release you need that is still available from Mathworks. Different versions of Matlab are always installed side-by-side. Depending on your science domain, you may need to select certain releases in order to access specific features. However, not all of these releases may be supported by our compute cluster or network license manager.<br />
<br />
== MATLAB on the Desktop ==<br />
Using Mathworks software available under the UAB campus license on your computer involves download and install steps common to all software packages and an authorization step that grants you the rights to use the software under the campus agreement. <br />
<br />
===Download and Installation Steps===<br />
'''NOTE:'''These steps are common to all installation and activation scenarios and are detailed in [[Downloading and Installing MATLAB]].<br />
<br />
# [http://www.mathworks.com/accesslogin/createProfile.do Create an account at the Mathworks site] using your campus @uab.edu email address. Please do not share your mathworks account ''username'' or ''password'' with anyone as this account will be associated with the UAB TAH license. An end user must choose one of these values when creating a MathWorks Account. '''Students should choose Student use. Faculty, researchers, and grad students should choose Teaching or research in school.''' <br />
#Request an [[Get a UAB Mathworks key |activation key]] from the [http://www.uab.edu/it/software/index.php UAB software library page] for faculty/staff and from [http://uab.onthehub.com/ UAB on the Hub] for students. Please make sure to request the appropriate key (Faculty/staff or student) as the software are on different licenses.<br />
#[[Downloading_and_Installing_MATLAB#Associate_with_the_UAB_TAH_license|Associate your Mathworks account]] with the campus-wide MATLAB license using your activation key.<br />
#[[Downloading_and_Installing_MATLAB#Download_Matlab|Download the software]] from the [http://www.mathworks.com/downloads/web_downloads/agent_check?s_cid=mwa-cmlndedl&mode=gwylf&refer=mwa mathworks download site] and [[Downloading_and_Installing_MATLAB#Install_Matlab|install MATLAB]]<br />
#[[Downloading_and_Installing_MATLAB#Activating_Matlab|Activate the software]] using the activation scenario that best suits your particular needs.<br />
<br />
===Updating MATLAB on Desktop===<br />
<br />
If you have been running MATLAB on your desktop during 2011, one can click 'Help', then 'Licensing', and finally 'Update Current Licenses'. This will remedy the license expiration message without having to update to a new copy of MATLAB. <br />
<br />
===Installation Help===<br />
<br />
MATLAB is a self-supported application at UAB. A UAB MATLAB users peer support forum is available. Subscription options are described below in [[MATLAB#MATLAB Support|MATLAB Support]].<br />
<br />
== MATLAB on Cheaha (compute cluster) ==<br />
<br />
MATLAB is pre-installed on the [[Cheaha]] research computing system. This allows users to run MATLAB directly on the cluster without any need to install software. MATLAB jobs can also be submitted to Cheaha directly from your desktop, however, this requires additional configuration described in [[MatLab DCS]].<br />
<br />
=== Integration with Desktop MATLAB ===<br />
<br />
Accessing the additional compute power of Cheaha from your desktop MATLAB install is recommended for most users because it combines the familiar MATLAB user experience with cluster computing power. However, additional steps are required to configure a desktop MATLAB installation to access worker nodes on the Cheaha cluster via the Distributed Computing Server (DCS) platform. Please see [[MatLab DCS]] for configuration information.<br />
<br />
=== Using Batch Submit from the Desktop Instead of ''matlabpool'' Jobs ===<br />
<br />
It is not possible to use matlabpool jobs on the cluster from your desktop due to firewall restrictions. Instead, desktop MATLAB users should use the batch submit options described in the [[MatLab DCS]] configuration to submit their jobs to the cluster. Matlabpool jobs are possible when running MATLAB directly on the cluster as described in [[MatLab_CLI#Matlabpool.2FParFor_Parallel_Example| matlabpool from the head node]] .<br />
<br />
=== Direct Use on the Cluster ===<br />
<br />
Using MATLAB directly on the cluster is recommended only for people comfortable accessing systems via a command line environment (eg. secure shell SSH). SSH access to Cheaha supports X Windows and VNC sessions for displaying a full graphical MATLAB development environment on client desktops with an X Windows servers or VNC client applications installed. For more information please see [[MatLab CLI]]. Matlabpool jobs are possible when running MATLAB directly in this environment as described in [[MatLab_CLI#Matlabpool.2FParFor_Parallel_Example| matlabpool from the head node]] .<br />
<br />
== Advanced Install Scenarios ==<br />
<br />
This information is helpful for people interested in the many ways in which MATLAB can be installed. A normal end-user installing MATLAB for themselves on a desktop or laptop computer should follow the [[#MATLAB on the Desktop]] instructions above. The following information is of most interest to IT or computer lab administrators who maintain MATLAB installs for many people on many computers.<br />
<br />
=== User Installation and Activation Scenarios ===<br />
<br />
#'''[[Matlab Designated Computer Install | Installation and activation with Designated Computer License ]]''' - This option is recommended for mobile computing systems which may or may not be on the UAB network when MATLAB is being used. This install type authorizes an individual computer to run MATLAB, allowing MATLAB to run regardless of where the computer is located. (This is the only option available if you want to use your MATLAB on your computer when you are not physically present at UAB)<br />
#'''[[Simplified MATLAB Install | Installation and Activation with Network License]]''' - This is the recommended install when MATLAB will be used on computers that remain connected to the campus network. This installation requires MatLab software to be installed on your computer and provides a simple 2-line file to activate the software. This option is highly recommend for UAB desktops.<br />
<br />
'''NOTE''': Most on-campus users are encouraged to use the '''[[Simplified MATLAB Install | Installation and Activation with Network License]]''' option for activation unless there are special circumstances that require the alternative activation scenarios.<br />
<br />
=== Network Concurrent/Lab admin Installation and Activation Scenarios ===<br />
# '''[[Matlab Network Concurrent User Install]]''' - This installation is only recommended for system administrators who already manage a lab or departmental installation of MATLAB and who would like to continue to provide this service for their user community. This install type may also be practical if there are special additional license needs that will apply to multiple computers running MATLAB. Note, all MATLAB toolboxes actively used at UAB are currently covered under the [[UAB TAH license|UAB campus license]].<br />
<br />
== On-line Tutorials and Learning Resources ==<br />
<br />
* [http://www.mathworks.com/help/techdoc/learn_matlab/bqr_2pl.html Getting Started] <br />
* Recorded [http://www.mathworks.com/company/events/webinars/?s_cid=HP_E_RW Webinars], select a topic and complete the request form.<br />
* Interactive Tutorials for Students and Faculty<br />
** [http://www.mathworks.com/academia/matlabtutorial MATLAB]<br />
** [http://www.mathworks.com/academia/simulinktutorial Simulink]<br />
* Example Code, News, Blogs, Teaching Materials <br />
** [http://www.mathworks.com/matlabcentral Matlab Central]<br />
** [http://www.mathworks.com/academia/classroom-resources Classroom resources]<br />
<br />
{{MATLAB Support}}<br />
<br />
== UAB Mathworks Site License ==<br />
<br />
UAB has acquired a university wide site license for MATLAB and Simulink software. This license includes all Mathworks Inc. products in use at UAB, with the exception of the Distributed Computing Server (DCS) which must be licensed separately. This new site license also makes available several new toolboxes and blocksets not previously licensed by UAB.<br />
<br />
This site license is known as the Mathworks Inc. Total Academic Headcount (TAH) license or Mathworks TAH. As of January 1, 2012, UAB has two TAH licenses. First, the TAH campus license, with license number 678600, is the same TAH license which was in operation during 2011 and is for use by all UAB full-time faculty and staff. Second, the TAH student license, with license number 731720, is for use by UAB students; graduate and professional students at UAB with funding or working on UAB research projects should use the TAH campus license. <br />
<br />
The Mathworks TAH license -- either campus or student -- will make it easier for everyone in the UAB community to use MATLAB, MATLAB Toolboxes (extensions), and Simulink software. Specifically, it authorizes use of MATLAB on university-owned machines for all faculty, staff and students. Faculty and staff are also entitled to install the software (TAH campus - 678600) on personally owned computers. Students are authorized to install the TAH student software (TAH student - 731730) on their personal computers. It is important that each authorized user of either TAH license use the authentication key corresponding to their authorized TAH license. That is, authorized users of TAH campus (678600) should use the authentication key obtained from http://www.uab.edu/it/software/, after selecting Mathworks/Matlab, corresponding to the Faculty/staff group. Similarly, authorized users for TAH student (731730) should use the authentication key for the Students group. For questions on which authentication key to use, or for help installing MATLAB software on your computer, please contact askit@uab.edu or post a question to the mailing list matlab-user@vo.uabgrid.uab.edu.<br />
<br />
The TAH allows unlimited use of MATLAB, Simulink, and the 48 MATLAB toolboxes, blocksets, and compiler in both research and teaching activities. Faculty, staff and students can install the software on computers located off-campus; however, students may only use Mathworks software on UAB-owned computers located on-campus.<br />
<br />
UAB was the first university in Alabama to implement a Mathworks TAH license.<br />
<br />
== Parallel Computing Extensions ==<br />
<br />
MATLAB language extensions to support parallel processing are available via the Parallel Computing Toolbox. This is one of the [[2011 UAB MATLAB site license software|42 toolboxes]] available under the [[UAB TAH license]]. The Parallel Computing Toolbox enables MATLAB to make use of the multi-core processors found on many computers in order to speed the execution of code sections that can execute in parallel. This toolbox supports the use of up to 8 cores on a single computer system through the use of worker threads that spread the execution of code across multiple cores.<br />
<br />
Additional parallelism can be supported by adding more worker threads via a secondary software platform known as the [http://www.mathworks.com/products/distriben/ Distributed Computing Server (DCS)]. The DCS runs on a compute cluster and can provide many more worker threads to increase parallelism. UAB IT Research Computing has licensed a 128 worker DCS installation for the Cheaha compute cluster. The Parallel Computing Toolbox can be configured to access this license from desktop MATLAB installations. Please see [[MatLab DCS]] for configuration details.<br />
<br />
[[Category:MATLAB]][[Category:MATLAB installation]][[Category:Software]][[Category:Math]]</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Template:Main_Banner&diff=5683Template:Main Banner2018-01-02T14:12:23Z<p>Tanthony@uab.edu: Changed to maintenance complete and MATLAB renewal instructions</p>
<hr />
<div><!-- MAIN PAGE BANNER --><br />
<table id="mp-banner" style="width: 100%; margin:4px 0 0 0; background:none; border-spacing: 0px;"><br />
<tr><td class="MainPageBG" style="text-align:center; padding:0.2em; background-color:#cef2e0; border:2px solid #f2e0ce; color:#000; font-size:100%;"><br />
<br />
<span style="color:#009000"> '''<big></big>''' </span><br />
<br />
[[Image:information.png|left|link=]]<br />
<span><big>'''Cheaha 2017 Winter Maintenance Complete'''</big> <br />
<br />
<br />
''' <big> Matlab License Renewed 2017-2018 </big> <br />
<br><br />
To update the license click Help > Licensing > Update Current License <br />
<br />
<br />
</span><br />
</td><br />
</tr><br />
</table></div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5669RCDay20172017-11-07T19:05:13Z<p>Tanthony@uab.edu: changed Andreas fielname</p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||[[:File:UAB_Dell_NVIDIA_AI.pdf| '''GPU usage for Personalized Medicine and Medical Research''']]<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||[[:File:Introduction_to_GPU_Computing.pdf |'''GPU Programming''']] <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||[[:File:GPUComputingMATLAB13Oct2017.pdf |'''GPU Programming with Matlab''']]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||[[:File:HCPrc-anatomy-of-engagement.pdf|'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''']] <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5668RCDay20172017-11-07T18:50:15Z<p>Tanthony@uab.edu: Added John-Pauls talk</p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||[[:File:UAB_Dell_NVIDIA_AI.pdf| '''GPU usage for Personalized Medicine and Medical Research''']]<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||[[:File:Introduction_to_GPU_Computing.pdf |'''GPU Programming''']] <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||[[:File:GPUComputingMATLAB13Oct2017.pdf |'''GPU Programming with Matlab''']]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||[[:File:HCPrc-anatomy-of-engagement.pdf|'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''']] <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=File:HCPrc-anatomy-of-engagement.pdf&diff=5667File:HCPrc-anatomy-of-engagement.pdf2017-11-07T18:49:43Z<p>Tanthony@uab.edu: John-Pauls Talk
RC Day 2017</p>
<hr />
<div>John-Pauls Talk<br />
<br />
RC Day 2017</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5666RCDay20172017-11-07T18:47:48Z<p>Tanthony@uab.edu: Added Jeff Leytons talk</p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||[[:File:UAB_Dell_NVIDIA_AI.pdf| '''GPU usage for Personalized Medicine and Medical Research''']]<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||[[:File:Introduction_to_GPU_Computing.pdf |'''GPU Programming''']] <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||[[:File:GPUComputingMATLAB13Oct2017.pdf |'''GPU Programming with Matlab''']]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''' <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=File:Introduction_to_GPU_Computing.pdf&diff=5665File:Introduction to GPU Computing.pdf2017-11-07T18:47:08Z<p>Tanthony@uab.edu: Jeff Leyton Nvidia
RC day 2017</p>
<hr />
<div>Jeff Layton, Nvidia <br />
RC day 2017</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5664RCDay20172017-11-07T18:46:25Z<p>Tanthony@uab.edu: added link to Andreas talk</p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||[[:File:UAB_Dell_NVIDIA_AI.pdf| '''GPU usage for Personalized Medicine and Medical Research''']]<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||'''GPU Programming''' <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||[[:File:GPUComputingMATLAB13Oct2017.pdf |'''GPU Programming with Matlab''']]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''' <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=File:UAB_Dell_NVIDIA_AI.pdf&diff=5663File:UAB Dell NVIDIA AI.pdf2017-11-07T18:45:24Z<p>Tanthony@uab.edu: Andrea Dsouza Talk AI
RC day 2017</p>
<hr />
<div>Andrea Dsouza Talk AI<br />
RC day 2017</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5662RCDay20172017-11-07T18:43:34Z<p>Tanthony@uab.edu: </p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||['''GPU usage for Personalized Medicine and Medical Research''']<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||'''GPU Programming''' <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||[[:File:GPUComputingMATLAB13Oct2017.pdf |'''GPU Programming with Matlab''']]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''' <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5661RCDay20172017-11-07T18:43:01Z<p>Tanthony@uab.edu: </p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||['''GPU usage for Personalized Medicine and Medical Research''']<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||'''GPU Programming''' <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||'''GPU Programming with Matlab'''[[:File:GPUComputingMATLAB13Oct2017.pdf |pdf]]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''' <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=RCDay2017&diff=5660RCDay20172017-11-07T18:42:12Z<p>Tanthony@uab.edu: Attached Thomas File</p>
<hr />
<div><big>Fall 2017 Research Computing Day -- GPU Computing <br />
<br />
Date: October 13, 2017<br />
<br />
Venue: Hill Student Center, Alumni Theater</big><br />
<br />
Open to all UAB faculty, staff, and students. Registration is free but seating is limited, so please register '''[https://www.eventbrite.com/e/research-computing-day-2017-gpu-computing-tickets-38557527603 here]''' to attend. <br />
<br />
<big>Agenda</big><br />
<br />
{| class="wikitable" border="1"<br />
|10:30 am – 11:15 am ||[https://www.youtube.com/watch?v=QVtoLddu188 '''Research Computing Vision''']<br/><br />
Dr. Curtis A. Carver Jr.<br/><br />
Vice President and Chief Information Officer<br/><br />
University of Alabama at Birmingham<br />
|- <br />
|11:15 am – 12:15 pm ||['''GPU usage for Personalized Medicine and Medical Research''']<br/><br />
Andrea De Souza<br/><br />
Global Business Development Lead Healthcare and Pharma<br/><br />
Nvidia<br />
|-<br />
|12:15 pm – 1:15 pm ||'''Lunch'''<br />
|-<br />
|1:15 pm – 1:45 pm ||[https://www.dellemc.com/en-us/solutions/high-performance-computing/index.htm '''GPU-based HPC Applications''']<br/><br />
Richard Adkins<br/><br />
Senior Enterprise Sales Engineer<br/><br />
Dell EMC<br />
|-<br />
|1:45 pm – 2:45 pm ||'''GPU Programming''' <br/><br />
Dr. Jeff Layton<br/><br />
Senior Solutions Architect<br/><br />
Nvidia<br />
|-<br />
|2:45 pm – 3:00 pm ||'''Break'''<br />
|-<br />
|3:00 pm – 3:30 pm ||'''GPU Programming with Matlab'''[[:File:GPUComputingMATLAB13Oct2017.pdf | pdf]]<br/><br />
Thomas Anthony<br/><br />
Scientist, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|-<br />
|3:30 pm – 4:00 pm ||'''Analyzing the Human Connectome Project (HCP) Datasets using GPUs''' <br/><br />
John-Paul Robinson<br/><br />
System Architect, Research Computing<br/><br />
University of Alabama at Birmingham<br />
|}</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=File:GPUComputingMATLAB13Oct2017.pdf&diff=5658File:GPUComputingMATLAB13Oct2017.pdf2017-11-07T18:39:25Z<p>Tanthony@uab.edu: Thomas Anthony talk
RC Day 2017</p>
<hr />
<div>Thomas Anthony talk <br />
RC Day 2017</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Cheaha&diff=5657Cheaha2017-10-23T13:21:58Z<p>Tanthony@uab.edu: /* Software */ remove broken link to me page</p>
<hr />
<div>{{Main_Banner}}<br />
'''Cheaha''' is a campus resource dedicated to enhancing research computing productivity at UAB. [http://cheaha.uabgrid.uab.edu Cheaha] is managed by [http://www.uab.edu/it UAB Information Technology's Research Computing group (UAB ITRC)] and is available to members of the UAB community in need of increased computational capacity. Cheaha supports [http://en.wikipedia.org/wiki/High-performance_computing high-performance computing (HPC)] and [http://en.wikipedia.org/wiki/High-throughput_computing high throughput computing (HTC)] paradigms.<br />
<br />
Cheaha provides users with a traditional command-line interactive environment with access to many scientific tools that can leverage its dedicated pool of local compute resources. Alternately, users of graphical applications can start a [[Setting_Up_VNC_Session|cluster desktop]]. The local compute pool provides access to compute hardware based on the [http://en.wikipedia.org/wiki/X86_64 x86-64 64-bit architecture]. The compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, [[ Resources |a commodity cluster with approximately 2400 cores]] connected by low-latency Fourteen Data Rate (FDR) InfiniBand networks. The compute nodes are backed by 6.6PB raw GPFS storage on DDN SFA12KX hardware, an additional 20TB available for home directories on a traditional Hitachi SAN, and other ancillary services. The compute nodes combine to provide over 110TFlops of dedicated computing power. <br />
<br />
Cheaha is composed of resources that span data centers located in the UAB Shared Computing Facility (UAB 936 Building) and the RUST Computer Center. Resource design and development is led by UAB IT Research Computing in open collaboration with community members. Operational [mailto:support@vo.uabgrid.uab.edu support] is provided by UAB IT's Research Computing group.<br />
<br />
Cheaha is named in honor of [http://en.wikipedia.org/wiki/Cheaha_Mountain Cheaha Mountain], the highest peak in the state of Alabama. Cheaha is a popular destination whose summit offers clear vistas of the surrounding landscape. (Cheaha Mountain photo-streams on [http://www.flickr.com/search/?q=cheaha Flickr] and [http://picasaweb.google.com/lh/view?q=cheaha&psc=G&filter=1# Picasa]).<br />
<br />
== Using ==<br />
<br />
=== Getting Started ===<br />
<br />
For information on getting an account, logging in, and running a job, please see [[Cheaha2_GettingStarted|Getting Started]].<br />
<br />
== History ==<br />
<br />
[[Image:Research-computing-platform.png|right|thumb|450px|Logical Diagram of Cheaha Configuration]]<br />
<br />
=== 2005 ===<br />
<br />
In 2002 UAB was awarded an infrastructure development grant through the NSF EPSCoR program. This led to the 2005 acquisition of a 64-node compute cluster with two AMD Opteron 242 1.6GHz CPUs per node (128 total cores). This cluster was named Cheaha. Cheaha expanded the compute capacity available at UAB and was the first general-access resource for the community. It led to expanded roles for UAB IT in research computing support through the development of the UAB Shared HPC Facility in BEC and provided further engagement in Globus-based grid computing resource development on campus via UABgrid and regionally via [http://www.suragrid.org SURAgrid].<br />
<br />
=== 2008 ===<br />
<br />
In 2008, money was allocated by UAB IT for hardware upgrades, which led to the acquisition in August 2008 of an additional 192 cores based on a Dell clustering solution with Intel Quad-Core E5450 3.0GHz CPUs. This upgrade migrated Cheaha's core infrastructure to the Dell blade clustering solution. It provided a three-fold increase in processor density over the original hardware and enabled more computing power to be located in the same physical space with room for expansion, an important consideration in light of the continued growth in processing demand. This hardware represented a major technology upgrade that included space for additional expansion to address overall capacity demand and enable resource reservation. <br />
<br />
The 2008 upgrade began a continuous resource improvement plan that includes a phased development approach for Cheaha with on-going increases in capacity and feature enhancements being brought into production via an [http://projects.uabgrid.uab.edu/cheaha open community process].<br />
<br />
Software improvements rolled into the 2008 upgrade included grid computing services to access distributed compute resources and orchestrate jobs using the [http://www.gridway.org GridWay] meta-scheduler. An initial 10Gigabit Ethernet link establishing the UABgrid Research Network was designed to support high-speed data transfers between clusters connected to this network.<br />
<br />
=== 2009 ===<br />
<br />
In 2009, annual investment funds were directed toward establishing a fully connected dual data rate Infiniband network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. The Infiniband and storage fabrics were designed to support significant increases in research data sets and their associated analytical demand.<br />
<br />
=== 2010 ===<br />
<br />
In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity. The grant funds were combined with the annual investment funds adding 576 cores (48 nodes) based on the Intel Westmere 2.66 GHz CPU, a quad data rate Infiniband fabric with 32 uplinks, an additional 120 TB of storage for the DDN fabric, and additional hardware to improve reliability. Additional improvements to the research compute platform involved extending the UAB Research Network to link the BEC and RUST data centers and adding 20TB of user and ancillary services storage.<br />
<br />
=== 2012 ===<br />
<br />
In 2012, UAB IT Research Computing invested in the foundation hardware to expand long-term storage and virtual machine capabilities with the acquisition of 12 Dell 720xd systems, each containing 16 cores, 96GB RAM, and 36TB of storage, creating a 192-core, 432TB virtual compute and storage fabric.<br />
<br />
Additional hardware investment by the School of Public Health's Section on Statistical Genetics added three 384GB large-memory nodes and an additional 48 cores to the QDR Infiniband fabric.<br />
<br />
=== 2013 ===<br />
<br />
In 2013, UAB IT Research Computing acquired an [http://blogs.uabgrid.uab.edu/jpr/2013/03/were-going-with-openstack/ OpenStack cloud and Ceph storage software fabric] through a partnership between Dell and Inktank in order to [http://dev.uabgrid.uab.edu extend cloud computing solutions] to the researchers at UAB and enhance the interfacing capabilities for HPC.<br />
<br />
=== 2015 === <br />
<br />
UAB IT received $500,000 from the university’s Mission Support Fund for a compute cluster seed expansion of 48 teraflops. This added 936 cores across 40 2x12-core 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with an FDR InfiniBand interconnect.<br />
<br />
UAB received a $500,000 grant from the Alabama Innovation Fund for a three petabyte research storage array. This funding with additional matching from UAB provided a multi-petabyte [https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS] parallel file system to the cluster which went live in 2016.<br />
<br />
=== 2016 ===<br />
<br />
In 2016, UAB IT Research Computing received additional funding from the Deans of CAS, Engineering, and Public Health to grow the compute capacity provided by the prior year's seed funding. This added further compute nodes, providing researchers at UAB with 96 2x12-core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. More information can be found at [[Resources]]. <br />
<br />
In addition to the compute expansion, the six-petabyte GPFS file system came online. This file system provides each user with five terabytes of personal space, additional space for shared projects, and greatly expanded scratch storage, all in a single file system.<br />
<br />
The 2015 and 2016 investments combined to provide a completely new core for the Cheaha cluster, allowing the retirement of earlier compute generations.<br />
<br />
== Grant and Publication Resources ==<br />
<br />
The following description may prove useful in summarizing the services available via Cheaha. If you are using Cheaha for grant-funded research, please send information about your grant (funding source and grant number), a statement of intent for the research project, and a list of the applications you are using to UAB IT Research Computing. If you are using Cheaha for exploratory research, please send a similar note on your research interest. Finally, any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities in your research; see the suggested example below. Please note, your acknowledgment may also need to include an additional statement acknowledging grant-funded hardware. We also ask that you send any references to publications based on your use of Cheaha compute resources.<br />
<br />
=== Description of Cheaha for Grants ===<br />
<br />
UAB IT Research Computing maintains high-performance compute and storage resources for investigators. The Cheaha compute cluster provides 2800 conventional CPU cores and 80 accelerators interconnected via an InfiniBand network and provides 468 TFLOP/s of aggregate theoretical peak performance. A high-performance GPFS file system with 6.6PB of raw storage on DDN SFA12KX hardware is also connected to these compute nodes via the InfiniBand fabric. An additional 20TB of traditional SAN storage is available for home directories. This general-access compute fabric is available to all UAB investigators.<br />
<br />
=== Acknowledgment in Publications ===<br />
<br />
This work was supported in part by the National Science Foundation under Grants Nos. OAC-1541310, the University of Alabama at Birmingham, and the Alabama Innovation Fund. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Alabama at Birmingham.<br />
<br />
== System Profile ==<br />
<br />
=== Performance ===<br />
{{CheahaTflops}}<br />
<br />
=== Hardware ===<br />
<br />
The Cheaha Compute Platform includes multiple generations of commodity compute hardware, totaling 868 compute cores, 2.8TB of RAM, and over 200TB of storage.<br />
<br />
The hardware is grouped into generations designated gen1 through gen6 (oldest to newest). The following descriptions highlight the hardware profile for each generation. <br />
<br />
* Generation 1 (gen1) -- 64 2-CPU AMD 1.6 GHz compute nodes with Gigabit interconnect. This is the original hardware collection purchased with NSF EPSCoR funds in 2005, approx $150K. These nodes are sometimes called the "Verari" nodes. These nodes are tagged as "verari-compute-#-#" in the ROCKS naming convention.<br />
* Generation 2 (gen2) -- 24 2x4-core (192 cores total) 3.0 GHz Intel compute nodes with dual data rate Infiniband interconnect and the initial high-perf storage implementation using 60TB DDN. This is the hardware collection purchased exclusively with the annual VPIT funds allocation, approx $150K/yr for the 2008 and 2009 fiscal years. These nodes are sometimes confusingly called "cheaha2" or "cheaha" nodes. These nodes are tagged as "cheaha-compute-#-#" in the ROCKS naming convention. <br />
* Generation 3 (gen3) -- 48 2x6 core (576 cores total) 2.66 GHz Intel compute nodes with quad data rate Infiniband, ScaleMP, and the high-perf storage build-out for capacity and redundancy with 120TB DDN. This is the hardware collection purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and tagged as such in the node naming for the queue system. These nodes are tagged as "sipsey-compute-#-#" in the ROCKS naming convention. 16 of the gen3 nodes (sipsey-compute-0-1 thru sipsey-compute-0-16) were upgraded in 2014 from 48GB to 96GB of memory per node. <br />
* Generation 4 (gen4) -- 3 16-core (48 cores total) compute nodes. This hardware collection was purchased by [http://www.soph.uab.edu/ssg/people/tiwari Dr. Hemant Tiwari of SSG]. These nodes were given the code name "ssg" and tagged as such in the node naming for the queue system. These nodes are tagged as "ssg-compute-0-#" in the ROCKS naming convention. <br />
* Generation 6 (gen6) -- <br />
** 44 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 128GB DDR4 RAM, FDR InfiniBand and 10GigE network cards (4 nodes with NVIDIA K80 GPUs and 4 nodes with Intel Xeon Phi 7120P accelerators)<br />
** 38 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 256GB DDR4 RAM, FDR InfiniBand and 10GigE network cards<br />
** 14 Compute Nodes with two 12 core processors (Intel Xeon E5-2680 v3 2.5GHz) with 384GB DDR4 RAM, FDR InfiniBand and 10GigE network card<br />
** FDR InfiniBand Switch<br />
** 10Gigabit Ethernet Switch<br />
** Management node and gigabit switch for cluster management<br />
** Bright Advanced Cluster Management software licenses <br />
<br />
Summarized, Cheaha's compute pool includes:<br />
* gen4 is 48 cores of [http://ark.intel.com/products/64583/Intel-Xeon-Processor-E5-2680-20M-Cache-2_70-GHz-8_00-GTs-Intel-QPI 2.70GHz eight-core Intel Xeon E5-2680 processors] with 24GB of RAM per core, or 384GB per node<br />
* gen3 is 192 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 8GB of RAM per core, or 96GB per node<br />
* gen3 is 384 cores of [http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI?q=x5650 2.67GHz six-core Intel Xeon X5650 processors] with 4GB of RAM per core, or 48GB per node<br />
* gen2 is 192 cores of [http://ark.intel.com/products/33083/Intel-Xeon-Processor-E5450-12M-Cache-3_00-GHz-1333-MHz-FSB 3.0GHz quad-core Intel Xeon E5450 processors] with 2GB of RAM per core<br />
* gen1 is 100 cores of 1.6GHz AMD Opteron 242 processors with 1GB of RAM per core <br />
<br />
{|border="1" cellpadding="2" cellspacing="0"<br />
|+ Physical Nodes<br />
|- bgcolor=grey<br />
!gen!!queue!!#nodes!!cores/node!!RAM/node<br />
|-<br />
|gen6|| default || 44 || 24 || 128G<br />
|-<br />
|gen6|| default || 38 || 24 || 256G<br />
|-<br />
|gen6|| default || 14 || 24 || 384G<br />
|-<br />
|gen5||Ceph/OpenStack|| 12 || 20 || 96G<br />
|-<br />
|gen4||ssg||3||16||385G<br />
|-<br />
|gen3||sipsey||16||12||96G<br />
|-<br />
|gen3||sipsey||32||12||48G<br />
|-<br />
|gen2||cheaha||24||8||16G<br />
|}<br />
<br />
=== Software ===<br />
<br />
Details of the software available on Cheaha can be found on the [https://docs.uabgrid.uab.edu/wiki/Cheaha_Software Installed software page], an overview follows.<br />
<br />
Cheaha uses [http://modules.sourceforge.net/ Environment Modules] to support account configuration. Please follow these [http://me.eng.uab.edu/wiki/index.php?title=Cheaha#Environment_Modules specific steps for using environment modules].<br />
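<br />
A typical module workflow looks like the following sketch (the module name below is illustrative only; run "module avail" to see what is actually installed on the cluster):<br />
<pre><br />
# List the software modules available on the cluster<br />
module avail<br />
<br />
# Load a module into your environment (name and version are examples only)<br />
module load R<br />
<br />
# Show which modules are currently loaded<br />
module list<br />
<br />
# Remove a module when you are done with it<br />
module unload R<br />
</pre><br />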
<br />
Cheaha's software stack is built with the [http://www.brightcomputing.com Bright Cluster Manager]. Cheaha's operating system is CentOS with the following major cluster components:<br />
* BrightCM 7.2<br />
* CentOS 7.2 x86_64<br />
* [[Slurm]] 15.08<br />
<br />
A brief summary of some of the available computational software and tools includes:<br />
* Amber<br />
* FFTW<br />
* Gromacs<br />
* GSL<br />
* NAMD<br />
* VMD<br />
* Intel Compilers<br />
* GNU Compilers<br />
* Java<br />
* R<br />
* OpenMPI<br />
* MATLAB<br />
<br />
=== Network ===<br />
<br />
Cheaha is connected to the UAB Research Network, which provides a dedicated 10Gbps networking backplane between clusters located in the 936 data center and the campus network core. Data transfer rates of almost 8Gbps between these hosts have been demonstrated using GridFTP, a multi-channel file transfer service that is used to move data between clusters as part of job management operations. This performance promises very efficient job management and the seamless integration of other clusters as connectivity to the research network is expanded.<br />
<br />
=== Benchmarks ===<br />
<br />
The continuous resource improvement process involves collecting benchmarks of the system. One of the measures of greatest interest to users of the system is benchmarks of specific application codes. The following benchmarks have been performed on the system, and the list will be expanded as additional benchmarks are performed.<br />
<br />
* [[Cheaha-BGL_Comparison|Cheaha-BGL Comparison]]<br />
<br />
* [[Gromacs_Benchmark|Gromacs]]<br />
<br />
* [[NAMD_Benchmarks|NAMD]]<br />
<br />
=== Cluster Usage Statistics ===<br />
<br />
Cheaha uses Bright Cluster Manager to report cluster performance data. This information provides a helpful overview of the current and historical operating stats for Cheaha. You can access the status monitoring page [https://cheaha-master01.rc.uab.edu/userportal/ here] (accessible only on the UAB network or through VPN).<br />
<br />
== Availability ==<br />
<br />
Cheaha is a general-purpose computer resource made available to the UAB community by UAB IT. As such, it is available for legitimate research and educational needs and is governed by [http://www.uabgrid.uab.edu/aup UAB's Acceptable Use Policy (AUP)] for computer resources. <br />
<br />
Many software packages commonly used across UAB are available via Cheaha.<br />
<br />
To request access to Cheaha, please [mailto:support@vo.uabgrid.uab.edu send a request] to the cluster support group.<br />
<br />
Cheaha's intended use implies broad access to the community, however, no guarantees are made that specific computational resources will be available to all users. Availability guarantees can only be made for reserved resources.<br />
<br />
=== Secure Shell Access ===<br />
<br />
Please configure your client secure shell software to use the official host name to access Cheaha:<br />
<br />
<pre><br />
cheaha.rc.uab.edu<br />
</pre><br />
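<br />
For example, from a Mac or Linux terminal you can log in with the command below (USERID is a placeholder; replace it with your own account name):<br />
<pre><br />
ssh USERID@cheaha.rc.uab.edu<br />
</pre><br />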
<br />
== Scheduling Framework ==<br />
<br />
[http://slurm.schedmd.com/ Slurm] is a queue management system and stands for Simple Linux Utility for Resource Management. Slurm was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. '''[[Slurm]]''' is now the primary job manager on Cheaha; it replaces Sun Grid Engine (SGE), the job manager used previously.<br />
<br />
Slurm is similar in many ways to GridEngine or most other queue systems. You write a batch script then submit it to the queue manager (scheduler). The queue manager then schedules your job to run on the queue (or '''partition''' in Slurm parlance) that you designate. Below we will provide an outline of how to submit jobs to Slurm, how Slurm decides when to schedule your job, and how to monitor progress.<br />
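<br />
As a minimal sketch of this workflow (the partition name and resource requests below are illustrative placeholders, not site defaults), a batch script might look like the following:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --job-name=my_job          # name shown in the queue<br />
#SBATCH --ntasks=1                 # number of tasks (processes)<br />
#SBATCH --cpus-per-task=1          # cores per task<br />
#SBATCH --mem-per-cpu=1G           # memory per core<br />
#SBATCH --time=01:00:00            # wall-clock limit (HH:MM:SS)<br />
#SBATCH --partition=express        # partition (queue) name - adjust to an available partition<br />
#SBATCH --output=my_job_%j.out     # output file (%j expands to the job id)<br />
<br />
# Commands to run on the compute node go here<br />
echo "Hello from $(hostname)"<br />
</pre><br />
The script is submitted with "sbatch myscript.sh", queued jobs can be monitored with "squeue -u $USER", and a job can be cancelled with "scancel <jobid>".<br />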
<br />
== Support ==<br />
<br />
Operational support for Cheaha is provided by the Research Computing group in UAB IT. For questions regarding the operational status of Cheaha, please send your request to [mailto:support@vo.uabgrid.uab.edu support@vo.uabgrid.uab.edu]. As a user of Cheaha you will automatically be subscribed to the hpc-announce email list. This subscription is mandatory for all users of Cheaha. It is our way of communicating important information regarding Cheaha to you. The traffic on this list is restricted to official communication and has a very low volume.<br />
<br />
We have limited capacity, however, to support non-operational issues like "How do I write a job script?" or "How do I compile a program?". For such requests, you may find it more fruitful to send your questions to the hpc-users email list and request help from our peers in the HPC community at UAB. As with all mailing lists, please observe [http://lifehacker.com/5473859/basic-etiquette-for-email-lists-and-forums common mailing list etiquette].<br />
<br />
Finally, please remember that as you learned about HPC from others, it becomes part of your responsibility to help others on their quest. You should update this documentation or respond to the mailing list requests of others. <br />
<br />
You can subscribe to hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''subscribe hpc-users''].<br />
<br />
You can unsubscribe from hpc-users by sending an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=unsubscribe%20hpc-users sympa@vo.uabgrid.uab.edu with the subject ''unsubscribe hpc-users''].<br />
<br />
You can review archives of the list in the [http://vo.uabgrid.uab.edu/sympa/arc/hpc-users hpc-users web archives].<br />
<br />
If you need help using the list service please send an email to:<br />
<br />
[mailto:sympa@vo.uabgrid.uab.edu?subject=help sympa@vo.uabgrid.uab.edu with the subject ''help'']<br />
<br />
If you have questions about the operation of the list itself, please send an email to the owners of the list:<br />
<br />
[mailto:hpc-users-request@vo.uabgrid.uab.edu hpc-users-request@vo.uabgrid.uab.edu with a subject relevant to your issue with the list]<br />
<br />
If you are interested in contributing to the enhancement of HPC features at UAB or would like to talk to other cluster administrators, [mailto:sympa@vo.uabgrid.uab.edu?subject=subscribe%20hpc-dev please join the hpc developers community at UAB].</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=UAB_Research_Computing_Day&diff=5638UAB Research Computing Day2017-10-10T16:57:57Z<p>Tanthony@uab.edu: Undo revision 5636 by Tanthony@uab.edu (talk)</p>
<hr />
<div>Research Computing Day is a dialog within the UAB research community about leveraging the power of computers to grow the depth of our investigation into the nature of the world that surrounds us. The annual event welcomes discussions on science, engineering, the arts and humanities focused on the drive to open new research frontiers with advances in technology.<br />
<br />
Whether computers are used to increase the accuracy of a model, interpret the ever-growing stream of data from new image collections and instruments, or engage with peers around the globe, UAB’s status as a leading research community depends on the ability to incorporate these capabilities into the research process. By participating in the dialog of Research Computing Day at UAB, researchers can share how they are using these methods to enhance their research, gain new insights from peers, and contribute their voices to the growth of research at UAB.<br />
<br />
== Research Computing Day 2017 ==<br />
<br />
[[RCDay2017|Research Computing Day 2017]] will be held October 13, 2017 from 10:30am to 4:00pm at the Hill Student Center, Alumni Theater.<br />
<br />
== Background ==<br />
<br />
Since 2007, The [http://www.uab.edu/it Office of the Vice President for Information Technology] has sponsored an annual dialog on the role of technology in research. These events joined UAB with [https://www.nsf.gov/awardsearch/showAward?AWD_ID=0956272 national dialogs on the role of Cyberinfrastructure in research] held at campuses across the country.<br />
<br />
== Previous UAB Research Computing Days ==<br />
<br />
* 2007 -- Co-hosted along with the [http://asc.edu ASA] site visit, providing an overview of new services and upcoming launch of the UABgrid pilot. (No web record)<br />
* 2008 -- Focus on grid computing and collaboration technologies, in particular the caBIG program with guest speakers from Booz Allen Hamilton who managed the NCI caBIG program and SURA (agenda currently offline) <br />
* 2010 -- Featured introduction to Galaxy platform for genetic sequencing by Dell staff scientist (agenda currently offline)<br />
* [[2011]] -- Understanding growth of research computing support at peer institutions UNC and Emory <br />
* [[2012]] -- Growing data sciences at UAB<br />
* [[2013]] -- OpenStack at UAB<br />
* [[RCDay2016|2016]] -- HPC Expansion<br />
<br />
[[Category:RCDay]]</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=UAB_Research_Computing_Day&diff=5636UAB Research Computing Day2017-10-10T16:51:33Z<p>Tanthony@uab.edu: Added RC Day 2017</p>
<hr />
<div>Research Computing Day is a dialog within the UAB research community about leveraging the power of computers to grow the depth of our investigation into the nature of the world that surrounds us. The annual event welcomes discussions on science, engineering, the arts and humanities focused on the drive to open new research frontiers with advances in technology.<br />
<br />
Whether computers are used to increase the accuracy of a model, interpret the ever-growing stream of data from new image collections and instruments, or engage with peers around the globe, UAB’s status as a leading research community depends on the ability to incorporate these capabilities into the research process. By participating in the dialog of Research Computing Day at UAB, researchers can share how they are using these methods to enhance their research, gain new insights from peers, and contribute their voices to the growth of research at UAB.<br />
<br />
== Research Computing Day 2017 ==<br />
<br />
[[RCDay2017|Research Computing Day 2017]] will be held October 13, 2017 from 10:30am to 4:00pm at the Hill Student Center, Alumni Theater.<br />
<br />
== Background ==<br />
<br />
Since 2007, The [http://www.uab.edu/it Office of the Vice President for Information Technology] has sponsored an annual dialog on the role of technology in research. These events joined UAB with [https://www.nsf.gov/awardsearch/showAward?AWD_ID=0956272 national dialogs on the role of Cyberinfrastructure in research] held at campuses across the country.<br />
<br />
== Previous UAB Research Computing Days ==<br />
<br />
* 2007 -- Co-hosted along with the [http://asc.edu ASA] site visit, providing an overview of new services and upcoming launch of the UABgrid pilot. (No web record)<br />
* 2008 -- Focus on grid computing and collaboration technologies, in particular the caBIG program with guest speakers from Booz Allen Hamilton who managed the NCI caBIG program and SURA (agenda currently offline) <br />
* 2010 -- Featured introduction to Galaxy platform for genetic sequencing by Dell staff scientist (agenda currently offline)<br />
* [[2011]] -- Understanding growth of research computing support at peer institutions UNC and Emory <br />
* [[2012]] -- Growing data sciences at UAB<br />
* [[2013]] -- OpenStack at UAB<br />
* [[RCDay2016|2016]] -- HPC Expansion<br />
* [[RCDay2017|2017]] -- GPU Computing<br />
<br />
[[Category:RCDay]]</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Setting_Up_VNC_Session&diff=5581Setting Up VNC Session2017-07-31T21:01:50Z<p>Tanthony@uab.edu: /* Configure the Cluster Desktop */</p>
<hr />
<div>[[wikipedia:Virtual_Network_Computing|Virtual Network Computing (VNC)]] is a cross-platform desktop sharing system to interact with a remote system's desktop using a graphical interface. This page covers basic instructions to access a desktop on [[Cheaha]] using VNC. These basic instructions support a variety of use-cases where access to graphical applications on the cluster is helpful or required. If you are interested in knowing more options or detailed technical information, then please take a look at man pages of specified commands.<br />
<br />
== One Time Setup ==<br />
VNC use on Cheaha requires a one-time setup to configure the settings for starting the virtual desktop. These instructions will configure the VNC server to start a full-featured cluster desktop environment. (Alternatively, you can run the vncserver command without this configuration and start a very basic (but harder to use) desktop environment.) To get started, [[Cheaha_GettingStarted#Login | log in to cheaha via ssh.]]<br />
<br />
=== Set VNC Session Password ===<br />
You must maintain a password for your VNC server sessions using the vncpasswd command. The password is validated each time a connection comes in, so it can be changed on the fly using the vncpasswd command at any time later. '''Remember this password as you will be prompted for it when you access your cluster desktop'''. By default, the command stores an obfuscated version of the password in the file $HOME/.vnc/passwd.<br />
<br />
<pre><br />
$ vncpasswd <br />
</pre><br />
<br />
=== Configure the Cluster Desktop ===<br />
The vncserver command relies on a configuration script to start your virtual desktop environment. The MATE desktop provides a familiar desktop experience and can be selected by creating the following vncserver startup script (~/.vnc/xstartup).<br />
<br />
<pre><br />
mkdir $HOME/.vnc<br />
<br />
cat > $HOME/.vnc/xstartup <<\EOF<br />
<br />
#!/bin/sh<br />
<br />
# Start up the standard system desktop<br />
unset SESSION_MANAGER<br />
unset DBUS_SESSION_BUS_ADDRESS<br />
<br />
#exec /etc/X11/xinit/xinitrc<br />
/usr/bin/mate-session<br />
<br />
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup<br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources<br />
xsetroot -solid grey<br />
vncconfig -iconic &<br />
x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &<br />
x-window-manager &<br />
<br />
EOF<br />
<br />
chmod +x $HOME/.vnc/xstartup<br />
</pre><br />
<br />
By default, a VNC server displays its graphical environment using a bare-bones tab window manager. If the above xstartup file is absent, then a file with the default tab-window-manager settings will be created by the vncserver command during startup. If you want to switch to the MATE desktop, simply replace this default file with the settings above. <br />
<br />
This completes the one-time setup on the cluster for creating a VNC server password and selecting the preferred desktop environment.<br />
<br />
=== Select a VNC Client ===<br />
You will also need a VNC client on your personal desktop in order to remotely access your cluster desktop. <br />
<br />
Mac OS comes with a native VNC client so you don't need to use any third-party software. Chicken of the VNC is a popular alternative on Mac OS to the native VNC client, especially for older Mac OS, pre-10.7.<br />
<br />
Most Linux systems have the VNC software installed so you can simply use the vncviewer command to access a VNC server. <br />
<br />
If you use MS Windows then you will need to install a VNC client. Here is a list of VNC clients; you can use any one of them to access the VNC server. <br />
* http://www.tightvnc.com/ (Mac, Linux and Windows)<br />
* http://www.realvnc.com/ (Mac, Linux and Windows)<br />
* http://sourceforge.net/projects/cotvnc/ (Mac)<br />
<br />
== Start your VNC Desktop == <br />
Your VNC desktop must be started before you can connect to it. To start the VNC desktop you need to log into cheaha using a [[Cheaha_GettingStarted#Login|standard SSH connection]]. The VNC server is started by executing the vncserver command after you log in to cheaha. It will run in the background and continue running even after you log out of the SSH session that was used to run the vncserver command.<br />
<br />
To start the VNC desktop run the vncserver command. You will see a short message like the following from the vncserver before it goes into the background. You will need this information to connect to your desktop.<br />
<pre><br />
$ vncserver <br />
New 'login001:24 (blazer)' desktop is login001:24<br />
<br />
Starting applications specified in /home/blazer/.vnc/xstartup<br />
Log file is /home/blazer/.vnc/login001:24.log<br />
</pre><br />
<br />
The above command output indicates that a VNC server is started on VNC X-display number 24, which translates to system port 5924. The vncserver automatically selects this port from a list of available ports.<br />
<br />
The actual system port on which the VNC server is listening for connections is obtained by adding the VNC base port (default: port 5900) and the VNC X-display number (24 in the above case). Alternatively, you can specify a high-numbered system port directly (e.g. 5927) using the '-rfbport <port-number>' option, and the vncserver will try to use it if it is available. See vncserver's man page for details.<br />
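<br />
For example, the session above on display :24 listens on port 5900 + 24 = 5924; to request a specific port instead, you could start the server as follows (the port number is just an example):<br />
<pre><br />
vncserver -rfbport 5927<br />
</pre><br />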
<br />
Please note that the vncserver will continue to run in the background on the head node until it is explicitly stopped. This allows you to reconnect to the same desktop session without having to first start the vncserver, leaving all your desktop applications active. When you no longer need your desktop, simply log out of your desktop using the desktop's log out menu option or explicitly end the session with the 'vncserver -kill :<display>' command.<br />
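<br />
For example, to list your running VNC sessions and shut down the session on display :24 from the example above:<br />
<pre><br />
vncserver -list<br />
vncserver -kill :24<br />
</pre><br />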
<br />
=== Alternate Cluster Desktop Sizes ===<br />
The default size of your cluster desktop is 1024x768 pixels. If you want to start your desktop with an alternate geometry to match your application, personal desktop environment, or other preferences, simply add a "-geometry widthxheight" argument to your vncserver command. For example, if you want a wide-screen geometry popular with laptops, you might start the VNC server with:<br />
<pre><br />
vncserver -geometry 1280x800<br />
</pre><br />
<br />
== Establish a Network Connection to your VNC Server ==<br />
<br />
As indicated in the output from the vncserver command, the VNC desktop is listening for connections on a higher numbered port. This port isn't directly accessible from the internet. Hence, we need to use SSH local port forwarding to connect to this server.<br />
<br />
This SSH session provides the connection to your VNC desktop and must remain active while you use the desktop. You can disconnect and reconnect to your desktop by establishing this SSH session whenever you need to access your desktop. In other words, your desktop remains active across your connections to it. This supports a mobile work environment.<br />
<br />
=== Port-forwarding from Linux or Mac Systems ===<br />
Set up SSH port forwarding using the native SSH command. <br />
<pre><br />
# ssh -L <local-port>:<remote-system-host>:<remote-system-port> USERID@<SSH-server-host><br />
$ ssh -L 5924:localhost:5924 USERID@cheaha.rc.uab.edu<br />
</pre><br />
The above command will forward connections on local port 5924 to port 5924 on the remote system (the same host as the SSH server, Cheaha, hence localhost).<br />
<br />
=== Port-forwarding from Windows Systems ===<br />
Windows users need to establish the connection using whatever SSH software they commonly use. The following is an example configuration using the PuTTY client on Windows.<br />
<br />
[[File:Putty-SSH-Tunnel.png]]<br />
<br />
== Access your Cluster Desktop ==<br />
<br />
With the network connection to the VNC server established, you can access your cluster desktop using your preferred VNC client. When you access your cluster desktop you will be prompted for the VNC password you created during the one time setup above.<br />
<br />
The VNC client will actually connect to your local machine, eg. "localhost", because it relies on the SSH port forwarding to connect to the VNC server on the cluster. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
You can access the VNC server using the following connection scenarios based on your personal desktop environment.<br />
<br />
==== From Mac ====<br />
<br />
'''For Mac OSX 10.8 and higher'''<br />
Mac users can use the default VNC client and start it from Finder. Press '''cmd+k''' to bring up the "connect to server" window. Enter the following connection string in Finder: <br />
<pre>vnc://localhost:5924 </pre><br />
The connection string pattern is "vnc://<vnc-server>:<vnc-port>". Adjust your port setting for the specific value of your cluster desktop given when you run vncserver above.<br />
<br />
'''For Mac OSX 10.7 and lower'''<br />
Download and install Chicken of the VNC from [http://sourceforge.net/projects/cotvnc/ SourceForge].<br />
Start COTVNC, enter the following in the host window, and provide the VNC password you created during setup when prompted:<br />
<pre>localhost:5924</pre><br />
<br />
<br />
==== From Linux ====<br />
Linux users can use the command<br />
<pre><br />
vncviewer :24 <br />
</pre><br />
<br />
===== Shortcut for Linux Users =====<br />
Linux users can optionally skip the explicit SSH tunnel setup described above by using the -via argument to the vncviewer command. The "-via <gateway>" will set up the SSH tunnel implicitly. For the above example, the following command would be used:<br />
<pre><br />
vncviewer -via cheaha.rc.uab.edu :24<br />
</pre><br />
This option is preferred since it will also establish VNC settings that are more efficient for slow networks. See the man page for vncviewer for details on other encodings.<br />
<br />
==== From Windows ====<br />
Windows users should use whatever connection string is applicable to their VNC client. <br />
<br />
Remember to use "localhost" as the host address in your VNC client. You do this because you have already created the real connection to Cheaha using the SSH tunnel. The SSH tunnel "listens" on your local host and forwards all of your VNC traffic across the network to your VNC server on the cluster.<br />
<br />
== Using your Desktop ==<br />
Once a VNC session is established with the cluster desktop environment, you can use it to launch any graphical application on Cheaha or to open a GUI (X11) enabled SSH session with a remote system in the cluster. <br />
<br />
VNC can be particularly useful when you are trying to access an X Windows application from MS Windows, as a native X11 setup on Windows is typically more involved than the VNC setup above. For example, it is much easier to start an X11-based SSH session with a remote system on the cluster from the cluster desktop environment described above than to do a full X11 setup on Windows.<br />
<pre> <br />
$ ssh -X $USER@172.x.x.x<br />
</pre><br />
<br />
=== Performance Considerations for Slow Networks ===<br />
<br />
If the network you are using to connect to your VNC session is slow (e.g. wifi or off campus), you may be able to improve the responsiveness of the VNC session by adjusting simple desktop settings in your VNC desktop. The VNC screen needs to be repainted every time your desktop is modified, e.g. by opening or moving a window. Any bit of data you don't have to send will improve the drawing speed. Most modern desktops default to a pretty background picture. While nice to look at, these pictures contain lots of data. If you set your desktop background to a solid color (no gradients), the screen refresh will be much quicker (see System->Preferences->Desktop Background). Also, changing to a basic windowing theme will speed up screen refreshes (see System->Preferences->Themes->Mist).</div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=Template:Main_Banner&diff=5580Template:Main Banner2017-07-31T21:01:13Z<p>Tanthony@uab.edu: VNC issue</p>
<hr />
<div><!-- MAIN PAGE BANNER --><br />
<table id="mp-banner" style="width: 100%; margin:4px 0 0 0; background:none; border-spacing: 0px;"><br />
<tr><td class="MainPageBG" style="text-align:center; padding:0.2em; background-color:#cef2e0; border:2px solid #f2e0ce; color:#000; font-size:100%;"><br />
<br />
<span style="color:#009000"> '''<big></big>''' </span><br />
<br />
[[Image:information.png|left|link=]]<br />
<span>'''<big>Cheaha Summer Maintenance 2017 Complete</big>'''<br />
<br />
'''Maintenance window July 23rd - July 29th''' <br /><br />
For more information click [https://docs.uabgrid.uab.edu/wiki/Maintenance Maintenance Information] <br />
<br />
<big> [https://docs.uabgrid.uab.edu/wiki/VNC_issue VNC Black Screen Issue Resolution] </big><br />
</span><br />
<br />
<br />
</td><br />
</tr><br />
</table></div>Tanthony@uab.eduhttps://docs.uabgrid.uab.edu/w/index.php?title=VNC_issue&diff=5579VNC issue2017-07-31T20:59:28Z<p>Tanthony@uab.edu: VNC page issue</p>
<hr />
<div>VNC Black screen issue<br />
<br />
We work around it by changing your default desktop from GNOME to MATE.<br />
<br />
1. Terminate any existing vncserver processes (use "vncserver -list" to list any current sessions, and "vncserver -kill :3" to terminate a session; in this example it terminates the vncserver running on display :3, which corresponds to port 5903) <br />
<br />
2. Modify your "$HOME/.vnc/xstartup" script to look like the following (we've also updated the wiki https://docs.uabgrid.uab.edu/wiki/Setting_Up_VNC_Session#Configure_the_Cluster_Desktop): <br />
<br />
<pre><br />
#!/bin/sh <br />
<br />
# Start up the standard system desktop <br />
unset SESSION_MANAGER <br />
unset DBUS_SESSION_BUS_ADDRESS <br />
<br />
#exec /etc/X11/xinit/xinitrc <br />
/usr/bin/mate-session <br />
<br />
[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup <br />
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources <br />
xsetroot -solid grey <br />
vncconfig -iconic & <br />
x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" & <br />
x-window-manager & <br />
</pre><br />
<br />
3. Start a new "vncserver" session <br />
<br />
4. Create your SSH tunnel <br />
<br />
5. Connect using the VNC client of choice</div>Tanthony@uab.edu