Resources


The Cyberinfrastructure supporting UAB investigators includes high performance computing clusters, high-speed storage systems, campus, state-wide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment.

Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high-throughput computing (HTC) paradigms. Cheaha is composed of resources that span the UAB IT data centers in the 936 Building and the RUST Computer Center, as well as an expansion to a commercial facility at DC BLOX in Birmingham. Research Computing, in open collaboration with community members, is leading the design and development of these resources.

A description of the facilities available to UAB researchers is included below. To get started using Cheaha, visit our Open OnDemand portal at https://rc.uab.edu. The portal is the primary entry point for Cheaha and provides access to all cluster services directly from your web browser, including graphical desktops, Jupyter Notebooks, and the traditional command line.

If you don't already have an account, you will be prompted to create one the first time you log into the portal. When creating an account, please share some of your interests in using Cheaha, as this helps us understand the science interests of our users.

Please note: usage of Cheaha is governed by UAB's Acceptable Use Policy (AUP) for computer resources. When you create an account, please provide a short statement on your intended use of the resources and your affiliation with the university. For more information on support requests, please see Support.

UAB High Performance Computing (HPC) Clusters

HPC (Batch) Compute Resources

The current compute fabric for this system is anchored by the Cheaha HPC cluster, a commodity cluster with 6144 cores connected by low-latency Fourteen Data Rate (FDR) and Enhanced Data Rate (EDR) InfiniBand networks.

The different hardware generations are described in the list below and summarized in the following table:

  • Gen10: (planned Sep 2021) 34 nodes with 2x64 core (4352 cores total) 2.0 GHz AMD Epyc 7713 Milan processors, each with 512GB RAM.
  • Gen9: 52 nodes with EDR InfiniBand interconnect: 2x24 core (2496 cores total) 3.0GHz Intel Xeon Gold 6248R compute nodes each with 192GB RAM.
  • Gen8: 35 2x12 core (840 cores total) 2.60GHz Intel Xeon Gold 6126 compute nodes, with 21 compute nodes at 192GB RAM, 10 nodes at 768GB RAM, and 4 nodes at 1.5TB of RAM.
  • Gen7: 18 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs, and EDR InfiniBand interconnect (supported by UAB, 2017).
  • Gen6: 96 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs (supported by UAB, 2015/2016).
Generation | Compute Type | Partition | Total CPUs | Total Memory (GB) | Total GPUs | Cores per Node | Memory per Node (GB) | Nodes | CPU Info | GPU Info
6 | cpu | cpu | 336 | 5376 | - | 24 | 384 | 14 | Intel Xeon E5-2680 v3 2.50 GHz | -
6 | cpu | cpu | 912 | 9728 | - | 24 | 256 | 38 | Intel Xeon E5-2680 v3 2.50 GHz | -
6 | cpu | cpu | 1056 | 5632 | - | 24 | 128 | 44 | Intel Xeon E5-2680 v3 2.50 GHz | -
7 | gpu | pascalnodes | 504 | 4608 | 72 | 28 | 256 | 18 | Intel Xeon E5-2680 v4 2.40 GHz | NVIDIA Tesla P100 16 GB
8 | cpu | cpu | 504 | 4032 | - | 24 | 192 | 21 | Intel Xeon E5-2680 v4 2.50 GHz | -
8 | high memory | largemem | 240 | 7680 | - | 24 | 768 | 10 | Intel Xeon E5-2680 v4 2.50 GHz | -
8 | high memory | largemem | 96 | 6144 | - | 24 | 1536 | 4 | Intel Xeon E5-2680 v4 2.50 GHz | -
9 | cpu | cpu | 2496 | 9984 | - | 48 | 192 | 52 | Intel Xeon Gold 6248R 3.00 GHz | -
10 | cpu | cpu | 4352 | 17408 | - | 128 | 512 | 34 | AMD Epyc 7713 2.00 GHz | -
TOTAL | | | 10496 | 70592 | 72 | | | 235 | |
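
The totals row follows directly from the per-node columns: total CPUs = cores per node × nodes, and total memory = memory per node × nodes. The short Python sketch below reproduces those sums; the row data is transcribed from the table above, and the tuple layout is simply an illustrative choice.

  # Recompute the per-generation totals in the hardware table above.
  # Totals are derived as cores_per_node * nodes and mem_per_node_gb * nodes.
  rows = [
      # (generation, cores_per_node, mem_per_node_gb, gpus_per_node, nodes)
      (6, 24, 384, 0, 14),
      (6, 24, 256, 0, 38),
      (6, 24, 128, 0, 44),
      (7, 28, 256, 4, 18),   # pascalnodes: 4x NVIDIA Tesla P100 per node
      (8, 24, 192, 0, 21),
      (8, 24, 768, 0, 10),
      (8, 24, 1536, 0, 4),
      (9, 48, 192, 0, 52),
      (10, 128, 512, 0, 34),
  ]

  total_cpus = sum(cores * nodes for _, cores, _, _, nodes in rows)
  total_mem_gb = sum(mem * nodes for _, _, mem, _, nodes in rows)
  total_gpus = sum(gpus * nodes for _, _, _, gpus, nodes in rows)
  total_nodes = sum(nodes for *_, nodes in rows)

  print(total_cpus, total_mem_gb, total_gpus, total_nodes)
  # Matches the TOTAL row above: 10496 70592 72 235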

Theoretical peak FLOPS = (number of cores) × (clock speed) × (instructions per cycle)

Generation | CPU TFLOPS per Node | GPU TFLOPS per Node | TFLOPS per Node | Nodes | TFLOPS
6 | 0.96 | - | 0.96 | 14 | 13.44
6 | 0.96 | - | 0.96 | 38 | 36.48
6 | 0.96 | - | 0.96 | 44 | 42.24
7 | 1.08 | 17.06 | 18.14 | 18 | 326.43
8 | 0.96 | - | 0.96 | 21 | 20.16
8 | 0.96 | - | 0.96 | 10 | 9.60
8 | 0.96 | - | 0.96 | 4 | 3.84
9 | 2.30 | - | 2.30 | 52 | 119.81
10 | 4.10 | - | 4.10 | 34 | 139.26
TOTAL | | | | | 711.26
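
The CPU TFLOPS-per-node values above follow from the peak FLOPS formula. Below is a minimal Python sketch of that arithmetic; the value of 16 double-precision floating point operations per core per cycle is an assumption chosen because it reproduces the CPU column of the table, and the Gen7 GPU contribution is taken from the table rather than derived.

  # Theoretical peak FLOPS = cores * clock speed * instructions per cycle.
  # FLOPS_PER_CYCLE = 16 is an assumption (FP64 with FMA on these CPUs)
  # chosen to reproduce the CPU TFLOPS-per-node column above.
  FLOPS_PER_CYCLE = 16

  def cpu_tflops_per_node(cores_per_node: int, clock_ghz: float) -> float:
      """Peak double-precision TFLOPS for one node."""
      return cores_per_node * clock_ghz * FLOPS_PER_CYCLE / 1000.0

  # (generation, cores_per_node, clock_ghz, node_count)
  node_groups = [
      (6, 24, 2.5, 14 + 38 + 44),
      (7, 28, 2.4, 18),          # plus ~17.06 TFLOPS/node from 4x Tesla P100
      (8, 24, 2.5, 21 + 10 + 4),
      (9, 48, 3.0, 52),
      (10, 128, 2.0, 34),
  ]

  for gen, cores, ghz, count in node_groups:
      per_node = cpu_tflops_per_node(cores, ghz)
      print(f"Gen{gen}: {per_node:.2f} CPU TFLOPS/node x {count} nodes "
            f"= {per_node * count:.2f} TFLOPS")

  # Gen6/8 ≈ 0.96, Gen7 ≈ 1.08, Gen9 ≈ 2.30, Gen10 ≈ 4.10 TFLOPS per node,
  # matching the table; adding the Gen7 GPU contribution gives ~711 TFLOPS total.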

The compute types above map to the SLURM partitions listed in the table below.

Name | Compute Type | Time Limit (hours) | Node Limit
interactive | cpu | 2 | 1
express | cpu | 2 | no limit
short | cpu | 12 | 44
medium | cpu | 50 | 44
long | cpu | 150 | 5
pascalnodes | gpu | 12 | no limit
pascalnodes-medium | gpu | 48 | no limit
largemem | high memory | 50 | 10
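
For illustration, the sketch below submits a hypothetical batch job to the short partition from a login node using SLURM's standard sbatch command. It assumes sbatch is on your PATH and Python 3 is available; the job name, resource requests, and wrapped command are placeholders rather than recommended settings, and GPU partitions (pascalnodes, pascalnodes-medium) require additional GPU request options not shown here.

  # Hypothetical sketch: submit a batch job to the "short" partition
  # (12-hour limit per the table above) using the standard sbatch CLI.
  import subprocess

  cmd = [
      "sbatch",
      "--job-name=example",      # illustrative job name
      "--partition=short",       # partition name from the table above
      "--time=12:00:00",         # must not exceed the partition's time limit
      "--ntasks=1",
      "--cpus-per-task=4",
      "--mem=16G",
      "--wrap=echo hello from $(hostname)",  # replace with your real workload
  ]

  result = subprocess.run(cmd, capture_output=True, text=True, check=True)
  print(result.stdout.strip())   # e.g. "Submitted batch job <jobid>"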

Cloud Compute Resources

Coming soon.

Container Compute Resources

Coming soon.

Storage Resources

In 2016, as part of an Alabama Innovation Fund grant awarded in partnership with numerous departments, 6.6PB of raw GPFS storage on DDN SFA12KX hardware was added to meet the growing data needs of UAB researchers. In Fall 2018, UAB IT Research Computing upgraded the 6PB GPFS storage backend to the next-generation DDN SFA14KX. This hardware improved HPC performance by increasing the speed at which research applications can access their data sets. In 2019, the SFA12KX was moved to the RUST data center, where it acts as a replication pair for the /data file system on the SFA14KX in 936.

Retired Storage Resources

In 2009, annual investment funds were directed toward establishing a fully connected dual data rate (DDR) InfiniBand network between the compute nodes added in 2008 and laying the foundation for a research storage system with a 60TB DDN storage system accessed via the Lustre distributed file system. In 2010, UAB was awarded an NIH Small Instrumentation Grant (SIG) to further increase analytical and storage capacity with an additional 120TB of high performance Lustre storage on DDN hardware (retired in 2016). In Fall 2013, UAB IT Research Computing acquired an OpenStack cloud and Ceph storage software fabric through a partnership between Dell and Inktank in order to extend cloud-computing solutions to researchers at UAB and enhance the interfacing capabilities for HPC. This storage system provided an aggregate of half a petabyte of raw storage distributed across 12 compute nodes, each with 16 cores, 96GB RAM, and 36TB of storage, connected together with 10 Gigabit Ethernet networking (pilot implementation retired in Spring 2017).

Network Resources

Research Network

The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility in 936 and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System (RCS). This network is being upgraded in 2021 to replace aging equipment and extend service to the DC BLOX data center. The new network provides a 200Gbps Ethernet backbone for east-west traffic connecting storage and compute hosting resources. The network supports direct connection to campus and high-bandwidth regional networks via 40Gbps Globus Data Transfer Nodes (DTNs), providing the capability to connect data-intensive research facilities directly to the high performance computing and storage services of the Research Computing System. This network supports high-speed, secure connectivity between its nodes for transferring very large data sets without interfering with other traffic on the campus backbone, ensuring predictable latencies. The Science DMZ interface with the DTNs includes perfSONAR measurement nodes and a Bro security node connected directly to the border router, providing a "friction-free" pathway to external data repositories as well as computational resources.


Campus Network

The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second backplanes on the core L2/L3 switch/routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single-mode optical fiber. Within multi-floor buildings, a gigabit Ethernet building backbone over multimode optical fiber is used, and Category 5 or better unshielded twisted pair wiring connects desktops to the network. Computer server clusters are connected to the building entrance using Gigabit Ethernet, and desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.

Regional Networks

UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, GA. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).

Historical Network Developments

UAB was awarded the NSF CC*DNI Networking Infrastructure grant (CC-NIE-1541310) in Fall 2016 to establish a dedicated high-speed research network (the UAB Science DMZ) built on a 40Gbps networking core, providing researchers at UAB with 10Gbps connections from selected computers to the shared computational facility.

Legacy Compute

  • Gen5: 12 2x8 core (192 cores total) 2.0 GHz Intel Xeon E5-2650 nodes with 96GB RAM per node and 10 Gbps interconnect, dedicated to OpenStack and Ceph (supported by UAB IT, 2012).
  • Gen4: 3 2x8 core (48 cores total) 2.70 GHz Intel Xeon compute nodes with 384GB RAM per node (24GB per core), QDR InfiniBand interconnect (supported by Section on Statistical Genetics, School of Public Health, 2012). This hardware collection was purchased by Dr. Hemant Tiwari of SSG. These nodes were given the code name "ssg" and were tagged as such in the node naming for the queue system. These nodes were tagged as "ssg-compute-0-#" in the ROCKS naming convention.
  • Gen3: 48 2x6 core (576 cores total) 2.66 GHz Intel Xeon compute nodes with 48GB RAM per node (4GB per core), QDR InfiniBand interconnect (supported by NIH grant S10RR026723-01, 2010). This hardware collection was purchased with a combination of the NIH SIG funds and some of the 2010 annual VPIT investment. These nodes were given the code name "sipsey" and were tagged as such in the node naming for the queue system. These nodes were tagged as "sipsey-compute-#-#" in the ROCKS naming convention. In 2014, 16 of the gen3 nodes (sipsey-compute-0-1 through sipsey-compute-0-16) were upgraded from 48GB to 96GB of memory per node.
  • Gen2: 24 2x4 core (192 cores total) 3.0 GHz Intel Xeon compute nodes with 16GB RAM per node (2GB per core), DDR InfiniBand interconnect (supported by UAB IT, 2008). This hardware collection was purchased exclusively with the annual VPIT funds allocation, approx $150K/yr for the 2008 and 2009 fiscal years. These nodes were sometimes confusingly called "cheaha2" or "cheaha" nodes. These nodes were tagged as "cheaha-compute-#-#" in the ROCKS naming convention.
  • Gen1: 60 2-core (120 cores total) AMD 1.6GHz Opteron 64-bit compute nodes with 2GB RAM per node (1GB per core), and Gigabit Ethernet connectivity between the nodes (supported by Alabama EPSCoR Research Infrastructure Initiative, NSF EPS-0091853, 2005). This was the original hardware collection purchased with NSF EPSCoR funds in 2005, approx $150K. These nodes were sometimes called the "Verari" nodes. These nodes were tagged as "verari-compute-#-#" in the ROCKS naming convention.