Resources: Difference between revisions

Revision as of 16:36, 14 August 2013

The Cyberinfrastructure supporting UAB investigators includes high performance computing clusters, campus, state-wide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment. The facilities available to UAB researchers are described below.

In summary, three general access compute fabrics are available for data analysis and HPC to researchers on campus via the Cheaha cluster. If you would like an account on Cheaha, please submit a request and provide a short statement on your intended use of the resources and your affiliation with the university. Other compute resources may be available to research groups who maintain their own facilities. Additionally, the Alabama Supercomputer Authority provides state-wide access to compute clusters.

Compute Resources

UAB Shared High Performance Computing (HPC) Facility

The Shared HPC Facility provides conditioned data center space to house UAB's general access compute fabrics. This data center is housed in the School of Engineering and is a joint UAB IT and multi-school operation. The HPC equipment in the facility is supported by extramural and institutional funds through a number of collaborations including the Schools of Public Health, Medicine, and Engineering. The operations provide UAB with a shared software and hardware infrastructure along with staff support for the high performance parallel and distributed computing, numerical tools and information technology components.

UAB High Performance Computing (HPC) Clusters

The shared facility compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, a commodity cluster with three generations of hardware with a total of 816 cores connected by low-latency quad data rate (QDR) and dual data rate (DDR) Infiniband networks.

The Cheaha cluster is rated at more than 6 Teraflops computing capacity. The three hardware generations are summarized in the following table and include:

  • Gen4: 3 2x8 core (48 cores total) 2.70 GHz Intel Xeon compute nodes with 384GB RAM per node (24GB per core), QDR Infiniband access to a high-perf Lustre file system running on a 240TB (180TB usable) DDN disk array. (Sponsored by School of Public Health Section on Statistical Genetics)
  • Gen3: 48 2x6 core (576 cores total) 2.66 GHz Intel Xeon compute nodes with 48GB RAM per node (4GB per core), QDR Infiniband, ScaleMP, and Infiniband access to a high-perf Lustre file system running on a 180TB DDN disk array. (Supported by NIH grant S10RR026723-01.)
  • Gen2: 24 2x4 core (192 cores total) 3.0 GHz Intel Xeon compute nodes with 16GB RAM per node (2GB per core), DDR Infiniband interconnect and Infiniband access to a high-perf Lustre file system running on a 180TB DDN disk array. (Sponsored by UAB IT)
  • Gen1 (decommissioned June 2013): 60 2-core (120 cores total) AMD 1.6GHz Opteron 64-bit compute nodes with 2GB RAM per node (1GB per core), and Gigabit Ethernet connectivity between the nodes and to the high-perf Lustre file system running on a 180TB DDN disk array.
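
The per-generation core counts and the 816-core cluster total can be cross-checked with a little arithmetic. The sketch below assumes that "2x8 core" means two CPUs of eight cores each per node, and it leaves out the decommissioned Gen1 nodes; the dictionary and function names are illustrative only.

```python
# Active hardware generations as listed above:
# (nodes, CPUs per node, cores per CPU, RAM per node in GB).
# Assumption: "2x8 core" is read as 2 CPUs x 8 cores per node.
ACTIVE_GENERATIONS = {
    "Gen4": (3, 2, 8, 384),
    "Gen3": (48, 2, 6, 48),
    "Gen2": (24, 2, 4, 16),
}


def total_cores(nodes, cpus_per_node, cores_per_cpu):
    """Cores contributed by one hardware generation."""
    return nodes * cpus_per_node * cores_per_cpu


def ram_per_core_gb(cpus_per_node, cores_per_cpu, ram_per_node_gb):
    """RAM available per core on a single node, in GB."""
    return ram_per_node_gb / (cpus_per_node * cores_per_cpu)


cluster_total = 0
for name, (nodes, cpus, cores, ram) in ACTIVE_GENERATIONS.items():
    gen_cores = total_cores(nodes, cpus, cores)
    cluster_total += gen_cores
    per_core = ram_per_core_gb(cpus, cores, ram)
    print(f"{name}: {gen_cores} cores, {per_core:g} GB RAM per core")

print(f"Cluster total: {cluster_total} cores")
```

Under that reading, the three active generations sum to the 816 cores quoted for the cluster.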
| Generation | Type | Nodes | CPUs per Node | Cores Per CPU | Total Cores | Clock Speed (GHz) | Instructions Per Cycle | Hardware Reference |
|---|---|---|---|---|---|---|---|---|
| Gen 6 | Intel Xeon E5-2680 v3 | 96 | 2 | 12 | 2304 | 2.50 | 16 | Intel Xeon E5-2680 v3 |
| Gen 7†† | Intel Xeon E5-2680 v4 | 18 | 2 | 14 | 504 | 2.40 | 16 | Intel Xeon E5-2680 v4 |
| Gen 8 | Intel Xeon E5-2680 v4 | 35 | 2 | 12 | 840 | 2.50 | 16 | Intel Xeon E5-2680 v3 |
| Gen 9 | Intel Xeon Gold 6248R | 52 | 2 | 24 | 2496 | 3.0 | 16 | 3.0GHz Intel Xeon Gold 6248R |

Theoretical Peak Flops = (number of cores) * (clock speed) * (instructions per cycle)

| Generation | Theoretical Peak Tera-FLOPS |
|---|---|
| Gen 6 | 110 |
| Gen 7†† | 358 |
| Gen 8 | TBD |
| Gen 9 | TBD |

Includes four Intel Xeon Phi 7210 accelerators and four NVIDIA K80 GPUs.
†† Includes 72 NVIDIA Tesla P100 16GB GPUs.
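
As a rough illustration, the theoretical peak formula above can be applied to the core counts and clock speeds in the hardware table. This is a minimal sketch that counts CPU cores only; the function and dictionary names are illustrative. The published figures for Gen 6 and Gen 7 are higher than the CPU-only results, which may reflect the accelerators and GPUs mentioned in the notes, although the page does not say so explicitly.

```python
# CPU-only theoretical peak, following the formula quoted above:
# peak FLOPS = (number of cores) * (clock speed) * (instructions per cycle)
# Values are copied from the hardware table:
# (total cores, clock speed in GHz, instructions per cycle).
GENERATIONS = {
    "Gen 6": (2304, 2.50, 16),
    "Gen 7": (504, 2.40, 16),
    "Gen 8": (840, 2.50, 16),
    "Gen 9": (2496, 3.0, 16),
}


def peak_teraflops(total_cores, clock_ghz, instructions_per_cycle):
    """Theoretical peak in TFLOPS for the CPU cores alone."""
    gigaflops = total_cores * clock_ghz * instructions_per_cycle
    return gigaflops / 1000.0


for name, spec in GENERATIONS.items():
    print(f"{name}: {peak_teraflops(*spec):.2f} TFLOPS (CPU only)")
```

Because the clock speed is given in GHz, the product is in GFLOPS, and dividing by 1000 converts it to TFLOPS.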

Regional and National Resources

Alabama Supercomputer Center (ASC)

Alabama Supercomputer Center (ASC) (http://www.asc.edu) is a state-wide resource located in Huntsville, Alabama. The ASC provides UAB investigators with access to a variety of high performance computing resources. These resources include:

  • An SGI Altix Cluster has 162 CPU cores, 1340 GB of shared memory, and 19 terabytes in the Panasas file system. Each CPU is a 64 bit Intel Itanium 2 processor. The system consists of a SGI Altix 350 front end node with 1.4 GHz processors and Altix 450 nodes with dual core 1.6 GHz and 1.67 GHz processors. This gives the entire system a floating point performance of 1035 GigaFLOPS. Sets of 6 to 72 CPUs are grouped together into shared memory nodes. There are multiple networks connecting the processors. These include: NUMAlink for sharing memory, Infiniband for file system access, gigabit ethernet for internet connectivity, and a secondary ethernet connection as a redundant fail over and management network.
  • A Dense Memory Cluster (DMC) HPC system has 1800 CPU cores and 10 terabytes of distributed memory. Each compute node has a local disk (up to 1.9 terabytes of which are accessible as /tmp). Also attached to the DMC is a high performance Panasas file server, which has 17 terabytes of high performance storage accessible as /scratch from each node. Home directories as well as third party applications use a separate Panasas Filesystem and share 47 terabytes of storage. The machine is physically configured as a set of 8 or 16 CPU core SMP boards. Forty nodes have 2.3 GHz quad-core AMD Opterons and 64 gigabytes of memory. Ninety-six nodes have 2.26 GHz Intel quad-core Nehalem processors. Forty nodes have 2.3 GHz AMD 8-core Opteron processors and 128 gigabytes of memory. The DMC has sixteen GPU (Graphic Processing Unit) chips. These are a combination of: two Tesla S1070 units (external GPUs connected in pairs to four DMC nodes); four DMC nodes configured with a pair of Tesla M2070 cards each. These multicore GPU chips are similar to those in video cards, but are installed as math coprocessors.
  • A large number of software packages are installed supporting a variety of analyses including programs for Computational Structural Analysis, Design Analysis, Quantum Chemistry, Molecular Mechanics/Dynamics, Crystallography, Fluid Dynamics, Statistics, Visualization, and Bioinformatics.

Open Science Grid

UAB is a member of the SURAgrid Virtual Organization (SGVO) on the Open Science Grid (OSG) (http://opensciencegrid.org). This is a national compute network consisting of nearly 80,000 compute cores aggregated across national facilities and contributing member sites. The OSG provides operational support for the interconnection middleware and facilitates research and operational engagement between members.

UAB Network Infrastructure

Network Resources

Research Network

UAB 10GigE Research Network. The UAB Research Network is currently a dedicated 10GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center. It creates a multi-site facility housing the Research Computing System, which leverages the network to connect storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. It can support very high speed, secure connectivity between attached nodes for transferring very large data sets without interfering with other traffic on the campus backbone, which ensures predictable latencies.
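
To give a sense of scale for large transfers, the sketch below estimates the best-case time to move a data set over a 10GE link compared with a Gigabit Ethernet link. It assumes the full line rate is available and ignores protocol overhead, disk speed and contention, so real transfers will take longer; the 1 TB data set size and the function name are hypothetical.

```python
def ideal_transfer_seconds(size_bytes, link_gbps):
    """Best-case transfer time at full line rate (no overhead, no contention)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


ONE_TB = 10**12  # hypothetical data set size: one decimal terabyte

for label, gbps in [("10GE research network", 10), ("Gigabit Ethernet link", 1)]:
    seconds = ideal_transfer_seconds(ONE_TB, gbps)
    print(f"{label}: {seconds / 60:.1f} minutes")
```

At these idealized rates the 10GE link needs about a tenth of the time of a Gigabit Ethernet link for the same data set.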

Campus Network

Campus High Speed Network Connectivity. The campus network backbone is based on a 10 gigabit redundant Ethernet network with 480 gigabit/second backplanes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Within multi-floor buildings, a gigabit Ethernet building backbone over multimode optical fiber is used and Category 5 or better, unshielded twisted pair wiring connects desktops to the network. Computer server clusters are connected to the building entrance using Gigabit Ethernet. Desktops are connected at 100 megabits/second speed (gigabit available when needed). The campus wireless network blankets classrooms, common areas and most academic office buildings.

Regional Networks

Off-campus Network Connections. UAB connects to the Internet2 and National LambdaRail (NLR) high-speed research networks via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 10G Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center at Gigabit Ethernet speeds. UAB is also connected to other universities and schools through AREN (Alabama Research and Education Network). Connection to the commodity Internet is via Gigabit Ethernet, of which UAB currently uses approximately 1.2 gigabits per second (Gbps).