Resources



Attention: Research Computing Documentation has Moved
https://docs.rc.uab.edu/


Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.


As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.

Thank you,

The Research Computing Team

The cyberinfrastructure supporting UAB investigators includes high performance computing clusters; high-speed storage systems; campus, state-wide, and regionally connected high-bandwidth networks; and conditioned space for hosting and operating HPC systems, research applications, and network equipment.

Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing Services group (UAB ITRCS) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high-performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span the UAB IT data centers in the 936 Building and the RUST Computer Center. UAB ITRCS, in open collaboration with community members, is leading the design and development of these resources. UAB IT’s Infrastructure Services group provides operational support and maintenance of these resources.

The facilities available to UAB researchers are described below. If you would like an account on the HPC system, please submit a request and provide a short statement on your intended use of the resources and your affiliation with the university.

UAB High Performance Computing (HPC) Clusters

Compute Resources

The compute resources are organized into a unified Research Computing System. The compute fabric for this system is anchored by the Cheaha cluster, a commodity cluster comprising several generations of hardware and approximately 3000 cores connected by low-latency Fourteen Data Rate (FDR) and Quad Data Rate (QDR) InfiniBand networks.

The different hardware generations are summarized in the following table and include:

  • Gen6: 96 2x12 core (2304 cores total) 2.5 GHz Intel Xeon E5-2680 v3 compute nodes with FDR InfiniBand interconnect. Out of the 96 compute nodes, 36 nodes have 128 GB RAM, 38 nodes have 256 GB RAM, and 14 nodes have 384 GB RAM. There are also four compute nodes with the Intel Xeon Phi 7210 accelerator cards and four compute nodes with the NVIDIA K80 GPUs.
  • Gen5: 12 2x8 core (192 cores total) 2.0 GHz Intel Xeon E2650 nodes with 96GB RAM per node and 10 Gbps interconnect dedicated to OpenStack and Ceph. (Sponsored by UAB IT)
  • Gen4: 3 2x8 core (48 cores total) 2.70 GHz Intel Xeon compute nodes with 384GB RAM per node (24GB per core), QDR InfiniBand interconnect. (Sponsored by Section on Statistical Genetics, School of Public Health)
  • Gen3: 48 2x6 core (576 cores total) 2.66 GHz Intel Xeon compute nodes with 48GB RAM per node (4GB per core), QDR InfiniBand interconnect. (Supported by NIH grant S10RR026723-01)
  • Gen2: 24 2x4 core (192 cores total) 3.0 GHz Intel Xeon compute nodes with 16GB RAM per node (2GB per core), DDR InfiniBand interconnect. (Sponsored by UAB IT) [set to be decommissioned by December 2016]
  • Gen1: 60 2-core (120 cores total) AMD 1.6GHz Opteron 64-bit compute nodes with 2GB RAM per node (1GB per core), and Gigabit Ethernet connectivity between the nodes. Gen1 decommissioned June 2013.

Generation  Type                   Nodes  CPUs per Node  Cores per CPU  Total Cores  Clock Speed (GHz)  Instructions per Cycle  Hardware Reference
Gen 6†      Intel Xeon E5-2680 v3  96     2              12             2304         2.50               16                      Intel Xeon E5-2680 v3
Gen 7††     Intel Xeon E5-2680 v4  18     2              14             504          2.40               16                      Intel Xeon E5-2680 v4
Gen 8       Intel Xeon E5-2680 v4  35     2              12             840          2.50               16                      Intel Xeon E5-2680 v3
Gen 9       Intel Xeon Gold 6248R  52     2              24             2496         3.0                16                      3.0GHz Intel Xeon Gold 6248R

Theoretical Peak Flops = (number of cores) * (clock speed) * (instructions per cycle)

Generation  Theoretical Peak Tera-FLOPS
Gen 6†      110
Gen 7††     358
Gen 8       TBD
Gen 9       TBD

† Includes four Intel Xeon Phi 7210 accelerators and four NVIDIA K80 GPUs.
†† Includes 72 NVIDIA Tesla P100 16GB GPUs.
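
As a rough illustration, the peak formula above can be applied to the CPU figures in the hardware table. This is a minimal sketch: the function and variable names are hypothetical, and it counts CPU cores only. The published Gen 6 and Gen 7 figures (110 and 358) are higher than the CPU-only values computed here, presumably because they also count the accelerators and GPUs listed in the notes under the table.

```python
# CPU-only theoretical peak, using the values from the hardware table.
# Each entry: generation -> (total cores, clock speed in GHz, instructions per cycle).
# All names here are illustrative, not part of any Cheaha tooling.
GENERATIONS = {
    "Gen 6": (2304, 2.50, 16),
    "Gen 7": (504, 2.40, 16),
    "Gen 8": (840, 2.50, 16),
    "Gen 9": (2496, 3.0, 16),
}


def peak_tflops(cores, clock_ghz, instructions_per_cycle):
    """Return theoretical peak in TFLOPS.

    cores * GHz * instructions-per-cycle yields GFLOPS; divide by 1000 for TFLOPS.
    """
    return cores * clock_ghz * instructions_per_cycle / 1000.0


def cpu_peaks(generations=GENERATIONS):
    """Map each generation name to its CPU-only theoretical peak in TFLOPS."""
    return {name: peak_tflops(*spec) for name, spec in generations.items()}


if __name__ == "__main__":
    for name, tflops in cpu_peaks().items():
        print(f"{name}: {tflops:.1f} TFLOPS (CPU only)")
```

Under these assumptions the CPU-only values come out to roughly 92 TFLOPS for Gen 6 and 19 TFLOPS for Gen 7. They are estimates from the table, not measured or official figures.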

Storage Resources

The compute nodes are backed by 180TB of high performance Lustre storage on DDN hardware and an additional 20TB available for home directories on a traditional Hitachi SAN and other ancillary services. In Fall 2013, UAB IT Research Computing acquired an OpenStack cloud and Ceph storage software fabric through a partnership between Dell and Inktank in order to extend cloud-computing solutions to the researchers at UAB and enhance the interfacing capabilities for HPC. This storage system provides an aggregate of roughly half a petabyte of raw storage distributed across 12 compute nodes, each with 16 cores, 96GB RAM, and 36TB of storage, connected by 10 Gigabit Ethernet networking. During 2016, as part of the Alabama Innovation Fund grant working in partnership with numerous departments, 6.6PB of raw GPFS storage on DDN SFA12KX hardware was added to meet the growing data needs of UAB researchers.
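
The aggregate figures for the OpenStack and Ceph fabric follow from the per-node numbers quoted above. The short sketch below only repeats that arithmetic; the variable names are illustrative, and decimal units (1 PB = 1000 TB) are assumed.

```python
# Per-node figures quoted in the paragraph above.
NODES = 12
CORES_PER_NODE = 16
STORAGE_TB_PER_NODE = 36

# Aggregate raw capacity and core count across the fabric.
ceph_raw_tb = NODES * STORAGE_TB_PER_NODE   # total raw storage in TB
ceph_raw_pb = ceph_raw_tb / 1000            # assumes 1 PB = 1000 TB
total_cores = NODES * CORES_PER_NODE        # total cores across the 12 nodes

print(f"Raw storage: {ceph_raw_tb} TB ({ceph_raw_pb:.2f} PB), cores: {total_cores}")
```

The result is 432 TB, or about 0.43 PB, which is the "roughly half a petabyte" quoted in the text, and 192 cores, which matches the Gen5 core count listed under Compute Resources.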

Network Resources

Research Network

UAB 10GigE Research Network: The UAB Research Network is currently a dedicated 10GE optical connection between the UAB Shared HPC Facility and the RUST Campus Data Center. It creates a multi-site facility housing the Research Computing System, which leverages the network for connecting storage and compute hosting resources. The network supports direct connection to high-bandwidth regional networks and the capability to connect data intensive research facilities directly with the high performance computing services of the Research Computing System. It can support very high speed, secure connectivity between attached nodes for transferring very large data sets without interfering with other traffic on the campus backbone, which ensures predictable latencies.
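
To give a sense of what a dedicated 10 Gbps link means for large data sets, the sketch below estimates an idealized transfer time. The function name and the `efficiency` parameter are hypothetical, introduced only for illustration; real throughput depends on protocol overhead, storage speed, and end-host tuning, none of which are described here.

```python
def transfer_seconds(size_bytes, link_gbps, efficiency=1.0):
    """Estimate the time in seconds to move size_bytes over a link.

    link_gbps is the nominal line rate in gigabits per second (decimal).
    efficiency is an assumed fraction of the line rate actually achieved
    (1.0 means an ideal link with no overhead).
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    one_tb = 1e12  # 1 TB in bytes, decimal units
    ideal = transfer_seconds(one_tb, 10)
    derated = transfer_seconds(one_tb, 10, efficiency=0.8)
    print(f"1 TB at 10 Gbps: {ideal / 60:.1f} min ideal, {derated / 60:.1f} min at 80% efficiency")
```

Under these assumptions, 1 TB takes about 13 minutes at the full line rate and about 17 minutes at 80% of it.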

Campus Network

Campus High Speed Network Connectivity: The campus network backbone is based on a 10 gigabit redundant Ethernet network with 480 gigabit/second backplanes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using gigabit Ethernet links over single mode optical fiber. Within multi-floor buildings, a gigabit Ethernet building backbone over multimode optical fiber is used, and Category 5 or better unshielded twisted pair wiring connects desktops to the network. Computer server clusters are connected to the building entrance using Gigabit Ethernet. Desktops are connected at 100 megabits/second (gigabit available when needed). The campus wireless network blankets classrooms, common areas, and most academic office buildings.

Regional Networks

Off-campus Network Connections: UAB connects to the Internet2 and National LambdaRail (NLR) high-speed research networks via the University of Alabama System Regional Optical Network (UASRON), a DWDM network owned and operated by the University of Alabama System that offers 10G Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center at Gigabit Ethernet speeds. The UASRON is currently being upgraded to support 100Gbps connections. UAB is also connected to other universities and schools through AREN (Alabama Research and Education Network). Connection to the commodity Internet is via Gigabit Ethernet, of which UAB currently uses approximately 1.2 gigabits per second (Gbps).

UAB was recently awarded the NSF CC*DNI Networking Infrastructure grant (CC-NIE-1541310) to establish a dedicated high-speed research network (UAB Science DMZ) with a 40Gbps networking core that provides researchers at UAB with 10Gbps connections from selected computers to the shared computational facility.

Regional and National Resources

Alabama Supercomputer Center (ASC)

The Alabama Supercomputer Center (ASC) (http://www.asc.edu) is a statewide resource located in Huntsville, Alabama. The ASC provides UAB investigators with access to a variety of high performance computing resources. These resources include:

  • The SGI UV (ULTRAVIOLET) has 256 Xeon E5-4640 CPU cores operating at 2.4 GHz and 4 TB of shared memory, and 182 terabytes in the GPFS storage cluster.
  • A Dense Memory Cluster (DMC) HPC system has 2216 CPU cores and 16 terabytes of distributed memory. Each compute node has a local disk (up to 1.9 terabytes of which are accessible as /tmp). Also attached to the DMC is a high performance GPFS storage cluster, which has 45 terabytes of high performance storage accessible as /scratch from each node. Home directories as well as third party applications use a separate GPFS volume and share 137 terabytes of storage. The machine is physically configured as a cluster of 8, 16, or 20 CPU core SMP boards. Ninety-six nodes have 2.26 GHz Intel quad-core Nehalem processors and 24 gigabytes of memory. Forty nodes have 2.3 GHz AMD 8-core Opteron Magny-Cours processors and 128 gigabytes of memory. Forty nodes have 2.5 GHz Intel 10-core Xeon Ivy Bridge processors and 128 gigabytes of memory.
  • A large number of software packages are installed supporting a variety of analyses including programs for Computational Structural Analysis, Design Analysis, Quantum Chemistry, Molecular Mechanics/Dynamics, Crystallography, Fluid Dynamics, Statistics, Visualization, and Bioinformatics.

Open Science Grid

UAB is a member of the SURAgrid Virtual Organization (SGVO) on the Open Science Grid (OSG) (http://opensciencegrid.org). The OSG is a national compute network consisting of nearly 80,000 compute cores aggregated across national facilities and contributing member sites. The OSG provides operational support for the interconnection middleware and facilitates research and operational engagement between members.