Welcome: Difference between revisions

From Cheaha
(→‎Personnel: update system admins)
(Update url for new docs site and use consistent text for obsolete reference.)
 
(2 intermediate revisions by the same user not shown)
{{Main_Banner}}
The introduction to UAB Research Computing resources has been moved to https://docs.rc.uab.edu .
Welcome to the '''Research Computing System'''


The Research Computing System (RCS) provides a framework for sharing data, accessing compute power, and collaborating with peers on campus and around the globe.  Our goal is to construct a dynamic "network of services" that you can use to organize your data, study it, and share outcomes.
The obsolete content of the original page can be found at [[Obsolete: Welcome]] for historical reference.
 
'''docs''' (the service you are looking at while reading this text) is one of a set of core services, or libraries, available for you to organize information you gather. Docs is a wiki, an online editor to collaboratively write and share documentation. ([http://en.wikipedia.org/wiki/Wiki Wiki is a Hawaiian term] meaning fast.)  You can learn more about '''docs''' on the page [[UnderstandingDocs]].  The docs wiki is filled with pages that document the many different services and applications available on the Research Computing System.  If you see information that looks out of date, please don't hesitate to [mailto:support@vo.uabgrid.uab.edu ask about it] or fix it.
 
The Research Computing System is designed to provide services to researchers in three core areas:
 
* '''Data Analysis''' - using the High Performance Computing (HPC) fabric we call [[Cheaha]] for analyzing data and running simulations. Many [[Cheaha_Software|applications are already available]], or you can install your own.
* '''Data Sharing''' - supporting the trusted exchange of information using virtual data containers to spark new ideas
* '''Application Development''' - providing virtual machines and web-hosted development tools empowering you to serve others with your research
 
== Support and Development ==
 
The Research Computing System is developed and supported by UAB IT's Research Computing Group.  We are also developing a core set of applications to help you to easily incorporate our services into your research processes and this documentation collection to help you leverage the resources already available. We follow the best practices of the Open Source community and develop the RCS openly.  You can follow our progress via the [http://dev.uabgrid.uab.edu our development wiki].
 
The Research Computing System is an outgrowth of the UABgrid pilot, launched in September 2007, which has focused on demonstrating the utility of unlimited analysis, storage, and application for research.  RCS is being built on the same technology foundations used by major cloud vendors and decades of distributed systems computing research, technology that powered the last ten years of large-scale systems serving prominent national and international initiatives like the [http://opensciencegrid.org/ Open Science Grid], [http://xsede.org XSEDE], [http://www.teragrid.org/ TeraGrid], the [http://lcg.web.cern.ch/LCG/ LHC Computing Grid], and [https://cabig.nci.nih.gov caBIG].
 
== Outreach ==
 
The UAB IT Research Computing Group has collaborated with a number of prominent research projects at UAB to identify use cases and develop the requirements for the RCS.  Our collaborators include the Center for Clinical and Translational Science (CCTS), Heflin Genomics Center, the Comprehensive Cancer Center (CCC), the Department of Computer and Information Sciences (CIS), the Department of Mechanical Engineering (ME), Lister Hill Library, the School of Optometry's Center for the Development of Functional Imaging, and Health System Information Services (HSIS).
 
As part of the process of building this research computing platform, the UAB IT Research Computing Group has hosted an annual campus symposium on research computing and cyber-infrastructure (CI) developments and accomplishments. Starting as CyberInfrastructure (CI) Days in 2007, the name was changed to [http://docs.uabgrid.uab.edu/wiki/UAB_Research_Computing_Day '''UAB Research Computing Day'''] in 2011 to reflect the broader mission to support research.  IT Research Computing also participates in other campus-wide symposia, including UAB Research Core Day.
 
== Featured Research Applications ==
 
The Research Computing Group also helps support the campus MATLAB license with self-service installation documentation and supports using MATLAB on the HPC platform, providing a pathway to expand your computational power and freeing your laptop from serving as a compute platform.
 
{{abox
| UAB MATLAB Information |
In January 2011, UAB acquired a site license from MathWorks for MATLAB, Simulink and 42 Toolboxes.
* Learn more about [[MATLAB|MATLAB and how you can use it at UAB]]
* Learn more about the [[UAB TAH license|UAB MathWorks site license]] and review [[Matlab site license FAQ|frequently asked questions about the license]]
}}
 
The UAB IT Research Computing group, the CCTS BMI, and [http://www.uab.edu/hcgs/bioinformatics Heflin Center for Genomic Science] have teamed up to help improve genomic research at UAB.  Researchers can work with the scientists and research experts to produce a research pipeline from sequencing, to analysis, to publication.
 
{{abox
|'''Galaxy'''|
A web front end to run analyses on the cluster fabric. Currently focused on NGS (Next Generation Sequencing; biology) analysis support.
* [[Galaxy|Galaxy Project Home]]
* [http://projects.uabgrid.uab.edu/galaxy Galaxy Development Wiki]
}}
 
== Data Backups ==
 
Users of Cheaha are solely responsible for backing up their files. This includes files under '''/data/user''', '''/data/project''', and '''/home'''.
 
{{ClusterDataBackup}}
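
Because backups are the user's responsibility, one simple approach is to archive a directory periodically and copy the archive to storage outside the cluster. The following is a minimal sketch using only the Python standard library; the function name <code>backup_directory</code> and the destination path are hypothetical and are not part of any Research Computing tooling or recommended procedure.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source, dest_dir):
    """Write a gzip-compressed tar archive of `source` into `dest_dir`.

    Returns the path of the archive that was created.
    """
    source = Path(source).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamped name so archives made at different times do not collide.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path.
        tar.add(source, arcname=source.name)
    return archive
```

For example, <code>backup_directory("/data/project/mylab", "/path/to/backup/location")</code> would archive a (hypothetical) project directory. The destination must not be inside the source directory, and the archive still has to be copied somewhere off the cluster to count as a backup.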
 
== Grant and Publication Resources ==
 
The following description may prove useful in summarizing the services available via Cheaha. Any publications that rely on computations performed on Cheaha should include a statement acknowledging the use of UAB Research Computing facilities; see the suggested example below.  We also request that you send us a list of publications based on your use of Cheaha resources.
 
=== Description of Cheaha for Grants (Short) ===
 
UAB IT Research Computing maintains high performance compute (HPC) and storage resources for investigators. The Cheaha compute cluster provides over 3744 conventional Intel CPU cores and 80 accelerators (including 72 NVIDIA P100 GPUs) interconnected via an EDR InfiniBand network and provides 528 TFLOP/s of aggregate theoretical peak performance. High performance 6.6PB raw GPFS storage on a DDN SFA14KX cluster, with site replication to a DDN SFA12KX cluster, is also connected to the compute nodes via an InfiniBand fabric. An additional 20TB of traditional SAN storage is also available for home directories. This general access compute fabric is available to all UAB investigators.
 
=== Description of Cheaha for Grants (Detailed) ===
 
The Cyberinfrastructure supporting University of Alabama at Birmingham (UAB) investigators includes high performance computing clusters, storage, campus, statewide and regionally connected high-bandwidth networks, and conditioned space for hosting and operating HPC systems, research applications and network equipment.
 
==== Cheaha HPC system ====
 
Cheaha is a campus HPC resource dedicated to enhancing research computing productivity at UAB. Cheaha is managed by UAB Information Technology's Research Computing group (RC) and is available to members of the UAB community in need of increased computational capacity. Cheaha supports high performance computing (HPC) and high throughput computing (HTC) paradigms. Cheaha is composed of resources that span two UAB campus IT data centers, in the 936 Building and the RUST Computer Center, and a commercial data center at DC BLOX in Birmingham. Research Computing, in open collaboration with the campus research community, is leading the design and development of these resources.
 
==== Compute Resources ====
 
Cheaha provides users with both a web-based interface, via Open OnDemand, and a traditional command-line interactive environment, via SSH.  These interfaces provide access to many scientific tools that can leverage a dedicated pool of local compute resources via the SLURM batch scheduler. The local compute pool provides access to five generations of compute hardware based on the x86 64-bit architecture. Gen6 (2015-2016) includes 96 nodes: 2x12 core (2304 cores total) 2.5GHz Intel Xeon E5-2680 v3 compute nodes with an FDR InfiniBand interconnect. Of the 96 compute nodes, 36 nodes have 128GB RAM, 38 nodes have 256GB RAM, and 14 nodes have 384GB RAM. There are also four compute nodes with Intel Xeon Phi 7210 accelerator cards and four compute nodes with NVIDIA K80 GPUs. Gen7 (2017) is composed of 18 nodes: 2x14 core (504 cores total) 2.4GHz Intel Xeon E5-2680 v4 compute nodes with 256GB RAM, four NVIDIA Tesla P100 16GB GPUs per node, and an EDR InfiniBand interconnect. Gen8 (2019) is composed of 35 nodes with an EDR InfiniBand interconnect: 2x12 core (840 cores total) 2.6GHz Intel Xeon Gold 6126 compute nodes, with 21 nodes at 192GB RAM, 10 nodes at 768GB RAM, and 4 nodes at 1.5TB RAM. Gen9 (available Q2 2021) is composed of 52 nodes with an EDR InfiniBand interconnect: 2x24 core (2496 cores total) 3.0GHz Intel Xeon Gold 6248R compute nodes, each with 192GB RAM. The compute nodes combine to provide over 600 TFLOP/s of dedicated computing power.
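
The per-generation core totals quoted above follow from nodes × sockets × cores per socket. The sketch below reproduces that arithmetic from the node counts given in the paragraph; the dictionary layout and function names are illustrative only and do not correspond to any Cheaha tool.

```python
# Node counts and per-socket core counts are copied from the paragraph above.
# The structure and names here are illustrative assumptions.
GENERATIONS = {
    "Gen6": {"nodes": 96, "sockets": 2, "cores_per_socket": 12},
    "Gen7": {"nodes": 18, "sockets": 2, "cores_per_socket": 14},
    "Gen8": {"nodes": 35, "sockets": 2, "cores_per_socket": 12},
    "Gen9": {"nodes": 52, "sockets": 2, "cores_per_socket": 24},
}


def cores(gen):
    """Total CPU cores for one hardware generation."""
    return gen["nodes"] * gen["sockets"] * gen["cores_per_socket"]


def total_cores(generations):
    """Sum of CPU cores across all listed generations."""
    return sum(cores(gen) for gen in generations.values())


if __name__ == "__main__":
    for name, gen in GENERATIONS.items():
        print(f"{name}: {cores(gen)} cores")
    print(f"Listed generations combined: {total_cores(GENERATIONS)} cores")
```

The four listed generations yield 2304, 504, 840, and 2496 cores respectively, matching the figures in the text.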
 
In addition, UAB researchers have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and Open Science Grid (OSG).
 
==== Cloud Resources ====
 
Research Computing has operated a development OpenStack cloud resource since 2019.  This platform has been used to support application development and DevOps processes for research labs across campus.  In 2021 a production implementation of this cloud platform will be made available to researchers on campus.  This fabric is composed of five Dell R640 compute nodes, each with 48 cores and 192GB RAM, for a total of 240 cores and 960GB RAM of standard cloud compute resources.  In addition, the fabric will feature four NVIDIA DGX A100 nodes that include 8 A100 GPUs and 1TB of RAM each.  All of these resources will be available to the research community for provisioning on demand via the OpenStack services (Ussuri release).  The production implementation will further support researchers making their hosted services available beyond campus while adhering to standard campus network security practices.  This off-campus access feature has not been available via the development cloud.
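
The aggregate cloud figures can be checked the same way. This is a minimal sketch using the per-node numbers stated above; the constant and function names are hypothetical, and the combined A100 GPU count is derived here rather than quoted from the text.

```python
# Per-node figures taken from the paragraph above; names are illustrative.
STANDARD_NODES = 5
CORES_PER_STANDARD_NODE = 48
RAM_GB_PER_STANDARD_NODE = 192

DGX_NODES = 4
GPUS_PER_DGX_NODE = 8


def cloud_totals():
    """Aggregate capacity of the production cloud fabric as described."""
    return {
        "standard_cores": STANDARD_NODES * CORES_PER_STANDARD_NODE,
        "standard_ram_gb": STANDARD_NODES * RAM_GB_PER_STANDARD_NODE,
        "a100_gpus": DGX_NODES * GPUS_PER_DGX_NODE,
    }


if __name__ == "__main__":
    for key, value in cloud_totals().items():
        print(f"{key}: {value}")
```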
 
==== Storage Resources ====
 
The compute nodes on Cheaha are backed by high performance, 6.6PB GPFS raw storage on DDN SFA14KX hardware connected via an EDR/FDR InfiniBand fabric. The non-scratch files on the GPFS cluster are replicated to 6.0PB raw storage on a DDN SFA12KX located in the RUST data center to provide site redundancy. An additional 10TB of traditional SAN storage is also available for home directories.
 
Three new storage fabrics will come online in 2021.  All three storage fabrics are based on Ceph, with different hardware configurations to address different usage scenarios.  The fabrics are a 6.9PB archive storage fabric built using 12 Dell DSS7500 nodes, an expanded 1.3PB nearline storage fabric built with 14 Dell 740xd nodes, and a 248TB SSD cache storage fabric built with 8 Dell 840 nodes.
 
==== Network Resources ====
 
The UAB Research Network is currently a dedicated 40GE optical connection between the UAB Shared HPC Facility in 936 and the RUST Campus Data Center to create a multi-site facility housing the Research Computing System (RCS).  This network is being upgraded in 2021 to replace aging equipment and extend service to the DC BLOX data center.  The new network provides a 200Gbps Ethernet backbone for East-West traffic connecting storage and compute hosting resources. The network supports direct connection to campus and high-bandwidth regional networks via 40Gbps Globus Data Transfer Nodes (DTNs), providing the capability to connect data intensive research facilities directly with the high performance computing and storage services of the Research Computing System. This network can support very high speed secure connectivity between the nodes connected to it, allowing high speed file transfer of very large data sets without interfering with other traffic on the campus backbone and ensuring predictable latencies. The Science DMZ interface with the DTNs includes perfSONAR measurement nodes and a Bro security node connected directly to the border router that provide a "friction-free" pathway to access external data repositories as well as computational resources.
 
The campus network backbone is based on a 40 gigabit redundant Ethernet network with 480 gigabit/second back-planes on the core L2/L3 Switch/Routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas and most academic office buildings.
 
UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM Network offering 100Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, Ga. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).
 
==== Personnel ====
 
UAB IT Research Computing currently maintains a support staff of 10, led by the Assistant Vice President for Research Computing, that includes an HPC Architect-Manager, four software developers, two scientists, two system administrators, and a project coordinator.
 
=== Acknowledgment in Publications ===
 
{{Grant_Ack}}

Latest revision as of 20:11, 31 August 2022

The introduction to UAB Research Computing resources has been moved to https://docs.rc.uab.edu .

The obsolete content of the original page can be found at Obsolete: Welcome for historical reference.