Cheaha:Community Portal
HPC Services Plans
Mission
HPC Services is the division within the IT Infrastructure Services organization that focuses on HPC support for research and other HPC activities. HPC Services support includes HPC Cluster Support, Networking & Infrastructure, Middleware, and Academic Research Support. Research support here means specifically assisting or collaborating with grant activities that require IT resources. It may also include acquiring and managing high performance computing resources, such as Beowulf clusters and network storage arrays. HPC Services participates in institutional strategic planning and self-study as related to academic IT. HPC Services represents the Office of the Vice President of Information Technology on IT-related academic campus committees and on regional and national technology research organizations and committees, as requested.
Note: The term HPC is used to mean high performance computing, which has many definitions available on the web. At UAB, HPC generally refers to “computational facilities substantially more powerful than current desktop computers (PCs and workstations) …by an order of magnitude or better.” See http://parallel.hpc.unsw.edu.au/rks/docs/hpc-intro/node3.html for more description of this usage of HPC.
HPC Project Five Year Plan as of Summer 2006
Discussions between IT, CIS, and ETL to determine the best methods and associated costs for interconnecting the HPC clusters in campus buildings BEC and CH led to a preliminary draft of the scope and five-year plan for HPC at UAB. To ensure growth and stability of IT support for research computing, and to obtain wide support among academic researchers for a workable model, the mission of IT Academic Computing has been revised and the unit merged into a more focused unit within IT Network & Infrastructure Services under the name HPC Services. See Office of VP of IT Organization Chart.
- Scope: Building upon the existing UAB HPC resources in CIS and ETL, IT and campus researchers are setting a goal to establish a UAB HPC data center, whose operations will be managed by IT Infrastructure and which will include additional machine room space designed for HPC and equipped with a new cluster. The UAB HPC Data Center and HPC resource will be used by researchers throughout UAB, the UAS system, and other State of Alabama universities and research entities in conjunction with the Alabama Supercomputer Authority. Oversight of the UAB HPC resources will be provided by a committee made up of UAB Deans, Department Heads, Faculty, and the VPIT. Daily administration of this shared resource will be provided by the Department of Network and Infrastructure Services.
- Integrate the design, construction, and staffing of an HPC Data Center with overall IT plans.
- Secure funding for a new xxxxTeraFlop HPC Cluster. For example, HPCS will continue working with campus researchers in submitting proposals.
- Preliminary Timeline
- FY2007: Rename Academic Computing to HPCS and merge HPCS with Network and Infrastructure to leverage the HPC-related talents and resources of both organizations.
- FY2007: Connect existing HPC Clusters to each other and 10Gig backbone.
- FY2007: Bring up pilot grid identity management system – GridShib (HPCS, Network/Services)
- FY2007: Enable Grid Meta Scheduling (HPCS, CIS, ETL)
- FY2007: Establish Grid connectivity with SURA, UAS, and ASA.
- FY2007: Develop shared HPC resource policies.
- FY2008: Increase support staff as needed by reassigning legacy Mainframe technical resources
- FY2008: Develop requirements for expansion or replacement of older HPC clusters. xxxxTeraFlops.
- FY2008: Using HPC requirements (xxxx TeraFlops) for Data Center Design, begin design of HPC Data Center.
- FY2009: Secure Funding for new HPC Cluster xxxxTera Flops
- FY2010: Complete HPC Data Center Infrastructure.
- FY2010: Secure final funding for expansion or replacement of older HPC clusters.
- FY2011: Procure and deploy new HPC cluster. xxxxTeraFlops.
HPC Services Goals and Accomplishments for FY2007
Goals for FY2007
- GOAL 1: UAB Grid Computing Project
- Bring up a pilot of grid identity management based on GridShib software, which incorporates Shibboleth into the core grid software, Globus;
- Enable a grid meta-scheduling capability in collaboration with CIS and ETL so that UAB users will see a single interface for submission of HPC jobs running on primary clusters in ETL and CIS;
- Explore expanding the campus model for HPC to other campuses of UA System and to the Alabama Supercomputing Center.
- GOAL 2: InCommon / Shibboleth Project
- Work with Infrastructure and Network Services to coordinate new and expanding campus applications using Shibboleth;
- Evaluate establishing a second pilot Shibboleth application with other members of InCommon;
- Establish UAB grid as a UAB application offered to InCommon members; and
- Evaluate establishing pilot Shibboleth applications as an advanced technology demonstration of capabilities for inter-institutional user authentication and authorization for access to common workspace supporting calendar, document sharing, data sharing, and communication technologies for desktop.
- GOAL 3: Participation in External IT Groups within Alabama, the Region, and the US, such as UA System Collaborative Technology activities, Alabama Regional Optical Network, Internet2, SURAgrid, EDUCAUSE, Global Grid Forum, and Supercomputing
Accomplishments for FY2007
- GOAL 1: UAB Grid Computing Project
- Bring up a pilot of grid identity management based on GridShib software, which incorporates Shibboleth into the core grid software, Globus;
- IdM equipment ordered and operational, May 9, 2007
- GridShib installed - May 25, 2007
- UABgrid Login service operational – June 19, http://uabgrid.uab.edu/login
- UABgrid VO management service operational - target July 1
- UABgrid GridShib CA migration operational - target July 17
- Enable a grid meta-scheduling capability in collaboration with CIS and ETL so that UAB users will see a single interface for submission of HPC jobs running on primary clusters in ETL and CIS;
- SURA talk and demonstration – The GridWay meta-scheduler and an example research application, DynamicBLAST, were demonstrated at the SURAgrid all-hands meeting in collaboration with CIS
- UABgrid meta-scheduler operational - target July 17
- UABgrid Boot Camp being scheduled for mid-August
- Explore expanding the campus model for HPC to other campuses of UA System and to the Alabama Supercomputing Center.
- Bring up a pilot of grid identity management based on GridShib software, which incorporates Shibboleth into the core grid software, Globus;
- GOAL 2: InCommon / Shibboleth Project
- Work with Infrastructure and Network Services to coordinate new and expanding campus applications using Shibboleth;
- Evaluate establishing a second pilot Shibboleth application with other members of InCommon;
- Establish UAB grid as a UAB application offered to InCommon members; and
- UABgrid InCommon Application draft has been circulated for review and comment.
- Evaluate establishing pilot Shibboleth applications as an advanced technology demonstration of capabilities for inter-institutional user authentication and authorization for access to common workspace supporting calendar, document sharing, data sharing, and communication technologies for desktop.
- This is the research collaboration focus of UABgrid
- GOAL 3: Participation in External IT Groups within Alabama, the Region, and the US, such as UA System Collaborative Technology activities, Alabama Regional Optical Network, Internet2, SURAgrid, EDUCAUSE, Global Grid Forum, and Supercomputing
- Meetings attended since Oct 1, 2006: SC06, Internet2 Fall 06, SURAgrid All Hands (March), Internet2 Spring 07
- SURAgrid Governance: John-Paul Robinson has been elected to serve a one-year term on the inaugural SURAgrid GC
- SURAgrid working group: John-Paul Robinson is serving on the accounting systems working group
- CI-Team proposals: David L Shealy was a senior scientist on the large collaborative proposal submitted to NSF by Texas Tech University to present three two-day workshops on grid computing
- UAB Research Computing plans
- Developed IT CyberInfrastructure presentation for ASA campus visit on April 3, 2007
- Circulated IT research computing planning draft to the Office of VP of Research and Economic Development
Research Computing Web Pages
Campus Network
Research Network
Grid Computing
High Performance Computing
UAB Shared High Performance Computing Facility provides UAB researchers with UAB-wide shared software and hardware infrastructure and support for high performance parallel and distributed computing, numerical tools and information technology-based computing environments, and computational simulation. The facility is now a joint initiative used, supported, and funded by IT and multiple schools; it was initially jump-started by the School of Engineering in collaboration with the Schools of Medicine and Public Health. The current combined HPC performance of the facility is about 2.2 Teraflops. The facility is equipped with the following:
- IBM BlueGene L cluster with 2048 700 MHz processors with 512 MB of memory in each. The system has 13 terabytes of storage. This cluster should benchmark at 4.5 to 5 Teraflops.
- DELL Xeon 64-bit Linux Cluster (CHEAHA), which consists of 128 DELL PE1425 nodes, each with dual Xeon 3.6GHz processors and either 2GB or 6GB of memory. It uses a Gigabit Ethernet inter-node network connection. There are 4 Terabytes of disk storage available to this cluster. This cluster is rated at more than 1.1 Teraflops computing capacity.
- Verari Opteron 64-bit Linux Cluster (COOSA), which is a 64-node computing cluster consisting of dual AMD Opteron 242 processors, with 2GB of memory in each node. The nodes are interconnected with a Gigabit Ethernet network.
- IBM Linux Cluster (CAHABA), a highly scalable Linux cluster solution for high performance and commercial computing workloads. It is built from IBM x335 Series servers with a total of 128 processors (64 nodes, dual Xeon 2.4GHz, 2 to 4GB of memory in each node) and a 1 Terabyte storage unit. The nodes are interconnected with a Gigabit network.
- Supermicro Xeon 32-bit Linux Cluster, which is a 10-node visualization cluster consisting of Supermicro computers with dual Xeon 2.4GHz processors, 2GB of memory in each node, and 3 Terabytes of cumulative disk space.
- DNP Holo Screen Display (60”), a transparent display which allows viewers to look at and see through the screen and makes the image appear suspended in mid-air. It gives an impression of almost-3D depth.
- Passive Stereoscopic Display System (VisBox), which is a one-wall, fully integrated, projection-based VR system with head-tracking and stereo display. The screen is 10 feet diagonal, which makes it significantly more immersive than other much more expensive systems. The VisBox uses high-end Linux PCs and bright projectors. The footprint of a VisBox is 8’x8’, and it is a few inches shy of being 8 feet tall, making it close to an 8’x8’x8’ cube. With this system, researchers can visualize their data in a stereoscopic virtual environment. It is a passive stereo display system in an all-in-one unit with 2 polarizing LCD projectors and 2 mirrors, precision-mounted in a custom frame. A Linux PC drives this system with a high-end dual-headed graphics card. Users wear lightweight, inexpensive polarized eyeglasses and see a stereoscopic image.
- Tiled Display Wall System (VisWall) (8'x8' and 3x3 configurations), which is capable of a combined screen resolution of 3000x2300 pixels. It provides researchers with a display solution to visualize very large-scale images or data at an ultra-high resolution. A 10-node dual-processor Linux cluster with nVidia graphics cards drives the graphics applications for this nine-tile visualization wall. Each computer is connected to a projector that contributes 1024x768 screen resolution to the overall projection area. The software synchronizes images at the tile interface. A Linux PC console communicating through a high-speed Myrinet network drives the VisWall. This is a scalable solution: the number of tiles can be expanded to m x n to increase the combined resolution as the budget permits.
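The combined resolution quoted for the tiled display wall follows from the per-projector resolution and the tile grid. The short Python sketch below works through that arithmetic for the 3x3 wall and for a hypothetical larger m x n grid. It assumes the tiles abut with no overlap or edge blending, which the description above does not state; the function name and the 4x3 example are illustrative only.

```python
def combined_resolution(cols, rows, tile_width=1024, tile_height=768):
    """Return (width, height) in pixels of a cols x rows tiled display.

    Assumption: tiles abut exactly, with no overlap or edge blending.
    """
    return cols * tile_width, rows * tile_height


# Current 3x3 wall, one 1024x768 projector per tile.
width, height = combined_resolution(3, 3)
print(f"3x3 wall: {width}x{height} pixels, {width * height:,} total")

# Hypothetical expansion to a 4x3 grid (illustrative, not a planned configuration).
width, height = combined_resolution(4, 3)
print(f"4x3 wall: {width}x{height} pixels, {width * height:,} total")
```

Under these assumptions the 3x3 wall comes to 3072x2304 pixels, which matches the roughly 3000x2300 figure given above.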
Off-campus Resources
- ASA/AREN
- Internet2/NLR
- Alabama RON
- SURA
Tools and Support