UAB IT HPC Update January 2009

Cheaha, a community resource dedicated to enhancing research computing productivity at UAB, has reached a significant milestone that incorporates several incremental updates. With the acquisition of new hardware in August of last year and the addition of grid computing technologies, Cheaha is now able to harness an expanded compute pool of local and remote resources.

This milestone reflects the collaboration of many individuals and includes contributions of expertise from many organizations, most notably, Computer and Information Sciences, Mechanical Engineering, the Section on Statistical Genetics, and the Alabama Supercomputing Authority.

These updates to Cheaha represent a significant performance upgrade, providing almost 3 TeraFlops of dedicated compute power, and introduce advanced software and networking tools to harness the power of grid computing. Researchers can now explore the development of scientific workflows that leverage the full complement of their available resources via the GridWay scheduler and 10 Gigabit connectivity between clusters.

These new features are, of course, available alongside the familiar SGE scheduler environment to ensure that all users can continue leveraging Cheaha for their established operations without interruption.

Cheaha documentation is maintained on-line and includes an overview of its services, links to get started using this resource, and other useful information. Please bookmark the Cheaha documentation URL for future reference.

http://docs.uabgrid.uab.edu/wiki/Cheaha

A summary of these updates and some guidance on migration follow.

Cheaha is a resource for research computing developed and sponsored by UAB IT and located in the UAB Shared Computing Facility. Operational support for Cheaha is provided by the UAB School of Engineering cluster support group.

Hostname Change
Please update your client software to use the new official host name to access Cheaha:

cheaha.uabgrid.uab.edu

The previous host name will be mapped to this host name to support the transition for existing users. If you continue to use the old host name, the name mapping may trigger a warning in your secure shell software about the host fingerprint having changed.

To ensure that you are connecting to the legitimate host, you can verify that the fingerprint presented by your secure shell client for cheaha.uabgrid.uab.edu matches:

d4:2e:cc:12:95:a2:39:cc:b7:2c:d8:97:37:75:e9:6f
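
As a minimal sketch of the check described above, the following shell function compares a fingerprint reported by your secure shell client against the published value. The function name fingerprint_matches is hypothetical and not part of any Cheaha tooling; how you obtain the reported fingerprint depends on your client.

```shell
#!/bin/sh
# Published fingerprint for cheaha.uabgrid.uab.edu (copied from this announcement).
EXPECTED="d4:2e:cc:12:95:a2:39:cc:b7:2c:d8:97:37:75:e9:6f"

# fingerprint_matches REPORTED
# Hypothetical helper: succeeds (exit status 0) only if REPORTED equals
# EXPECTED, ignoring letter case and an optional leading "MD5:" label.
fingerprint_matches() {
    reported=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/^md5://')
    [ "$reported" = "$EXPECTED" ]
}
```

For example, fingerprint_matches "MD5:d4:2e:cc:12:95:a2:39:cc:b7:2c:d8:97:37:75:e9:6f" succeeds, while any other value fails. Assuming a recent OpenSSH client, which shows SHA256 fingerprints by default, connecting with ssh -o FingerprintHash=md5 cheaha.uabgrid.uab.edu should display the colon-separated form used here.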

Hardware Updates
The August 2008 hardware acquisition migrated Cheaha's core infrastructure to a Dell blade clustering solution. This transition provides a 3-fold increase in processor density over the original hardware and enables more computing power to be located in the same physical space with room for expansion, an important consideration in light of the continued growth in processing demand.

The new head node and the 24 new compute nodes each contain 2 quad-core 3.0GHz Intel x86-64 processors with 16GB of RAM per node. This translates into a processing pool of 192 cores with 2GB of RAM per core and represents 2.3 TeraFlops of raw compute power, a 6-fold increase over the original hardware.
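
The figures above can be reproduced with a short calculation. This is a sketch only: the value of 4 floating-point operations per core per clock cycle is an assumption that is not stated in this announcement, and the pool counts the 24 compute nodes without the head node.

```shell
#!/bin/sh
nodes=24              # new compute nodes (head node not counted in the pool)
sockets_per_node=2    # processors per node
cores_per_socket=4    # quad-core
clock_mhz=3000        # 3.0GHz
ram_gb_per_node=16
flops_per_cycle=4     # ASSUMPTION: not given in the announcement

cores=$((nodes * sockets_per_node * cores_per_socket))
ram_gb_per_core=$((ram_gb_per_node / (sockets_per_node * cores_per_socket)))
peak_gflops=$((cores * clock_mhz * flops_per_cycle / 1000))

echo "cores=$cores ram_per_core=${ram_gb_per_core}GB peak=${peak_gflops}GFlops"
```

Under that assumption the script reports 192 cores, 2GB of RAM per core, and a theoretical peak of 2304 GFlops, which rounds to the 2.3 TeraFlops quoted above.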

In a change from previous cluster upgrades, the original compute hardware is being folded into the system alongside the new hardware. This expands the total pool of compute nodes and continues to leverage hardware that can contribute usefully to the overall performance of scientific workflows.

Software Updates
The introduction of the GridWay scheduler enables users to manage the distribution of compute jobs across multiple clusters through a single interface. This can significantly increase the total compute capacity available by enabling users to access all the available compute cycles of participating clusters (currently Cheaha, Olympus, Everest, and Ferrum, with others to be added). This can lead to simplified workflows and maximized use of compute resources. More details on GridWay are available here:

http://docs.uabgrid.uab.edu/wiki/Cheaha#GridWay

The system software has also been upgraded to CentOS 5 with new versions of the GNU and Intel compilers. Any dynamically linked application codes will need to be recompiled. Statically linked codes should continue to operate without changes. More details on the available software can be found here:

http://docs.uabgrid.uab.edu/wiki/Cheaha#Software

Network Updates
Cheaha is connected to the UAB Research Network, which provides a 10Gbps networking backplane for clusters in the UAB Shared Computing Facility and the Department of Computer and Information Sciences HPC Center. At present only Cheaha and Ferrum (a CIS cluster) are connected via 10Gbps interfaces. Data transfer rates of almost 8Gbps between these hosts have been demonstrated using GridFTP, a multi-channel file transfer service that is used by GridWay to move data between clusters as part of its job management operations. This performance promises very efficient job management and the seamless integration of other clusters as connectivity to the research network is expanded.

Account Migration
The storage systems from the original Cheaha system are scheduled for decommission. This affects the home directories of all users who had accounts on the original Cheaha system. Affected users should copy any data they want to preserve from their old home directories to their current home directory on Cheaha.

As a convenience, the old home directories are accessible directly via the file system of the new system. Users should copy any files they wish to preserve from their old home directories located at /u/$LOGNAME to their current home directory (located at $HOME).

The following command will copy all files to a folder called "oldcheaha" under your current home directory:

cp -r /oldcheaha/$LOGNAME $HOME/oldcheaha

Users are encouraged, however, to use this opportunity to clean out unused files and selectively copy only data they wish to preserve.
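
For the selective approach encouraged above, one possible sketch is a small shell function that copies only the items you name. The function name copy_selected is hypothetical, and it expects names of files or directories at the top level of the old home directory.

```shell
#!/bin/sh
# copy_selected SRC DEST ITEM...
# Hypothetical helper: copies only the named top-level files or directories
# from SRC into DEST, preserving timestamps and permissions (cp -p).
# Items that do not exist in SRC are reported on stderr and skipped.
copy_selected() {
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for item in "$@"; do
        if [ -e "$src/$item" ]; then
            cp -rp "$src/$item" "$dest/" || return 1
        else
            echo "skipping missing item: $src/$item" >&2
        fi
    done
}
```

A call such as copy_selected "/oldcheaha/$LOGNAME" "$HOME/oldcheaha" projects results.txt would copy just those two items, where "projects" and "results.txt" are placeholder names. Running du -sh on the old directory first can help identify large data that is no longer needed.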

Support
To request authorization to use Cheaha or for assistance migrating your account from the old system, please submit a service request with the cluster support group.