Cheaha-BGL Comparison
https://docs.rc.uab.edu/
Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.
As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.
Thank you,
The Research Computing Team
The results of the UAB BlueGene (IBM BG/L) and Cheaha (commodity x86-64) benchmarking exercise are as follows.
Benchmark Overview
The compute benchmarks are presented from two perspectives:
- TeraFLOP rating -- considering both the theoretical peak performance and sustained peak performance measured with the HPL benchmark popularized by Top500.org
- application-specific rating -- comparing the real application performance of popular molecular dynamics packages.
For the purposes of these benchmarks, UAB BlueGene is an IBM BG/L with 2048 PowerPC-based CPU cores running at 700 MHz and a custom inter-process communication network. Cheaha is a commodity x86-64 cluster with three generations of hardware acquired since 2005, presented to the user as a single compute environment through a common login node and scheduler. This benchmark focuses on the second-generation (gen2) hardware, acquired by UAB IT in 2008 and 2009, and the third-generation (gen3) hardware, acquired in 2010 through the NIH SIG award (PI Allison). <ref>The first-generation (gen1) hardware is the original Cheaha hardware acquired via an EPSCoR award (PI Shealy). It includes 64 compute nodes, each with two 1.6 GHz single-core AMD CPUs (128 cores total), and a 1 Gigabit Ethernet inter-process communication fabric. Its approximate HPL rating was 0.3 TFLOPS.</ref>
The benchmark tools assume identical nodes, so the Cheaha results are presented separately for the gen2 and gen3 hardware. The gen2 hardware includes 24 compute nodes, each with two 3.0 GHz 4-core Intel CPUs (192 cores total), and a double data rate (DDR) InfiniBand network for inter-process communication. The gen3 hardware includes 48 compute nodes, each with two 2.66 GHz 6-core Intel CPUs (576 cores total), and a quad data rate (QDR) InfiniBand network for inter-process communication.
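The page does not say how the theoretical peaks were derived. A common estimate is cores × clock × double-precision FLOPs per cycle. The sketch below assumes 4 FLOPs per core per cycle for all three machines (an assumption, not a value from the source) and builds the Cheaha core counts from the node specifications above; `total_cores` and `peak_tflops` are hypothetical helpers. Under that assumption it gives about 5.73 TFLOPS for the BG/L and 6.13 TFLOPS for gen3, matching the table to two decimal places, and about 2.30 TFLOPS for gen2, a figure the table does not list.

```python
# Hypothetical helpers; FLOPS_PER_CYCLE = 4 is an assumption, not a value
# taken from the benchmark page.
FLOPS_PER_CYCLE = 4  # double-precision FLOPs per core per clock (assumed)


def total_cores(nodes: int, cpus_per_node: int, cores_per_cpu: int) -> int:
    """Core count of a partition built from identical nodes."""
    return nodes * cpus_per_node * cores_per_cpu


def peak_tflops(cores: int, clock_ghz: float,
                flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Theoretical peak: cores x GHz x FLOPs/cycle gives GFLOPS; /1000 gives TFLOPS."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


# (cores, clock in GHz) taken from the hardware descriptions above
systems = {
    "UAB BlueGene": (2048, 0.700),
    "Cheaha gen2": (total_cores(24, 2, 4), 3.00),
    "Cheaha gen3": (total_cores(48, 2, 6), 2.66),
}

for name, (cores, ghz) in systems.items():
    print(f"{name:<13} {cores:>4} cores  peak ~ {peak_tflops(cores, ghz):.2f} TFLOPS")
```

Changing `flops_per_cycle` shows how sensitive the theoretical figure is to that single assumption.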
TeraFLOP Rating
System | Theoretical Peak (TFLOPS) | HPL Computed Peak (TFLOPS) | Efficiency |
---|---|---|---|
UAB BlueGene (2048 cores) | 5.733 | 4.733 | 82.5% |
Cheaha gen3 (576 cores) | 6.128 | 5.342 | 87.2% |
Cheaha gen2 (192 cores) | -- | 1.424 | -- |
Cheaha gen3 (192 cores) | -- | 1.820 | -- |
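The Efficiency column is the HPL sustained result (Rmax) divided by the theoretical peak (Rpeak). The sketch below, using hypothetical names, recomputes it from the table values and adds sustained GFLOPS per core, which puts the large difference in core counts into perspective. It reproduces the Efficiency column to within 0.1 percentage point and gives about 2.31 GFLOPS per core for the BG/L versus about 9.27 for gen3, roughly a 4x difference.

```python
from typing import NamedTuple


class HplRun(NamedTuple):
    cores: int
    rpeak_tflops: float  # theoretical peak, from the table
    rmax_tflops: float   # HPL sustained result, from the table


runs = {
    "UAB BlueGene": HplRun(2048, 5.733, 4.733),
    "Cheaha gen3": HplRun(576, 6.128, 5.342),
}


def efficiency(run: HplRun) -> float:
    """HPL efficiency: sustained (Rmax) divided by theoretical peak (Rpeak)."""
    return run.rmax_tflops / run.rpeak_tflops


def sustained_gflops_per_core(run: HplRun) -> float:
    """HPL result spread evenly over all cores, in GFLOPS."""
    return run.rmax_tflops * 1000.0 / run.cores


for name, run in runs.items():
    print(f"{name:<13} efficiency {efficiency(run):6.1%}  "
          f"{sustained_gflops_per_core(run):5.2f} GFLOPS/core")
```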
The results show that the SIG-funded gen3 hardware sustains higher performance than the BG/L: 5.342 versus 4.733 TFLOPS on HPL, at higher efficiency (87.2% versus 82.5%) and with fewer than a third of the cores (576 versus 2048).
The last two rows compare the Cheaha gen2 and gen3 hardware directly at the same core count. On compute density, one chassis of gen3 hardware holds 192 cores, whereas the gen2 hardware needs 1.5 chassis for the same 192 cores. On performance, the 6-core gen3 CPUs outperform the 4-core gen2 CPUs (1.820 versus 1.424 TFLOPS) despite their lower clock speed (2.66 versus 3.0 GHz), which suggests that the higher core count per CPU does not cost performance through reduced memory bandwidth.
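The gen2/gen3 comparison can be made concrete with a few ratios. This sketch treats the 1.424 TFLOPS gen2 figure as its HPL result on 192 cores and uses the 1.5-chassis figure from the paragraph above; the `Partition192` class and its method names are illustrative only. It yields 128 versus 192 cores per chassis, a roughly 28% HPL gain for gen3, and higher HPL throughput per core per GHz on gen3 despite its lower clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition192:
    """A 192-core slice of Cheaha hardware (illustrative structure)."""
    clock_ghz: float
    hpl_tflops: float
    chassis: float
    cores: int = 192

    def cores_per_chassis(self) -> float:
        return self.cores / self.chassis

    def hpl_gflops_per_core_ghz(self) -> float:
        # Dividing by clock speed separates per-cycle throughput from frequency.
        return self.hpl_tflops * 1000.0 / (self.cores * self.clock_ghz)


gen2 = Partition192(clock_ghz=3.00, hpl_tflops=1.424, chassis=1.5)
gen3 = Partition192(clock_ghz=2.66, hpl_tflops=1.820, chassis=1.0)

hpl_speedup = gen3.hpl_tflops / gen2.hpl_tflops
print(f"cores per chassis: gen2 {gen2.cores_per_chassis():.0f}, "
      f"gen3 {gen3.cores_per_chassis():.0f}")
print(f"HPL speedup gen3/gen2: {hpl_speedup:.2f}x")
print(f"HPL GFLOPS per core-GHz: gen2 {gen2.hpl_gflops_per_core_ghz():.2f}, "
      f"gen3 {gen3.hpl_gflops_per_core_ghz():.2f}")
```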
Application Rating
NAMD
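The application-specific comparison between UAB BlueGene and Cheaha used NAMD. Cheaha with the SIG hardware ran a comparably sized system slightly faster than the BG/L (0.88 versus 0.80 ns/day) on one-eighth as many CPUs (32 versus 256). In other words, current-generation Cheaha delivers roughly 8 times the per-CPU NAMD performance of the BG/L, so a NAMD job of the same scale needs about 8x fewer CPUs.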
System | Atoms | ns/day | CPU count |
---|---|---|---|
UAB BlueGene | 246,000 | 0.80 | 256 |
Cheaha with SIG | 235,000 | 0.88 | 32 |
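Because the two NAMD runs use different CPU counts and slightly different system sizes, a per-CPU rate makes the 8-fold figure explicit. The sketch below (hypothetical names) computes ns/day per CPU and, as a rough size adjustment, atoms × ns/day per CPU. The adjustment assumes cost grows linearly with atom count, which is only an approximation for NAMD. The raw ratio comes out near 8.8x and the size-adjusted ratio near 8.4x, both consistent with the roughly 8-fold speedup stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamdRun:
    atoms: int
    ns_per_day: float
    cpus: int

    def ns_per_day_per_cpu(self) -> float:
        return self.ns_per_day / self.cpus

    def atom_ns_per_day_per_cpu(self) -> float:
        # Rough size normalization: assumes cost grows linearly with atom count.
        return self.ns_per_day * self.atoms / self.cpus


bgl = NamdRun(atoms=246_000, ns_per_day=0.80, cpus=256)
cheaha = NamdRun(atoms=235_000, ns_per_day=0.88, cpus=32)

raw = cheaha.ns_per_day_per_cpu() / bgl.ns_per_day_per_cpu()
size_adjusted = cheaha.atom_ns_per_day_per_cpu() / bgl.atom_ns_per_day_per_cpu()
print(f"per-CPU speedup: {raw:.1f}x raw, {size_adjusted:.1f}x atom-adjusted")
```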
GROMACS
GROMACS performance could not be compared between the BG/L and Cheaha because GROMACS was not supported on the BG/L. A comparison of the Cheaha gen2 and gen3 hardware, however, shows higher performance on the 6-core platform than on the 4-core platform for the same 276,263-atom system. The increase is consistent with the HPL results, which suggested that memory bandwidth is not degraded by the 6-core architecture.
System | Atoms | ns/day |
---|---|---|
Cheaha gen2 | 276,263 | 424.5 |
Cheaha gen3 | 276,263 | 456.0 |
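The gen3 improvement can be expressed as a relative gain. This minimal sketch uses a hypothetical `relative_gain` function that applies to any higher-is-better metric, such as the ns/day figures in the table.

```python
def relative_gain(baseline: float, new: float) -> float:
    """Fractional improvement of `new` over `baseline` for a higher-is-better metric."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return new / baseline - 1.0


gain = relative_gain(424.5, 456.0)  # Cheaha gen2 -> gen3, same 276,263-atom system
print(f"gen3 over gen2: {gain:+.1%}")  # prints "gen3 over gen2: +7.4%"
```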
Footnotes
<references />