Cheaha-BGL Comparison

Revision as of 20:06, 24 March 2011 by Jpr@uab.edu (talk | contribs) (Initial benchmark results for bgl v cheaha)


Attention: Research Computing Documentation has Moved
https://docs.rc.uab.edu/


Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.


As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.

Thank you,

The Research Computing Team

The results of the UAB BlueGene (IBM BG/L) and Cheaha (commodity x86-64) benchmarking exercise are as follows.

Benchmark Overview

The compute benchmarks are presented from two perspectives:

  1. TeraFLOP rating -- the theoretical peak performance and the sustained performance measured with the HPL benchmark popularized by Top500.org.
  2. Application-specific rating -- the real application performance of popular molecular dynamics packages.

For the purpose of these benchmarks, UAB BlueGene is an IBM BG/L with 2048 700 MHz PowerPC-based CPU cores and a custom inter-process communication network. Cheaha is a commodity Intel-based cluster with three generations of hardware acquired since 2005 and presented to the user as a single compute environment via a common login node and scheduler. The second generation (gen2) hardware, acquired in 2008 and 2009 by UAB IT, and the third generation (gen3) hardware, acquired in 2010 through the NIH SIG award (PI Allison), are the focus of this benchmark.*

The benchmark tools assume identical nodes; therefore, the Cheaha benchmarks are presented separately for the gen2 and gen3 hardware. The gen2 hardware includes 24 compute nodes with two 3.0 GHz 4-core Intel CPUs per node (192 cores total) and a double data rate (DDR) InfiniBand network for inter-process communication. The gen3 hardware includes 48 compute nodes with two 2.66 GHz 6-core Intel CPUs per node (576 cores total) and a quad data rate (QDR) InfiniBand network for inter-process communication.

TeraFLOP Rating

HPC Benchmarks BGL v Cheaha gen2 and gen3

System                     Theoretical Peak (TFLOPS)  HPL Computed Peak (TFLOPS)  Efficiency
UAB BlueGene (2048 cores)  5.733                      4.733                       82.5%
Cheaha gen3 (576 cores)    6.128                      5.342                       87.2%
Cheaha gen2 (192 cores)    --                         1.424                       --
Cheaha gen3 (192 cores)    --                         1.820                       --


The results show that the SIG hardware (gen3) provides better sustained performance than the BG/L: 5.342 TFLOPS on 576 cores versus 4.733 TFLOPS on 2048 cores, about 13% higher.
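The table's columns can be related with a short calculation. The sketch below is illustrative only: it recomputes efficiency as HPL result divided by theoretical peak, and it estimates theoretical peak as cores x clock x FLOPs per cycle. The value of 4 double-precision FLOPs per cycle per core is an assumption that is not stated in the source; it is used here because it lands within about 0.002 TFLOPS of both published peak figures. The names `SYSTEMS`, `estimated_peak_tflops`, and `efficiency` are hypothetical.

```python
# Figures copied from the TeraFLOP Rating table and the hardware description.
SYSTEMS = {
    "UAB BlueGene": {"cores": 2048, "ghz": 0.70, "peak_tflops": 5.733, "hpl_tflops": 4.733},
    "Cheaha gen3": {"cores": 576, "ghz": 2.66, "peak_tflops": 6.128, "hpl_tflops": 5.342},
}

# ASSUMPTION (not from the source): 4 double-precision FLOPs per cycle per core.
FLOPS_PER_CYCLE = 4


def estimated_peak_tflops(cores, ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Estimate theoretical peak: cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS."""
    return cores * ghz * flops_per_cycle / 1000.0


def efficiency(hpl_tflops, peak_tflops):
    """Fraction of the theoretical peak that HPL actually sustained."""
    return hpl_tflops / peak_tflops


for name, s in SYSTEMS.items():
    est = estimated_peak_tflops(s["cores"], s["ghz"])
    eff = efficiency(s["hpl_tflops"], s["peak_tflops"])
    print(f"{name}: published peak {s['peak_tflops']:.3f}, "
          f"estimated peak {est:.3f}, efficiency {eff:.1%}")

# Sustained HPL performance of gen3 relative to the BG/L.
gen3_vs_bgl = SYSTEMS["Cheaha gen3"]["hpl_tflops"] / SYSTEMS["UAB BlueGene"]["hpl_tflops"]
print(f"gen3 / BG/L sustained HPL ratio: {gen3_vs_bgl:.2f}")
```

Because the table values are rounded to three decimals, a recomputed efficiency can differ from the published percentage in the last digit.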

The last two entries compare the Cheaha gen2 and gen3 hardware directly at 192 cores. On compute density, one chassis of gen3 hardware contains 192 cores, whereas the gen2 hardware needs 1.5 chassis for the same core count. On performance, the 6-core CPUs of the gen3 hardware outperform the 4-core CPUs of the gen2 hardware (1.820 versus 1.424 TFLOPS, about 28% higher) despite their lower clock speed, which indicates that the increased core count per CPU does not reduce the memory bandwidth available to the benchmark.
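The equal-core comparison can be worked through as simple arithmetic. This is a minimal sketch using only the figures quoted above (192 cores, the two HPL results, and the chassis counts); the variable and function names are hypothetical.

```python
CORES = 192  # both runs used the same core count

# HPL results (TFLOPS) and chassis needed for 192 cores, from the text above.
HPL_TFLOPS = {"gen2": 1.424, "gen3": 1.820}
CHASSIS_FOR_192_CORES = {"gen2": 1.5, "gen3": 1.0}


def gflops_per_core(tflops, cores=CORES):
    """Sustained HPL throughput per core, in GFLOPS."""
    return tflops * 1000.0 / cores


def cores_per_chassis(chassis, cores=CORES):
    """Compute density: how many cores fit in one chassis."""
    return cores / chassis


speedup = HPL_TFLOPS["gen3"] / HPL_TFLOPS["gen2"]

for gen in ("gen2", "gen3"):
    print(f"{gen}: {gflops_per_core(HPL_TFLOPS[gen]):.2f} GFLOPS/core, "
          f"{cores_per_chassis(CHASSIS_FOR_192_CORES[gen]):.0f} cores/chassis")
print(f"gen3 / gen2 HPL ratio at {CORES} cores: {speedup:.2f}")
```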

Application Rating

NAMD

The application-specific comparison between UAB BlueGene and Cheaha was based on NAMD. It shows a roughly 8-fold increase in per-CPU speed from BGL to current-generation Cheaha: a NAMD job of similar scale runs at comparable throughput on one-eighth as many CPUs on Cheaha.

NAMD Comparison

System        Atoms    ns/day  CPU count
UAB BlueGene  246,000  0.80    256
Cheaha        235,000  0.88    32
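The 8-fold figure can be checked from the NAMD table. The sketch below is one way to read the numbers and is not the original authors' method: it compares the CPU counts directly and then the simulated ns/day per CPU. It ignores the small difference in atom counts between the two runs (246,000 versus 235,000), which is a simplifying assumption. All names are hypothetical.

```python
# Rows copied from the NAMD Comparison table.
NAMD_RUNS = {
    "UAB BlueGene": {"atoms": 246_000, "ns_per_day": 0.80, "cpus": 256},
    "Cheaha": {"atoms": 235_000, "ns_per_day": 0.88, "cpus": 32},
}


def ns_per_day_per_cpu(run):
    """Simulated nanoseconds per day delivered by each CPU."""
    return run["ns_per_day"] / run["cpus"]


bgl = NAMD_RUNS["UAB BlueGene"]
cheaha = NAMD_RUNS["Cheaha"]

# How many fewer CPUs Cheaha used for a job of similar size.
cpu_ratio = bgl["cpus"] / cheaha["cpus"]

# Per-CPU throughput ratio; slightly above the CPU ratio because
# Cheaha also finished slightly more ns/day overall.
per_cpu_ratio = ns_per_day_per_cpu(cheaha) / ns_per_day_per_cpu(bgl)

print(f"CPU count ratio (BG/L : Cheaha): {cpu_ratio:.1f}")
print(f"Per-CPU throughput ratio (Cheaha : BG/L): {per_cpu_ratio:.1f}")
```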

GROMACS

It was not possible to compare GROMACS performance between BGL and Cheaha because GROMACS was not supported on BGL. A comparison of the Cheaha gen2 and gen3 hardware, however, shows a performance increase of about 7% on the 6-core platform over the 4-core platform. This increase is consistent with the expectation from the HPL benchmarks that memory bandwidth is not degraded by the 6-core architecture.

GROMACS Comparison

System       Atoms    ns/day
Cheaha gen2  276,263  424.5
Cheaha gen3  276,263  456.0

* The first generation hardware (gen1) is the original Cheaha hardware acquired via an EPSCoR award (PI Shealy) and includes 64 compute nodes with two 1.6 GHz 1-core AMD CPUs per node (128 cores total).