Talk:NAMD Benchmarks: Difference between revisions

No edit summary
(added category NAMD)
 
(33 intermediate revisions by the same user not shown)
Benchmark introduction
The efficiency of a parallel system is the fraction of time the processors spend doing useful work on a given computation. It is defined as:
Calculation of efficiency
<pre>
Calculation for scaling to higher processor counts:

         Execution time using one processor        ts
E(N) = --------------------------------------- = --------
        N * Execution time using N processors     N * tN
</pre>
 
In general, parallel jobs should scale to at least 70% efficiency. The ASC's DMC recommends a scaling efficiency of 75% or greater. For NAMD, the efficiency of a parallel job can be calculated as follows (where N is the number of processors committed to the job):
 
<pre>
                    days/ns on 1 processor
Efficiency (%) = ------------------------------- * 100
                  N * days/ns on N processors
</pre>
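As a minimal sketch of the calculation above, the following Python function computes the efficiency from days/ns figures. The function name `efficiency_percent` is an illustrative assumption, not part of NAMD; the sample values are the Cheaha 1-processor and 8-processor results reported on this page.

```python
def efficiency_percent(days_per_ns_serial, days_per_ns_parallel, n_procs):
    """Parallel efficiency in percent from NAMD days/ns figures.

    efficiency = (days/ns on 1 processor) / (N * days/ns on N processors) * 100
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if days_per_ns_parallel <= 0:
        raise ValueError("days_per_ns_parallel must be positive")
    return days_per_ns_serial / (n_procs * days_per_ns_parallel) * 100.0


# Cheaha values reported on this page: 15.4054 days/ns on 1 processor,
# 1.9653 days/ns on 8 processors.
eff_8 = efficiency_percent(15.4054, 1.9653, 8)
print(f"Efficiency on 8 processors: {eff_8:.2f}%")
```

Because the published days/ns values are rounded, percentages recomputed this way can differ from the tabulated ones in the second decimal place.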
 
 
Information on NAMD performance scaling is available at: http://www.ks.uiuc.edu/Research/namd/wiki/?NamdPerformanceTuning
 
The benchmark used for performance evaluation on [[Cheaha]], [http://biowulf.nih.gov Biowulf], and [http://www.asc.edu/supercomputing/index.shtml DMC] is Apoa1 from the NAMD suite, available at: http://www.ks.uiuc.edu/Research/namd/utilities/
 
The parameters for the benchmark used throughout are: 500 steps, 92K atoms, 12A cutoff + PME every 4 steps.
 
 


== Sample Benchmark using NAMD ==


===Sample Benchmark comparing Days/ns for Cheaha, Biowulf, and DMC using InfiniBand===
{| border="1" align="center" cellpadding="10" style="text-align:center;"
|+ Days/ns (efficiency) on Cheaha, Biowulf, and DMC
!Processors
! Cheaha
! Biowulf
! DMC <ref group="note">No hardware control was possible on the ASC's DMC, so different runs may have executed on different hardware. *Efficiency on DMC using 4 processors is above 100%. A likely cause is that the serial (1-processor) job ran on an older-generation processor, which set a slower baseline and raised the apparent efficiency of parallel jobs that ran on newer processors.</ref>
|-
|1
|15.4054 (100%)
|18.0535 (100%)
|19.1000 (100%)
|-
|2
|7.7119 (99.87%)
|9.5163 (94.86%)
|9.7600 (97.84%)
|-
|4
|3.8933 (98.92%)
|4.9222 (91.69%)
|4.7570 (100.3%)*
|-
|8
|1.9653 (97.98%)
|2.5763 (87.59%)
|2.5360 (94.14%)
|-
|16
|0.9950 (96.76%)
|1.2658 (89.14%)
|1.3870 (86.06%)
|-
|32
|0.5101 (94.37%)
|0.6463 (87.29%)
|0.7438 (80.24%)
|-
|64
|0.2592 (92.83%)
|0.3390 (83.22%)
|0.3938 (75.78%)
|-
|128
|0.1360 (88.45%)
|NA
|NA
|-
|256
|0.0770 (78.09%)
|NA
|NA
|-
|512
|0.0439 (68.49%)
|NA
|NA
|}
[[File:NAMD_benchmark_img1_Days_cheaha_dmc_biowulf.png|center]]
[[File:NAMD_benchmark_img2_Efficiency_cheaha_dmc_biowulf.png|center]]
 
 
===Benchmark notes===
 
The above benchmarks were run using NAMD 2.8b1 and the [http://www.ks.uiuc.edu/Research/namd/utilities/ Apoa1 benchmark suite] from NAMD.

On Cheaha, only the third-generation hardware was used for the above benchmarks. More information about the hardware used on [[Cheaha]] is available [[Cheaha#Hardware |here]].

The data for the NIH Biowulf benchmarks is available at: http://biowulf.nih.gov/apps/namd/namd_bench.html. The Biowulf hardware used for comparison is the e2800 with InfiniBand.

;ASC-DMC
<references group="note"/>




== Comparison of Ethernet and IB ==
{| border="1" align="center" cellpadding="10" style="text-align:center;"
|+ Ethernet vs Infiniband interconnect on Cheaha Gen 3 hardware
!Processors
! Infiniband
! Ethernet
|-
|1
|15.4054 (100%)
|15.4040 (100%)
|-
|2
|7.7119 (99.87%)
|7.7800 (98.99%)
|-
|4
|3.8933 (98.92%)
|3.9405 (97.72%)
|-
|8
|1.9653 (97.98%)
|2.2816 (84.39%)
|-
|16
|0.9950 (96.76%)
|1.2714 (75.74%)
|-
|32
|0.5101 (94.37%)
|0.6973 (69.03%)
|-
|64
|0.2592 (92.83%)
|0.6562 (36.67%)
|-
|128
|0.1360 (88.45%)
|0.8950 (13.44%)
|-
|256
|0.0770 (78.09%)
|0.8632 (6.97%)
|-
|512
|0.0439 (68.49%)
|NA
|}
[[File:NAMD_benchmark_img3_Ib_eth_days_ns_2.8.png|center]]
[[File:NAMD_benchmark_img4_Ib_eth_efficiency_2.8.png|center]]
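
The table above can be read against the 70% efficiency guideline stated in the introduction. The sketch below finds, for each interconnect, the largest processor count reached before efficiency first drops below that threshold. The helper name `largest_count_meeting` is hypothetical, and the efficiency values are copied from the table rather than recomputed.

```python
# Efficiency (%) by processor count, copied from the table above.
INFINIBAND = {1: 100.0, 2: 99.87, 4: 98.92, 8: 97.98, 16: 96.76,
              32: 94.37, 64: 92.83, 128: 88.45, 256: 78.09, 512: 68.49}
ETHERNET = {1: 100.0, 2: 98.99, 4: 97.72, 8: 84.39, 16: 75.74,
            32: 69.03, 64: 36.67, 128: 13.44, 256: 6.97}  # 512 was not run (NA)


def largest_count_meeting(efficiencies, threshold=70.0):
    """Return the largest processor count reached before efficiency
    first drops below the threshold, or None if even the smallest fails."""
    best = None
    for n_procs in sorted(efficiencies):
        if efficiencies[n_procs] < threshold:
            break
        best = n_procs
    return best


print("InfiniBand:", largest_count_meeting(INFINIBAND))
print("Ethernet:", largest_count_meeting(ETHERNET))
```

With these figures the InfiniBand runs stay above 70% through 256 processors, while the Ethernet runs fall below it after 16.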
== Comparison of different generations of Cheaha hardware on NAMD with InfiniBand ==
{| border="1" align="center" cellpadding="10" style="text-align:center;"
|+ Cheaha Gen 2 and Gen 3 hardware using NAMD 2.8 and InfiniBand
!Processors
! Generation 3
! Generation 2
|-
|1
|15.4054 (100%)
|15.9300 (100%)
|-
|2
|7.7119 (99.87%)
|8.6620 (91.98%)
|-
|4
|3.8933 (98.92%)
|4.2560 (93.59%)
|-
|8
|1.9653 (97.98%)
|2.3450 (84.93%)
|-
|16
|0.9950 (96.76%)
|1.2060 (82.56%)
|-
|32
|0.5101 (94.37%)
|0.6420 (77.57%)
|-
|64
|0.2592 (92.83%)
|0.2750 (90.38%)
|-
|128
|0.1360 (88.45%)
|0.1620 (76.81%)
|-
|256
|0.0770 (78.09%)
|NA
|-
|512
|0.0439 (68.49%)
|NA
|}


[[File:NAMD_benchmark_img5_Days_gen2_gen3_infiniband.png|center]]
[[File:NAMD_benchmark_img6_Efficiency_gen2_gen3_ib.png|center]]


== Comparison on different nodes on Cheaha Gen 2 and Gen 3 with IB ==
===Hardware Notes===
Generation 3 hardware contains 576 cores: 2.66 GHz Intel Xeon hex-core processors (48 nodes, 2x6 cores each).

Generation 2 hardware contains 192 cores: 3.06 GHz Intel Xeon quad-core processors (24 nodes, 2x4 cores each).

More information about the hardware used on [[Cheaha]] is available [[Cheaha#Hardware |here]].


== Actual job benchmarks (Segrest job) ==
Actual NAMD job courtesy of the Center for Computational and Structural Biology.
Job parameters: 100,000 steps, 246K atoms, 12A cutoff + PME every 4 steps
STRUCTURE SUMMARY:
245956 ATOMS
174192 BONDS
130742 ANGLES
84410 DIHEDRALS
1622 IMPROPERS
0 CROSSTERMS
0 EXCLUSIONS
232802 RIGID BONDS
505063 DEGREES OF FREEDOM
84698 HYDROGEN GROUPS
4 ATOMS IN LARGEST HYDROGEN GROUP
84698 MIGRATION GROUPS
4 ATOMS IN LARGEST MIGRATION GROUP
TOTAL MASS = 1.48179e+06 amu
TOTAL CHARGE = 2.7746e-05 e
MASS DENSITY = 1.01378 g/cm^3
ATOM DENSITY = 0.101334 atoms/A^3
=== Speedup (Actual Wall times, days/ns, efficiency)===
{| border="1" align="center" cellpadding="10" style="text-align:center;"
|+ Actual NAMD CSB job on Cheaha Gen 3 hardware
!Processors
! Wall clock (seconds)
! Days/ns (efficiency)
! Speedup
|-
|1
|357278 (99hrs 14mins)
|20.4950 (100%)
|1
|-
|8
|45784 (12hrs 42mins)
|2.6237 (97.64%)
|7.80
|-
|16
|23380 (6hrs 29mins)
|1.33205 (96.16%)
|15.28
|-
|32
|11871 (3hrs 17mins)
|0.6809 (94.04%)
|30.09
|-
|64
|6242 (1hr 43mins)
|0.3626 (88.30%)
|57.23
|-
|128
|3386 (56mins 26secs)
|0.1940 (82.51%)
|105.51
|-
|256
|1814 (30mins 14secs)
|0.0982 (81.51%)
|196.96
|-
|512
|1129 (18mins 49secs)
|0.0578 (69.21%)
|316.46
|}
[[File:NAMD_benchmark_image_6_Effciency_combined_AM.png|center]]
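
The Speedup column above is consistent with dividing the 1-processor wall-clock time by the N-processor wall-clock time. A minimal sketch of that calculation follows; the function name `speedup` and the dictionary layout are illustrative assumptions, and the wall-clock seconds are copied from the table.

```python
# Wall-clock seconds by processor count, copied from the table above.
WALL_CLOCK_S = {1: 357278, 8: 45784, 16: 23380, 32: 11871,
                64: 6242, 128: 3386, 256: 1814, 512: 1129}


def speedup(wall_clock, n_procs):
    """Speedup relative to the 1-processor run: t(1) / t(N)."""
    return wall_clock[1] / wall_clock[n_procs]


for n_procs in sorted(WALL_CLOCK_S):
    s = speedup(WALL_CLOCK_S, n_procs)
    # Efficiency from wall-clock time is speedup divided by processor count.
    print(f"{n_procs:4d}  speedup {s:7.2f}  "
          f"wall-clock efficiency {100 * s / n_procs:6.2f}%")
```

Note that an efficiency derived from wall-clock time need not match the percentages in the Days/ns column, since the two are computed from different measurements.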
=== Actual job comparison to IBM BG/L ===
NAMD comparison data for the UAB IBM BlueGene/L.
Job parameters: 100,000 steps, 246K atoms, 12A cutoff + PME every 4 steps
{| border="1" align="center" cellpadding="10" style="text-align:center;"
|+ NAMD Comparison to IBM BG/L
! System
! CPU count
! Atoms
! Days/ns
|-
|UAB BlueGene/L
|256
|246K
|0.80
|-
|Cheaha Gen3
|32
|246K
|0.6809
|-
|Cheaha Gen3
|256
|246K
|0.0982
|-
|Cheaha Gen3
|512
|246K
|0.0578
|}
Cheaha Gen 3 hardware using only 32 cores outperforms the BG/L using 256 processors (0.6809 vs. 0.80 days/ns). At an equal processor count of 256, Cheaha is roughly 8 times faster than the BG/L (0.0982 vs. 0.80 days/ns).
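
The ratios behind this comparison can be checked directly from the table. The short sketch below divides the BG/L days/ns figure by each Cheaha figure; the variable names are illustrative, and the values are copied from the table above.

```python
# Days/ns values copied from the comparison table above (lower is faster).
BGL_256 = 0.80
CHEAHA_GEN3 = {32: 0.6809, 256: 0.0982, 512: 0.0578}

# A ratio above 1 means Cheaha needs fewer days per simulated nanosecond.
ratios = {n_procs: BGL_256 / days_per_ns
          for n_procs, days_per_ns in CHEAHA_GEN3.items()}

for n_procs in sorted(ratios):
    print(f"Cheaha Gen3 on {n_procs} cores vs BG/L on 256: {ratios[n_procs]:.2f}x")
```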


== Perform Your Own Benchmarks ==
[[Category:NAMD]]

Latest revision as of 17:19, 9 June 2011
