NAMD GPU

CUDA GPU Acceleration

NAMD only uses the GPU for nonbonded force evaluation. Energy evaluation is done on the CPU. To benefit from GPU acceleration you should set outputEnergies to 100 or higher in the simulation config file. Some features are unavailable in CUDA builds, including alchemical free energy perturbation.
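
For example, a minimal sketch of that setting (run.conf is a hypothetical config file name used only for illustration):

 # run.conf is a placeholder name for your simulation config file;
 # outputEnergies of 100 or more keeps CPU-side energy evaluation
 # from limiting GPU-accelerated performance
 echo "outputEnergies 100" >> run.conf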

As this is a new feature, you are encouraged to test all simulations before beginning production runs. Forces evaluated on the GPU differ slightly from a CPU-only calculation, an effect that is more visible in reported scalar pressure values than in energies.

To benefit from GPU acceleration you will need a CUDA build of NAMD and a recent high-end NVIDIA video card. CUDA builds will not function without a CUDA-capable GPU. You will also need to be running NVIDIA Linux driver version 195.17 or newer (the released Linux binaries are built with CUDA 2.3, but NAMD can also be built against newer versions).
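
One quick way to confirm the driver version on a node (a simple check; it requires the NVIDIA driver to be loaded):

 # prints the loaded NVIDIA kernel driver version; it should be 195.17 or newer
 cat /proc/driver/nvidia/version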

Finally, the libcudart.so.2 included with the binary (the one copied from the version of CUDA it was built with) must be in a directory in your LD_LIBRARY_PATH before any other libcudart.so libraries. For example:

 setenv LD_LIBRARY_PATH ".:$LD_LIBRARY_PATH"
 (or LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"; export LD_LIBRARY_PATH)
 ./namd2 +idlepoll <configfile>
 ./charmrun ++local +p4 ./namd2 +idlepoll <configfile>

When running CUDA NAMD always add +idlepoll to the command line. This is needed to poll the GPU for results rather than sleeping while idle.

Each namd2 process can use only one GPU. Therefore you will need to run at least one process for each GPU you want to use. Multiple processes can share a single GPU, usually with an increase in performance. NAMD will automatically distribute processes equally among the GPUs on a node. Specific GPU device IDs can be requested via the +devices argument on the namd2 command line, for example:

 ./charmrun ++local +p4 ./namd2 +idlepoll +devices 0,2 <configfile>

Devices are selected cyclically from those available, so in the above example processes 0 and 2 will share device 0 and processes 1 and 3 will share device 2. One could also specify +devices 0,0,2,2 to cause device 0 to be shared by processes 0 and 1, etc. GPUs with two or fewer multiprocessors are ignored unless specifically requested with +devices. GPUs of compute capability 1.0 are no longer supported and are ignored.
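
For example, to make processes 0 and 1 share device 0 and processes 2 and 3 share device 2, as described above:

 ./charmrun ++local +p4 ./namd2 +idlepoll +devices 0,0,2,2 <configfile>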

While charmrun with ++local will preserve LD_LIBRARY_PATH, normal charmrun does not. You can use charmrun ++runscript to add the namd2 directory to LD_LIBRARY_PATH with the following executable runscript:

 #!/bin/csh
 # prepend the directory of the first argument (the path to namd2) to LD_LIBRARY_PATH
 setenv LD_LIBRARY_PATH "${1:h}:$LD_LIBRARY_PATH"
 # then run the command line that charmrun passed in
 $*

For example:

 ./charmrun ++runscript ./runscript +p8 ./namd2 +idlepoll <configfile>

An InfiniBand network is highly recommended when running CUDA-accelerated NAMD across multiple nodes. You will need either an ibverbs NAMD binary (available for download) or an MPI NAMD binary (must build Charm++ and NAMD as described below) to make use of the InfiniBand network.
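
As a sketch, an ibverbs binary is typically launched across nodes through charmrun with a nodelist file; the host names below are placeholders:

 # hypothetical nodelist file naming the compute nodes to use
 cat > nodelist <<'EOF'
 group main
 host compute-0-1
 host compute-0-2
 EOF
 # 24 processes spread across the hosts in the nodelist, still polling the GPU
 ./charmrun ++nodelist nodelist +p24 ./namd2 +idlepoll <configfile>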

The CUDA (NVIDIA's graphics processor programming platform) code in NAMD is completely self-contained and does not use any of the CUDA support features in Charm++. When building NAMD with CUDA support you should use the same Charm++ you would use for a non-CUDA build. Do NOT add the cuda option to the Charm++ build command line. The only changes to the build process needed are to add --with-cuda and possibly --cuda-prefix ... to the NAMD config command line.
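
As a sketch of that build step (the architecture name and CUDA prefix are assumptions for illustration; adjust them for your system):

 # from the unpacked NAMD source tree, with Charm++ already built as usual
 ./config Linux-x86_64-g++ --with-cuda --cuda-prefix /opt/cuda
 cd Linux-x86_64-g++
 make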


CUDA NAMD on Cheaha

Download and build CUDA NAMD

Download and build NAMD with NVIDIA CUDA Acceleration from the NAMD Download page: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD

The following example assumes NAMD was built in the $USER_SCRATCH/NAMD directory.
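
For example (the archive name is an assumption, based on the NAMD 2.8 CUDA build referenced below):

 mkdir -p $USER_SCRATCH/NAMD
 cd $USER_SCRATCH/NAMD
 # extract the CUDA binary distribution downloaded from the page above
 tar xzf NAMD_2.8_Linux-x86_64-CUDA.tar.gz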

Running NAMD on the Cheaha CUDA Blade

The CUDA-enabled blade on Cheaha is cheaha-compute-1-9. SSH to that host to work with CUDA:

 ssh cheaha-compute-1-9 

Load CUDA module

 module load cuda/cuda-4     (to load CUDA)
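
To confirm the module loaded and the CUDA tools are on your path, a quick sanity check:

 module list    # shows the currently loaded modules
 which nvcc     # should point into the CUDA toolkit loaded above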

CUDA commands

 deviceQuery      (to check the status of the device)

 bandwidthTest    (to test the bandwidth for data transfer)

If the above tests fail, please contact Mike Hanby.

Export Path

 
 export LD_LIBRARY_PATH=$USER_SCRATCH/NAMD/NAMD_2.8_Linux-x86_64-CUDA/bin/:$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=$USER_SCRATCH/NAMD/NAMD_2.8_Linux-x86_64-CUDA/:$LD_LIBRARY_PATH
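
To verify that the bundled libcudart.so.2 is the copy that will actually be loaded, you can inspect the resolved library (an optional check; run it from the directory containing the CUDA namd2 binary):

 # the path to namd2 may differ depending on where your build was placed
 ldd ./namd2 | grep libcudart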

NAMD2

Run namd2 from the $USER_SCRATCH/NAMD directory (which contains the CUDA build of NAMD):

 ./charmrun ++local +p12 ./namd2 +idlepoll /LOCATION/OF/CONFIGURATION/FILE/*.conf > OUTPUT_FILE

where:

  • ++local makes use of the processors only on the compute node
  • +p12 tells NAMD to make use of the 12 processor cores on the compute node
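
Putting the steps together, a complete invocation might look like the following sketch (the configuration and output file names are placeholders):

 cd $USER_SCRATCH/NAMD
 # 12 local processes polling the GPU; apoa1.conf and apoa1.log are example names
 ./charmrun ++local +p12 ./namd2 +idlepoll $USER_SCRATCH/simulations/apoa1.conf > apoa1.log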