NWChem: Difference between revisions

From Cheaha

Revision as of 21:11, 28 September 2015

NWChem: Open Source High-Performance Computational Chemistry

NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters.

NWChem software can handle

  • Biomolecules, nanostructures, and solid-state
  • From quantum to classical, and all combinations
  • Ground and excited-states
  • Gaussian basis functions or plane-waves
  • Scaling from one to thousands of processors
  • Properties and relativistic effects

NWChem is actively developed by a consortium of developers and maintained by the EMSL located at the Pacific Northwest National Laboratory (PNNL) in Washington State. Researchers interested in contributing to NWChem should review the Developers page. The code is distributed as open-source under the terms of the Educational Community License version 2.0 (ECL 2.0).

The NWChem development strategy is focused on providing new and essential scientific capabilities to its users in the areas of kinetics and dynamics of chemical transformations, chemistry at interfaces and in the condensed phase, and enabling innovative and integrated research at EMSL. At the same time continued development is needed to enable NWChem to effectively utilize architectures of tens of petaflops and beyond.

Science with NWChem

NWChem is used by thousands of researchers worldwide to investigate questions about chemical processes by applying theoretical techniques to predict the structure, properties, and reactivity of chemical and biological species ranging in size from tens to millions of atoms. With NWChem, researchers can tackle molecular systems including biomolecules, nanostructures, actinide complexes, and materials. NWChem offers an extensive array of highly scalable, parallel computational chemistry methods needed to address scientific questions that are relevant to reactive chemical processes occurring in our everyday environment, such as photosynthesis, protein functions, and combustion. These methods include a multitude of highly correlated methods, density functional theory (DFT) with an extensive set of exchange-correlation functionals, time-dependent density functional theory (TDDFT), plane-wave DFT with exact exchange and Car-Parrinello, molecular dynamics with AMBER and CHARMM force fields, and combinations of them.

A list of research publications that utilized NWChem can be found here.

Software Similar to NWChem

Quantum Espresso [1]
CPMD [2]
Gaussian [3]
CP2K [4]

Compiling NWChem for the Cheaha cluster

The steps outlined here are adapted from the general guide for a site installation consisting of commodity hardware over MPI <ref name="COMPILE">Compiling NWChem from source</ref>. There are many compilation options for differing architectures, network protocols, and optimized mathematics libraries. This guide shows the steps necessary to compile NWChem 6.5 for OpenMPI <ref name="OMPI">OpenMPI - Message Passing</ref>, using OpenBLAS<ref name="OBLAS">OpenBLAS - Optimized BLAS Package</ref> tuned for Intel's Nehalem microarchitecture and ScaLAPACK<ref name="SCALAPACK">ScaLAPACK - Scalable Linear Algebra Package</ref> for optimized linear algebra calculations. This guide assumes some familiarity with Linux commands and utilities. It also assumes that each package will be downloaded, compiled, and installed in the user's home directory. After each subsection, I'll add my own comments from experience that may help with any hiccups.

Download and Compile OpenBLAS

At the time of writing, the latest available version of OpenBLAS is 0.2.14<ref name="OBLAS-SRC">OpenBLAS - Source Code</ref>.

Create the source directory

 mkdir -p $HOME/src 

Download the source code

cd $HOME/src
wget http://github.com/xianyi/OpenBLAS/archive/v0.2.14.tar.gz 

Unpack the source code

 
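# Depending on the wget version, the download may have been saved as "v0.2.14"; rename it if so
[ -f v0.2.14 ] && mv v0.2.14 v0.2.14.tar.gz
tar xf v0.2.14.tar.gz    # this step may take a few moments
mv OpenBLAS-0.2.14 OpenBLAS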

Edit the configuration file so that its contents match those below. Some special notes here: (1) The TARGET is set to PENRYN for use in the sipsey queue. As the cluster ages and new generations are added, this will probably need to be updated. (2) GCC is used as opposed to Intel's compilers. (3) NWChem uses 64-bit integers by default, so INTERFACE64 is set to 1. Most other options are fairly self-explanatory. The default compiler is a little dated at this point, so load the gcc 4.9.3 module to use gcc/gfortran 4.9.3 for some hopeful optimization!

module load gcc/4.9.3
cd OpenBLAS
nano Makefile.rule
#
#  Beginning of user configuration
#

# This library's version
VERSION = 0.2.14

# If you set the suffix, the library name will be libopenblas_$(LIBNAMESUFFIX).a
# and libopenblas_$(LIBNAMESUFFIX).so. Meanwhile, the soname in shared library
# is libopenblas_$(LIBNAMESUFFIX).so.0.
# LIBNAMESUFFIX = 

# You can specify the target architecture, otherwise it's
# automatically detected.
TARGET = PENRYN

# If you want to support multiple architecture in one binary
#DYNAMIC_ARCH = 1

# C compiler including binary type(32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
CC = gcc

# Fortran compiler. Default is g77.
FC = gfortran

# Even you can specify cross compiler. Meanwhile, please set HOSTCC.

# cross compiler for Windows
# CC = x86_64-w64-mingw32-gcc
# FC = x86_64-w64-mingw32-gfortran

# cross compiler for 32bit ARM
# CC = arm-linux-gnueabihf-gcc
# FC = arm-linux-gnueabihf-gfortran

# cross compiler for 64bit ARM
# CC = aarch64-linux-gnu-gcc
# FC = aarch64-linux-gnu-gfortran


# If you use the cross compiler, please set this host compiler.
# HOSTCC = gcc

# If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
BINARY=64

# About threaded BLAS. It will be automatically detected if you don't
# specify it.
# For force setting for single threaded, specify USE_THREAD = 0
# For force setting for multi  threaded, specify USE_THREAD = 1
USE_THREAD = 0

# If you're going to use this library with OpenMP, please comment it in.
# USE_OPENMP = 1

# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# NUM_THREADS = 999

# if you don't need to install the static library, please comment it in.
# NO_STATIC = 1

# if you don't need generate the shared library, please comment it in.
# NO_SHARED = 1

# If you don't need CBLAS interface, please comment it in.
# NO_CBLAS = 1

# If you only want CBLAS interface without installing Fortran compiler,
# please comment it in.
# ONLY_CBLAS = 1

# If you don't need LAPACK, please comment it in.
# If you set NO_LAPACK=1, the library automatically sets NO_LAPACKE=1.
# NO_LAPACK = 1

# If you don't need LAPACKE (C Interface to LAPACK), please comment it in.
# NO_LAPACKE = 1

# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1

# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure(equivalent to "-i8" option).
INTERFACE64 = 1

# Unfortunately most of kernel won't give us high quality buffer.
# BLAS tries to find the best region before entering main function,
# but it will consume time. If you don't like it, you can disable one.
NO_WARMUP = 1

# If you want to disable CPU/Memory affinity on Linux.
NO_AFFINITY = 1

# if you are compiling for Linux and you have more than 16 numa nodes or more than 256 cpus
BIGNUMA = 1

# Don't use AVX kernel on Sandy Bridge. It is compatible with old compilers
# and OS. However, the performance is low.
# NO_AVX = 1

# Don't use Haswell optimizations if binutils is too old (e.g. RHEL6)
# NO_AVX2 = 1

# Don't use parallel make.
# NO_PARALLEL_MAKE = 1

# If you would like to know minute performance report of GotoBLAS.
# FUNCTION_PROFILE = 1

# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# QUAD_PRECISION = 1

# Theads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
# system). Also you can control this mumber by THREAD_TIMEOUT
# CCOMMON_OPT	+= -DTHREAD_TIMEOUT=26

# Using special device driver for mapping physically contigous memory
# to the user space. If bigphysarea is enabled, it will use it.
# DEVICEDRIVER_ALLOCATION = 1

# If you need to synchronize FP CSR between threads (for x86/x86_64 only).
# CONSISTENT_FPCSR = 1

# If any gemm arguement m, n or k is less or equal this threshold, gemm will be execute
# with single thread. You can use this flag to avoid the overhead of multi-threading
# in small matrix sizes. The default value is 4.
# GEMM_MULTITHREAD_THRESHOLD = 4

# If you need santy check by comparing reference BLAS. It'll be very
# slow (Not implemented yet).
# SANITY_CHECK = 1

# Run testcases in utest/ . When you enable UTEST_CHECK, it would enable
# SANITY_CHECK to compare the result with reference BLAS.
# UTEST_CHECK = 1

# The installation directory.
PREFIX = $(HOME)/OpenBLAS

# Common Optimization Flag;
# The default -O2 is enough.
COMMON_OPT = -O2

# gfortran option for LAPACK
# enable this flag only on 64bit Linux and if you need a thread safe lapack library
# FCOMMON_OPT = -frecursive

# Profiling flags
COMMON_PROF = -pg

# Build Debug version
# DEBUG = 1

# Improve GEMV and GER for small matrices by stack allocation.
# For details, https://github.com/xianyi/OpenBLAS/pull/482
#
# MAX_STACK_ALLOC=2048

# Add a prefix or suffix to all exported symbol names in the shared library.
# Avoid conflicts with other BLAS libraries, especially when using
# 64 bit integer interfaces in OpenBLAS.
# For details, https://github.com/xianyi/OpenBLAS/pull/459
#
# SYMBOLPREFIX=
# SYMBOLSUFFIX=

#
#  End of user configuration
#
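For anyone who prefers not to edit the file by hand, the few settings that differ from the defaults could also be applied from the shell. The following is a minimal sketch, not part of the original procedure: set_make_var is a hypothetical helper that rewrites the first "NAME = ..." line (commented out or not) in a makefile fragment, or appends one if none exists. It assumes one assignment per line, as in Makefile.rule.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenBLAS): set NAME = VALUE in a makefile fragment.
# Usage: set_make_var FILE NAME VALUE
set_make_var() {
    file=$1
    tmp=$(mktemp) || return 1
    awk -v name="$2" -v value="$3" '
        # First line of the form "NAME = ..." or "# NAME = ...": replace it.
        !seen && ($0 ~ ("^#?[ \t]*" name "[ \t]*=")) {
            print name " = " value
            seen = 1
            next
        }
        { print }
        # No such line anywhere: append the assignment at the end.
        END { if (!seen) print name " = " value }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back instead of mv so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Apply the settings discussed above, if the source tree is where this guide puts it.
rule="$HOME/src/OpenBLAS/Makefile.rule"
if [ -f "$rule" ]; then
    set_make_var "$rule" TARGET PENRYN
    set_make_var "$rule" USE_THREAD 0
    set_make_var "$rule" INTERFACE64 1
fi
```

It is still worth opening the file afterwards to confirm the result, since the helper only looks at the first matching line.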

Compile OpenBLAS, and install it to $HOME/OpenBLAS. Compiling may take a few minutes, so go grab another cup of coffee.

 make all 

When the compilation is finished, you should see an output similar to the one below

OpenBLAS build complete. 
OS               ... GNU/Linux             
Architecture     ... x86_64              
BINARY           ... 64bit      

Install the libraries and header files

 make install 
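Before moving on, it can be worth confirming that the install step actually put the library where later steps expect it. A small sketch follows; require_files is a hypothetical helper, and the path checked is the static library that the ScaLAPACK configuration in this guide points at.

```shell
#!/bin/sh
# Hypothetical helper: report any expected file missing under an install prefix.
# Usage: require_files PREFIX RELATIVE_PATH...
require_files() {
    prefix=$1
    shift
    missing=0
    for rel in "$@"; do
        if [ ! -f "$prefix/$rel" ]; then
            echo "missing: $prefix/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the OpenBLAS static library installed by this guide.
if require_files "$HOME/OpenBLAS" lib/libopenblas.a; then
    echo "OpenBLAS install looks complete"
else
    echo "OpenBLAS install incomplete; re-check PREFIX and 'make install'" >&2
fi
```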

Notes on Compiling OpenBLAS

  1. When editing the config file 'Makefile.rule', you may have noticed the threading option, and that it was set to USE_THREAD = 0 (single-threaded). NWChem uses MPI for parallelism and spawns many processes that are distributed across the cores. Each of these processes calls libopenblas directly; if OpenBLAS were allowed to spawn its own threads on top of that, the cores would be oversubscribed and 'bad stuff' would happen.
  2. Inside the $HOME/src/OpenBLAS directory there is a 'test' directory. It contains several "blat" programs that can and should be used to test the build. After a successful compile, these should all run and pass. Otherwise, something isn't correct and things won't work down the line.
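The test programs mentioned in note 2 can be run in a loop. The sketch below makes two assumptions that should be checked against the test directory's own Makefile: that the executables are named like sblat1 or dblat3, and that a program reads a matching .dat file on standard input when one exists. A non-zero exit status only catches crashes, so the programs' own pass/fail output should still be read.

```shell
#!/bin/sh
# Sketch: run every ?blat[123] executable found in a directory.
# Usage: run_blat_tests DIR
run_blat_tests() {
    dir=$1
    failed=0
    for exe in "$dir"/?blat[123]; do
        [ -x "$exe" ] || continue          # also skips the unexpanded pattern
        name=$(basename "$exe")
        if [ -f "$dir/$name.dat" ]; then
            # Assumed convention: input parameters come from NAME.dat on stdin.
            (cd "$dir" && "./$name" < "./$name.dat")
        else
            (cd "$dir" && "./$name")
        fi || { echo "problem running $name" >&2; failed=1; }
    done
    return "$failed"
}

# Run against the OpenBLAS source tree used in this guide, if present.
testdir="$HOME/src/OpenBLAS/test"
if [ -d "$testdir" ]; then
    run_blat_tests "$testdir"
fi
```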

Download and Compile ScaLAPACK

At the time of writing, the latest version of ScaLAPACK is 2.0.2<ref name="SCALAPACK-SOURCE">ScaLAPACK - Source Code</ref>.

Download and unpack the source code

cd $HOME/src
wget http://www.netlib.org/scalapack/scalapack-2.0.2.tgz
tar xf scalapack-2.0.2.tgz
mv scalapack-2.0.2 ScaLAPACK
cd ScaLAPACK

Since ScaLAPACK leverages OpenMPI, load the module openmpi/openmpi-gnu.

 module load openmpi/openmpi-gnu 

Copy the example SLmake.inc.example to SLmake.inc and edit SLmake.inc, making the following changes.

cp SLmake.inc.example SLmake.inc 
nano SLmake.inc
CDEFS         = -DAdd_
FC            = mpif90
CC            = mpicc
NOOPT         = -O0
FCFLAGS       = -O3
CCFLAGS       = -O3
FCLOADER      = $(FC)
CCLOADER      = $(CC)
FCLOADFLAGS   = $(FCFLAGS)
CCLOADFLAGS   = $(CCFLAGS)
ARCH          = ar
ARCHFLAGS     = cr
RANLIB        = ranlib
SCALAPACKLIB  = libscalapack.a
BLASLIB       = $(HOME)/OpenBLAS/lib/libopenblas.a
LAPACKLIB     = $(HOME)/OpenBLAS/lib/libopenblas.a
LIBS          = $(LAPACKLIB) $(BLASLIB)
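Since SLmake.inc relies on the MPI compiler wrappers, it may save a wasted build to confirm they are on the PATH first (that is, that the module really was loaded). A minimal sketch, with require_cmds as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named command is on the PATH.
# Usage: require_cmds COMMAND...
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The compilers named in SLmake.inc, plus the launcher used later for testing.
if require_cmds mpicc mpif90 mpirun; then
    echo "MPI wrappers found; OK to run 'make all'"
else
    echo "load the MPI module (e.g. openmpi/openmpi-gnu) and try again" >&2
fi
```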

Compile ScaLAPACK; this will also take a while. Grab some lunch!

 make all 

While there is no "make install" option in the ScaLAPACK Makefile, we'll create a directory and lib structure in our home directory to keep things consistent, and copy the generated library file there.

mkdir -p $HOME/ScaLAPACK/lib
cp libscalapack.a $HOME/ScaLAPACK/lib

Notes on Compiling ScaLAPACK

  1. In older versions of ScaLAPACK, BLACS, a communication layer between the BLAS parts and the MPI parts of ScaLAPACK, was a separate package. It had to be compiled by itself and included in the SLmake.inc file, just like the OpenBLAS libraries. This is no longer necessary, as BLACS has been "absorbed" into ScaLAPACK and is compiled during the 'make all' step. However, the BLACS directory still contains a source folder and a testing folder. After ScaLAPACK has been compiled, try running BLACS/TESTING/xCbtest and xFbtest with mpirun to make sure that part went as expected.
  2. Inside the ScaLAPACK directory there is also a TESTING directory. Try the tests there as well.
  3. If something goes wrong and you need to start over (for example, if you forgot to load the MPI module), run 'make clean' in the BLACS/SRC as well as the ScaLAPACK/SRC directories. Running 'make clean' from the top-level directory didn't seem to clear out some 'bad stuff' that was left over.
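The BLACS check from note 1 might look like the following sketch. The process count of 4 is an assumption rather than something this guide specifies, and the MPIRUN variable is a hypothetical override added here so the launcher can be swapped out.

```shell
#!/bin/sh
# Sketch: run the BLACS testers under MPI.
# Usage: run_blacs_tests TESTDIR [NPROCS]
run_blacs_tests() {
    testdir=$1
    np=${2:-4}                      # assumed process count; adjust as needed
    launcher=${MPIRUN:-mpirun}      # hypothetical override, defaults to mpirun
    for t in xCbtest xFbtest; do
        if [ ! -x "$testdir/$t" ]; then
            echo "skipping $t: not built in $testdir" >&2
            continue
        fi
        (cd "$testdir" && "$launcher" -np "$np" "./$t") || return 1
    done
}

# Run against the ScaLAPACK source tree used in this guide, if present.
testdir="$HOME/src/ScaLAPACK/BLACS/TESTING"
if [ -d "$testdir" ]; then
    run_blacs_tests "$testdir" 4
fi
```

On a cluster, this is better run inside a job allocation than on the login node.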

References

<references group=""></references>