NWChem: Open Source High-Performance Computational Chemistry
NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters.
NWChem software can handle
- Biomolecules, nanostructures, and solid-state
- From quantum to classical, and all combinations
- Ground and excited states
- Gaussian basis functions or plane-waves
- Scaling from one to thousands of processors
- Properties and relativistic effects
NWChem is actively developed by a consortium of developers and maintained by the EMSL located at the Pacific Northwest National Laboratory (PNNL) in Washington State. Researchers interested in contributing to NWChem should review the Developers page. The code is distributed as open-source under the terms of the Educational Community License version 2.0 (ECL 2.0).
The NWChem development strategy is focused on providing new and essential scientific capabilities to its users in the areas of kinetics and dynamics of chemical transformations, chemistry at interfaces and in the condensed phase, and enabling innovative and integrated research at EMSL. At the same time continued development is needed to enable NWChem to effectively utilize architectures of tens of petaflops and beyond.
Science with NWChem
NWChem is used by thousands of researchers worldwide to investigate questions about chemical processes by applying theoretical techniques to predict the structure, properties, and reactivity of chemical and biological species ranging in size from tens to millions of atoms. With NWChem, researchers can tackle molecular systems including biomolecules, nanostructures, actinide complexes, and materials. NWChem offers an extensive array of highly scalable, parallel computational chemistry methods needed to address scientific questions that are relevant to reactive chemical processes occurring in our everyday environment—photosynthesis, protein functions, and combustion, to name a few. They include a multitude of highly correlated methods, density functional theory (DFT) with an extensive set of exchange-correlation functionals, time-dependent density functional theory (TDDFT), plane-wave DFT with exact exchange and Car-Parrinello, molecular dynamics with AMBER and CHARMM force fields, and combinations of them.
A list of research publications that utilized NWChem can be found on the NWChem website.
Software Similar to NWChem
- Quantum Espresso
- CPMD
- Gaussian
- CP2k
Compiling NWChem for the Cheaha cluster
The steps outlined here are adapted from the general guide for a site installation consisting of commodity hardware over MPI <ref name="COMPILE">Compiling NWChem from source</ref>. There are many compilation options for differing architectures, network protocols, and optimized mathematics libraries. This guide will show the steps necessary to compile NWChem 6.5 for OpenMPI <ref name="OMPI">OpenMPI - Message Passing</ref>, using OpenBLAS<ref name="OBLAS">OpenBLAS - Optimized BLAS Package</ref> tuned for Intel's Nehalem microarchitecture and ScaLAPACK<ref name="SCALAPACK">ScaLAPACK - Scalable Linear Algebra Package</ref> for optimized linear algebra calculations. This guide assumes some familiarity with Linux commands and utilities. It also assumes that each software package will be downloaded, compiled, and installed in the user's home directory.
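Before starting, it can help to confirm that the GNU toolchain and an MPI compiler wrapper are actually available in your shell, since the builds below rely on gcc, gfortran, make, and (later, for NWChem itself) OpenMPI. How compilers are provided on Cheaha (for example through environment modules) may differ between queues, so the commands below are only a minimal sketch of that check.

<pre>
# Verify the GNU compilers and make are on the PATH
gcc --version
gfortran --version
make --version

# Verify an OpenMPI Fortran wrapper is available; it reports the back-end compiler it uses
mpif90 --version
</pre>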
Download and Compile OpenBLAS
At the time of writing, the latest available version of OpenBLAS is 0.2.14<ref name="OBLAS-SRC">[http://github.com/xianyi/OpenBLAS/archive/v0.2.14.tar.gz OpenBLAS - Source Code]</ref>
- Create the source directory
<pre>
mkdir -p $HOME/src/OpenBLAS
</pre>
- Download the source code
<pre>
cd $HOME/src/OpenBLAS
wget http://github.com/xianyi/OpenBLAS/archive/v0.2.14.tar.gz
</pre>
- Unpack the source code
<pre>
mv v0.2.14 v0.2.14.tar.gz
tar xf v0.2.14.tar.gz    # this step may take a few moments
mv OpenBLAS-0.2.14 OpenBLAS
</pre>
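As an optional sanity check, you can confirm that the tree unpacked where the next step expects it and that Makefile.rule, the user configuration file edited below, is present:

<pre>
# Both paths should exist after the rename above
ls -d $HOME/src/OpenBLAS/OpenBLAS
ls $HOME/src/OpenBLAS/OpenBLAS/Makefile.rule
</pre>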
- Edit the configuration file so that its contents match those below. Some special notes here: (1) The TARGET is set to PENRYN for use in the sipsey queue. As the cluster ages and new generations of nodes are added, this will probably need to be updated (a way to check a node's CPU is sketched after the configuration listing). (2) GCC is used as opposed to Intel's compilers. (3) NWChem uses 64-bit integers by default, so INTERFACE64 is set to 1. Most other options are fairly self-explanatory.
<pre>
cd OpenBLAS
nano Makefile.rule
</pre>
<pre>
#
# Beginning of user configuration
#
# This library's version
VERSION = 0.2.14
# If you set the suffix, the library name will be libopenblas_$(LIBNAMESUFFIX).a
# and libopenblas_$(LIBNAMESUFFIX).so. Meanwhile, the soname in shared library
# is libopenblas_$(LIBNAMESUFFIX).so.0.
# LIBNAMESUFFIX =
# You can specify the target architecture, otherwise it's
# automatically detected.
TARGET = PENRYN
# If you want to support multiple architecture in one binary
#DYNAMIC_ARCH = 1
# C compiler including binary type(32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
CC = gcc
# Fortran compiler. Default is g77.
FC = gfortran
# Even you can specify cross compiler. Meanwhile, please set HOSTCC.
# cross compiler for Windows
# CC = x86_64-w64-mingw32-gcc
# FC = x86_64-w64-mingw32-gfortran
# cross compiler for 32bit ARM
# CC = arm-linux-gnueabihf-gcc
# FC = arm-linux-gnueabihf-gfortran
# cross compiler for 64bit ARM
# CC = aarch64-linux-gnu-gcc
# FC = aarch64-linux-gnu-gfortran
# If you use the cross compiler, please set this host compiler.
# HOSTCC = gcc
# If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
BINARY=64
# About threaded BLAS. It will be automatically detected if you don't
# specify it.
# For force setting for single threaded, specify USE_THREAD = 0
# For force setting for multi threaded, specify USE_THREAD = 1
USE_THREAD = 0
# If you're going to use this library with OpenMP, please comment it in.
# USE_OPENMP = 1
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# NUM_THREADS = 999
# if you don't need to install the static library, please comment it in.
# NO_STATIC = 1
# if you don't need generate the shared library, please comment it in.
# NO_SHARED = 1
# If you don't need CBLAS interface, please comment it in.
# NO_CBLAS = 1
# If you only want CBLAS interface without installing Fortran compiler,
# please comment it in.
# ONLY_CBLAS = 1
# If you don't need LAPACK, please comment it in.
# If you set NO_LAPACK=1, the library automatically sets NO_LAPACKE=1.
# NO_LAPACK = 1
# If you don't need LAPACKE (C Interface to LAPACK), please comment it in.
# NO_LAPACKE = 1
# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1
# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure(equivalent to "-i8" option).
INTERFACE64 = 1
# Unfortunately most of kernel won't give us high quality buffer.
# BLAS tries to find the best region before entering main function,
# but it will consume time. If you don't like it, you can disable one.
NO_WARMUP = 1
# If you want to disable CPU/Memory affinity on Linux.
NO_AFFINITY = 1
# if you are compiling for Linux and you have more than 16 numa nodes or more than 256 cpus
BIGNUMA = 1
# Don't use AVX kernel on Sandy Bridge. It is compatible with old compilers
# and OS. However, the performance is low.
# NO_AVX = 1
# Don't use Haswell optimizations if binutils is too old (e.g. RHEL6)
# NO_AVX2 = 1
# Don't use parallel make.
# NO_PARALLEL_MAKE = 1
# If you would like to know minute performance report of GotoBLAS.
# FUNCTION_PROFILE = 1
# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# QUAD_PRECISION = 1
# Theads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
# system). Also you can control this mumber by THREAD_TIMEOUT
# CCOMMON_OPT += -DTHREAD_TIMEOUT=26
# Using special device driver for mapping physically contigous memory
# to the user space. If bigphysarea is enabled, it will use it.
# DEVICEDRIVER_ALLOCATION = 1
# If you need to synchronize FP CSR between threads (for x86/x86_64 only).
# CONSISTENT_FPCSR = 1
# If any gemm arguement m, n or k is less or equal this threshold, gemm will be execute
# with single thread. You can use this flag to avoid the overhead of multi-threading
# in small matrix sizes. The default value is 4.
# GEMM_MULTITHREAD_THRESHOLD = 4
# If you need santy check by comparing reference BLAS. It'll be very
# slow (Not implemented yet).
# SANITY_CHECK = 1
# Run testcases in utest/ . When you enable UTEST_CHECK, it would enable
# SANITY_CHECK to compare the result with reference BLAS.
# UTEST_CHECK = 1
# The installation directory.
PREFIX = $(HOME)/OpenBLAS
# Common Optimization Flag;
# The default -O2 is enough.
COMMON_OPT = -O2
# gfortran option for LAPACK
# enable this flag only on 64bit Linux and if you need a thread safe lapack library
# FCOMMON_OPT = -frecursive
# Profiling flags
COMMON_PROF = -pg
# Build Debug version
# DEBUG = 1
# Improve GEMV and GER for small matrices by stack allocation.
# For details, https://github.com/xianyi/OpenBLAS/pull/482
#
# MAX_STACK_ALLOC=2048
# Add a prefix or suffix to all exported symbol names in the shared library.
# Avoid conflicts with other BLAS libraries, especially when using
# 64 bit integer interfaces in OpenBLAS.
# For details, https://github.com/xianyi/OpenBLAS/pull/459
#
# SYMBOLPREFIX=
# SYMBOLSUFFIX=
#
# End of user configuration
#
</pre>
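Because the TARGET value needs to track the hardware actually available, one way to choose it is to inspect the CPU on a compute node of the queue you plan to use (the login node may have a different processor). The commands below are a sketch of that check; recent OpenBLAS source trees also ship a TargetList.txt file at the top level which, if present in this version, lists the accepted TARGET names (PENRYN, NEHALEM, SANDYBRIDGE, HASWELL, and so on).

<pre>
# Run on a compute node of the target queue, e.g. from an interactive job
grep -m1 "model name" /proc/cpuinfo
lscpu

# If the source tree includes it, list the TARGET names OpenBLAS understands
cat $HOME/src/OpenBLAS/OpenBLAS/TargetList.txt
</pre>

Matching the reported CPU generation to the corresponding OpenBLAS TARGET keeps the hand-tuned kernels in use as newer nodes are added to the cluster.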