Talk:Cheaha Software

New content to this page pasted by Tanthony@uab.edu 14:23, 2 April 2012 (CDT)

Content from this page will be edited and the content from https://docs.uabgrid.uab.edu/wiki/Cheaha_software will be pasted and redirected to this page.


Tanthony@uab.edu 13:47, 20 March 2012 (CDT)


Old page

Installed software

We try to install local software in /opt, /opt/uabeng and /share/apps. However, please do not depend on a particular piece of software being in a specific directory, as we may need to move things around at some point.
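One way to avoid depending on a particular install directory is to resolve a tool through PATH after loading its module. The following is a minimal sketch, not site policy; the helper name find_tool is hypothetical and plink is used only as an illustration.

```shell
# Sketch: resolve a tool through PATH (as set up by a module file)
# instead of hard-coding an install directory such as /share/apps/...

# find_tool NAME: print how the shell resolves NAME (a full path for
# executables); return non-zero if NAME cannot be found.
find_tool() {
    command -v "$1" 2>/dev/null
}

# "plink" is only an illustration; any command name works.
if tool_path=$(find_tool plink); then
    echo "plink found at: $tool_path"
else
    echo "plink is not on PATH; load its module first" >&2
fi
```

Scripts written this way should keep working if a package is moved, provided its module file is updated to match.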

In most cases, the description of each software package was copied from the author's web site and represents the author's own work.

If you don't find a particular package listed on this page, please send a request for the software to Cheaha support.

If a module file is available for the software, it is recommended to use the module file in your job script and/or shell profile.
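A guarded module load for a shell profile or job script might look like the sketch below. The helper name load_module is hypothetical. The modules.sh path is the one used by the job scripts on this page, and atlas/atlas is one of the modules listed here.

```shell
# Sketch for ~/.bashrc or a job script: load a module only when the
# "module" command is available in this shell.

# load_module NAME: load NAME via Environment Modules, or warn and return 1.
load_module() {
    if command -v module >/dev/null 2>&1; then
        module load "$1"
    else
        echo "module command not available; cannot load $1" >&2
        return 1
    fi
}

# In a Grid Engine job script the module command may first need to be
# initialised, as the job scripts on this page do.
if [ -r /etc/profile.d/modules.sh ]; then
    . /etc/profile.d/modules.sh
fi

load_module atlas/atlas || echo "continuing without atlas/atlas" >&2
```

The guard keeps a profile usable on machines where Environment Modules is not installed.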

Software (Link to home page) Version Software Installation-Directory Information
Amber 10 /opt/uabeng/amber10/intel "Amber" refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos.

Amber is compiled using Intel compilers and uses OpenMPI for the parallel binaries.

The following Modules files should be loaded for this package (the amber module will automatically load the openmpi module):

For Intel:

module load amber/amber-10-intel

Use the openmpi parallel environment in your job script (example for a 4 slot job)

#$ -pe openmpi 4
APBS 1.0.0 /share/apps/apbs/apbs-1.0.0-amd64 APBS - Adaptive Poisson-Boltzmann Solver APBS is a software package for the numerical solution of the Poisson-Boltzmann equation (PBE), one of the most popular continuum models for describing electrostatic interactions between molecular solutes in salty, aqueous media.

Submit APBS jobs via the Grid Engine and do not run them on the head node!

module load apbs/apbs-1.0 
Atlas 3.8.3 /usr/lib64/atlas The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.

module load atlas/atlas 
Biopython 1.51 Biopython is a set of freely available tools for biological computation written in Python by an international team of developers.

The Biopython package, along with its dependencies (Numpy, python-reportlab, Flex, etc.), is installed in the default location for Python site-packages, so you should not need to modify any environment variables to use it.

Birdsuite 1.5.3 /share/apps/birdsuite/1.5.3 The Birdsuite is a fully open-source set of tools to detect and report SNP genotypes, common Copy-Number Polymorphisms (CNPs), and novel, rare, or de novo CNVs in samples processed with the Affymetrix platform. While most of the components of the suite can be run individually (for instance, to only do SNP genotyping), the Birdsuite is especially intended for integrated analysis of SNPs and CNVs. Support for chips and platforms other than the Affymetrix SNP 6.0 is currently limited, but we are currently working on creating the supporting files for other common genotyping platforms.

An example job submission script can be found here (copy this to your job directory and make sure to edit the email address!)

/share/apps/example-scripts/birdsuite-job.qsub

The following Modules files should be loaded for this package:

module load birdsuite/birdsuite-1.5
boost 1.33.1 /usr/lib

/usr/lib64

Boost provides free peer-reviewed portable C++ source libraries.

The Boost team emphasize libraries that work well with the C++ Standard Library. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications. The Boost license encourages both commercial and non-commercial use.

Both 32-bit and 64-bit versions of the Boost C++ libraries are provided, under /usr/lib and /usr/lib64 respectively.

Bowtie 0.10.1 /share/apps/bowtie/bowtie-0.10.1 Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). It supports alignment policies equivalent to Maq and SOAP but is substantially faster.

A Bowtie tutorial is available here: http://bowtie-bio.sourceforge.net/tutorial.shtml

The following Modules files should be loaded for this package:

module load bowtie/bowtie-0.10
eigenstrat 3.0 /share/apps/eigenstrat EIGENSTRAT also provides a decent FAQ on their website, click here.

"The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes."

The following Modules file should be loaded for this package:

module load eigenstrat/eigenstrat
fastPHASE 1.4.0 /share/apps/fastPHASE/1.4 The program fastPHASE implements methods for estimating haplotypes and missing genotypes from population SNP genotype data.

The following Modules files should be loaded for this package:

module load fastphase/fastphase-1.4
FFTW 3.1.2 /opt/uabeng/fftw3/gnu

/opt/uabeng/fftw3/intel

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).

The following Modules files should be loaded for this package:

For GNU:

module load fftw/fftw3-gnu

For Intel:

module load fftw/fftw3-intel
Gromacs 4.0.5 /opt/uabeng/gromacs/gnu/4

/opt/uabeng/gromacs/intel/4

GROMACS is a versatile package to perform molecular dynamics and is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

Gromacs is compiled using Intel and GNU compilers using FFTW3, BLAS, and LAPACK and OpenMPI for the parallel binaries. Single and double precision binaries are included (double precision binaries have a _d suffix).

The following Modules files should be loaded for this package (module will automatically load any prerequisite modules):

For GNU:

module load gromacs/gromacs-4-gnu

For Intel:

module load gromacs/gromacs-4-intel

Use the openmpi parallel environment in your job script (example for a 4 slot job)

#$ -pe openmpi 4
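
The _d suffix convention for double precision binaries noted above can be wrapped in a small helper so a job script switches precision in one place. This is only a sketch: the function name gmx_binary is hypothetical, and mdrun and grompp are used as example GROMACS tools. This page does not state how the parallel (MPI) binaries are named, so the sketch does not cover them.

```shell
# gmx_binary TOOL PRECISION: print the binary name for a GROMACS tool.
# PRECISION is "single" (default) or "double"; double adds the _d suffix.
gmx_binary() {
    case "${2:-single}" in
        double) printf '%s_d\n' "$1" ;;
        single) printf '%s\n' "$1" ;;
        *) echo "unknown precision: $2" >&2; return 1 ;;
    esac
}

gmx_binary mdrun double    # prints mdrun_d
gmx_binary grompp single   # prints grompp
```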
GSL 1.10 /usr/lib

/usr/lib64

/usr/include/gsl

The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License.

The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.

HAPGEN 1.3.0 /share/apps/hapgen/1.3.0 HAPGEN is a program that simulates case control datasets at SNP markers and can output data in the file format used by IMPUTE, SNPTEST and GTOOL. The approach can handle markers in LD and can simulate datasets over large regions such as whole chromosomes. Hapgen simulates haplotypes by conditioning on a set of population haplotypes and an estimate of the fine-scale recombination rate across the region.

Command line syntax example for HAPGEN can be found by clicking here.

The HAPGEN environment module can be loaded as follows

module load hapgen/hapgen
IMPUTE v2 2.0.3 /share/apps/impute/2.0.3 IMPUTE v2 is a new genotype imputation algorithm based on ideas described in Howie et al. (2009).

For the specific version

module load impute/impute-2.0.3

Or to use the latest

module load impute/impute

Examples for IMPUTE v2 are provided in the $IMPUTEHOME/Example directory.
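
To experiment with the examples without touching the shared install, they can be copied to a personal directory. This sketch assumes the impute module sets IMPUTEHOME, as the path above suggests. The function name and the destination directory are hypothetical.

```shell
# copy_impute_examples DEST: copy the contents of $IMPUTEHOME/Example into DEST.
copy_impute_examples() {
    if [ -z "${IMPUTEHOME:-}" ] || [ ! -d "$IMPUTEHOME/Example" ]; then
        echo "\$IMPUTEHOME/Example not found; run 'module load impute/impute' first" >&2
        return 1
    fi
    mkdir -p "$1" && cp -R "$IMPUTEHOME/Example/." "$1/"
}

# Hypothetical destination; choose any writable location.
copy_impute_examples "$HOME/impute-example" || true
```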

JAGS 1.0.3 /share/apps/jags/jags-1.0.3/gnu JAGS (Just Another Gibbs Sampler) is a Bayesian hierarchical model analysis program using Markov Chain Monte Carlo (MCMC) simulation. It is similar to BUGS but will compile on Linux systems.

Click here for a good description of JAGS and how it differs from BUGS.

The JAGS environment module can be loaded as follows

module load jags/jags-1.0-gnu


Java JDK 1.5.0_10 /usr/java/jdk1.5.0_10 JDK (Java Developers Kit) and Runtime from Sun
JRE 1.6.0_04 /usr/java/jre1.6.0_04 Java Runtime
Intel 10.1.015 /opt/intel/cce

/opt/intel/fce

/opt/intel/mkl

Intel C, C++ and Fortran compilers along with the Intel Math Kernel Libraries

The following Modules file should be loaded for this package:

module load intel/intel-compilers-10.1
LAM-MPI 7.1.4 /opt/uabeng/lam/gnu

/opt/uabeng/lam/intel

LAM/MPI is now in a maintenance mode. Bug fixes and critical patches are still being applied, but little real "new" work is happening in LAM/MPI. This is a direct result of the LAM/MPI Team spending the vast majority of their time working on our next-generation MPI implementation -- Open MPI.

Although LAM is not going to go away any time soon (we certainly would not abandon our user base!) -- the web pages, user lists, and all the other resources will continue to be available indefinitely -- we would encourage all users to try migrating to Open MPI. Since it's an MPI implementation, you should be able to simply recompile and re-link your applications to Open MPI -- they should "just work." Open MPI contains many features and performance enhancements that are not available in LAM/MPI.


The following Modules files should be loaded for this package (for LAM, you must load this module in your profile script and your job script):

For GNU:

module load lammpi/lam-7.1-gnu

For Intel:

module load lammpi/lam-7.1-intel

In order to use LAM-MPI you must load the module in your shell startup script as well as in your job submit script. Add the following to your startup script (replace -intel with -gnu if using GNU):

For Bash Users edit ~/.bashrc:

module load lammpi/lam-7.1-intel

For Csh Users edit ~/.cshrc:

module load lammpi/lam-7.1-intel

Use the lam_loose_rsh parallel environment in your job script (example for a 4 slot job)

#$ -pe lam_loose_rsh 4
MACS 1.3.6 /share/apps/macs/1.3.6 Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.

To load MACS into your environment, use the following module command:

module load macs/macs
Maq 0.7.1 /share/apps/maq/0.7.1 Maq is a software that builds mapping assemblies from short reads generated by the next-generation sequencing machines. It is particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data.

See the Maq documentation page for usage: http://maq.sourceforge.net/maq-man.shtml

The following Modules files should be loaded for this package:

module load maq/maq-0.7
MPICH 1.2.7p1 /opt/mpich/gnu

/opt/mpich/intel

GNU and Intel compiled versions of MPICH are installed under this directory

The following Modules file should be loaded to use mpich

* GNU version of mpich
module load mpich/mpich-1.2-gnu
* Intel version of mpich
module load mpich/mpich-1.2-intel

Use the mpich parallel environment in your job script (example for a 4 slot job)

#$ -pe mpich 4
NAMD 2.6 /share/apps/namd/2.6 NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms and tens of processors on commodity clusters using gigabit ethernet.

The following Modules files should be loaded for this package:

module load namd/namd-2.6
OpenMPI 1.3.3 /opt/uabeng/openmpi/gnu

/opt/uabeng/openmpi/intel

The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners.

The following Modules files should be loaded for this package:

For GNU:

module load openmpi/openmpi-gnu

For Intel:

module load openmpi/openmpi-intel

Use the openmpi parallel environment in your job script (example for a 4 slot job)

#$ -pe openmpi 4

To enable verbose Grid Engine logging for OpenMPI, add the option --mca pls_gridengine_verbose 1 to the mpirun command in the job script, for example:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#
#$ -N j_openmpi_hello
#$ -pe openmpi 4
#$ -l h_rt=00:20:00,s_rt=0:18:00
#$ -j y
#
#$ -M USERID@uab.edu
#$ -m eas
#
# Load the appropriate module files
. /etc/profile.d/modules.sh
module load openmpi/openmpi-gnu

#$ -V

mpirun --mca pls_gridengine_verbose 1 -np $NSLOTS hello_world_gnu_openmpi
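
Submitting a job script like the one above and checking its state could be sketched as follows. The file name openmpi_hello.qsub and the helper submit_job are hypothetical. qsub and qstat -u are the standard Grid Engine submit and status commands.

```shell
# submit_job SCRIPT: submit SCRIPT with qsub, then list this user's jobs.
submit_job() {
    if [ ! -f "$1" ]; then
        echo "job script not found: $1" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; run this on the cluster login node" >&2
        return 1
    fi
    qsub "$1" && qstat -u "$(id -un)"
}

# "openmpi_hello.qsub" is a hypothetical name for the job script.
submit_job openmpi_hello.qsub || true
```

With the -cwd, -j y and -N j_openmpi_hello options used above, Grid Engine would normally write the combined output to a file named j_openmpi_hello.o followed by the job ID in the submission directory.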

PHASE 2.1.1 /share/apps/PHASE/2.1.1 PHASE is software for haplotype reconstruction, and recombination rate estimation from population data. The software implements methods for estimating haplotypes from population genotype data described in:
  • Stephens, M., and Donnelly, P. (2003). A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics, 73:1162-1169.
  • Stephens, M., Smith, N., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978--989.
  • Stephens, M., and Scheet, P. (2005). Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation. American Journal of Human Genetics, 76:449-462.


The software also incorporates methods for estimating recombination rates, and identifying recombination hotspots:

  • Crawford et al (2004). Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics


Documentation on the usage of PHASE can be downloaded here.

The following Modules files should be loaded for this package:

module load phase/phase
PLINK 1.06 /share/apps/plink/1.06 PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analysis in a computationally efficient manner.

The PLINK web site also has a tutorial section that users should read through.

Please see this page for PLINK citing instructions.

To load PLINK into your environment, use the following module command:

module load plink/plink

The following commands are available

  • plink - The plink executable is the primary binary for this software. Click here for the command line reference.
  • gplink - This is a java based GUI for PLINK that provides the following functionality:
    • is a GUI that allows construction of many common PLINK operations
    • provides a simple project management tool and analysis log
    • allows for data and computation to be on a separate server (via SSH)
    • facilitates integration with Haploview

Running gplink: You should NOT run gplink from the cheaha login node (head node), only from the compute nodes using the qrsh command. The qrsh command will provide a shell on a compute node complete with X forwarding. For example:

[jsmith@cheaha ~]$ qrsh

Rocks Compute Node
Rocks 5.1 (V.I)
Profile built 13:06 21-Nov-2008

Kickstarted 13:13 21-Nov-2008

[jsmith@compute-0-10 ~]$ module load plink/plink

[jsmith@compute-0-10 ~]$ gplink

You should see the gPLINK window open. If you get an error similar to "No X11 DISPLAY variable was set", make sure your initial connection to Cheaha had X forwarding enabled.
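
A quick check before launching gplink can catch a missing display early. This is a minimal sketch: the helper name have_display is hypothetical, and USERID and CHEAHA_HOSTNAME are placeholders, not real values.

```shell
# have_display: succeed only if an X11 display is set for this session.
have_display() {
    [ -n "${DISPLAY:-}" ]
}

if have_display; then
    echo "DISPLAY is $DISPLAY; X forwarding appears to be active"
else
    # USERID and CHEAHA_HOSTNAME are placeholders for your own login details.
    echo "DISPLAY is not set; reconnect with: ssh -X USERID@CHEAHA_HOSTNAME" >&2
fi
```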

If you want to use the PLINK R plugin functionality, please see this page http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml for instructions. You'll need to install the Rserve package to use the plugin, for example:

install.packages("Rserve")
pvm 3.4.5 /usr/bin/pvm PVM3 (Parallel Virtual Machine) is a library and daemon that allows distributed processing environments to be constructed on heterogeneous machines and architectures.

R 2.7.2, 2.8.1, 2.9.0, 2.9.2, 2.11.1 /share/apps/R/2.7.2/gnu

/share/apps/R/2.8.1/gnu

/share/apps/R/2.9.0/gnu

/share/apps/R/2.9.2/gnu

/share/apps/R/2.11.1/gnu

R is a free software environment for statistical computing and graphics. For additional instructions on running R on Cheaha, please refer to the page Running R Jobs on a Rocks Cluster.

The following Modules files should be loaded for this package:

module load R/R-2.7.2

For other versions, simply replace the version number

module load R/R-2.11.1

The following libraries are available. Additional libraries should be installed by the user under ~/R_exlibs:

  • /share/apps/R/R-X.X.X/gnu/lib/R/library
    • The default libraries that come with R
    • Rmpi
    • Snow
  • /share/apps/R/R-X.X.X/gnu/lib/R/bioc
    • BioConductor libraries (default package set using getBioC)
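
Setting up the personal library directory mentioned above might look like the sketch below. It relies on R_LIBS, a standard R environment variable that lists extra library directories. The function name and the package file mypackage_1.0.tar.gz are hypothetical.

```shell
# setup_r_user_lib [DIR]: create DIR (default ~/R_exlibs) and put it at
# the front of R_LIBS so R searches it for packages.
setup_r_user_lib() {
    r_user_lib="${1:-$HOME/R_exlibs}"
    mkdir -p "$r_user_lib" || return 1
    R_LIBS="$r_user_lib${R_LIBS:+:$R_LIBS}"
    export R_LIBS
}

setup_r_user_lib || echo "could not create the R library directory" >&2

# Install a source package into the personal library.
# "mypackage_1.0.tar.gz" is a hypothetical file name.
if command -v R >/dev/null 2>&1 && [ -f mypackage_1.0.tar.gz ]; then
    R CMD INSTALL --library="$HOME/R_exlibs" mypackage_1.0.tar.gz
fi
```

Exporting R_LIBS in the shell that submits the job fits the sample job script on this page, which passes R_LIBS to the job with a -v directive.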

Sample R Grid Engine Job Script

This is an example of a serial (i.e. non-parallel) R job that has a 2 hour run time limit and requests 256M of RAM:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#
#$ -j y
#$ -N rtestjob
# Use '#$ -m n' instead to disable all email for this job
#$ -m eas
#$ -M YOUR_EMAIL_ADDRESS
#$ -l h_rt=2:00:00,s_rt=1:55:00
#$ -l vf=256M
. /etc/profile.d/modules.sh
module load R/R-2.7.2

#$ -v PATH,R_HOME,R_LIBS,LD_LIBRARY_PATH,CWD

R CMD BATCH rscript.R
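
When no output file is given, R CMD BATCH writes its output to a file named after the input script, with any .R extension replaced by .Rout. A small sketch for checking that file after the job above has run (the helper name rout_name is hypothetical):

```shell
# rout_name SCRIPT: print the default R CMD BATCH output file for SCRIPT.
rout_name() {
    printf '%s.Rout\n' "${1%.R}"
}

out=$(rout_name rscript.R)   # rscript.Rout
if [ -f "$out" ]; then
    # Show the last lines of the R session transcript.
    tail -n 20 "$out"
else
    echo "$out not found yet; the job may still be queued or running" >&2
fi
```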
s.a.g.e. 6.0.0 /share/apps/s.a.g.e./SAGE_6.0.0_Linux64 S.A.G.E. - Statistical Analysis for Genetic Epidemiology contains programs for use in the genetic analysis of family, pedigree and individual data.

Note: This software is NOT the same as the SAGE listed below!

Make sure that every publication which presents results from using
S.A.G.E. carries an appropriate acknowledgement such as: 

'(Some of) The results of this paper were obtained by using the program
package S.A.G.E., which is supported by a U.S. Public Health Service
Resource Grant (1 P41 RR03655) from the National Center for Research
Resources'  - (it is important that the grant numbers appear under
'acknowledgments'). 

Send bibliographic information about every paper in which S.A.G.E. is
used (author(s), title, journal, volume and page numbers; a reprint will
do provided it has the necessary information on it) to: 

R.C. Elston 
Department of Epidemiology and Biostatistics 
Case Western Reserve University 
Wolstein Research Building 
2103 Cornell Road 
Cleveland, Ohio  44106-7281 

The recommended way of referencing the S.A.G.E. programs is as follows: 

S.A.G.E. [2009]. Statistical Analysis for Genetic Epidemiology 6.0 
Computer program package available from the Department of Epidemiology and 
Biostatistics, Case Western Reserve University, Cleveland. 

Demo data files are available under /share/apps/s.a.g.e./SAGE_6.0.0_Linux64/demo/data_files

To load the S.A.G.E. environment, use

module load s.a.g.e./sage-6.0
SHRiMP 1.3.2 /share/apps/shrimp/SHRiMP_1_3_2 SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystem's colourspace genomic representation.

The following Modules files should be loaded for this package:

module load shrimp/shrimp-1.3
SPRNG 2.0a /share/apps/sprng/2.0a Scalable Parallel Pseudo Random Number Generators Library

The following Modules files should be loaded for this package:

module load sprng/sprng-2
Subversion 1.4.2 /usr/bin/svn Subversion is a concurrent version control system which enables one or more users to collaborate in developing and maintaining a hierarchy of files and directories while keeping a history of all changes. Subversion only stores the differences between versions, instead of every complete file. Subversion is intended to be a compelling replacement for CVS.

STRAT 1.1 /share/apps/STRAT/1.1 STRAT is a companion program to structure. This is a structured association method, for use in association mapping, enabling valid case-control studies even in the presence of population structure.
Structure 2.2.2 /share/apps/structure/2.2.2 Structure is software for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly-used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.
TopHat 1.0.8 /share/apps/tophat/1.0.8 TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

TopHat is a collaborative effort between the University of Maryland Center for Bioinformatics and Computational Biology and the University of California, Berkeley Departments of Mathematics and Molecular and Cell Biology.

A TopHat tutorial is available here: http://tophat.cbcb.umd.edu/tutorial.html

The following Modules files should be loaded for this package (the tophat module will also load the bowtie module):

module load tophat/tophat
VMD 1.8.6 /share/apps/vmd/vmd-1.8.6 VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

You'll need to use X forwarding to launch VMD (for example, on a Windows machine, X-Win32).

The following Modules files should be loaded for this package:

module load vmd/vmd-1.8.6

Tanthony@uab.edu 14:23, 2 April 2012 (CDT)