AppBuildBestPractices

Best Practices for Building Apps on the Cluster

If you need to maintain your own software on the cluster, follow these best practices to create a reliable build and make sure you can repeat your work in the future.

Good software is reliable because you control the conditions under which it is built and used. It's all about the discipline of your practices as a developer, builder, or user.

Good research follows good practices.

Background and Example
This guide builds the software Breakdancer as an example because it is typical of evolving, leading-edge research software, because it touches on several build issues, and because it bubbled to the top.

The upstream build instructions for Breakdancer are on GitHub. You should always adjust upstream instructions to meet the needs of your local environment, in this case the environment of the cluster.

Generally, the upstream build instructions will be mostly correct. You are, however, maintaining software packages in a shared environment without root privileges. This means you must make standard minor modifications to build successfully as a non-root user:


 * you must set the install location to somewhere you have write permission
 * you won't need `sudo` for anything

Control your build environment
''NOTE: This is a big part of reproducible results in research, so it's important! Don't be a lazy builder.''

Start with a pristine environment. You don't want carelessly included binary or library paths to alter your build environment. You want your build to be predictable from one build to the next.

The most important variables to control for builds are PATH and LD_LIBRARY_PATH. Don't ever use the default PATH for a build. Define your PATH explicitly for predictable results.

You can do this easily, without "messing up" your current working environment, by starting a new subshell: enter `bash` to get started and then `exit` when you are done. (If your build is very involved and long, you might want to do this via an interactive job so it doesn't run on the head node.)

Choose a space to work in, either via a subshell or an interactive job:

bash
# or, if your build is a heavy-weight process
qlogin -l h_rt=01:00:00,vf=2G

Clean up your build environment:

PATH=/bin:/usr/bin
unset LD_LIBRARY_PATH
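The isolation that a subshell gives you can be checked directly. The following is a minimal sketch, not part of the upstream instructions; the variable names `parent_path` and `build_env` are made up for illustration. It applies the same two settings inside a subshell and then confirms the parent shell was left alone.

```shell
# Remember the parent's PATH so we can confirm it is left alone.
parent_path=$PATH

# $( ... ) runs its commands in a subshell; changes made inside do not leak out.
build_env=$(
  PATH=/bin:/usr/bin
  unset LD_LIBRARY_PATH
  printf 'PATH=%s LD_LIBRARY_PATH=%s\n' "$PATH" "${LD_LIBRARY_PATH:-unset}"
)

echo "build environment: $build_env"
echo "parent PATH is still: $PATH"
```

An interactive `bash` subshell behaves the same way: whatever you set there disappears when you `exit`.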

After this basic build environment is in place, add only the capabilities needed to complete the build.

A build can be thought of just like any other job: a task that helps you get your research done. You always want a tidy task environment to avoid confusion in the future.

In this example, a non-default build framework is needed. The cmake tool is also the only additional requirement for this build. Simply load its configuration using:

module load cmake/2.8.10.2
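Because the build PATH is now deliberately minimal, it can help to confirm that a required tool is actually visible before starting. This is a small sketch under that assumption; `require_tool` is a hypothetical helper name, not something provided by the cluster or by the modules system.

```shell
# Hypothetical helper: succeed only if the named tool can be found on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found $1 at $(command -v "$1")"
  else
    echo "missing $1: load its module before building" >&2
    return 1
  fi
}

# Check for cmake without aborting the shell if it is absent.
require_tool cmake || echo "cmake is not in this environment yet"
```

Running the check again after loading the module should report the path that the module added.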

You may also need to add other environment customizations specific to a build. A common example: the build may depend on software that you built previously.

In this example, this is the samtools part of the instructions. You download and build samtools, and then set an environment variable so that later build steps can find the samtools install location. That is this step, run from inside the samtools directory:

export SAMTOOLS_ROOT=$(pwd)
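Since `$(pwd)` records whatever directory you happen to be in, a quick sanity check before moving on can catch a variable that was set from the wrong place or not at all. The sketch below assumes only that the variable should name an existing directory; `check_samtools_root` is a hypothetical helper, not part of the Breakdancer or samtools instructions.

```shell
# Hypothetical helper: verify SAMTOOLS_ROOT is set and names a directory.
check_samtools_root() {
  if [ -z "${SAMTOOLS_ROOT:-}" ]; then
    echo "SAMTOOLS_ROOT is not set" >&2
    return 1
  fi
  if [ ! -d "$SAMTOOLS_ROOT" ]; then
    echo "SAMTOOLS_ROOT is not a directory: $SAMTOOLS_ROOT" >&2
    return 1
  fi
  echo "using samtools from $SAMTOOLS_ROOT"
}

# Report a problem without aborting the shell.
check_samtools_root || echo "fix SAMTOOLS_ROOT before running cmake"
```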

Adjust build instructions for non-root users
This mostly means adjusting the install path specified during the configuration step. In this example you make this decision with the -DCMAKE_INSTALL_PREFIX cmake argument.

cmake .. -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=$HOME/breakdancer-1.4.3

As another example, if you are using a GNU build process, adjust the configure command like this:

./configure --prefix=WRITABLE_INSTALL_PATH
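Whichever build system you use, you can test the chosen install location before configuring. The following is a minimal sketch; `check_prefix_writable` is a hypothetical helper, and the prefix shown is simply the one from the cmake example above. It walks up from the prefix to the nearest path that already exists and checks that it is a directory you can write to.

```shell
# Hypothetical helper: can the install prefix be created or written to?
check_prefix_writable() {
  dir=$1
  # Walk up until we reach a path that already exists.
  while [ ! -e "$dir" ]; do
    dir=$(dirname "$dir")
  done
  # That path must be a directory and must be writable by you.
  [ -d "$dir" ] && [ -w "$dir" ]
}

prefix="$HOME/breakdancer-1.4.3"
if check_prefix_writable "$prefix"; then
  echo "OK to install into $prefix"
else
  echo "cannot write to $prefix, choose another install location" >&2
fi
```

A failure here means the later install step would fail with permission errors, so it is cheaper to find out now.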

Adjust the install instructions for non-root users
Most build steps published on project web sites assume you are the administrator of a computer environment and have full control over it. This is rarely a requirement and is a very poor assumption for shared computing environments.

If you configured your build correctly to install in directories you own, you won't have any problem running the install step as yourself.

make install

If you get permission denied errors at this step, you didn't build your software correctly for a shared computing environment. Go back to the configure step and start over.

When you are done with the install, and if you started a subshell or interactive job, enter `exit` to leave it.

Make your tool available to others on the cluster
This is where scaling a process and being a good cluster citizen come into play.

If you are maintaining software on the cluster, consider the potential benefit to others who now may not need to build the software themselves if they can leverage your install.

At a minimum, document your effort in the docs wiki
MyAppName
 * Create a new App page on the wiki: http://docs.uabgrid.uab.edu/wiki/MyNewApp
 * Click "edit this page" on the page you get back.
 * Transclude the AppStub template into your new page
 * As your first edit in your new page enter the following text string to add the AppStub content to your page so you can edit it. The "MyAppName" value should be the name of the new App you are documenting.

This is the AppStub template. Replace its content with your own content for your specific App. See App Build Best Practices for more information.

First write a short blurb about what this App does and who might be interested in using it. You should provide a link to the upstream project. A Wikipedia page name reference can be helpful here, if it exists.

How to use MyAppName
How to use the software. Describe how to use it on the cluster by setting your environment.

Better yet, write a modules file and tell folks which module they need to load:

module load MyAppName

How to build MyAppName
Describe the steps followed to build this software.

NOTE: This is critical to reproducible research and should be considered a core part of your research notebook.

Make sure you are using AppBuildBestPractices

Getting Help
If you are providing support for this software on the cluster, let folks know how to reach you. At least let them know where they can find help online in the upstream community.

Go further, create a module wrapper
For well-maintained Apps it's best to encode the runtime requirements using the modules system. Write a module file. This lets folks (and you) simply load the module and have the environment set up.

module load breakdancer-1.4.3

More importantly, it lets folks unload the environment when they are done and avoids creating cluttered, unpredictable environments (a.k.a. sloppy research environments).
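To make the load and unload idea concrete, here is a rough shell sketch of the environment change such a module would manage for you. It is an illustration only, not a module file: it assumes the executables were installed under a `bin` directory inside the install prefix, and the variable names `app_root`, `saved_path`, and `loaded_path` are hypothetical.

```shell
# Assumption: executables were installed under bin/ inside the prefix.
app_root="$HOME/breakdancer-1.4.3"

# "Load": remember the old PATH, then put the App's bin directory first.
saved_path=$PATH
PATH="$app_root/bin:$PATH"
loaded_path=$PATH
echo "loaded:   $PATH"

# "Unload": put PATH back exactly as it was.
PATH=$saved_path
echo "unloaded: $PATH"
```

Doing this by hand is easy to get wrong or forget, which is why encoding it once in a module file is the better habit.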