Best Practices for Building Apps on the Cluster
If you need to maintain your own software on the cluster, follow these best practices to create a reliable build and make sure you can repeat your work in the future.
Good software is reliable because you control the conditions under which it is built and used. It's all about the discipline of your practices as a developer, builder, or user.
Good research follows good practices.
Background and Example
This guide builds the software Breakdancer as an example because it is typical of evolving, leading-edge research software, it touches on several build issues, and it was the most recent request to come up.
The upstream build instructions for Breakdancer are on GitHub. You should always adjust upstream instructions to meet the needs of your local environment, in this case the environment of the cluster.
Generally, the upstream build instructions will be mostly correct. You are maintaining software packages in a shared environment without root privileges, however, so you must make standard minor modifications so the build works as a non-root user:
- you must set the install location to a directory where you have write permission
- you won't need `sudo` for any step
Control your build environment
NOTE: This is a big part of reproducible results in research, so it's important! Don't be a lazy builder.
Start with a pristine environment. You don't want careless inclusion of binary or library paths to alter your build environment. You want your build to be predictable from one build to the next.
The most important variables to control for builds are PATH and LD_LIBRARY_PATH. Don't ever use the default PATH for a build. Define your PATH explicitly for predictable results.
You can do this easily, without "messing up" your current working environment, by starting a new subshell: enter `bash` to get started and `exit` when you are done. (If your build is long and involved, you might want to do this in an interactive job so it doesn't run on the head node.)
Choose a space to work in, either a subshell or an interactive job:
bash
# or, if your build is a heavy-weight process
qlogin -l h_rt=01:00:00,vf=2G
Clean up your build environment:
PATH=/bin:/usr/bin
unset LD_LIBRARY_PATH
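Before going further, it can help to confirm that the cleanup took effect. The following is a minimal sketch, not part of the upstream instructions; the helper name `show_build_env` is hypothetical. It runs the cleanup and the check together in a throwaway subshell so the shell you typed it into is left unchanged.

```shell
# Hypothetical helper: print the two settings that most affect a build.
show_build_env() {
    echo "PATH=$PATH"
    # ${VAR-text} expands to the text only when VAR is unset.
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH-<unset>}"
}

# The parentheses start a subshell; changes made inside it are
# discarded when it ends.
(
    PATH=/bin:/usr/bin
    unset LD_LIBRARY_PATH
    show_build_env
)
```

If the second line of output shows anything other than `<unset>`, a library path is still set in the build environment.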
Then add only the capabilities you need for the build into your environment. In this case:
module load cmake/18.104.22.168
You may also need to add environment customizations specific to the current build. For example, the build may depend on software that you built previously.
In this example, this is the samtools part of the instructions. You download and build samtools, then set an environment variable so that later build steps can find the samtools install location.
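As a rough sketch of that kind of customization: the variable name `SAMTOOLS_ROOT` and the path below are assumptions made for illustration, so check the upstream instructions for the name the build actually reads and substitute the directory where you built samtools.

```shell
# Assumption: samtools was built earlier under this hypothetical path.
SAMTOOLS_ROOT="$HOME/src/samtools"

# Export the variable so that child processes (cmake, make) can see it.
export SAMTOOLS_ROOT

# Warn early if the directory is missing, rather than failing mid-build.
if [ ! -d "$SAMTOOLS_ROOT" ]; then
    echo "warning: $SAMTOOLS_ROOT does not exist yet" >&2
fi
```

Setting the variable inside the build subshell keeps it out of your everyday environment.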
Adjust build instructions for non-root users
This mostly means adjusting the install path specified during the configuration step. In this example you make this decision with the -DCMAKE_INSTALL_PREFIX cmake argument.
cmake .. -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=$HOME/breakdancer-1.4.3
If the software uses a GNU build process instead, set the install location with the --prefix argument of the configure command.
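For a GNU-style build, the equivalent adjustment might look like the sketch below. The package name and version in `PREFIX` are hypothetical; `--prefix` is the standard configure option for choosing the install location.

```shell
# Hypothetical package name and version; substitute your own.
PREFIX="$HOME/myapp-1.0"

# Run this from the top of the unpacked source tree. The guard skips
# the build when there is no configure script in the current directory.
if [ -x ./configure ]; then
    # No sudo needed: everything installs under your home directory.
    ./configure --prefix="$PREFIX" && make && make install
else
    echo "no ./configure here; the install location would be $PREFIX"
fi
```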
Adjust the install instructions for non-root users
Most build steps published on project web sites assume you are the administrator of a computer environment and have full control over it. This is rarely a requirement and is a very poor assumption for shared computing environments.
If you configured your build correctly to install in directories you own, you won't have any problem running the install step as yourself.
If you get permission denied errors at this step, you didn't build your software correctly for a shared computing environment. Go back to the configure step and start over.
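One way to catch this before running the install step is to check that the install location is writable. This is a minimal sketch; the helper name `can_install_to` is hypothetical, and the prefix matches the one used in the cmake example in this guide.

```shell
# Hypothetical helper: succeed if the nearest existing ancestor of the
# given path is a directory you can write to.
can_install_to() {
    _dir=$1
    # Walk up until we reach a directory that already exists.
    while [ ! -d "$_dir" ]; do
        _dir=$(dirname "$_dir")
    done
    [ -w "$_dir" ]
}

if can_install_to "$HOME/breakdancer-1.4.3"; then
    echo "install location is writable"
else
    echo "install location is NOT writable; reconfigure with a different prefix" >&2
fi
```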
When you are done with the install, and if you started a subshell or interactive job, enter `exit` to return to your original environment.
Make your tool available to others on the cluster
This is where scaling a process and being a good cluster citizen come into play.
If you are maintaining software on the cluster, consider the potential benefit to others, who may no longer need to build the software themselves if they can use your install.
At a minimum, document your effort in the docs wiki
- Create a new App page on the wiki: http://docs.uabgrid.uab.edu/wiki/MyNewApp
- Click "edit this page" on the page you get back.
- Transclude the AppStub template into your new page
- As your first edit in your new page enter the following text string to add the AppStub content to your page so you can edit it. The "MyAppName" value should be the name of the new App you are documenting.
This is the AppStub template. Replace its content with your own content for your specific App. See App Build Best Practices for more information.
First write a short blurb about what this App does and who might be interested in using it. You should provide a link to the upstream project. A Wikipedia page name reference can be helpful here, if it exists.
How to use MyAppName
How to use the software. Describe how to use it on the cluster by setting your environment.
Better yet, write a module file and tell folks what module they need to load:
module load MyAppName
How to build MyAppName
Describe the steps followed to build this software.
NOTE: This is critical to reproducible research and should be considered a core part of your research notebook.
Make sure you are using AppBuildBestPractices
If you are providing support for this software on the cluster, let folks know how to reach you. At least let them know where they can find help online in the upstream community.
At a minimum record the link to the upstream project from where you got this software.
See Editing_Docs for information on how to edit pages on this site.
- Save this first version of the page with the comment "Derived AppStub template"
- Edit the page again. You will now see all the content from Template:AppStub and can adjust it for your specific app.
Go further, create a module wrapper
For well-maintained Apps, it's best to encode the runtime requirements using the modules system. Write a module file. This lets folks (and you) simply load the module and have the environment set up.
module load breakdancer-1.4.3
More importantly, it lets folks unload the environment when they are done and avoids creating cluttered, unpredictable environments (a.k.a. sloppy research environments).
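A module file for the example build could be created along these lines. This is a sketch under stated assumptions: the personal modulefile directory `$HOME/modulefiles` is hypothetical, and so is the assumption that the install placed its executables in a `bin` subdirectory. The module file itself is written in the Tcl syntax that the modules system reads.

```shell
# Assumptions: a personal modulefile directory and a bin/ subdirectory
# under the install prefix. Adjust both to match your setup.
MODDIR="$HOME/modulefiles"
APPROOT="$HOME/breakdancer-1.4.3"

mkdir -p "$MODDIR"

# Write a minimal module file. The unquoted EOF lets the shell expand
# $APPROOT now, so the file records the absolute install path.
cat > "$MODDIR/breakdancer-1.4.3" <<EOF
#%Module1.0
module-whatis "Breakdancer 1.4.3 (personal build)"
prepend-path PATH $APPROOT/bin
EOF

# Only call the module command where the modules system is available.
if command -v module >/dev/null 2>&1; then
    module use "$MODDIR"
    module load breakdancer-1.4.3
fi
```

Because the module only prepends one directory to PATH, unloading it removes exactly that entry and leaves the rest of the environment untouched.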