R: Difference between revisions

From Cheaha
(Added per-project libraries implementation)
 
Latest revision as of 21:55, 24 June 2021

This page is a Generic stub.

You can help by expanding this page.

Introduction

R is a free software environment for statistical computing and graphics. Versions available on Cheaha can be found and loaded using the following commands, where <version> must be replaced by one of the versions shown by the spider command.

module spider R
module load R/<version>

Usage

Per-Project Package Libraries

When working with multiple projects, or when using software like AFNI which makes use of R internally, it may be helpful to use separate folders to store libraries for separate projects and software. Keeping library paths separate on a per-project or per-software basis will minimize the risk of library conflicts and hard-to-trace bugs.

Library paths may be managed within R using the .libPaths function. Pass a character vector of directories to the function to change the library paths available to R.

The following instructions assume that an R module is already loaded. To separate libraries on a per-project basis, navigate to the desired project directory. Create a folder called "rlibs", which will be used to store packages. Create an empty text file called ".Rprofile" in the root project directory if it doesn't already exist, and add the following line to it. If a call to .libPaths already exists, exercise judgement before modifying it.

.libPaths(c('./rlibs'))
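The setup steps above can also be done from the command line. The following is a minimal sketch, not a definitive procedure: the PROJECT_DIR variable is a hypothetical name introduced here, defaulting to the current directory, and the check for an existing .libPaths call is a simple text search that only flags the file for manual review.

```shell
#!/bin/bash
# Hypothetical variable: the root of the project. Defaults to the current directory.
PROJECT_DIR="${PROJECT_DIR:-$PWD}"

# Create the per-project library folder (no error if it already exists).
mkdir -p "$PROJECT_DIR/rlibs"

# Create ".Rprofile" if it is missing, leaving existing contents untouched.
PROFILE="$PROJECT_DIR/.Rprofile"
touch "$PROFILE"

# Add the .libPaths call only when the file does not already mention libPaths.
if grep -q 'libPaths' "$PROFILE"; then
    echo "A libPaths call already exists in $PROFILE; review it by hand."
else
    echo ".libPaths(c('./rlibs'))" >> "$PROFILE"
fi
```

Creating the folder before starting R matters, because .libPaths drops directories that do not exist.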

When using...

  • RStudio: you will need to set your root project folder as the working directory. Run the ".Rprofile" file you created earlier to set the "rlibs" folder as the default library path. All newly installed packages in this session will be installed in that folder.
  • R REPL (read-eval-print loop): started by calling "R" at the command line. The ".Rprofile" file will be loaded and executed as R code by the REPL environment before giving control to you. This will set the "rlibs" folder as the default library path. All newly installed packages in this session will be installed in that folder.
  • Rscript: ensure that you start any scripts from the directory containing the ".Rprofile". As with the R REPL environment, the file will be executed prior to running any other code, setting the "rlibs" folder as the default library path. All newly installed packages in this script will be installed in that folder.
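One way to check that the project library is in effect is to print the library paths from the project directory. This is a minimal sketch, assuming an R module is loaded so that Rscript is on the PATH; the function name show_lib_paths is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the library paths R sees when started from the
# current directory. Run it from the directory that contains ".Rprofile".
show_lib_paths() {
    if command -v Rscript >/dev/null 2>&1; then
        # Rscript reads ".Rprofile" from the working directory at startup.
        Rscript -e 'print(.libPaths())'
    else
        echo "Rscript not found; load an R module first (module load R/<version>)."
    fi
}

show_lib_paths
```

If the setup worked, the first path printed should end in "rlibs".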

Repeating this process for each project keeps libraries separate, which allows more flexibility and repeatability and reduces the risk of errors or cross-contamination of versions and dependencies. The only downsides are the discipline needed to create the folder and file each time a new project is started, and the additional maintenance of each separate library.

It is possible for collisions to still occur with packages installed in the default locations. If you wish to use the practice described above, you may need to remove packages installed in the default locations.

If you have a single workflow using multiple versions of R that are causing package collisions, please contact us for Support. We will work with you to find an optimal solution.

SGE

SGE module files

The following module file should be loaded for this package:

module load R/R-2.7.2

For other versions, simply replace the version number:

module load R/R-2.11.1

The following libraries are available:

  • /share/apps/R/R-X.X.X/gnu/lib/R/library
    • The default libraries that come with R
    • Rmpi
    • Snow
  • /share/apps/R/R-X.X.X/gnu/lib/R/bioc
    • BioConductor libraries (default package set using getBioC)

Users should install additional libraries under ~/R_exlibs, or follow these instructions:

    • Make a directory in your home directory to install packages/libraries to (DEST_DIR)
    • Make a .Rprofile file in your home directory (~/) with the following content, i.e. run the following command in your terminal
cat > $HOME/.Rprofile <<\EOF
.libPaths("~/DEST_DIR")
cat(".Rprofile: Setting UK repository\n")
r = getOption("repos") # hard code the UK repo for CRAN
r["CRAN"] = "http://cran.uk.r-project.org"
options(repos = r)
rm(r)

EOF

NOTE: Change DEST_DIR to the name of the directory you created.

    • Load the R module and start R.
    • Run install.packages("Package_Name") at the prompt.
    • This installs Package_Name to DEST_DIR. To use it, run library(Package_Name)

NOTE: A package installed with one version of R might not be compatible with another version. It is advisable to pick one version of R and stay with it, so that you don’t have to install multiple versions of the same package.

SGE Job script

Sample R Grid Engine job script. This is an example of a serial (i.e. non-parallel) R job with a 2-hour run time limit that requests 256M of RAM:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#
#$ -j y
#$ -N rtestjob
# Use '#$ -m n' instead to disable all email for this job
#$ -m eas
#$ -M YOUR_EMAIL_ADDRESS
#$ -l h_rt=2:00:00,s_rt=1:55:00
#$ -l vf=256M
. /etc/profile.d/modules.sh
module load R/R-2.7.2

#$ -v PATH,R_HOME,R_LIBS,LD_LIBRARY_PATH,CWD

R CMD BATCH rscript.R