Running R on SURAgrid


R is a fine example of an application that can be deployed on resources within SURAgrid with relative ease. Most default installations of R are single-threaded, which fits well with the batch scheduler configurations on SURAgrid. R is currently available on the following resources:

OSG Resource        Grid_Resource                              BLAS
TAMU_BRAZOS         hurr.tamu.edu/jobmanager-pbs               Intel MKL
TAMU_Calclab        calclab-ce.math.tamu.edu/jobmanager-pbs    OpenBLAS
TTU_ANTAEUS         antaeus.hpcc.ttu.edu/jobmanager-sge        OpenBLAS
FNAL_FERMIGRID      fermigridosg1.fnal.gov/jobmanager-condor   OpenBLAS
GridUNESP_CENTRAL   ce.grid.unesp.br/jobmanager-pbs            OpenBLAS
UTA_SWT2            gk04.swt2.uta.edu/jobmanager-pbs           OpenBLAS
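
A quick way to see the effect of the tuned BLAS on a particular resource is to time a BLAS-backed matrix multiply from within R. This is just an illustrative check, not part of the benchmark example below, and the matrix size is arbitrary:

n <- 2000                        # arbitrary size, large enough to exercise BLAS
a <- matrix(rnorm(n * n), n, n)  # random dense matrix
system.time(a %*% a)             # %*% calls the installed BLAS (MKL/OpenBLAS)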


Prerequisites - Certificates, Submit Host

To run these examples, you will need a user certificate from OSG.

Next, you will need to register your user certificate with the SURAgrid VO. Once you have completed these steps and your certificate has been approved, you are now a member of SURAgrid. Congratulations!

You will need a submit host, which involves installing the OSG client software on RHEL5 (or CentOS5, Scientific Linux 5) or RHEL6. Installing and configuring a submit host is far easier than a Compute Element installation. It can be done on a VM, but it's highly recommended that the VM have a static IP address so that campus firewall rules will persist for the host. See the OSG Client Installation documentation for detailed instructions on installing your submit host.
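
As a rough sketch (the package and repository names here are assumptions based on the RPM-based OSG 3 distribution; the OSG Client Installation documentation is authoritative), the install boils down to:

# Install the osg-release RPM for your EL version (the URL is in the OSG
# docs), then install the client metapackage from the OSG repository.
rpm -Uvh osg-release.rpm    # placeholder filename; fetch from the OSG repo
yum install osg-client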

After the installation you should have a local instance of Condor running which will be used for submitting Condor-G jobs to remote sites.
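
A quick sanity check that the local Condor instance is up is to query the (initially empty) queue:

condor_q

If the schedd is running, this returns an empty job listing rather than a connection error.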

Finally, for your submit host you will need to open the GRAM callback ports on your system's firewall and your campus firewall. These are defined by the GLOBUS_TCP_PORT_RANGE environment variable.
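
For example, assuming a callback range of 40000-41000 (the range itself is only an illustration; pick one that fits your site's firewall policy), set the variable in the environment used by your Globus tools and open the same TCP range inbound on both firewalls:

export GLOBUS_TCP_PORT_RANGE=40000,41000
# open the matching inbound range on the host firewall, e.g. with iptables:
iptables -I INPUT -p tcp --dport 40000:41000 -j ACCEPT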

Example - Single input file, stdout, stderr

Here's an example, the R benchmark, presented by Steve Johnson (TAMU) at the April 25-26, 2012 SURAgrid All-Hands meeting. It uses one input file and returns stdout and stderr from each job to unique local files on the submit host. The example consists of three files:

R-benchmark-25.R
Input file for the R benchmark. This is sent to the remote site with each job, as specified in the rbench.condor script.

rbench.sh
The benchmark wrapper shell script. This is copied in with the rbench.condor job. It sources $OSG_APP/suragrid/etc/profile on the remote resource to set up the correct paths for R, and then executes the R commands in R-benchmark-25.R (a sketch appears after this list).

rbench.condor
Condor-G submit script for the R benchmark. It stages rbench.sh and R-benchmark-25.R in to the remote resource. The file defines three different OSG resources: TAMU_Calclab, TAMU_BRAZOS, and TTU_ANTAEUS; the Grid_Resource, Output, Error, and Log values are unique to each. The script requests 8 instances of the job on each resource (see the sketch after this list).
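
The original files from the talk are not reproduced on this page, so here is a minimal sketch of rbench.sh under the assumptions above (the exact R invocation is an assumption; the real script may differ):

#!/bin/bash
# Set up the paths for R on the remote resource via the SURAgrid profile.
source $OSG_APP/suragrid/etc/profile
# Run the benchmark; reading the script on stdin sends the results to
# stdout, which Condor returns to the submit host.
R --vanilla < R-benchmark-25.R

And a sketch of one resource's stanza in rbench.condor (the hostname comes from the table above; the rest is ordinary Condor-G submit syntax, illustrative rather than the exact file from the talk):

universe = grid
grid_resource = gt2 hurr.tamu.edu/jobmanager-pbs
executable = rbench.sh
transfer_input_files = R-benchmark-25.R
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = rbench-$(Cluster)-$(Process)-Brazos.out
error  = rbench-$(Cluster)-$(Process)-Brazos.err
log    = rbench-$(Cluster)-$(Process)-Brazos.log
queue 8

Repeating the grid_resource/output/error/log block with the appropriate values, followed by another queue 8, for TAMU_Calclab and TTU_ANTAEUS yields the 24 jobs submitted below.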

Before submitting jobs from your submit host, you will need to get a VOMS proxy:

voms-proxy-init -voms suragrid
Enter GRID pass phrase: <your passphrase>
Your identity: /DC=org/DC=doegrids/OU=People/CN=Steve Johnson 737432
Creating temporary proxy ....................................................... Done
Contacting  voms.hpcc.ttu.edu:15003 [/DC=org/DC=doegrids/OU=Services/CN=http/voms.hpcc.ttu.edu] "suragrid" Done 
Creating proxy ................................................................................ Done
Your proxy is valid until Tue May 1 04:08:31 2012
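
You can confirm the proxy and its VO attributes before submitting with:

voms-proxy-info -all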

Submit the rbench.condor script to your local Condor system (with Condor-G enabled):

condor_submit rbench.condor
Logging submit event(s)........................
24 job(s) submitted to cluster 83.

83 is the cluster ID for the rbench set of jobs (24 in total). In this case, the rbench.condor script defines the output, error, and log files as rbench-$(Cluster)-$(Process)-Name.out, .err, and .log, where Name identifies the resource. Next, check the status of the jobs:


condor_q 83
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  83.0   steve           4/30 15:30   0+00:00:00 I  0   0.0  rbench.sh        
  83.1   steve           4/30 15:30   0+00:00:00 I  0   0.0  rbench.sh        
  83.2   steve           4/30 15:30   0+00:00:00 I  0   0.0  rbench.sh
 ...
  83.23  steve           4/30 15:30   0+00:00:00 R  0   0.0  rbench.sh
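
Two other standard Condor commands are handy while the jobs run: condor_q -globus shows the Globus/GRAM state of grid-universe jobs, and condor_rm removes a cluster if something goes wrong:

condor_q -globus 83
condor_rm 83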


Output Files

rbench-CC-PP-Name.out
Standard output for job cluster CC, process PP within the cluster; Name identifies the resource. This file is returned at the end of the job.

rbench-CC-PP-Name.err
Standard error for job cluster CC, process PP within the cluster; Name identifies the resource. This file is returned at the end of the job.

rbench-CC-PP-Name.log
Condor log file for job cluster CC, process PP within the cluster; Name identifies the resource. This file is updated by Condor on your submit host as the job progresses.


Review the log, output, and error files:

more rbench-83-0-Brazos.log rbench-83-0-Brazos.out rbench-83-0-Brazos.err
 ...
more rbench-83-1-Brazos.log rbench-83-1-Brazos.out rbench-83-1-Brazos.err
 ...

Finally

This example simply writes to the GRAM scratch directory on the remote resources, which may not have sufficient space for large problems. An R example with more complex data handling is in the works.

We can easily add additional R compute resources as they become available.

For questions, suggestions, or if you need an R package installed, contact Steve Johnson, steve \\ at // math.tamu.edu.
