Running Octave on SURAgrid


Octave is another example of an application that can be deployed on resources within SURAgrid with relative ease. Most default installations of Octave are single-threaded, which fits well with how the batch schedulers on SURAgrid are configured. This version of Octave was compiled against the OpenBLAS library, which selects the most suitable BLAS routines at runtime based on the processor type.
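
If you want to confirm that a given Octave build really is picking up an optimized BLAS, a quick (unofficial) timing of a large matrix multiply from the Octave prompt is usually enough; the matrix size below is only an example, and the multiply should finish much faster with OpenBLAS than with the reference BLAS.

 % rough BLAS sanity check from the Octave prompt
 n = 2000;                 % example size, large enough for BLAS to dominate
 A = rand(n);  B = rand(n);
 tic; C = A*B; toc         % noticeably faster with an optimized BLAS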

Octave is currently available on the following resources:

 OSG Resource        Grid_Resource                              BLAS
 TAMU_BRAZOS         hurr.tamu.edu/jobmanager-pbs               OpenBLAS
 TAMU_Calclab        calclab-ce.math.tamu.edu/jobmanager-pbs    OpenBLAS
 TTU_ANTAEUS         antaeus.hpcc.ttu.edu/jobmanager-sge        OpenBLAS
 GridUNESP_CENTRAL   ce.grid.unesp.br/jobmanager-pbs            OpenBLAS
 UTA_SWT2            gk04.swt2.uta.edu/jobmanager-pbs           OpenBLAS

Prerequisites - Certificates, Submit Host

To run these examples, you will need a user certificate from OSG.

Next, you will need to register your user certificate with the SURAgrid VO. Once you have completed these steps and your certificate has been approved, you are now a member of SURAgrid. Congratulations!

You will need a submit host, which involves installing the OSG client software on RHEL5 (or CentOS5, Scientific Linux 5) or RHEL6. Installing and configuring a submit host is far easier than installing a Compute Element. It can be done on a VM, but it is highly recommended that the VM have a static IP address so that campus firewall rules will persist for the host. See the OSG Client Installation documentation for detailed instructions on installing your submit host.

After the installation you should have a local instance of Condor running which will be used for submitting Condor-G jobs to remote sites.
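
Before moving on, it is worth confirming that the local Condor instance is actually running. The commands below assume the Condor bundled with the OSG client is on your PATH; the exact layout may vary with your installation.

 condor_q                  # should print an (empty) queue rather than an error
 condor_status -schedd     # should list your local schedd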

Finally, for your submit host you will need to open the GRAM callback ports on your system's firewall and your campus firewall. These are defined by the GLOBUS_TCP_PORT_RANGE environment variable.
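
As an illustration only, assuming your site uses a 40000-44999 callback range (the actual range is local policy and must match your Globus/OSG client configuration), the environment setting and a matching host firewall rule on RHEL5/6 would look something like:

 # example values; substitute the range assigned by your campus
 export GLOBUS_TCP_PORT_RANGE=40000,44999
 iptables -I INPUT -p tcp --dport 40000:44999 -j ACCEPT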

Example - Single input file, stdout, stderr

Here's an example as shown by Steve Johnson (TAMU) at the April 25-26, 2012, SURAgrid All-Hands meeting, based on the R benchmark; adapting it to Octave is straightforward. It uses one input file and returns stdout and stderr from the job to unique local files on the submit host. The example consists of three files:

sgtest1.oct
Input file for the simple Octave test (sketched below). It is sent to the remote site with each job by the octave1.condor script, and it creates a small matrix, takes its transpose, and computes its inverse.

octave1.sh
The Octave wrapper shell script (sketched below), copied in with the octave1.condor job. It sources $OSG_APP/suragrid/etc/profile on the remote resource to set up the correct paths for Octave, then executes the Octave commands in sgtest1.oct.

octave1.condor
Condor script to submit the Octave shell script (sketched below). It stages octave1.sh and sgtest1.oct to the remote resource and defines entries for the OSG resources shown at the top of this page. In its current form it submits two jobs each to TAMU_Calclab, TAMU_BRAZOS, and TTU_ANTAEUS; the Grid_Resource, Output, Error, and Log variables are unique for each.
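
The exact contents of sgtest1.oct are not reproduced on this page; a minimal sketch consistent with the description above (the matrix size and output are illustrative) is:

 % sgtest1.oct -- minimal sketch of the Octave input file
 A = rand(4);          % create a small matrix
 B = A';               % take its transpose
 C = inv(A);           % compute its inverse
 disp(C)               % write the result to stdout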
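
A sketch of octave1.sh, assuming that sourcing the SURAgrid profile puts the octave binary on the PATH:

 #!/bin/sh
 # octave1.sh -- set up the SURAgrid Octave environment, then run the test
 . $OSG_APP/suragrid/etc/profile
 octave -q sgtest1.oct    # run the staged-in Octave commands quietly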
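
The fragment below is a sketch of a single resource entry in octave1.condor, assuming a GRAM2 gatekeeper (the gt2 prefix) and the file-naming convention used later on this page; the real submit file repeats the grid_resource/output/error/log block, each followed by its own queue statement, for every resource it targets.

 # octave1.condor -- sketch of one resource entry (TAMU_BRAZOS)
 universe                = grid
 grid_resource           = gt2 hurr.tamu.edu/jobmanager-pbs
 executable              = octave1.sh
 transfer_input_files    = sgtest1.oct
 should_transfer_files   = YES
 when_to_transfer_output = ON_EXIT
 output                  = octave1-$(Cluster)-$(Process)-Brazos.out
 error                   = octave1-$(Cluster)-$(Process)-Brazos.err
 log                     = octave1-$(Cluster)-$(Process)-Brazos.log
 queue 2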

Before submitting jobs from your submit host, you will need to get a VOMS proxy:

voms-proxy-init -voms suragrid
Enter GRID pass phrase: <your passphrase>
Your identity: /DC=org/DC=doegrids/OU=People/CN=Steve Johnson 737432
Creating temporary proxy ....................................................... Done
Contacting  voms.hpcc.ttu.edu:15003 [/DC=org/DC=doegrids/OU=Services/CN=http/voms.hpcc.ttu.edu] "suragrid" Done 
Creating proxy ................................................................................ Done
Your proxy is valid until Tue May 1 04:08:31 2012


Submit the octave1.condor script to your local Condor system (with Condor-G enabled):

condor_submit octave1.condor
Logging submit event(s)........................
6 job(s) submitted to cluster 86.
 

86 is the cluster ID for the octave1 set of jobs (6 in total). In this case, the octave1.condor script defines the output, error, and log files as octave1-$(Cluster)-$(Process)-NAME.out, .err, and .log, respectively, where NAME identifies the resource. Next, check the status of the jobs:


condor_q 86
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  86.0   steve           4/17 10:04   0+00:00:00 I  0   0.0  octave1.sh        
  86.1   steve           4/17 10:04   0+00:00:00 I  0   0.0  octave1.sh        
  86.2   steve           4/17 10:04   0+00:00:00 I  0   0.0  octave1.sh
 ...
  86.5   steve           4/17 10:04   0+00:00:00 R  0   0.0  octave1.sh


Output Files

octave1-CC-PP-Name.out
Standard output from the job, where CC is the job cluster, PP is the process within the cluster, and Name identifies the resource the job ran on. This is returned to the submit host at the end of the job.

octave1-CC-PP-Name.err
Standard error from the job, with CC, PP, and Name as above. This is returned to the submit host at the end of the job.

octave1-CC-PP-Name.log
Condor log file for the job, with CC, PP, and Name as above. This is updated by Condor on your submit host as the job progresses.


Review the log, output, and error files:

more octave1-86-0-Brazos.log octave1-86-0-Brazos.out octave1-86-0-Brazos.err
 ...
more octave1-86-1-Brazos.log octave1-86-1-Brazos.out octave1-86-1-Brazos.err
 ...

Finally

This example simply writes to the GRAM scratch directory on the remote resources, which may not have sufficient space for large problems. An Octave example with more complex data handling is in the works.

We can easily add additional Octave compute resources as they become available.

Questions/suggestions: Steve Johnson, steve \\ at // math.tamu.edu.
