Compute Element RPM Installation


The installation of the Compute Element (CE) software makes your cluster available to users within SURAgrid and any other VOs you choose to support. In this example, we follow the OSG Release3 Compute Element Documentation and assume a Torque (PBS) installation; the OSG documentation provides details on supporting other schedulers (SGE, LSF, and Condor). Rather than reproducing every step of the OSG document, we comment only on the steps where additional guidance is helpful for your installation of the CE software.



Prerequisites

  • A personal certificate - Requesting Personal Certificates
  • Certificates for host, http, rsv - Requesting Certificates
  • Registration of your resource in OIM - OIM_Registration
  • A batch scheduler, either on your CE or accessible by your CE.
  • Root access to an x86_64 real or virtual machine running RHEL5 or RHEL6 (or CentOS/SL) with current updates. Minimum 1GB RAM, 8GB disk. FQDN in DNS with a static IP. We'll assume RHEL6 for this example.
  • Firewall port openings for GRAM (2119/tcp), GridFTP (2811/tcp), and callback ports. You can choose the callback range, e.g., 20000-24999/tcp. These ports need to be opened on your campus firewall as well as in /etc/sysconfig/iptables on the CE host (an example set of iptables rules appears after this list).
  • User accounts: apache (uid 48) and tomcat (uid 91) on your CE; rsv, sura000-sura050, and sgvoadmin on your CE, head node, and compute nodes, with consistent uid's across all of them. The sgvoadmin account is used for maintaining software packages such as R and Octave.
  • Decide on locations for $OSG_APP and $OSG_DATA. These should be accessible from every node.
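
As a sketch of the firewall item above, assuming the example 20000-24999/tcp callback range and the stock RHEL6 iptables layout, lines like the following could be added to /etc/sysconfig/iptables ahead of the final REJECT rule (then run service iptables restart):

-A INPUT -m state --state NEW -m tcp -p tcp --dport 2119 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 2811 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 20000:24999 -j ACCEPT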

Installation Notes

Important: use umask 022 to ensure that everything is installed with the proper permissions.
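
For example, check and set the umask in the root shell you will use for the installation commands below:

umask        # show the current value
umask 022    # set it for this shell session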

Following the OSG Release3 Compute Element Documentation, carry out the preliminary steps through Section 5.

If you haven't done so already, install your certificates as described in Requesting Certificates.

In section 6, choose the package for your batch system. For a base RHEL6 installation, this step may include over 350 package dependencies, over 250MB of downloads, and over 600MB of installed packages.
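
As a sketch, assuming the Torque/PBS metapackage name used by the OSG Release3 repositories at the time of writing (see Section 6 of the OSG documentation for the exact package for your batch system):

yum install osg-ce-pbs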

Section 7.1 describes the configuration of the OSG stack. Go to /etc/osg/config.d/ and edit the following files (an example excerpt follows the list):

  • 10-misc.ini - set gums_host = null.yourdomain.edu. We're not using GUMS in this example.
  • 10-storage.ini - set app_dir, data_dir, worker_node_temp, site_read (usually the same as data_dir), and site_write (usually the same as data_dir). Permissions on app_dir and data_dir should be 1777.
  • 20-pbs.ini (or 20-your_scheduler.ini)
  • 30-cemon.ini - set enabled = False if your CE is not yet registered in OIM
  • 30-gip.ini - set batch = yourscheduler, advertise_gsiftp = TRUE, gsiftp_host = your-ce-name.yourdomain.edu. Create [Subcluster ALLNODES] and describe your cluster.
  • 30-gratia.ini - set enabled = False if your CE is not yet registered in OIM
  • 40-siteinfo.ini - fill in all your site info, which should correlate to your OIM registration.
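
As an illustration only, an excerpt from 10-storage.ini and a [Subcluster ALLNODES] section for 30-gip.ini might look like the following. The paths and hardware values are placeholders, and option names can vary slightly between OSG releases, so compare against the comments in the distributed files:

 [Storage]
 app_dir = /opt/osg/app
 data_dir = /opt/osg/data
 worker_node_temp = /tmp
 site_read = /opt/osg/data
 site_write = /opt/osg/data

 [Subcluster ALLNODES]
 name = ALLNODES
 node_count = 16
 cpus_per_node = 2
 cores_per_node = 8
 ram_mb = 24576
 cpu_model = Intel Xeon E5620
 cpu_vendor = Intel
 cpu_speed_mhz = 2400
 inbound_network = FALSE
 outbound_network = TRUE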

Run osg-configure -v to verify your configuration, then run osg-configure -c to apply it.

Sections 7.2-7.4 deal with scheduler-specific items. There is nothing PBS-specific to do in our case.

Authentication

Section 8 presents two authentication options: GUMS and edg-mkgridmap. For our example case, we will use edg-mkgridmap, which is a script used to generate /etc/grid-security/grid-mapfile.

A step required to disable GUMS was not present in the OSG documentation at the time this wiki page was written. In /etc/grid-security/gsi-authz.conf, comment out the existing globus_mapping line:

#globus_mapping liblcas_lcmaps_gt4_mapping.so lcmaps_callout

Next, save a copy of the distributed /etc/edg-mkgridmap.conf.

cp -p /etc/edg-mkgridmap.conf /etc/edg-mkgridmap.conf.dist

Edit /etc/edg-mkgridmap.conf so that it contains the following lines:

#### GROUP: group URI [lcluser]
#
#-------------------
group vomss://voms.hpcc.ttu.edu:8443/voms/suragrid AUTO
group vomss://voms.hpcc.ttu.edu:8443/voms/suragrid?/suragrid/sgvoadmin sgvoadmin
gmf_local /etc/grid-security/local-grid-mapfile

The AUTO definition for lcluser above causes edg-mkgridmap to call the /usr/libexec/edg-mkgridmap/local-subject2user script to generate a local username for a given grid certificate's subject DN. This script does not exist by default, so you will need to create it. A simple example is given below.

cat << 'EOT' > /usr/libexec/edg-mkgridmap/local-subject2user
#!/bin/bash
#
# Simple local-subject2user script called by edg-mkgridmap.
# Persistent mappings are stored in $mapfile - sufficient
# for small VO's. The last used index is stored in $lastfile.
# Could be extended to a simple sqlite3 DB.
#
mapfile=/usr/local/etc/subject2user.lis
lastfile=/usr/local/etc/subject2user.last
dn="$1"
base="sura"
fmt="%03d"
test -f $mapfile || touch $mapfile
test -f $lastfile || echo -1 > $lastfile
last=`cat $lastfile`
if fgrep -q "$dn" $mapfile; then
  # Existing mapping: return the previously assigned username.
  fgrep "$dn" $mapfile | awk '{print $NF}'
else
  # New DN: allocate the next index, record it, and return the username.
  last=$((last+1))
  user=`printf "%s$fmt\n" $base $last`
  echo $user
  echo "$dn" $user >> $mapfile
  echo $last > $lastfile
fi
exit 0
EOT
chmod u+x /usr/libexec/edg-mkgridmap/local-subject2user

Next, create a new file, /etc/grid-security/local-grid-mapfile, containing the mapping for your RSV user:

 "/DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=rsv/your-ce.yourdomain.edu" rsv

Run edg-mkgridmap and check the file /etc/grid-security/grid-mapfile.
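
A quick sanity check (edg-mkgridmap is typically installed as /usr/sbin/edg-mkgridmap):

edg-mkgridmap
grep rsv /etc/grid-security/grid-mapfile
wc -l /etc/grid-security/grid-mapfile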

Go ahead and install RSV per Section 9.
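
A sketch of that step, assuming the rsv metapackage name from the OSG repository and that the RSV settings live in /etc/osg/config.d/30-rsv.ini (Section 9 of the OSG documentation is authoritative):

yum install rsv
vi /etc/osg/config.d/30-rsv.ini
osg-configure -v
osg-configure -c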

Gratia Adjustments

By default, Gratia will report all job accounting records to OSG, including those of your local, non-grid users. To send only grid accounting data to OSG, edit /etc/gratia/pbs-lsf/ProbeConfig and set the following:

   SuppressUnknownVORecords="1"
   SuppressNoDNRecords="1"

Globus Callback Ports

The port range defined in 40-network.ini needs to be set in some additional files.

echo export GLOBUS_TCP_PORT_RANGE=begin_port,end_port >> /etc/sysconfig/globus-gatekeeper
echo export GLOBUS_TCP_PORT_RANGE=begin_port,end_port >> /var/lib/osg/globus-firewall
echo export GLOBUS_TCP_PORT_RANGE=begin_port,end_port >> /etc/profile.d/globus-firewall.sh
echo setenv GLOBUS_TCP_PORT_RANGE begin_port,end_port >> /etc/profile.d/globus-firewall.csh


Services and Cron Scripts

If you have not yet registered your resource with OIM, or if your resource has not yet been approved, go ahead and disable the reporting cron scripts. Comment out the entries in /etc/cron.d/gratia* and /etc/cron.d/osg-info-services.
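
One way to do that (a sketch; hand-editing the files works just as well) is to comment out every active line in those cron files:

sed -i 's/^\([^#]\)/#\1/' /etc/cron.d/gratia* /etc/cron.d/osg-info-services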

Start and enable the services as described in Sections 10 and 11. Do not start or enable tomcat5 or gratia-probes-cron if your site is not registered in OIM.
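
As an illustration only (the authoritative service list is in Sections 10 and 11), starting and enabling the core services on a RHEL6 host might look like the loop below; the RSV pieces (condor-cron, rsv) are handled the same way, and tomcat5 and gratia-probes-cron are added once your site is registered in OIM:

for svc in fetch-crl-cron fetch-crl-boot globus-gridftp-server globus-gatekeeper; do
  service $svc start
  chkconfig $svc on
done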


Advanced

When your site's GIP scripts begin publishing data to the OSG BDII, they will advertise job managers of the form ce.yourdomain.edu/jobmanager-pbs-queuename. However, if a user tries to submit a job to that contact string, your CE will reject it, claiming that the service doesn't exist. One way to publish this queue and have it handled as a legitimate resource is to add it to the /etc/grid-services/ directory. For this example, let's assume that your queue is named "suragrid" and you're running Torque. The changes for other job managers should be similar.

cd /etc/grid-services/available
cp -p jobmanager-pbs-seg jobmanager-pbs-suragrid-seg
  (or, if the -seg files are not present: cp -p jobmanager-pbs jobmanager-pbs-suragrid)
cd /etc/grid-services
ln -s available/jobmanager-pbs-suragrid-seg jobmanager-pbs-suragrid-seg
ln -s jobmanager-pbs-suragrid-seg jobmanager-pbs-suragrid

If your installation does not have -seg files, make the analogous symlinks using the regular (non-seg) files.

Next, you'll need to add a Perl handler for the jobmanager-pbs-suragrid resource. Our version of Perl is 5.8.8, as reflected in the directory path below; yours may be different.

cd /usr/lib/perl5/vendor_perl/5.8.8/Globus/GRAM/JobManager
cp -p pbs.pm pbs-suragrid.pm
vi pbs-suragrid.pm
 - Near the top, change JobManager::pbs; to JobManager::pbs-suragrid;
 - Search for the section of code containing "PBS -q".
   Replace the entire if section starting at:
   if ($description->queue() ne ...
      with
   print JOB "#PBS -q suragrid\n";
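
If you would rather script the package-name change, something like the following should work (the if-block replacement is easier to do by hand in vi); check the results with grep afterwards:

sed -i 's/JobManager::pbs;/JobManager::pbs-suragrid;/' pbs-suragrid.pm
grep -n 'JobManager::pbs' pbs-suragrid.pm
grep -n 'PBS -q' pbs-suragrid.pm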

Finally

Try it out:

service globus-gatekeeper start
service globus-gridftp-server start
voms-proxy-init -voms suragrid
globus-job-run ce.yourdomain.edu/jobmanager-pbs-suragrid /bin/env
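
If the suragrid test fails, confirm the basics first, for example with your proxy and the fork job manager (normally enabled on an OSG CE), and check the gatekeeper log in its default location:

voms-proxy-info -all
globus-job-run ce.yourdomain.edu/jobmanager-fork /bin/hostname
tail /var/log/globus-gatekeeper.log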
