SURAgrid Resource and Application Discovery

SURAgrid resources registered in the Open Science Grid Information Manager (OIM, https://oim.grid.iu.edu/oim/home) can announce the availability of their hardware and software through OSG's BDII database (http://is.grid.iu.edu/documentation.html). The database can be searched with common LDAP tools, and its schema clearly reflects considerable design effort. With simple PHP scripting it is possible to present a summary view of all resources available to the SURAgrid VO and its users. The summary also provides a snapshot of the queue status on those resources, which a user or a metascheduler can use to select an appropriate Compute Element for a job or an array of jobs.

An example of such a query using the ldapsearch command is shown below. We show only one entry for brevity:

ldapsearch -xLLL -p2170 -h is.grid.iu.edu -b o=grid \
 GlueCEAccessControlBaseRule=VO:suragrid

This LDAP search produces the following output:

dn: GlueVOViewLocalID=suragrid,GlueCEUniqueID=calclab-ce.math.tamu.edu:2119/jobmanager-pbs-night,Mds-Vo-name=TAMU_Calclab,Mds-Vo-name=local,o=grid
objectClass: GlueCEAccessControlBase
objectClass: GlueCEInfo
objectClass: GlueCEPolicy 
objectClass: GlueCEState 
objectClass: GlueCETop 
objectClass: GlueKey 
objectClass: GlueSchemaVersion 
objectClass: GlueVOView 
GlueCEInfoDataDir: /data/suragrid 
GlueCEStateRunningJobs: 0 
GlueCEStateTotalJobs: 0 
GlueSchemaVersionMajor: 1 
GlueCEStateWaitingJobs: 0 
GlueSchemaVersionMinor: 3 
GlueVOViewLocalID: suragrid 
GlueCEStateFreeJobSlots: 13 
GlueCEAccessControlBaseRule: VO:suragrid 
GlueCEStateEstimatedResponseTime: 3600 
GlueChunkKey: GlueCEUniqueID=calclab-ce.math.tamu.edu:2119/jobmanager-pbs-night 
GlueCEInfoDefaultSE: TAMU_Calclab_SE 
GlueCEStateWorstResponseTime: 3600 
GlueCEInfoApplicationDir: /apps/suragrid 
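
As a rough illustration of how a script or metascheduler might consume this output, the following Python sketch parses ldapsearch -LLL text into entries and picks the CE with the most free job slots. The function names parse_ldif and pick_ce, the ranking rule, and the second sample entry (ce.example.org) are assumptions made for illustration and are not taken from the SURAgrid scripts. Base64-encoded values (attr:: ...) are not decoded.

```python
def parse_ldif(text):
    """Split ldapsearch -LLL output into a list of {attribute: [values]} dicts."""
    entries, current, last_attr = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current entry.
            if current:
                entries.append(current)
            current, last_attr = {}, None
            continue
        if line.startswith(" ") and last_attr is not None:
            # LDIF continuation line: drop the single leading space and append.
            current[last_attr][-1] += line[1:]
            continue
        attr, sep, value = line.partition(":")
        if not sep:
            continue  # not an "attribute: value" line; ignore it
        current.setdefault(attr, []).append(value.strip())
        last_attr = attr
    if current:
        entries.append(current)
    return entries


def pick_ce(entries):
    """Return the GlueCEUniqueID with the most free slots (ties: fewest waiting jobs)."""
    def rank(entry):
        free = int(entry.get("GlueCEStateFreeJobSlots", ["0"])[0])
        waiting = int(entry.get("GlueCEStateWaitingJobs", ["0"])[0])
        return (-free, waiting)

    views = [e for e in entries if "GlueChunkKey" in e]
    if not views:
        return None
    best = min(views, key=rank)
    # GlueChunkKey has the form "GlueCEUniqueID=<host>:<port>/<jobmanager>".
    return best["GlueChunkKey"][0].partition("=")[2]


# First entry is abbreviated from the output above; the second is hypothetical.
SAMPLE = """\
dn: GlueVOViewLocalID=suragrid,GlueCEUniqueID=calclab-ce.math.tamu.edu:2119/jobmanager-pbs-night,Mds-Vo-name=TAMU_Calclab,Mds-Vo-name=local,o=grid
GlueCEStateWaitingJobs: 0
GlueCEStateFreeJobSlots: 13
GlueChunkKey: GlueCEUniqueID=calclab-ce.math.tamu.edu:2119/jobmanager-pbs-night

dn: GlueVOViewLocalID=suragrid,GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-batch,Mds-Vo-name=EXAMPLE,Mds-Vo-name=local,o=grid
GlueCEStateWaitingJobs: 5
GlueCEStateFreeJobSlots: 2
GlueChunkKey: GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-batch
"""

print(pick_ce(parse_ldif(SAMPLE)))
```

In practice the text passed to parse_ldif would be the captured output of the ldapsearch command shown earlier. Because the numbers are only a snapshot, a real scheduler would probably want to re-query shortly before submitting.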

If your compute element (CE) is set up properly, it automatically publishes information about itself to the BDII. To check, go to http://is.grid.iu.edu/cgi-bin/status.cgi and scroll down until you find your CE. A timestamp in the column labeled Raw Incoming CEMon Data shows the last time data was received from the CE; clicking the timestamp link displays the LDAP information the CE is sending. If the column shows the string N/A instead of a timestamp, the CE is not properly set up to push information to the BDII. The column labeled Fed to OSG BDII... should contain a timestamp indicating when the BDII was last updated with the information from your CE.

Steve Johnson at TAMU has been working on a PHP script (http://www.math.tamu.edu/osg/sgstatus.php) that parses data from LDAP queries (specifically the GlueCEAccessControlBaseRule attribute shown in the output above) to generate a status page listing the OSG resources available to SURAgrid VO members. (Note: This status page is a work in progress. If you have any comments about the content of the page, send feedback through the suragrid-support@sura.org email list.) For your resource to show up on the status page, you must use the lowercase string 'suragrid' in all references to the SURAgrid VO, whether they appear in configuration files or on a GUMS server.

The ldapsearch query shown earlier specifically selects resources to which the SURAgrid VO has been granted access. The amount of information available is impressive, but some of the availability information does not always match what a user expects. This is usually due to the resource owner's choices in configuring the scheduler. For example, calclab-ce.math.tamu.edu/jobmanager-pbs-weekend and jobmanager-pbs-night both list a capability of running 894 jobs. This was done to allow local users to flip between weekend and weeknight mode. It could be handled better with a single queue or a routing queue that is reported to OSG, but it is an example of how some information in BDII may not paint the whole picture.

The two large physics VOs, ATLAS and CMS, have made extensive use of BDII for advertising software availability at each resource. This is extremely helpful to their users, as it allows them to query BDII for which versions of the core ATLAS and CMS software are installed at a particular site, or which sites offer a particular version of a package.

The schema for application discovery is somewhat limited. There are three LDAP attributes to use: the name of the package (GlueLocationLocalID), its version (GlueLocationVersion), and the path to its installation directory or executable (GlueLocationPath). The physics VOs name their packages something like VO-cms-packageversion. The name attribute is fairly flexible, but each entry must be unique, so the format initially proposed in SURAgrid for the value of this attribute is:

VO-suragrid-OS-Arch-pkgname-version

As an example, for Octave running on a RHEL5 x86_64 resource the GlueLocationLocalID attribute would be:

VO-suragrid-rhel5-x86_64-octave-3.2.4

GlueLocationVersion would simply contain the version (e.g., 3.2.4).
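
The naming convention can be sketched in Python as follows. The helper names make_local_id and split_local_id are hypothetical, and the sketch assumes that the OS and architecture fields never contain a '-' (the package name may). Since the package name can itself contain '-', the splitter relies on the separately published GlueLocationVersion value to find where the name ends.

```python
def make_local_id(os_name, arch, pkgname, version, vo="suragrid"):
    """Build a GlueLocationLocalID of the form VO-<vo>-<OS>-<Arch>-<pkgname>-<version>."""
    fields = [vo, os_name, arch, pkgname, version]
    for field in fields:
        # The ID is published in a whitespace-separated file, so no field
        # may be empty or contain whitespace.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"invalid field: {field!r}")
    return "-".join(["VO"] + fields)


def split_local_id(local_id, version):
    """Recover the fields of an ID, given its GlueLocationVersion value."""
    prefix, vo, os_name, arch, rest = local_id.split("-", 4)
    suffix = "-" + version
    if prefix != "VO" or not rest.endswith(suffix):
        raise ValueError(f"unexpected ID format: {local_id!r}")
    return {
        "vo": vo,
        "os": os_name,
        "arch": arch,
        "pkgname": rest[: -len(suffix)],
        "version": version,
    }


local_id = make_local_id("rhel5", "x86_64", "octave", "3.2.4")
print(local_id)
print(split_local_id(local_id, "3.2.4"))
```

Stripping the version from the right, rather than splitting on every '-', is a design choice that keeps hyphenated package names intact.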

GlueLocationPath has some constraints. In particular, because it is assumed to be a directory, it must start with a '/' and may not contain whitespace. This presents a problem for HPC sites that require additional packages to be loaded in order to run the target package. For example, Octave may be compiled at a site so that it uses special BLAS or FFTW libraries. Shell environment modules are a common way of loading the packages required to run a particular application: a simple 'module load octave' will usually load the modules and make the environment ready to run Octave. Resources that use modules can advertise this capability in BDII, but the GlueLocationPath attribute still needs to begin with a '/'. Our proposed solution in SURAgrid is simply to put a '/' at the front of the command.

Here's our example for defining GlueLocationPath when a module load is necessary to prepare the shell to run Octave:

/module%20load%20octave/3.2.4

Because the path cannot contain whitespace, we have substituted %20 for each space. While it may be tempting to use '-' or '_' instead, there is a good chance that these characters already appear in the package name, which would complicate parsing the string.
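
A site administrator could generate such a value with a few lines of Python. This is only a sketch of the convention described above: the function name encode_module_command is hypothetical, and the choice to reject tabs and other non-space whitespace (rather than encode them) is an assumption, since the proposal only covers spaces.

```python
def encode_module_command(command):
    """Turn a shell command such as 'module load octave/3.2.4' into a
    GlueLocationPath value: a leading '/' plus the command with each
    space replaced by %20."""
    if not command or command != command.strip():
        raise ValueError("command must be non-empty with no leading/trailing whitespace")
    if any(ch.isspace() and ch != " " for ch in command):
        # Assumption: only plain spaces are expected inside the command.
        raise ValueError("command contains whitespace other than spaces")
    return "/" + command.replace(" ", "%20")


print(encode_module_command("module load octave/3.2.4"))
```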

Note that if a site wants to advertise the OS vendor's Octave, it only needs to specify the path to the executable:

/usr/bin/octave

This can be quite application dependent. In SURAgrid there may be applications that are installed under a base directory, with Globus job scripts that know what is in the bin/ and lib/ subdirectories of the application. One option is to append a '/' to the end of the GlueLocationPath to indicate that the path points to a top-level installation directory:

/usr/local/octave-3.4.2/
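
On the consuming side, a job script would need to tell the three forms of GlueLocationPath apart. The Python sketch below shows one possible way to do so; the function name interpret_location_path and the labels it returns are hypothetical. It also assumes that any path containing %20 is an encoded shell command, which is a heuristic and would misclassify a real directory whose name contains that literal string.

```python
def interpret_location_path(path):
    """Classify a GlueLocationPath value as (kind, value), where kind is
    'command', 'install_dir', or 'executable'."""
    if not path.startswith("/") or any(ch.isspace() for ch in path):
        raise ValueError(f"not a valid GlueLocationPath: {path!r}")
    if "%20" in path:
        # Encoded shell command: drop the leading '/' and restore the spaces.
        return ("command", path[1:].replace("%20", " "))
    if path.endswith("/"):
        # Trailing '/' marks a top-level installation directory.
        return ("install_dir", path)
    return ("executable", path)


for example in ("/module%20load%20octave/3.2.4",
                "/usr/bin/octave",
                "/usr/local/octave-3.4.2/"):
    print(example, "->", interpret_location_path(example))
```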

To publish applications through the BDII, create and edit the file app_dir/etc/grid3-locations.txt, where app_dir is the compute element's application directory. The app_dir variable is defined in $VDT_LOCATION/osg/etc/config.ini for Pacman installations or in /etc/osg/config.d/10-storage.ini for RPM installations. For each application you want to publish in the BDII, add a line to grid3-locations.txt with the format:

GlueLocationLocalID[white space]GlueLocationVersion[white space]GlueLocationPath

Here is a sample grid3-locations.txt file:

# grid3-locations.txt
# This is stored under app_dir/etc where app_dir is defined in
# $VDT_LOCATION/osg/etc/config.ini
#
# lines starting with # are comments
# the first word of the first valid line is the location list name
# each following line has the format LogName Version PhName (where the first 2   
# are words, the last is the rest)
# Applications/users are supposed to modify this file to publish their locations
#MountPoints
#SAMPLE_LOCATION default /SAMPLE-path
#SAMPLE_SCRATCH devel /SAMPLE-path
#
# Examples from CMS
#VO-cms-slc5_ia32_gcc434 slc5_ia32_gcc434 /home/cms/app/cmssoft/cms
#VO-cms-CMSSW_3_11_1 CMSSW_3_11_1 /home/cms/app/cmssoft/cms
# 
VO-suragrid-rhel5-x86_64-gcc-4.1.2 4.1.2 /usr/bin/gcc
VO-suragrid-rhel5-x86_64-uptime-3.2.7 3.2.7 /usr/bin/uptime
VO-suragrid-rhel5-x86_64-R-2.13.0 2.13.0 /module%20load%20r/2.13.0
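
Before publishing, it may be worth checking the file against the constraints described earlier (unique GlueLocationLocalID values, and a GlueLocationPath that starts with '/' and contains no whitespace). The following Python sketch does this; parse_locations and check_locations are hypothetical names, not part of the OSG tooling. It treats every non-comment line as an application entry and does not handle the optional location list name line mentioned in the file's comments.

```python
def parse_locations(text):
    """Parse grid3-locations.txt content into a list of attribute dicts."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # First two fields are single words; the path is the rest of the line.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(parts)}")
        entries.append({
            "GlueLocationLocalID": parts[0],
            "GlueLocationVersion": parts[1],
            "GlueLocationPath": parts[2],
        })
    return entries


def check_locations(entries):
    """Return a list of human-readable problems (empty if none were found)."""
    problems, seen = [], set()
    for entry in entries:
        local_id = entry["GlueLocationLocalID"]
        path = entry["GlueLocationPath"]
        if local_id in seen:
            problems.append(f"duplicate GlueLocationLocalID: {local_id}")
        seen.add(local_id)
        if not path.startswith("/"):
            problems.append(f"{local_id}: path does not start with '/'")
        if any(ch.isspace() for ch in path):
            problems.append(f"{local_id}: path contains whitespace")
    return problems


# Entries copied from the sample file above.
SAMPLE = """\
# grid3-locations.txt
VO-suragrid-rhel5-x86_64-gcc-4.1.2 4.1.2 /usr/bin/gcc
VO-suragrid-rhel5-x86_64-uptime-3.2.7 3.2.7 /usr/bin/uptime
VO-suragrid-rhel5-x86_64-R-2.13.0 2.13.0 /module%20load%20r/2.13.0
"""

print(check_locations(parse_locations(SAMPLE)))
```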

Steve Johnson has also been working on a PHP script (http://www.math.tamu.edu/osg/sgapps.php) that parses data from LDAP queries to generate a page currently showing the applications on SURAgrid resources. (Note: This status page is a work in progress. If you have any comments about the content of the page, send feedback through the suragrid-support@sura.org email list.)
