MATLAB DCS

The MATLAB Distributed Computing Server (MATLAB DCS) is a parallel computing extension to MATLAB that spreads processing across a large number of worker nodes, reducing the time that compute-intensive operations take to complete.

UAB IT Research Computing maintains a 128 worker node license for the Cheaha computing platform. To use DCS on Cheaha, you will need a MATLAB instance with the Parallel Computing Toolbox installed.

To leverage the MATLAB worker nodes on Cheaha, configure the Parallel Computing Toolbox to submit compute tasks to Cheaha by following the steps in this document.

Overview
The following outline highlights the steps involved in configuring your MATLAB install and writing programs that submit tasks to the worker nodes of the Distributed Computing Server on Cheaha:


 * Configure the Task Submit Environment (One-time Setup)
    * Install MATLAB with the Parallel Computing Toolbox on your Windows / Linux / Mac workstation
    * Download and extract the MATLAB task submission functions to your workstation MATLAB environment
    * Define the "cheaha" parallel configuration in your workstation MATLAB environment to submit tasks to Cheaha
    * Run the validation tests to ensure your "cheaha" parallel configuration works
 * Develop and Run Parallel Computing Applications
    * Write, test and debug your parallel code on your local workstation using the default "local" parallel configuration
    * Once your code works, select the "cheaha" parallel configuration to submit tasks to the Cheaha cluster. Note: your workstation MATLAB application does not need to keep running after the tasks are submitted.
    * You will receive an email when the tasks you submitted are complete
    * Use your workstation MATLAB application to retrieve the results
    * When you are finished with your job contexts, clean up the job-related content to free disk space
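The submit, retrieve and clean-up steps in the outline can be sketched roughly as follows. This is a minimal illustration that assumes the R2012a-or-later programming interface (parcluster, fetchOutputs, delete); R2010b through R2011b used findResource, getAllOutputArguments and destroy for the same steps. The variable names are hypothetical, and the sketch runs against the built-in "local" profile so it can be tried before the "cheaha" profile exists.

```matlab
% Assumption: 'local' is used so the sketch runs anywhere; switch to 'cheaha'
% once that profile has been imported and validated.
profileName = 'local';
c = parcluster(profileName);

% Session 1: create a job with one task, submit it, and note the job ID.
% After submit() returns, the MATLAB client may be closed.
job = createJob(c);
createTask(job, @sum, 1, {[1 2 3 4]});
submit(job);
jobId = job.ID;

% Session 2 (possibly after restarting MATLAB): find the job again by its ID.
sameJob = findJob(c, 'ID', jobId);
wait(sameJob);                   % block until the job has finished
out = fetchOutputs(sameJob);     % cell array with one row per task
total = out{1};

% Clean up: remove the job and its files from the job data location.
delete(sameJob);
```

Looking the job up by ID is what lets the retrieval happen in a later MATLAB session than the submission.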

Using MATLAB DCS
The MATLAB Distributed Computing Server (DCS) is accessed via the Parallel Computing Toolbox (PCT), which is installed as part of your desktop MATLAB installation. The PCT allows MATLAB running on your workstation to send MATLAB code and data (tasks) to the cluster directly from your familiar MATLAB environment on your desktop. This makes the expanded compute power of Cheaha available to process workloads that exceed the capabilities of your desktop computer. Once your tasks are submitted to Cheaha, your desktop MATLAB is free to move on to other tasks or be closed completely, freeing your desktop or laptop for your other activities.

Configuring the Parallel Computing Toolbox involves three steps documented below:
 * 1) install MATLAB submit functions on your workstation
 * 2) configure the "cheaha" parallel computing target to which PCT tasks can be submitted
 * 3) run the validation tests to confirm a working installation.

This page documents the DCS configuration for MATLAB R2010b and later. For DCS configuration instructions on previous versions of MATLAB, please see the page MatLab DCS R2010a and Earlier.

Using MATLAB DCS requires that you have a cluster account on Cheaha. Please request an account by sending an email to support@listserv.uab.edu that includes your campus affiliation and a brief statement of your research interests for using the cluster.

MATLAB Submit Functions
The MATLAB submit functions create a cluster job context for your code and are responsible for transferring your code and the data it analyzes to the cluster for processing.

These submit functions must be installed on your computer and must be accessible to MATLAB via the MATLAB path. The easiest way to accomplish this is to copy the submit functions to the default directory created by MATLAB. These directories on the respective operating systems are listed below.

All operating systems (Windows, Linux and Mac) are supported by the same set of submit functions. The functions are written in MATLAB making them cross-platform and only dependent on the version of MATLAB in use.


 * 1) Download the MATLAB submit functions
    * Submit Functions for MATLAB R2013a (updated 09/04/2013)
    * Submit Functions for MATLAB R2012a (updated 03/07/2012)
    * Submit Functions for MATLAB R2010b, R2011a, R2011b (updated 02/21/2011)
 * 2) Unzip the files to a directory included in your MATLAB PATH setting. Recommended locations are:
    * Windows:  My Documents\MATLAB
    * Linux:    $HOME/Documents/MATLAB
    * Mac:      $HOME/Documents/MATLAB

Once the submit function files have been downloaded and unzipped in the above paths, restart MATLAB to ensure they are properly loaded in your environment.

NOTE: If you choose not to use the above path recommendations, your MATLAB PATH may be viewed/altered by starting the MATLAB client on your workstation and clicking File -> Set Path and adding the path in which you unpacked the submit functions.
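The same path change can also be made from the MATLAB command window. The sketch below is illustrative only: the folder name is hypothetical and stands in for wherever you unpacked the submit functions, and the mkdir call exists only so the example runs end to end.

```matlab
% Hypothetical folder; replace with the folder where you unzipped the submit functions.
submitDir = fullfile(tempdir, 'cheaha_submit_functions');
if ~exist(submitDir, 'dir')
    mkdir(submitDir);            % only so this sketch runs end to end
end

addpath(submitDir);              % add the folder to the path for this session
% savepath;                      % uncomment to keep the change across restarts

% Confirm that the folder now appears on the MATLAB path.
onPath = ~isempty(strfind(path, submitDir));
```

Without savepath the change lasts only for the current session, which is a low-risk way to test a location before making it permanent.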

Parallel Computing Toolbox Configuration
The Parallel Computing Toolbox (PCT) enables language extensions in MATLAB that support dividing your application into tasks that can be executed in parallel. By default, all of these tasks will run on your local workstation using the pre-defined "local" PCT configuration.
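As a small illustration of these language extensions, the following sketch uses a parfor loop whose iterations are independent of one another. The variable names are hypothetical. Depending on your release and settings, the loop body runs on a local worker pool or serially in the client; the result is the same either way.

```matlab
n = 100;
results = zeros(1, n);           % preallocate the sliced output variable

% Each iteration is independent, so the PCT may run them in any order
% and on any available worker.
parfor i = 1:n
    results(i) = i^2;
end

total = sum(results);            % sum of the squares 1^2 + 2^2 + ... + n^2
```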

To run these tasks on the Cheaha compute cluster, a new configuration for the PCT must be defined. In this section we will create the "cheaha" configuration and run a quick validation test to confirm its operation.

Prior to continuing, make sure you:
 * can establish an SSH connection to Cheaha
 * have followed the steps in the previous section

Create the "cheaha" PCT Configuration
Download and save the Cheaha cluster configuration file for your MATLAB version, then import it into MATLAB:
 * 1) R2010b, R2011a, R2011b cluster configuration file
    * Start MATLAB on your workstation
    * Click the "Parallel" menu
    * Click "Manage Configurations"
    * In the "Configurations Manager" window, click "File -> Import"
    * Browse to the location where you saved the cheaha-R2011b.mat file, select it, and click "Open"
 * 2) R2012a cluster configuration file
    * Start MATLAB R2012a on your workstation
    * Click the "Parallel" menu
    * Click "Manage Cluster Profiles"
    * In the "Cluster Profile Manager" window, click the "Import" button on the toolbar
    * Browse to the location where you saved the cheaha-R2012a.settings file, select it, and click "Open"

The Configurations Manager for R2011b and prior should now list a new entry named "cheaha", as shown in the following image. R2012a and later will instead have a new entry in the Cluster Profile list:

Personalize the "cheaha" PCT Configuration (R2011b and earlier)

 * 1) Double click on "cheaha" in the Configurations Manager window to open the configuration editor. (Note: stretch the "Generic Scheduler Configuration Properties" window to the right so that you can view all of the text in the fields, making them easier to read and edit correctly.)
 * 2) Edit the following fields to use your personal data directories
    * ClusterMatlabRoot: Make sure that the root directory of the MATLAB installation for workers matches the exact version of MATLAB you are using on your workstation. In this example /share/apps/mathworks/R2011a matches a MATLAB R2011a workstation install. Change the "R2011a" to match your workstation MATLAB version.
    * DataLocation: Change the directory path where job data is stored to an existing directory on your workstation where MATLAB can stage job files.
    * ParallelSubmitFcn: Change the text "YOURUSERID" to your login id on Cheaha
    * SubmitFcn: Change the text "YOURUSERID" to your login id on Cheaha
 * 3) Click 'OK' to save the configuration
 * 4) SSH to Cheaha and create the $USER_SCRATCH/matlab directory. If this directory does not exist, Parallel Computing Toolbox jobs will fail.

The initial configuration will look similar to this screenshot. You will need to edit the fields as described in the preceding steps before you can use the configuration. NOTE: be sure to replace the template user name settings "YOURUSERNAME" with the appropriate settings for your desktop and cluster account.



Personalize the "cheaha" PCT Configuration (R2012a)

 * 1) Double click on "cheaha" in the Cluster Profile Manager window to open the profile for editing. (Note: stretch the editor window to the right so that you can view all of the text in the fields, making them easier to read and edit correctly.)
 * 2) Edit the following fields to use your personal data directories
    * ClusterMatlabRoot: Make sure that the root directory of the MATLAB installation for workers matches the exact version of MATLAB you are using on your workstation. In this example /share/apps/mathworks/R2012a matches a MATLAB R2012a workstation install. Change the "R2012a" to match your workstation MATLAB version.
    * DataLocation (named JobStorageLocation in R2012a cluster profiles): Change the directory path where job data is stored to an existing directory on your workstation where MATLAB can stage job files.
    * IndependentSubmitFcn: Change the text "YOURUSERID" to your login id on Cheaha
    * CommunicatingSubmitFcn: Change the text "YOURUSERID" to your login id on Cheaha
 * 3) Click 'OK' to save the configuration
 * 4) SSH to Cheaha and create the $USER_SCRATCH/matlab directory. If this directory does not exist, Parallel Computing Toolbox jobs will fail.

The initial configuration will look similar to this screenshot. You will need to edit the fields as described in the preceding steps before you can use the configuration. NOTE: be sure to replace the template user name settings "YOURUSERNAME" and "YOURUSERID" with the appropriate settings for your desktop and cluster account.
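Because the job data directory on your workstation must already exist, it can help to create it and check that it is writable before saving the configuration. The following is a minimal sketch; the folder name is hypothetical, so substitute the path you actually enter in the job data field.

```matlab
% Hypothetical staging folder; use the same path you enter in the configuration.
dataLocation = fullfile(tempdir, 'matlab_dcs_jobs');

% Create the folder if it is missing.
if ~exist(dataLocation, 'dir')
    mkdir(dataLocation);
end

% Check that MATLAB can write to it by creating and removing a probe file.
probeFile = fullfile(dataLocation, 'write_test.tmp');
fid = fopen(probeFile, 'w');
isWritable = (fid ~= -1);
if isWritable
    fclose(fid);
    delete(probeFile);
end
```

If isWritable comes back false, choose a different folder, since MATLAB needs to stage job files there.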



Validate the "cheaha" PCT Configuration

 * 1) Before starting validation, make sure the directory '/scratch/user/YOURUSERID/matlab' (preferred) exists in your scratch space on Cheaha. If your settings still point to the older location 'lustre/scratch/YOURUSERID/matlab', please convert them to point to the new preferred location. If the directory does not exist, SSH into Cheaha and create it before proceeding.
 * 2) Select "cheaha" in the configuration manager window and click 'Start Validation'
 * 3) Wait for the validation to complete. This might take a few minutes, and you may be asked for your user credentials on Cheaha. All tests other than 'Matlabpool' should validate on Cheaha, with output as shown.



Begin Using MATLAB DCS from your Desktop
The MATLAB DCS is now configured for Desktop usage. A simple parallel wave job "rParforWave" to test the configuration is described in MatLab_DCS_Examples.
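Separately from the rParforWave example, the sketch below shows one way a job might split work across several independent tasks. It assumes the R2012a-or-later interface and uses hypothetical variable names. It targets the "local" profile so it can be tried on the workstation first; changing profileName to 'cheaha' would send the same tasks to the cluster, assuming that profile has passed validation.

```matlab
% Assumption: start with 'local' and switch to 'cheaha' once the code works.
profileName = 'local';
c = parcluster(profileName);
job = createJob(c);

% Split the data into four chunks and create one task per chunk.
data = 1:100;
chunks = {data(1:25), data(26:50), data(51:75), data(76:100)};
for k = 1:numel(chunks)
    createTask(job, @sum, 1, chunks(k));   % chunks(k) is a 1-by-1 cell of inputs
end

submit(job);
wait(job);                                 % block until all tasks have finished

% fetchOutputs returns one row per task; combine the partial sums.
partial = fetchOutputs(job);
total = sum([partial{:}]);

delete(job);                               % free the job's storage when done
```

One task per chunk keeps each unit of work independent, so the scheduler is free to run the tasks on different worker nodes.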

A summary of the above steps is available at MATLAB_workshop_2011, with additional examples and submit scripts available in the workshop demo section.

MATLAB DCS from Cheaha
MATLAB can be started interactively from Cheaha via an SSH session using the command line, X windows forwarding, or VNC. This is very similar to using MATLAB from your desktop: in order to leverage the compute power of the cluster the Parallel Computing Toolbox must be configured to send tasks to the cluster scheduler.

If you do not follow these configuration steps, parallel tasks will be executed locally on the cluster login node (head node). This will negatively impact your own and others' interactive use of the login node and may lead to your computations being stopped administratively.

MATLAB Submit Functions
The MATLAB submit functions create a cluster job context for your code. When running MATLAB from Cheaha, the submit functions are already installed and no additional actions are required by the user for this step.

Parallel Computing Toolbox Configuration
The Parallel Computing Toolbox (PCT) for your copy of MATLAB running in your cluster account must be configured to submit tasks to the compute nodes of the cluster. Keep in mind that running MATLAB interactively on the cluster is the same as running it from your own desktop: MATLAB runs all tasks on the machine on which it is running unless it is told to send the work to another computer. This condition holds even "inside" the Cheaha cluster: the cluster compute nodes are physically separate computers and MATLAB must request access via the scheduler just like any other job running on the cluster.

Configuring the PCT when running MATLAB interactively from Cheaha is just like the configuration when running MATLAB from your desktop with two exceptions:
 * you must transfer any code or data to your Cheaha account explicitly, outside of MATLAB, using standard cluster procedures such as SSH.
 * when MATLAB submits your tasks to the compute nodes, it benefits from the shared storage on the cluster and does not need to further copy your code and data to the compute nodes.

To address these differences, follow the PCT instructions above and, when editing the "cheaha" configuration, modify the steps for the following fields:
 * 1) Folder where job data is stored (DataLocation): specify a directory in your personal Cheaha account
 * 2) Function called when submitting parallel jobs (ParallelSubmitFcn): change the value to "{@ParallelSubmitFnc}"
 * 3) Function called when submitting distributed jobs (SubmitFcn): change the value to "{@DistributedSubmitFnc}"
 * 4) Job data location is accessible from both client and cluster nodes: change this value to "True"

Now save the "cheaha" configuration by clicking OK and proceed to the validation tests described above. Note: when running MATLAB interactively on Cheaha, MatlabPool works and the validation tests are expected to pass (you'll see a green checkmark).