MATLAB DCS Tutorial

MATLAB DCS

VPAC has installed MATLAB Distributed Computing Server (DCS) and has purchased 16 Worker licenses.

MATLAB DCS is a version of MATLAB that can run MATLAB tasks as PBS jobs. These jobs have to be specially prepared by the MATLAB Parallel Computing Toolbox (PCT) installed on your desktop at your institute.

MATLAB PCT copies the compiled jobs and data across, submits them to the queuing system, and can poll for their completion. Once they have finished, MATLAB PCT retrieves the output and inserts the results into your MATLAB session.

Prerequisites

To make use of this you need:

  1. a licensed installation of MATLAB on your computer at your institute,
  2. a copy of the Parallel Computing Toolbox (PCT) installed on your client by a person with administrator (MS-Windows) or root (Linux) access.

You will also need to accept the MATLAB DCS license agreement, as it is restricted software. Log in to the VPAC website using your VPAC username and password and select the MATLAB software license.

Please note that VPAC cannot provide you with a license for MATLAB or the PCT; it must belong to you or your University and must be installed on your desktop or laptop computer there.

Please note that the instructions in the README file must be followed and the relevant files copied as requested. The respective paths for Linux and Windows clients (post MATLAB 2012b) are:

$matlabroot/toolbox/distcomp/examples/integration/old/pbs/nonshared/README
$matlabroot\toolbox\distcomp\examples\integration\old\pbs\nonshared\README

Configuring SSH on Linux

  1. Run ssh-keygen to create a new key (if you don't already have one) and set a passphrase!
  2. Run ssh-copy-id user@trifid.vpac.org (replacing user with your VPAC user name) to copy your ID to VPAC.
  3. Run ssh-add to add the newly created key to your existing ssh-agent configuration.
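
Once the key has been added, a quick check from inside MATLAB can confirm that login works without a prompt, which the submit functions rely on. The following is only a sketch: it assumes the ssh client is on your system path, that MATLAB was started from a shell that can see your ssh-agent, and that the placeholder user name is replaced with your own. BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of asking for a password.

```matlab
% Hypothetical check: run a trivial remote command without any prompt.
clusterHost = 'trifid.vpac.org';
userName    = 'user';   % placeholder: replace with your VPAC user name
cmd = sprintf('ssh -o BatchMode=yes %s@%s hostname', userName, clusterHost);
[status, output] = system(cmd);
if status == 0
    fprintf('Key-based login works; remote host reports: %s', output);
else
    fprintf('Login failed (exit status %d):\n%s\n', status, output);
end
```

If the check fails, run ssh-add again in the shell from which MATLAB is launched.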

Configuring SSH/PSCP on Windows

Windows is, sadly, far more complicated than Linux as it does not include the functionality needed by default. If you run into problems here please do drop an email to help@vpac.org for assistance!

First you will need to follow the Windows Pageant tutorial and set up a passphrase protected public/private key pair between your computer and VPAC.

Then you will need to download the programs plink.exe and pscp.exe from the main PuTTY Download Page and put them into your Windows PATH.

When using MATLAB with PuTTY you must include your full domain name in the session name and add your username in Connection--Data--Auto-login.

PSCP and Plink are command-line applications; you cannot just double-click on their icons to run them. For PSCP and Plink to work, they need to be on your PATH. To set your PATH on Windows NT, 2000, and XP, use the Environment tab of the System Control Panel.

PATH=C:\path\to\putty\directory;%PATH%

Use of PSCP is virtually identical to SCP, except that a 'p' is prefixed to the command name. The general form is:

pscp account@source.address:/path/to/file c:\path\to\destination.txt

When using PuTTY on MS-Windows, save your session and pay attention to case. Linux-based systems are case-sensitive, so when saving your session for trifid.vpac.org, enter your login name in lower case. When MATLAB connects to the server, this session is loaded by Plink; if the case is wrong, the connection will not work!

Finally, you will need to establish a c:\matlab directory where you can put your data and results, as scp does not work with spaces in path names in this instance. The entire MATLAB Windows toolbox needs to be copied into this directory. e.g.,

copy $matlabroot\toolbox\distcomp\examples\integration\pbs\nonshared\windows\ c:\matlab\windows

Again, with MATLAB 2010b you will need the new path:

copy $matlabroot\toolbox\distcomp\examples\integration\old\pbs\nonshared\windows\ c:\matlab\windows
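
The same copy can also be made from the MATLAB prompt, where matlabroot resolves to the actual installation directory. This is a sketch only: it assumes the directory layout without the "old" component, so add 'old' after 'integration' if your release uses the newer layout.

```matlab
% Sketch: copy the Windows submit functions into c:\matlab\windows.
% The source layout is an assumption; adjust it to match your release.
srcDir  = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', ...
                   'integration', 'pbs', 'nonshared', 'windows');
destDir = 'c:\matlab\windows';
[ok, msg] = copyfile(srcDir, destDir);   % copies the folder contents
if ~ok
    error('Copy failed: %s', msg);
end
```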

Setting the MATLAB path correctly

Before using MATLAB PCT you will need to configure the MATLAB path to include the correct set of submit functions for your system. To do this, go to the File menu and select the "Set Path" option. Select the "Add Folder" option and then add one of the two options below, depending on whether you use Linux or Windows. $MATLAB represents the location where MATLAB is installed on your computer; you will need to navigate to the directory below by hand.

Linux: $MATLAB/toolbox/distcomp/examples/integration/pbs/nonshared/unix

Again, in MS-Windows the task is a little trickier. First the existing path has to be removed, i.e.:

$MATLAB\toolbox\distcomp\examples\integration\pbs\nonshared\windows

In both cases the MATLAB 2010b paths are different.

$matlabroot/toolbox/distcomp/examples/integration/old/pbs/nonshared/unix
$matlabroot\toolbox\distcomp\examples\integration\old\pbs\nonshared\windows

Then the new path (see above) has to be added.

c:\matlab\windows
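
As an alternative to the "Set Path" dialog, the path can be set from the MATLAB prompt. The sketch below assumes the directory layout without the "old" component on Linux and the c:\matlab\windows copy on Windows; adjust the folder to match your release.

```matlab
% Sketch: add the submit functions to the MATLAB path programmatically.
if ispc
    submitDir = 'c:\matlab\windows';   % the copy made earlier
else
    % Assumed layout; insert 'old' after 'integration' if required.
    submitDir = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', ...
                         'integration', 'pbs', 'nonshared', 'unix');
end
addpath(submitDir);
savepath;   % optional: may fail without write access to pathdef.m

% List the path entries that mention 'nonshared' or c:\matlab to confirm.
entries = strsplit(path, pathsep);
disp(entries(~cellfun(@isempty, regexp(entries, 'nonshared|c:\\matlab'))));
```

On Windows, the original toolbox folder can be taken off the path with rmpath before c:\matlab\windows is added.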

How to use the MATLAB PCT to submit tasks to VPAC

The examples below use only the base functionality provided by MATLAB PCT. This does not allow a queue to be selected, so jobs run in the default queue with the default walltime; if the machine is busy, this may mean a long wait for the results.

The examples below demonstrate how to use PCT to submit either a set of serial, independent, tasks or a larger parallel task.

Initial Configuration

MATLAB PCT requires some initial configuration of the scheduler to say what the name of the cluster is, where MATLAB is installed on the cluster, where the MATLAB temporary files are to be stored there and where they should be created on your PC.

Linux users will need to use ssh-add to unlock their key for this session; Windows users will need to run Pageant and unlock their key.

This code is mostly common to both the examples below and should be run before each of them.

The key differences between the code below (which is Linux-based) and the MS-Windows version are:

  1. You must load a PuTTY and Pageant session beforehand.
  2. The data location of your local files must be c:\matlab or similar.

% Scheduler configuration (Linux client)
cluster = parallel.cluster.Generic( 'JobStorageLocation', '/tmp/' );
set(cluster, 'HasSharedFilesystem', false);
% change if you're using a different version
set(cluster, 'ClusterMatlabRoot', '/usr/local/matlab/default');
set(cluster, 'OperatingSystem', 'unix');
clusterHost = 'trifid.vpac.org';
% insert your username here
remoteJobStorageLocation = '/home/$username/matlab';
set(cluster, 'IndependentSubmitFcn', {@independentSubmitFcn, clusterHost, remoteJobStorageLocation});
set(cluster, 'CommunicatingSubmitFcn', {@communicatingSubmitFcn, clusterHost, remoteJobStorageLocation});
set(cluster, 'GetJobStateFcn', @getJobStateFcn);
set(cluster, 'DeleteJobFcn', @deleteJobFcn);

Example of a number of serial jobs

The following code sequence is an example of how to submit a number of individual tasks, each doing independent work, as a single job.

% Create one job containing four independent tasks.
j = createJob(cluster);
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
submit(j);
% wait for all those tasks to finish.
% NB: This could take some time if you have to wait in the queue!
wait(j);
results = fetchOutputs(j);
celldisp(results);
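
Because queue times can be long, it may be preferable to wait with a timeout and check the job state rather than block indefinitely. A minimal sketch, assuming j is the job submitted above; the 600-second limit is an arbitrary choice:

```matlab
% Sketch: wait up to 10 minutes, then report the state or any task errors.
finished = wait(j, 'finished', 600);   % returns false if the timeout expires
if ~finished
    fprintf('Job is still %s after 10 minutes; check again later.\n', j.State);
else
    % Report any task that ended with an error before reading the results.
    for k = 1:numel(j.Tasks)
        msg = j.Tasks(k).ErrorMessage;
        if ~isempty(msg)
            fprintf('Task %d failed: %s\n', k, msg);
        end
    end
end
```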

Example for a parallel job

This submits a parallel job running on 2 processors that returns a series of random numbers.

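% Create a communicating (parallel) job and run it on two workers.
job = createCommunicatingJob(cluster, 'Type', 'spmd');
createTask(job, @rand, 1, {3});
cluster.NumWorkers=2;
submit(job);
% wait for the job to finish before collecting the results
wait(job);
results = fetchOutputs(job);
celldisp(results);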
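
A variation on this example sets the worker count on the job itself rather than on the cluster object, and has each worker return its own index so that the two results can be told apart. This is a sketch under the assumption that your release provides the NumWorkersRange job property and the labindex function (both belong to the newer job interface; labindex has a different name in recent releases).

```matlab
% Sketch: request exactly two workers for this job only.
job = createCommunicatingJob(cluster, 'Type', 'spmd');
job.NumWorkersRange = [2 2];        % minimum and maximum number of workers
createTask(job, @labindex, 1, {});  % each worker returns its own index
submit(job);
wait(job);
results = fetchOutputs(job);        % one cell per worker
celldisp(results);
```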

How to Add Functions to Parallel Jobs

To add a function to a parallel job it must be specified as a file dependency. For example, consider the function abc.m:

% abc.m: returns the input repeated three times and multiplied by 5
function [a]=abc(input)
a=[input input input]*5;

The job is then created with abc.m listed as a file dependency. Note that this example uses the older scheduler interface (findResource, createParallelJob) and a Windows client:

clusterHost = 'trifid.vpac.org';
% insert your username here
remoteDataLocation = '/home/username/matlab';
sched = findResource('scheduler', 'type', 'generic');
set(sched, 'DataLocation', 'C:\matlab');
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008b');
% the client and the cluster do not share a file system
set(sched, 'HasSharedFilesystem', false);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @pbsGetJobState);
set(sched, 'DestroyJobFcn', @pbsDestroyJob);
set(sched, 'ParallelSubmitFcn', {@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
pjob = createParallelJob(sched, 'FileDependencies', {'abc.m'});
createTask(pjob,'abc', 1,{3});
set(pjob,'MinimumNumberOfWorkers',2);
set(pjob,'MaximumNumberOfWorkers',2);
submit(pjob);
waitForState(pjob);
results = getAllOutputArguments(pjob)
celldisp(results)
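
For comparison, the same job can be sketched with the newer interface used in the "Initial Configuration" section, where the AttachedFiles job property takes the place of file dependencies. The sketch assumes that cluster has been configured as shown in that section and that abc.m is in the current folder on your computer.

```matlab
% Sketch: the abc.m job using the newer job interface.
job = createCommunicatingJob(cluster, 'Type', 'spmd', ...
                             'AttachedFiles', {'abc.m'});
job.NumWorkersRange = [2 2];    % run on exactly two workers
createTask(job, @abc, 1, {3});
submit(job);
wait(job);
results = fetchOutputs(job);
celldisp(results);              % abc(3) evaluates to [15 15 15] on each worker
```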

Thanks to Shafriza Basah for assistance with the above.

More Information

For more information please see the MATLAB Parallel Computing Toolbox documentation.

In addition to this document, Microsoft Windows users should also reference a specific tutorial page for use of MATLAB DCS with that client operating system.
