MATLAB DCS With MS-Windows

Previous Knowledge

It is strongly recommended that one is familiar with the following before reading this tutorial.

  1. The Introduction to VPAC, Linux and HPC Job Submission
  2. Using MATLAB DCS at VPAC
  3. PuTTY User Manual
  4. Pageant - help with Windows SSH passphrases

Connecting to VPAC

Step 1: Set up public/private key authentication for your account using the PuTTY Key Generator:

[Screenshot: PuTTY Key Generator]

Step 2: Download PSCP and Plink and save them in a folder (for example, C:\WINDOWS). Then open the Environment Variables settings in the System control panel and add that folder (e.g., C:\WINDOWS) to the PATH.

[Screenshot: setting the PATH environment variable]

Step 3: Start the PuTTY application and enter the details as below, then save the session:

[Screenshot: PuTTY session configuration]

After saving the session, try logging in to your account with PuTTY. You should be logged in automatically, without having to enter your username and password.

Step 4: Run the Pageant application, as follows:

[Screenshot: Pageant]

Step 5: You will need to create a c:\matlab directory where you can put your data and results, as scp does not work with spaces in path names in this instance. The entire MATLAB Windows toolbox needs to be copied into this directory, e.g.:

xcopy /E /I "$matlabroot\toolbox\distcomp\examples\integration\pbs\nonshared\windows" c:\matlab\windows
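
Here $matlabroot is your local MATLAB installation directory. If you prefer, the same copy can also be done from inside MATLAB; the following is a minimal sketch (not part of the original toolbox instructions) that assumes the destination directory c:\matlab\windows from above:

% Sketch: copy the non-shared PBS integration files for Windows to c:\matlab\windows
src = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', ...
               'integration', 'pbs', 'nonshared', 'windows');
mkdir('c:\matlab\windows');          % harmless warning if the directory already exists
copyfile(src, 'c:\matlab\windows');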

Setting the MATLAB path correctly

Before using MATLAB PCT you will need to configure the MATLAB path to include the correct set of submit functions for your system. To do this, go to the File menu and select the "Set Path" option. Select the "Add Folder" option and then add one of the two options below, depending on whether you use Linux or Windows. $MATLAB represents the location where MATLAB is installed on your computer; you will need to navigate to the directory below by hand.

Linux: $MATLAB/toolbox/distcomp/examples/integration/pbs/nonshared/unix

In MS-Windows the task is a little trickier: first the existing path entry has to be removed, i.e.,

$MATLAB\toolbox\distcomp\examples\integration\pbs\nonshared\windows

Then the new path (see above) has to be added.

c:\matlab\windows
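
If you prefer to script this rather than use the "Set Path" dialog, a minimal sketch of the equivalent commands (assuming the Windows copy made in Step 5) is:

% Remove the original toolbox entry from the path and add the local copy instead
rmpath(fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', ...
                'integration', 'pbs', 'nonshared', 'windows'));
addpath('c:\matlab\windows');
savepath;   % make the change permanent for future MATLAB sessions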

Running a Simple Job using MATLAB Distributed Computing

The concept for this process is that you write some simple code, send it to the VPAC cluster, the cluster runs it there, and the result is returned to you. You can copy the following code into an .m file and run it in MATLAB directly (make sure that all the above sections are set up correctly).
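
Note that the snippet below assumes a scheduler object named cluster already exists in your workspace. A minimal sketch of creating one, using the same generic PBS configuration shown in the later examples, might look like the following (hostname, paths and account are placeholders you must adapt; the createCommunicatingJob example further below may instead require the newer cluster-profile interface, e.g. parcluster):

clusterHost = 'tango.vpac.org';
remoteDataLocation = '/home/xxxxxx';    % your VPAC home directory
cluster = findResource('scheduler', 'type', 'generic');
set(cluster, 'DataLocation', 'C:\matlab\share');
set(cluster, 'ClusterMatlabRoot', '/usr/local/matlab/R2008a');
set(cluster, 'HasSharedFilesystem', true);
set(cluster, 'ClusterOsType', 'unix');
set(cluster, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});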

j = createJob(cluster);
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
submit(j);
waitForState(j);   % wait for all those tasks to finish.
% NB: This could take some time if you have to wait in the queue!
results = getAllOutputArguments(j);
celldisp(results);

Running a Parallel Job using MATLAB Distributed Computing

job = createCommunicatingJob(cluster, 'Type', 'spmd');
createTask(job, @rand, 1, {3});
cluster.NumWorkers = 2;
submit(job);
results = getAllOutputArguments(job);
celldisp(results);

In general, to connect to a VPAC cluster, MATLAB makes use of the running Pageant instance to authenticate with the public and private key. A connection is made to prepare the job and submit it to the cluster; after this is done, the result is sent back to the local MATLAB instance.
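
A quick, optional way to confirm that this key-based connection works before submitting any jobs is to call plink from within MATLAB. This check is not part of the original tutorial; it assumes plink is on your PATH, Pageant is running, and xxxxxx is your VPAC username:

% Hypothetical connectivity check: run 'hostname' on the cluster over SSH
[status, output] = system('plink -batch xxxxxx@tango.vpac.org hostname');
if status == 0
    disp(['Connected to: ' output]);
else
    disp('Connection failed - check that Pageant is running and your key is loaded.');
end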

Using the FileDependencies property to send code and data to VPAC

Assume we have a dataset called trainin.mat and an algorithm called test1.m (below).

Now we want to write the code to send the test1.m function and trainin.mat to VPAC and run them there. The code for this is below:

% filedependency.m

clusterHost = 'tango.vpac.org';
remoteDataLocation = '/home/xxxxxx';   % replace this with your own account path
sched = findResource('scheduler', 'type', 'generic');
get(sched);
set(sched, 'DataLocation', 'C:\matlab\share');
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008a');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});
j = createJob(sched);
% FileDependencies sends our files to VPAC: list the paths where the files are stored
% on your computer. When the code runs, these files are zipped, sent to the VPAC
% server, unzipped and placed in a working directory there; test1.m contains the
% code needed to locate and load them (see the test1.m file for more information).
set(j, 'FileDependencies', {'C:\matlab\share\test1.m', 'C:\matlab\share\trainin.mat'});
createTask(j, @test1, 1, {});    % this task tells the server to run the test1.m function
createTask(j, @rand, 1, {3,3});  % a test function to give feedback that the program is working well
get(j);
submit(j)
waitForState(j)
results = getAllOutputArguments(j);
results{1:2}
celldisp(results)

% test1.m

function result = test1()
pwd
% As mentioned above, our files are sent to VPAC and stored there; this call
% returns the path to the directory on VPAC where they have been placed.
depdir = getFileDependencyDir
cd(depdir)            % change to the directory where the files are stored
load('trainin.mat');  % we can load the file directly because we are now in its directory
result = traininput;  % traininput is the dataset variable inside trainin.mat; we can now do whatever we want with it
return
end

Using the PathDependencies property to run code and data which are already on VPAC

For large dataset files it is not best practice to use FileDependencies to send them to the VPAC cluster every time we execute the program. Instead, one can put the dataset on VPAC and program MATLAB to access it when required. For example, with the trainin.mat dataset, we can upload it to our account on VPAC first and program MATLAB to access it there, instead of sending it as a file dependency each time.

% submission script using PathDependencies

clusterHost = 'tango.vpac.org';
remoteDataLocation = '/home/xxxxxx';
sched = findResource('scheduler', 'type', 'generic');
get(sched);
set(sched, 'DataLocation', 'C:\matlab\share');
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008a');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});
j = createJob(sched);
%set(j, 'JobData', {traininput,traintarget});
set(j, 'FileDependencies', {'C:\matlab\share\test.m'});       % only the test function is sent as a file dependency
set(j, 'PathDependencies', {'/home/vuhuyquan/matlab/share'}); % change this path to the directory on VPAC where you store your dataset file
createTask(j, @test , 1, {});
createTask(j, @rand, 1, {3,3});
get(j);
submit(j)
waitForState(j)
results = getAllOutputArguments(j);
results{1:2}
celldisp(results)

% test.m

function result = test()
% test.m: load the dataset directly from its location on VPAC using the full path to the file
load('/home/vuhuyquan/matlab/share/trainin.mat');
result = traininput;   % we can do anything with the dataset now
return
end

How to run an .m file and data set which are already on VPAC

In some cases you have a very big function file (.m file) and a big data set, and it is not efficient to send that function file to the server every time you run it. You may want to upload everything to the server once and then program the cluster to run that function there directly.

This tutorial will show you how to do that:

To make the concept easy to explain, only a small function is used to show that this method works.

Assume we have a function file, abc.m, and the dataset trainin.mat. We want to upload these two files to the cluster, then tell MATLAB to run them there.

% abc.m

function y = abc()
% Load the dataset trainin.mat, save it as another file, and return the value of the dataset.
load('trainin.mat');
y = traininput;
save traintest y;
return
end

Assuming these two files are already on the VPAC server, in one's home directory or a subdirectory, another function needs to be written that calls the abc.m function there (test1.m).

This is the function that you will send from your computer to VPAC; it will call the function abc.m and run it there.

% test1.m

function result = test1()
cd /home/xxxxxxx     % change to the directory on VPAC where we put the abc.m and trainin.mat files earlier
result = abc();      % call the abc function and get the returned result
return
end

When we have test1.m ready, we again need to write the code to submit this function.

% distributedcomputing.m

clusterHost = 'tango.vpac.org';
remoteDataLocation = '/home/xxxxxxxx';
sched = findResource('scheduler', 'type', 'generic');
get(sched);
set(sched, 'DataLocation', 'C:\matlab\share');
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008a');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});
j = createJob(sched);
set(j, 'FileDependencies', {'C:\matlab\Oldsource\test1.m'});  % send the test1.m file to VPAC
set(j, 'PathDependencies', {'/home/vuhuyquan/matlab'});       % the path to the place on VPAC where the abc.m and trainin.mat files are stored
createTask(j, @test1 , 1, {});
get(j);
submit(j)
waitForState(j, 'finished')
results = getAllOutputArguments(j);
results{1:1}
celldisp(results)
destroy(j)

When you run this file, it sends the test1.m file to VPAC and runs it there. Inside test1.m the abc() function is called, so abc.m also runs and returns the result.

This tutorial was originally written by Quan Vu with assistance from Gang Li. VPAC would like to extend its warmest thanks to them for their work.
