Compiling MPI Applications
To create an example MPI application, we'll use the source code that is located in /common/examples/mpi.
Change into the directory we made previously and copy the files:
trainXX@trifid:~/class> cp -v /common/examples/mpi/* .
The -v flag makes the copy 'verbose', printing each file as it is copied. The '.' tells cp to copy into the current working directory.
Three files will be copied into your current directory. First, we'll compile the mpi-pong.c example MPI application.
trainXX@trifid:~/class> mpicc mpi-pong.c -o mpi-pong
This should produce the binary mpi-pong which we can execute, but before we do, we need to work out how to submit it as a job to be processed by the cluster.
To launch a job on the cluster, you will need a script that specifies the parameters of the job. There are a few example PBS job scripts in /common/examples/PBS/.
Generally there are a number of things that will need checking:
Job Name, set with the -N option
Number of CPUs, using the -l nodes=[number of CPUs] option
Wall time of the Job, using the -l walltime=[hrs:min:sec] option
You may also want to check that you are collecting stdout and stderr. Many users keep a pbs-script file in their home directory, set up the way they like it, so that only a couple of things need changing after it is copied into their run directory.
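Putting those options together, a minimal PBS script might look like the following sketch. The job name, resource values, and binary name are placeholders, and the exact mpirun invocation can differ between MPI stacks, so treat this as a starting point rather than a site default:

```shell
#!/bin/bash
# Minimal PBS job script sketch -- all values here are examples
#PBS -N pong-test              # job name
#PBS -l nodes=4                # number of CPUs requested
#PBS -l walltime=0:10:00      # hrs:min:sec
#PBS -j oe                     # merge stderr into stdout

cd $PBS_O_WORKDIR              # run from the directory the job was submitted in
mpirun ./mpi-pong
```

The `cd $PBS_O_WORKDIR` line matters: without it the job starts in your home directory, not the directory you ran qsub from.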
Note that you must deal with both the number of nodes and the number of CPUs required. If you want to use 8 CPUs for your job, you need to specify that in the PBS script. For example:
#PBS -l nodes=8
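Under Torque-style PBS the same request can also be written as nodes times processors per node; which forms a site accepts depends on its configuration, so check with your local documentation:

```shell
# Either directive requests 8 CPUs in total -- use one, not both
#PBS -l nodes=8
#PBS -l nodes=2:ppn=4    # 2 nodes x 4 CPUs per node (Torque nodes:ppn syntax)
```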
Then, submit the job like this:
trainXX@trifid:~/class> qsub pbs-pong
You will get a response that includes a job number; it's worth noting it down. See the overall picture with the showq command. Keep track of what's happening with your job using qstat: by itself it lists all jobs in the queue; give it a job number and it lists only that job; add -f and you get detailed information about the job. For a scheduler's-eye view of a single job, use checkjob. So, supposing the job number is 4567, you would type:
trainXX@trifid:~> checkjob 4567
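The monitoring commands just described, gathered in one place (4567 stands in for your own job number):

```shell
showq            # overall picture of the queue and running jobs
qstat            # list all jobs in the queue
qstat 4567       # list only this job
qstat -f 4567    # full details for this job
checkjob 4567    # the scheduler's detailed report on this job
```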
Try compiling the msum.c source file we copied earlier:
trainXX@trifid:~/class> mpicc msum.c -o msum -lm
Create a PBS script for it, say pbs_msum, based on one we have used before. Perhaps grab a pristine copy from /common/examples/PBS/pbs-script:
trainXX@trifid:~/class> cp /common/examples/PBS/pbs-script pbs_msum
Edit the PBS script to set your walltime, job name, number of CPUs, and the msum binary we just compiled. When you're happy with it, submit the job to the resource manager:
trainXX@trifid:~/class> qsub pbs_msum
You can see where your program is running using a number of commands; the most useful are showq and qstat. You'll have to be quick, because the job won't last long!
We now need to examine ways of running this job with long and short wall times, and with one, two and four CPUs. We will also play with the idea of sharing a node with another user versus using both CPUs ourselves.
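One way to generate those runs without editing the script each time is to override the resource requests on the qsub command line, which takes precedence over the #PBS directives in the script (the job names here are illustrative):

```shell
# Submit pbs_msum three times with different CPU counts; -N and -l on the
# command line override the corresponding #PBS lines inside the script
for n in 1 2 4; do
    qsub -N msum-${n}cpu -l nodes=${n} -l walltime=0:05:00 pbs_msum
done
```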
Let's see if we have time to fill in the table below. Try switching between the GCC-compiled build of MVAPICH2 and the XL-compiled build to compare the speed of our MPI code under each.
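If the cluster uses environment modules, switching toolchains typically looks something like the sketch below. The module names are assumptions for illustration only; run `module avail` to find the actual names on your system:

```shell
module list                            # see what is currently loaded
module swap mvapich2/gcc mvapich2/xl   # hypothetical module names -- check module avail
mpicc msum.c -o msum-xl -lm            # rebuild with the other toolchain
```

Rebuilding into a differently named binary (msum-xl here) lets you submit both versions side by side and compare timings.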