R Tutorial


  1. Log in using X-forwarding; ssh -X [username]@trifid.vpac.org
  2. Create an R test directory and move to it; mkdir R_test, cd R_test
  3. Copy the sample files to the directory; cp -r /common/examples/R/* . Then check which files are there (ls).
  4. The data come from a study of ponderosa pine biomass by Dale W. Johnson, J. Timothy Ball, and Roger F. Walker of the Biological Sciences Center, Desert Research Institute, University of Nevada. Each row holds 28 measurements and markers for a single tree. For example, the first number in each row is 1, 2, 3, or 4, indicating the level of exposure to carbon dioxide, and the sixth number is an estimate of the biomass of the tree's stems. Note that the very first line of the file is a list of labels for the columns of data.
  5. Look at the file pbs-script to see what it is doing. The comments should be self-explanatory; less pbs-script

  6. #!/bin/bash

    # To give your job a name, replace "MyJob" with an appropriate name
    #PBS -N MyJob

    # R needs to run on a single CPU
    #PBS -l nodes=1

    # set your minimum acceptable walltime=hours:minutes:seconds
    #PBS -l walltime=1:00:00

    # Inherit the correct environment variables
    #PBS -V

    # Specify your email address to be notified of progress.
    # PBS -M yourname@domain
    # To receive an email:
    # - job is aborted: 'a'
    # - job begins execution: 'b'
    # - job terminates: 'e'
    # Note: Please ensure that the PBS -M option above is set.
    # PBS -m abe

    # Changes directory to your execution directory (Leave as is)
    cd $PBS_O_WORKDIR

    # Load the environment variables for R
    module load R

    # The command to actually run the job
    R --vanilla < tutorial.R

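    qsub only acts on lines that begin with #PBS; a line that begins with "# PBS" (note the space) is an ordinary comment and is ignored. The following is a minimal sketch for listing the directives that are actually active in a script. It assumes the default "#PBS" directive prefix, and active_directives is a hypothetical helper name, not a PBS command.

```shell
#!/bin/bash
# List the PBS directives that qsub will act on.
# Assumption: the default directive prefix "#PBS" is in use.
# active_directives is a hypothetical helper name, not part of PBS.
active_directives() {
    # Only lines that start with "#PBS" count; "# PBS ..." is a plain comment.
    grep '^#PBS' "$1"
}

# Usage (only runs if the script is present in the current directory):
if [ -f pbs-script ]; then
    active_directives pbs-script
fi
```

    Any option that should take effect but is missing from this output (for example the email options) still has a space after the # in the script.
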
  7. Submit the job (qsub pbs-script). Whilst the job is running, look at the tutorial.R script (less tutorial.R). First, it reads the w1.dat and trees91.csv files into the variables w1 and tree. It then plots histograms of w1$vals, setting the number of breaks and the range of the x-axis (xlim), adding a title and an axis label, and overlaying a strip chart on the histogram.

  8. # Import the data files (comma-separated, with a header row)
    w1 <- read.csv(file="w1.dat",sep=",",header=TRUE)
    tree <- read.csv(file="trees91.csv",sep=",",header=TRUE)
    # Plot a histogram of the data
    # Specify the number of breaks to use
    # Specify the size of the domain using xlim
    # (breaks=12 and xlim=c(0,2) are illustrative values)
    hist(w1$vals,breaks=12,xlim=c(0,2))
    # adding titles and labels - always annotate your plots!
    hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
    # add other kinds of plots (e.g., stripchart)
    hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
    stripchart(w1$vals,add=TRUE,at=15.5)

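    Both input files are read with a header row, so the first line of each file names its columns. Before relying on a column name such as vals, it can help to check the labels from the shell. This is a rough sketch, assuming simple comma-separated labels with no commas inside quoted fields; show_columns and count_columns are hypothetical helper names.

```shell
#!/bin/bash
# Print the column labels of a comma-separated file, one per line.
# show_columns and count_columns are hypothetical helpers for illustration.
show_columns() {
    # The first line holds the labels; split it on commas.
    head -n 1 "$1" | tr ',' '\n'
}

# Count the columns by counting the labels.
count_columns() {
    show_columns "$1" | wc -l | tr -d ' '
}

# Usage (only runs if the tutorial data is in the current directory):
if [ -f trees91.csv ]; then
    show_columns trees91.csv
    echo "columns: $(count_columns trees91.csv)"
fi
```

    For the tree data, the count can be compared with the number of values per row described in step 4.
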
  9. Check the status of the job (qstat -u [username]) until the job is completed. When it is complete, list the directory (ls). You should see something like the following:
    [lev@trifid R_test]$ ls
    MyJob.e1149329 MyJob.o1149329 pbs-script Rplots.pdf trees91.csv tutorial.R w1.dat
  10. The two files MyJob.e1149329 and MyJob.o1149329 are the job error and output files, respectively. The error file in this case is empty, but it can be useful for debugging if a job fails. The output file in this instance documents the actions of the program. The real output is Rplots.pdf, which can be displayed on the desktop from the cluster with the command evince Rplots.pdf.
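
    The checks in steps 9 and 10 can be sketched as a small shell function: it reports a non-empty error file and a missing Rplots.pdf. This is only an illustration; check_job_output is a hypothetical helper, and a non-empty error file does not always mean failure, since R may also write warnings there.

```shell
#!/bin/bash
# Summarise the files a finished job left in a directory.
# check_job_output is a hypothetical helper; the job name should match
# the one given with "#PBS -N" (MyJob in this tutorial).
check_job_output() {
    local dir="$1" name="$2" status=0 f
    # A non-empty error file is worth reading before trusting the results.
    for f in "$dir"/"$name".e*; do
        [ -e "$f" ] || continue
        if [ -s "$f" ]; then
            echo "error file $f is not empty"
            status=1
        fi
    done
    # R writes plots to Rplots.pdf by default when run non-interactively.
    if [ ! -f "$dir/Rplots.pdf" ]; then
        echo "Rplots.pdf is missing"
        status=1
    fi
    return "$status"
}

# Usage: check_job_output . MyJob && evince Rplots.pdf
```
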