Tutorials - Hadoop On Demand

Hadoop On Demand

Hadoop On Demand (HOD) is a system for provisioning virtual Hadoop clusters over a large physical cluster. It uses the Torque resource manager to allocate nodes. On the allocated nodes, it can start the Hadoop Map/Reduce and HDFS daemons. It automatically generates the appropriate configuration file (hadoop-site.xml) for the Hadoop daemons and client. HOD can also distribute Hadoop to the nodes in the virtual cluster that it allocates. In short, HOD makes it easy for administrators and users to set up and use Hadoop quickly.

First load the hod module:

module load hod/0.20.1

Then allocate a cluster (note that this command must be entered on a single line):

hod -t /usr/local/hadoop/0.20.1/build/hadoop-0.20.2-dev.tar.gz -o "allocate $HOME/hadoop 10" -l 3600
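The pieces of this command can be kept in shell variables so that the line is easier to read and adjust. The following is a minimal sketch, not part of HOD itself: the variable names are hypothetical, and it assumes that the number after the cluster directory is the node count and that the -l value is a wall time in seconds. It only prints the assembled command for review.

```shell
# Hypothetical variable names; the values are the ones used in the command above.
HOD_TARBALL=/usr/local/hadoop/0.20.1/build/hadoop-0.20.2-dev.tar.gz
HOD_CLUSTER_DIR="$HOME/hadoop"
HOD_NODES=10                  # assumed: number of nodes to allocate
HOD_WALLTIME=$((60 * 60))     # 3600; assumed: wall time in seconds

# Assemble the command as a single line and print it rather than running it.
HOD_CMD="hod -t $HOD_TARBALL -o \"allocate $HOD_CLUSTER_DIR $HOD_NODES\" -l $HOD_WALLTIME"
echo "$HOD_CMD"
```

Once the printed line looks right, it can be pasted into the shell as one line.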

Then you should see something like this:

INFO - Cluster Id 832284.tango-m.vpac.org
INFO - HDFS UI at http://tango109.vpac.org:55729
INFO - Mapred UI at http://tango109.vpac.org:55975
INFO - hadoop-site.xml at $HOME/hadoop
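If you want to reuse the cluster ID or the web UI addresses in a script, they can be pulled out of this output with awk. The sketch below works on a copy of the sample output shown above; the variable names are hypothetical, and whether hod prints these lines on standard output or standard error is not covered here, so capturing them from a live run may need a redirect.

```shell
# Sample output copied from above; single quotes keep $HOME as literal text.
hod_output='INFO - Cluster Id 832284.tango-m.vpac.org
INFO - HDFS UI at http://tango109.vpac.org:55729
INFO - Mapred UI at http://tango109.vpac.org:55975
INFO - hadoop-site.xml at $HOME/hadoop'

# For each matching line, take the last whitespace-separated field.
cluster_id=$(printf '%s\n' "$hod_output" | awk '/Cluster Id/ {print $NF}')
hdfs_ui=$(printf '%s\n' "$hod_output" | awk '/HDFS UI at/ {print $NF}')
mapred_ui=$(printf '%s\n' "$hod_output" | awk '/Mapred UI at/ {print $NF}')

echo "Cluster ID: $cluster_id"
echo "HDFS UI:    $hdfs_ui"
echo "Mapred UI:  $mapred_ui"
```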

To list the active HOD clusters:

hod list -d $HOME/hadoop/

This gives output like the following:

INFO - alive 832284.tango-m.vpac.org $HOME/hadoop
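A script can check for the "alive" state before submitting work. The helper below is a hypothetical sketch that assumes the line format shown above (state in the third field, cluster directory in the last field); it is tested here against the sample line rather than a live cluster.

```shell
# Hypothetical helper: succeeds if the given listing reports an alive
# cluster for the given directory.
#   $1 = text printed by "hod list", $2 = cluster directory as printed
is_cluster_alive() {
    printf '%s\n' "$1" |
        awk -v dir="$2" '$3 == "alive" && $NF == dir {found = 1} END {exit !found}'
}

# Sample line from above; on a real system the directory appears expanded.
list_output='INFO - alive 832284.tango-m.vpac.org $HOME/hadoop'

if is_cluster_alive "$list_output" '$HOME/hadoop'; then
    echo "cluster is alive"
else
    echo "no alive cluster found"
fi
```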

To run a test job, enter the following:

export HADOOP_CONF_DIR=$HOME/hadoop
hadoop fs -mkdir input
hadoop fs -mkdir output
hadoop fs -put /usr/share/dict/words input/words
hadoop jar /usr/local/hadoop/0.20.1/hadoop-0.20.1-examples.jar wordcount input/words output/wordcount
hadoop fs -get output/wordcount ~/wc_results
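Once the results are copied locally, a quick sanity check is to count the distinct words and sum their occurrences; the sum should roughly match the number of words in the input file. The sketch below assumes the results are tab-separated "word, count" lines in files named part-*, and it uses a temporary directory with made-up sample data in place of ~/wc_results so that it runs without a cluster.

```shell
# Stand-in for ~/wc_results, filled with made-up sample data.
results_dir=$(mktemp -d)
printf 'apple\t3\nbanana\t7\ncherry\t1\n' > "$results_dir/part-r-00000"

# One line per distinct word; the second tab-separated field is the count.
distinct=$(cat "$results_dir"/part-* | wc -l | tr -d ' ')
total=$(cat "$results_dir"/part-* | awk -F '\t' '{sum += $2} END {print sum + 0}')

echo "distinct words: $distinct, total occurrences: $total"
rm -r "$results_dir"
```

To run it on real results, set results_dir to the directory created by hadoop fs -get and drop the printf and rm lines.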


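To deallocate the cluster, enter the following commands. The INFO line is sample output from the first hod list; running hod list again after deallocating lets you check that the cluster is gone: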

hod list -d $HOME/hadoop/
INFO - alive 832325.tango-m.vpac.org $HOME/hadoop
hod deallocate -d $HOME/hadoop/
hod list -d $HOME/hadoop/
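Because an allocated cluster holds nodes until it is deallocated, it can help to wrap a job so that deallocation always runs afterwards, even if the job fails. The function below is a hypothetical sketch built only on the hod deallocate -d command shown above; the function name is not part of HOD.

```shell
# Hypothetical wrapper: run a command, then deallocate the cluster in the
# given directory, and return the command's own exit status.
#   $1 = cluster directory, remaining arguments = command to run
run_then_deallocate() {
    cluster_dir=$1
    shift
    "$@"
    status=$?
    hod deallocate -d "$cluster_dir"
    return "$status"
}

# Example use (not executed here):
#   run_then_deallocate "$HOME/hadoop" hadoop fs -ls
```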
