My previous post, "Hadoop 1.0.0 single node configuration on Ubuntu", dealt with Hadoop 1.0.0, but I found it very difficult to configure a multi-node setup on Ubuntu with Hadoop 1.0.0 in the same way. Therefore, I used the following configuration here:
OS: Ubuntu 10.04
Hadoop version: 0.22.0
A small Hadoop cluster includes a single master and multiple worker nodes. Here I am using two machines, one as the master and the other as the slave. The master node runs a JobTracker, TaskTracker, NameNode, and DataNode. The slave acts as both a DataNode and a TaskTracker. I assigned the IP address 192.168.0.1 to the master machine and 192.168.0.2 to the slave machine.
Part 1: Install Oracle JDK
Follow this step on both master and slave.
Add the repository to your apt-get:
$sudo apt-get install python-software-properties
$sudo add-apt-repository ppa:sun-java-community-team/sun-java6
Update the source list
$ sudo apt-get update
Install sun-java6-jdk
$ sudo apt-get install sun-java6-jdk
Select Sun’s Java as the default on your machine.
$ sudo update-java-alternatives -s java-6-sun
After the installation, check the Java version using
hadooptest@hadooptest-VM$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
Part 2: Configure the network
You must update the /etc/hosts file with the IP addresses of the master and the slave. Open /etc/hosts on both the master and the slave using
$ sudo vi /etc/hosts
And add the following lines
192.168.0.1 master
192.168.0.2 slave
Part 3: Create hadoop user
In this step, we create a new user and group on both the master and the slave to run Hadoop. Here I added the user 'hduser' to the group 'hd' using the following commands.
$ sudo addgroup hd
$ sudo adduser --ingroup hd hduser
Part 4: SSH Setup
Install ssh on the master and the slave using
$ sudo apt-get install ssh
Let’s configure passwordless ssh between the master and the slave
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
On the master machine, run the following
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
Test the ssh configuration on the master:
$ ssh master
$ ssh slave
If the ssh configuration is correct, the above commands do not ask for a password.
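The manual ssh test above can also be scripted. The following is only a minimal sketch, not part of the original setup: the function name check_passwordless_ssh and the SSH_BIN override variable are my own hypothetical names. It relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Sketch: confirm that key-based ssh login works without a password prompt.
# check_passwordless_ssh and SSH_BIN are hypothetical names for illustration;
# SSH_BIN only exists so the function can be exercised without a real host.

check_passwordless_ssh() {
    local host="$1"
    # BatchMode=yes disables password prompts, so a missing key is an error.
    # ConnectTimeout=5 keeps the check from hanging on an unreachable host.
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "OK: passwordless ssh to $host"
    else
        echo "FAIL: ssh to $host needs a password or is unreachable"
        return 1
    fi
}
```

Run as hduser on the master, it could be used like this: for h in master slave; do check_passwordless_ssh "$h"; done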
Part 5: Configuring Hadoop
(Run this step on master and slave as normal user)
Download the latest Hadoop 0.22 release from http://www.reverse.net/pub/apache//hadoop/common/ and extract it using:
$ tar -xvf hadoop*.tar.gz
Move the hadoop folder from the download folder to /usr/local
$ sudo mv /home/user/Download/hadoop /usr/local/
Change the ownership of the hadoop directory
$ sudo chown -R hduser:hd /usr/local/hadoop
Configure /home/hduser/.bashrc with the Hadoop variables. Open the file with the following command:
$ sudo vi /home/hduser/.bashrc
Add the following lines to the end
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Create a folder which Hadoop will use to store its data files
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hd /app/hadoop/tmp
Open the core-site.xml file in the Hadoop configuration directory
$sudo vi /usr/local/hadoop/conf/core-site.xml
Add the following property tags to core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>Temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>Default file system.</description>
</property>
Open the mapred-site.xml file in the Hadoop configuration directory
$sudo vi /usr/local/hadoop/conf/mapred-site.xml
Add the following property tags to mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>MapReduce job tracker.</description>
</property>
Open the hdfs-site.xml file in the Hadoop configuration directory
$sudo vi /usr/local/hadoop/conf/hdfs-site.xml
Add the following property tags to hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
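Since the same three files have to be edited on both machines, they could also be generated by a small script. This is only a sketch under some assumptions: write_site_file and CONF_DIR are hypothetical names of mine, the descriptions are left out for brevity, and the properties are wrapped in the <configuration> root element that Hadoop's stock configuration files use. By default it writes to a scratch directory, so nothing is overwritten unless CONF_DIR is set to /usr/local/hadoop/conf.

```shell
#!/usr/bin/env bash
# Sketch: write the three config files from this part in one pass.
# CONF_DIR defaults to a scratch directory (an assumption, for safe trials);
# set CONF_DIR=/usr/local/hadoop/conf to apply it to the real installation.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# write_site_file FILE NAME VALUE [NAME VALUE ...]
# Hypothetical helper: emits one <property> per NAME/VALUE pair.
write_site_file() {
    local file="$1"; shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        while [ "$#" -ge 2 ]; do
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
            shift 2
        done
        echo '</configuration>'
    } > "$file"
}

# Same property names and values as listed in this part.
write_site_file "$CONF_DIR/core-site.xml" \
    hadoop.tmp.dir /app/hadoop/tmp \
    fs.default.name hdfs://master:54310
write_site_file "$CONF_DIR/mapred-site.xml" \
    mapred.job.tracker master:54311
write_site_file "$CONF_DIR/hdfs-site.xml" \
    dfs.replication 2

echo "Wrote configuration files to $CONF_DIR"
```

Running the same script on the master and the slave keeps the two machines' settings identical, which is what this part requires.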
Part 6: Configure Master Slave Settings
Edit the following files on both the master and slave machines.
conf/masters
conf/slaves
On the master machine:
Open the following file: conf/masters and change ‘localhost’ to ‘master’:
master
Open the following file: conf/slaves and replace ‘localhost’ with the following two lines:
master
slave
On the slave machine:
Open the following file: conf/masters and change ‘localhost’ to ‘slave’:
slave
Open the following file: conf/slaves and change ‘localhost’ to ‘slave’:
slave
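The masters and slaves files for both machines can be summarised in a short script. This is a sketch only: write_cluster_files is a hypothetical helper name, and the demonstration at the end writes to a scratch directory rather than to /usr/local/hadoop/conf.

```shell
#!/usr/bin/env bash
# Sketch: write conf/masters and conf/slaves for the given role.
# write_cluster_files ROLE CONF_DIR   (hypothetical helper, ROLE is master or slave)
write_cluster_files() {
    local role="$1" conf_dir="$2"
    case "$role" in
        master)
            # The master lists itself in masters and both machines in slaves.
            printf 'master\n' > "$conf_dir/masters"
            printf 'master\nslave\n' > "$conf_dir/slaves"
            ;;
        slave)
            # The slave lists only itself in both files, as described above.
            printf 'slave\n' > "$conf_dir/masters"
            printf 'slave\n' > "$conf_dir/slaves"
            ;;
        *)
            echo "unknown role: $role (expected master or slave)" >&2
            return 1
            ;;
    esac
}

# Demonstration against a scratch directory (not the real conf directory).
demo_dir="$(mktemp -d)"
write_cluster_files master "$demo_dir"
cat "$demo_dir/slaves"
```

On the real machines the second argument would be /usr/local/hadoop/conf, with the role chosen per machine.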
Part 7: Starting Hadoop
To format the HDFS file system through the NameNode, run the following on the master in hadoop/bin (/usr/local/hadoop/bin):
$ hadoop namenode -format
To start the HDFS daemons, run the following command in hadoop/bin:
$ ./start-dfs.sh
Run the jps command on the master; I got output like this
14399 NameNode
16244 DataNode
16312 SecondaryNameNode
12215 Jps
Run the jps command on the slave; I got output like this
11501 DataNode
11612 Jps
To start the MapReduce daemons, run the following command in hadoop/bin:
$ ./start-mapred.sh
Run the jps command on the master
14399 NameNode
16244 DataNode
16312 SecondaryNameNode
18215 Jps
17102 JobTracker
17211 TaskTracker
Run the jps command on the slave
11501 DataNode
11712 Jps
11695 TaskTracker
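Comparing the jps listings by eye works for two machines, but the check can also be automated. The sketch below uses a hypothetical function check_daemons that reads jps output on standard input and reports which of the daemons listed above are missing for the given role. The expected daemon lists are taken from the jps outputs shown in this part.

```shell
#!/usr/bin/env bash
# Sketch: verify that the expected Hadoop daemons appear in jps output.
# check_daemons ROLE   (hypothetical helper; reads jps output on stdin)
check_daemons() {
    local role="$1" expected missing="" jps_output name
    case "$role" in
        master) expected="NameNode DataNode SecondaryNameNode JobTracker TaskTracker" ;;
        slave)  expected="DataNode TaskTracker" ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac
    jps_output="$(cat)"
    for name in $expected; do
        # Compare against the whole second column, so that
        # SecondaryNameNode is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" |
                awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all $role daemons running"
}
```

It could be used as jps | check_daemons master on the master and jps | check_daemons slave on the slave.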
Part 8: Example MapReduce job using word count
Download the Plain Text UTF-8 files for the following books and store them in a local directory (here /home/hadoopmaster/gutenberg)
- The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
- The Notebooks of Leonardo Da Vinci
- Ulysses by James Joyce
Download the MapReduce examples jar (hadoop-examples-0.20.203.0.jar) to any local folder (here /home/hadoopmaster).
To run the MapReduce program, we need to copy these files from the local directory into an HDFS directory. For this purpose, first log in as the Hadoop user and move to the hadoop directory
$ su hduser
$ cd /usr/local/hadoop/
Copy the local files to HDFS using
$ hadoop dfs -copyFromLocal /home/hadoopmaster/gutenberg /user/hduser/gutenberg
Check the contents of the HDFS directory using
$ hadoop dfs -ls /user/hduser/gutenberg
Move to the folder that contains the downloaded jar file.
Run the following command to execute the program (the jar path is a local path, here the /home/hadoopmaster folder used above)
$ hadoop jar /home/hadoopmaster/hadoop-examples-0.20.203.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-out
Here /user/hduser/gutenberg is the input directory and /user/hduser/gutenberg-out is the output directory. Both the input and output directories must be in the HDFS file system.
It will take some time, depending on your system configuration. You can track the job progress using the Hadoop JobTracker web interface (by default at http://master:50030/).
Check the result of the program using
$ hadoop dfs -cat /user/hduser/gutenberg-out/part-r-00000
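The raw result file is long, so it can help to look only at the most frequent words. The following is a minimal sketch, assuming the word count output consists of lines with a word, a tab, and a count. The function name top_words is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the N most frequent words from word count output.
# top_words N   (hypothetical helper; reads "word<TAB>count" lines on stdin)
top_words() {
    local n="${1:-10}"
    # Sort on the second tab-separated field, numerically, highest first;
    # ties are broken alphabetically by the word itself.
    sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}
```

For example, the output of the hadoop dfs -cat command for the job's part-r-00000 file could be piped into top_words 20 to list the twenty most frequent words.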

