Sunday, February 26, 2012

Multi-Node Hadoop Cluster On Ubuntu Linux

My previous post, Hadoop 1.0.0 single node configuration on ubuntu, covers Hadoop 1.0.0, but I found it very difficult to configure a multi-node setup on Ubuntu with Hadoop 1.0.0 in the same way. I therefore used the following configuration here:

OS: Ubuntu 10.04
Hadoop version: 0.22.0

A small Hadoop cluster includes a single master and multiple worker nodes. Here I am using only two machines, one as the master and the other as the slave. The master node runs a JobTracker, TaskTracker, NameNode, and DataNode. The slave acts as both a DataNode and a TaskTracker.

I assigned the IP address 192.168.0.1 to the master machine and 192.168.0.2 to the slave machine.





Part 1: Install Oracle JDK

Follow this step on both master and slave.

Add the repository to your apt sources:
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:sun-java-community-team/sun-java6

Update the source list:
$ sudo apt-get update
Install sun-java6-jdk:
$ sudo apt-get install sun-java6-jdk
Select Sun’s Java as the default on your machine:
$ sudo update-java-alternatives -s java-6-sun
After the installation, check the Java version using:
hadooptest@hadooptest-VM$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
Part 2: Configure the network

Update the /etc/hosts file with the IP addresses of the master and the slave. Open /etc/hosts on both the master and the slave using:
$ sudo vi /etc/hosts
And add the following lines:
192.168.0.1     master
192.168.0.2     slave
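
If you repeat this on several machines, the same edit can be scripted. The following is only a minimal sketch and not part of the original setup: the helper name add_host_entry is hypothetical, and the demonstration writes to a scratch file instead of the real /etc/hosts.

```shell
# Hypothetical helper: append "IP<TAB>name" to a hosts-style file,
# but only if no uncommented line already maps that name.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # Look for the name as a whole whitespace-delimited field.
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Demonstration against a scratch file, not the real /etc/hosts.
hosts_copy="$(mktemp)"
add_host_entry "$hosts_copy" 192.168.0.1 master
add_host_entry "$hosts_copy" 192.168.0.2 slave
add_host_entry "$hosts_copy" 192.168.0.2 slave   # repeated call adds nothing
cat "$hosts_copy"
```

To apply it to the real file you would run the function as root with /etc/hosts as the first argument.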
Part 3: Create hadoop user

In this step, we create a new user and group on both the master and the slave to run Hadoop. Here I added the user 'hduser' to the group 'hd' using the following commands:
$ sudo addgroup hd
$ sudo adduser --ingroup hd hduser
Part 4: SSH Setup

Install ssh on the master and the slave using:
$ sudo apt-get install ssh
Now configure passwordless SSH between the master and the slave:
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

On the master machine, run the following:
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
Test the SSH configuration on the master:
$ ssh master
$ ssh slave
If the SSH configuration is correct, the above commands do not ask for a password.
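
The same check can be done without an interactive login. This is a rough sketch under some assumptions: check_passwordless is a hypothetical helper name, and the SSH_CMD variable is only an illustrative hook for swapping out the ssh binary. The BatchMode=yes option tells ssh to fail rather than prompt for a password.

```shell
# Hypothetical helper: report whether a host accepts key-based login.
# SSH_CMD is an illustrative override hook; it defaults to plain ssh.
check_passwordless() {
    local host="$1"
    # BatchMode=yes makes ssh fail instead of asking for a password.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: ok"
    else
        echo "$host: password still required or host unreachable"
        return 1
    fi
}
```

As hduser on the master you could then run check_passwordless master and check_passwordless slave.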

Part 5: Configuring Hadoop

(Run this step on the master and the slave as a normal user.)
Download the latest Hadoop 0.22 from http://www.reverse.net/pub/apache//hadoop/common/ and extract it using:
$ tar -xvf hadoop*.tar.gz
Move the hadoop folder from the download folder to /usr/local:
$ sudo mv /home/user/Download/hadoop /usr/local/
Change the ownership of the hadoop directory:
$ sudo chown -R hduser:hd /usr/local/hadoop
Add the Hadoop variables to /home/hduser/.bashrc. Open the file with:
$ sudo vi /home/hduser/.bashrc
Add the following lines to the end:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
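
After reloading .bashrc (for example by logging in again as hduser), it can help to confirm that the variables took effect. A small sketch, with check_hadoop_env as a hypothetical helper name:

```shell
# Hypothetical helper: verify the variables exported from .bashrc.
check_hadoop_env() {
    local status=0
    [ -n "${JAVA_HOME:-}" ] || { echo "JAVA_HOME is not set"; status=1; }
    [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set"; status=1; }
    # Wrap PATH in colons so the bin directory matches as a whole entry.
    case ":$PATH:" in
        *":${HADOOP_HOME:-/nonexistent}/bin:"*) ;;
        *) echo "HADOOP_HOME/bin is not on PATH"; status=1 ;;
    esac
    return "$status"
}
```

The function prints nothing and returns 0 when all three settings are in place.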
Create a folder that Hadoop will use to store its data files:
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hd /app/hadoop/tmp
Open the core-site.xml file in the Hadoop configuration directory:
$ sudo vi /usr/local/hadoop/conf/core-site.xml

Add the following property tags between the <configuration> and </configuration> tags in core-site.xml:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>Temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
<description>Default file system.</description>
</property>
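
To show where the two properties sit in the finished file, here is a sketch that writes a complete core-site.xml into a scratch directory. The scratch location is an assumption for illustration; the real file lives in /usr/local/hadoop/conf.

```shell
# Scratch directory for illustration; the real target is /usr/local/hadoop/conf.
conf_dir="$(mktemp -d)"

# Both properties go inside a single <configuration> element.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>Temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
    <description>Default file system.</description>
  </property>
</configuration>
EOF

# Quick sanity check: count the property blocks that were written.
grep -c '<property>' "$conf_dir/core-site.xml"
```

The mapred-site.xml and hdfs-site.xml files follow the same layout, each with its own property inside the configuration element.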
Open the mapred-site.xml file in the Hadoop configuration directory:
$ sudo vi /usr/local/hadoop/conf/mapred-site.xml

Add the following property tags to mapred-site.xml:

<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>MapReduce job tracker.</description>
</property>
Open the hdfs-site.xml file in the Hadoop configuration directory:
$ sudo vi /usr/local/hadoop/conf/hdfs-site.xml

Add the following property tags to hdfs-site.xml:

<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

Part 6: Configure Master Slave Settings
Edit the following files on both the master and slave machines.

conf/masters
conf/slaves

On Master machine:

Open conf/masters and change ‘localhost’ to ‘master’:

master

Open conf/slaves and replace ‘localhost’ with:

master
slave

On the Slave machine:

Open conf/masters and change ‘localhost’ to ‘slave’:

slave

Open conf/slaves and change ‘localhost’ to ‘slave’:

slave
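
The four file contents above can be summarized in a short script. This is a sketch only: write_node_lists is a hypothetical helper name, and the demonstration writes into a scratch directory rather than /usr/local/hadoop/conf.

```shell
# Hypothetical helper: write conf/masters and conf/slaves for a given role,
# using the same contents as listed in this part.
write_node_lists() {
    local role="$1" conf_dir="$2"
    case "$role" in
        master)
            printf 'master\n' > "$conf_dir/masters"
            printf 'master\nslave\n' > "$conf_dir/slaves"
            ;;
        slave)
            printf 'slave\n' > "$conf_dir/masters"
            printf 'slave\n' > "$conf_dir/slaves"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# Demonstration in a scratch directory.
scratch="$(mktemp -d)"
write_node_lists master "$scratch"
cat "$scratch/slaves"
```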

Part 7: Starting Hadoop
To format the Hadoop NameNode (this initializes HDFS), run the following on the master in hadoop/bin (/usr/local/hadoop/bin):

$ hadoop namenode -format

To start the HDFS daemons, run the following command in hadoop/bin:

$ ./start-dfs.sh

Run the jps command on the master. The output should look like this:

14399 NameNode
16244 DataNode
16312 SecondaryNameNode
12215 Jps

Run the jps command on the slave. The output should look like this:

11501 DataNode
11612 Jps

To start the MapReduce daemons, run the following command in hadoop/bin:

$ ./start-mapred.sh

Run the jps command on the master:

14399 NameNode
16244 DataNode
16312 SecondaryNameNode
18215 Jps
17102 JobTracker
17211 TaskTracker

Run the jps command on the slave:

11501 DataNode
11712 Jps
11695 TaskTracker
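
Comparing jps listings by eye is easy to get wrong, so the check can be scripted. A minimal sketch, with check_daemons as a hypothetical helper name; it reads a jps listing on standard input and reports any expected daemon that is absent.

```shell
# Hypothetical helper: usage is  jps | check_daemons NameNode DataNode ...
check_daemons() {
    local listing missing=0 daemon
    listing="$(cat)"
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second column exactly.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

On the master you might run jps | check_daemons NameNode DataNode SecondaryNameNode JobTracker TaskTracker, and on the slave jps | check_daemons DataNode TaskTracker.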

Part 8: Example MapReduce job using word count
Download the Plain Text UTF-8 files for the following books and store them in a local directory (here, /home/hadoopmaster/gutenberg).

Download the MapReduce program jar file (hadoop-examples-0.20.203.0.jar) to any local folder (here, /home/hadoopmaster).
To run the MapReduce program, we need to copy these files from the local directory into an HDFS directory. To do this, first log in as the Hadoop user and change to the Hadoop directory:
$ su hduser
$ cd /usr/local/hadoop/
Copy the local files to HDFS using:
$ hadoop dfs -copyFromLocal /home/hadoopmaster/gutenberg /user/hduser/gutenberg
Check the contents of the HDFS directory using:
$ hadoop dfs -ls /user/hduser/gutenberg

Change to the folder that contains the downloaded jar file.
Run the following command to execute the program:
$ hadoop jar /user/hduser/hadoop-examples-0.20.203.0.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-out

Here /user/hduser/gutenberg is the input directory and /user/hduser/gutenberg-out is the output directory. Both the input and output directories must be in the HDFS file system.
The job will take some time, depending on your system configuration. You can track its progress using the Hadoop tracker websites.
Check the result of the program using:
$ hadoop dfs -cat /user/hduser/gutenberg-out/part-r-00000
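
The raw output is long, so it is often more useful to look at the most frequent words only. The sketch below assumes the word count output has one "word<TAB>count" pair per line; top_words is a hypothetical helper name.

```shell
# Hypothetical helper: keep the N lines with the highest counts.
# Assumes tab-separated "word<TAB>count" lines on standard input.
top_words() {
    local n="${1:-10}" tab
    tab="$(printf '\t')"
    # Sort numerically on the second field, largest first.
    sort -t "$tab" -k2,2nr | head -n "$n"
}
```

For example: hadoop dfs -cat /user/hduser/gutenberg-out/part-r-00000 | top_words 20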

Friday, January 27, 2012

Hadoop 1.0.0 single node configuration on ubuntu

Hadoop is a framework for distributed processing across clusters of computers. It provides reliable data storage through the Hadoop Distributed File System (HDFS) and high-performance parallel data processing through the MapReduce method. You can find more information at
http://wiki.apache.org/hadoop/
Here I describe my own experience configuring Hadoop 1.0.0 on an Ubuntu box. I am using Ubuntu 11.04 for this configuration.

Step 1: Download and install Oracle JDK

Install JDK 1.6 or above using the following steps.
Add the repository to your apt sources:
hadooptest@hadooptest-VM$ sudo apt-get install python-software-properties
hadooptest@hadooptest-VM$ sudo add-apt-repository ppa:sun-java-community-team/sun-java6

Update the source list:
hadooptest@hadooptest-VM$ sudo apt-get update

Install sun-java6-jdk:
hadooptest@hadooptest-VM$ sudo apt-get install sun-java6-jdk

Select Sun’s Java as the default on your machine:
hadooptest@hadooptest-VM$ sudo update-java-alternatives -s java-6-sun

After the installation, check the Java version using:
hadooptest@hadooptest-VM$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
Step 2: Download and install Hadoop

Download the i386 or amd64 version (according to your OS) of the .deb package from http://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-1.0.0/. Install Hadoop by double-clicking the file or using the dpkg command:

hadooptest@hadooptest-VM$ sudo dpkg -i hadoop_1.0.0-1_i386.deb
Step 3: Set up Hadoop for single node

Set up Hadoop for a single node using the following command:
hadooptest@hadooptest-VM$ sudo hadoop-setup-single-node.sh
Answer "yes" to all questions. The services will start automatically after the installation.

Step 4: Test hadoop configuration
hadooptest@hadooptest-VM$ sudo hadoop-validate-setup.sh --user=hdfs
If you get "teragen, terasort, teravalidate passed." near the end of the output, everything is ok.

Hadoop Tracker websites

JobTracker website: http://localhost:50030/
NameNode website: http://localhost:50070/
TaskTracker website: http://localhost:50060/

Step 5: Example MapReduce job using word count

5.1. Download the Plain Text UTF-8 files for the following books and store them in a local directory (here, /home/hadooptest/gutenberg).
5.2. Download the MapReduce program jar file (hadoop-examples-0.20.203.0.jar) to any local folder (here, /home/hadooptest).
5.3. To run the MapReduce program, we need to copy these files from the local directory into an HDFS directory. To do this, first log in as the hdfs user using:
hadooptest@hadooptest-VM$ su hdfs
Copy the local files to HDFS using:
hdfs@hadooptest-VM$ hadoop dfs -copyFromLocal /home/hadooptest/gutenberg /user/hdfs/gutenberg
Check the contents of the HDFS directory using:
hdfs@hadooptest-VM$ hadoop dfs -ls /user/hdfs/gutenberg
5.4. Change to the folder that contains the downloaded jar file.
5.5. Run the following command to execute the program:
hdfs@hadooptest-VM:/home/hadooptest$ hadoop jar /user/hdfs/hadoop-examples-0.20.203.0.jar wordcount /user/hdfs/gutenberg /user/hdfs/gutenberg-out
Here /user/hdfs/gutenberg is the input directory and /user/hdfs/gutenberg-out is the output directory. Both the input and output directories must be in the HDFS file system.
The job will take some time, depending on your system configuration. You can track its progress using the Hadoop tracker websites.

5.6. Check the result of the program using:
hdfs@hadooptest-VM:/home/hadooptest$ hadoop dfs -cat /user/hdfs/gutenberg-out/part-r-00000

Friday, December 30, 2011

Install Xen 4.1 on Ubuntu 11.10

Ubuntu has officially supported Xen since version 11.10, but there are some issues when creating a DomU. Here are some easy steps to configure Xen 4.1 on the 64-bit version of Ubuntu 11.10 and resolve the issues in DomU creation.

1. Install xen and utilities

$ sudo apt-get install xen-hypervisor-4.1-amd64 xen-utils-4.1 xenwatch xen-tools xen-utils-common xenstore-utils
$ sudo apt-get install virtinst
$ sudo apt-get install virt-viewer virt-manager

2. Restart OS and select Xen kernel

Verify the Xen installation using:

$ sudo xm info

If this command does not return any error, the installation is correct.

3. Xend configuration

Edit /etc/xen/xend-config.sxp, uncomment this line:

(xend-unix-server no)

and change it to:
(xend-unix-server yes)
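
The same change can be made non-interactively with sed. This is only a sketch: enable_xend_unix_server is a hypothetical helper name, and the sample file below is made up for the demonstration rather than copied from a real xend-config.sxp.

```shell
# Hypothetical helper: uncomment the xend-unix-server line and set it to yes.
enable_xend_unix_server() {
    local file="$1" tmp
    tmp="$(mktemp)"
    # In a basic regular expression the parentheses match literally.
    sed 's/^#*[[:space:]]*(xend-unix-server[[:space:]][[:space:]]*no)/(xend-unix-server yes)/' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a made-up sample file.
sample="$(mktemp)"
printf '#(some-other-option no)\n#(xend-unix-server no)\n' > "$sample"
enable_xend_unix_server "$sample"
cat "$sample"
```

Writing the result back with cat instead of moving the temporary file keeps the ownership and permissions of the original file.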

Edit the .bashrc file using

# vi ~/.bashrc

and add the following line:
export VIRSH_DEFAULT_CONNECT_URI="xen:///"

4. Restart OS and Select Xen Kernel

Verify libvirt Installation

$sudo virsh version 

Compiled against library: libvir 0.8.3
Using library: libvir 0.8.3
Using API: Xen 3.0.1
Running hypervisor: Xen 4.0.0

If you get output like this, every package has been installed correctly.

5. Creating a VM using virt-manager

Here I describe virtual machine creation using virt-manager.

Run $ sudo virt-manager to open the virt-manager GUI.

Create a new VM using virt-manager.

Solutions for common errors during DomU creation are given below.

1. An error like this is shown

Fix it using

$ sudo mkdir -p /usr/lib64/xen
$ sudo cp -r /usr/lib/xen-4.1/* /usr/lib64/xen/

2. An error like this is shown

Solve it using

$ sudo mkdir /usr/share/qemu
$ sudo cp -r /usr/share/qemu-linaro/keymaps /usr/share/qemu/

Sunday, November 20, 2011

How To Disable Guest Session in Ubuntu 11.10 Oneiric Ocelot

Ubuntu 11.10 adds a guest account by default. This is part of the new display manager, LightDM. To disable this feature, follow the steps below.

1. Open a terminal
2. Type 'sudo vi /etc/lightdm/lightdm.conf'. You will see a file like this:

[SeatDefault]
greeter-session=unity-greeter
user-session=ubuntu


3. Add the line 'allow-guest=false' to the end of the file
4. Save the file and exit
5. Reboot your system or restart the lightdm service using the “sudo restart lightdm” command
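
The edit in steps 2 to 4 can also be scripted. The sketch below uses a hypothetical helper name, disable_guest_session, and assumes that [SeatDefault] is the last section in the file, so that appending to the end puts the setting in the right section. The demonstration works on a scratch copy with the contents shown above.

```shell
# Hypothetical helper: make sure the file contains allow-guest=false exactly once.
# Assumes [SeatDefault] is the last section in the file.
disable_guest_session() {
    local file="$1" tmp
    if grep -q '^allow-guest=' "$file"; then
        # A setting already exists: rewrite it in place.
        tmp="$(mktemp)"
        sed 's/^allow-guest=.*/allow-guest=false/' "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf 'allow-guest=false\n' >> "$file"
    fi
}

# Demonstration on a scratch copy of the file shown above.
sample="$(mktemp)"
printf '[SeatDefault]\ngreeter-session=unity-greeter\nuser-session=ubuntu\n' > "$sample"
disable_guest_session "$sample"
disable_guest_session "$sample"   # repeated call changes nothing
cat "$sample"
```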


For video demonstration visit: http://www.youtube.com/watch?v=qBmF6rZCYh8

Sunday, July 24, 2011

Mount Virtual Box Image in Ubuntu

VirtualBox provides a tool called vdfuse, a FUSE-based filesystem package that can mount any VirtualBox VDI image. It supports the VDI, VMDK, VHD, and raw formats.

Install vdfuse package using the following command

$ sudo apt-get install virtualbox-ose-fuse

You can now mount the VDI file like this:

$ sudo vdfuse [options] -f /path/to/file.vdi /path/to/mountpoint

The options are

-h help
-r readonly
-t specify type (VDI, VMDK, VHD, or raw; default: auto)
-f VDimage file
-a allow all users to read disk
-w allow all users to read and write to disk
-g run in foreground
-v verbose
-d debug

This command creates files such as 'EntireDisk', 'Partition1', etc. at the mount point.

To mount the filesystem, just use:

mount /path/to/mountpoint/Partition1 /path/to/someother/mountpoint

Then view the file system at /path/to/someother/mountpoint