{ 2.1 } Installing Hadoop
Open the VM as a copy
Change the virtual machine name (you can use Master, Slave1, …)
Change the password (pwd)
Log on as root (password: tomtom)
Create a user with sudo privileges (a sudoer)
Log on as the sudoer user
Change the hostname in a terminal (as root):
>hostname masterbicing
Make sure the hostname has been changed by running the "hostname" command (with options)
Find the machine's IP address: the four numbers after "inet addr" in the Ethernet interface section of the ifconfig output.
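If you prefer not to read the address by eye, the small function below is one way to extract it. It is a sketch that assumes the older net-tools `ifconfig` output format (lines such as `inet addr:192.168.1.10`) and takes the first interface whose header line says `Link encap:Ethernet`; the function name `eth_ip` is hypothetical.

```shell
# eth_ip: read net-tools `ifconfig` output on stdin and print the IPv4 address
# of the first interface whose header line reads "Link encap:Ethernet".
eth_ip() {
  awk '
    # A line starting in column 0 opens a new interface block.
    /^[^ \t]/ { in_eth = ($0 ~ /Link encap:Ethernet/) }
    in_eth && /inet addr:/ {
      sub(/.*inet addr:/, "")   # drop everything before the address
      sub(/ .*/, "")            # drop everything after it
      print
      exit
    }'
}
```

Usage: `ifconfig | eth_ip`. Newer `ifconfig` and `ip addr` output omit the `addr:` label, so the pattern would need adjusting there.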
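Add all the hosts to the /etc/hosts file (we use /etc/hosts instead of a DNS server). Each line holds an IP address followed by the hostname; replace MASTER_IP and SLAVE1_IP with the real addresses:
> MASTER_IP masterbicing<br>
> SLAVE1_IP slave1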
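Every node needs the same entries, so a helper that appends them without creating duplicates when run more than once can save some editing. This is a sketch: `add_host_entry` is a hypothetical name, the hosts-file path is a parameter (defaulting to /etc/hosts, which requires root), and the addresses in the usage line are placeholders for the ones found in the previous step.

```shell
# add_host_entry IP HOSTNAME [HOSTS_FILE]
# Append "IP<TAB>HOSTNAME" to the hosts file unless HOSTNAME is already mapped
# on a non-comment line. HOSTS_FILE defaults to /etc/hosts (root needed).
add_host_entry() {
  local ip=$1 name=$2 file=${3:-/etc/hosts}
  if [ -f "$file" ] && awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
    echo "$name already present in $file"
    return 0
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

Usage, as root on every node: `add_host_entry MASTER_IP masterbicing` and `add_host_entry SLAVE1_IP slave1`, with the placeholders replaced by the real addresses.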
Use the ping command to make sure each machine can reach every other machine by its hostname.
Edit the /etc/sysconfig/network file and set HOSTNAME to the new hostname, so the change survives a reboot.
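The same edit can be scripted. The sketch below (hypothetical function name `set_sysconfig_hostname`) rewrites an existing `HOSTNAME=` line, or appends one if the file has none. It relies on GNU `sed -i`, which is the `sed` shipped with CentOS/RHEL.

```shell
# set_sysconfig_hostname NEW_NAME [NETWORK_FILE]
# Persist the hostname for the next boot (CentOS/RHEL 6 style file).
set_sysconfig_hostname() {
  local name=$1 file=${2:-/etc/sysconfig/network}
  if grep -q '^HOSTNAME=' "$file"; then
    sed -i "s/^HOSTNAME=.*/HOSTNAME=$name/" "$file"   # GNU sed in-place edit
  else
    printf 'HOSTNAME=%s\n' "$name" >> "$file"
  fi
}
```

Usage, as root: `set_sysconfig_hostname masterbicing` (or the slave's name on each slave).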
Open a web browser and download Hadoop from the Apache Hadoop website. File: hadoop-1.2.1-bin.tar.gz
Copy the file to the /opt directory and extract it:
>cd /opt<br>
>tar xvfz hadoop-1.2.1-bin.tar.gz
Add Hadoop to the PATH:
>echo 'export PATH=$PATH:/opt/hadoop-1.2.1/bin' > /etc/profile.d/hadoop.sh
Download Java Development Kit (JDK) 6u31 from Oracle (registration is required). File: jdk-6u31-linux-x64-rpm.bin
Move the binary file to the /opt folder and make it executable:
>cd /opt<br>
>chmod u+x jdk-6u31-linux-x64-rpm.bin
Run the binary to install Java:
>./jdk-6u31-linux-x64-rpm.bin
Add the Java binaries to the PATH:
>echo 'export PATH=$PATH:/usr/java/default/bin/' > /etc/profile.d/java.sh
Close the terminal and open a new one (again as root) so the new PATH settings take effect. Check which Java version is installed (it should be java 1.6.0_31):
>java -version
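`java -version` writes to standard error, so redirect it when checking the version in a script. The helpers below are a sketch (the names `java_version` and `expect_java_version` are hypothetical) that extract the quoted version string and compare it with the expected one.

```shell
# java_version: print the quoted version string from `java -version`
# output read on stdin, e.g. 'java version "1.6.0_31"' -> 1.6.0_31
java_version() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# expect_java_version WANTED: succeed only if the version on stdin matches.
expect_java_version() {
  local want=$1 got
  got=$(java_version)
  if [ "$got" = "$want" ]; then
    echo "OK: java $got"
  else
    echo "Unexpected java version: '${got:-none}' (wanted $want)" >&2
    return 1
  fi
}
```

Usage: `java -version 2>&1 | expect_java_version 1.6.0_31`.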
Go to the Hadoop configuration folder. Add the lines below at the beginning of the /opt/hadoop-1.2.1/conf/hadoop-env.sh file
Copy the three files from the resources folder (in this GitHub project) to /opt/hadoop-1.2.1/conf/
Make sure the mapred-site.xml file contains the correct master hostname (in my case, masterbicing).
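In Hadoop 1.x the JobTracker address is set by the `mapred.job.tracker` property in mapred-site.xml, usually as `host:port`. Assuming the file from the resources folder follows that layout, the sketch below (hypothetical function name `jobtracker_host`) prints the host part so it can be compared with the master's hostname. It is a plain text match rather than an XML parser, so it expects the `<name>` element to be followed directly by its `<value>`.

```shell
# jobtracker_host [MAPRED_SITE]
# Print the host part of the mapred.job.tracker value (text before ":").
jobtracker_host() {
  local file=${1:-/opt/hadoop-1.2.1/conf/mapred-site.xml}
  # Join all lines so <name> and <value> on separate lines match together.
  tr -d '\n' < "$file" |
    sed -n 's|.*<name>mapred\.job\.tracker</name>[[:space:]]*<value>[[:space:]]*\([^<:[:space:]]*\).*|\1|p'
}
```

Usage: `[ "$(jobtracker_host)" = masterbicing ] && echo OK || echo "check mapred-site.xml"`.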
Create the HDFS folders.
>mkdir -pv /srv/data/dfs/nn /srv/data/dfs/dn<br>
>mkdir -pv /srv/data/dfs/sn
ONLY FOR THE MASTER MACHINE. Format HDFS and verify that the NameNode folder has been populated:
>hadoop namenode -format<br>
>ls -ltrh /srv/data/dfs/nn/
Patch the service starter scripts.
>sed -i 's/hadoop-daemons/hadoop-daemon/g' /opt/hadoop-1.2.1/bin/start-dfs.sh
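Besides this substitution, the following steps comment out role-specific lines in the start scripts by hand. That edit can also be scripted. The sketch below (hypothetical function name `comment_out`) prefixes `#` to every line that contains the given text and is not already commented, and keeps a one-time backup of the original as `FILE.orig`. The text is placed inside a `sed` address, so it must not contain `/`.

```shell
# comment_out PATTERN FILE
# Prefix "#" to every line of FILE that contains PATTERN and is not already a
# comment. A backup of the untouched file is kept once as FILE.orig.
comment_out() {
  local pattern=$1 file=$2
  [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
  # Lines already starting with optional spaces + "#" are skipped (!), so
  # running the function twice does not produce "##".
  sed -i '/^[[:space:]]*#/!{/'"$pattern"'/s/^/#/;}' "$file"
}
```

On the master: `comment_out 'start datanode' /opt/hadoop-1.2.1/bin/start-dfs.sh`. On each slave, run it on the same file once with `'start namenode'` and once with `'start secondarynamenode'`. The same function works for start-mapred.sh (`'start tasktracker'` on the master, `'start jobtracker'` on the slaves).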
ONLY FOR THE MASTER MACHINE. Edit the /opt/hadoop-1.2.1/bin/start-dfs.sh file and comment out (#) the line that contains "start datanode":
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
ONLY FOR SLAVE MACHINES. Edit the /opt/hadoop-1.2.1/bin/start-dfs.sh file and comment out (#) the lines that contain "start namenode" and "start secondarynamenode":
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt<br>
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
Make sure the SSH daemon (sshd) is running:
>service sshd start
The firewall blocks connections between the nodes. If the nodes are on a secure local area network (the most common situation), you can configure iptables to allow all connections by running these commands on all nodes:
>iptables -F<br>
>service iptables save
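Before launching the daemons, it may be worth confirming which of them each edited start script will still start on this node. The sketch below (hypothetical function name `active_daemons`) skips commented lines and prints the word after `start` on every remaining `hadoop-daemon.sh`/`hadoop-daemons.sh` call.

```shell
# active_daemons FILE
# Print, one per line, the daemons a Hadoop 1.x start script will launch:
# the word after "start" on each uncommented hadoop-daemon(s).sh line.
active_daemons() {
  grep -v '^[[:space:]]*#' "$1" |
    grep 'hadoop-daemon' |
    sed -n 's/.* start \([a-z]*\).*/\1/p'
}
```

On the master, `active_daemons /opt/hadoop-1.2.1/bin/start-dfs.sh` should print `namenode` and `secondarynamenode`; on a slave it should print only `datanode`. The same check applies to start-mapred.sh (`jobtracker` on the master, `tasktracker` on the slaves).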
When all the previous steps have been completed on all the servers (master and slaves), launch the DFS daemons on each node:
>cd /opt/hadoop-1.2.1/bin<br>
>./start-dfs.sh
List all running Java processes on each node. The master node should show "NameNode" and "SecondaryNameNode"; the slave nodes should show "DataNode".
>jps -m
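Rather than scanning the list by eye, the sketch below (hypothetical function name `missing_daemons`) reads `jps -m` output, whose lines start with a process id followed by the main class name, and prints every expected daemon that is not running. It compares whole class names, so a running `SecondaryNameNode` does not count as `NameNode`.

```shell
# missing_daemons NAME...
# Read `jps -m` output on stdin; print each expected daemon class name that
# is not running and return 1 if any is missing.
missing_daemons() {
  local running rc=0 d
  running=$(awk '{print $2}')   # jps -m lines: "<pid> <MainClass> [args]"
  for d in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx "$d"; then
      echo "$d"
      rc=1
    fi
  done
  return $rc
}
```

Usage: `jps -m | missing_daemons NameNode SecondaryNameNode` on the master and `jps -m | missing_daemons DataNode` on the slaves. The JobTracker/TaskTracker check later on works the same way.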
Open a browser and navigate to the HDFS web UI (by default http://masterbicing:50070). All four slave nodes should appear as "Live Nodes".
Create the MapReduce local (temporary) folder:
>mkdir -pv /srv/data/mapred/local
Patch the service starter scripts.
>sed -i 's/hadoop-daemons/hadoop-daemon/g' /opt/hadoop-1.2.1/bin/start-mapred.sh
ONLY FOR THE MASTER MACHINE. Edit the /opt/hadoop-1.2.1/bin/start-mapred.sh file and comment out (#) the line that contains "start tasktracker":
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
ONLY FOR SLAVE MACHINES. Edit the /opt/hadoop-1.2.1/bin/start-mapred.sh file and comment out (#) the line that contains "start jobtracker":
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
When all the previous steps have been completed on all the servers (master and slaves), launch the MapReduce daemons on each node:
>cd /opt/hadoop-1.2.1/bin<br>
>./start-mapred.sh
List all running Java processes on each node. The master node should show "JobTracker"; the slave nodes should show "TaskTracker".
>jps -m
On one of the servers, open a terminal as root, create the root user's home directory in HDFS, and upload a sample input file:
>hadoop fs -mkdir /user/root<br>
>hadoop fs -chown root:root /user/root<br>
>hadoop fs -mkdir /tmp/input <br>
>hadoop fs -put /etc/passwd /tmp/input
Go to the Hadoop folder, which contains the examples jar, and run the MapReduce "grep" example job, which counts the occurrences of the token "bash" in the uploaded file:
>cd /opt/hadoop-1.2.1<br>
>hadoop jar hadoop-examples-1.2.1.jar grep /tmp/input /tmp/output bash
View the result:
>hadoop fs -cat /tmp/output/part-00000
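Each output line of the grep example holds a count, a tab, and the matched text. Because the example counts every match (not every matching line), a local `grep -o` over the same file on the node where it was uploaded should give the same number, which makes a quick sanity check. The two helpers below are a sketch with hypothetical names; note that `grep` uses POSIX regular expressions while the Hadoop job uses Java regular expressions, so the comparison is only meaningful for simple patterns such as `bash`.

```shell
# local_match_count FILE REGEX: count every regex occurrence in FILE
# (grep -o prints one line per match, not per matching line).
local_match_count() {
  grep -o -- "$2" "$1" | wc -l | tr -d ' '
}

# mr_count RESULT_FILE TOKEN: read the "count<TAB>match" lines written by the
# grep example and print the count for TOKEN (nothing if it is absent).
mr_count() {
  awk -F'\t' -v t="$2" '$2 == t { print $1 }' "$1"
}
```

Usage: `hadoop fs -cat /tmp/output/part-00000 > /tmp/grep-result.txt`, then compare `mr_count /tmp/grep-result.txt bash` with `local_match_count /etc/passwd bash` (the /tmp/grep-result.txt path is an arbitrary choice).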
Find the job in the JobTracker (JT) web UI (http://masterbicing:50030).