{ 2.1 } Installing Hadoop
-
Open the VM as a copy
-
Change the virtual machine name (you can use Master, Slave1, …)
-
Change the password
>passwd
-
Log on as root (password: tomtom)
-
Create a user with sudo privileges (sudoer)
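A minimal sketch, assuming a RHEL/CentOS-style system (the guide later uses /etc/sysconfig/network and the iptables service) and a placeholder username "hadoop"; adding the user to the wheel group is one common way to grant sudo rights:
>useradd hadoop<br>
>passwd hadoop<br>
>usermod -aG wheel hadoop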
-
Log on with the sudoer user
-
Change the hostname in a terminal (as root)
>hostname masterbicing
-
Verify the hostname has been changed using the "hostname" command (with options)
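For example (-f prints the fully qualified name):
>hostname<br>
>hostname -f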
-
Find the machine's IP address (the four numbers after "inet addr" for the Ethernet interface)
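"inet addr" is the label printed by ifconfig, so one way to find it (the interface name eth0 is an assumption):
>ifconfig eth0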
-
(We will use the /etc/hosts file instead of a DNS server.) Add all the hosts to the /etc/hosts file
>192.168.1.128 masterbicing<br>
>192.168.1.129 slave1
-
Use the ping command to verify that each machine can reach all the other machines by hostname
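For example, from the master (hostnames as defined in /etc/hosts):
>ping -c 3 slave1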
-
Edit the /etc/sysconfig/network file and replace the HOSTNAME value with the new hostname
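On the master the relevant line should look like this:
>HOSTNAME=masterbicing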
-
Open a web browser and download Hadoop from the Apache Hadoop website. File: hadoop-1.2.1-bin.tar.gz
-
Copy the file to the /opt directory and extract it
>cd /opt<br>
>tar xvfz hadoop-1.2.1-bin.tar.gz
-
Add Hadoop to the path
>echo 'export PATH=$PATH:/opt/hadoop-1.2.1/bin' > /etc/profile.d/hadoop.sh
-
Download the Java Development Kit (JDK) 6u31 from Oracle (registration is required). File: jdk-6u31-linux-x64-rpm.bin
-
Move the binary file to the /opt folder and make it executable
>cd /opt<br>
>chmod u+x jdk-6u31-linux-x64-rpm.bin
-
Execute the binary to install Java
>./jdk-6u31-linux-x64-rpm.bin
-
Add the binaries to the path
>echo 'export PATH=$PATH:/usr/java/default/bin/' > /etc/profile.d/java.sh
-
Close the terminal and open a new one (again as root). Check which Java version is installed (it should be Java 1.6.0_31)
>java -version
-
Go to the Hadoop configuration folder. Add the lines below at the beginning of the /opt/hadoop-1.2.1/conf/hadoop-env.sh file
>JAVA_HOME=/usr/java/default<br>
>export HADOOP_HEAPSIZE=12
-
Copy the 3 files from the resources folder (in this GitHub project) to /opt/hadoop-1.2.1/conf/ (see the example after the list)
>mapred-site.xml<br>
>hdfs-site.xml<br>
>core-site.xml
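For example, assuming the repository's resources folder has already been copied onto the machine (the source path below is a placeholder):
>cp /path/to/resources/mapred-site.xml /path/to/resources/hdfs-site.xml /path/to/resources/core-site.xml /opt/hadoop-1.2.1/conf/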
-
Make sure the mapred-site.xml file contains the correct master hostname (in my case it is masterbicing)
<value>masterbicing:8021</value>
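The full property block presumably looks like this (mapred.job.tracker is the standard Hadoop 1.x JobTracker address property; only the hostname should need adjusting):
><property><br>
>  <name>mapred.job.tracker</name><br>
>  <value>masterbicing:8021</value><br>
></property>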
-
Create the HDFS folders.
>mkdir -pv /srv/data/dfs/nn /srv/data/dfs/dn<br>
>mkdir -pv /srv/data/dfs/sn
-
ONLY FOR MASTER MACHINE. Format HDFS and validate it.
>hadoop namenode -format<br>
>ls -ltrh /srv/data/dfs/nn/
-
Patch the service starter scripts.
>sed -i 's/hadoop-daemons/hadoop-daemon/g' /opt/hadoop-1.2.1/bin/start-dfs.sh
-
ONLY FOR MASTER MACHINE. Edit the /opt/hadoop-1.2.1/bin/start-dfs.sh file and comment out (#) the line that contains "start datanode"
$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt<br>
-
ONLY FOR SLAVE MACHINES. Edit the /opt/hadoop-1.2.1/bin/start-dfs.sh file and comment out (#) the lines that contain "start namenode" and "start secondarynamenode".
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt<br>
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters star
-
Make sure sshd is running
>service sshd start
-
The firewall blocks connections, so configure iptables. If these nodes are on a secure local area network (which is usually the case), we can configure iptables to allow all connections by running these commands on all nodes:
>iptables -F<br>
>service iptables save
-
When all previous steps have been executed on all the servers (master and slaves), launch the DFS daemons.
>cd /opt/hadoop-1.2.1/bin<br>
>./start-dfs.sh
-
List all running Java processes on each node. The master node should show "NameNode" and "SecondaryNameNode". Slave nodes should show "DataNode".
>jps -m
-
Open a browser and navigate to the HDFS web UI. All 4 slave nodes should appear as "Live Nodes"
>http://masterbicing:50070
-
Create the MapReduce temporary folder
>mkdir -pv /srv/data/mapred/local
-
Patch the service starter scripts.
>sed -i 's/hadoop-daemons/hadoop-daemon/g' /opt/hadoop-1.2.1/bin/start-mapred.sh
-
ONLY FOR MASTER MACHINE. Edit the /opt/hadoop-1.2.1/bin/start-mapred.sh file and comment out (#) the line that contains "start tasktracker"
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
-
ONLY FOR SLAVE MACHINES. Edit the /opt/hadoop-1.2.1/bin/start-mapred.sh file and comment out (#) the line that contains "start jobtracker".
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
-
When all previous steps have been executed on all the servers (master and slaves), launch the MapReduce daemons.
>cd /opt/hadoop-1.2.1/bin<br>
>./start-mapred.sh
-
List all running Java processes on each node. The master node should show "JobTracker". Slave nodes should show "TaskTracker".
>jps -m
-
On one of the servers, open a terminal as root, create the user home in HDFS, and upload a sample file.
>hadoop fs -mkdir /user/root<br>
>hadoop fs -chown root:root /user/root<br>
>hadoop fs -mkdir /tmp/input <br>
>hadoop fs -put /etc/passwd /tmp/input
-
Go to the folder with the sample jar and run the MR job that does a "grep" for the "bash" token on the uploaded file.
>cd /opt/hadoop-1.2.1<br>
>hadoop jar hadoop-examples-1.2.1.jar grep /tmp/input /tmp/output bash
-
See the result
>hadoop fs -cat /tmp/output/part-00000
-
Find the job in the JobTracker web UI (http://masterbicing:50030)