{ 2.1 } Installing Hadoop

Create Hadoop Cluster on Virtual Machine

Download CentOS Image

Basic configuration for all the Hadoop nodes (1 Master, 4 Slaves)

  1. Open the VM as a copy

  2. Change the virtual machine name (you can use Master, Slave1, …)

  3. Change the password

    >passwd
  4. Log on as root (password: tomtom)

  5. Create a sudoer user, for example as shown below

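    A minimal sketch of this step on CentOS, assuming a hypothetical user named hadoopuser (enable the wheel group in /etc/sudoers with visudo if it is not already enabled):

    >useradd hadoopuser    # "hadoopuser" is only an example name
    >passwd hadoopuser
    >usermod -aG wheel hadoopuser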
  1. Log on with the sudoer user

  2. Change the hostname in a terminal (as root)

    >hostname masterbicing
  3. Verify the hostname has been changed using the "hostname" command

  4. Find the machine IP (the four octets after "inet addr" in the Ethernet interface output), for example as shown below

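    For example, on CentOS the address can be read from the Ethernet interface output (the interface name is an assumption and may differ on your VM):

    >ifconfig eth0 | grep "inet addr"    # eth0 is an example interface name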
  5. Add all the hosts to the /etc/hosts file (we will use /etc/hosts instead of a DNS server)

    >192.168.1.128 masterbicing
    >192.168.1.129 slave1
  6. Use the ping command to verify that each machine can reach all the other machines by hostname, for example as shown below

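    For example, from the master (the -c 3 option sends only 3 packets):

    >ping -c 3 slave1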
  7. Edit the /etc/sysconfig/network file and set HOSTNAME to the new hostname (see the example below)

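    After the change, the file should look roughly like this (shown for the master; use each node's own hostname):

    >NETWORKING=yes
    >HOSTNAME=masterbicing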
  8. Open a web browser and download Hadoop from the Apache Hadoop website. File: hadoop-1.2.1-bin.tar.gz

  9. Copy the file to the /opt directory and extract it

    >cd /opt
    >tar xvfz hadoop-1.2.1-bin.tar.gz
  10. Add Hadoop to the path

    >echo 'export PATH=$PATH:/opt/hadoop-1.2.1/bin' > /etc/profile.d/hadoop.sh
  11. Download the Java Development Kit 6u31 from Oracle (registration is required). File: jdk-6u31-linux-x64-rpm.bin

  12. Move the binary file to the /opt folder and make it executable

    >cd /opt
    >chmod u+x jdk-6u31-linux-x64-rpm.bin
  13. Execute the binary to install java

    >./jdk-6u31-linux-x64-rpm.bin
  14. Add the binaries to the path

    >echo 'export PATH=$PATH:/usr/java/default/bin/' > /etc/profile.d/java.sh
  15. Close the terminal and open a new one (again as root). Verify which Java version is installed (it should be 1.6.0_31)

    >java -version

Install and Launch Hadoop

  1. Go to the Hadoop configuration folder. Add the lines below at the beginning of the /opt/hadoop-1.2.1/conf/hadoop-env.sh file

    >JAVA_HOME=/usr/java/default
    >export HADOOP_HEAPSIZE=12
  2. Copy the 3 files from the resources folder of this GitHub project to /opt/hadoop-1.2.1/conf/ (a sketch of their typical content is shown after the file list)

    >mapred-site.xml
    >hdfs-site.xml
    >core-site.xml
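    The authoritative content is in the resources folder; as a rough sketch, for this layout the Hadoop 1.x files typically contain properties like the following (the HDFS port 8020 is an assumption, the /srv/data paths match the folders created below):

    >core-site.xml:   <property><name>fs.default.name</name><value>hdfs://masterbicing:8020</value></property>  <!-- port 8020 assumed -->
    >hdfs-site.xml:   <property><name>dfs.name.dir</name><value>/srv/data/dfs/nn</value></property>
    >                 <property><name>dfs.data.dir</name><value>/srv/data/dfs/dn</value></property>
    >mapred-site.xml: <property><name>mapred.job.tracker</name><value>masterbicing:8021</value></property>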
  3. Make sure the mapred-site.xml file contains the correct master hostname (in my case, masterbicing)

    <value>masterbicing:8021</value>
  4. Create the HDFS folders.

    >mkdir -pv /srv/data/dfs/nn /srv/data/dfs/dn
    >mkdir -pv /srv/data/dfs/sn
  5. ONLY FOR MASTER MACHINE. Format HDFS and validate it.

    >hadoop namenode -format
    >ls -ltrh /srv/data/dfs/nn/
  6. Patch the service starter scripts.

    >sed -i 's/hadoop-daemons/hadoop-daemon/g' /opt/hadoop-1.2.1/bin/start-dfs.sh
  7. ONLY FOR MASTER MACHINE. Edit the /opt/hadoop-1.2.1/bin/start-dfs.sh file and comment out (#) the line "start datanode"

    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
  8. ONLY FOR SLAVE MACHINES. Edit the /opt/hadoop-1.2.1/bin/start-dfs.sh file and comment out (#) the lines that contain "start namenode" and "start secondarynamenode"

    "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
  9. Make sure sshd is running

    >service sshd start
  10. The firewall may block connections, so configure iptables. If the nodes are on a secure local area network (the usual case), allow all connections by running these commands on all nodes:

    >iptables -F
    >service iptables save
  11. When all the previous steps have been executed on all the servers (master and slaves), launch the DFS daemons.

    >cd /opt/hadoop-1.2.1/bin
    >./start-dfs.sh
  12. List all running Java processes on each node. The master node should show "NameNode" and "SecondaryNameNode"; the slave nodes should show "DataNode" (example output below).

    >jps -m
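    Example output on the master (the PIDs shown here are illustrative and will differ):

    >2481 NameNode
    >2613 SecondaryNameNode
    >2745 Jps -m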
  13. Open a browser and navigate to the HDFS web UI. All 4 slave nodes should appear as "Live Nodes"

    >http://masterbicing:50070
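    Alternatively, the number of live DataNodes can also be checked from the command line on the master:

    >hadoop dfsadmin -report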

MapReduce Installation

  1. Create the MapReduce temporary folder

    >mkdir -pv /srv/data/mapred/local
  2. Patch the service starter scripts.

    >sed -i 's/hadoop-daemons/hadoop-daemon/g' /opt/hadoop-1.2.1/bin/start-mapred.sh
  3. ONLY FOR MASTER MACHINE. Edit the /opt/hadoop-1.2.1/bin/start-mapred.sh file and comment out (#) the line "start tasktracker"

    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
  4. ONLY FOR SLAVE MACHINES. Edit the /opt/hadoop-1.2.1/bin/start-mapred.sh file and comment out (#) the line that contains "start jobtracker"

    "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
  5. When all the previous steps have been executed on all the servers (master and slaves), launch the MapReduce daemons.

    >cd /opt/hadoop-1.2.1/bin
    >./start-mapred.sh
  6. List all running Java processes on each node. The master node should show "JobTracker"; the slave nodes should show "TaskTracker".

    >jps -m

Test MapReduce

  1. On one of the servers, open a terminal as root and create the user's home directory in HDFS.

    >hadoop fs -mkdir /user/root
    >hadoop fs -chown root:root /user/root
    >hadoop fs -mkdir /tmp/input
    >hadoop fs -put /etc/passwd /tmp/input
  2. Go to the folder containing the sample jar and run the MapReduce job that does a "grep" for the "bash" token on the uploaded file.

    >cd /opt/hadoop-1.2.1
    >hadoop jar hadoop-examples-1.2.1.jar grep /tmp/input /tmp/output bash
  3. See the result

    >hadoop fs -cat /tmp/output/part-00000
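    The output should be a single line with the number of matches of the "bash" token, roughly like the following (the count shown is only an example and depends on how many accounts in /etc/passwd use bash):

    >4	bash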
  4. Find the job in the JobTracker web UI (http://masterbicing:50030)
