Creating Hadoop Cluster on Ubuntu
On this page, we describe how to set up a four-node Hadoop cluster: one master, one secondary NameNode (secondarymaster), and two slaves.
@ on ALL nodes, add this to /etc/hosts:
#master
192.168.10.135 master
#secondary name node
192.168.10.136 secondarymaster
#slave 1
192.168.10.140 slave1
#slave 2
192.168.10.141 slave2
Replace the IPs with those of your nodes.
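To sanity-check the hosts file on a node, a quick loop like this (using the hostnames defined above) should reach every host:
# each host should answer one ping
for h in master secondarymaster slave1 slave2; do ping -c 1 "$h"; done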
@ on MASTER node:
# GENERATE DSA KEY-PAIR
# (note: recent OpenSSH releases disable DSA keys by default; use -t rsa instead if authentication fails)
ssh-keygen -t dsa -f ~/.ssh/id_dsa
# MAKE THE KEYPAIR TRUSTED
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# COPY THIS KEY TO ALL OTHER NODES:
scp -r ~/.ssh ubuntu@secondarymaster:~/
scp -r ~/.ssh ubuntu@slave1:~/
scp -r ~/.ssh ubuntu@slave2:~/
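From the master, each of these logins should now work without a password prompt:
ssh ubuntu@secondarymaster hostname
ssh ubuntu@slave1 hostname
ssh ubuntu@slave2 hostname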
@ on ALL nodes, install Oracle Java 8:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update && sudo apt-get -y install oracle-java8-installer
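Confirm the JDK is active on each node:
# should report a 1.8.x version
java -version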
@ on ALL nodes, append the Hadoop environment variables to ~/.bashrc:
echo '
#HADOOP VARIABLES START
export HADOOP_PREFIX=/home/ubuntu/hadoop
export HADOOP_HOME=/home/ubuntu/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_PREFIX}/lib/native"
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
#HADOOP VARIABLES END
' >> ~/.bashrc
source ~/.bashrc
@ on ALL nodes, download and install Hadoop 2.7.1:
#Download (if this mirror is unavailable, the same release is kept at https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/)
wget http://apache.mirror.gtcomm.net/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
#Extract
tar -xzvf ./hadoop-2.7.1.tar.gz
#Rename the extracted directory to the target /home/ubuntu/hadoop
mv hadoop-2.7.1 hadoop
#Create directory for HDFS filesystem
mkdir ~/hdfstmp
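With the PATH entries from ~/.bashrc in place, the hadoop command should now resolve:
# should print "Hadoop 2.7.1" and the build details
hadoop version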
@ on ALL nodes, edit ~/hadoop/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ubuntu/hdfstmp</value>
</property>
<property>
<!-- fs.defaultFS supersedes the deprecated fs.default.name -->
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
</configuration>
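Once core-site.xml is saved, Hadoop's own getconf tool can confirm which NameNode address clients will use:
# should print hdfs://master:8020
hdfs getconf -confKey fs.defaultFS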
@ on ALL nodes, edit ~/hadoop/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>secondarymaster:50090</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/ubuntu/hdfstmp/dfs/name/data</value>
<final>true</final>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/ubuntu/hdfstmp/dfs/name</value>
<final>true</final>
</property>
</configuration>
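With two DataNodes and dfs.replication set to 2, every HDFS block will be stored on both slaves. You can confirm the values Hadoop actually reads:
# should print 2
hdfs getconf -confKey dfs.replication
# should print secondarymaster
hdfs getconf -secondaryNameNodes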
@ on ALL nodes, create ~/hadoop/etc/hadoop/mapred-site.xml (copy it from mapred-site.xml.template) and edit it:
<configuration>
<property>
<!-- legacy JobTracker setting, unused under YARN; hostname fixed to this cluster's master -->
<name>mapred.job.tracker</name>
<value>master:8021</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
@ on ALL nodes, edit ~/hadoop/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
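As an alternative to editing the four *-site.xml files by hand on every node, you can edit them once on the master and push them out (assuming the ubuntu user and the paths used throughout this guide):
for h in secondarymaster slave1 slave2; do
  scp ~/hadoop/etc/hadoop/*-site.xml ubuntu@$h:~/hadoop/etc/hadoop/
done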
@ on ALL nodes, in ~/hadoop/etc/hadoop/hadoop-env.sh change from:
export JAVA_HOME=${JAVA_HOME}
to:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
@ on MASTER AND SECONDARYMASTER, edit ~/hadoop/etc/hadoop/slaves to contain:
slave1
slave2
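Optionally, you can bring the cluster up now to confirm everything is wired together; a minimal sketch using the standard Hadoop 2.x scripts, run on the MASTER only:
# format HDFS (one time only -- this erases any existing HDFS data)
hdfs namenode -format
# start the HDFS and YARN daemons across the cluster
start-dfs.sh
start-yarn.sh
# jps on each node should list NameNode/ResourceManager on master,
# SecondaryNameNode on secondarymaster, and DataNode/NodeManager on the slaves
jps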
That's it! You've set up a cluster for Hadoop to run on. On the next page, we discuss how to test the cluster and run code on it.