Configuring Hadoop on Ubuntu Machine

Open the bashrc file

sudo nano ~/.bashrc
# Append the following lines at the end of the file
export HADOOP_PREFIX="/home/hduser/hadoop-2.7.1/"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
# Reload the file so the new variables take effect in the current shell
source ~/.bashrc
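
To confirm the variables were picked up, a quick sanity check (assuming the paths above) is:

echo $HADOOP_PREFIX
which hadoop    # should point inside /home/hduser/hadoop-2.7.1/bin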

Add the Hadoop HDFS URI (Namenode and its port)

sudo nano /home/hduser/hadoop-2.7.1/etc/hadoop/core-site.xml

<!-- add this property within the <configuration> tag -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.56.123:8020</value>
        <final>true</final>
    </property>
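
192.168.56.123 is the VM's address used in this guide; substitute your own NameNode IP or hostname if it differs. One way to verify that Hadoop actually reads this setting (once JAVA_HOME is configured in the later step) is the getconf utility, which should echo the URI configured above:

hdfs getconf -confKey fs.defaultFS
# expected: hdfs://192.168.56.123:8020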

Add the HDFS properties

sudo nano /home/hduser/hadoop-2.7.1/etc/hadoop/hdfs-site.xml
<!-- add all of these properties within the <configuration> tag -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hduser/hadoop-2.7.1/hadoop_data/dfs/name</value>
    </property>
  
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hduser/hadoop-2.7.1/hadoop_data/dfs/data</value>
    </property>
</configuration>
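
The name and data directories referenced above do not exist yet. Hadoop will normally create them on format/startup, but creating them up front (as hduser, so ownership is correct) avoids permission surprises:

mkdir -p /home/hduser/hadoop-2.7.1/hadoop_data/dfs/name
mkdir -p /home/hduser/hadoop-2.7.1/hadoop_data/dfs/data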

Specify the MapReduce framework as YARN

sudo nano /home/hduser/hadoop-2.7.1/etc/hadoop/mapred-site.xml
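
If the editor opens an empty file, that is expected: Hadoop 2.7.1 ships only a template for this file, so copy the template first and then re-open it:

cp /home/hduser/hadoop-2.7.1/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop-2.7.1/etc/hadoop/mapred-site.xml
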
<!-- add this property within the <configuration> tag -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Specify the YARN properties

sudo nano /home/hduser/hadoop-2.7.1/etc/hadoop/yarn-site.xml
<!-- add all of these properties within the <configuration> tag -->
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.56.123:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.56.123:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.56.123:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.56.123:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.56.123:8088</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
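
All of the addresses above reuse the same host (192.168.56.123); adjust them if your ResourceManager runs on a different machine. Once the daemons are up (see the start commands below), the ResourceManager web UI should answer on the yarn.resourcemanager.webapp.address configured here, e.g.:

curl -s http://192.168.56.123:8088/cluster | head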

Add the JAVA_HOME for Hadoop

# First check whether Java is installed and where JAVA_HOME currently points
echo $JAVA_HOME
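
If that prints nothing, one common way to locate the JDK on Ubuntu (assuming an OpenJDK package install) is:

readlink -f $(which java)
# prints something like /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# JAVA_HOME is the directory above bin/ (or jre/bin/), e.g. /usr/lib/jvm/java-7-openjdk-amd64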
sudo nano /home/hduser/hadoop-2.7.1/etc/hadoop/hadoop-env.sh

# Add (or update) this line, replacing the placeholder with your JDK path
export JAVA_HOME=<path to your JDK installation>

Format the NameNode

hdfs namenode -format
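
This only needs to be done once, before the first start; re-formatting later wipes the HDFS metadata. If the format succeeded, the directory configured in dfs.namenode.name.dir should now contain a current/ subdirectory:

ls /home/hduser/hadoop-2.7.1/hadoop_data/dfs/name/current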

Now start the services

NameNode              hadoop-daemon.sh start namenode
DataNode              hadoop-daemon.sh start datanode
ResourceManager       yarn-daemon.sh start resourcemanager
NodeManager           yarn-daemon.sh start nodemanager
Job History Server    mr-jobhistory-daemon.sh start historyserver
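
A quick way to confirm that all five daemons are running is the jps tool that ships with the JDK; it should list a Java process for each of them:

jps
# expected (PIDs will differ): NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer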

Check the version of Hadoop

hadoop version
