Hadoop Development Onboarding (Linux, Single Cluster)

Original URL: https://github.com/BurraAbhishek/Python_Hadoop_MapReduce_MarketBasketAnalysis/wiki/Hadoop-Development-Onboarding-(Linux,-Single-Cluster)

The following instructions outline how to set up your Hadoop development environment. They aim to be agnostic of the platform the stack is installed on, so a working knowledge of the specifics of your GNU/Linux distribution or other Unix-based operating system is assumed.

Prerequisites

Before beginning, please ensure that you have the following tools installed, using your favorite package manager to install them where applicable.

Hardware

Tools and dependency managers

At minimum, the Setup steps below assume a Java JDK (OpenJDK is used in the examples), an SSH server and client (OpenSSH), and a Hadoop binary distribution (the examples reference Hadoop 3.3.0). Install these with your distribution's package manager where applicable.
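For illustration (an assumption, not part of the original instructions), on a Debian/Ubuntu-based system installing the tooling could look like this; package names differ across distributions, and Hadoop itself is downloaded separately as a binary tarball from the Apache Hadoop releases page:

sudo apt-get update
sudo apt-get install default-jdk openssh-server openssh-client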

Setup

Verify that Java is installed:

$ java -version
openjdk version "15.0.2" 2021-01-19
OpenJDK Runtime Environment (build 15.0.2+7-27)
OpenJDK 64-Bit Server VM (build 15.0.2+7-27, mixed mode, sharing)

indicates that OpenJDK 15.0.2 was installed successfully on your machine.

If querying your package manager for the installed ssh package reports a version string such as

1:8.2p1-4

then ssh was successfully installed.
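On a Debian/Ubuntu-based system, for instance (an assumption; use your own package manager's equivalent), the installed version can be queried with:

# Prints a line such as: Version: 1:8.2p1-4
dpkg -s openssh-client | grep '^Version'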

Check that the ssh service is running:

sudo service ssh status

The Active field in the output should read active (running).
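If the service is reported as inactive, it can usually be started with:

sudo service ssh start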

Generate a passwordless RSA key pair:

ssh-keygen -t rsa -P ""

When prompted for the file in which to save the key, just press Enter to accept the default. Wait until the key's randomart image appears; by default this creates ~/.ssh/id_rsa (the private key) and ~/.ssh/id_rsa.pub (the public key).

Authorize the new key for login:

cat /home/<username>/.ssh/id_rsa.pub >> /home/<username>/.ssh/authorized_keys

Replace <username> with your username (open your file manager and look at your home directory if you are unsure what it is).
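Alternatively, the same command can be written without hard-coding the username by letting the shell expand ~ to your home directory:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys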

Then restrict the permissions:

sudo chmod 700 ~/.ssh
sudo chmod 600 ~/.ssh/authorized_keys

Now your ssh configuration is ready.
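You can confirm that passwordless ssh works (Hadoop's startup scripts rely on it) by logging in to your own machine; the first connection may ask you to accept the host key:

ssh localhost
exit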

All the configuration files edited in the following steps are located in:

<hadoop_directory>/etc/hadoop

Replace <hadoop_directory> with your Hadoop installation location.

Open hadoop-env.sh in that directory and find the line

export JAVA_HOME=

(in Hadoop 3.3.0, this is line #54). Edit it in place; do NOT add a new copy to the end of the file. The line appears in the following context (in Hadoop 3.3.0, lines #52-54):

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=<jdk_location>

Replace <jdk_location> with the location of your Java installation. Save the file.
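If you are unsure where your JDK lives, one way to locate it (a sketch, assuming java is on your PATH and resolves to a real JDK through symlinks) is:

# Resolve the java binary through symlinks, then strip the trailing /bin/java
readlink -f "$(which java)" | sed 's:/bin/java::'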

Next, edit core-site.xml so that it contains:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Save the file. (In recent Hadoop releases the fs.default.name key is deprecated in favor of fs.defaultFS; both accept the same value.)

Then edit hdfs-site.xml so that it contains the following, replacing {hdfs_location} with a directory where HDFS should store its data:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///{hdfs_location}/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///{hdfs_location}/hdfs/datanode</value>
  </property>
</configuration>

Edit mapred-site.xml so that it contains the following, replacing {path_to_hadoop_installation} with the location of your Hadoop installation:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME={path_to_hadoop_installation}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME={path_to_hadoop_installation}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME={path_to_hadoop_installation}</value>
  </property>
</configuration>

Finally, edit yarn-site.xml so that it contains:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>

Save each file.
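As an optional sanity check (assuming xmllint from libxml2 is installed), you can confirm that none of the edited files contain XML syntax errors; the command prints nothing on success and reports the offending line otherwise:

cd <hadoop_directory>/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml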

Create the namenode and datanode directories:

sudo mkdir -p {hdfs_directory}/hdfs/namenode
sudo mkdir -p {hdfs_directory}/hdfs/datanode

Here and in the following steps, replace {hdfs_directory} with the {hdfs_location} path you used while editing hdfs-site.xml.

Make the new directories writable, then format the namenode (if the hdfs command is not found at this point, complete the ~/.bashrc step below first, reload your shell, and re-run it):

sudo chmod 777 -R {hdfs_directory}
hdfs namenode -format

Next, add the following lines to the end of your ~/.bashrc:

# Hadoop path setting
export HADOOP_HOME=<hadoop_directory_location>/hadoop
export HADOOP_CONF_DIR=<hadoop_directory_location>/hadoop/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Replace <hadoop_directory_location> with the directory containing your Hadoop installation (the exports above assume the installation folder itself is named hadoop).

Apply the changes by reloading your shell configuration:

source ~/.bashrc

or by restarting your system.
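With the environment variables loaded, a typical way to bring up the single-node cluster (using the standard scripts shipped in $HADOOP_HOME/sbin, which the PATH above already includes) is:

start-dfs.sh
start-yarn.sh
jps

jps is the JDK's process-status tool; after both scripts finish, it should list the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager daemons.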