Hadoop Installation (Single node)
(Ref: https://tecadmin.net/setup-hadoop-on-ubuntu/)
You have to install Java first, but update your VM before that:
sudo apt update
Install Java 8:
sudo apt install openjdk-8-jdk
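You can verify the installation by checking the Java version (the exact build string will vary on your system):
java -version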
Create a User for Hadoop
We recommend creating a normal (non-root) account for Hadoop to work under. Create the account with the following command:
sudo adduser hadoop
Make the hadoop user a sudoer:
sudo usermod -aG sudo hadoop
Set a password for the hadoop user:
sudo passwd hadoop
After creating the account, you also need to set up key-based SSH to its own account. Log in as the hadoop user, then execute the following commands:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Now, SSH to localhost as the hadoop user. This should not ask for a password, but the first time it will prompt to add the host's RSA key to the list of known hosts.
ssh hadoop@localhost
exit
Next: Install Hadoop
Log in as the hadoop user to install:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
(check the latest one in https://downloads.apache.org/hadoop/common/hadoop-3.2.1/)
tar xzf hadoop-3.2.1.tar.gz
mv hadoop-3.2.1 hadoop
Note: Current version is available at https://hadoop.apache.org/releases.html (3.4.1 as of Oct, 2024)
Set up pseudo-distributed mode.
Edit .bashrc
(check: what is .bashrc?)
Add the following environment variables:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Apply the changes using the command:
source ~/.bashrc
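To verify that the variables are in effect (assuming the tarball was extracted to /home/hadoop/hadoop as above), check:
echo $HADOOP_HOME
which hadoop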
Set up the Java path.
Change the Hadoop environment to add JAVA_HOME, using the vi or nano editor:
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
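If you are not sure of the exact path on your machine, you can list the installed JVM directories (the directory name may differ by distribution and architecture):
ls /usr/lib/jvm/
With JAVA_HOME set, a quick check that the Hadoop scripts can find Java is:
hadoop version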
Set up the Hadoop configuration files.
cd $HADOOP_HOME/etc/hadoop
Edit core-site.xml and add the following to the file with your favorite editor:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
(fs.default.name is the deprecated alias of fs.defaultFS; both still work, but newer configurations prefer fs.defaultFS.)
Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
(dfs.name.dir and dfs.data.dir are the deprecated aliases of dfs.namenode.name.dir and dfs.datanode.data.dir.)
Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
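Before formatting, you may want to pre-create the storage directories referenced in hdfs-site.xml. Hadoop normally creates them on its own, so this is only a precaution against permission problems:
mkdir -p /home/hadoop/hadoopdata/hdfs/namenode /home/hadoop/hadoopdata/hdfs/datanode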
Then, format the name node with the command:
hdfs namenode -format
The output looks like the following:
2020-12-07 22:29:55,567 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = bigdata/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.1
STARTUP_MSG: classpath = /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/... (long classpath omitted)
STARTUP_MSG: build = https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842; compiled by 'rohithsharmaks' on 2019-09-10T15:56Z
STARTUP_MSG: java = 1.8.0_275
************************************************************/
2020-12-07 22:29:55,579 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2020-12-07 22:29:55,693 INFO namenode.NameNode: createNameNode [-format]
2020-12-07 22:29:55,819 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-93c533c3-239b-4c60-99ed-846e023b800c
2020-12-07 22:29:56,188 INFO namenode.FSEditLog: Edit logging is async:true
2020-12-07 22:29:56,204 INFO namenode.FSNamesystem: KeyProvider: null
2020-12-07 22:29:56,206 INFO namenode.FSNamesystem: fsLock is fair: true
2020-12-07 22:29:56,207 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2020-12-07 22:29:56,212 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
2020-12-07 22:29:56,212 INFO namenode.FSNamesystem: supergroup = supergroup
2020-12-07 22:29:56,212 INFO namenode.FSNamesystem: isPermissionEnabled = true
2020-12-07 22:29:56,212 INFO namenode.FSNamesystem: HA Enabled: false
2020-12-07 22:29:56,273 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2020-12-07 22:29:56,286 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2020-12-07 22:29:56,286 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2020-12-07 22:29:56,292 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2020-12-07 22:29:56,292 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Dec 07 22:29:56
2020-12-07 22:29:56,294 INFO util.GSet: Computing capacity for map BlocksMap
2020-12-07 22:29:56,294 INFO util.GSet: VM type = 64-bit
2020-12-07 22:29:56,296 INFO util.GSet: 2.0% max memory 1.7 GB = 35.2 MB
2020-12-07 22:29:56,296 INFO util.GSet: capacity = 2^22 = 4194304 entries
2020-12-07 22:29:56,309 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
2020-12-07 22:29:56,309 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2020-12-07 22:29:56,317 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2020-12-07 22:29:56,317 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2020-12-07 22:29:56,317 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2020-12-07 22:29:56,317 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: defaultReplication = 1
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: maxReplication = 512
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: minReplication = 1
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: redundancyRecheckInterval = 3000ms
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: encryptDataTransfer = false
2020-12-07 22:29:56,318 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2020-12-07 22:29:56,344 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
2020-12-07 22:29:56,345 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
2020-12-07 22:29:56,345 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
2020-12-07 22:29:56,345 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
2020-12-07 22:29:56,400 INFO util.GSet: Computing capacity for map INodeMap
2020-12-07 22:29:56,400 INFO util.GSet: VM type = 64-bit
2020-12-07 22:29:56,400 INFO util.GSet: 1.0% max memory 1.7 GB = 17.6 MB
2020-12-07 22:29:56,400 INFO util.GSet: capacity = 2^21 = 2097152 entries
2020-12-07 22:29:56,402 INFO namenode.FSDirectory: ACLs enabled? false
2020-12-07 22:29:56,402 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2020-12-07 22:29:56,402 INFO namenode.FSDirectory: XAttrs enabled? true
2020-12-07 22:29:56,402 INFO namenode.NameNode: Caching file names occurring more than 10 times
2020-12-07 22:29:56,407 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2020-12-07 22:29:56,409 INFO snapshot.SnapshotManager: SkipList is disabled
2020-12-07 22:29:56,413 INFO util.GSet: Computing capacity for map cachedBlocks
2020-12-07 22:29:56,413 INFO util.GSet: VM type = 64-bit
2020-12-07 22:29:56,414 INFO util.GSet: 0.25% max memory 1.7 GB = 4.4 MB
2020-12-07 22:29:56,414 INFO util.GSet: capacity = 2^19 = 524288 entries
2020-12-07 22:29:56,422 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2020-12-07 22:29:56,422 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2020-12-07 22:29:56,422 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2020-12-07 22:29:56,426 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2020-12-07 22:29:56,426 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2020-12-07 22:29:56,428 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2020-12-07 22:29:56,428 INFO util.GSet: VM type = 64-bit
2020-12-07 22:29:56,428 INFO util.GSet: 0.029999999329447746% max memory 1.7 GB = 540.2 KB
2020-12-07 22:29:56,428 INFO util.GSet: capacity = 2^16 = 65536 entries
Re-format filesystem in Storage Directory root= /home/hadoop/hadoopdata/hdfs/namenode; location= null ? (Y or N) Y
2020-12-07 22:30:01,784 INFO namenode.FSImage: Allocated new BlockPoolId: BP-559948819-127.0.1.1-1607355001775
2020-12-07 22:30:01,785 INFO common.Storage: Will remove files: [/home/hadoop/hadoopdata/hdfs/namenode/current/fsimage_0000000000000000000.md5, /home/hadoop/hadoopdata/hdfs/namenode/current/VERSION, /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage_0000000000000000000, /home/hadoop/hadoopdata/hdfs/namenode/current/seen_txid]
2020-12-07 22:30:01,809 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
2020-12-07 22:30:01,836 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2020-12-07 22:30:01,925 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2020-12-07 22:30:01,937 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-12-07 22:30:01,943 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2020-12-07 22:30:01,944 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdata/127.0.1.1
************************************************************/
Congratulations!!
Now, start your cluster.
cd $HADOOP_HOME/sbin/
./start-all.sh
This gives the following output:
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [bigdata]
bigdata: Warning: Permanently added 'bigdata' (ECDSA) to the list of known hosts.
2020-12-07 22:30:31,270 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
Check that everything is running correctly with the jps command:
jps
It should show the following components. Try to understand each component from the lecture.
19376 NodeManager
19202 ResourceManager
18610 NameNode
19715 Jps
19019 SecondaryNameNode
18798 DataNode
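As a quick smoke test (the paths here are only examples), create your HDFS home directory, upload a file, and list it:
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/hadoop/
hdfs dfs -ls /user/hadoop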
You can access the system status in the browser.
Note: The NameNode status page is at port 9870:
http://<yourvmip>:9870/
The NodeManager status page is at port 8042:
http://<yourvmip>:8042/
Tips: On our VM, we may need to enable the inbound port in order to access the VM using a web browser. If the firewall is up, check with:
sudo ufw status
(check: what is the ufw command?)
If it is not active, it reports:
Status: inactive
If it is active, allow the port on your VM, e.g. port 9870, for Ubuntu:
sudo ufw allow 9870/tcp
sudo ufw reload
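To confirm the daemons are actually listening on these ports, one option (assuming the ss utility is available, as it is on recent Ubuntu) is:
sudo ss -tlnp | grep -E '9870|8042|8088'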
Alternatively, you may use an SSH tunnel for that port to bypass the firewall before accessing it in the web browser. Suppose that your VM IP is 10.3.134.8:
ssh -L 9870:10.3.134.8:9870 hadoop@10.3.134.8
Then you can try:
http://localhost:<port>
e.g.
http://localhost:9870
http://localhost:8088
(Port 8088 is the ResourceManager web UI.)
check: What do you see?
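When you are finished, you can stop all daemons with the counterpart script:
cd $HADOOP_HOME/sbin/
./stop-all.sh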