Errors&Fixes - SpiRITlab/SparkFHE-Addon GitHub Wiki
1. Datanode failed to start after running the start-dfs script (due to incompatible clusterIDs in the namenode and datanode)
Try running the following command on one of the datanodes:
sudo /hadoop/bin/hdfs datanode
If you see the following error, read on:
19/06/02 10:52:28 INFO common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
19/06/02 10:52:28 INFO common.Storage: Lock on /data/hadoop/data/in_use.lock acquired by nodename [email protected]
19/06/02 10:52:28 WARN common.Storage: Failed to add storage directory [DISK]file:/data/hadoop/data/
java.io.IOException: **Incompatible clusterIDs in /data/hadoop/data: namenode clusterID = CID-79fa7c14-46a5-4c3d-b693-c026a8d0d3b9; datanode clusterID = CID-c63786f8-0907-4356-b4f0-f079abd3eafa**
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:760)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:293)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:409)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:388)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
at java.base/java.lang.Thread.run(Thread.java:844)
19/06/02 10:52:28 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid c9692d9e-a360-423d-89bb-626ad5b85748) service to master/10.10.1.2:9000. Exiting.
java.io.IOException: All specified directories have failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:557)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
at java.base/java.lang.Thread.run(Thread.java:844)
19/06/02 10:52:28 WARN datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid c9692d9e-a360-423d-89bb-626ad5b85748) service to master/10.10.1.2:9000
19/06/02 10:52:28 INFO datanode.DataNode: Removed Block pool <registering> (Datanode Uuid c9692d9e-a360-423d-89bb-626ad5b85748)
19/06/02 10:52:30 WARN datanode.DataNode: Exiting Datanode
19/06/02 10:52:30 INFO datanode.DataNode: SHUTDOWN_MSG:
Solution: We need to remove the old clusterID assignment on the datanode. Then, reformatting the namenode will make the clusterID consistent between the namenode and datanodes.
On the datanode:
sudo rm -rf /hdfs/*
On the namenode:
sudo /hadoop/sbin/stop-dfs.sh
sudo rm -rf /hdfs/*
sudo /hadoop/bin/hdfs namenode -format
sudo /hadoop/sbin/start-dfs.sh
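After restarting, you can confirm that both sides now agree on the clusterID by inspecting the VERSION files. This is a hedged check: the namenode path below is an assumption based on the /hdfs directory used above, and the datanode path is taken from the log; adjust both to your dfs.namenode.name.dir and dfs.datanode.data.dir settings.
# on the namenode (directory assumed from the commands above)
sudo grep clusterID /hdfs/current/VERSION
# on the datanode (directory taken from the log above)
sudo grep clusterID /data/hadoop/data/current/VERSION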
Reference: this post.
2. Connection refused when calling the namenode on port 9000
java.net.ConnectException: Call From marta-komputer/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Solution: add the following property to hdfs-site.xml on the namenode, then restart HDFS:
<!-- Default RPC bind address; 0.0.0.0 means listen on all interfaces. -->
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
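After restarting HDFS, a quick sanity check (a hedged sketch; hdfs getconf prints the effective value of a configuration key, and ss lists listening sockets):
/hadoop/bin/hdfs getconf -confKey dfs.namenode.rpc-bind-host   # should print 0.0.0.0
sudo ss -tlnp | grep 9000                                      # namenode should be listening on 0.0.0.0:9000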
3. JVM crashes with SIGSEGV at shutdown (JNI shared-library conflict)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe858021c27, pid=7166, tid=0x00007fe859382700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_201-b09) (build 1.8.0_201-b09)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x8cac27] Monitor::ILock(Thread*) [clone .part.2]+0x17
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /var/lib/mesos/slaves/7fc7fe9a-3b38-418e-89c2-8b4a4068fa9a-S1/frameworks/5d9c828d-b45d-470b-ab17-a69f822379a2-0000/executors/driver-20190602122729-0009/runs/d827d88f-b5da-4f21-bdd6-9a6ef9fa9fb4/hs_err_pid7166.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
This problem is caused by a function call from a shared library that fails to release its resources, which crashes the JVM at shutdown. It occurs when two separate shared libraries expose functionality through JNI.
Solution: apply this patch and replace libhdfs.so on all nodes.
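A minimal sketch for distributing the patched library, assuming passwordless SSH and that the Hadoop native libraries live under /hadoop/lib/native (the hostnames below are placeholders for your cluster nodes):
# copy the patched libhdfs.so to every node and install it in place
for node in node1 node2 node3; do
  scp libhdfs.so ${node}:/tmp/libhdfs.so
  ssh ${node} "sudo cp /tmp/libhdfs.so /hadoop/lib/native/libhdfs.so"
done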
4. Installing protobuf 2.5.0 from source
Solution: build and install protobuf 2.5.0 with clang and libc++:
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
tar xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0
./configure CC=clang CXX=clang++ CXXFLAGS='-std=c++11 -stdlib=libc++ -O3 -g' LDFLAGS='-stdlib=libc++' LIBS="-lc++ -lc++abi"
make -j 4
sudo make install
protoc --version
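The last command should print libprotoc 2.5.0. On Linux nodes you may also need to refresh the shared-library cache so protoc can find the freshly installed libprotobuf (a hedged extra step; not needed on Mac OS X):
sudo ldconfig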
More info: instructions for installing Hadoop on Mac OS X.
5. ZooKeeper service error (when installed through apt-get)
You can double-check this error with:
sudo tail -n100 /var/log/syslog
To fix this problem, if ZooKeeper was installed through apt-get, edit the init script:
sudo vim /etc/rc1.d/K01zookeeper
and change NAME=zookeeper to NAME=root.
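The same edit can be made non-interactively with sed (a one-line sketch; it assumes NAME is defined exactly as shown above):
sudo sed -i 's/^NAME=zookeeper/NAME=root/' /etc/rc1.d/K01zookeeper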
Then, restart ZooKeeper:
sudo systemctl daemon-reload
sudo systemctl stop zookeeper.service
sudo systemctl start zookeeper.service
and restart Mesos and the related services as well:
sudo systemctl restart mesos-master
sudo systemctl restart zookeeper
sudo systemctl restart spark
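Once everything is back up, you can verify that ZooKeeper and the services are healthy (a hedged check; ruok is ZooKeeper's built-in health probe and answers imok, and the unit names are assumed to match the ones used above):
echo ruok | nc localhost 2181                              # should answer: imok
sudo systemctl is-active zookeeper.service mesos-master    # both should report: active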