# [Archive 1.0] 12. Troubleshooting
## Spark
Spark-submit Error: "The offset was changed."
```
MicroBatchExecution: Query [id = 4bc1...b55c, runId = 9645...a6f8] terminated with error java.lang.IllegalStateException: Partition snoqttv5-0's offset was changed from 59311 to 182, some data may have been missed.
```

Or:

```
Caused by: java.lang.IllegalStateException: Cannot fetch offset 223540 (GroupId: spark-kafka-source-e6c...779-executor, TopicPartition: snoqttv5-0). Some data may have been lost because they are not available in Kafka any more; either the data was aged out by Kafka or the topic may have been deleted before all the data in the topic was processed. If you don't want your streaming query to fail on such cases, set the source option "failOnDataLoss" to "false".
```
- Cause
Messages in the Kafka topic were aged out (or otherwise deleted) while the Spark application was stopped, so the offsets recorded in the checkpoint no longer exist in Kafka.
- Solution
Reset the Spark application's offsets, which are stored in checkpoint directories on HDFS, by running the following commands (a helper script is also available in the GitHub repository).
```sh
# Delete directories
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/job
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/kaspa
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/kafka-checkpoint
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/kaspa-checkpoint
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/schema/raw_kaspa

# Reconstruct directories
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/job
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/kaspa
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/kafka-checkpoint
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/kaspa-checkpoint
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/schema/raw_kaspa
```
After resetting the directories on HDFS, submit the Spark application again.
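If you want to confirm the stale offsets before deleting anything, the committed offsets can be read directly from the streaming checkpoint. This is a sketch, assuming `kafka-checkpoint` above is the query's checkpoint directory; the batch number (`42` here) is hypothetical and should be the highest-numbered file in the directory:

```sh
# List the committed offset batch files in the streaming checkpoint
hdfs dfs -ls hdfs://localhost:9000/user/[USERNAME]/kafka-checkpoint/offsets

# Print the newest batch file (hypothetical batch number 42) to see the
# per-partition Kafka offsets the query will try to resume from
hdfs dfs -cat hdfs://localhost:9000/user/[USERNAME]/kafka-checkpoint/offsets/42
```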
Spark-submit Error: "AccessControlException"
```
FileFormatWriter: Aborting job 02fc...d126. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, xxx.xxx.xxx.xxx, executor 2): org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/ubuntu/kaspa":ubuntu:supergroup:drwxr-xr-x
```
- Cause
The command was executed as the wrong user: the HDFS directory `/user/ubuntu/kaspa` is owned by `ubuntu`, and `root` has no write permission on it.
- Solution
As the error message above shows, the execution user must be "ubuntu", not "root". Re-run the command as user "ubuntu", as in the sketch below.
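For example, assuming the job is submitted on the host where the `ubuntu` account owns the HDFS home directory, you can switch user just for the submission (the spark-submit arguments here are taken from the example later on this page; substitute your own):

```sh
# Submit the job as the "ubuntu" user, who owns /user/ubuntu on HDFS
sudo -u ubuntu spark-submit --master spark://localhost:7077 \
  --class me.mamotis.kaspacore.jobs.DataStream \
  file:///usr/local/spark/jars/KaspaCore-assembly-0.1.jar

# Optional: confirm the directory ownership that triggered the error
hdfs dfs -ls /user/ubuntu
```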
Spark-submit Error: "All masters are unresponsive!"
```
WARN HttpChannel: /jobs/ java.util.NoSuchElementException: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.
ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
ERROR SparkContext: Error initializing SparkContext.
```
- Cause
The Spark master address is set inconsistently (or incorrectly) across the configuration files and commands.
- Solution
Make sure that the same host address is used in all of the following files and commands.
`/usr/local/spark/conf/spark-env.sh`:

```sh
SPARK_MASTER_HOST=localhost
```

The `start-slave` command:

```sh
start-slave.sh spark://localhost:7077
```

`~/KaspaCoreSystem/src/main/resources/application.conf`:

```
SPARK_MASTER = "spark://localhost:7077"
```

The `spark-submit` command:

```sh
spark-submit --master spark://localhost:7077 \
  --class me.mamotis.kaspacore.jobs.DataStream \
  --total-executor-cores 4 \
  --conf spark.submit.deployMode=cluster \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=2g \
  file:///usr/local/spark/jars/KaspaCore-assembly-0.1.jar
```
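Before resubmitting, you can verify that a master is actually listening at the configured address. A quick sketch, assuming the default RPC port 7077 and web UI port 8080, and that `nc` and `curl` are available:

```sh
# Check that the master RPC port is reachable
nc -zv localhost 7077

# The master web UI page also shows the exact spark:// URL
# that workers and spark-submit must use
curl -s http://localhost:8080/ | grep -o 'spark://[^<" ]*'
```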
## Cassandra
Installation Error: "the public key is not available"
```
ERROR GPG error: http://www.apache.org 36x InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A278B781FE4B2BDA
```
- Cause
The wrong public key was specified; the underlying cause is inadequate installation documentation.
- Solution
Run the command again with the correct public key:

```sh
sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA && sudo apt update
```
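Note that the `pool.sks-keyservers.net` pool has since been decommissioned. If the command above cannot reach the keyserver, another public keyserver such as `keyserver.ubuntu.com` should serve the same key:

```sh
# Fetch the same key from an alternative keyserver
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key A278B781FE4B2BDA && sudo apt update
```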
Installation Error: "Unable to connect to any servers"
```
ERROR: Connection error: ('Unable to connect to any servers', {'127.0.0.1': TypeError('ref() does not take keyword arguments',)})
```
- Cause
The Python driver bundled with cqlsh is incompatible with the installed Python version, and the standalone `cassandra-driver` package is not installed.
- Solution
Install the required packages and set the environment variable so that cqlsh uses the standalone driver:

```sh
# Install pip and the standalone Cassandra Python driver
sudo apt install python-pip
pip install cassandra-driver

# Tell cqlsh to use the installed driver instead of the bundled one
export CQLSH_NO_BUNDLED=true
```
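To make the setting permanent, you can append the export to your shell profile and then reconnect to verify the fix. A sketch, assuming bash is the login shell and Cassandra listens on the default 127.0.0.1:9042:

```sh
# Persist the setting for future shells
echo 'export CQLSH_NO_BUNDLED=true' >> ~/.bashrc

# Reconnect to verify that cqlsh now works
cqlsh 127.0.0.1
```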