[Archive 1.0] 12. Troubleshooting

Spark

Spark-submit Error: "The offset was changed."

MicroBatchExecution: Query [id = 4bc1...b55c, runId = 9645...a6f8] terminated with error java.lang.IllegalStateException: Partition snoqttv5-0's offset was changed from 59311 to 182, some data may have been missed.

Or

Caused by: java.lang.IllegalStateException: Cannot fetch offset 223540 (GroupId: spark-kafka-source-e6c...779-executor, TopicPartition: snoqttv5-0). Some data may have been lost because they are not available in Kafka any more; either the data was aged out by Kafka or the topic may have been deleted before all the data in the topic was processed. If you don't want your streaming query to fail on such cases, set the source option "failOnDataLoss" to "false".

- Cause

While the Spark application was stopped, Kafka aged out (deleted) messages that had not yet been processed, so the offsets recorded in the application's checkpoint no longer exist in Kafka.

- Solution

You need to reset the offsets that the Spark application has stored in HDFS by deleting and recreating its checkpoint and data directories.

Run the following commands, replacing [USERNAME] with your HDFS user. (A helper script with these commands is available in the project's GitHub repository.)

# Delete directories
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/job  
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/kaspa  
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/kafka-checkpoint  
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/kaspa-checkpoint  
hdfs dfs -rm -r hdfs://localhost:9000/user/[USERNAME]/schema/raw_kaspa  
 
# Reconstruct directories
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/job  
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/kaspa  
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/kafka-checkpoint  
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/kaspa-checkpoint  
hdfs dfs -mkdir -p hdfs://localhost:9000/user/[USERNAME]/schema/raw_kaspa
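
The same reset can be written as a short loop instead of repeating the path five times. A minimal sketch, assuming the NameNode listens on localhost:9000 and the HDFS directories belong to your login user:

# Reset all five directories in one pass
USERNAME=$(whoami)
for d in job kaspa kafka-checkpoint kaspa-checkpoint schema/raw_kaspa; do
    hdfs dfs -rm -r -f "hdfs://localhost:9000/user/${USERNAME}/${d}"
    hdfs dfs -mkdir -p "hdfs://localhost:9000/user/${USERNAME}/${d}"
done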

After resetting the directories in HDFS, submit the Spark application again.

Spark-submit Error: "AccessControlException"

FileFormatWriter: Aborting job 02fc...d126. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, xxx.xxx.xxx.xxx, executor 2): org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/ubuntu/kaspa":ubuntu:supergroup:drwxr-xr-x

- Cause

The command was run as the wrong user: the HDFS directory /user/ubuntu/kaspa is owned by user "ubuntu" (permissions drwxr-xr-x), so user "root" has no write access to it.

- Solution

As the error message above shows, the execution user must be "ubuntu", not "root". Rerun the command as user "ubuntu".
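
For example, a minimal sketch reusing the submit command from the section below (adjust the class and jar to your actual job):

# Run the submission as the HDFS directory owner instead of root
sudo -u ubuntu spark-submit --master spark://localhost:7077 --class me.mamotis.kaspacore.jobs.DataStream file:///usr/local/spark/jars/KaspaCore-assembly-0.1.jar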

Spark-submit Error: "All masters are unresponsive!"

WARN HttpChannel: /jobs/ java.util.NoSuchElementException: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.

ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.

ERROR SparkContext: Error initializing SparkContext.

- Cause

The Spark master address is not set to the same value everywhere it is used.

- Solution

Make sure that the same master host address is used in all of the following files and commands.

/usr/local/spark/conf/spark-env.sh

SPARK_MASTER_HOST=localhost

start-slave command

start-slave.sh spark://localhost:7077

~/KaspaCoreSystem/src/main/resources/application.conf

SPARK_MASTER = "spark://localhost:7077

spark-submit command

spark-submit --master spark://localhost:7077 \
    --class me.mamotis.kaspacore.jobs.DataStream \
    --total-executor-cores 4 \
    --conf spark.submit.deployMode=cluster \
    --conf spark.executor.cores=1 \
    --conf spark.executor.memory=2g \
    file:///usr/local/spark/jars/KaspaCore-assembly-0.1.jar
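
Before resubmitting, it can help to confirm that the master is actually up and reachable. A quick sanity check, assuming the default web UI port 8080 and the RPC port 7077 used above (requires nc to be installed):

# The master web UI should respond and report its spark:// URL
curl -s http://localhost:8080 | grep -o 'spark://[^"<]*'
# The RPC port that workers and spark-submit connect to should be open
nc -zv localhost 7077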

Cassandra

Installation Error: "the public key is not available"

ERROR GPG error: http://www.apache.org 36x InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A278B781FE4B2BDA

- Cause

The wrong public key was specified. (The root cause is inadequate documentation: the installation document listed an incorrect key ID.)

- Solution

Run the command with the correct public key, as follows.

sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA && sudo apt update
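
Note that the SKS keyserver pool (pool.sks-keyservers.net) has since been decommissioned. If the command above times out, the same key can usually be fetched from another keyserver, for example:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key A278B781FE4B2BDA && sudo apt update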

Installation Error: "Unable to connect to any servers"

ERROR: Connection error: ('Unable to connect to any servers', {'127.0.0.1': TypeError('ref() does not take keyword arguments',)})

- Cause

cqlsh cannot connect because its bundled Python driver is incompatible with the installed Python version; the standalone cassandra-driver package is needed instead.

- Solution

Install the standalone Python driver and tell cqlsh to use it instead of the bundled one:

sudo apt install python-pip
pip install cassandra-driver
export CQLSH_NO_BUNDLED=true
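
The export only lasts for the current shell session. To make it permanent, append it to your shell profile (a sketch assuming bash):

# Persist the setting for future sessions
echo 'export CQLSH_NO_BUNDLED=true' >> ~/.bashrc
source ~/.bashrc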