Troubleshoot Environment - TheLadders/pipeline GitHub Wiki

Common Errors

Running out of disk space

  • It's likely that you have old, unused containers from each docker run command
  • These aren't garbage-collected automatically, because Docker assumes you may want to start them again
  • Use the following command to clean them out:
docker rm `docker ps -aq`
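
If removing containers alone doesn't free enough space, dangling (untagged) image layers may also have piled up. A follow-up sketch using standard Docker commands:
docker rmi `docker images -q -f dangling=true`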

NOTE: If Docker fills up the root partition of your VM, the Docker daemon might not start; in that case, any docker command will report that the daemon is not running.

  1. Confirm you are out of disk space using df -l (see the example after this list)
  2. Blow away the Docker working directories: sudo rm -rf /var/lib/docker
  3. Pull the pipeline image again and start over.
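
For step 1, a quick way to confirm that Docker's working directory is what filled the disk (paths below assume the standard Linux Docker layout inside the VM):
df -h /
sudo du -sh /var/lib/docker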

Are you trying to connect to a TLS-enabled daemon without TLS?

  • Make sure you've run the following:
eval "$(docker-machine env pipelinebythebay)"
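
To sanity-check that the variables were actually exported into your current shell, you can list the machines and inspect the environment; a sketch using standard docker-machine and shell commands:
docker-machine ls
env | grep DOCKER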

java.nio.channels.ClosedChannelException

java.nio.channels.ClosedChannelException
	at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
	at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
	at scala.util.Either.fold(Either.scala:97)
	at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:415)
	at com.bythebay.pipeline.spark.streaming.StreamingRatings$.main(StreamingRatings.scala:39)
	at com.bythebay.pipeline.spark.streaming.StreamingRatings.main(StreamingRatings.scala)
  • You likely have not started your services using bythebay-start.sh.
  • Or there was an issue starting your Spark Master and Worker services.
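
One way to verify that the services actually came up is to look for their JVM processes and for Kafka's listener port; a sketch, assuming Kafka is on its default port 9092 and that jps and nc are available inside the container:
jps -l
nc -zv localhost 9092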

Caused by: java.io.FileNotFoundException: datasets/dating/ratings.csv (No such file or directory)

Caused by: java.io.FileNotFoundException: datasets/dating/ratings.csv (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:146)
	at scala.io.Source$.fromFile(Source.scala:90)
	at scala.io.Source$.fromFile(Source.scala:75)
	at scala.io.Source$.fromFile(Source.scala:53)
	at com.bythebay.pipeline.akka.feeder.FeederActor.initData(FeederActor.scala:34)
	at com.bythebay.pipeline.akka.feeder.FeederActor.<init>(FeederActor.scala:23)
  • You likely have not run bythebay-config.sh or bythebay-setup.sh, so the required datasets have not been uncompressed.
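
To confirm, list the dataset directory from where you launch the job; if ratings.csv is absent or still compressed, re-run the setup script:
ls -l datasets/dating/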

Failed to initialize machine "boot2docker-vm": exit status 1

  • Run the following to repair your busted boot2docker:
macosx-laptop$ sudo /Library/Application\ Support/VirtualBox/LaunchDaemons/VirtualBoxStartup.sh restart

More docs here

  • Re-run the following, including the -v flag:
macosx-laptop$ boot2docker stop
macosx-laptop$ boot2docker destroy
macosx-laptop$ boot2docker -v --memory=8192 --disksize=20000 init
macosx-laptop$ boot2docker up
  • You likely need to remove the existing boot2docker certs directory and re-initialize boot2docker:
macosx-laptop$ rm -rf /Users/<user-name>/.boot2docker/certs/boot2docker-vm/
macosx-laptop$ boot2docker stop
macosx-laptop$ boot2docker destroy
macosx-laptop$ boot2docker -v --memory=8192 --disksize=20000 init
macosx-laptop$ boot2docker up
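
Once the VM is back up, it can help to confirm its state and re-export the Docker environment variables into your shell, using the standard boot2docker commands:
macosx-laptop$ boot2docker status
macosx-laptop$ eval "$(boot2docker shellinit)"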

TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

  • You likely have not configured your VM environment with enough cores to run the Spark jobs (see the sketch below for increasing the VM's CPU count)
  • Also, check that spark-defaults.conf contains the following:
spark.executor.cores=2
spark.cores.max=2
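
If the VM itself has too few CPUs, one option is to grow it via VirtualBox and restart it; a sketch, assuming the machine was created by docker-machine as pipelinebythebay and is stopped before being modified:
macosx-laptop$ docker-machine stop pipelinebythebay
macosx-laptop$ VBoxManage modifyvm pipelinebythebay --cpus 2
macosx-laptop$ docker-machine start pipelinebythebay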