Notebooks and tools - derlin/bda-lsa-project GitHub Wiki

Notebooks for spark-scala

spark-notebook

Source and Download: https://github.com/spark-notebook/spark-notebook

Works out of the box, but the interface is less polished than Jupyter's.

toree / jupyter

Apache Toree is a Jupyter kernel for working with Spark and Scala in the Jupyter notebook (originally made for IPython); we discovered it in this article. The current "stable" release still targets Spark 1.x and uses Scala 2.10.4.
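Before picking a Toree version, it helps to know which Scala version your Spark build uses. Below is a minimal sketch that pulls the major.minor Scala version out of the version banner. It assumes the banner contains a line like `Using Scala version 2.11.8, ...` (as Spark 1.x/2.x banners do). The function name `spark_scala_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Spark version banner on stdin and print the
# Scala major.minor version it mentions (e.g. "2.11").
# Assumption: the banner has a line like "Using Scala version 2.11.8, ...".
spark_scala_version() {
  sed -n 's/.*Using Scala version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (the banner may be printed on stderr, hence 2>&1):
#   "$SPARK_HOME/bin/spark-submit" --version 2>&1 | spark_scala_version
```

If this prints `2.11`, the stable Toree (Scala 2.10) will not match and the dev-version workaround is needed.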

There is a workaround to make it work with Scala 2.11, though (see here). First, download the dev version of Toree, then install it with the pip from Anaconda (in ~/anaconda/bin):

# download 
wget https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
tar xvf toree-0.2.0.dev1.tar.gz
# install
pip install -e toree-0.2.0.dev1
# configure (kernels are installed for the current user)
jupyter toree install --spark_home="$SPARK_HOME" --user --kernel_name=apache_toree --interpreters=Scala,PySpark,SparkR,SQL
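Since the install has to go through Anaconda's pip rather than a system pip, a quick check of which `pip` comes first on `PATH` can avoid surprises. This is a minimal sketch assuming Anaconda lives in `~/anaconda` as above; `pip_is_anaconda` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the first "pip" on PATH is the one
# shipped with Anaconda under ~/anaconda/bin (assumed install location).
pip_is_anaconda() {
  case "$(command -v pip)" in
    "$HOME"/anaconda/bin/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: put Anaconda first on PATH if the check fails
#   pip_is_anaconda || export PATH="$HOME/anaconda/bin:$PATH"
```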

A Toree Scala kernel then appears in the "New" menu of the Jupyter notebook.
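To confirm the kernel was registered without opening the browser, `jupyter kernelspec list` prints the installed kernels (a header line, then one kernel name and path per line). The sketch below filters that listing; `has_toree_kernel` is a hypothetical helper, and matching on the substring `toree` is an assumption based on the `--kernel_name=apache_toree` used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jupyter kernelspec list` output on stdin and
# succeed if any kernel name (first column, after the header) contains "toree".
has_toree_kernel() {
  awk 'NR > 1 && $1 ~ /toree/ { found = 1 } END { exit !found }'
}

# Usage:
#   jupyter kernelspec list | has_toree_kernel && echo "Toree kernel installed"
```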

Note: for Mac users, if you installed Spark with Homebrew, ensure $SPARK_HOME points to /usr/local/Cellar/apache-spark/XXXX/libexec. Without the /libexec at the end, Jupyter will complain that it cannot find the executables.
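A small sanity check can catch the missing `/libexec` before running `jupyter toree install`. This is a minimal sketch under the assumption that a Homebrew Cellar root contains a `libexec/bin` directory holding the real Spark layout; `check_spark_home` is a hypothetical name, and the version directory stays elided as `XXXX`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a SPARK_HOME candidate (argument, or $SPARK_HOME).
check_spark_home() {
  local home="${1:-$SPARK_HOME}"
  # A Homebrew Cellar root contains libexec/; Spark itself lives inside it
  if [ -d "$home/libexec/bin" ]; then
    echo "SPARK_HOME looks like a Homebrew Cellar root; try $home/libexec" >&2
    return 1
  fi
  # Otherwise expect the usual bin/spark-submit launcher
  if [ ! -x "$home/bin/spark-submit" ]; then
    echo "No executable bin/spark-submit under $home" >&2
    return 1
  fi
  return 0
}

# Usage:
#   export SPARK_HOME=/usr/local/Cellar/apache-spark/XXXX/libexec
#   check_spark_home && echo "SPARK_HOME looks fine"
```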