Notebooks and tools - derlin/bda-lsa-project GitHub Wiki
Notebooks for spark-scala
spark-notebook
Source and Download: https://github.com/spark-notebook/spark-notebook
Works out of the box, but the interface is less polished than Jupyter's.
toree / jupyter
Apache Toree is a kernel for working with Spark and Scala in the Jupyter notebook, which was originally made for IPython (discovered in this article). The current "stable" version still targets Spark 1.x and uses Scala 2.10.4.
There is a workaround to make it work with Scala 2.11, though (see here). First download the dev version of Toree, then install it with the pip from Anaconda (in ~/anaconda/bin):
# download
wget https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
tar xvf toree-0.2.0.dev1.tar.gz
# install
pip install -e toree-0.2.0.dev1
# configure
jupyter toree install --spark_home="$SPARK_HOME" --user --kernel_name=apache_toree --interpreters=PySpark,SparkR,Scala,SQL
You will then have a Toree Scala option in the "New" menu of the Jupyter notebook.
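To confirm the kernels were registered before opening a notebook, you can list the installed kernelspecs and filter on the `--kernel_name` prefix passed to the install command above. This is only a sketch: `list_toree_kernels` is a hypothetical helper name, and it assumes `jupyter` is on your PATH and that the registered kernel names contain `apache_toree`.

```shell
# Hypothetical helper: reads `jupyter kernelspec list` output on stdin and
# prints only the lines mentioning the apache_toree prefix (assumed to match
# the --kernel_name used at install time). Prints nothing if none match.
list_toree_kernels() {
  grep -i 'apache_toree' || true
}

# Only query Jupyter if it is actually available on PATH.
if command -v jupyter >/dev/null 2>&1; then
  jupyter kernelspec list | list_toree_kernels
fi
```

If nothing is printed, the install step probably did not register any kernel for the current user.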
Note: for Mac users, if you installed Spark with Homebrew, ensure $SPARK_HOME points to /usr/local/Cellar/apache-spark/XXXX/libexec. Without the /libexec at the end, Jupyter will complain that it cannot find the executables.
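A quick way to check the note above is to test whether the directory actually contains the Spark launcher scripts. The sketch below uses a hypothetical helper, `check_spark_home`, and assumes that an executable `bin/spark-submit` is a good enough sign of a valid Spark home.

```shell
# Hypothetical helper: succeeds if the given directory looks like a Spark home,
# i.e. it contains an executable bin/spark-submit (an assumption about what
# the kernel needs to find).
check_spark_home() {
  dir="$1"
  if [ -x "$dir/bin/spark-submit" ]; then
    echo "OK: $dir"
  else
    echo "missing bin/spark-submit under: $dir" >&2
    return 1
  fi
}

# Check the current setting; reports a problem if SPARK_HOME is unset or wrong.
check_spark_home "${SPARK_HOME:-}" || true
```

With a Homebrew install, the check should fail for the versioned Cellar directory itself and pass once /libexec is appended.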