Jupyter Notes - fcrimins/fcrimins.github.io GitHub Wiki
Jupyter Kernels
- StackOverflow: Choosing a Spark/Scala kernel for Jupyter/IPython (2/14/17)
- I can't speak for all of them, but I use Spark Kernel and it works very well for using both Scala and Spark.
- Spark Kernel is now Apache Toree, which possibly comes from IBM (i.e., a red flag)
- jove-scala is no more. The GitHub page says use jupyter-scala instead
- Zeppelin looks pretty well developed. "Zeppelin Notebook - big data analysis in Scala or Python in a notebook, and connection to a Spark cluster on EC2" is a nice explanation of installing it on AWS. Zeppelin is a JVM-based alternative to Jupyter.
- This IBM link says to use jupyter-scala
- Scala Notebook (last commit 2015; from Bridgewater) - An alternative to Jupyter.
- IScala (no commits since 2014)
- jupyter-scala - This looks like the one to use (last commit 1/17)
- Installation:
cd ~/bin
curl -L -o coursier https://git.io/vgvpD && chmod +x coursier && ./coursier --help
(per here)
cd ~/code
git clone https://github.com/alexarchambault/jupyter-scala.git
cd jupyter-scala
- add
addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.0-M15")
to build.sbt (per here), then run
./jupyter-scala
- Output: "Use this kernel from Jupyter notebook, running
jupyter notebook
and selecting the 'Scala' kernel."
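After the install script finishes, one way to confirm that the kernel actually registered, before opening a notebook, is to ask Jupyter for its kernel list. The following is a minimal sketch, not part of the jupyter-scala instructions: it assumes the jupyter command is on the PATH, and the helper names parse_kernelspecs and installed_kernels are made up for illustration. The exact name the Scala kernel registers under may differ from what the sample below uses.

```python
import json
import shutil
import subprocess


def parse_kernelspecs(json_text):
    """Map kernel name -> display name from `jupyter kernelspec list --json` output."""
    specs = json.loads(json_text).get("kernelspecs", {})
    return {
        # Fall back to the kernel name when no display_name is present.
        name: info.get("spec", {}).get("display_name", name)
        for name, info in specs.items()
    }


def installed_kernels():
    """Return the registered kernels, or {} if `jupyter` is not on the PATH."""
    jupyter = shutil.which("jupyter")
    if jupyter is None:
        return {}
    result = subprocess.run(
        [jupyter, "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_kernelspecs(result.stdout)


if __name__ == "__main__":
    # Print one line per registered kernel, e.g. to look for a Scala entry.
    for name, display in sorted(installed_kernels().items()):
        print(f"{name}: {display}")
```

If a Scala entry shows up in the printed list, it should also appear in the notebook's kernel menu.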
6 points to compare Python and Scala for Data Science using Apache Spark
- Python is more analytics-oriented while Scala is more engineering-oriented
Example Notebooks (12/6/16)
- Matplotlib Tutorial
- Nice basic data sciencey, data cleaning blog post generated from a notebook
- The Importance of Preprocessing in Data Science and the Machine Learning Pipeline tutorial series
Python Data Science Handbook from O'Reilly by Jake VanderPlas
- Run
jupyter notebook
inside ~/code/PythonDataScienceHandbook/notebooks/
to see the book
- Local version here: http://localhost:8888/notebooks/01.00-IPython-Beyond-Normal-Python.ipynb
- First chapter is a nice background on Jupyter