Optimizing Jupyter Notebooks - BKJackson/BKJackson_Wiki GitHub Wiki

Create high res retina plots in your notebook

From https://gist.github.com/minrk/3301035

# 1. magic for inline plot
# 2. magic to enable retina (high resolution) plots
# https://gist.github.com/minrk/3301035
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  

Or set it up permanently in your config: add the following line to your ipython_kernel_config.py, which for me is in ~/.ipython/profile_default/

c.InlineBackend.figure_format = 'retina'  

If the file does not already exist, you can generate it with all settings commented out by entering ipython profile create at the command line.

Force jupyter to reload modules

%load_ext autoreload  
%autoreload 2      

Only reload a particular module

%load_ext autoreload
%autoreload 1
%aimport mymodule  

Enable large plots in jupyter lab with sidecar

Install:

pip install sidecar
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyter-widgets/jupyterlab-sidecar

Usage:

from sidecar import Sidecar
from ipywidgets import IntSlider

sc = Sidecar(title='Sidecar Output')
sl = IntSlider(description='Some slider')
with sc:
    display(sl)  

Tutorials & Videos

Jupyter Notebooks and Production Data Science Workflows

JupyterHub

With JupyterHub you can create a multi-user Hub which spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.

Project Jupyter created JupyterHub to support many users. The Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high performance computing group.

A starter Docker image for JupyterHub provides a baseline deployment of JupyterHub using Docker. ref

JupyterHub also provides a REST API for administration of the Hub and its users.
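As a minimal sketch of using that REST API, the snippet below builds an authenticated request to list the Hub's users. The base URL and the token value are assumptions here — in practice you would point at your own Hub and generate a token (e.g. with `jupyterhub token`):

```python
# Sketch: build an authenticated GET /hub/api/users request for the
# JupyterHub REST API. hub_url and the token are placeholders.
import urllib.request

def list_users_request(hub_url: str, token: str) -> urllib.request.Request:
    """Build an authenticated request to list all users on the Hub."""
    return urllib.request.Request(
        f"{hub_url}/hub/api/users",
        headers={"Authorization": f"token {token}"},
    )

req = list_users_request("http://localhost:8000", "API_TOKEN")
# urllib.request.urlopen(req) would return a JSON list of user records.
```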

Snippets

Papermill readme link

Execute Papermill via the Python API

import papermill as pm

pm.execute_notebook(
   'path/to/input.ipynb',
   'path/to/output.ipynb',
   parameters = dict(alpha=0.6, ratio=0.1)
)

Execute Papermill via CLI

papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

Read parameters from a YAML file with -f

papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml
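A parameters.yaml for the run above might look like this — a flat mapping whose keys must match the notebook's parameters cell (values here mirror the -p example):

```yaml
alpha: 0.6
l1_ratio: 0.1
```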

Tracking notebook cell timing with Papermill

https://papermill.readthedocs.io/en/latest/extending-entry-points.html#ensuring-your-engine-is-found-by-papermill  
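Papermill's default engine also records per-cell timing in the output notebook — each executed cell carries a metadata block (under `cell.metadata.papermill`, including a "duration" in seconds, as of papermill 2.x). A small helper, sketched here, can pull those durations out of the saved output notebook's JSON:

```python
# Sketch: extract (cell_index, duration_seconds) pairs from a notebook
# that papermill has executed and saved.
import json

def cell_durations(nb: dict) -> list:
    """Return (cell_index, duration) for each code cell papermill timed."""
    out = []
    for i, cell in enumerate(nb.get("cells", [])):
        meta = cell.get("metadata", {}).get("papermill", {})
        if cell.get("cell_type") == "code" and "duration" in meta:
            out.append((i, meta["duration"]))
    return out

# Usage:
#   with open("path/to/output.ipynb") as fh:
#       nb = json.load(fh)
#   print(cell_durations(nb))
```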

Reading a README.md file with a Jupyter notebook

From a new, blank notebook, paste this in the first cell:

from IPython.display import display, Markdown

with open('README.md', 'r') as fh:
    content = fh.read()

display(Markdown(content))

Post-Save Hooks source

Creating the .py and .html files can be done simply and painlessly by editing the config file:

~/.ipython/profile_nbserver/ipython_notebook_config.py

and adding the following code:

### If you want to auto-save .html and .py versions of your notebook:
# modified from: https://github.com/ipython/ipython/issues/8009
import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    # 'ipython nbconvert' was removed in later IPython releases;
    # use 'jupyter nbconvert' instead
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)
    check_call(['jupyter', 'nbconvert', '--to', 'html', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save

Now every save of a notebook updates identically-named .py and .html files. Include these in your commits and pull requests, and you will gain the benefits of each file format.

Jupyter Runner for multiple parameters and multiple sets of parameters (docs)

Notebook execution can happen in parallel with a fixed number of workers.
Note: Only compatible with Python 3.5.

pip install jupyter-runner

jupyter-run notebookA.ipynb notebookB.ipynb  

ENV_VAR=xxx jupyter-run notebook.ipynb
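jupyter-runner passes parameters to the notebook as environment variables, so inside the notebook you read them back with os.environ. A sketch, with the variable name and default value purely illustrative:

```python
# Inside a notebook run via `ENV_VAR=xxx jupyter-run notebook.ipynb`,
# the parameter arrives as an ordinary environment variable.
import os

def get_param(name: str, default: str = "") -> str:
    """Read a jupyter-runner parameter from the environment."""
    return os.environ.get(name, default)

alpha = get_param("ENV_VAR", "default-value")
```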

Jupyter Docker Stack

Jupyter Docker Stacks (official docs) - a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools.
Jupyter Data Science Stack + Docker in under 15 minutes

Databricks and Jupyter Notebooks

Databricks notebook deployment template

Articles

Examples

Publishing Python Notebooks

Making Publication Ready Python Notebooks
Hacking my way to a Jupyter notebook powered blog

Two steps for using notebooks effectively

Since notebooks are challenging objects for source control (e.g., diffs of the JSON are often not human-readable and merging is near impossible), we recommend not collaborating directly with others on Jupyter notebooks. There are two steps we recommend for using notebooks effectively:

  1. Follow a naming convention that shows the owner and the order the analysis was done in. We use the format `<step>-<ghuser>-<description>.ipynb` (e.g., 0.3-bull-visualize-distributions.ipynb).

  2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at src/data/make_dataset.py and load data from data/interim. If it's useful utility code, refactor it to src.

Source: Cookiecutter Data Science
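Step 2 above might look like this in practice — a hypothetical src/data/make_dataset.py holding the shared preprocessing once, so each notebook imports it instead of copying the code (the module path and function are illustrative):

```python
# Hypothetical src/data/make_dataset.py: shared preprocessing lives here
# once, and every notebook imports it instead of re-implementing it.

def drop_incomplete_rows(rows: list) -> list:
    """Keep only rows where every field is non-empty."""
    return [row for row in rows if all(field != "" for field in row)]

# In a notebook:
#   from src.data.make_dataset import drop_incomplete_rows
#   rows = drop_incomplete_rows(raw_rows)
```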
