# Optimizing Jupyter Notebooks
## Inline and retina plots

From https://gist.github.com/minrk/3301035:

```python
# 1. magic for inline plots
# 2. magic to enable retina (high-resolution) plots
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
```
Or set it up permanently in your config: add the following line to `ipython_kernel_config.py`, which for me is in `~/.ipython/profile_default/`:

```python
c.InlineBackend.figure_format = 'retina'
```

If the file does not already exist, you can generate it (with all settings commented out) by running `ipython profile create` at the command line.
## Autoreload

Reload all modules automatically before executing each cell:

```python
%load_ext autoreload
%autoreload 2
```

Or reload only the modules you explicitly mark with `%aimport`:

```python
%load_ext autoreload
%autoreload 1
%aimport mymodule
```
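`%autoreload` is IPython-only, but under the hood it re-imports modules whose source has changed, which you can approximate in plain Python with `importlib.reload`. A minimal sketch (the module name `mymod_demo` is made up for illustration):

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # force reload to re-read the source, not a cached .pyc

# Write a throwaway module, import it, edit it on disk, then reload.
moddir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(moddir))
(moddir / "mymod_demo.py").write_text("ANSWER = 1\n")

import mymod_demo
print(mymod_demo.ANSWER)  # 1

(moddir / "mymod_demo.py").write_text("ANSWER = 2\n")
importlib.reload(mymod_demo)  # roughly what %autoreload does before each cell
print(mymod_demo.ANSWER)  # 2
```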
## Enable large plots in JupyterLab with sidecar

Install:

```shell
pip install sidecar
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyter-widgets/jupyterlab-sidecar
```

Usage:

```python
from sidecar import Sidecar
from ipywidgets import IntSlider

sc = Sidecar(title='Sidecar Output')
sl = IntSlider(description='Some slider')
with sc:
    display(sl)
```
## Jupyter Notebooks and Production Data Science Workflows

With JupyterHub you can create a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. Project Jupyter created JupyterHub to support many users: the Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high-performance computing group.

A starter Docker image for JupyterHub gives a baseline deployment of JupyterHub using Docker.

JupyterHub also provides a REST API for administering the Hub and its users.
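The REST API is plain HTTP with a token in the `Authorization` header. A minimal stdlib sketch (the hub URL and token below are placeholders; `/hub/api/users` is the users endpoint):

```python
from urllib.request import Request

def hub_request(hub_url, path, token):
    """Build an authenticated request against the JupyterHub REST API."""
    req = Request(f"{hub_url}/hub/api{path}")
    req.add_header("Authorization", f"token {token}")
    return req

# List users on a hypothetical local hub (placeholder token):
req = hub_request("http://localhost:8000", "/users", "abc123")
print(req.full_url)  # http://localhost:8000/hub/api/users
# Against a live hub: json.loads(urllib.request.urlopen(req).read())
```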
## Papermill

Papermill parameterizes and executes notebooks (see its readme). From Python:

```python
import papermill as pm

pm.execute_notebook(
    'path/to/input.ipynb',
    'path/to/output.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
)
```

Or from the command line, passing parameters individually with `-p` or from a file with `-f`:

```shell
papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1
papermill local/input.ipynb s3://bkt/output.ipynb -f parameters.yaml
```
To make papermill discover a custom execution engine, see https://papermill.readthedocs.io/en/latest/extending-entry-points.html#ensuring-your-engine-is-found-by-papermill
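The `-f` option reads parameters from a YAML file; a `parameters.yaml` matching the `-p` flags above might look like this (same illustrative values):

```yaml
alpha: 0.6
l1_ratio: 0.1
```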
## Display a README inside a notebook

From a new, blank notebook, paste this into the first cell:

```python
from IPython.display import display, Markdown

with open('README.md', 'r') as fh:
    content = fh.read()

display(Markdown(content))
```
## Post-Save Hooks

Creating the .py and .html files can be done simply and painlessly by editing the config file `~/.ipython/profile_nbserver/ipython_notebook_config.py` and adding the following code:

```python
### If you want to auto-save .html and .py versions of your notebook:
# modified from: https://github.com/ipython/ipython/issues/8009
import os
from subprocess import check_call

def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return  # only do this for notebooks
    d, fname = os.path.split(os_path)
    # 'jupyter nbconvert' replaces the long-deprecated 'ipython nbconvert'
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)
    check_call(['jupyter', 'nbconvert', '--to', 'html', fname], cwd=d)

c.FileContentsManager.post_save_hook = post_save
```

Now every save of a notebook updates identically named .py and .html files. Add these to your commits and pull requests, and you will gain the benefits of each file format.
## Jupyter Runner

Jupyter Runner executes multiple notebooks over multiple sets of parameters (docs). Notebook execution can happen in parallel with a fixed number of workers.

Note: only compatible with Python 3.5.

```shell
pip install jupyter-runner

jupyter-run notebookA.ipynb notebookB.ipynb
ENV_VAR=xxx jupyter-run notebook.ipynb
```
## Docker and deployment

- Jupyter Docker Stacks (official docs): a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools.
- Jupyter Data Science Stack + Docker in under 15 minutes
- Databricks notebook deployment template
- Profiling and Timing Code
- Boost Your Jupyter Notebook Productivity
- Favorite iPython Notebook Tricks
- Version Control for Jupyter Notebook
- Jupyter Notebook Best Practices for Data Science
- Power-Ups for Jupyter Notebooks
- Jupyter Notebook Best Practices - D. Haitz, 3/27/2019
- Making Publication Ready Python Notebooks
- Hacking my way to a Jupyter notebook powered blog
Since notebooks are challenging objects for source control (e.g., diffs of the JSON are often not human-readable and merging is near impossible), we recommend not collaborating directly with others on Jupyter notebooks. There are two steps we recommend for using notebooks effectively:

1. Follow a naming convention that shows the owner and the order the analysis was done in. We use the format `<step>-<ghuser>-<description>.ipynb` (e.g., `0.3-bull-visualize-distributions.ipynb`).

2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at `src/data/make_dataset.py` and load data from `data/interim`. If it's useful utility code, refactor it to `src`.
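As a sketch of that refactoring, assume a hypothetical row-cleaning helper pulled out of a notebook into `src/data/make_dataset.py` (the function names and cleaning rule are illustrative, not from any particular project):

```python
# src/data/make_dataset.py -- hypothetical module refactored out of a notebook
import csv
from pathlib import Path

def drop_incomplete_rows(rows):
    """Keep only rows where every field is non-empty."""
    return [row for row in rows if all(field.strip() for field in row)]

def make_dataset(raw_path, interim_path):
    """Read a raw CSV, drop incomplete rows, write the result to data/interim."""
    with open(raw_path, newline="") as fh:
        rows = list(csv.reader(fh))
    cleaned = drop_incomplete_rows(rows)
    Path(interim_path).parent.mkdir(parents=True, exist_ok=True)
    with open(interim_path, "w", newline="") as fh:
        csv.writer(fh).writerows(cleaned)
    return cleaned
```

Notebooks then do `from src.data.make_dataset import make_dataset` instead of each carrying its own copy of the cleaning logic.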