IPEP 17: Notebook Format 4
Status | Implemented |
Author | Min RK |
Created | April 29, 2013 |
Updated | November 3, 2014 |
Discussion | 5733 |
Implementation | 6045 |
There are a few changes we need to make to the notebook that will not be backward compatible. We do not intend to make these changes for 1.0, because nbformat changes are quite painful. This is a catalog of the changes we intend to make the next time we rev the nbformat.
The `worksheets` field is a list, but we have no UI to support multiple worksheets. Our design has since shifted to a heading-cell based structure, so we never intend to support the multiple worksheet model. The `worksheets` list of lists shall be replaced with a single list, called `cells`.
We transform mimetype output data to short names, like `json` or `png`. These should be restored to the proper mimetype values used by the message spec, such as `image/png` and `application/json`. The output should be generated by a simple passthrough of the messages, rather than a whitelist transform.
Following IPEP 13, Python-specific keys in the message spec and notebook will be removed. Those affecting the notebook format:
- `pyout` will become `execute_result`
- `pyerr` will become `error`
Currently text cells have a `source` key, which contains the text, and code cells have an `input` key. There is no reason for the two cell types to have a different name for their content:
- `CodeCell.input` will become `CodeCell.source`, matching `TextCell.source`.
- remove notebook name from metadata
- move `language` key from code cells to top-level notebook metadata
- add kernel info to top-level notebook metadata in some form
- add `format` key to raw_cell metadata
- add state for show/hide (already have) and auto-scroll.
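To make the catalog of changes above concrete, here is a minimal sketch of what a v3 to v4 upgrade could look like on plain dictionaries. It is not the nbformat implementation: the function names are hypothetical, each worksheet is assumed to be an object with a `cells` list, and only the `png` and `json` short names come from the text above (the `text` entry is an assumption for illustration).

```python
# Hypothetical sketch of a v3 -> v4 upgrade on plain dicts.

# Short output keys mapped back to mimetypes. `png` and `json` are named in
# the text; the `text` entry is an assumption added for illustration.
SHORT_TO_MIME = {
    "png": "image/png",
    "json": "application/json",
    "text": "text/plain",
}

# Output types renamed following IPEP 13.
OUTPUT_TYPE_RENAMES = {"pyout": "execute_result", "pyerr": "error"}


def upgrade_output(output):
    """Return a copy of a v3-style output using v4-style names."""
    old_type = output["output_type"]
    new = {"output_type": OUTPUT_TYPE_RENAMES.get(old_type, old_type)}
    # Only rich outputs carry mimetype data; stream and error keys are kept.
    rename_mime = old_type in ("pyout", "display_data")
    for key, value in output.items():
        if key == "output_type":
            continue
        if rename_mime and key in SHORT_TO_MIME:
            key = SHORT_TO_MIME[key]
        new[key] = value
    return new


def upgrade_cell(cell):
    """Return a copy of a v3-style cell using v4-style keys."""
    new = dict(cell)
    if new.get("cell_type") == "code":
        new["source"] = new.pop("input", "")
        # The per-cell language is dropped here; a fuller version would
        # record it in the top-level notebook metadata instead.
        new.pop("language", None)
        new["outputs"] = [upgrade_output(o) for o in new.get("outputs", [])]
    return new


def upgrade_notebook(nb):
    """Flatten worksheets into one `cells` list and bump the format number."""
    metadata = dict(nb.get("metadata", {}))
    metadata.pop("name", None)  # the notebook name leaves the metadata
    cells = []
    for worksheet in nb.get("worksheets", []):
        cells.extend(upgrade_cell(c) for c in worksheet.get("cells", []))
    return {
        "metadata": metadata,
        "nbformat": 4,
        "nbformat_minor": 0,
        "cells": cells,
    }
```

Working on copies rather than mutating the input keeps the original v3 structure available, which is convenient when testing a matching v4 to v3 downgrade.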
Tasks involved in creating nbformat v4:
- thoroughly define the v4 spec
- update message spec keys (pyout, pyerr, etc.)
- mime-type keys for output (affects nbconvert, nbformat, javascript)
- remove worksheets, move cells to top-level list
- add conversions to nbformat: v3->v2, v4->v3, v3->v4
- metadata changes
- widget-related changes (TBD)
- we will need v4->v4 to track changes to v4 during development. If so, this should probably not be included in release, right?
I think this is the logical order of these tasks:
1. Define v4 in a doc (not just changes, full spec - v3 was never fully defined)
2. add downgrade API to nbformat (or nbconvert, unclear which), and implement v3->v2
3. copy v3 to v4, adding empty v4->v3 and v3->v4, removing the py/json distinction (nbconvert is responsible for .py now)
4. remove worksheet in v4
5. update msg spec keys that are reflected in notebook
6. use mime-type output keys
7. update various metadata keys (this mainly affects javascript code)
v2<->v3 conversion APIs can be done while v4 is being defined, but no part of v4 should be implemented until the spec is documented. Incremental implementations of v4 features, starting with step 4, can be implemented in discrete PRs, probably on a v4 feature branch. Their order relative to each other isn't critically important.
Each time a change is made to the in-development v4 spec:
- update spec doc
- update nbformat.v4
- update v4->v3 and v3->v4
- update v4->v4?
- update javascript, if affected
- update nbconvert, if affected
- TEST EVERY NEW CHANGE
The specification is being defined using a JSON schema, which notebooks can then be validated against. The actual schema document is being developed in 5733. Additionally, here is an outline of the specification:
- `metadata`: an object containing any top-level notebook metadata. There are three reserved metadata keys, which are optional but, if included, must follow this format:
  - `kernel_info`: an object containing information about the kernel that the notebook should be run with (see also IPEP 13). It should include the following keys:
    - `name`: the name of the kernel specification
    - `language`: the language that the kernel runs
    - `codemirror_mode`: (optional) the codemirror mode to use when displaying the notebook
  - `signature`: a string containing the hash of the notebook, for verification purposes
  - `orig_nbformat`: if the notebook was converted from a different format, this should be an integer indicating the major version of that format
- `nbformat_minor`: notebook format minor number
- `nbformat`: notebook format major number (should be 4)
- `cells`: an array of cells, which should be of type `raw`, `markdown`, `heading`, or `code`.
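As an illustration of the outline above, the following sketch builds a top-level notebook structure as a Python dictionary and runs a few hand-written checks on it. The values (such as the kernel name) and the `check_top_level` helper are hypothetical; the real validation is meant to be done against the JSON schema, not with code like this.

```python
# Illustrative v4-style notebook skeleton; all values are made up.
notebook = {
    "metadata": {
        "kernel_info": {
            "name": "python3",            # hypothetical kernel spec name
            "language": "python",
            "codemirror_mode": "python",  # optional
        },
        "orig_nbformat": 3,
    },
    "nbformat": 4,
    "nbformat_minor": 0,
    "cells": [],
}

ALLOWED_CELL_TYPES = {"raw", "markdown", "heading", "code"}


def check_top_level(nb):
    """Return a list of problems found in the top-level structure."""
    problems = []
    if nb.get("nbformat") != 4:
        problems.append("nbformat should be 4")
    if not isinstance(nb.get("nbformat_minor"), int):
        problems.append("nbformat_minor should be an integer")
    metadata = nb.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata should be an object")
        metadata = {}
    kernel_info = metadata.get("kernel_info")
    if kernel_info is not None:
        # `name` and `language` are required once kernel_info is present.
        for key in ("name", "language"):
            if key not in kernel_info:
                problems.append("kernel_info is missing %r" % key)
    cells = nb.get("cells")
    if not isinstance(cells, list):
        problems.append("cells should be an array")
    else:
        for i, cell in enumerate(cells):
            if cell.get("cell_type") not in ALLOWED_CELL_TYPES:
                problems.append("cell %d has an unknown cell_type" % i)
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report everything that is wrong with a notebook in a single pass.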
In general, cells should have:
- `cell_type`: a string indicating the cell type, one of "raw", "markdown", "heading", or "code"
- `metadata`: an object containing any cell-level metadata. There are two reserved keys, which are optional but if used must conform to the following format (see also IPEP 20):
  - `name`: a non-empty string representing the cell's name
  - `tags`: an array of cell tags, each of which is a string. Tags should not contain commas, and should be unique.
- `source`: a "multiline string", which is either an array of strings that will be concatenated, or a single string
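The "multiline string" convention can be handled with two small helpers. This is a sketch with hypothetical function names, assuming the array form is concatenated with no separator and that each element keeps its own trailing newline.

```python
def join_multiline(source):
    """Return a multiline string field as one str.

    Accepts either a single string or an array of strings to concatenate.
    """
    if isinstance(source, str):
        return source
    return "".join(source)


def split_multiline(text):
    """Split a str into an array of lines, keeping the line endings."""
    return text.splitlines(keepends=True)
```

For example, `join_multiline(["a = 1\n", "print(a)"])` and `join_multiline("a = 1\nprint(a)")` give the same text, so code reading a notebook does not need to care which form was stored.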
Raw cells have an additional reserved metadata key:
- `format`: a string indicating the raw cell format for use with nbconvert
Markdown cells have no additional properties.
Heading cells should have one additional property:
- `level`: an integer from 1-6 indicating the heading level
Code cells should have a few additional properties:
- `outputs`: an array of outputs; see the Output formats section below
- `prompt_number`: the cell's prompt number, which is either an integer value or null
Code cells also have a few additional reserved metadata keys:
- `collapsed`: a boolean indicating whether the cell is collapsed or expanded
- `autoscroll`: a value indicating whether the cell should be autoscrolled; should be one of `true`, `false`, or "auto"
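Putting the code cell properties together, a cell might look like the dictionary below. The cell contents and the `check_code_cell` helper are hypothetical and only illustrate the constraints listed above; JSON `null`, `true`, and `false` appear as `None`, `True`, and `False` once loaded into Python.

```python
# Illustrative code cell; the contents are made up.
code_cell = {
    "cell_type": "code",
    "metadata": {
        "collapsed": False,
        "autoscroll": "auto",
        "tags": ["demo"],
    },
    "source": ["x = 1\n", "x + 1"],
    "prompt_number": 1,  # None (JSON null) if the cell has not been run
    "outputs": [],
}


def check_code_cell(cell):
    """Return a list of problems with the code-cell-specific fields."""
    problems = []
    prompt = cell.get("prompt_number")
    if prompt is not None and not isinstance(prompt, int):
        problems.append("prompt_number should be an integer or null")
    if not isinstance(cell.get("outputs"), list):
        problems.append("outputs should be an array")
    metadata = cell.get("metadata", {})
    if "collapsed" in metadata and not isinstance(metadata["collapsed"], bool):
        problems.append("collapsed should be a boolean")
    if "autoscroll" in metadata:
        value = metadata["autoscroll"]
        # A type check avoids accepting 0 and 1, which compare equal to
        # False and True in Python.
        if not (isinstance(value, bool) or value == "auto"):
            problems.append('autoscroll should be true, false, or "auto"')
    return problems
```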
There are four different types of outputs that may be associated with a code cell: `execute_result` (the result of executing the cell), `display_data` (data that is displayed from the cell), `stream` (text that is printed from a stream, usually standard out), and `error` (the traceback that is produced when an error occurs).
All output formats should have the following properties:
- `output_type`: a string, either "execute_result", "display_data", "stream", or "error"
- `metadata`: an object containing output metadata. This is mainly used for `execute_result` and `display_data` outputs, and should include the same mimetype keys as the output itself. See also IPEP 13.
The `execute_result` output should have the following additional properties:
- `prompt_number`: the prompt number of the output (should be the same as the cell's prompt number)
- mimetype: the key itself should be a valid mimetype (e.g., "text/plain" or "image/png"). The value should be either a string, or an array of strings.
The `display_data` output should have the following additional properties:
- mimetype: the key itself should be a valid mimetype (e.g., "text/plain" or "image/png"). The value should be either a string, or an array of strings.
The `stream` output should have the following additional properties:
- `name`: a string denoting the stream type or destination (e.g. "stdout")
- `data`: the stream's text output, which is a "multiline string" (stored as either a single string, or an array of strings).
The `error` output should have the following additional properties:
- `ename`: the name of the error
- `evalue`: the value, or message, of the error
- `traceback`: the error's traceback, represented as an array of strings
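The four output types described above might look as follows when written as Python dictionaries. All values are invented for illustration (including the `width` metadata entry and the elided image data), and `mimetype_keys` is a hypothetical helper showing how the mimetype keys can be told apart from the fixed properties.

```python
# Illustrative outputs, one per output type; all values are made up.
outputs = [
    {
        "output_type": "execute_result",
        "metadata": {"text/plain": {}},
        "prompt_number": 1,
        "text/plain": "2",
    },
    {
        "output_type": "display_data",
        "metadata": {"image/png": {"width": 640}, "text/plain": {}},
        "image/png": "<base64 data elided>",
        "text/plain": "<Figure>",
    },
    {
        "output_type": "stream",
        "metadata": {},
        "name": "stdout",
        "data": ["hello\n", "world\n"],
    },
    {
        "output_type": "error",
        "metadata": {},
        "ename": "ZeroDivisionError",
        "evalue": "division by zero",
        "traceback": ["Traceback (most recent call last):", "..."],
    },
]

# Properties that are part of the output structure, not mimetype data.
FIXED_KEYS = {"output_type", "metadata", "prompt_number"}


def mimetype_keys(output):
    """Return the sorted mimetype keys of a rich output, or [] otherwise."""
    if output["output_type"] not in ("execute_result", "display_data"):
        return []
    return sorted(key for key in output if key not in FIXED_KEYS)
```

Because the mimetype keys sit next to the fixed properties, a consumer has to subtract the known property names to find them, which is what the helper does.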