Home - sidaw/codalab-worksheets GitHub Wiki

Why CodaLab Worksheets?

While there has been tremendous progress in machine learning, data science, natural language processing, computer vision, and many other data- and computation-intensive fields, the research process is far from optimal. Most of the time, the output of research is simply a PDF file (published paper). Even when people release their data and code (which is a big step forward), it is often not obvious how to run it to obtain the results in a paper. Simply put:

Today, researchers spend excruciating amounts of time reproducing published results.

The goal of CodaLab Worksheets is to fix this in order to both accelerate the rate of research and make it more sound.

How does CodaLab Worksheets work?

CodaLab keeps the full provenance of an experiment, from raw data to the final performance numbers that you put in your paper.

There are two important concepts in CodaLab: bundles and worksheets.

  • Users upload bundles, which are datasets in any format or programs in any programming language. They can also create new run bundles by executing shell commands that depend on the contents of previous bundles. This forms a graph over bundles that captures the research process in an immutable way.
  • Users create worksheets, which contain pointers to the bundles, to present the information in a comprehensible way. Worksheets are written in a custom markdown language.
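To make the idea of an immutable bundle graph concrete, here is a minimal Python sketch. It is not CodaLab's actual data model: the `Bundle` class, the `provenance` helper, and the four example bundle names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Bundle:
    """Hypothetical stand-in for a bundle; not CodaLab's real schema."""
    name: str
    command: Optional[str] = None  # None for uploaded bundles, set for run bundles
    dependencies: Tuple["Bundle", ...] = ()


def provenance(bundle: Bundle) -> List[str]:
    """Return the names of everything `bundle` was derived from, dependencies first."""
    ordered: List[str] = []
    seen = set()

    def visit(b: Bundle) -> None:
        if b.name in seen:
            return
        seen.add(b.name)
        for dep in b.dependencies:
            visit(dep)
        ordered.append(b.name)

    visit(bundle)
    return ordered


# Two uploaded bundles and two run bundles (names and commands are made up).
code = Bundle("code")
data = Bundle("data")
model = Bundle("model", command="python code/train.py data", dependencies=(code, data))
results = Bundle(
    "results",
    command="python code/evaluate.py model data",
    dependencies=(code, model, data),
)

print(provenance(results))  # ['code', 'data', 'model', 'results']
```

Because each bundle is frozen and only ever points at earlier bundles, walking the dependencies of a final result recovers the full chain back to the raw uploads.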

The figure below shows the dependency graph over four bundles, along with two worksheets, which contain both text and pointers to the bundles:

A run bundle is specified by a set of bundle dependencies and an arbitrary shell command. This shell command is executed in a Docker container, in a directory that contains the dependencies. The contents of the run bundle are the files and directories that the shell command writes to the current directory:
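The mechanics described above can be approximated locally with a short Python sketch. It makes several assumptions: it runs the command with a plain subprocess in a temporary directory rather than in a Docker container, it assumes a POSIX shell, and the `run_bundle` function and file names are hypothetical rather than part of CodaLab.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_bundle(command, dependencies):
    """Run `command` in a fresh directory populated with `dependencies`.

    `dependencies` maps the name a dependency should have inside the working
    directory to an existing file or directory on disk. Returns the working
    directory and the sorted names of the entries the command created.
    """
    workdir = Path(tempfile.mkdtemp(prefix="run-bundle-"))
    for name, source in dependencies.items():
        source = Path(source)
        if source.is_dir():
            shutil.copytree(source, workdir / name)
        else:
            shutil.copy(source, workdir / name)

    # No container isolation here; the real system runs this step in Docker.
    subprocess.run(command, shell=True, cwd=workdir, check=True)

    # Everything that is not a dependency counts as the run bundle's contents.
    outputs = sorted(p.name for p in workdir.iterdir() if p.name not in dependencies)
    return workdir, outputs


# Example: an "uploaded" data file and a run that sorts it.
upload_dir = Path(tempfile.mkdtemp(prefix="upload-"))
(upload_dir / "data.txt").write_text("b\na\nc\n")

workdir, outputs = run_bundle(
    "sort data.txt > sorted.txt",
    {"data.txt": upload_dir / "data.txt"},
)
print(outputs)  # ['sorted.txt']
```

Copying the dependencies into a fresh directory keeps the uploaded files untouched, which mirrors the idea that earlier bundles are never modified by later runs.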

CodaLab's philosophy is to give you full control over how you run your experiments and otherwise stay out of your way. It only maintains the dependency structure of your experiments and takes care of the actual execution. A good analogy is Git, which maintains the revision history but gives you total freedom over what to put in your repository.

How do I learn more?

Where do I report bugs?

CodaLab is under active development. If you find bugs or have feature requests, please file a GitHub issue: