Overview - norrissam/ra-manual GitHub Wiki

Our workflow for joint projects draws extensively from the Gentzkow and Shapiro RA manual on GitHub. In some places we simply link to their manuals or quote them directly. This workflow is stripped down relative to the full Gentzkow and Shapiro workflow. The simpler workflow has the benefit of being accessible to a broader range of collaborators who cannot pay the fixed costs of using the full system. An excellent overview of the Gentzkow-Shapiro approach is the PDF Code and Data for the Social Sciences: A Practitioner's Guide. While outdated on some specifics (e.g., SVN vs. GitHub), it is a worthwhile read on principles for managing code and data.

The workflow has three core principles:

  1. One-stroke production. The entire project, from initial data to all final results, tables, and figures, can be run from one command, typically code/master.do. This means that all intermediate steps (e.g., importing and analyzing data, inserting a CSV output table into the paper, compiling the paper) are fully automated. This prevents us from, for example, changing a data prep routine but forgetting to regenerate the results that depend on it.
  2. Coding for replication. At the end of the project, we will publicly post all code and data that we are legally able to post.
  3. Unambiguous processes. Once the concept for the project has been decided on, any new collaborator should be able to jump in and pick up where the previous person left off. We keep task management (GitHub Issues) up to date. We keep no half-finished or legacy files in the folders.
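
The one-stroke idea in principle 1 can be illustrated with a minimal sketch. It is written in Python purely for illustration; in our projects the actual entry point is typically code/master.do. The stage commands and file names below are hypothetical and do not come from the manual. The point is only that a single entry point runs every stage in order and stops at the first failure, so no downstream output can be stale.

```python
import subprocess
from typing import Callable, List, Sequence

Command = Sequence[str]

# Hypothetical stages, listed in dependency order. These script and file
# names are assumptions for illustration, not files from this project.
STAGES: List[Command] = [
    ["python", "code/prepare_data.py"],   # raw data -> cleaned data
    ["python", "code/run_analysis.py"],   # cleaned data -> estimates
    ["python", "code/make_tables.py"],    # estimates -> tables and figures
    ["pdflatex", "paper/paper.tex"],      # tables and figures -> compiled paper
]


def run_command(command: Command) -> None:
    """Run one stage; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(command), check=True)


def run_all(
    stages: Sequence[Command],
    runner: Callable[[Command], None] = run_command,
) -> List[Command]:
    """Run every stage in order and return the stages that completed.

    An exception raised by any stage propagates immediately, so later
    stages never run on top of a failed earlier one.
    """
    completed: List[Command] = []
    for stage in stages:
        runner(stage)
        completed.append(stage)
    return completed
```

Calling `run_all(STAGES)` would then rebuild everything from raw data to the compiled paper. Passing a different `runner` makes it possible to log or dry-run the stages without executing them.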