Reproducible Research - jonathancolmer/lab-guide GitHub Wiki

Fundamentals of Reproducible Research

Reproducibility is essential. It ensures that results can be independently verified, extended, and trusted — both by others and by your future self. This page explains why reproducibility matters in our lab, what it looks like in practice, and how to make it part of your day-to-day workflow.


Why Reproducibility Matters

  • Transparency — Public data and code allow others to replicate, extend, and evaluate your work.
  • Knowledge generation — Open data only leads to insight when paired with code and documentation.
  • Requirements — Many journals and funders now require reproducibility packages.
  • Efficiency — Saves time when you revisit or extend your own work.

“Out of 100+ reproducibility packages submitted in FY24, only 16% fully reproduced on first try.” — Lars Vilhuber, AEA Data Editor Report (2023)

Reproducibility is not an extra step at the end — it’s a mindset applied from the start of a project.


Common Problems Without Reproducibility

  • Confusion over which dataset version is correct.
  • Scripts that fail when rerun months later because software, file paths, or data have changed.
  • Poorly named files like final_final2_noReallyFINAL.do.
  • Hours spent trying to recreate results, only to get different answers.
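One way to guard against the first and last problems above is to record a checksum for every raw data file, then check the files against that record before each run. The following is a minimal sketch, not a lab standard: the function names, the `data/raw` folder, and the `data_manifest.json` file are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> dict:
    """Record a checksum for every file under data_dir."""
    checksums = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(checksums, indent=2, sort_keys=True))
    return checksums


def changed_files(data_dir: Path, manifest: Path) -> list:
    """List recorded files that are now missing or have different contents."""
    recorded = json.loads(manifest.read_text())
    changed = []
    for name, expected in recorded.items():
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed


# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    raw = root / "data" / "raw"
    raw.mkdir(parents=True)
    (raw / "survey.csv").write_text("id,value\n1,10\n")
    manifest = root / "data_manifest.json"  # kept outside the data folder
    write_manifest(raw, manifest)
    unchanged = changed_files(raw, manifest)   # nothing edited yet
    (raw / "survey.csv").write_text("id,value\n1,11\n")
    after_edit = changed_files(raw, manifest)  # the silent edit is detected
```

This sketch only flags recorded files that changed or disappeared. It does not report new files added after the manifest was written.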

Core Practices for Reproducibility

  1. Clear Folder Structure
    Treat your folder system as project scaffolding. Organize data, code, documentation, and outputs logically.

  2. Modular, Reusable Scripts
    Avoid ad hoc console commands. Break work into scripts with clear inputs and outputs.

  3. Meaningful Comments
    Explain why you did something, not just what you did.

  4. Documentation
    Record:

    • Which data you used and where it came from.
    • Purpose of each script.
    • Key analysis decisions.

  5. Changelog or Research Log
    Track important changes over time: what changed, when, and why.

  6. Version Control
    Use Git instead of appending dates or “final” to filenames.

  7. Peer Review
    Have someone else read your code before archiving or submitting.
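Practices 1 to 3 can be illustrated together. The sketch below assumes a hypothetical folder layout and a made-up cleaning step (`clean_survey`). It is one possible arrangement, not the lab's prescribed structure. The step takes an explicit input path and output path, and its comment explains why rows are dropped rather than only stating that they are.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical layout; rename the folders to match your project.
FOLDERS = ["data/raw", "data/derived", "code", "docs", "output"]


def scaffold(root: Path) -> None:
    """Create the project folders if they do not already exist."""
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)


def clean_survey(in_path: Path, out_path: Path) -> int:
    """One pipeline step: read one raw file, write one derived file.

    Returns the number of rows kept.
    """
    with in_path.open(newline="") as src:
        rows = list(csv.DictReader(src))
    # Why: in this made-up example a blank value marks non-response, and
    # treating it as zero would bias any average computed downstream.
    kept = [r for r in rows if (r["value"] or "").strip() != ""]
    with out_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Demo in a throwaway directory so nothing is written to the real project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scaffold(root)
    raw_file = root / "data/raw/survey.csv"
    raw_file.write_text("id,value\n1,10\n2,\n3,7\n")
    derived_file = root / "data/derived/survey_clean.csv"
    n_kept = clean_survey(raw_file, derived_file)
    derived_lines = derived_file.read_text().splitlines()
```

Because the step never touches the raw file and always writes to a named output, it can be rerun at any time without changing its inputs.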


Version Control

Version control systems like Git are critical for collaborative research:

  • Track every change to code and documentation.
  • Collaborate without overwriting each other’s work.
  • Roll back to earlier versions when something breaks.
  • Avoid clutter from multiple versions of the same file.
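Version control also makes it possible to tie a result to the exact code that produced it. As a hedged illustration, the helper below asks Git for the current commit hash with `git rev-parse HEAD` so it can be written into a log or table note. The helper names are hypothetical, and the function falls back to "unknown" when Git is not installed or the folder is not a repository.

```python
import subprocess
from pathlib import Path


def current_commit(repo: Path) -> str:
    """Return the current Git commit hash, or "unknown" if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git missing, folder missing, or folder is not a repository.
        return "unknown"
    return result.stdout.strip()


def provenance_line(repo: Path) -> str:
    """Build a one-line note to store alongside generated outputs."""
    return f"Generated from commit {current_commit(repo)}"
```

A call such as `provenance_line(Path("."))` could be appended to an output log. Note that this sketch does not check for uncommitted changes, so the recorded hash is only meaningful if the working tree was clean when the code ran.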

Code as Research Output

Your code is part of your intellectual contribution:

  • Write it as if someone else will use it (because they will).
  • Clean, readable code improves transparency and makes collaboration easier.
  • Good code lowers onboarding costs for new lab members.

Reproducibility in a Lab Setting

In team projects, reproducibility:

  • Makes onboarding faster and smoother.
  • Helps others trace, verify, and build on your work.
  • Reduces duplication and confusion.
  • Improves internal transparency, even for non-public projects.

What Reproducibility Is — and Is Not

It is:

  • Clean, traceable, revisitable work from the beginning.
  • Documentation + data + code forming a self-contained package.

It is not:

  • Publishing all project materials.
  • Perfection or over-engineering.
  • A task to leave until the end.

Goal:
A third party (including your future self) can reproduce the exact findings using only the provided data, code, and documentation.
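A simple way to test this goal on your own machine is to run the analysis twice and compare the outputs. The sketch below uses a toy stand-in for a real pipeline (a bootstrap of a small, made-up sample). The function names and the seed value are assumptions for illustration. The key idea is that any randomness comes from an explicitly seeded generator.

```python
import hashlib
import random
import statistics


def run_analysis(seed: int) -> str:
    """Toy stand-in for a pipeline: bootstrap the mean of a fixed sample."""
    rng = random.Random(seed)  # local generator, no hidden global state
    sample = [2.0, 3.5, 5.0, 7.5, 11.0]  # made-up data
    means = [
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(200)
    ]
    return f"{statistics.fmean(means):.6f}"


def fingerprint(text: str) -> str:
    """Hash an output so two runs can be compared exactly."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two runs with the same seed should give identical fingerprints.
first = fingerprint(run_analysis(seed=20240101))
second = fingerprint(run_analysis(seed=20240101))
reproduces = first == second
```

Matching fingerprints on one machine is a necessary check, not a sufficient one: a third party may still get different results if software versions or data differ.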


Summary Checklist

  • Logical, documented folder structure.
  • Modular scripts with clear inputs/outputs.
  • Comments explaining why.
  • Up-to-date README and changelog.
  • Version control with Git.
  • Code reviewed before submission/archiving.
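Parts of this checklist can be checked automatically. The sketch below looks for a few expected files and folders in a project root. The specific names (`README.md`, `CHANGELOG.md`, `code`, `data`) are assumptions and should be adapted to the project. Items such as comment quality and code review still need a human.

```python
import tempfile
from pathlib import Path

# Hypothetical names; adjust to the conventions your project uses.
REQUIRED = {
    "README": "README.md",
    "changelog": "CHANGELOG.md",
    "Git repository": ".git",
    "code folder": "code",
    "data folder": "data",
}


def audit(root: Path) -> dict:
    """Map each checklist item to True if the expected path exists."""
    return {label: (root / rel).exists() for label, rel in REQUIRED.items()}


def missing_items(root: Path) -> list:
    """Return the checklist items that were not found under root."""
    return [label for label, found in audit(root).items() if not found]


# Demo on a throwaway project that has only a README and a code folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("Project overview\n")
    (root / "code").mkdir()
    missing = missing_items(root)
```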

Related Resources