Task and Code Management Guide - uchicago-bfi-gnlab/lab_manual GitHub Wiki

Principles

Work consists of code (GitHub, Bitbucket, Databricks) and output in a task management system (GitHub, JIRA, Overleaf). In GitHub, this workflow is entirely self-contained and automatic. On other platforms, we try to re-create GitHub's functionality.

Independent work

  • All work should be associated with a task. If no task exists, create one.
  • Each task begins with an objective which explains how the task fits into the paper.
  • New output is recorded in two ways:
    • A commit.
    • A comment in the task. Use @ to tag anyone who should review the task.
  • Commit at least once a day.
  • All code is traceable to the relevant task in three ways: 1) it is saved in a numbered issue branch, 2) it is saved in a numbered directory, and 3) its commits carry the issue number.
  • Output should be presented chronologically. If you are making a new version of a previously-posted plot, do not revise your old work. Instead post a new comment.
  • Within each task, we want to be able to trace each message to the code that generates the exhibits.

Pull requests

  • A small subset of work will be merged to the main branch.
  • Before merging, your code must be reviewed via a pull request. Ask at standup who should review your code.

Table of Contents

  1. Task Management in GitHub. Here you will learn the core elements of our GitHub workflow and task management practices.
  2. Code Management Guidelines. This part discusses cross-platform practices we use to structure code and store output with Git.
  3. Additional Resources and Practices. Includes practices for non-GitHub platforms and some additional helpful resources.

1. Task Management in GitHub

When a project generates a new workflow, this is memorialized in the form of a GitHub issue ticket. These can be created by PIs and RAs alike, and the first comment summarizes the goals and objectives of the ticket, alongside concrete action steps to get the ball rolling.

An issue may include one or more tasks, which will be iterated on between RAs and PIs through comments.

When you are assigned a task, you must manually place that task in the appropriate project, and in your column of that project.

Life Cycle of a Task

In general, a task's life will consist of up to 4 stages:

  1. Task is ticketed and assigned.
  2. Assigned RAs and PIs log output, interpretations and questions.
  3. Back and forth with PIs and other RAs determining next steps and revisiting old to-dos.
  4. If output or code needs merging into another branch, create a PR and request peer/PI Review.

Comments

To solicit a review of a new comment by other RAs or by PIs:

  • The reviewers should be @-tagged explicitly in the comment. Do not tag someone if you are just mentioning them.
  • If you have something you want another lab member to see immediately (i.e. within the work-day, when we may have our email closed), in addition to tagging in a GitHub comment, please also send a notification in Slack, ideally with a link to the relevant GitHub issue.

Comments may also be used to memorialize output and interpretation as notes, but in this case, it should be clearly stated at the beginning of the comment (e.g. using a "Notes to Self" header).

Regarding feedback:

  • It is up to you to judge the optimal time to request feedback.
  • You should usually not send results until you have spent some time making sure they are correct and make sense.
  • When you do request feedback, you should provide a clear and concise summary making clear the situation and exactly what input you need. At the same time, you should not feel shy about requesting feedback when you are confident it will be efficient and valuable.

Completing a task

  • It is up to the assignee to judge when the objectives in the task description plus any issues that have come up in the comment stream have been resolved. You should not depend on PG / PN to judge this, and you should not typically request feedback of the form -- "can you look over this task and tell me if you think it's complete?"
  • When an assignee thinks a task is complete, they should:
    • Ensure that the deliverable meets our standards as outlined above.
    • If there are checkboxes that are deferred, cut-and-paste those checkboxes from the top comment into the comment that closes the issue.
    • If applicable, create a pull request (see this guide).
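
The checkbox step above can be partly automated. The following is a minimal sketch, assuming the top comment uses GitHub task-list syntax (`- [ ]` for open items, `- [x]` for completed ones). The function names and the "Deferred:" heading are hypothetical, not part of the lab's tooling.

```python
import re

# Matches task-list items that are still unchecked, e.g. "- [ ] Rerun with new sample".
UNCHECKED = re.compile(r"^\s*[-*]\s+\[ \]\s+(.*\S)\s*$")


def deferred_checkboxes(top_comment):
    """Return the unchecked task-list items from an issue's top comment."""
    items = []
    for line in top_comment.splitlines():
        match = UNCHECKED.match(line)
        if match:
            # Normalize every item to the "- [ ] text" form.
            items.append(f"- [ ] {match.group(1)}")
    return items


def closing_comment(summary, top_comment):
    """Build a closing comment that carries deferred checkboxes forward."""
    deferred = deferred_checkboxes(top_comment)
    if not deferred:
        return summary
    # "Deferred:" is a hypothetical heading; use whatever wording the thread needs.
    return summary + "\n\nDeferred:\n" + "\n".join(deferred)
```

Paste the returned text into the comment that closes the issue, then check it against the top comment by eye.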

2. Code Management Guidelines

These guidelines apply across platforms both inside and outside any firewalls.

Branch and Directory Structure

  • We try to follow the directory structure used by gslab. They published a template here.
  • When you create a new issue branch, create a folder with the same name as the branch, i.e. issue_XXX_issue_description/, with input, source, and release subfolders. All new inputs, scripts, and output should be housed in these subfolders. If and when it is time to merge to the main branch, migrate the subset of material needed to produce the ticket's paper output to the equivalent analysis subfolders and check that paths don't break.
  • Commit messages begin with the associated issue number (#XXX in GitHub, [PROJECTNAME-XXX] in JIRA).
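
The naming conventions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lab's actual tooling: the helper names are hypothetical, the issue number is not zero-padded, and the description is lowercased with underscores.

```python
import re
from pathlib import Path
from typing import Optional

SUBFOLDERS = ("input", "source", "release")


def issue_dir_name(issue_number: int, description: str) -> str:
    """Build a name of the form issue_XXX_issue_description."""
    # Assumption: lowercase, with runs of non-alphanumerics collapsed to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"issue_{issue_number}_{slug}"


def scaffold_issue_dir(repo_root: Path, issue_number: int, description: str) -> Path:
    """Create the issue folder with its input, source, and release subfolders."""
    issue_dir = repo_root / issue_dir_name(issue_number, description)
    for sub in SUBFOLDERS:
        (issue_dir / sub).mkdir(parents=True, exist_ok=True)
    return issue_dir


def commit_message(issue_number: int, text: str, jira_project: Optional[str] = None) -> str:
    """Prefix a commit message with #XXX (GitHub) or [PROJECTNAME-XXX] (JIRA)."""
    if jira_project:
        prefix = f"[{jira_project}-{issue_number}]"
    else:
        prefix = f"#{issue_number}"
    return f"{prefix} {text}"
```

For example, `commit_message(42, "Add event study plot")` returns `#42 Add event study plot`. When Git opens an editor for the message, lines starting with `#` are treated as comments by default, so passing the message with `git commit -m` is likely the simpler route for GitHub-style prefixes.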

Task deliverables:

  • Each task should have a final deliverable summarizing the results.
  • The form of the deliverable may be:
    • Content added to the draft, slides, or online appendix in the repository.
    • A markdown file stored in an issue_###_brief_description subdirectory.
    • A final summary comment in the task comment thread.
  • The deliverable should be self-contained. It should usually begin with a concise summary of the conclusions (e.g., the answer to an empirical question), followed by supporting detail. A reader should be able to learn all relevant results of a task from the deliverable without looking back at the comment thread or task description. The source of any figures, tables, or statistics should also be clear; this is automatic for markdown files and for checked-in drafts, slides, etc., as long as figures and tables are produced from links to checked-in files in the repository.
  • The final comment at the time a task is closed should include a clear, revision-stable pointer to the deliverable -- usually a link along with additional information if needed (e.g., relevant page / table / figure numbers in the draft).
  • The issue_###_brief_description subdirectory should be deleted before merging into the main branch. The revision-stable link at the end of the task thread will still work.
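
One way to get a revision-stable pointer on GitHub is to link to a file at a specific commit rather than on a branch. The sketch below builds such a link using GitHub's `blob/<commit-sha>/<path>` URL layout. The function name and the example owner, repository, and path are hypothetical.

```python
import re
from urllib.parse import quote

FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def file_permalink(owner: str, repo: str, commit_sha: str, path: str) -> str:
    """Return a GitHub link to `path` pinned to one commit."""
    # A branch name here would give a link that moves as the branch changes,
    # so insist on a full commit SHA.
    if not FULL_SHA.match(commit_sha):
        raise ValueError("expected a full 40-character commit SHA, not a branch name")
    # quote() leaves "/" intact but escapes spaces and similar characters.
    return f"https://github.com/{owner}/{repo}/blob/{commit_sha}/{quote(path.lstrip('/'))}"
```

`git rev-parse HEAD` prints the full SHA of the current commit. Pressing `y` while viewing a file on GitHub also rewrites the address bar to the commit-pinned form.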

3. Additional Resources and Practices

Non-GitHub Workflows

Overleaf Threads

  • For certain tasks, especially ones for work behind a firewall where GitHub is unavailable, work is recorded in an Overleaf document.
  • Every .tex document on Overleaf must be associated with a GitHub or JIRA issue and a branch. Concretely, our Overleaf projects have an issues folder, where you will create a new sub-folder called gh_issue_XXX_issue_description for GitHub and jira_issue_XXX_issue_description for JIRA.
  • You should strive to recreate the comment flow that you would get in GitHub.
    • For each new entry, add a section with the date, in the format yyyy-mm-dd. New comments should go at the bottom of the document.
    • Since we cannot see commits in Overleaf, please add a permalink to the commit containing the work discussed in your comment.
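
The Overleaf conventions above can be sketched as two small helpers: one for the sub-folder name and one for a dated entry to paste at the bottom of the .tex file. This is a minimal illustration; the function names are hypothetical, and the `\url` command assumes the `hyperref` or `url` package is loaded in the project preamble.

```python
from datetime import date

PREFIXES = {"github": "gh_issue", "jira": "jira_issue"}


def overleaf_folder(platform: str, issue_number: int, description_slug: str) -> str:
    """Return gh_issue_XXX_issue_description or jira_issue_XXX_issue_description."""
    return f"{PREFIXES[platform]}_{issue_number}_{description_slug}"


def overleaf_entry(entry_date: date, commit_url: str, body: str) -> str:
    """Return a LaTeX section dated yyyy-mm-dd that opens with the commit permalink."""
    return (
        f"\\section{{{entry_date.isoformat()}}}\n"
        f"Commit: \\url{{{commit_url}}}\n\n"
        f"{body}\n"
    )
```

For example, `overleaf_entry(date.today(), commit_url, "Updated Figure 2 with the new sample.")` gives the text for a new entry, where `commit_url` is the permalink copied from wherever the code is hosted.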

JIRA Threads

  • When work lives behind a firewall, another common task management platform is JIRA.
  • JIRA comments function much like GitHub comments, but are a little clunkier and have fewer features.
  • As in Overleaf, at the top of each of your comments, add a permalink to the relevant commit.
  • JIRA does not support LaTeX typesetting, so if you're planning on writing lots of equations, either switch to Overleaf, or attach a PDF write-up to your comment.

Git and GitHub Tips and Tricks

Tables in Comments

Comments to a GitHub ticket may frequently include tables as a useful summary of output. Some notes/recommendations:

  • This site is a helpful resource for creating markdown tables to embed in GitHub comments.
  • A particularly useful feature this site offers is importing existing tables from GitHub (or other Markdown sources).
    • Copy the text in the table and then click "File -> Paste table data".
    • This often doesn't work for reading in formatting (e.g. bold text, LaTeX, etc) but does consistently get the table structure and content right.
    • This is very useful for making edits (e.g. adding rows/columns) that are annoying to do by hand.
  • The site can also help make LaTeX tables, but it's only really useful for a first pass; you will have to fine-tune the formatting manually.
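
When the table comes out of a script, it may be easier to generate the markdown directly than to paste it through a site. The following is a minimal sketch, not part of the lab's shared code: `markdown_table` is a hypothetical helper that renders a header and rows as a GitHub-flavored markdown table.

```python
def markdown_table(headers, rows):
    """Render headers and rows as a GitHub-flavored markdown table."""

    def fmt(cells):
        # A literal "|" inside a cell must be escaped so it is not read as a column break.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    for row in rows:
        if len(row) != len(headers):
            raise ValueError("every row must have one cell per header")

    lines = [fmt(headers), fmt(["---"] * len(headers))]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)
```

Printing `markdown_table(["Spec", "Coef."], [["Baseline", 0.12]])` gives three lines that can be pasted straight into a GitHub comment.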

Useful resources

  • Intro to Git: here.
  • Using Git from the command line: here and here.
  • How to delete local branches that have been deleted from the remote.
  • Hand-holding Git basics tutorials: here and here.
  • Visualizations of more advanced Git commands.
  • To search for something across all branches, use: git log --all -- '**/file_name'.
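
The cross-branch search above can also be run from a script. This is a minimal sketch with hypothetical function names. It wraps the same `git log --all -- '**/file_name'` command and assumes `git` is on the PATH and `repo_dir` is inside a repository.

```python
import subprocess


def search_all_branches_cmd(file_name):
    """Build the argument list for: git log --all -- '**/file_name'."""
    # Passing arguments as a list bypasses the shell, so the pattern needs no quoting.
    return ["git", "log", "--all", "--", f"**/{file_name}"]


def search_all_branches(file_name, repo_dir="."):
    """Run the search and return git's output as text."""
    result = subprocess.run(
        search_all_branches_cmd(file_name),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    return result.stdout
```

The output lists every commit, on any branch, that touched a file matching the pattern.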

gnlab's template repo

  • Our lab's template repo template_repo can be found here.
  • New repos should be created using this template repo.
  • The template repo maintains the script prelim.R containing a number of useful common functions shared in gnlab.