Github guide - ganong-noel/lab_manual GitHub Wiki
Github (or Bitbucket/JIRA) Tasks
Note: Our conventions are based heavily on gslab
guidelines.
A task (or issue) is a concrete piece of work for a project. Each task has a reporter (the person who created it), one or more assignees and a first comment ("task description") detailing the steps to carry out and the deliverable to produce.
Task deliverables:
- Each task should have a final deliverable summarizing the results.
- The form of the deliverable may be:
- Content added to the draft, slides, or online appendix in the repository
- A markdown file stored in an
issue_###_brief_description
subdirectory. - A final summary comment in the task comment thread (only for small tasks where the deliverable content is no more than a paragraph or so of text)
- The deliverable should be self-contained. It should usually begin with a concise summary of the conclusions (e.g., answer to an empirical question), followed by supporting detail. A user should be able to learn all relevant results from a task from the deliverable without looking back at the comment thread or task description. The source of any figures, tables, or statistics should also be clear; this will be automatic for markdown files or checked in drafts, slides, etc. so long as figures and tables are produced form links to checked in files in the repository.
- The final comment at the time a task is closed should include a clear, revision-stable pointer to the deliverable -- usually a link along with additional information if needed (e.g., relevant page / table / figure numbers in the draft).
- If an
issue_###_brief_description
subdirectory is used, it should be versioned in git within the issue branch. - The
issue_###_brief_description
subdirectory should be deleted before merging into the master branch. The revision stable link at the end of the task thread will still work.
Lifecycle of a task
- Task created.
- Assignee logs work and asks questions in comment threads.
- Assignee completes task
- Peer review
- PI review
Details on lifecycle
Creating a task:
- Anyone can create a task.
- Every task should have an objective at the top which states the overall goal of the ticket
- Every task should be assigned to a "project". Unfortunately we are not aware of a way to automate this step, so it must be done manually.
- If PG or PN assigns you to do something by e-mail or in a face-to-face meeting, you are responsible for recording this in a GitHub issue. This should be your first step after the meeting so we don't lose track of tasks.
- If you create the issue, tag PG or PN to let them know this has been created. Keep the ticket in your column. They will edit the ticket if necessary.
- Every task for an RP should have a specific assignee, though sometimes the assignment is implicit rather than explicit. If you create a task for yourself (or a PI creates one for you) and the task enters directly into your column, the "projects" heading in the ticket will reflect the assignment and so the "assignee" heading can be left blank.
- If the task is not clear to the assignee, they should always ask for clarification from one of the PIs. This is a good use of daily check-in or of just dropping by the PI office.
- The assignee should create a separate git branch for each task that involves code or output stored in the repository. The new branch should be called
issue_###_brief_description
, where ### corresponds with the issue number. Make sure you do not push your changes tomaster
before the task is completed.- Additional branches can be made off of the main
issue_###_brief_descriptionbranch
(note underscores, not hyphens). These sub-issue branches should be merged into the mainissue_###_brief_description
branch, not into the master branch. - Note that a branch off of a main
issue_###_brief_descriptionbranch
might be for a different but related issue. This is acceptable if the separate issue will reuse a bunch of the same code. The same principal applies, which is that this sub branch should be merged back to the mainissue_###_brief_descriptionbranch
, not to master. A downside of this approach is that it delays the merging ofissue_###_brief_descriptionbranch
to master, and so should be done with caution.- An alternative to the sub branch approach is to take just the file that is necessary for the new issue and pull that into the branch. This can be done via "cherry-picking" a commit, making this file a separate issue and merging that to master, or, if necessary, copy/pasting (making sure to maintain the same file name).
- Sometimes, work in a branch needs to be merged to master even when an issue is not complete. In this case, the branch should undergo peer review before being merged to master. However, no PI review is needed for a merge to master.
- Additional branches can be made off of the main
- Changes to the basic goals of a task should be reflected in the task description. Such changes should be rare. If a task has changed completely, it should be closed and replaced with a new task.
- In cases where multiple repos are relevant to a project, tasks should be opened in the repo they correspond to rather than in the general project repo (if the latter exists).
Comment threads:
- You should add regular comments to tasks summarizing progress. The comment threads are your real estate and you are free to include updates as often as you find useful. Preliminary output, "notes to self," etc. are fine.
- If you have a question that requires input from another lab member, you should write a comment, including an '@' reference, that makes clear exactly what input is needed. E.g., '@ganong123, Where would you like me to store the data files?' Users should keep email notifications for '@' references turned on.
- It is up to you to judge the optimal time to request feedback. You should usually not send results until you have spent some time making sure they are correct and make sense. When you do request feedback, you should provide a clear and concise summary making clear the situation and exactly what input you need. At the same time, you should not feel shy about requesting feedback when you are confident it will be efficient and valuable.
- If you have an important interaction about a task outside of GitHub -- in person, over slack, etc. -- add a comment to briefly summarize the content of that interaction and the conclusions reached.
- Any link in a comment or the
/issue_###_brief_description/*
markdown file to other files in the repo should contain a permalink to the file at a specific commit (see here for details). - Any link within an issue to another issue should be hyperlinked using the '#' reference. E.g., #123. Links to issues within the markdown files should also be hyperlinked manually using the same format.
- Your commits should reference the issue that you are working on in the comments. This is useful because it means that the issue will automatically be updated with links to your commits.
- In GitHub, this means that if you are working on ticket 123, your commit message must contain the string
#123
. - In Chase, we use Jira to manage tickets. This means that your commit message must contain
[PROJECTNAME-123]
, where 123 is replaced by your ticket number and PROJECTNAME is replaced by the name of the Jira project containing your ticket.
- In GitHub, this means that if you are working on ticket 123, your commit message must contain the string
- Editing comments
- Try to avoid editing prior comments where possible. Instead, you can mark an old comment as “outdated” if needed.
- If you edit a comment on github there’s no notification generated so you need to manually notify others working on the project with you.
- If you have something you want another lab member to see immediately (i.e. within the work-day, when we may have our email closed), in addition to tagging in a Github comment please also send a notification in Slack, ideally with a link to the relevant Github issue.
Checkboxes
- Completing boxes
- Check a box when you complete a task.
- Sometimes you complete a task, but get an unexpected or undesired result. In this case, use strikethroughs (
task) and do not check the box
- Adding boxes
- It is always OK to add new checkboxes as your understanding of the work to be done evolves or if you want to take a big task and break it into smaller parts. Please do not delete checkboxes without PI approval.
- Sometimes we will propose new checkboxes and ask you to move them to the top of the ticket if you agree. When doing so please keep a copy of the ask inside our comment, but turn it into a bullet point rather than a checkbox.
- Migrating boxes
- Sometimes we migrate checkboxes to another ticket. When doing so, we will keep a copy of the box inside the comment, but turn it into a bullet point rather than a checkbox.
Completing a task
- It is up to the assignee to judge when the objectives in the task description plus any issues that have come up in the comment stream have been resolved. You should not depend on PG / PN to judge this, and you should not typically request feedback of the form -- "can you look over this task and tell me if you think it's complete?"
- When an assignee thinks a task is complete, he/she should:
- Ensure that the deliverable meets our standards as outlined above
- If there are checkboxes that are deferred, cut-and-paste those checkboxes from the top comment into the comment that closes the issue.
- Add or update the description of the scripts to be merged in the
README.md
. A good example of this is inui_public
. - Create a pull request (see below)
Creating a pull request
- Refactor the code to produce just the desired outputs and no others.
- Run
styler
on all scripts that will be in the PR. After, rerun everything to make surestyler
didn't break anything. - Submit a pull request to merge the issue-specific branch into the master branch.
- Pull requests should be titled as follows: "PR #[ORIGINAL ISSUE NO]: [ORIGINAL ISSUE TITLE]"
- Initial pull request comment should
- link to the issue using syntax "Closes #[ORIGINAL ISSUE NO]"
- link to the proposed output
- Time estimate and replication vs correctness (more on this below)
- Choose a reviewer for the pull request
- If possible, avoid pull requests that change many files.
- Tip: if a branch changes many files in an unwanted way because the master branch has changed since the old branch was created, merge changes from master into branch and reproduce output within the branch.
- When a pull request that changes many files is unavoidable, explicitly notify your peer reviewer and remind them to check that “important” files (e.g. main figures of a paper) have not changed.
- Add the pull request to the relevant project and put it in the relevant column on the project board (usually the assigned reviewer's column)
Peer review metadata
Every pull request should clarify: (i) whether the goal is replication or correctness and (ii) time estimate.
For replication
- Verify that the submitter ran
styler
on every script in the PR. If not, ask them to run it and commit the styled code. - Verify that the deliverable is clear, complete, and conforms to the standards above
- Verify that the files committed to the repository conform to our organizational and code style rules
- It is not typically the job of the peer reviewer to go over every detail of the output and every line of code. While commenting on fine points of code style from time to time is fine, for example, this should not be the primary content of the peer review.
- Verify that the code runs
- By default, you should always run the code. If the code could run much more quickly with some edits, suggest those edits and pass back. If are convinced that speed cannot be improved and you don’t run the code, please note this in your review.
- The peer reviewer should review every plot that has changed.
- If png, github will show the diff, so every plot should be reviewed.
- If pdf, then reviewer should open and visually compare every plot to the prior version for any figure in the main paper (or main slide deck, if only a slide deck), but not the appendix or backup plots.
For correctness
- all of the steps above, plus
-
- Compare to written material in the paper or appendix discussing the results
-
- Verify that empirical and theoretical results are clear and appear correct
-
- Suggest sanity checks to be sure the results are correct
Time estimate
The time estimate is not a maximum or a minimum amount of time to spend. Rather, it is a piece of metadata. If the creator thinks the PR is quick and the reviewer is taking a long time or the creator thinks the PR is complicated and the reviewer thinks it is quick, then this suggests that there has been some miscommunication. Talk live to resolve.
Peer review mechanics
- When requesting peer review the assignee can request feedback in addition to the above: e.g., particular code or results that need a careful check.
- Assignees should add a reviewer and add the pull request on the project board to the reviewer. Once the review is complete, the reviewer should move the pull request on the project board to the assignee. The assignee then merge the pull request and move the pull request to the “Done” column.
- All peer review comments should be made on the pull request itself, not on the original issue.
- Revision to the files shall be made in the same branch (eg.,
issue_###_brief_description
). The pull request will automatically track changes made in the branch.
- Revision to the files shall be made in the same branch (eg.,
After peer reviewer certifies output
Assignee should:
- Use a squash merge to merge the issue-specific branch into the master branch. If the work is diagnostic in nature and performed entirely within an issues subdirectory, the branch should not be merged.
- Leave a comment in the task's original GitHub issue with
- a link to the pull request
- a summary of the task or a permalink to the
/issue_###_brief_description/*
output file (or subdirectory) as the final comment.
- Delete the
issue_###_brief_description
branch. The permalink will still work.
PI review
- Assignee closes the task with a message that tags a PI.
- Assignees moves the task on the project board to a PI column for review. Put this task at the top of the PI column.
- Once reviewed, PIs move issues to the “done” column on the project board
- If there is more back and forth on this ticket, the PI will tag the assignee and move the ticket back to the assignee’s column.
- When the assignee has completed the additional request, they should again tag the PI and move the ticket to the PI’s column.
- The PI may require changes in the /issue_###_brief_description/* output files. To change the files, pull the issue from the last commit before deleting the issues branch and make a new stable link with the updated files.
Directory structure
- We try to follow the directory structure used by
gslab
. They published a template here. - When you create a new issue branch, create a folder with the same name as the branch, i.e.
issue_XXX_brief_description/
withinput
,source
, andrelease
subfolders. All new inputs, scripts, and output should be housed in these subfolders. If/when it's time to merge to master, migrate everything essential to producing paper output of the ticket to the equivalentanalysis
subfolders and check that paths don't break.- When you are modifying inputs, scripts and/or outputs that have been previously merged to master, don't migrate anything to the issue folder. This is the only case before PR that you should be doing anything outside of the issue folder.
Other Github notes
Prioritizing work
- By default, peer review takes priority over all open tasks.
- You will have a column on the project board with your short-term tasks. Work on the top task first and proceed downward.
- Sometimes, you will need external input or clarification to complete a task. Using your time productively takes precedence over the priority ordering of tasks.
- No task that is in your column on the project board should be left for more than two weeks without a comment updating the status, even if the comment only says: "Have not done any work on this task in the last week".
When should I create a new task to replace an old one?
- If a task has > 10 comments, it might be time to close it out and create a new task with a link back to the old task. This will be context-specific.
Re-assigning an issue
When you are working on an issue and it is now time for someone else to work on it:
- Tag the person you are passing the issue to in the Github comment
- Assign the issue to that person
- Move the issue to that person's project board.
File naming
- We name files as “file_description” with no date in the name.
- Git/Github capture the revision history quite nicely so it’s fine to overwrite your old output when you update a document.
email notifications
- Always review github comments in the web browser, not via the automatically-generated emails. The reason for this is that when a github comment is updated, it does not trigger a new email notification.
Tables
- This site is a helpful resource for creating markdown tables to embed in GitHub comments.
- A particularly useful feature this site offers is importing existing tables from GitHub (or other Markdown sources).
- Copy the text in the table and then click "File -> Paste table data".
- This often doesn't work for reading in formatting (e.g. bold text, LaTeX, etc) but does consistently get the table structure and content right.
- This is very useful for making edits (e.g. adding rows/columns) that are annoying to do by hand.
Useful resources
- Intro to git: here
- Using git from the command line: here and here.
- How to delete branches on local which have been deleted from remote.
- Hand-holding git basics tutorials: here and here
- Visualizations of more advanced git commands
- To search for something across all branches, use:
git log --all -- '**/file_name'
Releases
Rule of thumb: Material being actively being edited lives on github. Read-only material lives in archive folder in a location TBD.
Incoming correspondence like editor and referee letters go in google drive folder for the project
Paper
- Every release goes through the “paper production” process
- The pull request that merges the
paper_production
branch into master should be tagged with the labelpaper_release
Slides
- Every presentation gets a ticket
- whose issue title is the date of the presentation and the location (e.g. 2018-11-02 uillinois)
- a "slides" label
- After presentation, create a pull request
- if "main" deck, create a pull request and merge to master
- If “one-off” deck, delete branch
- If you want to update a “one-off” deck
- go to list of branches (linked above) and restore that branch
- make a new branch off the restored branch
- merge in changes from master
gnlab's template repo
- Our lab's template repo
template_repo
can be found here. - New repos should be created using this template repo.
- The template repo maintains the script
prelim.R
containing a number of useful common functions shared in gnlab.