Developers guide - adcroft/MOM6-examples GitHub Wiki

Developers Guide

This guide was originally written mainly for developers who have write access to the https://github.com/NOAA-GFDL repositories. However, the guide now applies to all developers and should be useful for anyone using or contributing to MOM6. Even core developers now use GitHub forks for submitting their work.

Contents

  1. Initial workspace setup
  2. Version control with git
    1. git status and fetch
    2. Your branch is behind and can be fast-forwarded - git pull
    3. Creating a branch
    4. Your branch is ahead - git push
    5. Your branch has diverged - git pull --rebase
    6. Sub-module has modified content
    7. Sub-module has new commits
    8. Useful git commands
  3. Commit procedure for MOM6 and MOM6-examples
    1. Evaluating a pull request via the gitlab pipeline
    2. Handling a pull request manually
    3. Updating MOM6-examples
  4. Debugging
    1. Debugging processor non-reproducibility issues
    2. Symmetric and tri-polar checking
    3. Concerning bitwise reproduction of simulations
  5. Policies

Initial workspace setup

Update remote urls

Each submodule of MOM6-examples is committed with an https URL pointing to a NOAA-GFDL repository. For any submodule that you plan to edit, change the remote to point to your fork, e.g.:

cd MOM6-examples/src/MOM6/
git remote set-url origin git@github.com:<github_account>/MOM6.git

This setup allows you to push to your fork on GitHub. See "Syncing a fork" for more info.
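As a rough illustration, the remote change can be tried out in a throwaway repository before touching a real clone. In this sketch, example-user is a placeholder account name, and the extra upstream remote is an assumption (a common convention for keeping a handle on the NOAA-GFDL repository), not something this guide requires. Nothing is fetched, so no network access is needed.

```shell
#!/bin/sh
# Scratch demonstration of repointing "origin" at a fork.
# "example-user" is a placeholder account; the "upstream" remote is an assumption.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q MOM6
cd MOM6

# Start as the submodule does, with origin pointing at NOAA-GFDL over https.
git remote add origin https://github.com/NOAA-GFDL/MOM6.git

# Point origin at your fork so that "git push" goes there.
git remote set-url origin git@github.com:example-user/MOM6.git

# Optionally keep a second remote for the NOAA-GFDL repository.
git remote add upstream https://github.com/NOAA-GFDL/MOM6.git

origin_url=$(git config --get remote.origin.url)
upstream_url=$(git config --get remote.upstream.url)
git remote -v
```

The final git remote -v lists each remote with its fetch and push URL, which is a quick way to confirm where a push will go.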

Checkout dev/gfdl

Since sub-modules have specific versions checked out, after your initial recursive clone you will need to checkout the latest branch instead. Type:

(cd src/MOM6; git checkout dev/gfdl)
(cd src/SIS2; git checkout dev/gfdl)

Sometimes, a git status in MOM6-examples will reveal that nothing appears to have changed. That is because the HEADs of dev/gfdl in all repositories are in sync and consistent with the commits recorded in MOM6-examples. If you ever issue a git submodule update then the submodule(s) will be reset to the appropriate detached state.

Switching to dev/gfdl gets you to the latest version of code/configurations. You should not make changes on this branch but create a new branch from here. See "Creating a branch" for more.

Version control with git

A developer may need to combine many operations, and such combinations can often be expressed more succinctly than what follows. Here, we have broken the operations down into self-contained steps from which a developer can build more sophisticated workflows.

git has many features that make it a very popular version control system. Unlike CVS, used for earlier versions of GFDL models, git versions the entire source tree as one entity, not on a file by file basis. A single commit may contain changes to many files and submodules which allows interfaces to evolve consistently.

git is also distributed, which allows peer-to-peer exchanges without going through a central server. To share code updates with a collaborator at the next desk you do not need to push your changes via the original NOAA-GFDL repositories.

git status and fetch

git status is your friend. It tells you everything you should know. Because of the use of sub-modules git status might show statuses that are not covered in most git tutorials and documentation.

After a recent clone of MOM6-examples a git status within the MOM6-examples directory will reveal

~/MOM6-examples$ git status
# On branch dev/gfdl
nothing to commit (working directory clean)

which means everything is OK.

git fetch will check with the origin, in this case GitHub, and find out what has changed on the server since you last synced. git fetch does not change your currently checked-out files. It will do no harm.

If you see no messages after git fetch then you were already up to date. Often you will see something like this:

~/MOM6-examples$ git fetch
remote: Counting objects: 206, done.
remote: Compressing objects: 100% (121/121), done.
remote: Total 206 (delta 101), reused 116 (delta 83)
Receiving objects: 100% (206/206), 188.26 KiB | 0 bytes/s, done.
Resolving deltas: 100% (101/101), done.
From github.com:NOAA-GFDL/MOM6
   9e328e9..50fe6e4  dev/gfdl   -> origin/dev/gfdl

which means there are new commits on the server. A subsequent git status will show:

On branch dev/gfdl
Your branch is behind 'origin/dev/gfdl' by 20 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working directory clean

Wow - someone has been busy!

Your branch is behind and can be fast forwarded - git pull

After a git fetch, when git status shows:

# On branch dev/gfdl
# Your branch is behind 'origin/dev/gfdl' by 1 commit, and can be fast-forwarded.

then you can "pull" in changes from GitHub without fear of [code] conflicts with your current working directory. Issue git pull, for example:

~/MOM6-examples$ git pull
Updating e7b8b46..fa7e377
Fast-forward
 README.md |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Creating a branch

When you are making code or configuration changes it is best practice to do so on a well-named branch. This helps you organize your development. If you update dev/gfdl directly, your history will be out of sync with the main repository - perhaps irrevocably unless you are a git guru.

To create your new branch:

git checkout dev/gfdl
git branch my-feature-branch-name
git checkout my-feature-branch-name

where my-feature-branch-name is a descriptive branch name. Notice that the first step was to checkout dev/gfdl. This ensures the branch originates from dev/gfdl which is a generally good idea although there are occasions where you might branch from another branch.

Without the second checkout, the current working directory would still show dev/gfdl. The git branch creates the branch but does not move you on to it.
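The branch-then-checkout sequence can also be written as a single git checkout -b. The following is a minimal sketch in a throwaway repository: the scratch directory, the demo identity and the second branch name are all made up for illustration, and the local dev/gfdl branch is only a stand-in for the real one.

```shell
#!/bin/sh
# Scratch demonstration; nothing here touches a real MOM6 clone.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git checkout -q -b dev/gfdl            # stand-in for the real dev/gfdl branch
git commit -q --allow-empty -m "Initial commit"

# Two-step form: "git branch" creates the branch but leaves you on dev/gfdl ...
git branch my-feature-branch-name
after_branch=$(git rev-parse --abbrev-ref HEAD)
# ... and "git checkout" then moves you onto it.
git checkout -q my-feature-branch-name
after_checkout=$(git rev-parse --abbrev-ref HEAD)

# One-step equivalent: create the branch and switch to it in a single command.
git checkout -q dev/gfdl
git checkout -q -b another-feature-branch
after_shortcut=$(git rev-parse --abbrev-ref HEAD)

echo "after git branch:      $after_branch"
echo "after git checkout:    $after_checkout"
echo "after git checkout -b: $after_shortcut"
```

The first line of output still names dev/gfdl, which is the behaviour described above: creating a branch does not move you onto it.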

At a later stage, if you want to share your branch, you will need to push your branch upstream to GitHub with

git push origin my-feature-branch-name

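Thereafter, when you are on your branch, a simple git push will push to the correct remote branch, provided the local branch tracks its remote counterpart (adding -u to the first push, i.e. git push -u origin my-feature-branch-name, sets this up).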

When and where to branch

MOM6-examples, MOM6 and SIS2 all have corresponding master, dev/master and dev/gfdl branches. However, it is not necessary to create a feature branch in MOM6-examples for the corresponding MOM6 feature branch.

  • If your explicit changes are only in src/MOM6/ then make the branch within src/MOM6/ and the pull request for MOM6.
    • If there are implied changes (e.g. changed answers or parameter documentation in MOM6-examples) then a branch in MOM6-examples is needed only if the changes are not automatic (i.e. some manual changes to inputs are required). Otherwise, we will see those changes in MOM6-examples and apply the update there when we evaluate the MOM6 pull request.
  • If your explicit changes are only in src/SIS2/ then make the branch within src/SIS2/ and the pull request for SIS2.
  • If your explicit changes are only in MOM6-examples/ (e.g. in tools/ or ocean_only/) then make the branch within MOM6-examples/ and the pull request for MOM6-examples.
  • If you have explicit changes in both MOM6-examples/ and src/MOM6 (or another sub-module) then we need the same branch made in both. This is the only circumstance where two pull requests would be needed - this is uncommon.

Your branch is ahead - git push

After making and committing code changes a git status will show

~/MOM6-examples/src/MOM6$ git status
# On branch my-feature-branch
# Your branch is ahead of 'origin/my-feature-branch' by 1 commit.

If this is the case, then do a git push:

% git push
Counting objects: 5, done.
Delta compression using up to 64 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 432 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To git@github.com:adcroft/MOM6.git
   ae8b847..6a38fe6  my-feature-branch -> my-feature-branch

Now your commits are available for making a pull request.

Your branch has diverged - git pull --rebase

This situation should not happen unless you are ill-advisedly editing a collaborative branch such as dev/gfdl. You could also encounter this if you work on your own branch from multiple locations (such as an HPC and laptop) making changes in both places at once.

After a git fetch, when git status shows

~/MOM6-examples$ git status
# On branch dev/gfdl
# Your branch and 'origin/dev/gfdl' have diverged,
# and have 3 and 2 different commit each, respectively.

then you have 3 commits to push but there are already 2 to "pull" which will stop you from being able to "push".

There are two ways forward. The first method results in merge-loops in the history. The second method often produces a linear-history (albeit with some commits in non-chronological order!).

git pull (always create a merge)

If you issue git pull this will merge your 3 commits with the server's 2 commits and create a new commit. If there are no conflicts you will be prompted with a commit message

Merge branch 'dev/gfdl' of github.com:NOAA-GFDL/MOM6-examples into dev/gfdl

which you should leave unchanged and save. You can optionally add annotations for the merge (be sure to leave the second line blank).

If there are conflicts during the merge, you should resolve the conflicts and create commits following the commit procedure for MOM6 and MOM6-examples.

git pull --rebase (try to avoid a merge)

This method rewinds your own commits, advances through the commits from GitHub and then replays your commits after those from GitHub. Issue git pull --rebase, for example:

~/MOM6-examples$ git pull --rebase
First, rewinding head to replay your work on top of it...
Applying: Added new eddy parameterization
Applying: Fix problem with gnu compiler

If a conflict occurs you will see

~/MOM6-examples$ git pull --rebase
First, rewinding head to replay your work on top of it...
Applying: Creating conflict
Using index info to reconstruct a base tree...
Falling back to patching base and 3-way merge...
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Failed to merge in the changes.
Patch failed at 0001 Creating conflict

When you have resolved this problem run "git rebase --continue".
If you would prefer to skip this patch, instead run "git rebase --skip".
To check out the original branch and stop rebasing run "git rebase --abort".

You will need to either resolve the conflicts or "abort" and try a merge as shown above.
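To see the rebase route end to end without risking a real branch, the situation can be reproduced in a scratch directory. This is only a sketch: the bare repository server.git plays the role of GitHub, and the hpc and laptop working copies, the file names and the commit messages are all invented for the demonstration. It covers the conflict-free case only.

```shell
#!/bin/sh
# Scratch demonstration of a diverged branch repaired with "git pull --rebase".
# All repositories, branch names and file names here are made up.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

scratch=$(mktemp -d)
cd "$scratch"
git init -q --bare server.git            # plays the role of GitHub

# First working copy ("hpc") creates the branch and publishes it.
git init -q hpc
cd hpc
git checkout -q -b my-feature-branch
echo "start" > notes.txt
git add notes.txt
git commit -q -m "Start feature"
git remote add origin "$scratch/server.git"
git push -q -u origin my-feature-branch
cd ..

# Second working copy ("laptop") starts from the published branch.
git clone -q -b my-feature-branch server.git laptop

# Both copies now commit independently, so the histories diverge.
cd hpc
echo "from hpc" > hpc.txt
git add hpc.txt
git commit -q -m "Work done on the HPC"
git push -q
cd ../laptop
echo "from laptop" > laptop.txt
git add laptop.txt
git commit -q -m "Work done on the laptop"

git fetch -q
git status -sb | head -n 1               # branch is reported as both ahead and behind

# Replay the local commit on top of the fetched one instead of merging.
git pull -q --rebase
git log --oneline
```

After the rebase the log is linear: the laptop commit sits on top of the HPC commit and no merge commit has been created.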

Sub-module has modified content

If a git status shows that MOM6 (or another sub-module) has "modified content" or "untracked content" then it means you have local edits in that directory, e.g.

% git status
# On branch dev/master
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#   (commit or discard the untracked or modified content in submodules)
#
#	modified:   src/MOM6 (modified content, untracked content)
#
no changes added to commit (use "git add" and/or "git commit -a")

Sub-module has new commits

There are two cases where a git status in MOM6-examples will show you that src/MOM6 has "new commits" like this:

> git status
# On branch dev/gfdl
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   src/MOM6 (new commits)

Either you have added commits to MOM6 or there are new commits that MOM6-examples expects MOM6 to point to. Technically, the message means that the MOM6 commit registered in MOM6-examples is not the same as the one currently checked out in the src/MOM6/ directory.

1) If you added commits to MOM6, then simply "add, commit and push" src/MOM6 as if it were a regular file. This will mean that new clones of MOM6-examples will use the newer version of MOM6:

git add src/MOM6
git commit
git push

It is important to push MOM6 first, otherwise a new clone of MOM6-examples will point to a non-existent commit of MOM6.

2) If the message appeared as a result of a git pull in MOM6-examples/ then MOM6 needs to be updated. If you are in a developer mode you could just do

cd src/MOM6/
git pull

or as an end-user, in the MOM6-examples/ directory do:

git submodule update src/MOM6

which will fetch and checkout the correct version of MOM6.
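The first case (new commits added inside the sub-module) can be reproduced offline as a sanity check. In this sketch, lib and super are made-up stand-ins for MOM6 and MOM6-examples, and the protocol.file.allow setting is needed only because the demo clones the sub-module from a local path, which recent git versions refuse by default. With real repositories, remember to push the sub-module before pushing the super-project.

```shell
#!/bin/sh
# Scratch demonstration of the "(new commits)" sub-module status.
# "lib" and "super" are made-up stand-ins for MOM6 and MOM6-examples.
set -e
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

scratch=$(mktemp -d)
cd "$scratch"

git init -q lib
git -C lib commit -q --allow-empty -m "lib: first commit"

git init -q super
cd super
# Recent git versions refuse local-path sub-module clones unless the file
# transport is explicitly allowed; this is only needed for the offline demo.
git -c protocol.file.allow=always submodule --quiet add "$scratch/lib" src/lib
git commit -q -m "Record lib as a sub-module"

# A new commit inside the sub-module makes the recorded hash out of date.
git -C src/lib commit -q --allow-empty -m "lib: second commit"
before=$(git status --porcelain)
git status | grep "src/lib"

# Record the new sub-module commit exactly as if src/lib were a regular file.
git add src/lib
git commit -q -m "Update src/lib to the latest commit"
after=$(git status --porcelain)
echo "porcelain status before: '$before', after: '$after'"
```

Once the new hash is committed in the super-project, git status is clean again.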

Useful git commands

Here are a few git commands:

  • gitk fname will pop up various windows with history, comparisons, etc.
  • git log for logs of the changes made to the branch.
  • git diff MOM6 to see what files differ in submodule MOM6.
  • git submodule update src/SIS2 to update just the submodule SIS2.

Commit procedure for MOM6 and MOM6-examples

This section illustrates how to make coordinated commits between two repositories, namely MOM6-examples and the sub-module MOM6. The same applies to the sub-module SIS2. This step is needed when changes in MOM6 require changes in MOM6-examples and the merge process cannot be handled by the GitHub web interface.

The usual workflow for code development is:

  1. Develop and test code
  2. Commit code changes to MOM6
  3. Submit a pull request

1. Develop and test code

This typically involves editing source code, compiling with all compilers, running tests with all executables and checking answers.

Rules for committing to dev/gfdl are in section Policies for dev/master but the highlights are:

  1. each commit compiles without errors using all sanctioned compilers.
  2. regression tests pass with all the sanctioned compilers.
  3. commits are logged following the commit logging guidelines.

Steps involved are usually:

  1. on dev/gfdl, update to latest code:
    git checkout dev/gfdl
    git pull https://github.com/NOAA-GFDL/MOM6.git dev/gfdl
  2. create a feature branch: git checkout -b new_branch
  3. edit code
  4. compile
  5. run tests
  6. check model output
  7. debug and iterate through steps 4-6.

2. Commit code changes to MOM6

Commit to MOM6 first (or the relevant sub-module) so that the new commit hash of MOM6 can be recorded with any associated changes in MOM6-examples.

cd src/MOM6
git add file1 [file2] ...
git commit
git push origin new_branch

Because you are working on a feature branch there should not be any conflicts with GitHub.

If you are happy with your branch then submit a pull request.

3. Submit a pull request

When you have code ready to submit to the core developers to merge onto dev/gfdl, you should make a "pull request". This is managed via the GitHub website.

  1. Navigate to the relevant branch on your fork (either via the branch tab or the pull down menu in the commits tab).

  2. Click the green icon near the top left, with a mouse-over that reads "Compare, review, create pull request".

  3. The next page shows you the changes relative to where you started your branch on dev/master.

  4. Click the button to open the pull request form.

  5. Fill out a descriptive but succinct title

  6. In the comment box, please summarize all the commits involved and explain or justify the code changes.

  7. Then click the button to submit the pull request and you are done.

Evaluating a pull request via the gitlab pipeline (core developers only)

Pull requests should be handled expeditiously so that stale code does not develop conflicts. Conflicts mean more work. Pull requests are sent out as notifications (emails and messages on the website) but can also be found in the right-side column of icons.

It is recommended to have a separate working directory for triggering tests on the pipeline. This avoids disrupting your development workflow.

1. Setup

You will need two remotes, one for NOAA-GFDL/MOM6 and the other for the gitlab pipeline repository:

git clone https://github.com/NOAA-GFDL/MOM6.git MOM6
cd MOM6
git remote add gitlab git@gitlab.gfdl.noaa.gov:ogrp/MOM6.git

You will also need to edit MOM6/.git/config so that the origin section reads:

[remote "origin"]
        url = https://github.com/NOAA-GFDL/MOM6.git
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = +refs/pull/*/head:refs/remotes/origin/pr/*
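If you prefer not to edit the file by hand, the same extra fetch line can be appended with git config. The sketch below tries this in a throwaway repository (the scratch directory is an assumption made for the demo) and does not fetch anything, so it needs no network access.

```shell
#!/bin/sh
# Scratch demonstration: add the pull-request refspec with "git config"
# rather than editing .git/config by hand. Nothing is fetched here.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q MOM6
cd MOM6
git remote add origin https://github.com/NOAA-GFDL/MOM6.git

# Append a second fetch line to the [remote "origin"] section.
git config --add remote.origin.fetch '+refs/pull/*/head:refs/remotes/origin/pr/*'

# List every fetch refspec now configured for origin.
refspecs=$(git config --get-all remote.origin.fetch)
echo "$refspecs"
```

The listing should show the default heads refspec followed by the pull-request refspec, matching the two fetch lines of the config section above.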

2. Review code

In a browser, go to the individual pull request under https://github.com/NOAA-GFDL/MOM6-examples/pulls .

There are three tabs: "Conversation", "Commits", and "Files changed". Examine all three.

Feedback and reviews are submitted under the "Files changed" tab by clicking "Review changes".

3. Trigger a test

Assign the pull request to yourself on GitHub to indicate that the request is being evaluated.

To trigger a test, fetch the pull request from github and then push it as a branch to gitlab. A typical session looks like:

.../MOM6> git fetch
remote: Counting objects: 6, done.
remote: Total 6 (delta 5), reused 6 (delta 5), pack-reused 0
Unpacking objects: 100% (6/6), done.
From github.com:NOAA-GFDL/MOM6
 * [new ref]         refs/pull/551/head -> origin/pr/551

.../MOM6> git checkout pr/551
Branch pr/551 set up to track remote branch pr/551 from origin.
Switched to a new branch 'pr/551'

.../MOM6> git push gitlab pr/551
Counting objects: 13, done.
Delta compression using up to 48 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 1.62 KiB | 0 bytes/s, done.
Total 13 (delta 10), reused 0 (delta 0)
To gitlab.gfdl.noaa.gov:ogrp/MOM6.git
 * [new branch]      pr/551 -> pr/551

You can monitor the test at https://gitlab.gfdl.noaa.gov/ogrp/MOM6/pipelines.

4. Report results and merge

In a browser, go to the individual pull request page under https://github.com/NOAA-GFDL/MOM6/pulls .

If the pull request failed tests, report the results in the comment box under the "Conversation" tab.

If the pull request passed tests AND is being accepted, click Merge pull request under the "Conversation" tab and report the results there.

Handling a GitHub pull request manually

If there are expected to be changes in model output (documentation or results) then updates to MOM6-examples and the regressions repository should be handled simultaneously which is best managed by hand.

This section assumes you are prepared to meet the dev/master requirements for testing, described in Policies for dev/gfdl.

The workflow to handle a pull request is as follows, although GitHub also provides command-line instructions within the "Conversation" view for each pull request:

  1. Assign the request to yourself by clicking "Assignee" and then your <github_account> id.

  2. Click the blue words command line which will expand the commands you will use to obtain the code. Something like

git fetch origin
git checkout -b user-aja-stuff origin/user-aja-stuff
git merge dev/master

This checks out the branch user-aja-stuff and then makes sure it is up to date with dev/master by merging in the latest code.

  3. Compile and run the tests as if you were testing a mod on dev/master.

  4. If everything passes muster, you should now merge back onto dev/master with:

git checkout dev/master
git merge --no-ff user-aja-stuff

  5. The last step is to push your changes to github. There is a choice here:

    1. You can simply issue git push origin dev/master. The pull request should now appear as "closed" on the web-site.
    2. OR, if there were no conflicts and the web page shows the Merge pull request button, you can complete the merge via the web. This latter option has one advantage: you can annotate the handling of the request, i.e. write comments whilst closing the request.

Add new version of MOM6 and new output in MOM6-examples

Some commits to MOM6 change output stored in MOM6-examples. These commits should be handled manually as above. If you cd up to MOM6-examples, a git status will now show something like:

> git status
# On branch dev/gfdl
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   ice_ocean_SIS/OM4_025/MOM_parameter_doc.all
#	modified:   ice_ocean_SIS/OM4_025/MOM_parameter_doc.short
#	modified:   src/MOM6 (new commits)

In this example, the new code we just pushed to MOM6 corrected a documentation typo and changed the parameter documentation files in OM4_025. The new output and new version of MOM6 should be committed together:

git add -u ice_ocean_SIS/OM4_025/MOM_parameter_doc.*
git add src/MOM6
git commit
git push

In the instance where the output did not change, a git status will show you that only src/MOM6 has new commits, i.e.:

> git status
# On branch dev/gfdl
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   src/MOM6 (new commits)

Simply adding and committing src/MOM6 will mean MOM6-examples is using the latest version of MOM6:

git add src/MOM6
git commit
git push

Debugging

Debugging processor non-reproducibility issues

Answers should not change, to the bit, when changing processor counts or layouts. Within the MOM6 code, there is a tool available to help resolve problems when answers do change (processor non-reproducibility). Namely, set the run-time flag DEBUG=True, in which case MOM6 will write out a series of checksums to std_err, along with informative messages about which fields are being checked. This output is independent of processor count, provided the solutions themselves are the same. So to debug a processor non-reproducibility issue, do two short runs with different PE counts (say, 32 and 31), and look for the first point in the code where the output differs. Grep through the code to see where the first differing call can be found. This method thus helps to isolate where in the code the processor non-reproducibility issue occurs.

This method tends to generate a lot of output to std_err, so very short runs are advised. Also, unless you think the problem is in the barotropic solver, it is a good idea to set DEBUG_BT=False to reduce the number of messages.

There is more than one kind of chunk that will show up in the diffs. We are looking for things like:

< u-point: c=   2363061 sw=   2357351 se=   2341484 nw=   2357077 ne=   2341213 Before steps fluxes%taux
< v-point: c=   2359974 sw=   2251831 se=   2237217 nw=   2353558 ne=   2338274 Before steps fluxes%tauy
< h-point: c=   2811287 Before steps fluxes%ustar
---
> u-point: c=   2363061 sw=   2327151 se=   2341484 nw=   2326875 ne=   2341213 Before steps fluxes%taux
> v-point: c=   2359974 sw=   2325100 se=   2310009 nw=   2353558 ne=   2338274 Before steps fluxes%tauy
> h-point: c=   2811355 Before steps fluxes%ustar

Here, taux and tauy have halo checksums while ustar does not. The "c=" value refers to the checksum of the computational domain. The "sw=" is the checksum of the array offset in the "south west" direction so that part of the halo is checksummed. Similarly for "se=", "nw=" and "ne=". Basic statistics are also written out but using only the computational domain so that the values are globally meaningful.
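One way to find the first point of disagreement is to diff the captured output of the two runs and take the first differing line. The sketch below is an illustration only: the file names pe32.log and pe31.log are hypothetical, their contents are just the three sample checksum lines shown above, and which run corresponds to which side of the diff is an assumption. In practice each file would hold the full debugging output of one run.

```shell
#!/bin/sh
# Sketch: locate the first checksum line that differs between two DEBUG runs.
# File names are hypothetical; contents are the sample lines from this section.
set -e
scratch=$(mktemp -d)
cd "$scratch"

cat > pe32.log <<'EOF'
u-point: c=   2363061 sw=   2357351 se=   2341484 nw=   2357077 ne=   2341213 Before steps fluxes%taux
v-point: c=   2359974 sw=   2251831 se=   2237217 nw=   2353558 ne=   2338274 Before steps fluxes%tauy
h-point: c=   2811287 Before steps fluxes%ustar
EOF

cat > pe31.log <<'EOF'
u-point: c=   2363061 sw=   2327151 se=   2341484 nw=   2326875 ne=   2341213 Before steps fluxes%taux
v-point: c=   2359974 sw=   2325100 se=   2310009 nw=   2353558 ne=   2338274 Before steps fluxes%tauy
h-point: c=   2811355 Before steps fluxes%ustar
EOF

# The first "<" line of the diff is the earliest line on which the runs disagree.
first_diff=$(diff pe32.log pe31.log | grep '^<' | head -n 1)

# The last word on a checksum line names the field being checked.
label=${first_diff##* }
echo "First differing checksum: $label"
```

The reported field name can then be searched for in the source code to find the call that wrote the checksum.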

Symmetric and tri-polar checking

There is a second type of test that we use for parallelization, which only works for a tripolar grid or symmetric memory with DEBUG=true. In these cases, the same field and location are calculated twice on different processors, and the model checks that these redundant values are identical. With this test, we can detect parallelization or symmetry errors using only a single run. These messages are of the form:

        Predictor 1 up Layer 1 redundant v-comps  3.6392E-04  3.6623E-04 differ by  -2.3073E-06 at i,j =    5   4 x,y =   2.3862E+02  1.9820E+01 on pe    1
        Predictor 1 up Layer 1 redundant v-comps  3.7382E-04  3.7614E-04 differ by  -2.3201E-06 at i,j =    6   4 x,y =   2.3867E+02  1.9848E+01 on pe    1
        Predictor 1 up Layer 1 redundant v-comps  3.8362E-04  3.8596E-04 differ by  -2.3336E-06 at i,j =    7   4 x,y =   2.3873E+02  1.9876E+01 on pe    1

and can really get verbose. Again, you want to look for the very first thing to be off.

Concerning bitwise reproduction of simulations

Answers will change when changing computer platforms. Answer changes usually start in the least significant bits in MOM6, as the result of careful coding practices. To explain how we have achieved this, and to explain why we think that it is generally impossible to do better, it is worthwhile to discuss three sources of differences across machines and compilers.

First, we have the order of arithmetic differences. We can and do control operation order by use of parentheses (assuming that compilers respect parentheses). Parentheses are particularly important when taking sums of three or more quantities that can be of either sign, because in this case the differences in answers are not guaranteed to be confined to the least significant bits. For example, 10^20 - 10^20 + 1 might return either 0 or 1 with 64-bit floating point arithmetic, depending on the order of the sums. In some cases, like in the denominators of the MOM6 tridiagonal solvers, there is a right order of arithmetic that gives the right answer, and we carefully enforce this order with parentheses and by introducing temporary variables that ensure that only the most devious compiler optimization would get the wrong answer. In other cases, like the spatial averaging of four points, MOM6 uses the sets of parentheses that give an answer that is invariant to the rotation of the problem. In still other cases, like the sum of three or more tendencies in the momentum equations, there is no "right" order, but we still use parentheses so that we get the same order of sums, regardless of compiler settings. When taking products of multiple terms (but not exact powers of two), the answers can differ in the least significant bit; while this can be controlled with parentheses, in MOM6 we have not been systematic about forcing the products to be taken in the same order. We also use a special extended-fixed-point representation to get order-invariant sums across processors for things like global energy metrics and tests of conservation properties (see Hallberg & Adcroft, 2014, Parallel Computing).
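The 10^20 example is easy to check for yourself. The snippet below is a small illustration, not MOM6 code: it uses awk only because awk does its arithmetic in 64-bit floating point, and the variable names are invented for the demonstration.

```shell
#!/bin/sh
# Order of summation matters in 64-bit floating point: the same three terms
# give different results depending on where the parentheses go.
grouped_left=$(awk 'BEGIN { a = 1e20; printf "%g", (a - a) + 1 }')
grouped_right=$(awk 'BEGIN { a = 1e20; printf "%g", a + (-a + 1) }')
echo "(1e20 - 1e20) + 1  = $grouped_left"
echo "1e20 + (-1e20 + 1) = $grouped_right"
```

The first grouping gives 1 and the second gives 0, because adding 1 to -1e20 is lost to rounding before the large terms cancel. Fixing the parentheses in the source fixes which of these answers is obtained.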

Second, different compilers use different math libraries for transcendental functions, like sin, cos, or tanh, and may choose to use either libraries or hardware-encoded algorithms (on some machines) for common functions such as sqrt, exp and even division. Although the differences in these functions should only show up in the least significant bits, we do not know of any practical way to control these differences. Hence, different compilers will always give different answers.

Third, different machines use different algorithms for floating point operations. For example, they may use more bits inside of the chip, even if the result that is stored to memory is in a standard 64-bit format. We thus expect that different machines will always give different answers, even when the same compiler and compiler settings are used. However, it is our expectation that these differences will also arise in the least significant bits.

However, in an ocean model, we are dealing with nonlinear (often chaotic) systems of equations with discrete logical branches, and we are often interested in tests that run long enough to ensure that even subtle differences in macroscopic metrics will be detected. We therefore do not expect that differences arising in the least significant bits will stay there, nor that solutions that differ even at leading order after a while are necessarily wrong. This is why our testing has emphasized bitwise identical reproduction of answers on whatever machines and compilers will actually be used.


Comment: The testing procedure is time consuming and not easy to replicate without significant computational resources. The whole testing procedure takes approximately 15-30 minutes and involves running over a hundred tests with half a dozen executables. The core developers have each independently developed their own method of running these tests. It is not uncommon for the different methods to disagree, which has inevitably led us to uncovering subtle bugs. The gitlab pipeline is more extensive yet and is a superset of the tests that the core developers run manually.

Policies
