HOWTO Ship a Heritrix Release
The Heritrix issue tracker is at: https://github.com/internetarchive/heritrix3/issues
The GitHub project is at: https://github.com/internetarchive/heritrix3
The project homepage is at: http://crawler.archive.org
The Docker Hub images are at: https://hub.docker.com/r/iipc/heritrix
And of course, this wiki's entry page is: https://github.com/internetarchive/heritrix3/wiki
In a release number X.Y.Z, X is the 'major' release number, Y is the
'minor' release number, and Z is the 'micro' release number. Interim releases
may also have an additional -SUFFIX. For more details, see Version
Numbering.
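As an illustration of the numbering scheme, here is a minimal sketch that splits a release number into its parts. The function name parse_version is hypothetical and not part of Heritrix; the sample version string is the one that appears in the Maven Central link later on this page.

```shell
# Split a release number of the form X.Y.Z or X.Y.Z-SUFFIX into its parts.
# parse_version is a hypothetical helper name, used only for illustration.
parse_version() {
  version=$1
  case $version in
    *-*) suffix=${version#*-}; core=${version%%-*} ;;  # text after the first '-'
    *)   suffix=; core=$version ;;                      # no suffix present
  esac
  major=${core%%.*}   # X: everything before the first '.'
  rest=${core#*.}     # Y.Z
  minor=${rest%%.*}   # Y
  micro=${rest#*.}    # Z
  echo "major=$major minor=$minor micro=$micro suffix=$suffix"
}

parse_version 3.4.0-20190205
# prints: major=3 minor=4 micro=0 suffix=20190205
```

The sketch does not validate its input; a string without two dots will produce meaningless fields.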
Before any release, verify that:
- all tracked issues targeted for that release are resolved or rescheduled for a later release
- the continuous build succeeds, and all automated unit tests pass, both on a local developer box and on the build box
- the lead developer agrees the code is ready for release and has reviewed recent commit logs for areas of concern
- committers have known for a reasonable period that a release is upcoming (days for micro releases; a week+ for minor releases) and have refrained from making destabilizing changes
(For 'minor' and 'major' releases, other production-scale test crawling should have already occurred, and an announced 'code freeze' on the relevant trunk may have been in effect for a week or more.)
Using a previous Release Notes wiki page as a template, create a skeleton Release Notes wiki page for the planned version. Where the release date is declared, leave a 'planned' or 'TK'/'TBD' ('to come' or 'to be determined') notation.
Add notes there of significant changes anyone upgrading should be aware of, with links to other wiki pages or JIRA issues with more info.
Use the dynamic-inclusion links to pull in a live copy of the 'release notes' issue list from JIRA.
Add acknowledgement of any new or outside contributors to this release.
Make a commit to the trunk that sets the official release version number and links the in-distribution 'release notes' to the full wiki release notes.
Verify all expected artifacts (.tar.gz, .zip, -src.tar.gz, -src.zip)
were created and have their official distribution names.
Download each of these to a remote directory and confirm they expand without error and create the expected directory trees.
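The expansion check can be scripted. The following is a minimal sketch, not an official release tool: check_archive is a hypothetical helper name, and the artifact file names in the usage comment are placeholders for whatever the build actually produced.

```shell
# check_archive is a hypothetical helper: expand one archive into a scratch
# directory, report success or failure, and list the top-level entries.
check_archive() {
  archive=$1
  scratch=$(mktemp -d) || return 1
  case $archive in
    *.tar.gz) tar -xzf "$archive" -C "$scratch" ;;
    *.zip)    unzip -q "$archive" -d "$scratch" ;;
    *)        echo "unknown archive type: $archive" >&2
              rm -rf "$scratch"
              return 2 ;;
  esac
  status=$?                 # exit status of tar or unzip
  if [ "$status" -eq 0 ]; then
    echo "OK   $archive"
    ls "$scratch"           # top-level tree, to compare against expectations
  else
    echo "FAIL $archive" >&2
  fi
  rm -rf "$scratch"
  return "$status"
}

# Usage (file names are placeholders):
#   for f in *.tar.gz *.zip; do check_archive "$f"; done
```

This only confirms that each archive expands cleanly; whether the resulting directory tree is the expected one still needs a human look at the listing.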
For at least the .tar.gz, launch the crawler with a webui. Connect to
the webui and verify visible version identifiers are as expected.
Using the default profile, configure a minimal test crawl of a small site of several pages (>1 page, <100). Launch the crawl and verify the expected output in crawl.log and normal termination of the crawl when finished.
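A quick way to eyeball crawl.log is to tally fetch status codes. This is a rough sketch under one stated assumption: that the fetch status code is the second whitespace-separated field of each crawl.log line. The helper name status_summary is hypothetical, and the path in the usage comment is a placeholder.

```shell
# status_summary is a hypothetical helper. It ASSUMES the fetch status code
# is the second whitespace-separated field of each crawl.log line, and prints
# one "status count" pair per line, sorted numerically by status.
status_summary() {
  awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' "$1" | sort -n
}

# Usage (path is a placeholder for the job's log directory):
#   status_summary /path/to/job/logs/crawl.log
```

For a healthy small test crawl one would expect the tally to be dominated by successful fetches; the sketch does not check for normal crawl termination, which still has to be confirmed separately.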
The main project POM has an ossrh build profile, intended to be used to submit Maven artefacts to Maven Central, as per the OSSRH Guide.
To use it, you'll need an OSSRH account, and you'll need to request access by asking a current user who has the rights to push to org.archive to comment here with a request to add you to the account.
Then, you'll need to add your username and password to your Maven ~/.m2/settings.xml file, using the sonatype-nexus-staging and sonatype-nexus-snapshots IDs, like this:
<servers>
  <server>
    <id>sonatype-nexus-snapshots</id>
    <username>anjackson</username>
    <password>********</password>
  </server>
  <server>
    <id>sonatype-nexus-staging</id>
    <username>anjackson</username>
    <password>********</password>
  </server>
</servers>
and set up GPG as outlined in the OSSRH guide.
Then, you should be able to deploy snapshots with
mvn -Possrh clean deploy
and for releases:
mvn -Possrh release:clean release:prepare
mvn -Possrh release:perform
Note that GPG signing may fail unless you set a GPG_TTY=$(tty) environment variable; see this for more details.
If there is a problem, you can try mvn release:rollback, but sometimes you'll have to delete the local tag (if it's been created) or reset your Git repository.
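The manual clean-up can be sketched as follows. drop_local_tag is a hypothetical helper name, the tag name is a placeholder for whatever tag the failed release created, and the reset target origin/master assumes the remote is named origin.

```shell
# drop_local_tag is a hypothetical helper: delete a local tag left behind by
# a failed release attempt, but only if that tag actually exists.
drop_local_tag() {
  tag=$1
  if git rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
    git tag -d "$tag"
  else
    echo "no local tag named $tag"
  fi
}

# Suggested order of recovery:
# 1. Let the release plugin undo its own changes first.
#      mvn release:rollback
# 2. If a local tag was left behind, remove it (tag name is a placeholder).
#      drop_local_tag <tag>
# 3. Last resort, DESTRUCTIVE: discard local commits and changes.
#    Assumes the remote is called "origin".
#      git reset --hard origin/master
```

Only the local tag is removed here; if the tag was already pushed, it would have to be deleted on the remote as well.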
To get the change log right, we need to do it after the release so the changes get associated with the new release tag.
We can update the change log via github-changelog-generator. You'll need a suitable token, then you can use:
export CHANGELOG_GITHUB_TOKEN="«your-40-digit-github-token»"
github_changelog_generator -u internetarchive -p heritrix3 --release-branch master
Then commit the updated CHANGELOG.md to the master branch.
Go to https://github.com/internetarchive/heritrix3/releases and create a release from the release tag. Add a brief summary and include links to the dist TAR and ZIP files hosted on Maven Central (see e.g. https://oss.sonatype.org/content/repositories/releases/org/archive/heritrix/heritrix/3.4.0-20190205/).
Update the wiki release notes with the actual release date.
Update the project wiki front page to list the new release as the latest, and adjust other wording about upcoming releases accordingly.
Send email to the archive-crawler project list announcing the release, with links to the release notes and download area.
Commit a change to the 'xdocs/index.xml' file in heritrix trunk which auto-generates the http://crawler.archive.org home page, to include a news item in the appropriate place announcing the latest release. (BROKEN NEEDS FIXING: Currently the auto-builds are not uploading the changed website automatically to crawler.archive.org.)
See Docker about building current images.
Build images for the current release number, tag them as
<user>/heritrix[:<label>], and push them to Docker Hub. Here <user> is
iipc, and the optional label consists of the release number, contrib for
contribution builds, and jre for Java JRE.
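As a small sketch of the naming scheme, the helper below composes an image reference from a user and an optional label. image_ref is a hypothetical name, the label is treated as an opaque string (how release number, contrib, and jre combine is not specified here), and the local image name in the comments is a placeholder.

```shell
# image_ref is a hypothetical helper: build "<user>/heritrix[:<label>]".
# The label is passed through unchanged; an empty label yields no tag part.
image_ref() {
  user=$1
  label=${2:-}
  if [ -n "$label" ]; then
    echo "$user/heritrix:$label"
  else
    echo "$user/heritrix"
  fi
}

image_ref iipc 3.4.0-20190205
# prints: iipc/heritrix:3.4.0-20190205

# Tagging and pushing then look like this (local image name is a placeholder):
#   docker tag <local-image> "$(image_ref iipc 3.4.0-20190205)"
#   docker push "$(image_ref iipc 3.4.0-20190205)"
```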