Workflow: automated dissemination - thoth-pub/thoth-dissemination GitHub Wiki

This page explains the process for automatically disseminating Thoth works to other platforms, and sets out the steps the Project Manager needs to take to keep it running smoothly.

Background

At present, works are automatically disseminated to the Internet Archive and to Loughborough University's Figshare repository. The same process is also used to register DOIs with Crossref. Each publisher can opt in to each platform individually (this is controlled via repository variables). The automated process for each platform runs on a schedule, performing a "catch-up" dissemination of works which have become valid for submission since the previous run.
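The catch-up selection described above can be pictured as a filter over the available works. The following is a minimal Python sketch, not the repository's actual code: the `Work` record, its field names and the `select_catch_up` function are hypothetical, and the two lists stand in for the contents of the [XX]_ENV_PUBLISHERS and [XX]_ENV_EXCEPTIONS repository variables (the exceptions list is described under Other maintenance). It assumes eligibility is decided by publication date alone; according to the table of platforms, the Crossref process instead targets works updated since the last run, and the real scripts may apply further validity checks.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Work:
    # Hypothetical minimal record; real Thoth works carry many more fields.
    work_id: str
    publisher_id: str
    publication_date: datetime


def select_catch_up(works, last_run, now, publishers, exceptions=frozenset()):
    """Return works that became eligible in the window (last_run, now].

    `publishers` and `exceptions` stand in for the contents of the
    [XX]_ENV_PUBLISHERS and [XX]_ENV_EXCEPTIONS repository variables.
    """
    opted_in = set(publishers)
    excluded = set(exceptions)
    return [
        w for w in works
        if w.publisher_id in opted_in          # publisher has opted in
        and w.work_id not in excluded          # work is not listed as an exception
        and last_run < w.publication_date <= now  # new since the previous run
    ]


if __name__ == "__main__":
    utc = timezone.utc
    works = [
        Work("w1", "pub-a", datetime(2024, 2, 10, tzinfo=utc)),
        Work("w2", "pub-b", datetime(2024, 2, 12, tzinfo=utc)),  # publisher not opted in
        Work("w3", "pub-a", datetime(2024, 1, 20, tzinfo=utc)),  # published before last run
        Work("w4", "pub-a", datetime(2024, 2, 15, tzinfo=utc)),  # listed as an exception
    ]
    picked = select_catch_up(
        works,
        last_run=datetime(2024, 2, 1, tzinfo=utc),
        now=datetime(2024, 3, 1, tzinfo=utc),
        publishers=["pub-a"],
        exceptions=["w4"],
    )
    print([w.work_id for w in picked])  # ['w1']
```

Using a half-open window (after the last run, up to and including now) means a work is picked up by exactly one run, which is the point of a catch-up process.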

The process for each platform is controlled via a GitHub Action named [XX]-bulk-disseminate (where XX = ia for Internet Archive, fs for Figshare, cr for Crossref, etc.). These Actions can be found under the Actions tab of this GitHub repository: https://github.com/thoth-pub/thoth-dissemination/actions.

From each Action's page, anyone can view the history of past runs, and members of Thoth can enable or disable the process. Successful runs are marked with a green tick, and failed runs with a red cross.

Works which have been successfully disseminated via this process may be made visible on the target platform, sometimes under a dedicated Thoth Archiving Network collection. It may take a few hours before a disseminated work is made fully available.

Dissemination platforms

| Platform | Publishers | Schedule | Target | Action name | Action link | Repository variables | Collection link |
|---|---|---|---|---|---|---|---|
| Internet Archive | Open Book Publishers, punctum | 1st of every month, before UK OOB | All works published since last run | ia-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/ia_bulk_disseminate.yml | IA_ENV_PUBLISHERS, IA_ENV_EXCEPTIONS | https://archive.org/details/thoth-archiving-network |
| Loughborough Figshare | Open Book Publishers | 7th of every month, before UK OOB | All works published in preceding month | fs-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/fs_bulk_disseminate.yml | FS_ENV_PUBLISHERS | https://repository.lboro.ac.uk/Thoth_Archiving_Network |
| Crossref | Open Book Publishers, punctum | Hourly, at around 45 minutes past the hour | All published works updated since last run | cr-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/cr_bulk_disseminate.yml | CR_ENV_PUBLISHERS | N/A (check DOI link(s) to confirm success) |
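For Figshare, the target is "all works published in preceding month" rather than "since last run". A minimal Python sketch of how such a window could be computed from the run date, assuming a half-open interval (first day of the previous month inclusive, first day of the run month exclusive). The function names are hypothetical, and the actual script may define the boundaries differently.

```python
from datetime import date


def preceding_month_window(run_date: date) -> tuple[date, date]:
    """Return (start, end) of the calendar month before run_date.

    start is inclusive and end is exclusive, so a work published on the
    last day of the preceding month falls inside the window.
    """
    first_of_this_month = run_date.replace(day=1)
    if first_of_this_month.month == 1:
        # January run: the preceding month is December of the previous year.
        first_of_previous = first_of_this_month.replace(
            year=first_of_this_month.year - 1, month=12)
    else:
        first_of_previous = first_of_this_month.replace(
            month=first_of_this_month.month - 1)
    return first_of_previous, first_of_this_month


def in_window(published: date, window: tuple[date, date]) -> bool:
    """True if the publication date falls inside the half-open window."""
    start, end = window
    return start <= published < end


print(preceding_month_window(date(2024, 3, 7)))
# (datetime.date(2024, 2, 1), datetime.date(2024, 3, 1))
print(preceding_month_window(date(2024, 1, 7)))
# (datetime.date(2023, 12, 1), datetime.date(2024, 1, 1))
```

Working from the first of the month avoids month-length edge cases (28, 29, 30 or 31 days), since every month has a day 1.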

Project Manager's responsibilities

Managing publisher opt-ins

  • The list of publishers who have opted in to automatic upload for each platform is stored under the [XX]_ENV_PUBLISHERS repository variable at https://github.com/thoth-pub/thoth-dissemination/settings/variables/actions (where XX is the uppercase platform code, e.g. IA, FS or CR, as in the table above).
  • If a new publisher wants to join the scheme, the Project Manager can add their Thoth Publisher ID to the relevant list(s).
  • For platforms where individual publishers share their own credentials with Thoth (e.g. Crossref), rather than Thoth using its own credentials (e.g. Internet Archive), additional steps are required:
    • The publisher's credentials need to be added to the list of repository secrets at https://github.com/thoth-pub/thoth-dissemination/settings/secrets/actions.
    • For Crossref, for example, the username must be added as CROSSREF_USER_[XXXX] and the password as CROSSREF_PW_[XXXX], where XXXX is the publisher's Thoth ID with hyphens changed to underscores and letters capitalised.
    • Enter the value of each secret (the username or password, in the appropriate format) in the free-text box.
    • The Thoth development team can handle this if preferred; the steps are noted here for reference.
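The naming convention for Crossref secrets is mechanical enough to express in code. This minimal Python sketch derives both secret names from a Thoth Publisher ID, following the rule stated above. The `add_publisher` helper is a hypothetical illustration of updating an opt-in list and assumes the [XX]_ENV_PUBLISHERS variable holds a JSON array of Publisher IDs; that storage format is an assumption, so check the actual value in the repository settings before relying on it. The Publisher ID in the example is made up.

```python
import json


def crossref_secret_names(publisher_id: str) -> tuple[str, str]:
    """Derive the Crossref secret names for a Thoth Publisher ID.

    Follows the convention on this page: hyphens become underscores and
    letters are capitalised.
    """
    suffix = publisher_id.replace("-", "_").upper()
    return f"CROSSREF_USER_{suffix}", f"CROSSREF_PW_{suffix}"


def add_publisher(variable_value: str, publisher_id: str) -> str:
    """Return an updated [XX]_ENV_PUBLISHERS value including publisher_id.

    ASSUMPTION: the variable holds a JSON array of Publisher IDs. An empty
    value is treated as an empty list, and duplicates are not added.
    """
    ids = json.loads(variable_value) if variable_value.strip() else []
    if publisher_id not in ids:
        ids.append(publisher_id)
    return json.dumps(ids)


# Hypothetical Publisher ID, for illustration only.
pid = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
print(crossref_secret_names(pid))
# ('CROSSREF_USER_A1B2C3D4_E5F6_4A7B_8C9D_0E1F2A3B4C5D', 'CROSSREF_PW_A1B2C3D4_E5F6_4A7B_8C9D_0E1F2A3B4C5D')
print(add_publisher('["existing-id"]', pid))
# ["existing-id", "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"]
```

Generating the names rather than typing them by hand reduces the risk of a mismatch between the secret name and the ID the workflow looks up.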

Handling dissemination failures

  • The Project Manager should configure their GitHub notification settings so that they receive alert emails if any of the processes fails. For the processes which run only once a month, it is also worth actively checking their status on the day of the run by viewing the relevant Action page.
  • Detailed instructions on investigating failures are available here.
  • For Crossref, the process itself may succeed, but Crossref may later report an error by email to [email protected] (an alias of [email protected]). Email filters route any such emails into the subfolder Crossref submissions > Error reports. This subfolder should be checked regularly for errors which the publisher needs to rectify. Once an error report email has been dealt with, it can be moved to the subfolder Crossref submissions > Checked.
  • Crossref have some helpful documentation on troubleshooting error reports.
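To check the status of a monthly run without opening each Action page, the GitHub REST API endpoint for listing a workflow's runs can be queried. A minimal Python sketch using only the standard library, assuming unauthenticated access (which works for public repositories but is rate-limited) and treating runs with conclusion `failure` or `timed_out` as failed. The function names are hypothetical.

```python
import json
import urllib.request

# The workflow can be identified by its file name, as in the Action links.
API = "https://api.github.com/repos/thoth-pub/thoth-dissemination/actions/workflows/{}/runs"

FAILED_CONCLUSIONS = {"failure", "timed_out"}


def fetch_runs(workflow_file: str, per_page: int = 10) -> dict:
    """Fetch the most recent runs of one workflow via the GitHub REST API."""
    req = urllib.request.Request(
        API.format(workflow_file) + f"?per_page={per_page}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def failed_runs(payload: dict) -> list[str]:
    """Return the html_url of each completed run that failed or timed out."""
    return [
        run["html_url"]
        for run in payload.get("workflow_runs", [])
        if run.get("status") == "completed"
        and run.get("conclusion") in FAILED_CONCLUSIONS
    ]
```

For example, `failed_runs(fetch_runs("ia_bulk_disseminate.yml"))` would list the URLs of recent failed Internet Archive runs; the workflow file names come from the Action links in the table of platforms. This does not replace checking the Crossref error reports, since those errors arrive by email after a run has already succeeded.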

Other maintenance

  • Each process can be enabled and disabled at the relevant Action link.
  • Note that "scheduled workflows are automatically disabled when no repository activity has occurred in 60 days". This is unlikely to happen often while Thoth is under active development, but if the repository has been inactive for close to 60 days, a warning banner will be displayed. Click "Continue running workflow" for each process in turn to prevent it from being automatically disabled.
  • Some processes have an additional [XX]_ENV_EXCEPTIONS repository variable, which stores a list of works which should be excluded from the automatic upload process. The Project Manager can add additional Thoth Work IDs to this list if required.
    • Currently for Internet Archive, the list represents legacy publications with no uploadable PDF version. Attempting to upload them would fail due to the lack of PDF.
    • Currently for Crossref, the list represents works with DOIs registered via an agency other than Crossref (e.g. DataCite). Submitting them would appear to succeed, but would then trigger an error report with the message "Unknown publisher prefix".
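The 60-day rule is simple date arithmetic. A minimal Python sketch that estimates when scheduled workflows would be disabled, given the date of the last repository activity. What counts as "repository activity" is defined by GitHub, so the result is an estimate, and the seven-day `margin_days` threshold is an arbitrary, hypothetical choice rather than the point at which GitHub shows its banner.

```python
from datetime import date, timedelta

# Taken from GitHub's documented rule quoted above.
INACTIVITY_LIMIT = timedelta(days=60)


def disable_date(last_activity: date) -> date:
    """Estimated date on which scheduled workflows would be disabled."""
    return last_activity + INACTIVITY_LIMIT


def needs_attention(last_activity: date, today: date, margin_days: int = 7) -> bool:
    """True if the estimated disable date is within margin_days of today,
    or has already passed.

    margin_days is a hypothetical reminder threshold, not GitHub's banner rule.
    """
    return disable_date(last_activity) - today <= timedelta(days=margin_days)


print(disable_date(date(2024, 1, 1)))  # 2024-03-01
```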