Workflow: automated dissemination - thoth-pub/thoth-dissemination GitHub Wiki

This page explains the process for automated dissemination of Thoth works to other platforms, and sets out the steps the Project Manager needs to take to ensure it runs smoothly.

Background

At present, works are automatically disseminated to the Internet Archive, Loughborough University's Figshare repository, Cambridge University Library's TOAN pilot DSpace repository, Google Play Books (via Thoth's dedicated server), and OAPEN. The same process is also used to register DOIs with Crossref. Any individual publisher can opt in to any individual platform (this is controlled via repository variables). The automated process for each platform runs on a scheduled basis. It aims to perform a "catch-up" dissemination of works which have become valid for submission since the previous run.

The process for each platform is controlled via a GitHub Action named [XX]-bulk-disseminate (where XX = ia for Internet Archive, fs for Figshare, cr for Crossref, etc.). These Actions can be found under the Actions tab of this GitHub repository, at https://github.com/thoth-pub/thoth-dissemination/actions.

From each Action's page, anyone can view details about times when the process was run, and members of Thoth can enable/disable the process. Successful runs are marked with a green tick, and failed runs are marked with a red cross.

Works which have been successfully disseminated via this process may be made visible on the target platform, sometimes under a dedicated Thoth Archiving Network collection. It may take a few hours before a disseminated work is made fully available.

Dissemination platforms

| Platform | Publishers | Schedule | Target | Action name | Action link | Repository variables | Repository secrets | Collection link |
|---|---|---|---|---|---|---|---|---|
| Internet Archive | Open Book Publishers, punctum | 1st of every month, before UK OOB | All works published since last run | ia-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/ia_bulk_disseminate.yml | IA_ENV_PUBLISHERS, IA_ENV_EXCEPTIONS | IA_S3_ACCESS, IA_S3_SECRET | https://archive.org/details/thoth-archiving-network |
| Loughborough Figshare | Open Book Publishers | 7th of every month, before UK OOB | All works published in preceding month | fs-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/fs_bulk_disseminate.yml | FS_ENV_PUBLISHERS | FIGSHARE_TOKEN | https://repository.lboro.ac.uk/Thoth_Archiving_Network |
| Cambridge University Library | Open Book Publishers, punctum | 7th of every month, before UK OOB | All works published in preceding month | cul-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/cul_bulk_disseminate.yml | CUL_ENV_PUBLISHERS | CUL_PILOT_USER, CUL_PILOT_PW | https://thoth-arch.lib.cam.ac.uk/home |
| Google Play Books | African Minds, Mattering Press, mediastudies.press, Paideia Publishing Services, punctum, Scottish Universities Press | Every day, before UK OOB | All works published in preceding day | gp-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/gp_bulk_disseminate.yml | GP_ENV_PUBLISHERS | GOOGLE_PLAY_BUCKET, GOOGLE_SERVICE_ACCOUNT, GOOGLE_WORKLOAD_ID_PROVIDER | N/A |
| OAPEN | adocs, African Minds, Mattering Press, mediastudies.press, Open Book Publishers, punctum, Scottish Universities Press | Every Monday, before UK OOB | All works published in preceding week | oapen-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/oapen_bulk_disseminate.yml | OAPEN_ENV_PUBLISHERS | OAPEN_FTP_USER, OAPEN_FTP_PW, OAPEN_NOTIF_EMAIL | N/A |
| Crossref | adocs, African Minds, Bokförlaget Stolpe, Editorial Mar Caribe, Mattering Press, mediastudies.press, Open Book Publishers, punctum, Scottish Universities Press, UEA Publishing Project | Hourly, around 45 minutes past | All published works updated since last run | cr-bulk-disseminate | https://github.com/thoth-pub/thoth-dissemination/actions/workflows/cr_bulk_disseminate.yml | CR_ENV_PUBLISHERS, CR_ENV_EXCEPTIONS | CROSSREF_USER_[XXXX], CROSSREF_PW_[XXXX] | N/A (check DOI link(s) to confirm success) |
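
The Action links in the table above follow a regular pattern: the Action name with hyphens replaced by underscores, plus a .yml extension, under the repository's actions/workflows path. The Python sketch below reproduces that mapping. It is inferred only from the six rows listed here, and the function name `workflow_url` is illustrative rather than part of the repository.

```python
REPO_URL = "https://github.com/thoth-pub/thoth-dissemination"


def workflow_url(platform_code: str) -> str:
    """Build the Action link for a platform abbreviation such as 'ia' or 'cr'."""
    # Action names take the form [XX]-bulk-disseminate
    action_name = f"{platform_code.lower()}-bulk-disseminate"
    # Workflow files use underscores where the Action name uses hyphens
    workflow_file = action_name.replace("-", "_") + ".yml"
    return f"{REPO_URL}/actions/workflows/{workflow_file}"


if __name__ == "__main__":
    for code in ("ia", "fs", "cul", "gp", "oapen", "cr"):
        print(workflow_url(code))
```

If a future platform's workflow file does not follow this naming pattern, the link recorded in the table should be treated as authoritative.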

Project Manager's responsibilities

Managing publisher opt-ins

  • The list of publishers who have opted in to automatic upload for each platform is stored under the [XX]_ENV_PUBLISHERS repository variable at https://github.com/thoth-pub/thoth-dissemination/settings/variables/actions (where XX represents the platform abbreviation, in upper case, as used in the Action names above).
  • If a new publisher wants to join the scheme, the Project Manager can add their Thoth Publisher ID to the relevant list(s).
  • For platforms where individual publishers share their credentials with Thoth (e.g. Crossref), rather than Thoth having its own credentials (e.g. Internet Archive), additional steps are required:
    • The publisher's credentials need to be added to the list of repository secrets at https://github.com/thoth-pub/thoth-dissemination/settings/secrets/actions.
    • Create new repository secrets for both the username and password.
    • In the name box, enter the secret's name in the required format. For Crossref, for example, the username secret is named CROSSREF_USER_[XXXX] and the password secret is named CROSSREF_PW_[XXXX], where XXXX represents the publisher's Thoth ID, with hyphens changed to underscores and letters capitalised.
    • Add the value of the secret (e.g. the username or password in appropriate format) using the free-text box.
  • A similar process is required for Google Play Books, where each individual publisher has a "collection code".
    • Create a new repository secret for each publisher's collection code. The naming format is GOOGLE_PLAY_COLL_[XXXX], where XXXX represents the publisher's Thoth ID, with hyphens changed to underscores and letters capitalised.
  • Remember to add the publisher to the dissemination platforms table above.
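
The secret naming rule described above (hyphens changed to underscores, letters capitalised) is easy to get wrong by hand, so it can help to generate the names. The following is a minimal Python sketch. The helper names are hypothetical, and the ID used in the example is a made-up placeholder, not a real publisher's Thoth ID.

```python
def secret_suffix(publisher_id: str) -> str:
    """Convert a Thoth Publisher ID to the [XXXX] form used in secret names."""
    # Hyphens become underscores and letters are capitalised
    return publisher_id.strip().replace("-", "_").upper()


def crossref_secret_names(publisher_id: str) -> tuple[str, str]:
    """Return the (username, password) secret names for a Crossref publisher."""
    suffix = secret_suffix(publisher_id)
    return f"CROSSREF_USER_{suffix}", f"CROSSREF_PW_{suffix}"


def google_play_secret_name(publisher_id: str) -> str:
    """Return the collection code secret name for a Google Play Books publisher."""
    return f"GOOGLE_PLAY_COLL_{secret_suffix(publisher_id)}"


if __name__ == "__main__":
    # Placeholder ID for illustration only
    example_id = "12345678-90ab-cdef-1234-567890abcdef"
    print(crossref_secret_names(example_id))
    print(google_play_secret_name(example_id))
```

The generated names are what goes in the name box; the secret values themselves (username, password, or collection code) are still entered manually in the free-text box.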

Handling dissemination failures

  • The Project Manager should configure their GitHub settings so that they will receive alert emails if any of the processes fails. For the processes which only run once a month, it is also worth actively checking their status on the day of the run, by viewing the relevant Action page.
  • Detailed instructions on investigating failures are available here.
  • For Crossref, the process itself may succeed, but Crossref may later report back an error via email to [email protected] (an alias of [email protected]). Email filters ensure that any such emails go into the subfolder Crossref submissions > Error reports. This subfolder should be regularly checked in case of errors which the publisher needs to rectify. Once any error report email has been dealt with, it can be moved to the subfolder Crossref submissions > Checked.
  • Crossref have some helpful documentation on troubleshooting error reports.

Other maintenance

  • Each process can be enabled and disabled at the relevant Action link.
  • Note that "scheduled workflows are automatically disabled when no repository activity has occurred in 60 days". This is unlikely to happen often while Thoth is under active development, but if the repository has been inactive for close to 60 days, a warning banner will be displayed. Click on "Continue running workflow" for each process in turn to prevent them from being automatically disabled.
  • Some processes have an additional [XX]_ENV_EXCEPTIONS repository variable, which stores a list of works which should be excluded from the automatic upload process. The Project Manager can add additional Thoth Work IDs to this list if required.
    • Currently for Internet Archive, the list represents legacy publications with no uploadable PDF version. Attempting to upload them would fail due to the lack of PDF.
    • Currently for Crossref, the list represents works with DOIs registered via an agency other than Crossref (e.g. DataCite). Submitting them would appear to succeed, but would then trigger an error report with the message "Unknown publisher prefix".
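
To illustrate how the opt-in and exceptions lists interact, here is a minimal Python sketch of the selection logic. It rests on several assumptions that are not confirmed by this page: that each repository variable holds a JSON array of ID strings, and that each work is represented as a dictionary with `workId` and `publisherId` keys. The function name `select_works` and the sample IDs are hypothetical, and the actual workflows may apply the lists differently.

```python
import json


def select_works(works, publishers_var, exceptions_var="[]"):
    """Keep works from opted-in publishers that are not listed as exceptions.

    publishers_var and exceptions_var are assumed to be JSON arrays of IDs,
    standing in for [XX]_ENV_PUBLISHERS and [XX]_ENV_EXCEPTIONS.
    """
    publishers = set(json.loads(publishers_var))
    exceptions = set(json.loads(exceptions_var))
    return [
        work
        for work in works
        if work["publisherId"] in publishers and work["workId"] not in exceptions
    ]


if __name__ == "__main__":
    # Sample data with placeholder IDs
    works = [
        {"workId": "work-1", "publisherId": "pub-a"},
        {"workId": "work-2", "publisherId": "pub-a"},  # listed as an exception
        {"workId": "work-3", "publisherId": "pub-b"},  # publisher not opted in
    ]
    selected = select_works(works, '["pub-a"]', '["work-2"]')
    print([work["workId"] for work in selected])
```

Under these assumptions, a work is only submitted when its publisher appears in the opt-in list and the work itself is absent from the exceptions list, which is why adding a Work ID to [XX]_ENV_EXCEPTIONS is enough to exclude it without opting the whole publisher out.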