Code structure - thoth-pub/thoth-dissemination GitHub Wiki

disseminator.py:

  • "Disseminates" a single work to a single platform, based on specified arguments.
  • "Dissemination" process varies depending on platform requirements, but usually involves some or all of:
    • calling the Thoth Export Server to retrieve a formatted metadata output file for the work
    • calling the Thoth GraphQL API to retrieve raw metadata for the work, then reformatting it to match the platform's specifications
    • retrieving one or more of the work's content files from their specified Location Full Text URLs.
  • To add support for a new platform:
    • Create a new instance of the Uploader class based on the platform's requirements
    • Update the list of platforms in README.md
    • Add the names of any new platform-specific secrets/variables to config.env.template (note that this file is mostly unused in practice, but serves as a reference).
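
The steps for adding a platform can be pictured with a minimal sketch. It does not reproduce the repository's code: the Uploader stand-in below, its upload_to_platform method, the UPLOADERS registry, the disseminate function and the "exampleplatform" name are all assumptions made for illustration, and the real base class may expose a different interface.

```python
from abc import ABC, abstractmethod


class Uploader(ABC):
    """Stand-in for the repository's Uploader base class (interface assumed)."""

    def __init__(self, work_id: str):
        self.work_id = work_id

    @abstractmethod
    def upload_to_platform(self) -> str:
        """Send the work to the platform and return a status message."""


class ExamplePlatformUploader(Uploader):
    """Hypothetical uploader for an imaginary platform."""

    def upload_to_platform(self) -> str:
        # A real uploader would retrieve metadata and content files here
        # and send them to the platform; this sketch only reports success.
        return f"uploaded {self.work_id} to exampleplatform"


# Hypothetical registry mapping the platform argument to its uploader class.
UPLOADERS = {"exampleplatform": ExamplePlatformUploader}


def disseminate(platform: str, work_id: str) -> str:
    """Disseminate a single work to a single platform."""
    try:
        uploader_class = UPLOADERS[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform}")
    return uploader_class(work_id).upload_to_platform()
```

Under these assumptions, supporting a new platform means writing one platform-specific class and registering it, while the single-work, single-platform entry point stays unchanged.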

disseminate.yml (in .github/workflows):

  • Base GitHub Action that runs disseminator.py.
  • In addition to dissemination, also carries out follow-up tasks, such as writing locations to Thoth (using write_locations.py) and sending notification emails, where appropriate.
  • Can be run either "manually" (via manual_disseminate.yml) or "automatically" (via bulk_disseminate.yml).
  • All options for dissemination via GitHub Actions can be found in the Actions tab (e.g. the manual_disseminate.yml file corresponds to the manual-disseminate Action).
  • For both "manual" and "automatic" dissemination to a specific platform, the relevant secrets/variables must be present in GitHub (see below).

manual-disseminate:

  • Allows ad-hoc dissemination (including follow-up tasks) of one or more works to a specified platform.
  • Any platform supported by disseminator.py is immediately available as an option to manual-disseminate.

bulk-disseminate:

  • Allows scheduled dissemination (including follow-up tasks) of multiple works to a specified platform.
  • Uses obtain_new_ids.py to determine the set of works to be disseminated.
  • To add support for a new platform:
    • Create a new ${platform}_bulk_disseminate.yml wrapper Action specifying the desired schedule.
    • Create a new instance of the IDFinder class in obtain_new_ids.py specifying any platform-specific restrictions on the set of works to be disseminated in each scheduled run (e.g. published since last run, updated since last run, etc.).
    • Update the "Workflow: automated dissemination" wiki page with any relevant information about the new platform, in particular adding it to the dissemination platforms table.
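
A platform-specific restriction such as "published since last run" could look roughly like the sketch below. The Work record, the select method and the PublishedSinceLastRunFinder name are hypothetical, and the IDFinder class here is a stand-in rather than the one defined in obtain_new_ids.py, whose interface is not shown on this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Work:
    """Hypothetical minimal record for a work."""
    work_id: str
    publication_date: date


class IDFinder:
    """Stand-in base class: selects every candidate work by default."""

    def select(self, works, last_run):
        return [work.work_id for work in works]


class PublishedSinceLastRunFinder(IDFinder):
    """Hypothetical finder: keeps only works published after the last run."""

    def select(self, works, last_run):
        return [
            work.work_id
            for work in works
            if work.publication_date > last_run
        ]
```

For example, given one work published on 2024-01-01 and another on 2024-03-01, a scheduled run whose previous run was on 2024-02-01 would select only the second.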

GitHub secrets/variables:

  • These can be found in Settings > Secrets and variables > Actions.
  • GitHub syntax requires that secret/variable names are uppercase and contain no hyphens. The scripts themselves usually expect lowercase and sometimes expect hyphens (e.g. in publisher IDs), so some conversion takes place across the Actions/scripts.
  • The set of required secrets broadly reflects the list in config.env.template. These are credentials, and must not be exposed.
  • The set of required variables currently just specifies, for each platform, the set of publishers signed up to dissemination (${PLATFORM}_ENV_PUBLISHERS), and (optionally) any works which should be omitted from dissemination (${PLATFORM}_ENV_EXCEPTIONS). These are not sensitive information, so can be stored as plaintext.
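
The name conversion and the use of the two per-platform variables can be sketched as follows. This is an illustration under stated assumptions, not the repository's logic: the helper names are hypothetical, hyphens are assumed to map to underscores, and the variable values are assumed to be JSON arrays of IDs, none of which this page confirms.

```python
import json
import os


def variable_name(platform: str, kind: str) -> str:
    """Build a GitHub-style name: uppercase, with hyphens replaced (assumed) by underscores."""
    return f"{platform}_ENV_{kind}".upper().replace("-", "_")


def read_id_list(name: str, environ=os.environ) -> list:
    """Read a list of IDs from a variable, assuming it holds a JSON array."""
    return json.loads(environ.get(name, "[]"))


def is_eligible(platform: str, publisher_id: str, work_id: str, environ=os.environ) -> bool:
    """A work qualifies if its publisher is signed up and the work is not excepted."""
    publishers = read_id_list(variable_name(platform, "publishers"), environ)
    exceptions = read_id_list(variable_name(platform, "exceptions"), environ)
    return publisher_id in publishers and work_id not in exceptions
```

With these helpers, variable_name("example-platform", "publishers") yields EXAMPLE_PLATFORM_ENV_PUBLISHERS, and a missing exceptions variable is treated as an empty list, matching its optional status.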