Code structure - thoth-pub/thoth-dissemination GitHub Wiki
disseminator.py:
- "Disseminates" a single work to a single platform, based on specified arguments.
- "Dissemination" process varies depending on platform requirements, but usually involves some or all of:
  - calling the Thoth Export Server to retrieve a formatted metadata output file for the work
  - calling the Thoth GraphQL API to retrieve raw metadata for the work, then reformatting it to match the platform's specifications
  - retrieving one or more of the work's content files from their specified Location Full Text URLs
- To add support for a new platform:
  - Create a new instance of the Uploader class based on the platform's requirements.
  - Update the list of platforms in README.md.
  - Add names of any new platform-specific secrets/variables to config.env.template (note this is mostly unused in practice, but acts as a reference).
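
The first step for adding a platform can be sketched as follows. This is a minimal illustration that reads "a new instance of the Uploader class" as a platform-specific subclass, which is an assumption. The method name `upload_to_platform`, the `ExampleUploader` class, the `UPLOADERS` registry and the `disseminate` helper are all hypothetical; the real `Uploader` interface in the repository may differ.

```python
from abc import ABC, abstractmethod


class Uploader(ABC):
    """Hypothetical stand-in for the repository's Uploader base class."""

    def __init__(self, work_id: str) -> None:
        self.work_id = work_id

    @abstractmethod
    def upload_to_platform(self) -> str:
        """Send the work to the platform and return a location string."""


class ExampleUploader(Uploader):
    """Illustrative platform: obtain formatted metadata, then 'upload' it."""

    def get_formatted_metadata(self) -> bytes:
        # A real uploader would call the Thoth Export Server here; this stub
        # returns a placeholder so the sketch runs without network access.
        return f"<record id='{self.work_id}'/>".encode("utf-8")

    def upload_to_platform(self) -> str:
        metadata = self.get_formatted_metadata()
        # Placeholder for the platform-specific API call.
        return f"example-platform://{self.work_id} ({len(metadata)} bytes)"


# Hypothetical registry mapping a platform argument to its uploader class.
UPLOADERS = {"example": ExampleUploader}


def disseminate(work_id: str, platform: str) -> str:
    """Disseminate a single work to a single platform."""
    uploader = UPLOADERS[platform](work_id)
    return uploader.upload_to_platform()
```

With a layout like this, supporting another platform means writing one more subclass and registering it under the platform's name.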
disseminate.yml (in .github/workflows):
- Base GitHub Action for simple running of disseminator.py.
- In addition to dissemination, also carries out follow-up tasks, such as writing locations to Thoth (using write_locations.py) and sending notification emails, where appropriate.
- Can be run either "manually" (via manual_disseminate.yml) or "automatically" (via bulk_disseminate.yml).
- All options for dissemination via GitHub Actions can be found in the Actions tab (e.g. the manual_disseminate.yml file corresponds to the manual-disseminate Action).
- For both "manual" and "automatic" dissemination to a specific platform, the relevant secrets/variables must be present in GitHub (see below).
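
The "where appropriate" decision for follow-up tasks can be pictured with a small Python sketch. The real logic lives in the workflow file, so this only illustrates the idea. The platform names, the two sets and the rule that follow-ups run only after a successful dissemination are all assumptions.

```python
from __future__ import annotations

# Hypothetical platform lists; the real lists are kept in disseminate.yml.
WRITE_LOCATIONS_PLATFORMS = {"platform-a", "platform-b"}
SEND_EMAIL_PLATFORMS = {"platform-b"}


def follow_up_tasks(platform: str, dissemination_succeeded: bool) -> list[str]:
    """Return the follow-up tasks to run for a platform, in order."""
    # Assumption: nothing is written or emailed if dissemination failed.
    if not dissemination_succeeded:
        return []
    tasks = []
    if platform in WRITE_LOCATIONS_PLATFORMS:
        tasks.append("write-locations")
    if platform in SEND_EMAIL_PLATFORMS:
        tasks.append("send-email")
    return tasks
```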
manual-disseminate:
- Allows ad-hoc dissemination (including follow-up tasks) of one or more works to a specified platform.
- Any platform supported by disseminator.py is immediately available as an option to manual-disseminate.
bulk-disseminate:
- Allows scheduled dissemination (including follow-up tasks) of multiple works to a specified platform.
- Uses obtain_new_ids.py to determine the set of works to be disseminated.
- To add support for a new platform:
  - Create a new ${platform}_bulk_disseminate.yml wrapper Action specifying the desired schedule.
  - Create a new instance of the IDFinder class in obtain_new_ids.py specifying any platform-specific restrictions on the set of works to be disseminated in each scheduled run (e.g. published since last run, updated since last run, etc).
  - If required, add the new platform to the lists which trigger the write-locations and/or send-email follow-up tasks in disseminate.yml.
  - Update the [[Workflow: automated dissemination]] wiki page with any relevant information about the new platform, particularly adding it to the dissemination platforms table.
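
The platform-specific restrictions mentioned above ("published since last run", "updated since last run") could be expressed along these lines. This is a hedged sketch, not the repository's code: the `Work` record, the `is_eligible` hook, the `find_ids` method and both subclass names are hypothetical, and the real `IDFinder` in obtain_new_ids.py may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Work:
    """Hypothetical minimal record for a candidate work."""
    work_id: str
    published: datetime
    updated: datetime


class IDFinder:
    """Hypothetical base class: by default every candidate work qualifies."""

    def __init__(self, last_run: datetime, exceptions: frozenset = frozenset()):
        self.last_run = last_run
        # Works that should always be omitted from dissemination.
        self.exceptions = exceptions

    def is_eligible(self, work: Work) -> bool:
        return True

    def find_ids(self, works: list[Work]) -> list[str]:
        """Return IDs of works to disseminate in this scheduled run."""
        return [
            w.work_id
            for w in works
            if w.work_id not in self.exceptions and self.is_eligible(w)
        ]


class PublishedSinceLastRunFinder(IDFinder):
    """Restrict the set to works published after the previous run."""

    def is_eligible(self, work: Work) -> bool:
        return work.published > self.last_run


class UpdatedSinceLastRunFinder(IDFinder):
    """Restrict the set to works updated after the previous run."""

    def is_eligible(self, work: Work) -> bool:
        return work.updated > self.last_run
```

Keeping the restriction in a single overridable method means each platform only has to state its own rule, while the exceptions handling stays shared.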
GitHub secrets/variables:
- These can be found in Settings > Secrets and variables > Actions.
- GitHub syntax requires that secret/variable names are uppercase and contain no hyphens. The scripts themselves usually expect lowercase and sometimes expect hyphens (e.g. in publisher IDs), so some conversion takes place across the Actions/scripts.
- The set of required secrets broadly reflects the list in config.env.template. These are credentials, and must not be exposed.
- The set of required variables currently just specifies, for each platform, the set of publishers signed up to dissemination ({$PLATFORM}_ENV_PUBLISHERS), and (optionally) any works which should be omitted from dissemination ({$PLATFORM}_ENV_EXCEPTIONS). These are not sensitive information, so can be stored as plaintext.
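
The name conversion described above can be illustrated with a short sketch. The helper names, the `PREFIX_<publisher ID>` naming scheme and the sample ID are assumptions made for illustration; the actual conversion is spread across the Actions and scripts and may work differently.

```python
def to_github_name(script_name: str) -> str:
    """Convert a script-level name to GitHub form: uppercase, no hyphens."""
    return script_name.upper().replace("-", "_")


def publisher_id_from_github_name(github_name: str, prefix: str) -> str:
    """Recover a lowercase, hyphenated publisher ID.

    Assumes the (hypothetical) naming scheme PREFIX_<publisher ID>, where
    hyphens in the ID were replaced by underscores.
    """
    expected_start = prefix + "_"
    if not github_name.startswith(expected_start):
        raise ValueError(f"{github_name!r} does not start with {expected_start!r}")
    return github_name[len(expected_start):].lower().replace("_", "-")


def platform_variable(platform: str, suffix: str) -> str:
    """Build a per-platform variable name such as EXAMPLE_ENV_PUBLISHERS."""
    return f"{to_github_name(platform)}_ENV_{suffix}"
```

For example, `platform_variable("example", "PUBLISHERS")` yields `EXAMPLE_ENV_PUBLISHERS`. Restoring hyphens is only safe on the ID portion of a name, which is why the reverse helper takes the prefix explicitly.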