Code structure - thoth-pub/thoth-dissemination GitHub Wiki
`disseminator.py`:
- "Disseminates" a single work to a single platform, based on specified arguments.
- "Dissemination" process varies depending on platform requirements, but usually involves some or all of:
  - calling the Thoth Export Server to retrieve a formatted metadata output file for the work
  - calling the Thoth GraphQL API to retrieve raw metadata for the work, then reformatting it to match the platform's specifications
  - retrieving one or more of the work's content files from their specified Location Full Text URLs
- To add support for a new platform:
  - Create a new instance of the `Uploader` class based on the platform's requirements.
  - Update the list of platforms in `README.md`.
  - Add names of any new platform-specific secrets/variables to `config.env.template` (note this is mostly unused in practice, but acts as a reference).
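The first of these steps can be pictured with a minimal sketch. Everything in it is a stand-in written for illustration: the real `Uploader` class lives in the repository and its constructor arguments and method names may differ. The `upload_to_platform` method, the `ExampleUploader` class, the `UPLOADERS` lookup and the `platform.example` URL are all assumptions, not the project's actual API.

```python
from abc import ABC, abstractmethod


class Uploader(ABC):
    """Stand-in for the repository's Uploader class (hypothetical interface)."""

    def __init__(self, work_id: str):
        self.work_id = work_id

    @abstractmethod
    def upload_to_platform(self) -> str:
        """Send the work to the platform and return a location for it."""


class ExampleUploader(Uploader):
    """Hypothetical uploader for an imaginary platform called "Example"."""

    def upload_to_platform(self) -> str:
        # A real uploader would retrieve metadata (Export Server or GraphQL
        # API) and content files here, then submit them to the platform.
        return f"https://platform.example/works/{self.work_id}"


# Hypothetical lookup from platform name to uploader class, so that one
# entry point can dispatch on a platform argument.
UPLOADERS = {"Example": ExampleUploader}


def disseminate(work_id: str, platform: str) -> str:
    """Disseminate a single work to a single platform."""
    try:
        uploader_class = UPLOADERS[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform}") from None
    return uploader_class(work_id).upload_to_platform()
```

In this sketch, supporting another platform means writing one more class and adding one entry to the lookup, which mirrors the "single work to a single platform" shape described above.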
`disseminate.yml` (in `.github/workflows`):
- Base GitHub Action for simple running of `disseminator.py`.
- In addition to dissemination, also carries out follow-up tasks, such as writing locations to Thoth (using `write_locations.py`) and sending notification emails, where appropriate.
- Can be run either "manually" (via `manual_disseminate.yml`) or "automatically" (via `bulk_disseminate.yml`).
- All options for dissemination via GitHub Actions can be found in the Actions tab (e.g. the `manual_disseminate.yml` file corresponds to the `manual-disseminate` Action).
- For both "manual" and "automatic" dissemination to a specific platform, the relevant secrets/variables must be present in GitHub (see below).
`manual-disseminate`:
- Allows ad-hoc dissemination (including follow-up tasks) of one or more works to a specified platform.
- Any platform supported by `disseminator.py` is immediately available as an option to `manual-disseminate`.
`bulk-disseminate`:
- Allows scheduled dissemination (including follow-up tasks) of multiple works to a specified platform.
- Uses `obtain_new_ids.py` to determine the set of works to be disseminated.
- To add support for a new platform:
  - Create a new `${platform}_bulk_disseminate.yml` wrapper Action specifying the desired schedule.
  - Create a new instance of the `IDFinder` class in `obtain_new_ids.py` specifying any platform-specific restrictions on the set of works to be disseminated in each scheduled run (e.g. published since last run, updated since last run, etc.).
  - Update the "Workflow: automated dissemination" wiki page with any relevant information about the new platform, particularly adding it to the dissemination platforms table.
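As a rough illustration of the `IDFinder` step, the sketch below shows the kind of per-platform restriction a finder might apply. The `IDFinder` base class here is a simplified stand-in; the `Work` record, its field names, the `is_eligible` and `find_ids` methods, and the handling of omitted works are hypothetical and are not taken from `obtain_new_ids.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Work:
    """Hypothetical minimal work record; field names are assumptions."""
    work_id: str
    published: datetime
    updated: datetime


class IDFinder:
    """Simplified stand-in: by default, every candidate work is selected."""

    def __init__(self, last_run: datetime, exceptions: frozenset = frozenset()):
        self.last_run = last_run
        self.exceptions = exceptions  # work IDs to omit from dissemination

    def is_eligible(self, work: Work) -> bool:
        return True

    def find_ids(self, works: list) -> list:
        return [
            w.work_id
            for w in works
            if w.work_id not in self.exceptions and self.is_eligible(w)
        ]


class PublishedSinceLastRunFinder(IDFinder):
    """Hypothetical platform rule: only works published since the last run."""

    def is_eligible(self, work: Work) -> bool:
        return work.published > self.last_run


class UpdatedSinceLastRunFinder(IDFinder):
    """Hypothetical platform rule: only works updated since the last run."""

    def is_eligible(self, work: Work) -> bool:
        return work.updated > self.last_run


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


last_run = utc(2024, 1, 1)
works = [
    Work("w1", published=utc(2023, 12, 1), updated=utc(2024, 1, 5)),
    Work("w2", published=utc(2024, 1, 3), updated=utc(2024, 1, 3)),
    Work("w3", published=utc(2024, 1, 4), updated=utc(2024, 1, 4)),
]

published_ids = PublishedSinceLastRunFinder(last_run, frozenset({"w3"})).find_ids(works)
updated_ids = UpdatedSinceLastRunFinder(last_run, frozenset({"w3"})).find_ids(works)
```

Keeping the platform rule in a single overridable method means each scheduled wrapper Action only needs to select the right finder; the rest of the selection logic stays shared.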
GitHub secrets/variables:
- These can be found in Settings > Secrets and variables > Actions.
- GitHub syntax requires that secret/variable names are uppercase and contain no hyphens. The scripts themselves usually expect lowercase and sometimes expect hyphens (e.g. in publisher IDs), so some conversion takes place across the Actions/scripts.
- The set of required secrets broadly reflects the list in `config.env.template`. These are credentials, and must not be exposed.
- The set of required variables currently just specifies, for each platform, the set of publishers signed up to dissemination (`{$PLATFORM}_ENV_PUBLISHERS`), and (optionally) any works which should be omitted from dissemination (`{$PLATFORM}_ENV_EXCEPTIONS`). These are not sensitive information, so can be stored as plaintext.
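The name conversion mentioned above can be sketched as follows. This is only an illustration of the idea: the function names are hypothetical, and mapping hyphens to underscores is an assumption, since the exact conversion rules used across the Actions and scripts are not described here. The publisher ID shown is a made-up placeholder.

```python
def to_github_name(name: str) -> str:
    """Convert a script-side name to GitHub style: uppercase, no hyphens.

    Assumption: hyphens are replaced with underscores rather than dropped.
    """
    return name.replace("-", "_").upper()


def to_script_name(github_name: str) -> str:
    """Convert a GitHub-style name back to lowercase with hyphens.

    Only a faithful inverse when the original name had no underscores,
    as is the case for hyphenated IDs such as UUIDs.
    """
    return github_name.replace("_", "-").lower()


# Placeholder publisher ID, not a real one.
publisher_id = "00000000-0000-0000-0000-000000000001"
github_style = to_github_name(publisher_id)
round_trip = to_script_name(github_style)
```

The reverse conversion is lossy for names that mix hyphens and underscores, which is one reason to keep the conversion in a single place if it is needed in both directions.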