Prepping data - mjanowiecki/levy-api Wiki

Workflow for preparing metadata for new items for ingest into Drupal.

Get list of existing terms from Drupal.

  1. Get existing taxonomy terms from Drupal.

  2. Get existing levy_collection_names from Drupal.

Get list of terms from new data.

  1. Get list of taxonomy terms and levy_collection_names from spreadsheet of new data.
    • input:
      • spreadsheet of new data
    • script: explodeTaxonomiesAndNames.py
    • outputs:
      • levy-api/aggregated-taxonomies (new items aggregated by taxonomy name)
      • levy-api/aggregated-roles (new items aggregated by levy_collection_names and grouped by role)
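
The aggregation step above can be pictured with a minimal sketch. This is not the code of explodeTaxonomiesAndNames.py; it only illustrates the general idea of splitting multi-valued cells and grouping items by term. The column names (`item_id`, `subject`), the `|` delimiter, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def aggregate_by_term(rows, term_column, id_column="item_id", delimiter="|"):
    """Split a multi-valued column and group item identifiers by term."""
    items_by_term = defaultdict(list)
    for row in rows:
        # A cell may hold several terms; an empty cell contributes nothing.
        for raw_term in (row.get(term_column) or "").split(delimiter):
            term = raw_term.strip()
            if term and row[id_column] not in items_by_term[term]:
                items_by_term[term].append(row[id_column])
    return dict(items_by_term)


# Hypothetical sample standing in for the spreadsheet of new data.
sample_csv = io.StringIO(
    "item_id,subject\n"
    "001,Waltzes|Dance music\n"
    "002,Waltzes\n"
    "003,\n"
)
aggregated = aggregate_by_term(csv.DictReader(sample_csv), "subject")
```

Here `aggregated` maps each term to the items that use it, which is roughly the shape the aggregated spreadsheets are described as having. Writing one file per taxonomy, or grouping names by role, would be a further step on top of this.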

Determine what terms need to be created in Drupal.

  1. Compare taxonomy terms from new items to existing terms in Drupal.

    • inputs:
      • spreadsheets in levy-api/existing-taxonomies
      • spreadsheets in levy-api/aggregated-taxonomies
    • script: findExistingTaxTermsAndTermsToCreate.py
    • outputs:
      • levy-api/items-matched (items aggregated by taxonomy terms with Drupal identifiers added, if found)
      • taxonomyTermsDone.csv (list of taxonomy terms that already exist in Drupal)
      • taxonomyTermsToCreate.csv (list of taxonomy terms that DO NOT exist in Drupal and need to be created)
  2. Compare levy_collection_names from new items to existing terms in Drupal.

    • inputs:
      • allCollectionNames.csv (spreadsheet containing all existing levy_collection_names in Drupal)
      • levy-api/aggregated-roles (spreadsheets of levy_collection_names grouped by role and aggregated by title)
    • script: findExistingCollNamesAndNamesToCreate.py
    • outputs:
      • matched_CollectionNames.csv (items aggregated by levy_collection_names with Drupal identifiers added, if found)
      • levy_collection_namesDone.csv (list of levy_collection_names that already exist in Drupal)
      • levy_collection_namesToCreate.csv (list of levy_collection_names that DO NOT exist in Drupal and need to be created)
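
Both comparison steps follow the same pattern: look up each new term among the existing ones, record the Drupal identifier when there is a match, and otherwise queue the term for creation. The sketch below shows that pattern only; it is not taken from findExistingTaxTermsAndTermsToCreate.py or findExistingCollNamesAndNamesToCreate.py. The function names, the sample identifiers, and the case- and whitespace-insensitive matching are assumptions; the actual scripts may match exactly.

```python
def normalize(term):
    """Lower-case and collapse whitespace so trivial differences still match.

    Assumption: this normalization is illustrative, not the scripts' rule.
    """
    return " ".join(term.lower().split())


def match_terms(new_terms, existing_ids):
    """Split new terms into those already present and those to create.

    existing_ids maps each existing term to its (hypothetical) identifier.
    """
    lookup = {normalize(name): ident for name, ident in existing_ids.items()}
    done, to_create = {}, []
    for term in new_terms:
        ident = lookup.get(normalize(term))
        if ident is None:
            to_create.append(term)   # not found: needs to be created
        else:
            done[term] = ident       # found: keep the identifier
    return done, to_create


# Hypothetical existing terms and identifiers.
existing = {"Waltzes": "101", "Marches": "102"}
done, to_create = match_terms(["waltzes ", "Polkas"], existing)
```

In this sketch `done` corresponds to the "Done" lists and `to_create` to the "ToCreate" lists; the identifiers in `done` are what would be added to the matched item spreadsheets.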