5 31 2022 Tech Team Report - QualitativeDataRepository/TechnicalTeam GitHub Wiki

5-31-2022

Logged Tasks

Date Task Hours (Main) Hours (EOLS) Hours (PII) Hours (QDAS)
23-May-2022 Report, coord re: licenses, investigate StorJ 100GB upload 2
24-May-2022 Mtg, send StorJ note, plan replace queries, check StorJ docs for versioning 2
25-May-2022 Check backup scripts/investigate issues w.r.t. Drupal and Dataverse domain, coord with StorJ, Seba 3
26-May-2022 Walk through all license changes (except 1), coord re: backups, StorJ 4
27-May-2022 Test QDAS on file store, investigate JS errors, how to handle qdc or zip files, update CC0 licenses on prod, coord re: heal.tsv block, update Drupal core (sec). 2 2

Drupal

  • Updated Drupal core to 9.3.14 for security, deployed to dev/stage.

Operations

  • Updated license terms for published datasets on prod
    • added CC-BY-SA for one dataset
    • changed all published datasets to QDR Standard or Controlled licenses with all special conditions (copyrights, etc.) retained in "Terms of Access for Restricted Files" - coordinated to handle text variations/one-off cases.
    • changed old CC0 licenses (usually first of two versions) to match later versions or use QDR Standard.
  • Tested StorJ with an upload of 100 1GB files - StorJ worked, but it is slow to free space w.r.t. allowing more uploads within quota, and a timeout stopped the dataset page from updating properly (the upload itself completed successfully).
  • Investigated StorJ versioning - slated for Q3 this year. Until then, rely on backups?
  • Investigated backup processes re: StorJ
    • looks like the S3 sync is running at Syracuse (asked Seba to confirm), which means a StorJ backup would not go through AWS
    • found that the Drupal backup was still doing d7 instead of d8 - updated it to back up the drupal8 db
    • found the Solr backup was doing /opt/solr*, which also backed up an old Solr copy - changed its name to reduce backup size/only back up Solr 8.11.1
    • found that the backup of the Dataverse config (at /srv/glassfish/dataverse) was not working - this path is a symbolic link and the tar step does not follow the link to do the backup. I did not fix this, since /srv/glassfish/dataverse/files/* has many GBs of old temp/data files that probably don't need backup (we really want the domain configuration and probably a few things like logos/icons). Note that duply is used here, so if we did back up all of /srv/glassfish/dataverse/*, we'd be generating a large tar, but duply may then only transfer changes?
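
One possible shape for a fix to the Dataverse config backup, sketched in Python rather than in the existing backup script: resolve the symbolic link first, then archive the target directory while skipping the bulky files/ subdirectory. The function name `backup_config`, the archive root name `dataverse`, and the `skip_dirs` parameter are hypothetical and chosen only for illustration; the real script uses tar and duply, where GNU tar's `-h`/`--dereference` option would be the closest equivalent for following the link.

```python
import os
import tarfile


def backup_config(link_path, archive_path, skip_dirs=("files",)):
    """Archive the directory that link_path points to, skipping bulky subdirectories.

    Hypothetical helper: names and defaults are assumptions, not the real backup script.
    """
    # Resolve the symlink so the target directory is archived, not the link itself.
    real_root = os.path.realpath(link_path)
    # Member names inside the archive always use forward slashes.
    skip = {f"dataverse/{d}" for d in skip_dirs}

    def exclude(tarinfo):
        # Returning None drops the member; a dropped directory is not descended into.
        if tarinfo.name in skip:
            return None
        return tarinfo

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(real_root, arcname="dataverse", filter=exclude)
    return archive_path
```

Called as `backup_config("/srv/glassfish/dataverse", "/tmp/dataverse-config.tar.gz")`, this would keep the domain configuration and similar small files while leaving out files/*; whether anything under files/ (logos, icons) still needs to be kept is the open question noted above.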

Dataverse

  • Tested performance of the QDAS previewer over the file store (better than S3); didn't try StorJ, but expect that to be worse than S3
  • Investigated QDAS previewer error re: missing 'Graph' section
  • Investigated how to modify the QDAS previewer to work with a zip or a separate qde/qdc file
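
As a rough illustration of the zip-versus-separate-file question, the sketch below accepts either kind of input and returns the XML payload to preview. It is a Python sketch only, not the previewer's actual code; the function name `find_project_xml` and the rule "take the first .qde or .qdc member found in the archive" are assumptions made for the example.

```python
import io
import zipfile


def find_project_xml(data: bytes, filename: str):
    """Return (name, xml_bytes) for the QDAS XML in an uploaded file.

    Hypothetical helper: accepts a zip containing a .qde/.qdc member,
    or a bare .qde/.qdc file.
    """
    xml_suffixes = (".qde", ".qdc")
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Assumption: the first matching member is the one to preview.
            for name in zf.namelist():
                if name.lower().endswith(xml_suffixes):
                    return name, zf.read(name)
        raise ValueError("zip archive contains no .qde or .qdc file")
    if filename.lower().endswith(xml_suffixes):
        return filename, data
    raise ValueError(f"unsupported file: {filename}")
```

Raising a clear error for unrecognized input also gives a natural place to report format variations, rather than failing later while parsing.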

Discussion

  • Large upload 504 on stage (timeout < 5 minutes) - should it be lengthened/is it longer on prod?
  • Fix for /srv/glassfish/dataverse/* backup?
  • File storage pricing re: QDAS / priority of improving performance for S3

Plans

  • AnnoRep - continue to explore/fix docx/pdf GitHub issues
    • Deploy updates to dev/stage/prod
  • Dataverse
    • Popup info accessibility - IQSS likes the recommendations from the source I linked to, so this can be implemented along those lines.
    • QDAS planning/design/prototyping
      • Investigate performance of community zip previewer
      • Add error handling for format variations
      • Assess whether zip access is enough/project/other files need to be cached as aux files, etc.
    • Still want to investigate why guestbook responses do not include version info.
  • TBD: FRDR Security
  • Other tasks as discussed in strategic planning