3 15 2021 Tech Team Report - QualitativeDataRepository/TechnicalTeam GitHub Wiki

3-15-2021

Logged Tasks

Date | Task | Hours (Main) / Hours (EOLS) / Hours (PII)
8-Mar-2021 Report, meeting, blacklist UT Dorkbot in pidreports, investigate DataCite XML code, investigate/help restore files for F6NJU10I (inc. prod db restore), clear space on dev 5
9-Mar-2021 Try deploying ~5.4 to stage - update Java, Payara config, investigate file accesses w.r.t. S3 intelligent storage, discuss DataCite XML, anonymize version table entry in draft dataset, debug/fix Curate command 4
10-Mar-2021 Check SSH security, AnnoRep discussion, read/respond to DataCite mapping emails 1 1
12-Mar-2021 Configure Payara/Java 11 on stage, deploy ~5.4, update to Solr 8.8.1 on dev, reindex 2

Summary

Dataverse

  • Participated in the discussion of DataCite metadata updates
  • Debugged/fixed reported issue with the Curate (silent version update) command where a change to whether users can request access was not working. The root cause was Dataverse storing that flag in both the dataset version (where the change was made correctly) and the dataset (which was being missed). Fix is deployed on dev/stage - need to create a PR.
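
The root cause above can be illustrated with a minimal Python sketch. This is not the Dataverse code (which is Java): the class names, the `file_access_request` field, and `curate_update` are hypothetical stand-ins, assumed here only to show one flag held in two places and an in-place update that has to write both.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Dataset-level copy of the "users can request access" flag (hypothetical name)
    file_access_request: bool = False


@dataclass
class DatasetVersion:
    dataset: Dataset
    # Version-level copy of the same flag (hypothetical name)
    file_access_request: bool = False


def curate_update(published: DatasetVersion, draft: DatasetVersion) -> None:
    """Apply the draft's flag to the published version in place (no new version)."""
    # This part already worked: the version-level flag was updated.
    published.file_access_request = draft.file_access_request
    # This is the step that was being missed: the dataset-level copy
    # must be updated too, or the two copies disagree.
    published.dataset.file_access_request = draft.file_access_request
```

With only the first assignment, the version would report the new setting while the dataset still held the old one, which matches the behavior described in the report.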

Operations:

  • Helped restore files for a dataset - using a database backup to find the map between the ~random storage identifiers and the original file names.
  • Freed space on dev to handle db backup import.
  • Deployed pre-5.4 version to stage. Worked through the steps to make Java 11 the default and to update Payara to handle Java 11.
  • Added the ability to blacklist to the monthly pid reporting script and blacklisted UT Dorkbot to start. Verified by running against last month's report (which removed 85 calls from UT Dorkbot); the blacklist will be applied to next month's report automatically.
  • Anonymized the contributor (listed in the versions table) for a Dataset per request.
  • Did some quick queries to help estimate the impact of intelligent S3 storage at AWS - as reported in Slack, it looks like most of our data would be in the lower cost tier.
  • Pinged Seba about the new SSH vulnerability
  • Upgraded Solr to 8.8.1 on dev (nominally recommended for 5.4); will do the same on stage at some point.
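
The blacklist step in the pid reporting script could look roughly like the Python sketch below. The actual script and its record format are not shown in this report, so the `user_agent` and `pid` fields, the `BLACKLIST` list, and both function names are assumptions made for illustration.

```python
# Hypothetical blacklist; UT Dorkbot is the first entry per the report.
BLACKLIST = ["UT Dorkbot"]


def is_blacklisted(user_agent, blacklist=BLACKLIST):
    """Return True if any blacklist term appears in the user agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(term.lower() in ua for term in blacklist)


def filter_calls(calls, blacklist=BLACKLIST):
    """Drop blacklisted calls; return the kept calls and the number removed.

    Each call is assumed to be a dict with an optional 'user_agent' key.
    """
    kept = [
        c for c in calls
        if not is_blacklisted(c.get("user_agent") or "", blacklist)
    ]
    return kept, len(calls) - len(kept)
```

Returning the removed count alongside the kept calls makes it easy to check a rerun against a previous month's report, as was done for verification here.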

AnnoRep:

  • Discussed the API and PDF/annotation production in Slack with To

Discussion:

  • Actions from the DataCite metadata discussion?

Plans

  • DataCite updates?
  • Curate PR
  • AnnoRep work -- continue docx parsing to generate and then store an annotations file (as an aux file in Dataverse) -- start deploying the service on dev once the basics are in place -- support use of the Dataverse API as needed

Still TBD:

  • Drupal 9/composer 2/3