8 18 2020 Tech Team Report - QualitativeDataRepository/TechnicalTeam GitHub Wiki

8-18-2020

Logged Tasks

| Date | Task | Hours (Main) | Hours (EOLS) | Hours (PII) |
|---|---|---|---|---|
| 10-Aug-2020 | Report, meeting, create IQSS7177/draft PR7178 for metrics | 2 | | |
| 11-Aug-2020 | Explore dynamic time series options, add ts endpoints, try direct-download with cors changes on dev | 3 | | |
| 12-Aug-2020 | Test CORS/CSP security additions for previewers | 1 | | |
| 13-Aug-2020 | Dataverse time series query, debug dev indexing issue, implement time series display, debug graph sizing issue | 6 | | |
| 14-Aug-2020 | Datasets monthly api, fix old dev db issues, add per-dataverse queries to dataset-metrics, deploy d8 module update to stage, deploy v4.20-q7 to stage | 8 | | |

Summary

Metrics:

  • Created an issue/draft PR to share results with community
  • Started creating time-series API endpoints to allow dynamic reporting per dataverse in the dataverse-metrics app. (Previously, Python scripts had to be run periodically to generate time series from individual per-month API calls, which doesn't scale well when you want per-dataverse metrics.)
  • Created simplified dataverse count query
  • Debugged/fixed old dev database errors that affected counts (it looks like two early datasets were deleted, but only from the dvobject table; all of their info in the dataset, datasetversion, and metadata tables was still intact and linked through foreign keys)
  • Implemented time-series displays using the new APIs in dataverse-metrics and implemented a per-dataverse selection mechanism (click on specific dataverses in a tree or add ?parentAlias= to the URL). (In the process, I discovered that, with the d3/d3plus libraries, having two graphs with the same title results in them sharing display attributes such as width despite otherwise showing their own data correctly!)
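
The scaling concern noted above can be sketched roughly as follows. This is an illustration only, not the code in the draft PR: the server name, the helper names, and the exact `/datasets/monthly` path are assumptions, while the `toMonth/{yyyy-MM}` pattern follows the existing Dataverse metrics API and `parentAlias` is the parameter mentioned above. The per-month approach needs one request for every month (repeated for every dataverse of interest), whereas a time-series endpoint needs one request per dataverse.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical server; replace with a real Dataverse installation.
BASE = "https://dataverse.example.org/api/info/metrics"


def month_range(start, end):
    """Return yyyy-MM strings for every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def per_month_urls(start, end):
    """Old approach: one cumulative-count request per month."""
    return [f"{BASE}/datasets/toMonth/{m}" for m in month_range(start, end)]


def time_series_url(parent_alias=None):
    """New approach (assumed path): one request returns the whole series."""
    query = ("?" + urlencode({"parentAlias": parent_alias})) if parent_alias else ""
    return f"{BASE}/datasets/monthly{query}"


if __name__ == "__main__":
    urls = per_month_urls(date(2019, 11, 1), date(2020, 2, 1))
    print(len(urls), "requests for the per-month approach")
    print(time_series_url("some-dataverse-alias"))
```

With N dataverses and M months, the first approach grows as N × M requests, which is why a single time-series call per dataverse is attractive for a dynamic display.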

Operations:

  • Updated stage with the latest Drupal 8 (a module update) and Dataverse (email updated in v4.20-qdr7) in prep for deploying to prod.
  • Successfully tested adjusting the CORS settings for the dev bucket to allow us to use direct download from S3 without breaking previewers. (We turned it off a while ago, but using it should allow better performance, and direct upload if we ever want to try that.) In doing that, I also looked more into what is required to only allow CORS requests from specific servers (e.g. from the previewers hosted on GitHub to our S3 bucket) and learned about/tested using the Content Security Policy (CSP) standard. (For GDCC, I then wrote up my findings in https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3.)
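
As a rough sketch of the bucket-side setting described above, the following builds a CORS configuration in the shape accepted by the S3 `PutBucketCors` operation (for example through boto3's `put_bucket_cors`). The origins, bucket name, and helper name are placeholders for illustration and are not QDR's actual settings.

```python
def build_cors_configuration(allowed_origins):
    """Build an S3 CORS configuration allowing read-only requests
    from the given origins (e.g. the Dataverse site and the previewer host)."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": list(allowed_origins),
                "AllowedMethods": ["GET", "HEAD"],  # downloads only, no uploads
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight
            }
        ]
    }


# Hypothetical origins: the repository site and wherever the previewers are hosted.
config = build_cors_configuration(
    ["https://dataverse.example.org", "https://previewers.example.org"]
)

# Applying it would look roughly like this (needs boto3 and AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_cors(
#     Bucket="example-dev-bucket", CORSConfiguration=config
# )
```

Listing specific origins instead of `*` is what restricts browser-based cross-origin reads to the named servers; enabling direct upload would additionally require allowing `PUT` in a rule.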

Plans

  • Hoping to complete the per-dataverse time-series metrics displays and csv download this week
  • Ready to deploy D8 and Dataverse changes to prod - would be useful to discuss whether we should turn on direct download as part of that and whether we should also lock CORS down further using CSP (as noted above). (Direct download is probably worth it. The additional CORS constraints are not hard, but when CORS is only being used to download content files, and when using presigned URLs (versus web pages), it's not clear there's much someone could do. It is something that would stop others from continuing to use QDR's previewers instead of the ones hosted at GDCC.)
  • Start work to update the /replace API

and possibly:

  • file DOI reservations
  • Drupal 9

For Discussion

  • just the CORS/direct download from S3 item noted in Plans