7 18 2022 Tech Team Report - QualitativeDataRepository/TechnicalTeam GitHub Wiki

7-18-2022

Logged Tasks

Date Task Hours (Main) Hours (EOLS) Hours (PII) Hours (QDAS)
11-Jul-2022 Report; meeting; investigate Zip previewer re: access pattern/performance 1 2
12-Jul-2022 Continue comparing internal vs JavaScript internal zip access; start debugging seek issue 3
13-Jul-2022 Coordinate/fix v5.11-related Drupal outage on prod and title issue; deploy update to dev/stage as well; investigate low download counts w.r.t. db vacuum/analyze; start refactor/test of QDAS seekable channel; debug/propose fixes for missing checksum value (#8841); update metatags module 4 1
14-Jul-2022 PR #8844 to fix missing checksum; debug zip file error 2 4
15-Jul-2022 Find Apache bug 2

Operations

  • Fixed delayed upgrade issue on prod (see Drupal item 2)
  • Investigated the download count estimate running ~2500 low; recommend a manual vacuum/analyze on the guestbookresponse table (see Discussion)

Drupal

  • Updated metatags module
  • Updated to reflect changes in OAI_ORE export field names

Dataverse

  • Investigated, reported (#8841), and fixed an issue with checksums missing in the file display; created PR #8844

QDAS

  • Started comparing JavaScript and Java/Apache Commons Compress retrieval of files from within zip files
  • Ran into an intermittent problem getting files via the QDAS previewer; traced it to a simple bug in an Apache class, tested a fix, and am reporting it today
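
For context on the access pattern being compared, here is a minimal Python sketch of retrieving one entry from a zip through a seekable source. It is not the previewer code (which uses Java/Apache Commons Compress or JavaScript). The `CountingSource` wrapper, `build_archive`, `read_entry`, and the entry names are all hypothetical and exist only to show that a reader can seek to the central directory and then to the requested entry without reading the whole archive.

```python
import io
import zipfile


class CountingSource(io.BytesIO):
    """Hypothetical seekable source that records how it is accessed."""

    def __init__(self, data):
        super().__init__(data)
        self.seek_calls = 0
        self.bytes_read = 0

    def seek(self, offset, whence=io.SEEK_SET):
        self.seek_calls += 1
        return super().seek(offset, whence)

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_archive():
    """Build an in-memory zip with one large and one small entry (illustrative data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("large.bin", bytes(1_000_000))
        zf.writestr("notes.txt", b"small entry")
    return buf.getvalue()


def read_entry(source, name):
    """Read a single entry; zipfile seeks to the directory, then to the entry."""
    with zipfile.ZipFile(source) as zf:
        with zf.open(name) as member:
            return member.read()


archive = build_archive()
source = CountingSource(archive)
content = read_entry(source, "notes.txt")
print(content, source.seek_calls, source.bytes_read, len(archive))
```

The counters give a rough way to compare implementations: a retrieval path that works as intended should read only a small fraction of the archive when fetching a small entry.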

Discussion

  • The fast estimate for total downloads added in a recent Dataverse release (for when a direct count of table rows gets too slow) depends on accurate table statistics. A community report indicated an undercount of 1-2% with 150K downloads. I checked our table and found a ~3.5% undercount (~2500/69K). Testing on dev/stage (which had much smaller undercounts) suggests that a manual analyze (or vacuum/analyze) on the guestbookresponse table makes the estimate much more accurate. Discussion with IQSS made me realize that the table is handled by auto-vacuum, but since it is mostly write-once, auto-vacuum essentially never triggers. I'd suggest at least a manual vacuum/analyze, but want to make sure there's no issue with that first. Assuming this works as expected, we may want to monitor the estimate and/or run vacuum/analyze periodically.
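
A minimal sketch of the check described above, assuming the fast estimate is read from PostgreSQL's planner statistics (`pg_class.reltuples`). The report does not state the exact query Dataverse uses, so the SQL strings, the helper name `undercount`, and the rounded row counts are illustrative only.

```python
# Assumption: the fast estimate comes from pg_class.reltuples, which is
# refreshed by ANALYZE / VACUUM. The exact Dataverse query may differ.
ESTIMATE_SQL = (
    "SELECT reltuples::bigint FROM pg_class "
    "WHERE relname = 'guestbookresponse';"
)
EXACT_SQL = "SELECT count(*) FROM guestbookresponse;"
REFRESH_SQL = "VACUUM ANALYZE guestbookresponse;"


def undercount(estimated, exact):
    """Return (missing_rows, percent_of_exact) for an estimate vs. a direct count."""
    missing = exact - estimated
    return missing, 100.0 * missing / exact


# Approximate figures from the report: ~2500 rows missing out of ~69K.
missing, percent = undercount(estimated=66_500, exact=69_000)
print(f"estimate is low by {missing} rows ({percent:.1f}%)")
```

Running the estimate and exact queries before and after `REFRESH_SQL` on dev/stage and passing the results to `undercount` would give a quick before/after comparison.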

Plans

  • AnnoRep - continue to explore/fix docx/pdf GitHub issues
    • Deploy updates to dev/stage/prod
  • Drupal
    • Make homepage resilient vs. Dataverse failures
  • Dataverse
    • Popup info accessibility - IQSS likes the recommendations from the source I linked to, so this can be implemented along those lines.
    • QDAS planning/design/prototyping
      • Explore switching to JavaScript retrieval of files within zip.
    • Still want to investigate the guestbook responses re version info not being included.
  • TBD: FRDR Security
  • Other tasks as discussed in strategic planning