February 23, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda:

  • next week 4pm
  • update notes from last day
  • restarted crawl?
  • add documentation about the JSON-to-CSV conversion and restarting the crawl
  • do a bit of code review of the restart function
  • question about key pair
  • postprocessor

restarted crawl

  • URL count still increasing: checked the count a few days ago, a few thousand more URLs for some domains
  • had to restart a few times: too many failed requests and saw a break
    • tabletmag gave us 400s
  • cycles through the domains
  • email function not working
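
To see which domains are sending the rejections, a small tally over the crawl log could help. This is only a sketch: it assumes the log can be reduced to (url, status code) pairs, which is not necessarily the crawler's real log format, and `count_rejections` and the sample records are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def count_rejections(records):
    """Count responses with a status code of 400 or above, per domain.

    `records` is an iterable of (url, status_code) pairs. That shape is an
    assumption for illustration, not the crawler's actual log format.
    """
    counts = Counter()
    for url, status in records:
        if status >= 400:
            counts[urlparse(url).netloc] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    ("https://example.com/a", 403),
    ("https://example.com/b", 429),
    ("https://example.org/c", 200),
]

# List domains from most to fewest rejections.
for domain, n in count_rejections(sample).most_common():
    print(domain, n)
```

Domains at the top of the list are the ones to slow down first.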

adding documentation?

  • update readme files: push to github repository
  • push the documentation

new small domain

  • stop

key-pair

  • Irfan sent key-pair:

postprocessor

  • spent a lot of time
  • error handling: errors pass silently and aren't getting logged
  • no error-logging code in certain places
  • thinks the error has to do with combining the CSV files
    • recombining after the meeting
  • otherwise the documentation has been good enough
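
One way the silent errors could be surfaced is to wrap the CSV combination step in explicit logging. The following is a minimal sketch, not the postprocessor's actual code: `combine_csv_files`, the logger name, and the assumption that all input files share one header are hypothetical.

```python
import csv
import logging

logger = logging.getLogger("postprocessor")  # hypothetical logger name


def combine_csv_files(paths, out_path):
    """Concatenate CSV files into one, logging each failure.

    Assumes every input file has the same header row. Returns the list of
    input paths that could not be combined.
    """
    failed = []
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in paths:
            try:
                with open(path, newline="", encoding="utf-8") as src:
                    rows = list(csv.DictReader(src))
                if not rows:
                    continue  # empty file or header only: nothing to add
                if writer is None:
                    # The first non-empty file decides the output columns.
                    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
                    writer.writeheader()
                writer.writerows(rows)
            except (OSError, csv.Error, ValueError):
                # logger.exception records the message plus the full traceback.
                logger.exception("could not combine %s", path)
                failed.append(path)
    return failed
```

A file that fails partway through may leave some of its rows in the output, so the returned list of failed paths is worth checking before the combined file is used.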

Action Items

  • Alejandro: write to Globus support
  • check the logs to see which domains are giving us the rejections
  • slow down restarted crawl
  • look into email function
  • add Irfan's key-pair and Alejandro's
  • Alejandro: change password
  • add more logging of errors to postprocessor
  • Alejandro: come up with domain crawl scope
  • Alejandro: write to security person about adding Shawn to Graham