Feb 8, 2024 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

  • attempt re-start of NYT Archive Mid E (both Graham and Arbutus IPs) - Gy
  • continue combining results and document - Gy
  • if time, contact Nat about new kind of error with Arbutus cloud with nytimes re-start - Gy
  • unit-testing each function of the postprocessor for IA dataset - Ar
  • if unit-testing shows accuracy, then request IA Electronic Intifada dataset from Raazia and proceed with postprocessing - Ar
  • continue with postprocessing of WaPo - Fr
  • continue to work on figuring out IA NYT estimating - Ra

Crawls

  • error is gone on Arbutus
  • NYT Archive Mid E - waiting to see how it goes

IA Crawl

  • changed code to use pagination but now getting a "connection refused" error
    • none of the solutions found online work (the other reported cases aren't IA related)
    • similar error as before but ignored

Postprocessing

Action Items

  • continue combining results and document (reading week) - Gy
  • monitor NYT archive and send email to Alejandro to update - Gy
  • follow up with IA regarding connection refused error - Ra
  • ask Nat for meeting about connection refused error - Ra
  • try another kind of crawl to see if there's a connection refused error - Ra
  • try to update version of node to see if that helps - Ra
  • take a sample of WaPo and see if the right result can be reproduced - Fr
  • follow up by email about WaPo output number - Fr
  • unit-testing each function of the postprocessor for IA dataset - Ar
  • if unit-testing shows accuracy, then request IA Electronic Intifada dataset from Raazia and proceed with postprocessing - Ar