July 29, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

  1. set up politics subdomain under different instance
  2. update Apify
  3. look at postprocessor
  4. any ideas for speeding up: write them down for the next devs
  5. look at Compute Canada questionnaire
  6. for meeting with Kirsta: SWPP answers
  • update on crawls
  • question: where are the crawl results being stored? Please add this to the MVP

Apify update

  • updating code to Apify 1.0

Politics subdomain

  • logs are showing politics articles

Instances

  • spoke to Jacqueline about how the instances were set up; she explained that they should be merged. However, we already have a large Graham instance

Postprocessor

  • no progress

Crawler update

  • NYT/Mid E crawl is still running: 100,000+ JSONs
  • NYT Twitter crawl: all done