February 11, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki

Meeting Notes

  • Upgrading sudo for all instances on Compute Canada
    • Completed!
  • Post-processor framework
    • First priority is incorporating a way of restarting the post-processor if it is stopped
    • Data model changes
      • Right now, a mentioned Twitter account receives hits for all of its tweets; this needs to be changed
  • Crawler performance
    • "I don't care about cookies" extension tested, didn't work
    • Stealth mode for Puppeteer also attempted, didn't work
    • Extensions may work for a certain website but not others
    • To be incorporated in code: finding each page's cookie pop-up box and pressing "accept" on it directly
      • There might not be a generalized way to handle this, since different websites will have different pop-up identifiers
    • Focusing on 5-10 sites at first, targeting pop-ups by their class
      • From this, could create a CSV with the blocked websites and their corresponding class for the pop-up
  • Preliminary Jupyter Notebook for Visualizations
    • Put on hold last week
    • Conversion to CSV can be worked on this week, but may be shifted if adjustments are made to the JSON
  • Additional development documents to be organized and incorporated into MVP doc
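
The CSV idea from the crawler discussion (blocked websites paired with the class of their cookie pop-up) could look roughly like the sketch below. This is only an illustration in Python, not the crawler's code: the column names `domain` and `popup_class`, the sample rows, and the helper names `load_popup_classes` and `popup_selector_for` are all hypothetical.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical CSV contents: one row per blocked site. The domains and class
# names are placeholders, not values taken from the actual crawl.
POPUP_CSV = """domain,popup_class
example.com,cookie-banner-accept
news.example.org,consent-button
"""


def load_popup_classes(csv_text):
    """Map each domain to the class of its cookie pop-up."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["domain"].strip().lower(): row["popup_class"].strip()
        for row in reader
    }


def popup_selector_for(url, popup_classes):
    """Return a CSS class selector for the URL's site, or None if unlisted."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.com" and "example.com" as the same site (an assumption).
    if host.startswith("www."):
        host = host[4:]
    popup_class = popup_classes.get(host)
    return f".{popup_class}" if popup_class else None


popup_classes = load_popup_classes(POPUP_CSV)
```

The selector returned for a listed site could then be handed to the crawler's click step (in Puppeteer, for example, `page.click(selector)`), while unlisted sites return `None` and are crawled as before.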