February 11, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki
Meeting Notes
- Upgrading sudo for all instances on Compute Canada
- Post-processor framework
  - First priority: add a way to restart the post-processor if it is stopped
- Data model changes
  - Currently, a mentioned Twitter account receives hits for all of its tweets; this needs to be changed
- Crawler performance
  - "I don't care about cookies" extension tested; it did not work
  - Stealth mode for Puppeteer also attempted; it did not work
  - Extensions may work for some websites but not others
  - Plan to add code that finds the cookie pop-up on each page and presses "accept" explicitly
    - There may be no general way to handle this, since different websites use different pop-up identifiers
    - Focus on 5-10 sites at first, targeting pop-ups by their class
    - From this, a CSV could be created listing the blocked websites and the corresponding pop-up class for each
- Preliminary Jupyter Notebook for Visualizations
  - Put on hold last week
  - Conversion to CSV can be worked on this week; this may shift if the JSON is adjusted
- Additional development documents to be organized and incorporated into the MVP doc
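
The CSV idea from the crawler discussion above could look roughly like the sketch below. It only illustrates the lookup from a site to its pop-up class; the column names (`domain`, `popup_class`), the sample rows, and the helper names are all hypothetical, and the crawler itself (Puppeteer) is not shown.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical CSV contents: one row per blocked site, with the class of its
# cookie pop-up. The domains and class names are placeholders, not findings.
SAMPLE_CSV = """domain,popup_class
example.com,cookie-banner
news.example.org,consent-modal
"""


def load_popup_classes(csv_file):
    """Read domain -> pop-up class pairs from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {
        row["domain"].strip().lower(): row["popup_class"].strip()
        for row in reader
    }


def popup_selector_for(url, popup_classes):
    """Return a CSS class selector for the site's pop-up, or None if unlisted."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.com" and "example.com" as the same site.
    if host.startswith("www."):
        host = host[4:]
    popup_class = popup_classes.get(host)
    return f".{popup_class}" if popup_class else None


popup_classes = load_popup_classes(io.StringIO(SAMPLE_CSV))
print(popup_selector_for("https://www.example.com/article/1", popup_classes))
print(popup_selector_for("https://unlisted.example.net/", popup_classes))
```

The crawler could use the returned selector to locate the pop-up and click its "accept" button, and fall back to its normal behaviour when the site is not listed. Keeping the mapping in a CSV means new sites can be added without changing code.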