Puppeteer was restarted with max concurrency; this doesn't seem to address the performance issue
Jacqueline is writing tests to identify why Puppeteer is performing so slowly
Jacqueline and Raiyan got Cheerio working; it retrieves the relevant data even though it does not render JavaScript
Cheerio is running on a test instance; in one day it visited 13,000 links, then stopped on its own
Cheerio does not use a headless browser, which could be a problem for sites that block scrapers
Raiyan and Jacqueline will continue to investigate using a sample of the scope that includes sites returning a count of 1
We currently use Puppeteer through the Apify SDK; choosing which media not to render would require changes made through Puppeteer directly, without the Apify SDK
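For reference, here is a minimal sketch of what skipping media might look like if we drive Puppeteer directly rather than through the Apify SDK. It relies on Puppeteer's request interception: call `page.setRequestInterception(true)`, then register a `page.on("request", ...)` handler that aborts or continues each request based on its `resourceType()`. The `InterceptedRequest` interface, the `BLOCKED_RESOURCE_TYPES` list, and the helper names are hypothetical. The interface only mirrors the three `HTTPRequest` methods the handler uses, so the logic can be checked without launching a browser.

```typescript
// Hypothetical structural type mirroring the subset of Puppeteer's HTTPRequest
// used here (the real abort()/continue() return Promises and take optional args).
interface InterceptedRequest {
  resourceType(): string;
  abort(): unknown;
  continue(): unknown;
}

// Assumption: resource types we don't need for scraping text/metadata.
const BLOCKED_RESOURCE_TYPES = new Set<string>(["image", "media", "font"]);

function shouldBlock(resourceType: string): boolean {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Resolves each intercepted request exactly once: abort unwanted media,
// let everything else (documents, scripts, XHR, ...) through.
function handleRequest(req: InterceptedRequest): void {
  if (shouldBlock(req.resourceType())) {
    void req.abort();
  } else {
    void req.continue();
  }
}

// Wiring with a real Puppeteer page (not run here):
//   await page.setRequestInterception(true);
//   page.on("request", handleRequest);
```

Blocking images, media, and fonts reduces bandwidth and rendering work while leaving the page's HTML and scripts intact; whether it helps with the slowdown we're seeing is untested. If anything else also handles intercepted requests, each request must still be resolved only once, since Puppeteer throws when a request that is already handled gets another `abort()` or `continue()`.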
Metascraper
Alex added metascraper data columns, and created corresponding tests
Mediacat Domain Crawler PR merged
Post-processor
Amy added logic to the interest output for sorting
Amy to run the post-processor on the whole Twitter output