Ran the post-processor on 10 Twitter users, which took 2 days
Began generating output for the whole scope; ran into a CSV issue caused by newlines embedded in the plaintext fields
This quoting problem occurs roughly 20 times across the 5 million items
Output has reached 600 000 of 5 000 000 since the weekend
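The newline problem above is the standard CSV embedded-newline case: per RFC 4180, a field containing a newline, comma, or double quote must be wrapped in double quotes, with any inner quotes doubled. A minimal sketch of the quoting rule (function names are illustrative, not from our codebase):

```javascript
// Quote a CSV field per RFC 4180: wrap in double quotes when it
// contains a comma, a double quote, or a newline; double inner quotes.
function csvField(value) {
  const s = String(value);
  if (/[",\r\n]/.test(s)) {
    return '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Join one record into a single CSV line; an embedded newline
// stays inside its quoted field instead of breaking the record.
function csvRow(fields) {
  return fields.map(csvField).join(',');
}

const row = csvRow(['user42', 'first line\nsecond line']);
console.log(JSON.stringify(row));
```

Any CSV reader that honors quoting will then keep such an item as one record instead of splitting it at the newline.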
Metascraper integration - merged and closed!
Crawler performance
Jacqueline implemented a manual queue so that more than one link is returned for the affected domains; this has introduced a lot of bugs
Open question: is the crawler unable to mark links properly when they are queued manually?
The error code is ambiguous: it indicates either an error within the API or a user error in configuring the crawler
Jacqueline wrote a script that should restart the crawl once it stops
She also built in an email notification tool to alert the user once the crawl has stopped
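A restart wrapper of the kind described could take roughly this shape: re-run the crawl when it stops with an error, and fire the notification only when it finally gives up (a sketch under assumed names; `runCrawl` and `notify` stand in for the real crawl entry point and the email sender):

```javascript
// Re-run a crawl function until it succeeds or retries are exhausted,
// then invoke a notification hook (e.g. an email sender) on final stop.
async function crawlWithRestart(runCrawl, notify, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await runCrawl(); // crawl completed normally
    } catch (err) {
      if (attempt === maxRetries) {
        await notify(`Crawl stopped after ${attempt} attempts: ${err.message}`);
        throw err; // give up and surface the error
      }
      // otherwise fall through and restart the crawl
    }
  }
}

// Usage sketch: a crawl that fails twice, then succeeds.
let calls = 0;
crawlWithRestart(
  async () => { if (++calls < 3) throw new Error('stopped'); return 'done'; },
  async (msg) => console.log(msg)
).then((r) => console.log(r));
```

In a real deployment the `notify` hook would call the email tool; keeping it as an injected function makes the restart logic testable without sending mail.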
The above is based on Cheerio, but Cheerio and Puppeteer exhibited the same issue
Nat's advice: take some of the sites you know are going to fail, find and isolate the pieces of code that matter in the failing block, and report to the library developers/maintainers with that information
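Isolating a minimal failing input for a bug report can be semi-automated with a simple bisection over the page source (a rough sketch of the idea, not a specific tool; `failsToParse` stands in for whatever check reproduces the failure against Cheerio or Puppeteer):

```javascript
// Shrink a failing input by repeatedly keeping whichever half still
// reproduces the failure; stop when neither half alone fails.
function minimizeInput(input, failsToParse) {
  let current = input;
  let shrunk = true;
  while (shrunk && current.length > 1) {
    shrunk = false;
    const mid = Math.floor(current.length / 2);
    for (const half of [current.slice(0, mid), current.slice(mid)]) {
      if (failsToParse(half)) {
        current = half; // this half alone still fails: keep shrinking it
        shrunk = true;
        break;
      }
    }
  }
  return current; // small reproducer to attach to the issue report
}

// Toy example: the "bug" triggers whenever the input contains "<bad>".
const repro = minimizeInput('<bad>' + 'b'.repeat(100),
                            (s) => s.includes('<bad>'));
console.log(repro.length, repro);
```

Bisection can miss a trigger that straddles the split point, so for real pages a maintainer-facing report should still include the original URL alongside the reduced snippet.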