Meeting: Thursday, October 22, 2015
Yuya and William: multiprocessing code conflicts; merge and debug.
New meeting time: Tuesdays, 4-6:30.
Testing crawler: freezing, not yet confirmed, though there are no errors in the log output.
Queue hits from Google searches: not yet. Do the merge of the multiprocessing function.
Check back in two weeks to see how the merging and debugging of the multiprocessing function have gone.
Queuing of backlinks: Majestic and Ahrefs.
For any given web domain there are many URLs with backlinks, some of which don't exist.
Can narrow those to the particular referring sites we are interested in.
Can get an Excel spreadsheet with the info from the screenshot.
CSS selectors and regex for Mondoweiss.
Not exhaustive; CSS selectors are not even the same across the board, but still helpful.
Backlinks and sites: duplicate links.
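
One way the duplicate-link issue could be handled is to normalize each URL before queuing it and keep only the first occurrence. The following is a minimal sketch, not the project's actual code: the function names `normalize_url` and `dedupe_links` are hypothetical, and the normalization rules (ignore a leading "www.", a trailing slash, and the fragment) are assumptions that may be too aggressive for some sites.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Return a canonical form of url, used only for duplicate detection."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Assumption: "www.example.com" and "example.com" serve the same pages.
    if host.startswith("www."):
        host = host[4:]
    # Assumption: a trailing slash does not change the page; empty path is "/".
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string as-is.
    return urlunsplit((scheme, host, path, parts.query, ""))


def dedupe_links(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

For example, `dedupe_links` would collapse `http://example.com/a`, `http://www.example.com/a/` and `http://example.com/a#top` into a single entry, while `http` and `https` variants are still treated as distinct.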
WARC implemented.
Rogers and Jai to set up an instance with newspaper running and run the RSS feeds weekly.
DB merge shouldn't be an issue unless the crawler updates scan Twitter handles.
Sometimes the crawler will crawl both the original site page and the comment page.
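
A possible way to avoid crawling an article twice is to map comment-page URLs back to the article URL before queuing. This is only a sketch under stated assumptions: it assumes comment pages follow the WordPress-style `/comment-page-N/` path convention or differ only by a fragment such as `#comments`, and the name `article_url` is hypothetical. Sites with other URL schemes would need their own patterns.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern: WordPress-style paginated comment paths such as
# "/some-article/comment-page-2/". Other sites may use different schemes.
COMMENT_PAGE_RE = re.compile(r"/comment-page-\d+/?$")


def article_url(url):
    """Map a comment-page URL back to the URL of its article."""
    parts = urlsplit(url)
    # Replace a trailing "/comment-page-N/" segment with a single slash.
    path = COMMENT_PAGE_RE.sub("/", parts.path)
    # Drop the fragment so "#comments" style anchors also collapse.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

URLs that do not match the assumed pattern pass through unchanged apart from losing their fragment, so the function can be applied to every link before it is queued.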