Meeting: Thursday, October 22, 2015 - UTMediaCAT/projectdocs GitHub Wiki

Yuya and William: multiprocessing code conflicts; merge and debug.

New meeting time: Tuesdays, 4-6:30.

Testing the crawler: it appears to freeze (not yet confirmed), though the log output shows no errors.
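
Because the freeze leaves nothing in the log, one option (a suggestion, not something decided at the meeting) is Python's standard `faulthandler` module, which can dump every thread's stack after a timeout and so shows where a stuck worker is waiting. A minimal sketch; `freeze_watchdog` is a hypothetical helper name:

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def freeze_watchdog(seconds, stream=sys.__stderr__):
    """Hypothetical helper: dump all thread stacks if the body exceeds `seconds`."""
    # faulthandler's timer runs in a C-level watchdog thread, so it still
    # fires when the crawler is stuck and logging nothing. repeat=True
    # dumps again every `seconds` until the timer is cancelled.
    faulthandler.dump_traceback_later(seconds, repeat=True, file=stream)
    try:
        yield
    finally:
        # Disarm the timer whether the body finished normally or raised.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping one page fetch, e.g. `with freeze_watchdog(120): crawl(url)` where `crawl` stands in for the crawler's own fetch function, would print stacks every 120 seconds until that fetch returns. The default stream is `sys.__stderr__` because `faulthandler` needs a real file descriptor. With multiprocessing, each worker process would need to arm its own watchdog.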

Queuing hits from Google searches: not done yet; do it after merging the multiprocessing function.
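
A minimal sketch of queuing search hits, assuming hits arrive as plain URL strings; `HitQueue` and its methods are hypothetical names, not code from the crawler. It keeps hits in arrival order and ignores a URL it has already seen, so a result returned by several searches is only crawled once:

```python
from collections import deque


class HitQueue:
    """Hypothetical FIFO queue of search-hit URLs that skips repeats."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def put(self, url):
        """Queue `url` unless it was queued before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def take(self, n):
        """Remove and return up to `n` URLs in arrival order."""
        batch = []
        while self._pending and len(batch) < n:
            batch.append(self._pending.popleft())
        return batch

    def __len__(self):
        return len(self._pending)
```

This queue lives in a single process. Once the multiprocessing function is merged, the parent process could `take()` batches and hand them to workers, or the hits could go into a `multiprocessing.Queue` instead.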

Check back in two weeks to see how merging and debugging of the multiprocessing function have gone.

Queuing of backlinks: for any given web domain, Majestic and Ahrefs list many URLs with backlinks, some of which don't exist. These can be narrowed to the particular referring sites we are interested in, and an Excel spreadsheet with the information shown in the screenshot can be exported.
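
A sketch of narrowing an export to chosen referring sites, assuming the spreadsheet is saved as CSV and that the referring URL sits in a column called `Source URL` (a hypothetical header; check the real export). `backlinks_from` is an illustrative name:

```python
import csv
import io
from urllib.parse import urlparse


def backlinks_from(csv_text, referring_domains, source_column="Source URL"):
    """Keep rows whose referring URL is on one of `referring_domains`."""
    wanted = {d.lower() for d in referring_domains}
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # `source_column` is an assumed header name; adjust to the real export.
        host = urlparse(row[source_column]).hostname or ""
        # Accept the domain itself or any subdomain of it, e.g. www.
        if any(host == d or host.endswith("." + d) for d in wanted):
            kept.append(row)
    return kept
```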

CSS selectors / regex for Mondoweiss.

Not exhaustive: the CSS selectors are not even the same across the board, but still helpful.
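
Since no single selector works on every page, one approach is to try several patterns in order and take the first match. A minimal sketch for page titles; the class name `entry-title`, the `og:title` tag and `extract_title` are assumptions for illustration, not confirmed Mondoweiss markup. Regexes on HTML are brittle, so a real HTML parser with CSS selectors is usually the sturdier choice:

```python
import html
import re

# Hypothetical patterns, tried in order: page templates differ, so no single
# pattern covers every page. Replace with the selectors/regexes actually used.
TITLE_PATTERNS = [
    re.compile(r'<h1[^>]*class="[^"]*entry-title[^"]*"[^>]*>(.*?)</h1>', re.S | re.I),
    re.compile(r'<meta\s+property="og:title"\s+content="([^"]*)"', re.I),
    re.compile(r"<title>(.*?)</title>", re.S | re.I),
]


def extract_title(page_html):
    """Return the title found by the first matching pattern, or None."""
    for pattern in TITLE_PATTERNS:
        match = pattern.search(page_html)
        if match:
            # Drop inline tags, collapse whitespace, then decode entities.
            text = re.sub(r"<[^>]+>", "", match.group(1))
            return html.unescape(" ".join(text.split()))
    return None
```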

Backlinks and sites: duplicate links.
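
One way to collapse duplicate links is to compare a normalized form of each URL. A minimal sketch; the normalization rules (lowercase host, drop a leading `www.`, a trailing slash and the `#fragment`) are assumptions, and `normalize_url` / `dedupe_links` are hypothetical names:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Reduce trivially different spellings of a URL to one comparison key."""
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower()
    # Assumption: www.host and host serve the same pages.
    if netloc.startswith("www."):
        netloc = netloc[4:]
    path = parts.path.rstrip("/") or "/"
    # Drop the #fragment: it does not change what the server returns.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def dedupe_links(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

`http` and `https` versions are still kept apart here; treating them as the same page would be a further, deliberate choice.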

WARC implemented.

Rogers and Jai to set up an instance with newspaper running; run the RSS feeds weekly.
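
A sketch of the weekly pass over an RSS feed, assuming RSS 2.0 items with RFC 822 `pubDate` values; `items_from_past_week` is a hypothetical name, and fetching the feed and handing the links to newspaper are left out:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def items_from_past_week(rss_xml, now=None):
    """Return (title, link) for RSS 2.0 items published in the last 7 days."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=7)
    recent = []
    for item in ET.fromstring(rss_xml).iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # skip undated items rather than guess their age
        published = parsedate_to_datetime(pub)
        # A "-0000" zone parses to a naive datetime; treat it as UTC.
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            recent.append((item.findtext("title"), item.findtext("link")))
    return recent
```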

The DB merge shouldn't be an issue unless crawler updates scan Twitter handles.

Sometimes the crawler will crawl both the original site page and its comment page.
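
A sketch of mapping a comment page back to its article, so both URLs resolve to one crawl target. The markers are assumptions based on common WordPress conventions (`/comment-page-N/` paths, `?replytocom=` links, `#comment` anchors), not confirmed for every site being crawled; `article_url` is a hypothetical name:

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical comment-view markers (WordPress-style); extend per site.
COMMENT_PATH = re.compile(r"/comment-page-\d+/?$")
COMMENT_PARAMS = {"replytocom"}


def article_url(url):
    """Map a comment-page URL back to the article URL it belongs to."""
    parts = urlsplit(url)
    # Strip a trailing /comment-page-N/ segment from the path.
    path = COMMENT_PATH.sub("/", parts.path)
    # Remove comment-only query parameters, keep everything else.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if k not in COMMENT_PARAMS]
    )
    # Drop the fragment (#comments, #comment-123, ...).
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

The crawler could then key visited pages by `article_url(url)` so a comment page counts as already seen once its article has been crawled.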