June 1, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
logistics
communications
hours
server and installation
able to access current crawls?
reading code?
postprocessor
other questions for Shawn
nytimes archive crawl issues
visualization environment - soon
Logistics
Server & Basic operations
access to Arbutus granted; Gy is able to log in, Francisco will confirm
Shawn did not show how to set up an instance; he showed the different repositories and READMEs
nytimes
nytimes archive crawl (Mid E/Israel/Palestinians) has two periods (1979-1981, 2006-2011) with fewer results than other years; unclear whether this is a crawl error or a postprocessor error
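One way to narrow down where the drop happens is to count records per year at each stage (raw crawl output, then postprocessor output) and compare. The sketch below is only an illustration: it assumes the publication dates can be pulled out as ISO-style strings (`YYYY-MM-DD...`), and the function names `count_by_year` and `low_years` and the `fraction` threshold are hypothetical, not part of mediacat.

```python
from collections import Counter
from statistics import median


def count_by_year(dates):
    """Count records per year from ISO-style date strings (assumed format: YYYY-MM-DD...)."""
    counts = Counter()
    for d in dates:
        # Skip missing or malformed dates rather than guessing a year.
        if isinstance(d, str) and len(d) >= 4 and d[:4].isdigit():
            counts[int(d[:4])] += 1
    return counts


def low_years(counts, first_year, last_year, fraction=0.5):
    """Return years in [first_year, last_year] whose count is below fraction * median."""
    # Years with no records at all are included with a count of 0.
    per_year = {y: counts.get(y, 0) for y in range(first_year, last_year + 1)}
    threshold = fraction * median(per_year.values())
    return [y for y, n in per_year.items() if n < threshold]
```

If the same years are flagged in the raw crawl output, the crawl is the likelier cause; if they only appear after postprocessing, the postprocessor is worth checking first.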
Action Items
complete the two mandatory trainings
set up an instance of the mediacat domain crawler
Gy to email Shawn to ask for a demo on setting up a crawl (domain and Twitter) and on running the postprocessor, on Monday at 5:30pm
check on running crawls every 2-3 days - Gy
figure out how to count total URLs crawled
look into the NYT archive crawl above to check for crawl errors - Gy
start looking at Shengsong's (Charles Xu) JupyterLab environment - Francisco
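For the "count total URLs crawled" item, a minimal sketch follows. It assumes the crawler writes one JSON file per crawled page into an output directory and that each file is a JSON object with a `url` field; both the layout and the field name are assumptions to check against the actual crawl output, and `count_crawled_urls` is a hypothetical helper, not an existing mediacat function.

```python
import json
from pathlib import Path


def count_crawled_urls(output_dir, url_field="url"):
    """Return (records_with_url, unique_urls) for JSON files under output_dir.

    Assumes one JSON object per file with a URL stored under url_field.
    """
    seen = set()
    records = 0
    for path in Path(output_dir).rglob("*.json"):
        try:
            with open(path, encoding="utf-8") as fh:
                record = json.load(fh)
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written files
        if not isinstance(record, dict):
            continue  # skip files that are not a single JSON object
        url = record.get(url_field)
        if isinstance(url, str) and url:
            records += 1
            seen.add(url)
    return records, len(seen)
```

Reporting both numbers separates the total number of records from the number of distinct URLs, which differ if a page was crawled more than once.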