Sep 21, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
- forms
- new meeting time
- email Nat about the NPM error and the history issue when restarting a crawl
- Aryan will work with Francisco to figure out what troubleshooting needs to be done
- Gy will help Raazia learn how to set up an instance
Postprocessor
- today: running the postprocessor on foxnews to see whether it works on a large dataset
- accuracy still needs troubleshooting
Crawler/server
- worked on running the crawler
- reading about web crawling in general, plus some related resources
- reading through the NYT crawl and the batch
- meeting with Nat tomorrow
Action Items
- Aryan and Francisco: still learning the codebase
- first, learn Dask DataFrames with Nat in a meeting
- then turn to fixing the accuracy of the postprocessor
- Francisco will send some resources for Aryan about Dask and Pandas
- Raazia: continue reading about web crawling, meet with Nat and Gy, and look at the web archive API