February 9, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
- Nat -- any new ideas about counting URLs?
- key pair -- Alejandro and Irfan comms
- Shawn -- 100 URLs per website?
- Shawn -- checking numbers for new small domain crawl
- no answer yet about /nearline
- any answer from Shengsong or news about the json-csv conversion?
- postprocessor
old small domain
- the json-csv conversion is a problem
- Shengsong couldn't find the converter
- Nat helped
- Shengsong said the crawl should re-start with the same call
- let's see if it works; check again in 2 weeks
- Shawn will document this and do a bit of code review of the re-start function
other ways to count URLs
- just use the Internet Archive summary page
postprocessor
- has counts for both Twitter crawls; will update the crawl list
Action Items
- add documentation about the json-csv conversion and re-starting crawl
- do a bit of code review of the re-start function