May 20, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
- Alejandro: Introduction to the project
- Raiyan: Update on the domain crawler. The batch crawler is complete; it updates the user after every round of crawls.
  - Still need to do a test run for a single domain; the regular crawler should work.
  - Jacqueline put together a single bash script, and June put together a post-processor script for one site. We can get back to these next week.
  - Testing of the batch crawler should have a smaller scope: run it for a week.
  - Alejandro will check for domains without pop-ups and send them to Raiyan.
  - Raiyan also did some refactoring of the original domain crawler to make it more modular, less .0
  - Further tasks: Raiyan will talk to Jacqueline about setting up instances this week and has a few refactoring questions to follow up on.
- Raiyan: Introduction to GitHub and coding practices
- Jacqueline: Introduction to working with Compute Canada resources