May 20, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

  • Alejandro: Introduction to the Project

  • Raiyan: Update on the domain crawler: the batch crawler is complete and updates the user after every round of crawls.

Need to do a test run for a single domain; the regular crawler should work.

Jacqueline put together a one-bash script, and June put together a one-site post-processor script. We can get back to this next week.

For testing the batch crawler, we should use a smaller scope and run it for a week.

Alejandro will check for domains without pop-ups and send them to Raiyan.

Raiyan also did a bit of refactoring on original domain crawler to make it more modular, less .0

Further tasks: Raiyan will talk to Jacqueline about setting up instances this week, and has a few refactoring questions to follow up on.

  • Raiyan: Introduction to GitHub and coding practices

  • Jacqueline: Introduction to working with ComputeCanada resources