February 18, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki

Meeting notes

  • Remount instance 2 on the Graham cloud
    • The crawler's test instance became unmounted; Jacqueline is trying to remount it to recover the files
      • The instance holds information about which files are not working
      • If remounting fails, the instance can be uninstalled and set up fresh
  • Crawler performance
    • In the issue, building lists of URLs that show pop-ups, URLs that exit immediately, etc.
      • These lists will be transferred to a spreadsheet
    • Raiyan explored the derstandard.at domain and showed that such sites may be navigable once the initial cookie-consent prompt is accepted with a click
    • Cheerio and Puppeteer failed on different sets of sites
    • The database needs to be set up again to collect this data about the sites
  • The first version of the JSON-to-CSV converter was pushed
    • To be tested on the full dataset
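These notes don't describe the converter's input schema or implementation. The following is a minimal sketch of one way a JSON-to-CSV conversion could work. It assumes the crawler output is a JSON array of objects, possibly containing nested objects. The function names `flatten` and `json_to_csv` and the dotted-column naming convention are hypothetical and are not taken from the pushed converter.

```python
import csv
import io
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names (hypothetical convention),
    e.g. {"meta": {"domain": "x"}} -> {"meta.domain": "x"}.
    Lists are stored as JSON strings so each fits in a single CSV cell."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of objects into CSV text.

    The header is the union of all flattened keys in first-seen order,
    so records missing a field get an empty cell instead of raising."""
    records = [flatten(r) for r in json.loads(json_text)]
    # dict.fromkeys keeps insertion order and drops duplicate keys
    columns = list(dict.fromkeys(key for r in records for key in r))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '[{"url": "https://example.com/a", "meta": {"domain": "example.com"}},'
        ' {"url": "https://example.com/b", "links": ["x", "y"]}]'
    )
    print(json_to_csv(sample))
```

The header is built from the union of keys rather than from the first record. This matters when testing on the full dataset, because not every record is likely to carry the same fields.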