January 26, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

  • formulate a question about storage options to clarify whether we can get internal (non-cloud) storage for backups
  • Alejandro will generate a key pair and Shawn will add it
  • look at how the domain crawl works and then start a crawl of small domains (Alejandro will send the list)
  • if time allows, look at the postprocessor for the WaPo Twitter results
  • email Kirsta/Nat about meeting times

Server

  • received an email from security and needed to change settings on the NFS-style storage; now need to add IP into security group for
  • old cryptography standards that needed to be removed
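
For the security-group step, the IP and CIDR values can be sanity-checked before they are passed along. This is a minimal sketch using Python's standard `ipaddress` module. The function names are made up for illustration, and the addresses are documentation-range placeholders rather than the project's real values.

```python
import ipaddress

def host_cidr(ip: str) -> str:
    """Return the single-host CIDR block for an address (/32 for IPv4, /128 for IPv6)."""
    addr = ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
    return f"{addr}/{addr.max_prefixlen}"

def in_block(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the given CIDR block."""
    # strict=False tolerates a block written with host bits set, e.g. 203.0.113.7/24
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)

# Placeholder addresses from the RFC 5737 documentation ranges.
print(host_cidr("203.0.113.7"))
print(in_block("203.0.113.7", "203.0.113.0/24"))
```

A single-host block is the narrowest rule that can be added to a security group; a wider block only makes sense if the source address is expected to change within a known range.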

Twitter

  • Fox News Twitter crawl is done
  • a few users came back blank: RealLaurieDhue, BenWeinthal, MarieHarfonFox, MaurielleFOX2, ShainaFOX29, Nicoledoesnews, MattMillerWSYM, MarcLFOX13

Domain crawler

  • had a few issues with the initial attempts
  • had to reinstall the node dependencies, and did manage to get it going
  • tried to adjust the settings so that it would not run at full speed
  • queue: should be written to a file which potentially
    • big result set: check against existing result set
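
One way to read the queue notes above is to persist the pending URLs to a file so a crawl can resume, and to skip anything already present in the existing result set. The sketch below is in Python for illustration only. It is not the crawler's actual code, and the function names and the one-URL-per-line file format are assumptions.

```python
from pathlib import Path
from typing import Iterable, List, Set

def load_queue(path: Path) -> List[str]:
    """Read pending URLs from the queue file (assumed format: one URL per line)."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

def save_queue(path: Path, urls: Iterable[str]) -> None:
    """Write the queue to a temporary file, then swap it in, so a crash
    mid-write does not leave a half-written queue file behind."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(url + "\n" for url in urls), encoding="utf-8")
    tmp.replace(path)

def enqueue_new(queue: List[str], candidates: Iterable[str], seen: Set[str]) -> List[str]:
    """Append candidates that are neither in the existing result set (`seen`)
    nor already queued. Returns the URLs that were actually added."""
    known = seen | set(queue)
    added = []
    for url in candidates:
        if url not in known:
            queue.append(url)
            known.add(url)
            added.append(url)
    return added
```

For a very large result set, holding every URL in an in-memory set may not scale; a database lookup or an on-disk index would be the natural substitute for `seen`.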

Questions:

  • should we delete media/data/backup? It looks like it was created January 5, 2022, perhaps when Shengsong started
  • ongoing rsync backup: from where to where?
  • set up a master key pair so that it is not the devs who pass on access information
  • ensure we aren't using /dev
  • how best to maintain record of crawls
  • start the Fox Twitter crawl?
  • can we tell how far we got on crawls that were in storage?
  • meeting time with Kirsta and Nat

Action Items:

  • Nat will ask Irfan to send in details for adding sys admin for access to new server resources
  • Alejandro will generate a key pair and also send IP information (IP and CIDR)
  • Shawn will ask Shengsong about the script to convert JSON to CSV and also about the JupyterLab notebook
  • formulate a question about storage options to clarify whether we can get internal (non-cloud) storage for backups
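
Pending an answer about the existing JSON-to-CSV script, a conversion of that kind can be sketched with Python's standard library. This is not Shengsong's script. It assumes the input is a JSON array of flat objects, and the function name is hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List

def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.
    Columns are the union of all keys, in first-seen order."""
    records: List[Dict[str, Any]] = json.loads(json_text)

    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    # restval fills in an empty cell when a record lacks one of the columns
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Nested objects or lists would be written as their Python string form under this approach, so real crawl output with nested fields would need flattening first.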