January 19, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda:

  • server: is the half node working? has the rsync finished? backups
  • data: anything new from the logs? can we delete the /backup/ file?
  • Twitter crawl: Fox News? unable to produce citation scope

Server

  • rsync:
    • finished and seems to have worked -- volume-backed storage
    • NFS-style storage: accessed, but not connected yet
    • which instance will be the primary one?
  • current set up:
    • half node: mediacat
    • initial one: large crawler
  • key pair:
    • still needed

Data:

  • do the logs reveal anything?
    • not very descriptive
    • small domain crawl stopped partway through and is incomplete
    • NYT logs are the most descriptive
  • delete the backup?
    • enough space to keep it for now

Twitter crawl

  • Fox News crawl is currently running

Questions:

  • should we delete media/data/backup? it appears to have been created on January 5, 2022, perhaps when Shengsong started
  • ongoing rsync backup: from where to where?
  • set up a master key pair so that access does not depend on devs passing on information
  • ensure we aren't using /dev
  • how best to maintain a record of crawls
  • start the Fox Twitter crawl?
  • can we tell how far we got on crawls that were in storage?
  • meeting time with Kirsta and Nat

Action Items

  • formulate a question about storage options to find out whether we can get internal (non-cloud) storage for backups
  • Alejandro will generate the key pair and Shawn will add it
  • look at how the domain crawl works, then start the small-domains crawl (Alejandro will send the list)
  • if time allows, look at the postprocessor for the WaPo Twitter results
  • email Kirsta/Nat about meeting times