January 26, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

formulate a question about storage options to clarify whether we can get internal (non-cloud) storage for backups
Alejandro will generate a key pair and Shawn will add it
look at how the domain crawl works and then start a crawl of small domains (Alejandro will send list)
look at postprocessor, if time allows, for WaPo Twitter results
email Kirsta/Nat about meeting times

received email from security, and needed to change settings on NFS-style storage, now need to add IP into security group for
old cartography standards that needed to be removed

Fox News Twitter crawl is done
a few users blank: RealLaurieDhue, BenWeinthal, MarieHarfonFox, MaurielleFOX2, ShainaFOX29, Nicoledoesnews, MattMillerWSYM, MarcLFOX13

had a few issues with initial attempts
had to reinstall Node dependencies, but did manage to get it going
tried to adjust the settings so that it was not running at full speed
queue: should be written to a file which potentially
- big result set: check against existing result set

should we delete media/data/backup? It looks like it was created January 5, 2022, perhaps when Shengsong started
ongoing rsync backup: from where to where?
set up a master key pair so that it is not devs who pass on information
ensure we aren't using /dev
how best to maintain record of crawls
- see https://docs.google.com/spreadsheets/d/1Uyi53IwyNOp92E4M31Q_aiidxAhRLjU4PMokz2hWQYY/edit#gid=0
start the Fox Twitter crawl?
can we tell how far we got on crawls that were in storage?
meeting time with Kirsta and Nat

Nat will ask Irfan to send in details for adding sys admin for access to new server resources
Alejandro will generate a key pair and also send IP information: IP and CIDR
Shawn will ask Shengsong about the script to convert JSON to CSV and also about the JupyterLab notebook
formulate a question about storage options to clarify whether we can get internal (non-cloud) storage for backups