January 26, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
- formulate a question about storage options to clarify whether we can get internal (non-cloud) storage for backups
- Alejandro will generate a key pair and Shawn will add it
- look at how the domain crawl works, then start a crawl of small domains (Alejandro will send the list)
- look at the postprocessor, if time allows, for the WaPo Twitter results
- email Kirsta/Nat about meeting times
Server
- received an email from security and needed to change settings on the NFS-style storage; now need to add IP into security group for
- old cryptography standards that needed to be removed
Twitter
- Fox News Twitter crawl is done
- a few users blank: RealLaurieDhue, BenWeinthal, MarieHarfonFox, MaurielleFOX2, ShainaFOX29, Nicoledoesnews, MattMillerWSYM, MarcLFOX13
domain crawler:
- had a few issues with initial attempts
- had to reinstall the Node dependencies, after which it ran
- tried adjusting the settings so that it did not run at full speed
- queue: should be written to a file which potentially
- big result set: check against existing result set
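
The two ideas above (a crawl queue persisted to a file, and checking new links against an existing result set) could look roughly like the following. This is a minimal Python sketch for discussion only, not the domain crawler's actual code (which runs on Node); the function names, the JSON file format, and the `seen` set are all assumptions.

```python
import json
from collections import deque
from pathlib import Path


def load_queue(path):
    """Load pending URLs from a JSON file; return an empty queue if none exists."""
    p = Path(path)
    if not p.exists():
        return deque()
    return deque(json.loads(p.read_text(encoding="utf-8")))


def save_queue(queue, path):
    """Write the pending URLs to disk so an interrupted crawl can resume."""
    Path(path).write_text(json.dumps(list(queue)), encoding="utf-8")


def enqueue_new(queue, candidates, seen):
    """Append only URLs that are not already in the existing result set.

    `seen` is a hypothetical set of URLs already crawled or queued.
    Returns the number of URLs actually added.
    """
    added = 0
    for url in candidates:
        if url not in seen:
            seen.add(url)
            queue.append(url)
            added += 1
    return added
```

Saving the queue after each batch would also make it possible to tell how far a crawl got if it stops partway.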
Questions:
- should we delete media/data/backup? It looks like it was created on January 5, 2022, perhaps when Shengsong started
- ongoing rsync backup: from where to where?
- set up a master key pair so that it is not devs who pass on information
- ensure we aren't using /dev
- how best to maintain record of crawls
- start the Fox twitter crawl?
- can we tell how far we got on crawls that were in storage?
- meeting time with Kirsta and Nat
Action Items:
- Nat will ask Irfan to send in details for adding sys admin for access to new server resources
- Alejandro will generate a key pair and also provide IP information (IP and CIDR)
- Shawn will ask Shengsong about the script to convert JSON to CSV and also about the JupyterLab notebook
- formulate a question about storage options to clarify whether we can get internal (non-cloud) storage for backups
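
Until the existing JSON-to-CSV script is located, a stand-in along these lines may be enough for looking at crawl results. This is a hedged sketch, not Shengsong's script: it assumes the input file is a JSON array of flat-ish objects, and the function name `json_to_csv` is hypothetical.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON array of objects into a CSV file; return the row count."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    # Build the header from the union of keys, in first-seen order.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for rec in records:
            # Nested lists/dicts are stored as JSON strings in a single cell.
            row = {
                k: json.dumps(v) if isinstance(v, (list, dict)) else v
                for k, v in rec.items()
            }
            writer.writerow(row)
    return len(records)
```

Records that lack a field get an empty cell, so results with uneven fields still line up under one header.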