October 14, 2022 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
- Shawn: access Jupyter Lab and try to produce a visualization on Arbutus with NYTimes Twitter data
- Fenil: install the Twitter crawler on Arbutus and try to crawl the FoxNews handles
- Fenil: try to get ProQuest working for mass download
- Fenil & Shawn are meeting on Saturday to go over how to access Arbutus
- Alejandro: ask the Digital Alliance to increase the Arbutus allowance
- Alejandro: ask Shengsong about moving storage to Arbutus, cc Shengsong
Arbutus
- getting on: managed to get on and set up
- web interface
Jupyter Lab
- doesn't see the postprocessor and data from the walk-through
- looked at vector diagram software
ProQuest
- does let us download the PDF version of content, but not the full text
- something is limiting us from downloading the full text
Action Items
- Alejandro: send Shawn the updated results from NYT archive crawl
- Fenil: update server notes and documentation about how to connect to Graham cloud
- Fenil: look into using the web interface on Graham cloud to download an image
- Fenil: set up an instance with 40 vCPUs and 5-6 TB of storage
- Shawn: set up vector diagram software and Jupyter Lab environment on Arbutus cloud small instance
- Fenil: run a complex query with all possible text aliases in ProQuest, send Alejandro the number of results with an example of the top 100
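
For the ProQuest alias query in the action items above, one way to assemble the search string is sketched here. This is only a minimal sketch: it assumes a generic Boolean `OR` of quoted phrases, which these notes do not confirm as the syntax ProQuest expects, and both the alias list and the function name `build_alias_query` are hypothetical placeholders rather than anything from the project.

```python
def build_alias_query(aliases):
    """Join aliases into one Boolean OR query, quoting each as an exact phrase.

    Assumption: the search interface accepts quoted phrases joined by OR.
    """
    seen = set()
    terms = []
    for alias in aliases:
        cleaned = " ".join(alias.split())  # trim and collapse whitespace
        key = cleaned.lower()              # treat aliases case-insensitively
        if not cleaned or key in seen:
            continue                       # skip blanks and duplicates
        seen.add(key)
        # Drop embedded double quotes so the phrase stays well-formed.
        terms.append('"' + cleaned.replace('"', "") + '"')
    return " OR ".join(terms)


# Hypothetical aliases, for illustration only.
example_aliases = ["New York Times", "NYTimes", "nytimes", " NY  Times "]
print(build_alias_query(example_aliases))
```

Deduplicating case-insensitively keeps the query short when the alias list contains variants that differ only in capitalization.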
Backburner
- create test results to check installation