June 23, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

  • set up NYT archive (Middle East - Israel - Palestinians) crawl - Gy
  • see if NYT archive (politics) crawl results are stored somewhere - Gy
  • write to Shengsong to see if he downloaded the full results for the NYT Mid E archive crawl - Al
  • set up domain crawler on server - Fr
  • check jewishjournal data to see if it is corrupted - Gy
  • work on visualizations with sample data and prepare a demo - Fr
  • attempt postprocessor on a sample from the Fox News Twitter data; check Jerusalem Post or Times of Israel - Gy
    • if working: run postprocessor on the NYT results found, March 25, 2023

Crawls & Postprocessing

  • set up NYT archive (Mid E) crawl; about 40,000 results so far
  • couldn't find NYT archive (politics) crawl results; will ask Shawn
  • jewishjournal count: gives different numbers depending on when the count is run; the data may be partially corrupted
  • Postprocessor:
    • keeps returning 0 results even though the input data (both the Twitter data and the NYT data) contains results

Visualizations

  • make sure columns are correct

Ongoing tasks:

  • check crawl every 2 days - Gy
  • update the MVP, especially with respect to the format of the data going into and coming out of the postprocessor, and then as input to the visualization environment - Gy/Fr

Action Items:

  • ask Shawn about NYT archive (politics) - Gy
  • set up domain crawler on server - Fr
  • make a new tab in the crawl index spreadsheet to track the number for the small domain crawl - Al
  • add numbers for each domain to new tab - Gy
  • postprocessor: continue to troubleshoot, especially the formatting, and ask Shawn - Gy
  • postprocessor: write documentation with instructions on the order of utilities and the steps for using the postprocessor - Gy
  • assess whether Matplotlib has features that would let us build the UI faster - Fr
  • use existing data to test on the JupyterLab platform - Fr
  • research whether there are existing libraries with a friendlier UI than JupyterLab - Fr
  • write to Kirsta and Nat about the use of JupyterLab - Al
  • meet on Monday, June 26, at 11am Toronto time to talk about the postprocessor - Gy/Fr