June 30, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

  • ask Shawn about NYT archive (politics) - Gy
  • set up domain crawler on server - Fr
  • make a new tab in the crawl index spreadsheet to track the number for the small domain crawl - Al
  • add numbers for each domain to new tab - Gy
  • postprocessor: continue troubleshooting, especially the formatting, and ask Shawn - Gy
  • postprocessor: document with instructions the order of utilities and steps to use the postprocessor - Gy
  • assess whether Matplotlib has features that would let us build the UI faster - Fr
  • use existing data to test on the JupyterLab platform - Fr
  • research whether there are existing libraries with a friendlier UI than JupyterLab - Fr
  • write to Kirsta and Nat about the use of JupyterLab - Al
  • meet on Monday June 26th at 11am Toronto time to talk about the postprocessor - Gy/Fr

Crawler/Server

  • no answers from Shawn
  • fixed the bug in the output Python file and the bug in the compiler Python file; Shawn might have fixed them a different way
    • still getting zero results with test data
    • once all the bugs are solved and tested, push to master
  • NYT archive crawls: Mid E/Palestinian/Israel
    • Palestinian archive crawl returned the expected number of 60,000, but the Mid E crawl (less than half) and the Israel crawl (about half) fell short of the expected number
  • Francisco managed to get the domain crawler to work on National Post
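
A minimal sketch of how the comparison of crawl counts against expected numbers could be automated, so that short crawls like the Mid E and Israel ones are flagged without reading the spreadsheet by hand. The function name `find_short_crawls`, the `tolerance` parameter, and all counts below are hypothetical placeholders, not actual MediaCAT code or crawl results.

```python
def find_short_crawls(expected, actual, tolerance=0.05):
    """Return {crawl_name: fraction_of_expected} for crawls that fell short.

    A crawl is flagged when its count is below (1 - tolerance) of the
    expected count. Crawls missing from `actual` are treated as 0 results.
    """
    short = {}
    for name, target in expected.items():
        if target <= 0:
            raise ValueError(f"expected count for {name!r} must be positive")
        got = actual.get(name, 0)
        ratio = got / target
        if ratio < 1 - tolerance:
            short[name] = round(ratio, 2)
    return short


# Placeholder numbers for illustration only; these are NOT real crawl results.
expected = {"crawl_a": 1000, "crawl_b": 1000, "crawl_c": 1000}
actual = {"crawl_a": 1000, "crawl_b": 450, "crawl_c": 500}

print(find_short_crawls(expected, actual))
```

In practice the two dictionaries would be filled from the crawl index spreadsheet or from the crawler output; how those numbers are exported is left open here.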

Visualization

  • researching UI and visualization platforms: a Plotly integration that works with Matplotlib and JupyterLab could provide the user interface
    • everything runs in the JupyterLab environment; once Plotly is running in that environment, its user interface can be used
  • using existing data

Ongoing tasks:

  • check crawl every 2 days - Gy
  • update the MVP, especially with respect to the format of data going into and coming out of the postprocessor, and then as input to the visualization environment - Gy/Fr
  • push corrected postprocessor code to master - Gy/Fr
  • postprocessor: document with instructions the order of utilities and steps to use the postprocessor - Gy

Action Items:

  • try to restart the Mid E and then Israel NYT Archive crawls, and if that doesn't lead to expected result, re-do each one sequentially - Gy
  • check March 31 NYT archive results against NYT Archive postprocess results, as well as new NYT Archive results - Gy
  • keep troubleshooting the postprocessor, including meeting together - Gy/Fr
  • find a list of possible graphics available through Matplotlib and email it to Alejandro - Fr
  • continue research into Plotly integration in the JupyterLab environment - Fr
  • research D3 for possible use for the network graph - Fr
    • if time, consider KPP dataset from sharepoint - Fr