July 6, 2023 - UTMediaCAT/mediacat-docs GitHub Wiki

Agenda

try to restart the Mid E and then Israel NYT Archive crawls, and if that doesn't lead to expected result, re-do each one sequentially - Gy
check March 31 NYT archive results against NYT Archive postprocess results, as well as new NYT Archive results - Gy
keep trouble-shooting postprocessor including meeting together - Gy/Fr
find list of possible graphics through matplot and email to Alejandro - Fr
continue research into plotly integration in jupyterlab environment - Fr
research d3 graph for possible use for network graph - Fr
- if time, consider KPP dataset from sharepoint - Fr

issue has come up with Graham cloud: unable to upload a folder quickly, and after trying other possible ways, got permission denied when trying to SSH into Graham

NYT archive (Mid E) for March 31, 2022 does have results for years where there seem to be zeros, like around 1980s and 2008-9
- also: 1989 has fewer total results even though the number of results with hits doesn't decrease
- could be a combination of crawler and postprocessor errors.
need to troubleshoot new crawler problem
need to troubleshoot on-going crawls
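
One possible way to make the per-year check above repeatable is sketched below: count results per year for two runs and list the years that are zero in either run or that differ. This is only an illustration, not the postprocessor's actual logic. The `year` field, the function names, and the sample records are all assumptions; real crawl output may be shaped differently.

```python
from collections import Counter


def count_by_year(records, year_key="year"):
    """Count records per year. Assumes each record is a dict with a 'year' field (hypothetical)."""
    return Counter(int(r[year_key]) for r in records)


def compare_runs(old_counts, new_counts, first_year, last_year):
    """Return (year, old, new) for years that are zero in either run or whose counts differ."""
    rows = []
    for year in range(first_year, last_year + 1):
        old = old_counts.get(year, 0)
        new = new_counts.get(year, 0)
        if old == 0 or new == 0 or old != new:
            rows.append((year, old, new))
    return rows


# Made-up records, only to show the shape of the check.
march_run = [{"year": 1979}, {"year": 1980}, {"year": 1981}]
new_run = [{"year": 1979}, {"year": 1981}, {"year": 1981}]

flagged = compare_runs(count_by_year(march_run), count_by_year(new_run), 1979, 1981)
for year, old, new in flagged:
    print(f"{year}: old={old} new={new}")
```

Running the same comparison on "results with hits" versus "total results" would help show whether a drop like the 1989 one comes from the crawler or from the postprocessor.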

Plotly is separate from Matplotlib - different utilities
- Matplot: can't do buttons, Plotly: managed to do one
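
For reference, a minimal sketch of a Plotly button that switches between datasets, using the `updatemenus` layout property. This is not the button built in the meeting; the function names, dataset labels, and values are made up for illustration. The figure is built as a plain dict so the structure is visible, and `show_spec` assumes plotly is installed in the JupyterLab environment.

```python
def build_dataset_button_spec(datasets):
    """Build a Plotly figure spec (plain dict) with one button per dataset.

    datasets: dict mapping a label to an (x_values, y_values) pair (hypothetical input shape).
    """
    labels = list(datasets)
    traces = []
    buttons = []
    for i, label in enumerate(labels):
        x, y = datasets[label]
        # Only the first trace is visible at the start.
        traces.append({"type": "bar", "name": label, "x": list(x), "y": list(y), "visible": i == 0})
        # Each button shows its own trace and hides the others.
        visible = [j == i for j in range(len(labels))]
        buttons.append({
            "label": label,
            "method": "update",
            "args": [{"visible": visible}, {"title.text": label}],
        })
    layout = {
        "title": {"text": labels[0] if labels else ""},
        "updatemenus": [{"type": "buttons", "buttons": buttons}],
    }
    return {"data": traces, "layout": layout}


def show_spec(spec):
    """Render the spec in a notebook; requires plotly to be installed."""
    import plotly.graph_objects as go
    go.Figure(spec).show()


# Made-up numbers, only to show the structure.
spec = build_dataset_button_spec({
    "Dataset A": ([2019, 2020], [3, 4]),
    "Dataset B": ([2019, 2020], [5, 6]),
})
```

Calling `show_spec(spec)` in a notebook cell should display the chart with one button per dataset.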

check crawl every 2 days - Gy
update the MVP esp wrt format of data going into postprocessor and coming out, and then as input to the visualization environment - Gy/Fr
push corrected postprocessor code to master - Gy/Fr
postprocessor: document with instructions the order of utilities and steps to use the postprocessor - Gy

write to cloud support at Digital Alliance - Gy
need to troubleshoot on-going crawls - Gy
troubleshoot crawler problems to set up NYT archive crawl - Gy
re-run NYT archive Mid E & Israel crawls using Arbutus, running very slowly - Gy
if time, download sets of crawled data in both JSON and CSV for FoxNews & WaPo twitter crawls and send link to Alejandro - Gy
meet Monday at 1pm to talk about postprocessor - Gy/Fr
continue sorting the different aspects of the button in plotly, and then try with different datasets - Fr
research d3 graph for possible use for network graph - Fr
- if time, consider KPP dataset from sharepoint - Fr