October 21, 2021 - UTMediaCAT/mediacat-docs GitHub Wiki
Agenda
John will continue looking at the various approaches to optimizing the postprocessor speed:
try solving the race condition when using the same dictionary by adding a locking mechanism
when using separate dictionaries, try a formal map-reduce from an existing Python library
when using one shared dictionary, try assigning separate keys to each process.
Colin will look at the repo front-end, especially the visualizations ticket (#7); he will author a new pull request, which we will look at next week
John will deprecate the Voyage repository
Alejandro will follow up on SSH issue for Compute Canada
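As a rough illustration of the separate-dictionaries option in the first agenda item, the sketch below runs a map step in separate processes and merges the partial dictionaries in the parent. It is not the postprocessor's actual code: the record layout, the `domain` field, and the function names are assumptions made for the example.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_chunk(records):
    """Map step: build a private dictionary for one chunk of records."""
    counts = Counter()
    for record in records:
        counts[record["domain"]] += 1  # "domain" is a hypothetical field name
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial dictionaries into a new one."""
    return left + right


def parallel_count(chunks, workers=4):
    """Run the map step in separate processes, then reduce in the parent."""
    with Pool(processes=workers) as pool:
        partials = pool.map(count_chunk, chunks)
    return reduce(merge_counts, partials, Counter())
```

Because each process owns its dictionary until the reduce step, no lock is needed; the cost is the extra memory and time spent merging the partial results.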
Notes
Optimizing Postprocessor speed
John has tried several approaches to optimization and discussed them with KS and Nat; the first approach is the best one. It reduced the run time from 2 days to 4 hours.
John spent the week looking at the alternatives and verified his original approach was the correct one to take.
New problem: the run ran out of memory at the very end and did not process the Twitter crawler data, so John needs to re-run the crawler. He is working on an approach that monitors size and writes to disk if the process is at risk, and is now seeking to re-run.
John is having a problem with the Graham cloud that he is trying to address (he keeps getting kicked out). He will write to Alejandro, who will write to the Compute Canada staff.
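The size-monitoring idea described above could look roughly like the sketch below: records are held in memory and appended to a JSON Lines file once a byte budget is reached. This is only an assumption about the shape of such a feature; the class name, the output format, and the budget are hypothetical, and the serialized length is used as a rough proxy for memory use.

```python
import json


class SpillingBuffer:
    """Hold records in memory and append them to disk once a size budget is hit."""

    def __init__(self, path, max_bytes=64 * 1024 * 1024):
        self.path = path
        self.max_bytes = max_bytes  # hypothetical budget, tune to the machine
        self._lines = []
        self._bytes = 0

    def add(self, record):
        line = json.dumps(record)
        self._lines.append(line)
        # Serialized size is only an approximation of in-memory size.
        self._bytes += len(line.encode("utf-8"))
        if self._bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        """Append buffered records to the file and reset the counters."""
        if not self._lines:
            return
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write("\n".join(self._lines) + "\n")
        self._lines.clear()
        self._bytes = 0
```

A final `flush()` call at the end of the run would be needed to write whatever is still buffered.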
Problem with SSH into Compute Canada resources
Alejandro tried to sort this out with Colin, but they are still experiencing the same problem. Adding the IP under security groups did not help. Jacqueline added John, so he does not know the steps. The documented steps did not work. Alejandro will email Jacqueline to see if she can clarify; if we can get in, we need to update the documentation.
Visualizations
Colin can't run the existing repositories because the data set doesn't have dates on the domain side.
Colin will author a PR (we reviewed the PR and squashed it)
Colin will reprocess the JSON file to add dates from the URL where available so that he can try to run the stacked area graph. He will also try to get the network diagram (force vector) working if time allows.
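A minimal sketch of the date backfill, assuming each record is a dictionary with hypothetical `url` and `date` keys and that dates appear in the URL path as `/YYYY/MM/DD` or `/YYYY-MM-DD` (the real output JSON and URL patterns may differ):

```python
import re
from datetime import date

# Assumed convention: a four-digit year, then month and day, in the URL path.
DATE_IN_URL = re.compile(r"/((?:19|20)\d{2})[/-](\d{1,2})[/-](\d{1,2})(?!\d)")


def date_from_url(url):
    """Return an ISO date string found in the URL, or None if there is none."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:  # e.g. month 13 or day 45
        return None


def add_dates(records):
    """Fill in a missing "date" from the URL; existing dates are left alone."""
    for record in records:
        if not record.get("date"):
            found = date_from_url(record.get("url", ""))
            if found:
                record["date"] = found
    return records
```

Records whose URLs carry no recognizable date are left without a `date` key, so a downstream graph would still need to decide how to handle them.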
Action Items
John
Finishing work on the size monitoring feature for the postprocessor and testing it
Writing Alejandro about the problem with Graham and getting kicked out
If time permits, beginning work on the troubleshooting of Metascraper in ticket 35
Colin
Providing .csv to Alejandro
Reprocessing the output JSON to include the dates that are available in the URL for the domain hits
Trying to build the stacked area or network diagrams if time permits
Responding to any info requests on the thread that Alejandro starts with Jacqueline
Alejandro
Starting thread with Jacqueline - we need to find out how to properly add Colin to the Compute Canada resources and update the documentation to be correct.