Meeting 2014 10 31 - NCEAS/commdyn GitHub Wiki

Commdyn Weekly Meeting

Date: 31 October, 2014 (Halloween!)

Participants: Matt, Lauren W., Peter, Syd, Chris, Lauren H.

Agenda and notes:

Package codyn development issues
- TODO: Matt to add build and run instructions to the README
- Andrew and Lauren discussing the Taylor metrics
- Testing and documentation still needed
- Matt: remove dependency on reshape
Review the recordr design
https://github.com/DataONEorg/sem-prov-design/blob/master/docs/PROV-capture/Run-manager-API.rst
Peter is going through a demo
Questions/comments
- Matt: can we have a 'localId' for referring to numbered runs?
- Lauren W: what determines the order in which listRuns() lists the runs? They seem to be out of chronological order.
  - Peter: the listing will be ordered chronologically, but that isn't working for this demo -> ok, thanks
  - Also, I would think the Start/End time would be listed right after the Script name rather than Published Time (maybe just a personal preference)
  - Peter: yes, the listing should be useful to you, so we can change the order/content as necessary
  - so listRuns() could take a parameter "orderBy"? yes -> ok, cool
- Matt: Can we 'tag' the runs so users can differentiate them? How else do they differentiate them?
  - Maybe name it when you call record() and after listRuns() as well
  - what would the tag be based on?
  - the tag could be entered with record(rc, scriptName, "tag text")
  - Lauren W: Anything the user wants, right
  - Chris: Yeah, the Run should probably support arbitrary comments to jog the user's memory of what the run entailed
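
The ordering and tagging ideas above can be sketched roughly as follows. This is a minimal illustration in Python, not the recordr API: `RunLog`, `record`, `list_runs`, `order_by`, and `tag_contains` are hypothetical names chosen for this example, and the run dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    script_name: str
    start_time: datetime
    tag: str = ""  # free text supplied by the user; anything they want


class RunLog:
    """Hypothetical in-memory run listing (an assumption, not recordr)."""

    SORTABLE = ("script_name", "start_time", "tag")

    def __init__(self) -> None:
        self._runs: List[Run] = []

    def record(self, script_name: str, start_time: datetime, tag: str = "") -> Run:
        run = Run(script_name, start_time, tag)
        self._runs.append(run)
        return run

    def list_runs(self, order_by: str = "start_time",
                  tag_contains: Optional[str] = None) -> List[Run]:
        """Return runs sorted by one field, optionally filtered by tag text."""
        if order_by not in self.SORTABLE:
            raise ValueError(f"cannot order by {order_by!r}")
        runs = self._runs
        if tag_contains is not None:
            runs = [r for r in runs if tag_contains in r.tag]
        return sorted(runs, key=lambda r: getattr(r, order_by))


log = RunLog()
log.record("analysis2.R", datetime(2014, 10, 30, 9, 0), tag="grid size .05")
log.record("someLongExpensiveRun.R", datetime(2014, 10, 29, 9, 0), tag="baseline")
log.record("analysis1.R", datetime(2014, 10, 31, 9, 0), tag="grid size .10")

by_time = [r.script_name for r in log.list_runs()]
by_name = [r.script_name for r in log.list_runs(order_by="script_name")]
grid_runs = [r.script_name for r in log.list_runs(tag_contains="grid")]
```

Defaulting `order_by` to the start time matches Peter's point that the listing is meant to be chronological, while still letting the user choose another column.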
Lauren H.: Can run recording span multiple script executions? Especially for runs that take a lot of time?
- Peter: yes
- Matt: could startRecord(); do a bunch of stuff; endRecord()
- Matt: alternatively: record("someLongExpensiveRun.R"); then record("analysis1.R"); record("analysis2.R")
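
One way to picture the startRecord()/endRecord() idea is a single recording that stays open across several script executions. The sketch below is in Python and is only an illustration: `Recorder`, `start_record`, `note_step`, `end_record`, and `recording` are hypothetical names, not part of recordr.

```python
from contextlib import contextmanager
from datetime import datetime


class Recorder:
    """Hypothetical recorder whose runs can span several scripts."""

    def __init__(self):
        self.runs = []       # completed recordings
        self._active = None  # recording currently in progress, if any

    def start_record(self, tag=""):
        if self._active is not None:
            raise RuntimeError("a recording is already in progress")
        self._active = {"tag": tag, "started": datetime.now(), "steps": []}

    def note_step(self, script_name):
        """Attach one script execution to the open recording."""
        if self._active is None:
            raise RuntimeError("no recording in progress")
        self._active["steps"].append(script_name)

    def end_record(self):
        if self._active is None:
            raise RuntimeError("no recording in progress")
        run = self._active
        run["ended"] = datetime.now()
        self.runs.append(run)
        self._active = None
        return run

    @contextmanager
    def recording(self, tag=""):
        # Guarantees end_record() runs even if a step raises.
        self.start_record(tag)
        try:
            yield self
        finally:
            self.end_record()


recorder = Recorder()
with recorder.recording(tag="long run plus two analyses") as rec:
    rec.note_step("someLongExpensiveRun.R")
    rec.note_step("analysis1.R")
    rec.note_step("analysis2.R")
```

Wrapping the start/end pair in a context manager is a design choice for this sketch: it keeps a failed step from leaving a recording open.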
Chris: run cache could get large. Maybe need to purge the cache.
- add API method for deleteRun(runid)
- add API for deleteRuns(runFilter) where runFilter might be a date older than a certain age
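
A rough sketch of how the two proposed purge methods might behave, written in Python for illustration only. `RunCache`, `delete_run`, `delete_runs`, and `older_than` are hypothetical names, and treating the run filter as a predicate function is an assumption of this example, not something decided in the meeting.

```python
from datetime import datetime
from typing import Callable, Dict, List


class RunCache:
    """Hypothetical cache mapping a run id to the time the run started."""

    def __init__(self) -> None:
        self._runs: Dict[str, datetime] = {}

    def add(self, run_id: str, started: datetime) -> None:
        self._runs[run_id] = started

    def run_ids(self) -> List[str]:
        return sorted(self._runs)

    def delete_run(self, run_id: str) -> bool:
        """Delete one run; return False if the id was not in the cache."""
        return self._runs.pop(run_id, None) is not None

    def delete_runs(self, run_filter: Callable[[str, datetime], bool]) -> int:
        """Delete every run the filter accepts; return how many were removed."""
        doomed = [rid for rid, started in self._runs.items()
                  if run_filter(rid, started)]
        for rid in doomed:
            del self._runs[rid]
        return len(doomed)


def older_than(cutoff: datetime) -> Callable[[str, datetime], bool]:
    """Build a filter that matches runs started before the cutoff."""
    return lambda run_id, started: started < cutoff


cache = RunCache()
cache.add("run-1", datetime(2014, 9, 1))
cache.add("run-2", datetime(2014, 10, 1))
cache.add("run-3", datetime(2014, 10, 30))

removed_one = cache.delete_run("run-2")
removed_old = cache.delete_runs(older_than(datetime(2014, 10, 15)))
remaining = cache.run_ids()
```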
Syd: how many of the runs to record?
- Peter: up to the person using the package
Syd: how can the person running the analysis indicate which objects are important to keep, especially for people who don't know the package well?
- Chris: ability to edit a package to prune objects; add API method for pruning objects from the resulting data package
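
As a small illustration of pruning, the sketch below treats a data package as a plain mapping from object name to a role label and returns a pruned copy. This is a Python sketch under stated assumptions: `prune_package`, the `drop` parameter, and the example file names are all hypothetical and do not reflect how recordr stores packages.

```python
from typing import Dict, Iterable


def prune_package(package: Dict[str, str], drop: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the package without the named objects.

    The original package is left untouched, so a mistaken prune can be undone.
    """
    to_drop = set(drop)
    unknown = to_drop - set(package)
    if unknown:
        raise KeyError(f"not in package: {sorted(unknown)}")
    return {name: role for name, role in package.items() if name not in to_drop}


# Hypothetical package contents: object name -> role in the run.
package = {
    "analysis1.R": "script",
    "raw_counts.csv": "input",
    "scratch_plot.png": "output",
    "fig1.png": "output",
}

pruned = prune_package(package, drop=["scratch_plot.png"])
```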
Syd: intimidating that every movement has to be deliberate; helps if you can run that run again, and delete runs
Matt: need to provide more context in the 'View' output, including better section headers, lists of all tracked products, parameters used in the run
Syd: can you 'pause' the recording? Even if it loses some provenance relationships?
Syd: ability to add a comment when starting a run or after a run was done.
- similar to the Sumatra tool's reason flag (smt --reason "Testing grid size of .05" run)
- similar to the 'tagging' approach
Matt: Is this useful? Would you use it in your work? Would you publish your analyses with this?
- Lauren H.: Would probably wait until the end and record all at once, and then share the process, rather than do any recording while building the analysis.
  - coding during exploratory phase would probably not use the run management
  - when working on a paper, there is a flow: work on the first hypothesis until it's complete; then that is a segment of code; then iterate internally on that code segment; then push to GitHub when that segment is done; then move on to hypothesis 2, possibly using data artifacts that came out of the first analysis segment
- Syd: similar to Lauren H.; if she knew she wouldn't be working with the data for a while, she would record it so that details weren't lost later; hard to remember details over time; main reason to do it would be to share it with others eventually
Peter: will incorporate into design docs