Meeting 2014 10 31 - NCEAS/commdyn GitHub Wiki

Commdyn Weekly Meeting

Date: 31 October 2014 (Halloween!)

Participants: Matt, Lauren W., Peter, Syd, Chris, Lauren H.

Agenda and notes:

  • Package codyn development issues

    • TODO: Matt to add build and run instructions to the README
    • Andrew and Lauren are discussing the Taylor metrics
    • Testing and documentation still needed
    • Matt: remove dependency on reshape
  • Review the recordr design

  • https://github.com/DataONEorg/sem-prov-design/blob/master/docs/PROV-capture/Run-manager-API.rst

  • Peter is going through a demo

  • Questions/comments

    • Matt: can we have a 'localId' for referring to numbered runs?
    • Lauren W: what determines the order in which listRuns() lists the runs? They seem to be out of chronological order.
      • Peter: the listing will be ordered chronologically, but that isn't working for this demo. -> ok, thanks
      • Also, I would think the Start/End time would be listed right after the Script name rather than the Published Time (maybe just a personal preference).
      • Peter: yes, the listing should be useful to you, so we can change the order/content as necessary.
      • So listRuns() could take an "orderBy" parameter? yes -> ok, cool
    • Matt: can we 'tag' the runs so users can differentiate them? How else do they differentiate them?
      • Maybe name the run when you call record(), and after listRuns() as well.
      • What would the tag be based on?
      • The tag could be entered with record(rc, scriptName, "tag text").
      • Lauren W: anything the user wants, right?
      • Chris: yeah, the Run should probably support arbitrary comments to jog the user's memory of what the run entailed.

  • Lauren H.: Can run recording span multiple script executions? Especially for runs that take a lot of time?

    • Peter: yes
    • Matt: could startRecord(); do a bunch of stuff; endRecord()
    • Matt: alternatively: record("someLongExpensiveRun.R"); then record("analysis1.R"); record("analysis2.R")
  • Chris: run cache could get large. Maybe need to purge the cache.

    • add API method for deleteRun(runid)
    • add API method for deleteRuns(runFilter), where runFilter might select runs older than a certain age
  • Syd: how many of the runs to record?

    • Peter: up to the person using the package
  • Syd: how can the person running the analysis indicate which objects are important to keep; especially for people that don't know the package well

    • Chris: ability to edit a package to prune objects; add an API method for pruning objects from the resulting data package
  • Syd: intimidating that every movement has to be deliberate; helps if you can run that run again, and delete runs

  • Matt: need to provide more context in the 'View' output, including better section headers, lists of all tracked products, parameters used in the run

  • Syd: can you 'pause' the recording? Even if it loses some provenance relationships?

  • Syd: ability to add a comment when starting a run or after a run was done.

    • similar to the Sumatra tool's reason flag (smt --reason "Testing grid size of .05" run)
    • similar to the 'tagging' approach
  • Matt: Is this useful? Would you use it in your work? Would you publish your analyses with this?

    • Lauren H.: would probably wait until the end and record everything at once, then share the process, rather than do any recording while building the analysis.
      • would probably not use run management while coding during the exploratory phase
      • when working on a paper, there is a flow: work on the first hypothesis until it's complete; that is then a segment of code; iterate internally on that code segment; push to GitHub when that segment is done; then move on to hypothesis 2, possibly using data artifacts that came out of the first analysis segment
    • Syd: similar to Lauren H.; if she knew she wouldn't be working with the data for a while, she would record it so that details weren't lost later; it's hard to remember details over time; the main reason to do it would be to share it with others eventually
  • Peter: will incorporate into design docs
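
As a rough illustration of the API ideas raised in the questions above (a short 'localId', free-text tags, an "orderBy" option for listing, and deleteRun/deleteRuns with a filter), here is a minimal Python sketch. It is not recordr code (recordr is an R package) and it does not reflect any decision made in the meeting. The RunStore class, its method and field names, and the example tags are all hypothetical, chosen only to make the discussion concrete.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class Run:
    local_id: int         # short sequential number (the 'localId' idea)
    run_id: str           # globally unique identifier
    script_name: str
    start_time: datetime
    tag: str = ""         # free text: anything the user wants


class RunStore:
    """Hypothetical in-memory run cache; an assumption, not the recordr design."""

    def __init__(self) -> None:
        self._runs: Dict[str, Run] = {}
        self._next_local_id = 1

    def record(self, script_name: str, tag: str = "",
               start_time: Optional[datetime] = None) -> Run:
        """Register a run; the tag is supplied when the run is recorded."""
        run = Run(
            local_id=self._next_local_id,
            run_id=str(uuid.uuid4()),
            script_name=script_name,
            start_time=start_time or datetime.now(),
            tag=tag,
        )
        self._next_local_id += 1
        self._runs[run.run_id] = run
        return run

    def list_runs(self, order_by: str = "start_time") -> List[Run]:
        """List runs sorted by any Run field; chronological by default."""
        return sorted(self._runs.values(), key=lambda r: getattr(r, order_by))

    def delete_run(self, run_id: str) -> None:
        """Delete one run; raises KeyError if the id is unknown."""
        del self._runs[run_id]

    def delete_runs(self, run_filter: Callable[[Run], bool]) -> int:
        """Delete every run matching the filter; return how many were removed."""
        doomed = [r.run_id for r in self._runs.values() if run_filter(r)]
        for run_id in doomed:
            del self._runs[run_id]
        return len(doomed)


# Example: record three runs, then purge anything older than 30 days.
now = datetime(2014, 10, 31, 12, 0)
store = RunStore()
store.record("someLongExpensiveRun.R", tag="illustrative tag text",
             start_time=now - timedelta(days=40))
store.record("analysis1.R", start_time=now - timedelta(days=1))
store.record("analysis2.R", start_time=now)

cutoff = now - timedelta(days=30)
removed = store.delete_runs(lambda r: r.start_time < cutoff)
for run in store.list_runs(order_by="start_time"):
    print(run.local_id, run.script_name, run.start_time, run.tag)
```

Expressing runFilter as a predicate is one possible choice; it lets the same method cover "older than a certain age" as well as filtering by tag or script name, at the cost of being less declarative than a fixed set of filter arguments.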