Software provenance for SR - QIICR/ProjectIssuesAndWiki GitHub Wiki

Objective

The information currently recorded using TID 1004 "Device Observer Identifying Attributes" is probably insufficient, and not structured well enough, to support reproducibility of computations. We should improve this based on our experience with Slicer and on the DBP requirements.

Specific tasks

  • come up with a list of missing items that we would like to capture
  • formalize by extending TID 1004 (?)

Status

  • TID 1004

TID 1004

  • what we should probably capture:
    • versions of the software and its dependency libraries (for Slicer we have both the trunk version and the version of the extension; do we need to capture both?)
    • platform: OS, architecture, compiler and compiler version (plus compiler flags?)
    • ...
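As a minimal sketch of the items above, assuming a Python environment, the record below keeps the core (trunk) version and the extension version as separate fields instead of merging them, and fills the platform fields from Python's standard `platform` module. The `ProvenanceRecord` class, `make_record` function and all field names are hypothetical. Note that `platform.python_compiler()` reports the compiler that built the Python interpreter, so it is only a stand-in for the compiler that actually built the application.

```python
import platform
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record; field names are illustrative only."""
    core_version: str                     # e.g. the Slicer trunk version
    extension_version: Optional[str]      # kept separate so both are captured
    dependencies: Dict[str, str] = field(default_factory=dict)
    os_name: str = ""
    os_release: str = ""
    architecture: str = ""
    compiler: str = ""
    compiler_flags: Optional[str] = None  # open question: record flags too?


def make_record(core_version: str,
                extension_version: Optional[str] = None,
                dependencies: Optional[Dict[str, str]] = None) -> ProvenanceRecord:
    """Fill the platform fields from the running Python interpreter."""
    return ProvenanceRecord(
        core_version=core_version,
        extension_version=extension_version,
        dependencies=dict(dependencies or {}),
        os_name=platform.system(),         # e.g. 'Linux', 'Windows', 'Darwin'
        os_release=platform.release(),
        architecture=platform.machine(),   # e.g. 'x86_64', 'AMD64', 'arm64'
        # Compiler that built *Python*: only a stand-in for the real build compiler
        compiler=platform.python_compiler(),
    )


if __name__ == "__main__":
    # Placeholder values, not real version strings
    record = make_record("core-placeholder", "extension-placeholder",
                         {"VTK": "vtk-placeholder"})
    print(asdict(record))
```

Keeping the two versions in separate fields leaves the "do we need both?" question open without losing information: a consumer can ignore one, but cannot recover a field that was never recorded.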

For 3D Slicer-based data provenance, we should capture:

  • Information about the Slicer source code
    • the Git repository and commit hash uniquely identify the Slicer source code
    • for factory builds downloaded from slicer.org, this also uniquely identifies the versions of all dependencies built by the superbuild script (VTK, ITK, Python, etc.)
    • the Qt version should be extracted from the library itself, since Qt is not built by the superbuild
    • for non-factory builds (custom builds at a site or by an individual user), the corresponding _DIR CMake variable (e.g. VTK_DIR) can be set to select a different build of a library, so we should extract the version information from the library actually used
  • For extensions
    • the repository and hash (or SVN revision) of the extension source code
    • we should define a format in which extensions can declare the versions of other dependencies they use (e.g. via a superbuild, Git submodules, or similar mechanisms)
  • For the build machine we should track
    • OS version, along with any available information about the patch level
    • processor architecture of the host machine and of the target build
    • memory and other hardware features of the machine
    • versions of the build tools used (CMake, Visual Studio, Xcode, GCC, etc.)
    • versions of any development libraries and tools used (mostly not applicable to Slicer releases, except Qt as noted above and the NSIS package that creates the installer on Windows)
    • user name (and real identity?) and environment variables for the account that triggered the build
    • build directory
    • date and time of the build
    • machine name (and IP address? MAC address? other identifiers?)
  • When a particular piece of data is generated
    • machine information (as above)
    • user information (as above)
    • OS environment information (as above)
    • versions of installed system libraries (msvcrt, libc, etc.)
    • reference to the experimental protocol under which the data was generated (if applicable)
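The Slicer source-code rule in the list above can be sketched as a small decision function. It assumes that, for a factory build, the repository URL plus commit hash already pin every superbuild dependency. Under that assumption, only libraries outside the superbuild (such as Qt) always need an explicitly recorded version. For non-factory builds, libraries redirected through a _DIR variable need one as well. The function name, the `SUPERBUILD_DEPS` set (a partial, illustrative list) and the input dictionaries are hypothetical, and the step of extracting versions from the loaded libraries is not shown.

```python
from typing import Dict, Set

# Hypothetical, partial list of dependencies built by the superbuild script
SUPERBUILD_DEPS: Set[str] = {"VTK", "ITK", "Python"}


def versions_to_record(factory_build: bool,
                       detected_versions: Dict[str, str],
                       overridden_dirs: Set[str]) -> Dict[str, str]:
    """Return the dependency versions that must be stored explicitly.

    detected_versions: versions extracted from the libraries actually used
    overridden_dirs:   libraries whose _DIR variable pointed at a custom build
    """
    explicit: Dict[str, str] = {}
    for name, version in detected_versions.items():
        if name not in SUPERBUILD_DEPS:
            # Not built by the superbuild (e.g. Qt): the source hash says nothing about it
            explicit[name] = version
        elif not factory_build and name in overridden_dirs:
            # Custom build selected via _DIR: the hash no longer implies this version
            explicit[name] = version
    return explicit
```

Everything not returned is assumed to be recoverable from the Slicer commit hash alone, which is exactly the assumption that breaks once a site overrides a _DIR variable.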
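For the extension item in the list above, one possible shape for a dependency declaration is a small JSON manifest shipped with the extension. The schema below (keys `extension`, `repository`, `revision`, `dependencies`, `mechanism`) is entirely hypothetical and is not an existing Slicer format. The example URLs are placeholders. The parser only checks that the extension and each of its dependencies name a repository and a pinned revision.

```python
import json
from typing import Any, Dict, List

# Hypothetical manifest: not an existing Slicer format; URLs are placeholders
EXAMPLE_MANIFEST = """
{
  "extension": "MyExtension",
  "repository": "https://example.org/MyExtension.git",
  "revision": "0123456789abcdef0123456789abcdef01234567",
  "dependencies": [
    {"name": "libfoo", "mechanism": "git-submodule",
     "repository": "https://example.org/libfoo.git", "revision": "v1.2.3"},
    {"name": "libbar", "mechanism": "superbuild",
     "repository": "https://example.org/libbar.git", "revision": "a1b2c3d"}
  ]
}
"""

REQUIRED_DEP_KEYS = ("name", "mechanism", "repository", "revision")


def parse_manifest(text: str) -> Dict[str, Any]:
    """Parse a manifest and verify that every dependency is pinned to a revision."""
    manifest = json.loads(text)
    for key in ("extension", "repository", "revision"):
        if not manifest.get(key):
            raise ValueError(f"manifest is missing '{key}'")
    deps: List[Dict[str, str]] = manifest.get("dependencies", [])
    for dep in deps:
        missing = [k for k in REQUIRED_DEP_KEYS if not dep.get(k)]
        if missing:
            raise ValueError(f"dependency {dep.get('name', '?')} lacks {missing}")
    return manifest
```

Recording the `mechanism` alongside each dependency keeps superbuild and submodule dependencies in one list while still saying how each was obtained.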
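A minimal sketch of collecting the build-machine items in the list above with the Python standard library, assuming the collector runs on the build host itself. Total memory is read through `os.sysconf`, which is POSIX-only, so it comes back as `None` on Windows. Tool versions are taken from the first line of `<tool> --version` output. The target architecture cannot be detected from the host and must be passed in from the build configuration. The function names and dictionary keys are hypothetical. Environment variables are copied whole here; a real collector would probably need to filter out secrets.

```python
import datetime
import getpass
import os
import platform
import shutil
import socket
import subprocess
from typing import Dict, Iterable, Optional


def tool_version(tool: str) -> Optional[str]:
    """Return the first line of `<tool> --version`, or None if unavailable."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        out = subprocess.run([path, "--version"], capture_output=True,
                             text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr instead of stdout
    lines = (out.stdout or out.stderr).strip().splitlines()
    return lines[0] if lines else None


def total_memory_bytes() -> Optional[int]:
    """Physical memory via sysconf (POSIX); None where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def current_user() -> Optional[str]:
    """Login name of the account running the build, if it can be determined."""
    try:
        return getpass.getuser()
    except (KeyError, OSError, ImportError):
        return None


def build_host_info(build_dir: str,
                    target_arch: Optional[str] = None,
                    tools: Iterable[str] = ("cmake", "gcc")) -> Dict[str, object]:
    """Hypothetical snapshot of the build machine."""
    return {
        "os": platform.platform(),             # OS name, release and version
        "host_arch": platform.machine(),
        "target_arch": target_arch,            # from the build configuration
        "cpu_count": os.cpu_count(),
        "memory_bytes": total_memory_bytes(),
        "tools": {t: tool_version(t) for t in tools},
        "user": current_user(),
        "environment": dict(os.environ),
        "build_dir": os.path.abspath(build_dir),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
    }
```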
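For the data-generation items in the list above, a sketch could combine a runtime snapshot with the C library version and an optional protocol reference, then serialize the result to JSON so it can travel with the derived data. `platform.libc_ver()` is aimed at glibc-style libraries and returns empty strings elsewhere (for example on Windows, where msvcrt would need a different mechanism), so those values are stored as `None`. The function name, the keys and the `protocol_ref` parameter are hypothetical. The user is read from the USER/USERNAME environment variables only as a simple approximation.

```python
import datetime
import json
import os
import platform
import socket
import sys
from typing import Optional


def data_generation_record(protocol_ref: Optional[str] = None) -> str:
    """Return a JSON provenance snapshot for one piece of generated data."""
    libc_name, libc_version = platform.libc_ver()  # ('', '') when not detected
    record = {
        "generated_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "machine": {
            "hostname": socket.gethostname(),
            "arch": platform.machine(),
            "os": platform.platform(),
        },
        # Approximation of user identity: POSIX uses USER, Windows USERNAME
        "user": os.environ.get("USER") or os.environ.get("USERNAME"),
        "environment": dict(os.environ),
        "python": sys.version,
        "system_libraries": {
            "libc": {"name": libc_name or None, "version": libc_version or None},
        },
        "protocol": protocol_ref,  # reference to the experimental protocol, if any
    }
    return json.dumps(record, indent=2, sort_keys=True)
```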

