Software provenance for SR - QIICR/ProjectIssuesAndWiki GitHub Wiki
The information currently recorded in TID 1004 "Device Observer Identifying Attributes" is probably insufficient, and not structured well enough, to support reproducible computation. We should improve on it based on Slicer experience and DBP requirements.
- come up with a list of missing items that we would like to capture
- formalize by extending TID 1004 (?)
- TID 1004
- what we should probably capture
  - version of the software and of its dependency libraries (for Slicer we have the trunk version plus the version of the extension; do we need to capture both?)
  - platform: OS, architecture, compiler, compiler version (plus compiler flags?)
  - ...
- Information about the Slicer source code
  - the git repository and commit hash uniquely identify the Slicer source code
  - For factory builds downloaded from slicer.org,
    - this uniquely identifies the versions of all dependencies that are built with the superbuild script (VTK, ITK, Python, etc.)
    - the version of Qt should be extracted from the library itself, since Qt is not built by the superbuild
  - For non-factory builds (custom at a site or per-user), a library's `_DIR` variable (for example `VTK_DIR`) can be set to select a different build of that library, so we should extract the version information from the selected build.
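As a rough illustration of the repository-plus-hash idea above, the following sketch queries git for the remote URL and the current commit of a source tree. The function names `source_identity` and `_git` are hypothetical, and the sketch assumes the remote is named `origin`; it is not how Slicer itself records this information.

```python
import subprocess


def _git(args, cwd):
    """Run a git command in cwd; return stripped stdout, or None on any failure."""
    try:
        out = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, cwd does not exist, or cwd is not a repository.
        return None
    return out.stdout.strip() or None


def source_identity(source_dir):
    """Return the repository URL and commit hash that identify a source tree."""
    return {
        # Assumption: the remote of interest is called "origin".
        "repository": _git(["config", "--get", "remote.origin.url"], source_dir),
        "commit": _git(["rev-parse", "HEAD"], source_dir),
    }


if __name__ == "__main__":
    print(source_identity("."))
```

Both values come back as `None` when the directory is not a git checkout, so a caller can tell "unknown" apart from a real hash.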
- For extensions
  - the repository and hash (or svn revision) of the extension source code
  - we should provide a format in which extensions can declare the versions of the other dependencies they use (e.g. those pulled in via superbuild and/or git submodules and the like)
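One possible shape for such a declaration format is a small JSON document. The sketch below is only a starting point for discussion: the class names, the field names (`name`, `repository`, `revision`, `dependencies`, `source`) and all values are hypothetical placeholders, not an existing Slicer convention.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Dependency:
    name: str
    version: str      # release tag, git hash, or svn revision
    source: str = ""  # how it is pulled in, e.g. "superbuild" or "git-submodule"


@dataclass
class ExtensionProvenance:
    name: str
    repository: str
    revision: str     # git hash or svn revision of the extension source
    dependencies: list = field(default_factory=list)

    def to_json(self):
        """Serialize the declaration, including nested dependencies, as JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values only; none of these refer to a real extension.
    ext = ExtensionProvenance(
        name="ExampleExtension",
        repository="https://example.org/ExampleExtension.git",
        revision="0000000",
        dependencies=[Dependency("ExampleLib", "1.2.3", "git-submodule")],
    )
    print(ext.to_json())
```

A flat list of name/version/source triples keeps the format easy to map onto a structured report later; whether that is expressive enough is an open question.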
- For the build machine we should track
  - version of the OS, along with any available information about the patch level
  - processor architecture of the host machine and of the build target
  - memory and other hardware features of the machine
  - versions of the build tools used (CMake, Visual Studio, Xcode, gcc, etc.)
  - versions of any development libraries used (not really applicable to Slicer releases, except for Qt as noted above and the NSIS package that creates the installer on Windows)
  - user name (and real identity?) and environment variables of the account that triggered the build
  - build directory
  - date and time of the build
  - machine name (and IP address? MAC address? other identifiers?)
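Several of the build-machine items above can be collected with the Python standard library alone. The sketch below is a minimal illustration under stated assumptions: `build_machine_info` and `tool_version` are hypothetical names, the tool list is only an example, and each tool is assumed to print its version on the first line of standard output when called with `--version`. It records the host architecture only (not the build target) and omits memory size, since the standard library has no portable call for it.

```python
import getpass
import os
import platform
import subprocess
from datetime import datetime, timezone


def tool_version(command):
    """Return the first line printed by `<command> --version`, or None if unavailable."""
    try:
        out = subprocess.run(
            [command, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.stdout.splitlines()
    return lines[0].strip() if lines else None


def build_machine_info(build_dir, tools=("cmake", "gcc")):
    """Collect a dictionary describing the machine and account running a build."""
    try:
        user = getpass.getuser()
    except Exception:
        # No user name could be determined for this account.
        user = None
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),  # often carries patch-level details
        "host_architecture": platform.machine(),
        "cpu_count": os.cpu_count(),
        "machine_name": platform.node(),
        "user": user,
        "build_directory": os.path.abspath(build_dir),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "tools": {name: tool_version(name) for name in tools},
    }


if __name__ == "__main__":
    for key, value in build_machine_info(".").items():
        print(f"{key}: {value}")
```

Tools that are not installed are reported as `None` rather than raising, so the same call works across Windows, macOS and Linux build hosts.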
- When a particular piece of data is generated
  - machine info (as above)
  - user info (as above)
  - OS environment info (as above)
  - versions of the system libraries installed (msvcrt, libc, etc.)
  - reference to the experiment protocol under which the data was generated (if applicable)
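To make the per-data items above concrete, here is a hedged sketch of a record that could be stored alongside each generated dataset. The function name `data_provenance`, the key names, and the default environment-variable allow-list are all assumptions made for illustration; `platform.libc_ver()` typically reports only glibc and returns empty strings elsewhere, so other system libraries (such as msvcrt) would need separate handling.

```python
import getpass
import json
import os
import platform
from datetime import datetime, timezone


def data_provenance(protocol_ref=None, env_keys=("PATH", "LD_LIBRARY_PATH")):
    """Build a JSON-serializable record describing where and how data was generated."""
    try:
        user = getpass.getuser()
    except Exception:
        user = None
    libc_name, libc_version = platform.libc_ver()
    libc = f"{libc_name} {libc_version}".strip() or None
    return {
        "generated_utc": datetime.now(timezone.utc).isoformat(),
        "machine": {
            "name": platform.node(),
            "os": platform.platform(),
            "architecture": platform.machine(),
        },
        "user": user,
        # Only an explicit allow-list is recorded; dumping the whole
        # environment could leak credentials into the provenance record.
        "environment": {k: os.environ[k] for k in env_keys if k in os.environ},
        "system_libraries": {"libc": libc},
        "protocol": protocol_ref,  # free-form reference, None if not applicable
    }


if __name__ == "__main__":
    # "PROTOCOL-PLACEHOLDER" is a made-up identifier.
    print(json.dumps(data_provenance("PROTOCOL-PLACEHOLDER"), indent=2))
```

Keeping the record to plain strings, dictionaries and `None` means it can be written as JSON now and mapped onto an extended TID 1004 later.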