Overview
The Study
In the Pre-Vent (Prematurity-Related Ventilatory Control) Study (NIH U01 Information), we are using a prospective observational cohort to investigate mechanisms of ventilatory control that contribute to instability of oxygenation and to the risk of morbidity and mortality in premature infants during and after their Neonatal Intensive Care Unit (NICU) stay. For a little more information about the study, take a look at our methods manuscript in Pediatric Research here.
What is the code in this repository for?
As part of this study, five hospitals are collecting continuous monitoring data from their bedside monitors, and we at the Leadership and Data Coordinating Center (LDCC) are in charge of running algorithms on all of that data. To eliminate the need for large data transfers between sites, we send the sites software that runs the analytics on their data locally. The sites then send summarized results files back to the LDCC for analysis. The data pipeline described on this wiki is the software the LDCC developed to achieve this.
What kind of data is this software compatible with?
The data at different sites is collected using different monitors and different data collection systems (e.g., Philips monitors with BedMaster data collection software). Some sites have used more than one system during the study. To learn which file types are compatible with the Universal File Converter (UFC), check out Ryan Bobko's GitHub readme here; the UFC converts this slew of file formats into a single HDF5 format. The rest of the data pipeline is compatible not only with the HDF5 files generated by the UFC, but also with a special .dat format from the SpaceLabs monitors at Case Western and a vitals.mat file format developed by Doug Lake.
What is this pipeline? Why so many pieces?
As shown on the Home page, this data processing pipeline is made up of a few different pieces. In short, this pipeline helps us get from raw, large, continuously collected vital sign monitoring data at all the sites to nice, neat, small summary files of clinically-relevant events that can easily be shipped to the LDCC for analysis. We have broken up the pieces of the pipeline because some pieces will need to be changed out and updated over time (like the algorithms), but we don't want to have to re-run the file conversion (the slowest piece) every time we want to add a new algorithm!
Step 1: File Conversion with the Universal File Converter (UFC)
In short, this puts data from all the sites into a single format. It is shared with the sites as a Java wrapper around the command-line-executable code written by Ryan Bobko. It was written in Java so that sites without admin privileges on the servers holding the big raw data files could still do the file conversion (Java isn't too contentious a thing to install on most computers). The UFC provides a simple graphical user interface (GUI) that lets non-technical users hit run on these conversion jobs without having to deal with the command line, and large batches of files can be run at once. Multiple instances of the UFC can be run to take advantage of multiple cores (parallelism is purposely not built in, so the conversion doesn't overwhelm the converting computer if someone needs to do other tasks on it while this long process runs in the background). It is HIGHLY recommended that the HDF5 files be stored as one file per day so that data processing and viewing doesn't run into memory allocation problems. Splitting by day is also advantageous because if a corrupt segment of a file causes any of the algorithms to fail, the data loss is limited to a single 24-hour period.
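If you want to sanity-check one of the converted files before running any algorithms on it, Matlab's built-in HDF5 functions work on the UFC output. The sketch below is only illustrative: the filename is a placeholder, and the dataset paths inside your files will depend on the site and monitor, so use h5disp to see the actual layout.

```matlab
% Illustrative sketch only -- the filename below is a placeholder for one
% day's UFC output, and the dataset paths vary by site and monitor.
fname = 'infant0001_2019-01-01.hdf5';

% Print the group/dataset tree so you can see which signals were converted.
h5disp(fname);

% Programmatically list the top-level groups in the file.
info = h5info(fname);
disp({info.Groups.Name});

% Once you know a dataset's path from h5disp, read it with h5read, e.g.:
% hr = h5read(fname, '/path/to/heart/rate/dataset');
```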
Step 2: Run Algorithms on the Data with the Batch Algorithm Processor (BAP)
The BAP can take in the HDF5 files output by the UFC, the .dat files from CWRU, and the vitals.mat files from Doug Lake. It runs a set of algorithms on the data and returns a results file as well as a log file. Clinical events are stored as "tagged" events, which give you the time of each event while also allowing some information about the event to be stored alongside it (such as the nadir of a desat). Some continuous results, such as the continuous apnea probability, are also stored in the results file, as are the QRS detection results.
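As a rough sketch of what working with one of these results files looks like in Matlab (the filename here is a placeholder, and the exact variables inside the .mat file depend on which algorithms were run):

```matlab
% Illustrative only: load one day's results file and see what it contains.
% The filename is a placeholder; the variables stored inside the .mat file
% (tagged events, continuous outputs such as apnea probability, QRS
% detections, etc.) depend on the algorithms that were run.
r = load('infant0001_2019-01-01_results.mat');

% List the variables stored in the results file.
disp(fieldnames(r));
```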
The BAP is a compiled Matlab script. It is written in Matlab because some of the core algorithms of key importance for the PreVent study (namely apnea and periodic breathing) were originally developed in Matlab; it was also the language most of the PreVent team was most comfortable with, and the one most of the legacy code was written in.
Step 3: Merge Results by Infant
All of the processing so far for PreVent has split the data by day. This is great for processing, but it makes the results hard to manage for summary statistics, and it is generally just clunky and unnecessary once the files are as small as they are. We want to merge the results files together so that every infant has a single file for all of their result data. This is what Merge Tags does.
Merge Tags grabs the tags from all the selected results.mat files (you can select whichever files you want using the GUI - I recommend merging all the results files so there is one merged results file per infant) and appends them all into a single struct. The struct is organized so there is a single matrix for each algorithm, so you no longer have to loop through every single results file for an infant to get all of their events in one place. You can think of this struct as the Matlab equivalent of an Excel workbook that contains a bunch of sheets: one sheet (in Matlab, a matrix) contains all the bradycardia events, another sheet (again, a matrix) contains all the desat events, and so on.
This merged file is a .mat file and bears the name of the first file in the folder, with resultsmerged.mat appended. The continuous data and QRS detection results are not included in the resultsmerged file.
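A minimal sketch of the merging idea follows; this is not the actual Merge Tags code, and the layout it assumes (each per-day results file carrying a `tags` struct with one matrix per algorithm) is an illustration rather than the real file format.

```matlab
% Minimal sketch of the idea behind Merge Tags (NOT the actual implementation).
% Assumes each per-day results file stores its tagged events in a struct named
% 'tags' with one matrix per algorithm -- that layout is an assumption here.
files = {'day1_results.mat', 'day2_results.mat'};    % placeholder filenames

merged = struct();
for k = 1:numel(files)
    r = load(files{k});
    algs = fieldnames(r.tags);
    for a = 1:numel(algs)
        if ~isfield(merged, algs{a})
            merged.(algs{a}) = [];                    % start an empty "sheet"
        end
        % Stack this day's events under the matching algorithm matrix.
        merged.(algs{a}) = [merged.(algs{a}); r.tags.(algs{a})];
    end
end

% merged now has one matrix per algorithm, e.g. all bradycardia events in one
% place and all desat events in another.
```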
Merge Tags is the simplest GUI of all time - it contains two buttons. You can learn all about how to use those two buttons here.
Step 4: Look at the Data and Algorithm Results in the HDF5 Viewer
The viewer is a compiled Matlab script that allows you to look at the data in your HDF5 file alongside the results of the algorithms that were run on it. You can zoom, scroll, jump to tagged events, and create custom tags with an interactive interface. For a video tutorial on how all the features of this viewer work, check out the videos page here for a tour of the software! The viewer contains all of the algorithm code that the BAP does, so you can run algorithms directly from the viewer if you would like to.