Add an Algorithm - UVA-CAMA/NICUHDF5Viewer GitHub Wiki
So you want to add an algorithm - great! Let's see if I can help make that process as easy as possible. This tutorial will walk you through getting your own copy of the code, adding an algorithm, testing the algorithm, and getting it distributed to all the sites.
Getting your own copy of the code
You'll want to start by forking the GitHub repositories for NICUHDF5Viewer and for BatchAlgorithmProcessor to your own GitHub account and then cloning them to your local machine to work on them. To get an idea of how this works, check out this page about Git Fork and Git Clone. Make sure you clone each of the repositories into its own folder within the same directory (since the Batch Algorithm Processor code uses scripts within the NICUHDF5Viewer repository). The directory structure should look like this:
- Your folder of choice
- NICUHDF5Viewer
- Apnea
- ApneaNew
- ...
- BatchAlgorithmProcessor
- .gitignore
- BAP_Logo.png
- ...
- NICUHDF5Viewer
Before diving in - test to make sure everything works
Try running the viewer from MATLAB (note: I use release 2019a currently for my development) by running the script HDF5Viewer_v1_0_1.m. This should work exactly the same as the compiled version of the viewer. Also, try running the Batch Algorithm Processor by running BatchAlgorithmProcessor.m. Note that if you want code that matches a particular release of the Viewer or BAP, you can ask git to check out the code as it was at that release (see Viewer release descriptions and BAP release descriptions).
Make sure you understand which version of the algorithms you have
Important note about algorithm versions: since the BAP pulls algorithm code from the viewer, pulling the most up-to-date version of the BAP will not necessarily pull the most recent algorithms. The releases of the BAP require very few (if any) changes to the BAP code itself, since it is mostly just a shell that calls code that lives in the viewer repository. When I put out a new release of a compiled Batch Algorithm Processor, most changes are in the code in the HDF5 Viewer repository.
To see the current version of the algorithms you have downloaded, look at the script algmask.m within the NICUHDF5Viewer repository. This has a list of all possible algorithms, along with their version numbers. Not all algorithms in this list are necessarily called - you need to look at the matrix at the bottom of the algmask script to see which of the algorithms are included.
Add your new algorithm's name in algmask
algmask.m is called by both the Viewer and the BAP to identify which algorithms to run.
First, add a new row to the BOTTOM of the fullalgorithm list. In the first column, enter your algorithm display name (this is what is visible to the user in the Batch Algorithm Processor algorithm selection menu). This algorithm name can contain spaces. Then, add your algorithm result name to the second column (this is what is stored in the results file). Next, add a version number to the third column (hint: start with 1! - these should be integers). Finally, put a comment with the algorithm number (just add one to the previous algorithm number in the list). Here is an example algorithm listing:
'Hourly HR Mean', '/Results/HourlyHRMean', 1;...% 52
Then, you will need to add the algorithm number (the commented number above) to the matrix at the bottom of algmask.m. This tells the Viewer/BAP to actually call the algorithm. Any algorithms whose numbers are not in the list will not be displayed in the Batch Algorithm Processor or run by the Viewer's Run All Tagging Algorithms button.
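To make the two steps above concrete, here is a rough sketch of how a list plus a selection matrix can work together. This is not the actual contents of algmask.m: the variable names fullalgorithmlist and activealgs, the two placeholder algorithms, and the small algorithm numbers are all made up for illustration, so check the real script for the names it uses.

```matlab
% Hypothetical miniature of the algorithm list (not the real algmask.m).
% Columns: display name, result name, version. The trailing comment is
% the algorithm number.
fullalgorithmlist = {...
    'Example Algorithm A', '/Results/ExampleA',     1;... % 1
    'Example Algorithm B', '/Results/ExampleB',     2;... % 2
    'Hourly HR Mean',      '/Results/HourlyHRMean', 1;... % 3
    };

% Hypothetical selection matrix: only the algorithm numbers listed here
% would be offered in the BAP menu and run by the viewer.
activealgs = [1 3];

% Pull out the rows for the selected algorithms
algdispname = fullalgorithmlist(activealgs, 1);
resultname  = fullalgorithmlist(activealgs, 2);
algversion  = cell2mat(fullalgorithmlist(activealgs, 3));
```

In this sketch, Example Algorithm B stays in the list (so its number never changes) but is simply not selected, which is why new rows go at the bottom instead of being inserted in the middle.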
Add your new algorithm to run_all_tagging_algs.m
run_all_tagging_algs.m is also called by both the Viewer and the BAP.
Within the runalg subfunction of run_all_tagging_algs.m, add a new case with the algorithm number (this is the commented number we mentioned above). This is where we call your algorithm. Provide a short one-line comment explaining what your algorithm does. Then call your algorithm below that comment.
Inputs: The algorithm should take as an input "info" at minimum. Outputs: The outputs of your algorithm should be [result,t_temp,tag,tagcol].
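As a rough sketch of what such a case might look like: the switch variable name algnum, the function name hourlyhrmean, and the stand-in anonymous function below are assumptions for illustration, not code copied from run_all_tagging_algs.m. Only the case number and the four outputs follow the description above.

```matlab
% Hypothetical stand-in for your algorithm so this sketch runs on its own.
% In the real repository this would be a function in its own m file.
hourlyhrmean = @(info) deal([], [], [0 1], {'Start'; 'Stop'});

info   = struct();  % stand-in for the info structure passed to every algorithm
algnum = 52;        % the algorithm number from the comment in algmask.m

switch algnum
    case 52
        % Hourly mean of the heart rate
        [result, t_temp, tag, tagcol] = hourlyhrmean(info);
    otherwise
        % Unknown algorithm number: return all empties
        result = []; t_temp = []; tag = []; tagcol = [];
end
```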
Actually building your algorithm
Now that you know how your algorithm will be called, we can work on building what you want to put in it!
Create an m file containing a function with your desired algorithm name, and give it the same outputs and inputs you did in the call to the algorithm above within run_all_tagging_algs.m. Save this function's m file in the NICUHDF5Viewer repository.
Here is an example first line of your m file:
function [result,t_temp,tag,tagcol] = yourfunctionname(info)
Grab some standard data
Now we want to get some data. If you want to grab some standard data, like HR, SPO2%, pulse rate, the chest impedance waveform, or the ECG data, you can use the getfiledata function to grab the data of interest. To understand how this function searches for the signal of interest, check out the Variable Names page on the wiki. The key strings to use are 'HR', 'SPO2_pct', 'Pulse', 'Resp', 'ECGI', 'ECGII', and 'ECGIII'. Then, call formatdata to put the data in the standard format we use for our processing. Here is an example of grabbing the data using getfiledata and formatting it using formatdata:
[data,~,info] = getfiledata(info,'SPO2_pct');
[data,~,~] = formatdata(data,info,3,1);
Handle the situation where a file doesn't have the data you want:
Of course, not every file running through the processor will have every signal of interest (bummer!), so we need to account for those scenarios and make sure the function returns cleanly instead of throwing an error. This is how I handle it:
function [result,t_temp,tag,tagcol] = yourfunctionname(info)
% Initialize output variables in case the necessary data isn't available
result = [];
t_temp = [];
tag = [];
tagcol = [];
% Load in the spo2% signal
[data,~,info] = getfiledata(info,'SPO2_pct');
[data,~,~] = formatdata(data,info,3,1);
% If the necessary data isn't available, return empty matrices & exit
if isempty(data)
return
end
Get the actual data out of the data variable
This part is easy - Finally! This is how you would get the actual data out of the data structure so you can do something with it. Now we are cooking!
spo2data = data.x;
t_temp = data.t;
fs = data.fs;
Add in your algorithm
You've made it so far! Finally we are at the fun part. Now that you have data, you can design your algorithm. In designing, you might want to consider a few things:
Think about how you want your results returned
When you are designing your algorithm for processing, keep in mind how you would like the output returned. Remember that the results file has two main structures which can be used to return data - Tags and Continuous Data. For more information on how those work, check out the Results File Structure. If you can make one of those structures work for your data, that would be ideal. (Note: the results file also returns QRS Detection Information as well as Information About the Original File, but it is unlikely you will want to store your algorithm results in there.) If neither of these works for you, we could add a new structure within the results file, but it is preferable to use the continuous or tags structure because you will be able to view your results in the viewer.
Tagged Results
Tags are the primary way we store event data. This is the way bradycardia and desaturation events are stored. Their storage is lightweight (i.e. we don't have to store any continuous data). Each tag stores the start and stop time of an event, and any other relevant information about that event can be stored within the tag structure! Genius! (Thank Doug). For a desaturation event, this could include the nadir of the event. (Note: if you store a nadir under the tagcolumn name 'Extrema', the viewer will display that info alongside the event. Similarly, if you store an event duration under the name 'Duration' (in milliseconds), the viewer will show the event duration in seconds.) The viewer will list all tagged events in the Tagged Events tab. The viewer uses a "click-to-jump" feature: if you click on a tag in the tag listbox, the viewer will jump to that event for whichever signal you have displayed. If possible, it is really nice to include some sort of tag with your algorithm so you can jump around the file and see important events your algorithm has detected; however, it isn't essential that your algorithm returns a tag.
IMPORTANT SIDEBAR: Right now, the way I have run_all_tagging_algs.m set up, the script REQUIRES that tagcol be returned with something in it in order to store any algorithm results. This isn't ideal, I know, but I haven't had a chance to change it yet (it is a bit more complicated than changing the if statement that looks for tagcol, but that is another story). If you really don't have any meaningful tag information to store with an algorithm, I would encourage you to store at least a single tag with the start and stop time of the data on which the analysis was run - that way you can quickly check how much data you were able to successfully run your algorithm on. At the bare minimum, just put something in the tagcol array but don't use it - for an example of this, see the pullHRdata.m function (which isn't used in our pipeline). If you really don't want to do that for some reason, let me (Amanda) know and I will alter the pipeline to handle a continuous-results-only algorithm.
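Here is a minimal sketch of the "single tag covering the analyzed data" idea. It assumes that tag is a numeric matrix with one row per event and that tagcol is a cell array with one name per column of tag; please check threshtags2.m for the exact layout the pipeline expects. The timestamps are made up.

```matlab
% Made-up timestamps (UTC ms) for the data the algorithm was run on
t_temp = 1.6e12 + (0:9)' * 1000;

% One tag whose Start and Stop mark the span of analyzed data
% (assumed layout: one row per tag, one column per tagcol entry)
tagcol = {'Start'; 'Stop'};
tag    = [t_temp(1), t_temp(end)];
```

Because tagcol is not empty here, the results would be stored, and the tag tells you at a glance how much of the file was analyzed.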
The function I most commonly use to return tags is: threshtags2.m. This is a good function to look at to figure out how to create tags that are consistent with other formatting. Remember - you can store WHATEVER information about the event that you want in this tag! You are not limited to the data fields that are used in threshtags2.m.
The viewer will display a tag-only result as a binary continuous variable (e.g. if you click on a desaturation result in the viewer, the viewer will display a 0 when there is no desaturation and a 1 when a desaturation event is occurring, even though there is no continuous result data associated with the tag).
Tag data is returned from the algorithm in the tag and tagcol variables, which correspond to the result_tags and result_tagcolumns variables in the results files. The result_tagtitle is taken from the resultname list in the algmask.m file.
Special tagcolumns
Certain tagcolumns have "special" powers. Keep this in mind when choosing your tag column names:
- Start - all tags need this - it should be a time stamp in UTC ms
- Stop - all tags need this - it should be a time stamp in UTC ms
- Duration - This should be in ms. This will be converted to seconds and displayed in the tag listbox in the Tagged Events Tab below Dur(s).
- Extrema - This will be displayed in the tag listbox in the tagged events tab below Ext
- Minimum - This will be displayed in the tag listbox in the tagged events tab below Ext
- Value - If there is a Value field, the tags will not be plotted simply as a binary output, but rather as the value in this field (unless there is also a continuous results field). Note that the viewer defaults to a y axis limit of 0 to 1 for results tags, so if you have a value that is outside that window, create a case for that in the viewer in HDF5Viewer_v1_0_1.m under the function customplotcolors.
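To show how the special column names above could fit together, here is a small sketch that builds tags for runs of samples below a threshold. It is not threshtags2.m and does not reproduce its logic. The synthetic signal, the threshold of 80, and the assumed layout (tag as a numeric matrix with one row per event, tagcol as a cell array of column names) are all assumptions for illustration.

```matlab
% Synthetic SpO2 samples (percent) at 1 Hz with made-up UTC ms timestamps
spo2data  = [95 94 78 76 79 93 96 75 74 97]';
t_temp    = 1.6e12 + (0:9)' * 1000;
threshold = 80;

% Find the first and last sample index of each run below the threshold
below    = spo2data < threshold;
d        = diff([0; below; 0]);
startidx = find(d == 1);
stopidx  = find(d == -1) - 1;

% Build one row per event using the special column names
tagcol = {'Start'; 'Stop'; 'Duration'; 'Minimum'};
tag    = zeros(numel(startidx), numel(tagcol));
for k = 1:numel(startidx)
    seg      = spo2data(startidx(k):stopidx(k));
    tstart   = t_temp(startidx(k));        % UTC ms
    tstop    = t_temp(stopidx(k));         % UTC ms
    tag(k,:) = [tstart, tstop, tstop - tstart, min(seg)];
end
```

With this synthetic signal, two events are found. Start and Stop are in UTC ms, and Duration is in ms as described in the list above.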
Continuous Results
Continuous results are another way to store data in the results file. This is used for things like the running apnea probability. The apnea algorithm contains both a continuous result (continuous apnea probability at any given time) as well as tagged events when the apnea probability is above 0.6 for a certain amount of time (and a few other circumstances are also met).
Continuous results are returned to run_all_tagging_algs within the result variable. Just make sure result and t_temp have the same number of points.
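As a sketch of a continuous result, the snippet below computes one mean per hour from a synthetic heart rate signal and builds a matching time vector. This is only an illustration of keeping result and t_temp the same length; it is not the implementation of any algorithm in the repository, and the data are made up.

```matlab
% Synthetic heart rate: one sample every 30 minutes, timestamps in UTC ms
hr = [120 130 140 150 160 170]';
t  = 1.6e12 + (0:5)' * 1.8e6;

% Assign each sample to an hour bin counted from the first timestamp
msperhour = 3.6e6;
houridx   = floor((t - t(1)) / msperhour) + 1;

% One mean per hour, and one timestamp (the start of the hour) per mean
result = accumarray(houridx, hr, [], @mean);
t_temp = t(1) + (0:max(houridx)-1)' * msperhour;

% Sanity check: the two outputs must have the same number of points
assert(numel(result) == numel(t_temp));
```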
Test your algorithm, handle errors, and intelligently return tagcol depending on the success of your algorithm
Please test your algorithm as thoroughly as possible on your own data files. Please create try/catch statements to handle any errors and return ALL empty variables if the algorithm is unable to run for any reason - that way we know the algorithm was unable to be run on the data. If the algorithm is able to be run successfully, but no events are found, please still return tagcol (see "IMPORTANT SIDEBAR" above). If tagcol is returned, we will assume that the algorithm was able to run successfully. If tagcol is empty, no results will be stored.
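One way to follow the guidance above is sketched below, written as a script so it runs on its own. The failure is simulated by indexing into an empty signal. In your real function the body of the try block would be your algorithm, and the outputs would be the function's return values.

```matlab
% Start with empty outputs so an early exit returns "nothing stored"
result = []; t_temp = []; tag = []; tagcol = [];

x = [];  % simulate a signal that turned out to have no samples

try
    % Stand-in for the real algorithm: this line throws because x is empty
    firstsample = x(1);

    % Only reached if the algorithm ran successfully
    result = firstsample;
    t_temp = 0;
    tagcol = {'Start'; 'Stop'};
    tag    = [0 0];
catch err
    % Report the problem and return ALL empty outputs so that nothing
    % is stored for this file
    fprintf('Algorithm could not be run: %s\n', err.message);
    result = []; t_temp = []; tag = []; tagcol = [];
end
```

Resetting every output in the catch block matters because some outputs may already have been filled in before the error occurred.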
Submit a pull request to get your algorithm incorporated into everyone's pipeline
Once you have your algorithm all perfect and shiny, submit a pull request so I can incorporate your algorithm in the pipeline for everyone. I can do some testing of your algorithm on sample files I have from the sites at the LDCC. If it all looks good, I can package up the viewer and BAP for a new release to all the LDCC sites.
Tell everyone about your algorithm
You can contribute to this wiki, too! Add information about your algorithm to the algorithm table, and, if your algorithm is complex, create a whole page about it on the wiki!