Diagnostic properties

1. Use cases: in which situations should I use this method?

2. Input: what kind of data does the method require?

  1. A gold standard and test results

3. Algorithm: how does the method work?

Model mechanics

  • Traditionally, diagnostic properties are measured through a two-by-two table crossing categorical test results with the gold standard (having or not having the condition). These tables -- also known as confusion matrices -- contain the frequencies of true and false positives and negatives, from which sensitivity, specificity, and several other diagnostic metrics can be calculated (a worked example follows this list).

  • Sensitivity and specificity are test properties that are unaffected by the disease's prevalence [1].

    • The sensitivity of a diagnostic test is the percentage of patients who have the condition that the test correctly identifies as positive. For example, consider that we are creating a new sleep scale that combines self-reported and wearable data, and one of our goals is to evaluate its sensitivity to change. How is sensitivity to change different from validity, and would the inclusion of wearable data improve or worsen it? Sensitivity to change asks whether the scores of a scale can detect changes in the construct over time. If depression levels increase but the scores remain more or less the same, the scale is useless, since it is impossible to detect whether a specific treatment effectively reduces depression. With that in mind, sensitivity to change and validity are similar, but with time taken into account; sensitivity to change is validity over time. The inclusion of wearable data should, at least in theory, improve sensitivity to change, since wearable data -- for example, from a Fitbit -- should be more "objective" than self-reported data, tracking actual change over time more closely.
    • The specificity of a test is the percentage of patients who do not have the condition that the test correctly identifies as negative.
    • Sensitivity to change is assessed by checking whether a change in the construct of interest actually corresponds to a change in the score of the scale being developed. The assessment is usually done through simple models comparing the two metrics in terms of magnitude, direction, and shape of the association over time, i.e., longitudinally. There is no way, however, to categorically say that a scale is sensitive to change, since this property will vary depending on the population and on the construct metric it is being compared against.
  • Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUROC) can be used to compare the ability of two continuous variables to diagnose an outcome [1]. A ROC curve plots sensitivity against 1 - specificity across all possible cut-off values. For a perfect test, both sensitivity and specificity equal 1: its ROC curve begins at (0,0), travels vertically up the y-axis to (0,1), and then horizontally across to (1,1). A good test comes as near to this ideal as possible. The AUROC quantifies the performance of a diagnostic variable: a perfect test has an AUROC of 1, while a random guess has an AUROC of 0.5. The AUROC can be computed as a sum of trapezium areas (the trapezoidal rule), or with a statistical package that evaluates cut-off values spanning the entire range of the data (see the ROC sketch after this list).

  • Diagnostic tests with continuous results can also be evaluated with agreement measures such as the intra-class correlation coefficient (ICC) (see the ICC sketch after this list).

  • There are a number of possible ways to model diagnostic data; the choice depends on the type of diagnostic response (continuous, ordinal, dichotomous) and on whether a gold standard, an imperfect standard, or no standard at all is available:

  • Bayesian multilevel models are used when only imperfect gold standards are available, and they can accommodate any type of diagnostic test response [2]. For the reasons outlined in our description of multilevel models, they are also a good choice when a gold standard is present.

  • Latent class models are used when no standard, gold or imperfect, is available [3] (see the latent class sketch after this list).
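A minimal sketch, in R, of how sensitivity, specificity, and the predictive values fall out of a two-by-two confusion matrix; all cell counts are made up for illustration.

```r
# Hypothetical 2x2 confusion matrix counts.
tp <- 90  # true positives:  test positive, condition present
fn <- 10  # false negatives: test negative, condition present
fp <- 20  # false positives: test positive, condition absent
tn <- 80  # true negatives:  test negative, condition absent

sensitivity <- tp / (tp + fn)  # P(test+ | condition present)
specificity <- tn / (tn + fp)  # P(test- | condition absent)
ppv <- tp / (tp + fp)          # positive predictive value (prevalence-dependent)
npv <- tn / (tn + fn)          # negative predictive value (prevalence-dependent)

round(c(sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, npv = npv), 3)
```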
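A minimal ROC/AUROC sketch using the pROC package; the data frame, its column names, and the simulated biomarker-disease relationship are all hypothetical.

```r
# Simulated data: a continuous biomarker that runs higher in diseased patients.
library(pROC)

set.seed(1)
df <- data.frame(disease = rbinom(200, 1, 0.3), biomarker = rnorm(200))
df$biomarker <- df$biomarker + df$disease

roc_fit <- roc(response = df$disease, predictor = df$biomarker)
auc(roc_fit)             # AUROC: 1 = perfect test, 0.5 = random guess
plot(roc_fit)            # ROC curve across all cut-off values
coords(roc_fit, "best")  # cut-off maximizing Youden's index (pROC's default criterion)
```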
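A minimal ICC sketch using the irr package, assuming two continuous measurements of the same patients; the method names and values are hypothetical.

```r
# Agreement between two continuous measurements via the intra-class correlation.
library(irr)

ratings <- data.frame(
  method_a = c(10.1, 12.4, 9.8, 15.2, 11.0, 13.7),
  method_b = c(10.4, 12.1, 10.0, 14.8, 11.3, 13.5)
)
icc(ratings, model = "twoway", type = "agreement", unit = "single")
```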
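A minimal latent class sketch using the poLCA package, for three dichotomous tests with no reference standard; the true disease status, sensitivities, and specificities below are simulated for illustration.

```r
# Latent class analysis of three tests when no reference standard exists.
library(poLCA)

set.seed(1)
true_status <- rbinom(300, 1, 0.3)  # unobserved disease status
sim_test <- function(status, sens, spec) {
  # Simulate a dichotomous test; poLCA expects categories coded 1, 2, ...
  ifelse(status == 1, rbinom(length(status), 1, sens),
                      rbinom(length(status), 1, 1 - spec)) + 1
}
dat <- data.frame(
  test1 = sim_test(true_status, 0.90, 0.85),
  test2 = sim_test(true_status, 0.80, 0.90),
  test3 = sim_test(true_status, 0.75, 0.95)
)

# Two latent classes play the role of 'diseased' and 'non-diseased':
fit <- poLCA(cbind(test1, test2, test3) ~ 1, data = dat, nclass = 2)
fit$probs  # class-conditional response probabilities ~ sensitivity/specificity
```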

Reporting guidelines include:

  • STARD-BLCM: Standards for the Reporting of Diagnostic accuracy studies that use Bayesian Latent Class Models [4];
  • STARD 2015: An Updated List of Essential Items for Reporting Diagnostic Accuracy Studies [5];
  • PCORI Standards for Studies of Medical Tests;
  • Adherence to the Standards for Reporting of Diagnostic Accuracy (STARD) 2015 Guidelines in Acute Point-of-Care Ultrasound Research [6].

Data science packages

Suggested companion methods

  • Machine learning models are often used as diagnostic tests: classification models generate confusion matrices, and regression models generate continuous predictions that can be compared to the observed continuous values. Machine learning models can also combine multiple tests, making it possible to verify how the diagnostic properties of the combination differ from those of the individual tests (see the sketch below).
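As a sketch of the first point above, the caret package can summarize a classifier's predictions as a diagnostic two-by-two table; the simulated truth and predictions are made up for illustration.

```r
# A classifier's output treated as a diagnostic test result.
library(caret)

set.seed(1)
truth <- factor(sample(c("disease", "healthy"), 100, replace = TRUE))
predicted <- factor(ifelse(runif(100) < 0.8, as.character(truth),
                           sample(c("disease", "healthy"), 100, replace = TRUE)))

# Sensitivity, specificity, PPV, NPV, and more from the confusion matrix:
confusionMatrix(data = predicted, reference = truth, positive = "disease")
```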

Learning materials

  1. Books
    • Statistical Methods in Diagnostic Medicine [7].
    • Advanced Bayesian Methods for Medical Test Accuracy [8].
  2. Articles combining theory and scripts
    • A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard [9].

4. Output: how do I interpret this method's results?

Mock conclusions, or the most frequent format for conclusions reached at the end of a typical analysis:

  • Diagnostic properties for this sample are [LIST OF DIAGNOSTIC PROPERTIES].

Tables, plots, and their interpretation

5. SporeData-specific

Templates

Data science functions

  • sdatools::comorbidity_score

    A comorbidity score is derived by extracting ICD-9 or ICD-10 codes for the Charlson, Elixhauser, and several other comorbidities, such as hypertension and diabetes, and then weighting them with a regression model. For example, when evaluating the risk of mortality, the beta coefficients become weights for each of the comorbidities, indicating how much each contributes to the risk of that event (death or a complication). Those weights then become part of the score: each comorbidity enters with the weight determined by the regression model, and the resulting score is normalized to a 1-10 range (see the sketch below).
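A minimal sketch of this weighting idea, not the actual sdatools::comorbidity_score implementation; the comorbidity indicators, outcome, and coefficients below are all simulated.

```r
# Derive comorbidity weights from a logistic regression and build a 1-10 score.
set.seed(1)
dat <- data.frame(
  hypertension = rbinom(500, 1, 0.4),
  diabetes     = rbinom(500, 1, 0.2),
  chf          = rbinom(500, 1, 0.1)
)
dat$death <- rbinom(500, 1, plogis(-2 + 0.5 * dat$hypertension +
                                     0.8 * dat$diabetes + 1.2 * dat$chf))

# Beta coefficients from the regression model become the weights:
fit <- glm(death ~ hypertension + diabetes + chf, data = dat, family = binomial)
weights <- coef(fit)[-1]  # drop the intercept

# Weighted sum per patient, rescaled to a 1-10 score:
raw   <- as.matrix(dat[, names(weights)]) %*% weights
score <- as.numeric(1 + 9 * (raw - min(raw)) / (max(raw) - min(raw)))
summary(score)
```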

References

[1] Bewick V, Cheek L, Ball J. Statistics review 13: receiver operating characteristic curves. Critical Care. 2004;8(6):1-5.

[2] Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001 Mar;57(1):158-67.

[3] van Smeden M, Naaktgeboren CA, Reitsma JB, Moons KG, de Groot JA. Latent class models in diagnostic studies when there is no reference standard — A systematic review. American Journal of Epidemiology. 2014 Feb 15;179(4):423-31.

[4] Kostoulas P, Nielsen SS, Branscum AJ, Johnson WO, Dendukuri N, Dhand NK, Toft N, Gardner IA. STARD-BLCM: Standards for the Reporting of Diagnostic accuracy studies that use Bayesian Latent Class Models. Preventive Veterinary Medicine. 2017 Mar 1;138:37-47.

[5] Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, Lijmer JG, Moher D, Rennie D, De Vet HC, Kressel HY. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Clinical Chemistry. 2015 Dec 1;61(12):1446-52.

[6] Prager R, Bowdridge J, Kareemi H, Wright C, McGrath TA, McInnes MD. [Adherence to the Standards for Reporting of Diagnostic Accuracy (STARD) 2015 Guidelines in Acute Point-of-Care Ultrasound Research](https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2765282). JAMA Network Open. 2020 May 1;3(5):e203871.

[7] Zhou XH, McClish DK, Obuchowski NA. Statistical methods in diagnostic medicine. John Wiley & Sons; 2009 Sep 25.

[8] Broemeling LD. Advanced Bayesian methods for medical test accuracy. CRC Press; 2016 Apr 19.

[9] Reitsma JB, Rutjes AW, Khan KS, Coomarasamy A, Bossuyt PM. A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. Journal of Clinical Epidemiology. 2009 Aug 1;62(8):797-806.
