08. Computerized adaptive testing - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

  1. PROM measurement with a smaller number of questions (items) than traditional scales require, thus reducing patients' response burden [1]. PROMs here include QALYs; see Multiattribute health utility scoring, for which the computerized adaptive measure CAT-5D-QOL was developed and validated.
  2. PROM measurement with more precise assessments [1].

2. Input: what kind of data does the method require?

  1. A group of items that have been calibrated through item response theory.
  2. Internet access at the time of testing, whether patients answer the CAT directly or a clinical research coordinator administers the questions.
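
As an illustration of the required input, a calibrated item bank can be represented as a list of items with their IRT parameters. The sketch below assumes a two-parameter logistic (2PL) model; all item names and parameter values are hypothetical, not taken from any published scale:

```python
import math

# Hypothetical item bank: each item's discrimination (a) and difficulty (b)
# parameters would come from a prior IRT calibration study.
item_bank = [
    {"id": "feels_sad",        "a": 1.8, "b": -1.0},
    {"id": "trouble_sleeping", "a": 1.2, "b": -0.3},
    {"id": "cannot_get_up",    "a": 1.5, "b":  0.4},
    {"id": "hopelessness",     "a": 2.1, "b":  1.6},
]

def p_positive(theta, a, b):
    """2PL probability of a positive response at latent level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

A more severe item (larger b) is endorsed less often at any given level of the latent variable, which is what lets the CAT treat a "yes" or "no" as information about where the patient sits on the scale.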

3. Algorithm: how does the method work?

Model mechanics

  • CAT uses the following sequence:

    1. The first item administered to the patient sits at the middle level of the item pool. An item pool is a set of items calibrated to estimate the main item response theory (IRT) parameters of each item, as described on the IRT page.
    2. If the patient answers the item positively, their level of the latent variable is at or above the item's level; if the answer is no, their level is at or below the item's level. For example, consider a questionnaire measuring depression in which the items are ordered by increasing severity, and each response determines whether the patient continues. If the patient reports feeling sad, the questionnaire continues in order to determine the level of sadness/depression. The next item asks whether this sadness makes it difficult for the patient to get out of bed in the morning. A "no" indicates a low level of the latent variable, while a "yes" indicates a higher level than the previous item. The questionnaire keeps asking further questions to narrow down a range until a level can be determined for that patient. CAT can rely on this logic only because the items are calibrated on a common scale and ordered along the latent variable.
    3. In the third step, the region not ruled out by the previous answers is analyzed. To narrow the range of possible depression levels, CAT selects an item in the middle of the remaining region, progressively localizing the patient's depression within a smaller interval.
    4. In the fourth step, patients continue to answer new items until CAT can localize the level of depression within an even smaller interval.
    5. Depending on the stopping criterion, CAT stops asking questions, usually after around seven items. Once the patient's depression level has been localized, that region is reported as the level of depression, together with confidence intervals representing the measurement error around the latent variable estimate.
  • For certain items it is possible to generate response patterns automatically, e.g., for CAT simulations [2].

  • A CAT consists of a simple set of stages wrapped in a while loop (e.g., Weiss and Kingsbury, 1984):

    • Item Selection: The next item is chosen based on one or more pre-specified criteria. For example, the classic item selection mechanism picks the item that maximizes Fisher information at the current estimate of θi. Frequently, content balancing, item constraints, or item exposure are also taken into account at this point, rather than solely picking the "best item" for a given person. See itChoose for current item selection methods.
    • Estimation: θi is estimated based on updated information, usually the just-selected item and the response associated with it. In a post-hoc CAT, all of the responses already exist, but in a standard CAT, "item administration" would fall between "item selection" and "estimation." The classic estimation mechanism estimates θi by maximizing the likelihood given the item parameters and the set of responses. Other estimation mechanisms correct for bias in the maximum likelihood estimate or add prior information (such as a prior distribution of θ). If an estimate is untenable (e.g., it returns a nonsensical value or ±∞), the estimation procedure needs an alternative estimation mechanism. See mleEst for current estimation methods.
    • Termination: Either the test is terminated based on one or more pre-specified criteria, or no termination criterion is satisfied, in which case the loop repeats. Standard termination criteria involve a fixed rule (e.g., administering exactly 50 items) or a variable rule (e.g., continuing until the observed SEM falls below .3). Other termination criteria relate to cut-point tests (e.g., certification tests, classification tests), which depend not solely on ability but on whether that ability is estimated to exceed a threshold. catIrt terminates classification tests based on either the Sequential Probability Ratio Test (SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT compares the ratio of two likelihoods (e.g., the likelihood of the data given membership in one category vs. the likelihood given membership in the other, with the categories defined by B+δ and B−δ, where B separates the categories and δ is the half-width of the indifference region) against a ratio of error rates (α and β) (see Wald, 1945). The GLR uses the maximum likelihood estimate in place of either B+δ or B−δ, and the Confidence Interval Method terminates a CAT if the confidence interval surrounding the estimate of θ lies fully within one of the categories.
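
The select–estimate–terminate loop above can be sketched in a few dozen lines. This is a minimal illustration, not the catR or catIrt implementation: the 2PL model, the EAP estimator (a Bayesian estimate is used so that θ stays finite even when all responses agree, the "untenable estimate" case noted above), the SEM cutoff, and the simulated respondent are all assumptions for the sketch:

```python
import math
import random

def p2pl(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of one 2PL item at theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_estimate(items, responses):
    """EAP estimate of theta (and its SEM) under a standard-normal prior,
    computed on a coarse grid from -4 to 4."""
    grid = [g / 10.0 for g in range(-40, 41)]
    posts = []
    for th in grid:
        weight = math.exp(-0.5 * th * th)  # unnormalized N(0, 1) prior
        for (a, b), u in zip(items, responses):
            p = p2pl(th, a, b)
            weight *= p if u == 1 else (1.0 - p)
        posts.append(weight)
    total = sum(posts)
    mean = sum(th * w for th, w in zip(grid, posts)) / total
    var = sum((th - mean) ** 2 * w for th, w in zip(grid, posts)) / total
    return mean, math.sqrt(var)

def run_cat(item_bank, true_theta, sem_cutoff=0.4, max_items=7, rng=None):
    """Administer a CAT: maximum-information item selection, EAP scoring,
    and termination when the SEM drops below sem_cutoff or max_items
    is reached. item_bank is a list of (a, b) tuples."""
    rng = rng or random.Random(1)
    remaining = list(item_bank)
    used, responses = [], []
    theta, sem = 0.0, float("inf")
    while remaining and len(used) < max_items and sem >= sem_cutoff:
        # Item selection: maximize Fisher information at the current theta.
        nxt = max(remaining, key=lambda ab: fisher_info(theta, *ab))
        remaining.remove(nxt)
        # "Item administration": simulate the patient's response.
        u = 1 if rng.random() < p2pl(true_theta, *nxt) else 0
        used.append(nxt)
        responses.append(u)
        # Estimation: update theta and its standard error.
        theta, sem = eap_estimate(used, responses)
    return theta, sem, len(used)
```

The two termination rules described above correspond to the two loop guards: `max_items` is the fixed rule and `sem_cutoff` the variable rule (setting `sem_cutoff=0` yields a fixed-length test). Content balancing, exposure control, and SPRT/GLR classification stopping would slot into the selection and termination steps, respectively.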

Reporting guidelines

Data science packages

  • catR - see Computerized adaptive testing with R [3] and Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale [4].
  • mirtCAT

Suggested companion methods

Learning materials

  1. Books
  2. Articles
    • Psychometrics behind computerized adaptive testing [5].

4. Output: how do I interpret this method's results?

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

Tables, plots, and their interpretation

5. SporeData-specific

Templates

Data science functions

References

[1] Chakravarty EF, Bjorner JB, Fries JF. Improving patient reported outcomes using item response theory and computerized adaptive testing. The Journal of rheumatology. 2007 Jun 1;34(6):1426-31.

[2] Magis D, Raîche G. Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software. 2012 May 24;48(8):1-31.

[3] Magis D, Barrada JR. Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software. 2017;76(1):1-9.

[4] Chien TW, Wu HM, Wang WC, Castillo RV, Chou W. Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale. Health and Quality of Life Outcomes. 2009 Dec 1;7(1):39.

[5] Chang HH. Psychometrics behind computerized adaptive testing. Psychometrika. 2015 Mar 1;80(1):1-20.
