Two groups distributions recovery - tmatta/lsasim GitHub Wiki


We examined latent trait recovery from two different populations under the following conditions: 1 (IRT models) x 1 (IRT R packages) x 3 (sample sizes) x 3 (test lengths) x 3 (test booklets). We used TAM (Robitzsch, Kiefer, & Wu, 2017) to estimate the population-specific distributions and compare the results to the generating values.


  • One IRT model was included: Rasch model
    • Item parameters were randomly generated
    • The bounds of the item difficulty parameter, b, are constrained to b_bounds = (-2, 2) where -2 is the lowest generating value and 2 is the highest generating value
  • One IRT R packages was evaluated: TAM (version 2.4-9)
  • Three sample sizes in each group were used: 500, 1000, and 5000
    • Group 1 : θ1 ~ N(0, 1)
    • Group 2 : θ2 ~ N(1, .75)
  • Three test lengths were used: 40, 60, and 80 items
  • Three test booklets were used: 5, 10, and 15 booklets

  • One hundred replications were used for each condition for the calibration

  • Person trait recovery:
    • Across all conditions, the estimated mean scores for the two countries are 0 and 1.001 respectively. TAM fixes the mean for the first group to zero. The estimated standard deviations for the two countries are 0.997 and 0.750. Comparing these estimates with their generating parameters: (θ1 ~ N(0, 1); θ2 ~ N(1, .75)) suggested that the latent trait distributions were recovered well.

 

# Load libraries
if(!require(lsasim)){  
  install.packages("lsasim")
  library(lsasim) #version 1.0.1
}

if(!require(TAM)){
  install.packages("TAM")
  library(TAM) #2.4-9
}
# Define population distributions
cnt1_true_mean <- 0
cnt1_true_sd <- 1

cnt2_true_mean <- 1
cnt2_true_sd <- .75
# Set up conditions
N.cond <- c(500, 1000, 5000)  #number of sample sizes
I.cond <- c(40, 60, 80)       #number of items 
K.cond <- c(5, 10, 15)        #number of booklets

# Set up number of replications
reps <- 100

# Create space for outputs
results <- NULL
#==============================================================================#
# START SIMULATION
#==============================================================================#

for (N in N.cond) { #sample size
  
  for (I in I.cond) { #number of items
    
    # generate item parameters for a Rasch model
    set.seed(2607) # fix item parameters across replications
    item_pool <- lsasim::item_gen(n_1pl = I, 
                                  b_bounds = c(-2, 2))
    
    for (K in K.cond) { #number of booklets
      
      for (r in 1:reps) { #replication
        
        set.seed(8088*k+1)
        
        # generate thetas for two groups
        theta_1 <- data.frame(theta = rnorm(N, cnt1_true_mean, cnt1_true_sd))
        theta_2 <- data.frame(theta = rnorm(N, cnt2_true_mean, cnt2_true_sd))
        theta <- rbind(theta_1, theta_2)
        
        ### -- Cognitive assessment generation
        # assign items to blocks
        blocks <- lsasim::block_design(n_blocks = K, 
                                       item_parameters = item_pool, 
                                       item_block_matrix = NULL)
        
        #assign blocks to booklets
        books <- lsasim::booklet_design(item_block_assignment = blocks$block_assignment, 
                                        book_design = NULL)
        
        #assign booklets to subjects
        book_samp <- lsasim::booklet_sample(n_subj = N*2, 
                                            book_item_design = books, 
                                            book_prob = NULL)
        
        # generate item responses
        cog <- lsasim::response_gen(subject = book_samp$subject, 
                                    item = book_samp$item, 
                                    theta = theta$theta, 
                                    b_par = item_pool$b)
        
        # assign country
        cog$country <- rep(1:2, each = N)
        
        #------------------------------------------------------------------------------#
        # Model estimation
        #------------------------------------------------------------------------------#
        
        # fit Rasch model using TAM package 
        # where the population model assumes separate normal distributions for 
        # the two countries
        mod <- NULL
        mod <- TAM::tam.mml(cog[,1:I], group = cog$country)
        
        # summarize results
        mod_results <- cbind(N, I, r, 
                             cnt1_true_mean, cnt1_true_sd, cnt2_true_mean, cnt2_true_sd,
                             as.numeric(mod$beta)[1], 
                             as.numeric(mod$beta)[2],
                             sqrt(mean(mod$variance[cog$country == 1])), 
                             sqrt(mean(mod$variance[cog$country == 2])),
                             r)
        
        colnames(mod_results) <- c("N", "I", "K",
                                   "cnt1_true_mean", "cnt1_true_sd", 
                                   "cnt2_true_mean", "cnt2_true_sd", 
                                   "cnt1_est_mean", "cnt1_est_sd", 
                                   "cnt2_est_mean", "cnt2_est_sd", 
                                   "rep")
        
        # combine results
        results <- rbind(results, mod_results)
        
      }
    }
  }
}

 

Summary:

 

  • Theta distributions for two groups across all conditions: the estimated mean scores for the two countries are 0 and 1.001 respectively. Recall that the mean of country 1 was fixed to 0. The estimated standard deviations for the two countries are 0.997 and 0.75. Comparing these estimates with their generating parameters: (θ1 ~ N(0, 1); θ2 ~ N(1, .75)) suggests that they were recovered well.

 

round(c(mean(data.frame(results)$cnt1_est_mean), mean(data.frame(results)$cnt1_est_sd)), 3)
## [1] 0.000 0.997

 

round(c(mean(data.frame(results)$cnt2_est_mean), mean(data.frame(results)$cnt2_est_sd)), 3)
## [1] 1.001 0.750

 

  • Theta distributions for two groups under full conditions

 

thetas <- aggregate(cbind(cnt1_est_mean, cnt1_est_sd,
                cnt2_est_mean, cnt2_est_sd) ~ N + I + K, 
          data=results, mean, na.rm=TRUE)
round(thetas, 3)
##       N  I  K cnt1_est_mean cnt1_est_sd cnt2_est_mean cnt2_est_sd
## 1   500 40  5             0       0.993         1.000       0.750
## 2  1000 40  5             0       0.992         1.000       0.749
## 3  5000 40  5             0       0.996         1.001       0.752
## 4   500 60  5             0       0.991         1.002       0.748
## 5  1000 60  5             0       0.994         1.002       0.751
## 6  5000 60  5             0       0.995         0.999       0.751
## 7   500 80  5             0       0.993         0.999       0.748
## 8  1000 80  5             0       0.993         1.002       0.752
## 9  5000 80  5             0       0.997         1.000       0.751
## 10  500 40 10             0       0.999         1.003       0.747
## 11 1000 40 10             0       0.994         1.004       0.752
## 12 5000 40 10             0       0.998         1.001       0.749
## 13  500 60 10             0       0.994         1.007       0.751
## 14 1000 60 10             0       0.994         0.998       0.751
## 15 5000 60 10             0       0.999         0.999       0.752
## 16  500 80 10             0       0.995         0.994       0.749
## 17 1000 80 10             0       0.997         1.002       0.754
## 18 5000 80 10             0       0.998         1.001       0.752
## 19  500 40 15             0       1.012         0.994       0.741
## 20 1000 40 15             0       0.993         0.997       0.750
## 21 5000 40 15             0       0.996         1.000       0.752
## 22  500 60 15             0       1.002         1.009       0.749
## 23 1000 60 15             0       1.002         0.995       0.753
## 24 5000 60 15             0       0.999         0.998       0.751
## 25  500 80 15             0       0.995         1.011       0.743
## 26 1000 80 15             0       0.999         1.008       0.748
## 27 5000 80 15             0       1.000         1.001       0.751
⚠️ **GitHub.com Fallback** ⚠️