Item parameter generation - tmatta/lsasim GitHub Wiki
The function item_gen
facilitates the generation of item parameters from a range of item response models.
lsasim::item_gen(n_1pl = NULL, n_2pl = NULL, n_3pl = NULL, thresholds = 1,
b_bounds, a_bounds = NULL, c_bounds = NULL)
The arguments n_1pl
, n_2pl
and n_3pl
specify the number of one-, two-, and three-parameter items to be generated. The argument thresholds
specifies the number of thresholds for the one- and two-parameter items. Finally, the arguments b_bounds
, a_bounds
and c_bounds
specify the bounds of the uniform distributions used to generate the b, a, and c parameters, respectively.
The item_gen
function constrains the set of generated items to have an average b parameter of 0. When few items are generated, this constraint may cause the range of the generated b parameters to exceed the b_bounds
argument. However, when many items are generated and the bounds of the b parameter are symmetric, the generated values should stay within the specified bounds.
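The effect of the centering constraint can be illustrated in base R. This is a minimal sketch that assumes the constraint amounts to drawing b from a uniform distribution and subtracting the sample mean; the helper center_uniform is hypothetical and is not part of lsasim, whose internals may differ.

```r
# Assumption: draw b uniformly, then subtract the mean so the set averages 0.
# center_uniform() is a hypothetical helper, not an lsasim function.
center_uniform <- function(n, bounds) {
  b <- runif(n, min = bounds[1], max = bounds[2])
  b - mean(b)
}

set.seed(123)  # arbitrary seed, for reproducibility only
b_few  <- center_uniform(5, c(-2, 2))
b_many <- center_uniform(5000, c(-2, 2))

mean(b_few)    # 0 up to floating-point error
range(b_few)   # may fall outside [-2, 2] when the raw mean is far from 0
range(b_many)  # the raw mean is close to 0, so the shift is small
```

With few items the sample mean can sit well away from 0, so subtracting it moves the whole set and can push the extremes past the bounds.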
For the following examples, we will be generating I = 30 items.
I <- 30
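Dichotomous IRT models (i.e., 1PL, 2PL, or 3PL) can be expressed by the equation
$$P(y_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta_j - b_i)]}{1 + \exp[a_i(\theta_j - b_i)]}$$
where yij is the response to item i by respondent j, θj is the latent trait for respondent j, and bi, ai, and ci are the difficulty, discrimination, and pseudo-guessing parameters, respectively, for item i. Setting ci = 0 gives the 2PL model, and additionally setting ai = 1 gives the 1PL model.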
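As an illustration of how the difficulty, discrimination, and pseudo-guessing parameters combine, the following sketch evaluates a response probability in base R. It assumes the standard three-parameter logistic form with no scaling constant; the function p_3pl and its argument names are hypothetical and are not part of lsasim.

```r
# Hypothetical helper: probability of a correct response under a 3PL model.
# Assumes the standard logistic form; a_par = 1 and c_par = 0 reduce it to 1PL.
p_3pl <- function(theta, b_par, a_par = 1, c_par = 0) {
  c_par + (1 - c_par) * plogis(a_par * (theta - b_par))
}

p_3pl(theta = 0, b_par = 0)                           # 0.5 (1PL, ability equals difficulty)
p_3pl(theta = 0, b_par = 0, a_par = 1.2, c_par = 0.2) # 0.6 (guessing raises the floor)
p_3pl(theta = c(-1, 0, 1), b_par = 0.5, a_par = 0.8)  # increases with theta
```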
For a set of dichotomous items, lsasim::item_gen
returns an I × 6 data frame. The first column in the data frame is labeled item
and denotes the item ID. Columns two through four, labeled b
, a
, and c
, are the item parameters for the item difficulty, discrimination, and pseudo-guessing, respectively. The fifth column, labeled k
, indicates the number of thresholds for each item. The sixth column, labeled p
, indicates whether the item is from a 1PL, 2PL, or 3PL model.
To generate item parameters from a 1PL (Rasch) model, we only need to specify n_1pl
, the number of one-parameter items, and the bounds of the b parameters for the n_1pl
items. Below we generate item parameters for n_1pl = I
1PL items. The bounds of the b parameter are constrained to b_bounds = c(-2, 2)
where -2 is the lowest generating value and 2 is the highest generating value.
gen1PL <- lsasim::item_gen(n_1pl = I, b_bounds = c(-2, 2))
Printing the first 6 rows of gen1PL
, we can see that the b
parameters have been generated while the a
parameters were fixed to 1 and the c
parameters were fixed to 0. Column k
indicates the items are dichotomous (1 threshold) and column p
indicates the items are from a 1PL generating model.
head(gen1PL)
## item b a c k p
## 1 1 -0.60 1 0 1 1
## 2 2 0.78 1 0 1 1
## 3 3 -0.75 1 0 1 1
## 4 4 0.18 1 0 1 1
## 5 5 1.19 1 0 1 1
## 6 6 0.16 1 0 1 1
The 30 generated b
parameters of gen1PL
range between -1.92 and 1.89, within the bounds set by b_bounds
.
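A bounds check like the one described can be written as a small helper. The sketch below rebuilds the first six printed rows of gen1PL by hand (rounded to two decimals, as shown above) so that it runs without lsasim; within_bounds is a hypothetical helper name.

```r
# First six rows of gen1PL, copied from the printed output above
first_rows <- data.frame(
  item = 1:6,
  b = c(-0.60, 0.78, -0.75, 0.18, 1.19, 0.16),
  a = 1, c = 0, k = 1, p = 1
)

# Hypothetical helper: TRUE when every value lies inside the closed interval
within_bounds <- function(x, bounds) {
  all(x >= bounds[1] & x <= bounds[2])
}

range(first_rows$b)                     # -0.75 1.19
within_bounds(first_rows$b, c(-2, 2))   # TRUE
```

The same helper could be applied to the full b column of any data frame returned by item_gen.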
To generate item parameters from a 2PL model, we specify n_2pl
, the number of two-parameter items, and the bounds for the b and a parameters. Below we generate item parameters for n_2pl = I
2PL items. The bounds of the b parameter are again constrained to b_bounds = c(-2, 2)
and the bounds of the a parameter are constrained to a_bounds = c(0.75, 1.25)
.
gen2PL <- lsasim::item_gen(n_2pl = I, b_bounds = c(-2, 2),
a_bounds = c(0.75, 1.25))
Printing the first 6 rows of gen2PL
, we can see that the b
and a
parameters have been generated while the c
parameters were fixed to 0. Column k
indicates the items are dichotomous (1 threshold) and column p
indicates the items are from a 2PL generating model.
head(gen2PL)
## item b a c k p
## 1 1 0.25 1.00 0 1 2
## 2 2 1.45 0.77 0 1 2
## 3 3 1.57 1.17 0 1 2
## 4 4 -1.62 1.05 0 1 2
## 5 5 0.41 0.79 0 1 2
## 6 6 -1.05 0.87 0 1 2
The 30 generated b
parameters of gen2PL
range between -1.92 and 1.90, within the bounds set by b_bounds
.
To generate item parameters from a 3PL model, we specify n_3pl
, the number of three-parameter items, and the bounds for the b, a, and c parameters. Below we generate item parameters for n_3pl = I
3PL items. The bounds of the b and a parameters are again constrained to b_bounds = c(-2, 2)
and a_bounds = c(0.75, 1.25)
. The bounds of the c parameter are set to c_bounds = c(0, .25)
.
gen3PL <- lsasim::item_gen(n_3pl = I, b_bounds = c(-2, 2),
a_bounds = c(.75, 1.25), c_bounds = c(0, .25))
Printing the first 6 rows of gen3PL
, we can see that the b
, a
, and c
parameters have been generated. Column k
indicates the items are dichotomous (1 threshold) and column p
indicates the items are from a 3PL generating model.
head(gen3PL)
## item b a c k p
## 1 1 -0.49 1.10 0.20 1 3
## 2 2 0.27 1.08 0.07 1 3
## 3 3 0.46 0.91 0.06 1 3
## 4 4 -0.27 0.98 0.15 1 3
## 5 5 0.40 1.01 0.18 1 3
## 6 6 -2.02 1.24 0.11 1 3
The 30 generated b
parameters of gen3PL
range between -2.1 and 1.67. Of the 30 items, 2 fall outside the generating bounds.
print(gen3PL[which(gen3PL$b > 2 | gen3PL$b < -2), ])
## item b a c k p
## 6 6 -2.02 1.24 0.11 1 3
## 21 21 -2.10 0.78 0.12 1 3
The lsasim::item_gen
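function can also generate items for partial credit models
$$P(y_{ij} = k \mid \theta_j) = \frac{\exp\left[\sum_{u=1}^{k} a_i(\theta_j - b_i - d_{iu})\right]}{\sum_{v=0}^{K_i} \exp\left[\sum_{u=1}^{v} a_i(\theta_j - b_i - d_{iu})\right]}$$
where k is the response on item i by respondent j and Ki is the maximum score on item i. The parameter bi is the average difficulty for item i and diu is the threshold parameter between scores u and u − 1 for item i. The empty sum for a score of 0 is taken to be 0, and ai = 1 in the partial credit model.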
For a set of partial credit items, lsasim::item_gen
returns an I × Q data frame, where Q is 6 plus the largest number of thresholds among the items. That is, six of the Q columns are the same as the columns from the dichotomous item data frame. The remaining columns hold the item thresholds, so an item with two thresholds (three score categories) has two additional columns, d1
and d2
. To arrive at the difficulty at the first threshold, compute b + d1
; to arrive at the difficulty at the second threshold, compute b + d2
.
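To make the role of the threshold parameters concrete, the following base-R sketch computes score-category probabilities for a single polytomous item. It assumes the usual (generalized) partial credit form and takes the difficulty of threshold u to be b + d_u, as described above; the function p_gpcm and its argument names are hypothetical and are not part of lsasim.

```r
# Hypothetical helper: category probabilities for one (generalized) partial
# credit item. Assumes threshold u has difficulty b_par + d_par[u].
p_gpcm <- function(theta, b_par, d_par, a_par = 1) {
  steps <- a_par * (theta - (b_par + d_par))  # one term per threshold
  numer <- exp(c(0, cumsum(steps)))           # score 0 has an empty sum of 0
  numer / sum(numer)                          # probabilities for scores 0..K
}

# Item with two thresholds, evaluated where ability equals average difficulty
probs <- p_gpcm(theta = 0, b_par = 0, d_par = c(-0.5, 0.5))
probs       # three probabilities, symmetric around the middle score
sum(probs)  # 1
```

With a_par left at 1 the function corresponds to the partial credit case; passing another value gives the generalized version.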
To generate item parameters from a partial credit model, we specify n_1pl
, the number of one-parameter items, thresholds
, and b_bounds
. Below we generate n_1pl = I
1PL item parameters, each with thresholds = 2
thresholds. The bounds of the b parameters are again constrained to b_bounds = c(-2, 2)
.
genPC <- lsasim::item_gen(n_1pl = I, thresholds = 2, b_bounds = c(-2, 2))
Printing the first 6 rows of genPC
, we can see that the b
, d1
, and d2
parameters have been generated while a
and c
are fixed to 1 and 0, respectively. Column k
indicates the items have two thresholds and column p
indicates the items are from a 1PL generating model.
head(genPC)
## item b d1 d2 a c k p
## 1 1 0.24 -0.76 0.75 1 0 2 1
## 2 2 -1.34 -0.17 0.18 1 0 2 1
## 3 3 0.97 -0.57 0.56 1 0 2 1
## 4 4 -0.41 -0.65 0.65 1 0 2 1
## 5 5 -0.84 -0.18 0.17 1 0 2 1
## 6 6 0.04 -0.49 0.49 1 0 2 1
The 30 generated b
parameters of genPC
range between -1.34 and 1.23, within the bounds set by b_bounds
. Furthermore, the 60 generated thresholds of genPC
range between -1.78 and 1.97.
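The threshold difficulties follow from adding each d column to b. The sketch below does this for the first six rows of genPC, copied by hand from the printed output above (values rounded to two decimals), so the results only approximate what the full-precision data frame would give.

```r
# First six rows of genPC, copied from the printed output above
pc_rows <- data.frame(
  item = 1:6,
  b  = c(0.24, -1.34, 0.97, -0.41, -0.84, 0.04),
  d1 = c(-0.76, -0.17, -0.57, -0.65, -0.18, -0.49),
  d2 = c(0.75, 0.18, 0.56, 0.65, 0.17, 0.49)
)

# Difficulty at each threshold is b plus the corresponding d column
pc_rows$threshold1 <- pc_rows$b + pc_rows$d1
pc_rows$threshold2 <- pc_rows$b + pc_rows$d2

pc_rows[1, c("threshold1", "threshold2")]       # about -0.52 and 0.99
range(c(pc_rows$threshold1, pc_rows$threshold2)) # about -1.51 and 1.53
```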
To generate item parameters from a generalized partial credit model, we specify n_2pl
, the number of two-parameter items, thresholds
, b_bounds
, and a_bounds
. Below we generate n_2pl = I
2PL item parameters, each with thresholds = 2
thresholds. The bounds of the b and a parameters are constrained to b_bounds = c(-2, 2)
, and a_bounds = c(.75, 1.25)
.
genGPC <- lsasim::item_gen(n_2pl = I, thresholds = 2, b_bounds = c(-2, 2),
a_bounds = c(.75, 1.25))
Printing the first 6 rows of genGPC
, we can see that the b
, d1
, d2
, and a
parameters have been generated while c
is fixed to 0. Column k
indicates the items have two thresholds and column p
indicates the items are from a 2PL generating model.
head(genGPC)
## item b d1 d2 a c k p
## 1 1 0.26 -0.24 0.25 0.87 0 2 2
## 2 2 0.70 -0.73 0.73 1.12 0 2 2
## 3 3 0.07 -0.11 0.11 1.23 0 2 2
## 4 4 -1.17 -0.30 0.30 1.19 0 2 2
## 5 5 -0.29 -0.42 0.42 0.98 0 2 2
## 6 6 -0.03 -0.40 0.40 1.25 0 2 2
The 30 generated b
parameters of genGPC
range between -1.41 and 1.32, within the bounds set by b_bounds
. Furthermore, the 60 generated thresholds of genGPC
range between -1.57 and 1.88.
Tests are often constructed from multiple item types. For example, PISA 2012 was a mix of 1PL items and partial credit items. Drawing on the same arguments, lsasim::item_gen
enables the generation of mixed-type items. To do this, we extend n_1pl
, n_2pl
, n_3pl
and thresholds
to vectors. For the following examples, we will generate I
= 20 items.
I <- 20
To generate item parameters from a 1PL and partial credit model, we specify a length-2 vector for both n_1pl
and thresholds
. The b_bounds
remains as before and will be used to draw item parameters for both item types. Below we specify n_1pl = c(I/2, I/2)
and thresholds = c(1, 2)
. Under this specification, n_1pl[1]
corresponds to thresholds[1]
and n_1pl[2]
corresponds to thresholds[2]
. Thus, we will generate ten 1PL items and ten partial credit items. The single b_bounds = c(-2, 2)
indicates that all b parameters will be drawn from the same uniform distribution.
gen1PL_PC <- lsasim::item_gen(n_1pl = c(I/2, I/2), thresholds = c(1, 2),
b_bounds = c(-2, 2))
Printing all 20 rows of gen1PL_PC
, we can see that the b
parameter has been generated for all items and d1
and d2
parameters have been generated for the last 10 items. Furthermore, k
= 1 for the first 10 items and k
= 2 for the last 10 items.
print(gen1PL_PC)
## item b d1 d2 a c k p
## 1 1 0.21 0.00 0.00 1 0 1 1
## 2 2 -0.61 0.00 0.00 1 0 1 1
## 3 3 -1.76 0.00 0.00 1 0 1 1
## 4 4 0.45 0.00 0.00 1 0 1 1
## 5 5 0.68 0.00 0.00 1 0 1 1
## 6 6 0.94 0.00 0.00 1 0 1 1
## 7 7 1.62 0.00 0.00 1 0 1 1
## 8 8 -0.68 0.00 0.00 1 0 1 1
## 9 9 0.42 0.00 0.00 1 0 1 1
## 10 10 2.00 0.00 0.00 1 0 1 1
## 11 11 -0.99 -0.42 0.41 1 0 2 1
## 12 12 0.99 -0.75 0.75 1 0 2 1
## 13 13 -1.51 -0.06 0.06 1 0 2 1
## 14 14 0.48 -0.42 0.42 1 0 2 1
## 15 15 0.98 -0.55 0.56 1 0 2 1
## 16 16 -0.95 -0.69 0.69 1 0 2 1
## 17 17 0.11 -0.41 0.40 1 0 2 1
## 18 18 -0.70 -0.68 0.68 1 0 2 1
## 19 19 -0.32 -0.58 0.57 1 0 2 1
## 20 20 0.29 -0.51 0.50 1 0 2 1
The 20 generated b
parameters of gen1PL_PC
range between -1.76 and 2.00, within the bounds set by b_bounds
. Furthermore, the 30 generated thresholds of gen1PL_PC
range between -1.76 and 2.00.
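The element-wise pairing of n_1pl and thresholds can be checked without calling lsasim. This sketch only does the bookkeeping in base R; it reproduces the item and threshold counts quoted above and makes no claim about how item_gen is implemented.

```r
I <- 20
n_1pl      <- c(I/2, I/2)  # number of items per threshold setting
thresholds <- c(1, 2)      # n_1pl[1] pairs with thresholds[1], and so on

# Expected value of column k for each of the 20 items
k <- rep(thresholds, times = n_1pl)

table(k)   # 10 items with k = 1 and 10 items with k = 2
length(k)  # 20 items in total
sum(k)     # 30 thresholds in total
```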
To generate item parameters from a 3PL and generalized partial credit model, we specify n_2pl
, n_3pl
and thresholds
as well as b_bounds
, a_bounds
, and c_bounds
. Below we specify n_2pl = I/2
, n_3pl = I/2
and thresholds = 2
. Under this specification, thresholds
does not require a vector because any n_3pl
item is constrained to have thresholds = 1
. Thus, n_2pl = I/2
corresponds to thresholds = 2
. Furthermore, b_bounds = c(-2, 2)
and a_bounds = c(.75, 1.25)
will be used to generate item parameters for both models. However, c_bounds = c(0, 0.25)
will only generate c parameters for those n_3pl = I/2
items.
gen3PL_GPC <- lsasim::item_gen(n_2pl = I/2, n_3pl = I/2, thresholds = 2,
b_bounds = c(-2, 2), a_bounds = c(.75, 1.25),
c_bounds = c(0, 0.25))
Printing all 20 rows of gen3PL_GPC
, we can see that the b
and a
parameters have been generated for all items, the d1
and d2
parameters have been generated for the first 10 items, and the c
parameter has been generated for the last 10 items. Furthermore, k
= 2 and p
= 2 for the first 10 items and k
= 1 and p
= 3 for the last 10 items.
print(gen3PL_GPC)
## item b d1 d2 a c k p
## 1 1 -0.15 -0.34 0.34 0.92 0.00 2 2
## 2 2 1.21 -0.74 0.73 1.04 0.00 2 2
## 3 3 -0.18 -0.09 0.08 0.86 0.00 2 2
## 4 4 -0.80 -0.48 0.47 1.14 0.00 2 2
## 5 5 -0.06 -0.33 0.33 1.16 0.00 2 2
## 6 6 -1.16 -0.19 0.19 1.10 0.00 2 2
## 7 7 -0.06 -0.53 0.53 0.79 0.00 2 2
## 8 8 -0.99 -0.66 0.66 1.02 0.00 2 2
## 9 9 -0.28 -0.07 0.06 0.99 0.00 2 2
## 10 10 0.16 -0.39 0.39 0.88 0.00 2 2
## 11 11 -0.18 0.00 0.00 0.78 0.12 1 3
## 12 12 0.29 0.00 0.00 1.21 0.05 1 3
## 13 13 0.51 0.00 0.00 0.78 0.16 1 3
## 14 14 0.40 0.00 0.00 1.13 0.11 1 3
## 15 15 1.90 0.00 0.00 1.05 0.09 1 3
## 16 16 0.99 0.00 0.00 1.13 0.14 1 3
## 17 17 1.76 0.00 0.00 1.12 0.20 1 3
## 18 18 0.62 0.00 0.00 1.21 0.13 1 3
## 19 19 -0.58 0.00 0.00 1.06 0.11 1 3
## 20 20 -1.04 0.00 0.00 1.01 0.10 1 3
The 20 generated b
parameters of gen3PL_GPC
range between -1.16 and 1.90, within the bounds set by b_bounds
. Furthermore, the 30 generated thresholds of gen3PL_GPC
range between -1.65 and 1.94.
Finally, we generate 20 items from four different models.
- 5 1PL items
- 5 partial credit items with three thresholds
- 5 generalized partial credit items with two thresholds
- 5 3PL items
The first aspect of the function below to note is the vector of thresholds, thresholds = c(1, 2, 3)
. This dictates that both n_1pl
and n_2pl
must also be length-3 vectors. The argument n_1pl = c(I/4, 0, I/4)
specifies I/4
1PL items, zero two-threshold partial credit items, and I/4
three-threshold partial credit items. Similarly, the argument n_2pl = c(0, I/4, 0)
specifies zero 2PL items, I/4
two-threshold generalized partial credit items, and zero three-threshold generalized partial credit items. As mentioned above, n_3pl
does not require a vector as 3PL items are constrained to one threshold. Thus, n_3pl = I/4
indicates I/4
3PL items. The b parameters for all items will be generated from b_bounds = c(-2, 2)
, the a parameters for the generalized partial credit items and 3PL items will be generated from a_bounds = c(0.75, 1.25)
and the c parameters for the 3PL items will be generated from c_bounds = c(0, 0.25)
.
gen1PL_3PL_PC_GPC <- lsasim::item_gen(n_1pl = c(I/4, 0, I/4),
n_2pl = c(0, I/4, 0), n_3pl = I/4,
thresholds = c(1, 2, 3),
b_bounds = c(-2, 2),
a_bounds = c(0.75, 1.25),
c_bounds = c(0, 0.25))
Printing all 20 rows of gen1PL_3PL_PC_GPC
, we can use the k
and p
values to identify which items correspond to which models.
print(gen1PL_3PL_PC_GPC)
## item b d1 d2 d3 a c k p
## 1 1 0.95 0.00 0.00 0.00 1.00 0.00 1 1
## 2 2 1.00 0.00 0.00 0.00 1.00 0.00 1 1
## 3 3 -1.51 0.00 0.00 0.00 1.00 0.00 1 1
## 4 4 1.30 0.00 0.00 0.00 1.00 0.00 1 1
## 5 5 -0.72 0.00 0.00 0.00 1.00 0.00 1 1
## 6 6 -0.89 -0.17 0.02 0.14 1.00 0.00 3 1
## 7 7 -0.63 -0.60 0.02 0.59 1.00 0.00 3 1
## 8 8 1.02 -0.93 0.35 0.59 1.00 0.00 3 1
## 9 9 0.64 -1.03 -0.17 1.21 1.00 0.00 3 1
## 10 10 -0.69 -0.52 0.04 0.47 1.00 0.00 3 1
## 11 11 0.54 -0.69 0.68 0.00 1.19 0.00 2 2
## 12 12 -0.90 -0.71 0.70 0.00 0.97 0.00 2 2
## 13 13 -0.90 -0.61 0.60 0.00 1.13 0.00 2 2
## 14 14 0.83 -0.50 0.50 0.00 0.85 0.00 2 2
## 15 15 -0.82 -0.75 0.75 0.00 1.03 0.00 2 2
## 16 16 1.40 0.00 0.00 0.00 1.15 0.18 1 3
## 17 17 1.89 0.00 0.00 0.00 0.96 0.01 1 3
## 18 18 1.33 0.00 0.00 0.00 1.13 0.04 1 3
## 19 19 -0.28 0.00 0.00 0.00 1.11 0.12 1 3
## 20 20 -1.16 0.00 0.00 0.00 0.99 0.04 1 3
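The mapping from k and p to a model can be spelled out in base R. The sketch below rebuilds the specification as a small planning table and labels each block of items; label_model is a hypothetical helper, and the labelling rule (p gives the number of item parameters, k > 1 marks a polytomous item) is inferred from the printed output rather than taken from lsasim.

```r
I <- 20
thresholds <- c(1, 2, 3)
n_1pl <- c(I/4, 0, I/4)  # paired element-wise with thresholds
n_2pl <- c(0, I/4, 0)    # paired element-wise with thresholds
n_3pl <- I/4             # 3PL items always have one threshold

# One row per (p, k) combination, dropping combinations with zero items
plan <- rbind(
  data.frame(p = 1, k = thresholds, n = n_1pl),
  data.frame(p = 2, k = thresholds, n = n_2pl),
  data.frame(p = 3, k = 1, n = n_3pl)
)
plan <- plan[plan$n > 0, ]

# Hypothetical helper: name the model implied by k and p
label_model <- function(k, p) {
  ifelse(p == 3, "3PL",
         ifelse(k > 1, ifelse(p == 2, "GPC", "PC"),
                ifelse(p == 2, "2PL", "1PL")))
}
plan$model <- label_model(plan$k, plan$p)

plan         # blocks in the same order as the printed rows: 1PL, PC, GPC, 3PL
sum(plan$n)  # 20 items in total
```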
The 20 generated b
parameters of gen1PL_3PL_PC_GPC
range between -1.51 and 1.89, within the bounds set by b_bounds
. Furthermore, the 30 generated thresholds of gen1PL_3PL_PC_GPC
range between -1.61 and 1.89.