Generating item responses from the PISA 2012 test design
The `lsasim` package contains three datasets to aid in generating item responses from the mathematics portion of the PISA 2012 "standard" test booklets. Table 1 shows the cluster rotation design used to form the 13 standard test booklets (it reproduces Figure 2.1 on page 31 of the PISA 2012 Technical Report). Note that PISA uses the term "cluster" whereas `lsasim` uses the term "block" for a subset of items; throughout this example, we use "cluster" when describing the PISA test design and "block" when describing `lsasim` functionality. Content clusters are denoted by a letter and a number, where M, S, and R indicate mathematics, science, and reading, respectively. The term "standard" distinguishes the standard booklets from the so-called "easy" booklet option in PISA: PM6A and PM7A are the standard versions of clusters 6 and 7, and PM6B and PM7B are their easy counterparts.
Booklet ID | Cluster (position 1) | Cluster (position 2) | Cluster (position 3) | Cluster (position 4) |
---|---|---|---|---|
B1 | PM5 | PS3 | PM6A | PS2 |
B2 | PS3 | PR3 | PM7A | PR2 |
B3 | PR3 | PM6A | PS1 | PM3 |
B4 | PM6A | PM7A | PR1 | PM4 |
B5 | PM7A | PS1 | PM1 | PM5 |
B6 | PM1 | PM2 | PR2 | PM6A |
B7 | PM2 | PS2 | PM3 | PM7A |
B8 | PS2 | PR2 | PM4 | PS1 |
B9 | PR2 | PM3 | PM5 | PR1 |
B10 | PM3 | PM4 | PS3 | PM1 |
B11 | PM4 | PM5 | PR3 | PM2 |
B12 | PS1 | PR1 | PM2 | PS3 |
B13 | PR1 | PM1 | PS2 | PR3 |
PISA 2012 item parameters
The first dataset in the `lsasim` package is `pisa2012_math_item`, which contains the international item parameters for the 109 mathematics items used in PISA 2012 (see PISA 2012 Technical Report, pp. 406-409). Each row of the data frame contains an item name, an item number, a `b` parameter, and two `d` parameters, which are nonzero only for partial credit items. For example, the first six items, `PM00FQ01` to `PM155Q01`, are calibrated using the Rasch model, which involves only the item difficulty `b`, so their `d1` and `d2` parameters equal 0. Items `PM155Q02D` and `PM155Q03D` are calibrated using a partial credit model, which involves `b`, `d1`, and `d2`.
lsasim::pisa2012_math_item[1:8, ]
## item_name item b d1 d2
## 1 PM00FQ01 1 0.39027 0.00000 0.00000
## 2 PM00GQ01 2 2.75209 0.00000 0.00000
## 3 PM00KQ02 3 1.97967 0.00000 0.00000
## 4 PM033Q01 4 -1.44130 0.00000 0.00000
## 5 PM034Q01T 5 0.42603 0.00000 0.00000
## 6 PM155Q01 6 -0.84340 0.00000 0.00000
## 7 PM155Q02D 7 -0.44941 0.74491 -0.74491
## 8 PM155Q03D 8 1.56865 -1.56865 1.56865
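To make the role of `b`, `d1`, and `d2` concrete, the following base R sketch computes response category probabilities for a single item. It is only an illustration under a stated assumption: step `k` is taken to have difficulty `b + d_k`, which may not match the sign convention used inside `lsasim` or in the PISA scaling. The helper name `pcm_probs` is hypothetical and not part of the package.

```r
# Category probabilities for one item under a partial credit parameterization.
# ASSUMPTION: step k has difficulty b + d_k. Check the sign convention of the
# software you use before relying on this. pcm_probs is a hypothetical helper.
pcm_probs <- function(theta, b, d = 0) {
  # A dichotomous (Rasch) item has a single step at b; a partial credit item
  # has one step per element of d.
  steps <- b + d
  # Cumulative sums of (theta - step) are the log-numerators for categories
  # 1, 2, ...; the log-numerator for category 0 is fixed at 0.
  log_num <- c(0, cumsum(theta - steps))
  probs <- exp(log_num) / sum(exp(log_num))
  names(probs) <- seq_along(probs) - 1
  probs
}

# Rasch item PM00FQ01 (b = 0.39027), examinee with theta = 0
p_rasch <- pcm_probs(theta = 0, b = 0.39027)

# Partial credit item PM155Q02D (b = -0.44941, d1 = 0.74491, d2 = -0.74491)
p_pcm <- pcm_probs(theta = 0, b = -0.44941, d = c(0.74491, -0.74491))
```

With `d = 0` the function reduces to the Rasch model, so `p_rasch` has two categories (0 and 1), while `p_pcm` has three (0, 1, and 2).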
PISA 2012 item blocks
The second dataset in the `lsasim` package is `pisa2012_math_block`, which indicates how the 109 items map onto the 10 item blocks. Each row of the data frame contains an item name and an item number, followed by a series of 0s and 1s indicating which of the 10 blocks the item belongs to. For example, item `PM00FQ01` is assigned to block 6, while item `PM00GQ01` is assigned to block 5.
lsasim::pisa2012_math_block[1:10,]
## item_name item_no block1 block2 block3 block4 block5 block6 block7 block8
## 1 PM00FQ01 1 0 0 0 0 0 1 0 0
## 2 PM00GQ01 2 0 0 0 0 1 0 0 0
## 3 PM00KQ02 3 0 0 0 1 0 0 0 0
## 4 PM033Q01 4 1 0 0 0 0 0 0 0
## 5 PM034Q01T 5 1 0 0 0 0 0 0 0
## 6 PM155Q01 6 1 0 0 0 0 0 0 0
## 7 PM155Q02D 7 1 0 0 0 0 0 0 0
## 8 PM155Q03D 8 1 0 0 0 0 0 0 0
## 9 PM155Q04T 9 1 0 0 0 0 0 0 0
## 10 PM192Q01T 10 0 1 0 0 0 0 0 0
## block9 block10
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 0
## 9 0 0
## 10 0 0
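A 0/1 indicator layout like this can be turned into a list of item numbers per block with a few lines of base R. The sketch below uses a small made-up data frame (`toy_block`, hypothetical values) with the same column layout, and does not assume that an item sits in only one block.

```r
# Toy indicator data frame (hypothetical values): 4 items, 3 blocks.
# Item 1 is deliberately placed in two blocks.
toy_block <- data.frame(
  item_name = c("A", "B", "C", "D"),
  item_no   = 1:4,
  block1    = c(0, 1, 1, 0),
  block2    = c(1, 0, 0, 0),
  block3    = c(1, 0, 0, 1)
)

# Keep only the indicator columns, as done for the real data.
toy_indicators <- toy_block[, -c(1:2)]

# For each block column, collect the item numbers whose indicator equals 1.
block_items <- lapply(toy_indicators, function(x) toy_block$item_no[x == 1])

# Number of blocks each item appears in.
blocks_per_item <- rowSums(toy_indicators)
```

Here `block_items$block1` holds items 2 and 3, and `blocks_per_item` shows that item 1 appears in two blocks.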
Because only the indicator columns of `pisa2012_math_block` are needed, we drop the first two columns (item name and item number). The PISA 2012 item parameters and the resulting item-by-block indicator matrix are then passed to `block_design` to create the block assignment matrix.
pisa2012_math_block_mat <- lsasim::pisa2012_math_block[, -c(1:2)]
pisa_blocks <- lsasim::block_design(item_parameters = lsasim::pisa2012_math_item,
                                    item_block_matrix = pisa2012_math_block_mat)
Printing `block_descriptives` provides a quick check of the number of items in each block and each block's average difficulty. Notice that each block has 11, 12, or 13 items. Block `b6` reflects cluster PM6A and block `b8` reflects cluster PM7A, the standard versions of clusters 6 and 7. Block `b7` reflects cluster PM6B and block `b9` reflects cluster PM7B, the optional easier versions; accordingly, their average difficulties are lower than those of `b6` and `b8`. Block `b10` reflects cluster PMUH, a block of items selected from the Main Survey items for their suitability for students with special educational needs. Blocks `b7`, `b9`, and `b10` were not part of the standard test booklets.
print(pisa_blocks$block_descriptives)
## block length average difficulty
## b1 12 0.096
## b2 11 -0.007
## b3 12 0.012
## b4 12 0.025
## b5 12 0.067
## b6 13 0.092
## b7 13 -0.199
## b8 12 0.158
## b9 12 -0.238
## b10 12 -0.306
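Descriptives of this kind can be reproduced by hand from an indicator matrix and a vector of difficulties. The following sketch uses made-up values (`toy_b`, `toy_ind`) and plain column sums; it is one straightforward way to obtain block lengths and mean difficulties, not necessarily how `block_design` computes them internally.

```r
# Hypothetical difficulties for 4 items and a 4 x 3 item-by-block indicator matrix.
toy_b <- c(-1.0, 0.5, 1.5, 0.0)
toy_ind <- cbind(b1 = c(1, 1, 0, 0),
                 b2 = c(0, 0, 1, 1),
                 b3 = c(1, 0, 0, 1))

# Block length: number of items flagged in each column.
block_length <- colSums(toy_ind)

# Average difficulty: sum of the difficulties of flagged items over block length.
# toy_ind * toy_b multiplies row i of the matrix by toy_b[i].
avg_difficulty <- colSums(toy_ind * toy_b) / block_length

toy_descriptives <- data.frame(block_length,
                               avg_difficulty = round(avg_difficulty, 3))
```

For these toy values each block has two items, with average difficulties of -0.25, 0.75, and -0.5.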
PISA 2012 test booklets
The third dataset in the `lsasim` package, `pisa2012_math_booklet`, indicates which item blocks make up each of the 13 standard test booklets in PISA 2012. Each row represents a booklet and each column an item block. Note that the columns `b7`, `b9`, and `b10` contain only 0s. This is because `pisa2012_math_booklet` was designed to construct the 13 standard booklets, which do not include the easy blocks (`b7`, `b9`) or the UH block (`b10`).
print(lsasim::pisa2012_math_booklet)
## booklet b1 b2 b3 b4 b5 b6 b7 b8 b9
## 1 B1 0 0 0 0 1 1 0 0 0
## 2 B2 0 0 0 0 0 0 0 1 0
## 3 B3 0 0 1 0 0 1 0 0 0
## 4 B4 0 0 0 1 0 1 0 1 0
## 5 B5 1 0 0 0 1 0 0 1 0
## 6 B6 1 1 0 0 0 1 0 0 0
## 7 B7 0 1 1 0 0 0 0 1 0
## 8 B8 0 0 0 1 0 0 0 0 0
## 9 B9 0 0 1 0 1 0 0 0 0
## 10 B10 1 0 1 1 0 0 0 0 0
## 11 B11 0 1 0 1 1 0 0 0 0
## 12 B12 0 1 0 0 0 0 0 0 0
## 13 B13 1 0 0 0 0 0 0 0 0
Again, we subset the data frame, dropping the booklet ID column, so that it meets the requirements of the `book_design` argument of the `booklet_design` function. We then combine it with `block_assignment` from `pisa_blocks` to create the 13 booklets.
pisa2012_math_book_mat <- lsasim::pisa2012_math_booklet[, -1]
pisa_books <- lsasim::booklet_design(item_block_assignment = pisa_blocks$block_assignment,
book_design = pisa2012_math_book_mat)
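In the output below, each column is a booklet and each row is an item position; entries are item numbers, and 0 marks an unused position. For example, booklet `B1` contains the items from block 5 (PM5) and block 6 (PM6A), while booklet `B2` contains only the items from block 8 (PM7A).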
print(pisa_books)
## B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13
## i1 2 41 11 3 4 4 10 3 11 4 10 10 4
## i2 45 42 15 43 5 5 12 43 15 5 12 12 5
## i3 46 53 18 44 6 6 13 44 18 6 13 13 6
## i4 47 54 21 48 7 7 14 48 21 7 14 14 7
## i5 73 68 22 49 8 8 19 49 22 8 19 19 8
## i6 74 69 23 93 9 9 27 93 23 9 27 27 9
## i7 75 76 25 94 16 16 28 94 25 16 28 28 16
## i8 82 77 29 95 17 17 30 95 29 17 30 30 17
## i9 83 78 34 96 20 20 31 96 34 20 31 31 20
## i10 84 79 36 102 24 24 32 102 36 24 32 32 24
## i11 108 80 37 103 26 26 33 103 37 26 33 33 26
## i12 109 81 38 104 35 35 11 104 38 35 3 0 35
## i13 1 0 1 1 2 10 15 0 2 11 43 0 0
## i14 39 0 39 39 45 12 18 0 45 15 44 0 0
## i15 40 0 40 40 46 13 21 0 46 18 48 0 0
## i16 50 0 50 50 47 14 22 0 47 21 49 0 0
## i17 51 0 51 51 73 19 23 0 73 22 93 0 0
## i18 52 0 52 52 74 27 25 0 74 23 94 0 0
## i19 55 0 55 55 75 28 29 0 75 25 95 0 0
## i20 56 0 56 56 82 30 34 0 82 29 96 0 0
## i21 57 0 57 57 83 31 36 0 83 34 102 0 0
## i22 58 0 58 58 84 32 37 0 84 36 103 0 0
## i23 105 0 105 105 108 33 38 0 108 37 104 0 0
## i24 106 0 106 106 109 1 41 0 109 38 2 0 0
## i25 107 0 107 107 41 39 42 0 0 3 45 0 0
## i26 0 0 0 41 42 40 53 0 0 43 46 0 0
## i27 0 0 0 42 53 50 54 0 0 44 47 0 0
## i28 0 0 0 53 54 51 68 0 0 48 73 0 0
## i29 0 0 0 54 68 52 69 0 0 49 74 0 0
## i30 0 0 0 68 69 55 76 0 0 93 75 0 0
## i31 0 0 0 69 76 56 77 0 0 94 82 0 0
## i32 0 0 0 76 77 57 78 0 0 95 83 0 0
## i33 0 0 0 77 78 58 79 0 0 96 84 0 0
## i34 0 0 0 78 79 105 80 0 0 102 108 0 0
## i35 0 0 0 79 80 106 81 0 0 103 109 0 0
## i36 0 0 0 80 81 107 0 0 0 104 0 0 0
## i37 0 0 0 81 0 0 0 0 0 0 0 0 0
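The structure of this output (items listed block by block in ascending block order, with zeros padding shorter booklets) can be mimicked with a small base R sketch. The inputs `toy_assignment` and `toy_design` and the helper `assemble` are hypothetical; the code illustrates the idea of combining a block assignment with a booklet design and is not the implementation of `booklet_design`.

```r
# Hypothetical block assignment: columns are blocks, entries are item numbers,
# and 0 pads blocks that have fewer items.
toy_assignment <- cbind(b1 = c(1, 2, 0),
                        b2 = c(3, 4, 5),
                        b3 = c(6, 0, 0))

# Hypothetical booklet design: rows are booklets, a 1 selects a block.
toy_design <- rbind(B1 = c(1, 1, 0),
                    B2 = c(0, 1, 1),
                    B3 = c(0, 0, 1))

# Stack the items of the selected blocks (in block order) and drop the padding.
assemble <- function(design_row, assignment) {
  items <- as.vector(assignment[, design_row == 1, drop = FALSE])
  items[items != 0]
}

booklets <- lapply(seq_len(nrow(toy_design)),
                   function(i) assemble(toy_design[i, ], toy_assignment))
names(booklets) <- rownames(toy_design)

# Pad every booklet with zeros to the length of the longest one.
max_len <- max(lengths(booklets))
toy_books <- sapply(booklets, function(x) c(x, rep(0, max_len - length(x))))
```

In this toy case `toy_books` is a 5 x 3 matrix whose `B2` column reads 3, 4, 5, 6, 0.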
For the following example, we replicate the PISA 2012 booklet design to generate responses to the mathematics items for 1000 examinees. We draw the examinees' latent abilities from a standard normal distribution (mean 0, standard deviation 1) and then use `booklet_sample` to assign booklets to examinees.
n_examinees <- 1000
examinees_theta <- rnorm(n_examinees, 0, 1)
subj_booklets <- lsasim::booklet_sample(n_subj = n_examinees,
book_item_design = pisa_books)
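The later steps use the `subject` and `item` columns of `subj_booklets`, that is, a long-format record with one row per examinee-item pair. The base R sketch below builds such a record for a toy design by assigning booklets at random with equal probability. It is only meant to show the shape of the data; the sampling scheme, the `book` column name, and all toy objects are assumptions, not a description of how `booklet_sample` works.

```r
set.seed(2012)  # fix the seed so the sketch is reproducible

# Hypothetical booklets, each a vector of item numbers.
toy_books <- list(B1 = c(1, 2, 3), B2 = c(3, 4))
n_subj <- 6

# Assign one booklet per examinee, each booklet being equally likely (assumption).
assigned <- sample(names(toy_books), n_subj, replace = TRUE)

# Expand to long format: one row per examinee-item pair.
toy_long <- do.call(rbind, lapply(seq_len(n_subj), function(s) {
  data.frame(subject = s,
             book    = assigned[s],            # column name is an assumption
             item    = toy_books[[assigned[s]]])
}))
```

Each examinee contributes as many rows to `toy_long` as there are items in the assigned booklet.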
Because we have excluded the easy and UH blocks, not all 109 items will be used to generate responses. The `response_gen` function matches item information by unique item number, so we must subset the item bank to exclude the items in `pisa2012_math_item` that were not administered in the standard test booklets. To do so, we first obtain a sorted vector of the unique items administered to the test takers, `std_items`, and then use it to subset the item bank, `pisa2012_math_item`.
std_items <- sort(unique(subj_booklets$item))
pisa_std_items <- lsasim::pisa2012_math_item[std_items, ]
nrow(pisa_std_items)
## [1] 84
The resulting object, `pisa_std_items`, contains item information for the 84 items administered in the 13 standard test booklets. Because we are using a subset of items whose item numbers are not sequential, we pass the optional argument `item_no` to `response_gen`. Finally, we rename the item columns of `pisa_ir` to the PISA 2012 item names.
pisa_ir <- lsasim::response_gen(subject = subj_booklets$subject,
                                item = subj_booklets$item,
                                theta = examinees_theta,
                                item_no = pisa_std_items$item,
                                b_par = pisa_std_items$b,
                                d_par = list(pisa_std_items$d1,
                                             pisa_std_items$d2))
# All columns except the last (subject) are items; label them with the PISA item names.
colnames(pisa_ir)[1:(ncol(pisa_ir) - 1)] <- lsasim::pisa2012_math_item$item_name[std_items]
The result is a data frame with one row per examinee, one column per item holding the item score, and a final column with the subject's ID. We print the first five subjects' response data below. Items that were not administered to a subject are coded `NA`.
print(pisa_ir[1:5,])
## PM00FQ01 PM00GQ01 PM00KQ02 PM033Q01 PM034Q01T PM155Q01 PM155Q02D PM155Q03D
## 1 1 NA 0 NA NA NA NA NA
## 2 0 0 NA NA NA NA NA NA
## 3 1 NA NA NA NA NA NA NA
## 4 NA 0 NA NA NA NA NA NA
## 5 NA NA 0 NA NA NA NA NA
## PM155Q04T PM192Q01T PM273Q01T PM305Q01 PM406Q01 PM406Q02 PM408Q01T PM411Q01
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA NA 0 NA NA NA 1 NA
## 4 NA NA 1 NA NA NA 0 NA
## 5 NA NA NA NA NA NA NA NA
## PM411Q02 PM420Q01T PM423Q01 PM442Q02 PM446Q01 PM446Q02 PM447Q01 PM462Q01D
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 NA 1 NA NA 1 0 1 NA
## 4 NA 0 NA NA 1 0 1 NA
## 5 NA NA NA NA NA NA NA NA
## PM464Q01T PM474Q01 PM496Q01T PM496Q02 PM559Q01 PM564Q01 PM564Q02 PM571Q01
## 1 NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA
## 3 0 NA NA NA 1 NA NA NA
## 4 0 NA NA NA 1 NA NA NA
## 5 NA NA NA NA NA NA NA NA
## PM603Q01T PM800Q01 PM803Q01T PM828Q01 PM828Q02 PM828Q03 PM903Q01 PM903Q03
## 1 NA NA NA NA NA NA 2 0
## 2 NA NA NA NA NA NA 0 0
## 3 NA 1 NA 1 0 0 0 1
## 4 NA 1 NA 0 1 1 NA NA
## 5 NA NA NA NA NA NA NA NA
## PM905Q01T PM905Q02 PM906Q01 PM906Q02 PM909Q01 PM909Q02 PM909Q03 PM915Q01
## 1 1 1 1 2 NA NA NA 1
## 2 NA NA NA NA 1 1 1 NA
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA 0 1 0 NA
## 5 NA NA 0 0 NA NA NA 0
## PM915Q02 PM918Q01 PM918Q02 PM918Q05 PM919Q01 PM919Q02 PM923Q01 PM923Q03
## 1 1 1 1 1 1 1 1 1
## 2 NA 0 0 0 NA NA 0 0
## 3 NA 1 1 1 NA NA 1 1
## 4 NA NA NA NA NA NA NA NA
## 5 0 NA NA NA NA NA NA NA
## PM923Q04 PM924Q02 PM943Q01 PM943Q02 PM949Q01T PM949Q02T PM949Q03 PM953Q02
## 1 0 0 1 0 NA NA NA 0
## 2 0 1 NA NA 0 1 0 NA
## 3 0 1 NA NA NA NA NA NA
## 4 NA NA NA NA 1 1 2 NA
## 5 NA NA NA NA NA NA NA NA
## PM953Q03 PM953Q04D PM954Q01 PM954Q02 PM954Q04 PM955Q01 PM955Q02 PM955Q03
## 1 1 0 1 1 0 NA NA NA
## 2 NA NA NA NA NA 1 0 0
## 3 NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA 1 0 0
## 5 NA NA NA NA NA NA NA NA
## PM982Q01 PM982Q02 PM982Q03T PM982Q04 PM992Q01 PM992Q02 PM992Q03 PM995Q01
## 1 1 0 0 1 1 0 0 1
## 2 NA NA NA NA NA NA NA 1
## 3 NA NA NA NA NA NA NA 1
## 4 NA NA NA NA NA NA NA NA
## 5 0 0 0 1 1 0 0 NA
## PM995Q02 PM995Q03 PM998Q02 PM998Q04T subject
## 1 0 0 NA NA 1
## 2 0 0 1 0 2
## 3 0 0 NA NA 3
## 4 NA NA 1 0 4
## 5 NA NA NA NA 5
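The pattern in this output, scored responses where an item was administered and `NA` elsewhere, can be reproduced in miniature with base R. The sketch below simulates dichotomous Rasch responses for a toy long-format design and spreads them into a wide matrix. All objects are hypothetical, and the code illustrates the general idea rather than the internals of `response_gen`.

```r
set.seed(2012)  # fix the seed so the sketch is reproducible

# Hypothetical toy inputs: 5 examinees and 3 Rasch items.
toy_theta <- rnorm(5)
toy_b <- c(-0.5, 0, 0.5)

# Long-format administration record: one row per examinee-item pair.
toy_long <- data.frame(
  subject = c(1, 1, 2, 3, 3, 4, 5, 5),
  item    = c(1, 2, 2, 1, 3, 3, 2, 3)
)

# Rasch probability of a correct response, then one Bernoulli draw per row.
p_correct <- plogis(toy_theta[toy_long$subject] - toy_b[toy_long$item])
toy_long$response <- rbinom(nrow(toy_long), size = 1, prob = p_correct)

# Spread to wide format; cells that were never administered stay NA.
toy_wide <- matrix(NA_integer_, nrow = 5, ncol = 3,
                   dimnames = list(NULL, paste0("item", 1:3)))
toy_wide[cbind(toy_long$subject, toy_long$item)] <- toy_long$response

# Per-item number of responses and proportion correct, ignoring NA.
n_responses <- colSums(!is.na(toy_wide))
prop_correct <- colMeans(toy_wide, na.rm = TRUE)
```

Summaries of the simulated PISA responses would likewise need `na.rm = TRUE`, since each examinee sees only the items in one booklet.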