Generating item responses from the PISA 2012 test design

The lsasim package contains three datasets to aid in the generation of item responses from the mathematics portion of the PISA 2012 "standard" test booklets. Table 1 indicates the cluster rotation design used to form the 13 standard test booklets (and is the same as Figure 2.1 of the PISA 2012 Technical Report on page 31). (Note that PISA uses the term "cluster" whereas lsasim uses the term "block" to define a subset of items. Throughout this example, we use cluster when describing the PISA test design and block when describing lsasim functionality.) Content clusters are denoted by a letter and a number, where M, S, and R indicate math, science, and reading, respectively. The term "standard" is meant to distinguish between standard blocks and so-called "easy" booklet options in PISA. PM6A and PM7A represent standard versions of clusters 6 and 7 (relative to their easy versions, PM6B and PM7B).

Booklet ID	Cluster
B1	PM5	PS3	PM6A	PS2
B2	PS3	PR3	PM7A	PR2
B3	PR3	PM6A	PS1	PM3
B4	PM6A	PM7A	PR1	PM4
B5	PM7A	PS1	PM1	PM5
B6	PM1	PM2	PR2	PM6A
B7	PM2	PS2	PM3	PM7A
B8	PS2	PR2	PM4	PS1
B9	PR2	PM3	PM5	PR1
B10	PM3	PM4	PS3	PM1
B11	PM4	PM5	PR3	PM2
B12	PS1	PR1	PM2	PS3
B13	PR1	PM1	PS2	PR3
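As a quick sanity check on Table 1, the rotation can be typed into R and tallied. This is a minimal base-R sketch; the object names (design, cluster_counts, math_per_booklet) are illustrative and are not part of lsasim.

```r
# Table 1 transcribed by hand: one row per booklet, one column per position.
design <- matrix(c(
  "PM5",  "PS3",  "PM6A", "PS2",
  "PS3",  "PR3",  "PM7A", "PR2",
  "PR3",  "PM6A", "PS1",  "PM3",
  "PM6A", "PM7A", "PR1",  "PM4",
  "PM7A", "PS1",  "PM1",  "PM5",
  "PM1",  "PM2",  "PR2",  "PM6A",
  "PM2",  "PS2",  "PM3",  "PM7A",
  "PS2",  "PR2",  "PM4",  "PS1",
  "PR2",  "PM3",  "PM5",  "PR1",
  "PM3",  "PM4",  "PS3",  "PM1",
  "PM4",  "PM5",  "PR3",  "PM2",
  "PS1",  "PR1",  "PM2",  "PS3",
  "PR1",  "PM1",  "PS2",  "PR3"
), ncol = 4, byrow = TRUE,
dimnames = list(paste0("B", 1:13), paste0("pos", 1:4)))

# How often does each cluster appear across the 13 booklets?
cluster_counts <- table(design)

# How many mathematics clusters (names starting with "PM") does each booklet carry?
is_math <- matrix(grepl("^PM", design), nrow = nrow(design),
                  dimnames = dimnames(design))
math_per_booklet <- rowSums(is_math)

print(cluster_counts)
print(math_per_booklet)
```

Every cluster appears in four booklets, and each booklet carries between one and three mathematics clusters, which is why the mathematics booklets built later differ in length.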

PISA 2012 item parameters

The first dataset in the lsasim package is pisa2012_math_item, which contains the international item parameters for the 109 mathematics items used in PISA 2012 (see PISA 2012 Technical Report, pp. 406-409). Each row of the data frame contains an item name, an item number, a b parameter, and, for partial credit items, two d parameters. For example, items PM00FQ01 to PM155Q01 are calibrated using the Rasch model and involve only an item difficulty, b, so their d1 and d2 parameters are equal to 0. Items PM155Q02D and PM155Q03D are calibrated using a partial credit model, which involves b, d1, and d2.

lsasim::pisa2012_math_item[1:8, ]

##   item_name item        b       d1       d2
## 1  PM00FQ01    1  0.39027  0.00000  0.00000
## 2  PM00GQ01    2  2.75209  0.00000  0.00000
## 3  PM00KQ02    3  1.97967  0.00000  0.00000
## 4  PM033Q01    4 -1.44130  0.00000  0.00000
## 5 PM034Q01T    5  0.42603  0.00000  0.00000
## 6  PM155Q01    6 -0.84340  0.00000  0.00000
## 7 PM155Q02D    7 -0.44941  0.74491 -0.74491
## 8 PM155Q03D    8  1.56865 -1.56865  1.56865
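To make the role of the b parameter concrete, the sketch below copies the eight printed rows into a data frame, flags the partial credit items by their nonzero d parameters, and evaluates the textbook Rasch probability for the dichotomous items. The helper rasch_prob and the column p_theta0 are illustrative names; the formula is the standard Rasch form and is not taken from lsasim's internals, and the partial credit items are left out because their category probabilities depend on a sign convention not stated here.

```r
# First eight rows of pisa2012_math_item, copied from the printed output.
items <- data.frame(
  item_name = c("PM00FQ01", "PM00GQ01", "PM00KQ02", "PM033Q01",
                "PM034Q01T", "PM155Q01", "PM155Q02D", "PM155Q03D"),
  b  = c(0.39027, 2.75209, 1.97967, -1.44130,
         0.42603, -0.84340, -0.44941, 1.56865),
  d1 = c(0, 0, 0, 0, 0, 0, 0.74491, -1.56865),
  d2 = c(0, 0, 0, 0, 0, 0, -0.74491, 1.56865),
  stringsAsFactors = FALSE
)

# Items with any nonzero d parameter are the partial credit items.
items$partial_credit <- items$d1 != 0 | items$d2 != 0

# Rasch probability of a correct response: 1 / (1 + exp(-(theta - b))).
rasch_prob <- function(theta, b) plogis(theta - b)

# Success probability at theta = 0 for each dichotomous item.
dichotomous <- items[!items$partial_credit, ]
dichotomous$p_theta0 <- rasch_prob(theta = 0, b = dichotomous$b)

print(dichotomous[, c("item_name", "b", "p_theta0")])
```

For the easiest item shown, PM033Q01 (b = -1.44130), an examinee of average ability has a success probability of roughly 0.81 under this model.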

PISA 2012 item blocks

The second dataset in the lsasim package is pisa2012_math_block, which indicates how the 109 items correspond to the 10 item blocks. Each row of the data frame contains an item name and an item number, followed by a series of 0s and 1s to indicate which of the 10 blocks the item belongs to. For example, item PM00FQ01 is assigned to block 6, while item PM00GQ01 is assigned to block 5.

lsasim::pisa2012_math_block[1:10,]

##    item_name item_no block1 block2 block3 block4 block5 block6 block7 block8
## 1   PM00FQ01       1      0      0      0      0      0      1      0      0
## 2   PM00GQ01       2      0      0      0      0      1      0      0      0
## 3   PM00KQ02       3      0      0      0      1      0      0      0      0
## 4   PM033Q01       4      1      0      0      0      0      0      0      0
## 5  PM034Q01T       5      1      0      0      0      0      0      0      0
## 6   PM155Q01       6      1      0      0      0      0      0      0      0
## 7  PM155Q02D       7      1      0      0      0      0      0      0      0
## 8  PM155Q03D       8      1      0      0      0      0      0      0      0
## 9  PM155Q04T       9      1      0      0      0      0      0      0      0
## 10 PM192Q01T      10      0      1      0      0      0      0      0      0
##    block9 block10
## 1       0       0
## 2       0       0
## 3       0       0
## 4       0       0
## 5       0       0
## 6       0       0
## 7       0       0
## 8       0       0
## 9       0       0
## 10      0       0
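The 0/1 layout can be explored without the package. The sketch below rebuilds the ten printed rows as an indicator matrix and reads block membership back out of it; the object names are illustrative. Note that the one-block-per-item check holds for these ten rows only, so looking up items column by column with which() is the approach that generalizes.

```r
# Block membership of items 1-10, read off the printed rows above.
block_of_item <- c(6, 5, 4, 1, 1, 1, 1, 1, 1, 2)

# Rebuild the 0/1 indicator layout: one row per item, one column per block.
indicator <- matrix(0L, nrow = 10, ncol = 10,
                    dimnames = list(paste0("item", 1:10),
                                    paste0("block", 1:10)))
indicator[cbind(1:10, block_of_item)] <- 1L

# In these ten rows, every item sits in exactly one block.
stopifnot(all(rowSums(indicator) == 1))

# Recover the block index per item, and list the items in block 1.
recovered <- max.col(indicator, ties.method = "first")
items_in_block1 <- which(indicator[, "block1"] == 1)

print(recovered)
print(items_in_block1)
```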

Because only the indicator component of pisa2012_math_block is needed, we subset it by dropping the first two columns (the item name and item number). The PISA 2012 item parameters and the corresponding block design matrix are then used to create the block assignment matrix.

pisa2012_math_block_mat <- lsasim::pisa2012_math_block[, -c(1:2)]

pisa_blocks <- lsasim::block_design(item_parameters = lsasim::pisa2012_math_item,
                                    item_block_matrix = pisa2012_math_block_mat)

Printing block_descriptives provides a quick check of the number of items and the average difficulty of each block. Each block has 11, 12, or 13 items. Blocks b6 and b8 reflect clusters PM6A and PM7A, the optional standard clusters, while blocks b7 and b9 reflect clusters PM6B and PM7B, the optional easier clusters. Block b10 reflects cluster PMUH, a block of items selected from the Main Survey items for their suitability for students with special educational needs. Blocks b7, b9, and b10 were not part of the standard test booklets.

print(pisa_blocks$block_descriptives)

##     block length average difficulty
## b1            12              0.096
## b2            11             -0.007
## b3            12              0.012
## b4            12              0.025
## b5            12              0.067
## b6            13              0.092
## b7            13             -0.199
## b8            12              0.158
## b9            12             -0.238
## b10           12             -0.306
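A few quick calculations on the printed descriptives help interpret them. The sketch below retypes the table by hand (the object names are illustrative), totals the block lengths, and compares the standard and easy versions of clusters 6 and 7.

```r
# block_descriptives retyped from the printed output.
desc <- data.frame(
  block = paste0("b", 1:10),
  length = c(12, 11, 12, 12, 12, 13, 13, 12, 12, 12),
  avg_difficulty = c(0.096, -0.007, 0.012, 0.025, 0.067,
                     0.092, -0.199, 0.158, -0.238, -0.306)
)

# Total item slots across all ten blocks. A total above 109 means
# some items belong to more than one block.
total_slots <- sum(desc$length)

# Item slots in the seven blocks used by the standard booklets
# (all blocks except b7, b9, and b10).
standard_slots <- sum(desc$length[c(1:6, 8)])

# Difficulty gap between the standard and easy versions of each cluster.
gap_cluster6 <- desc$avg_difficulty[6] - desc$avg_difficulty[7]  # PM6A vs PM6B
gap_cluster7 <- desc$avg_difficulty[8] - desc$avg_difficulty[9]  # PM7A vs PM7B

print(c(total = total_slots, standard = standard_slots))
print(round(c(cluster6 = gap_cluster6, cluster7 = gap_cluster7), 3))
```

The ten blocks hold 121 item slots for 109 items, so some items are shared across blocks, while the seven standard blocks hold 84 slots. The easy versions are about 0.29 and 0.40 logits easier on average than their standard counterparts.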

PISA 2012 test booklets

The third dataset in the lsasim package, pisa2012_math_booklet, indicates which item blocks correspond to which of the 13 standard test booklets in PISA 2012. Each row indicates a booklet and each column indicates an item block. Note that the columns b7, b9, and b10 contain all 0s. This is because pisa2012_math_booklet was designed to construct the 13 standard booklets, which do not include the easy blocks (b7, b9) or the UH block (b10).

print(lsasim::pisa2012_math_booklet)

##    booklet b1 b2 b3 b4 b5 b6 b7 b8 b9
## 1       B1  0  0  0  0  1  1  0  0  0
## 2       B2  0  0  0  0  0  0  0  1  0
## 3       B3  0  0  1  0  0  1  0  0  0
## 4       B4  0  0  0  1  0  1  0  1  0
## 5       B5  1  0  0  0  1  0  0  1  0
## 6       B6  1  1  0  0  0  1  0  0  0
## 7       B7  0  1  1  0  0  0  0  1  0
## 8       B8  0  0  0  1  0  0  0  0  0
## 9       B9  0  0  1  0  1  0  0  0  0
## 10     B10  1  0  1  1  0  0  0  0  0
## 11     B11  0  1  0  1  1  0  0  0  0
## 12     B12  0  1  0  0  0  0  0  0  0
## 13     B13  1  0  0  0  0  0  0  0  0

Again, we subset the data frame so that it meets the requirements of the book_design argument of the booklet_design function. From here, we can use it with block_assignment from pisa_blocks to create the 13 booklets.

pisa2012_math_book_mat <- lsasim::pisa2012_math_booklet[, -1]

pisa_books <- lsasim::booklet_design(item_block_assignment = pisa_blocks$block_assignment,
                                     book_design = pisa2012_math_book_mat)

In the output below, booklet B1 contains the items from block 5 (PM5) and block 6 (PM6A), while booklet B2 contains the items from block 8 (PM7A). Booklets with fewer items than the longest booklet are padded with 0s.

print(pisa_books)

##      B1 B2  B3  B4  B5  B6 B7  B8  B9 B10 B11 B12 B13
## i1    2 41  11   3   4   4 10   3  11   4  10  10   4
## i2   45 42  15  43   5   5 12  43  15   5  12  12   5
## i3   46 53  18  44   6   6 13  44  18   6  13  13   6
## i4   47 54  21  48   7   7 14  48  21   7  14  14   7
## i5   73 68  22  49   8   8 19  49  22   8  19  19   8
## i6   74 69  23  93   9   9 27  93  23   9  27  27   9
## i7   75 76  25  94  16  16 28  94  25  16  28  28  16
## i8   82 77  29  95  17  17 30  95  29  17  30  30  17
## i9   83 78  34  96  20  20 31  96  34  20  31  31  20
## i10  84 79  36 102  24  24 32 102  36  24  32  32  24
## i11 108 80  37 103  26  26 33 103  37  26  33  33  26
## i12 109 81  38 104  35  35 11 104  38  35   3   0  35
## i13   1  0   1   1   2  10 15   0   2  11  43   0   0
## i14  39  0  39  39  45  12 18   0  45  15  44   0   0
## i15  40  0  40  40  46  13 21   0  46  18  48   0   0
## i16  50  0  50  50  47  14 22   0  47  21  49   0   0
## i17  51  0  51  51  73  19 23   0  73  22  93   0   0
## i18  52  0  52  52  74  27 25   0  74  23  94   0   0
## i19  55  0  55  55  75  28 29   0  75  25  95   0   0
## i20  56  0  56  56  82  30 34   0  82  29  96   0   0
## i21  57  0  57  57  83  31 36   0  83  34 102   0   0
## i22  58  0  58  58  84  32 37   0  84  36 103   0   0
## i23 105  0 105 105 108  33 38   0 108  37 104   0   0
## i24 106  0 106 106 109   1 41   0 109  38   2   0   0
## i25 107  0 107 107  41  39 42   0   0   3  45   0   0
## i26   0  0   0  41  42  40 53   0   0  43  46   0   0
## i27   0  0   0  42  53  50 54   0   0  44  47   0   0
## i28   0  0   0  53  54  51 68   0   0  48  73   0   0
## i29   0  0   0  54  68  52 69   0   0  49  74   0   0
## i30   0  0   0  68  69  55 76   0   0  93  75   0   0
## i31   0  0   0  69  76  56 77   0   0  94  82   0   0
## i32   0  0   0  76  77  57 78   0   0  95  83   0   0
## i33   0  0   0  77  78  58 79   0   0  96  84   0   0
## i34   0  0   0  78  79 105 80   0   0 102 108   0   0
## i35   0  0   0  79  80 106 81   0   0 103 109   0   0
## i36   0  0   0  80  81 107  0   0   0 104   0   0   0
## i37   0  0   0  81   0   0  0   0   0   0   0   0   0
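The number of nonzero entries in each column can be cross-checked against the block lengths. The sketch below retypes the printed booklet matrix (columns b1 to b9) and the block lengths from block_descriptives, then multiplies them; the object names are illustrative.

```r
# Rows B1..B13 of pisa2012_math_booklet as printed (columns b1..b9).
book_mat <- matrix(c(
  0, 0, 0, 0, 1, 1, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 1, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 1, 0, 1, 0, 1, 0,
  1, 0, 0, 0, 1, 0, 0, 1, 0,
  1, 1, 0, 0, 0, 1, 0, 0, 0,
  0, 1, 1, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 1, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 1, 0, 0, 0, 0,
  1, 0, 1, 1, 0, 0, 0, 0, 0,
  0, 1, 0, 1, 1, 0, 0, 0, 0,
  0, 1, 0, 0, 0, 0, 0, 0, 0,
  1, 0, 0, 0, 0, 0, 0, 0, 0
), nrow = 13, byrow = TRUE,
dimnames = list(paste0("B", 1:13), paste0("b", 1:9)))

# Block lengths for b1..b9 from block_descriptives.
block_length <- c(12, 11, 12, 12, 12, 13, 13, 12, 12)

# Items per booklet = sum of the lengths of the blocks it contains.
booklet_length <- drop(book_mat %*% block_length)

print(booklet_length)
```

The longest booklet, B4, has 37 items (blocks 4, 6, and 8), which matches the 37 rows of the printed matrix; B12, with only block 2, has 11.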

In the following example, we replicate the PISA 2012 booklet design to generate responses to the mathematics items for 1000 examinees. Examinees' latent abilities are drawn from a standard normal distribution (mean 0, standard deviation 1).

n_examinees <- 1000

examinees_theta <- rnorm(n_examinees, 0, 1)

subj_booklets <- lsasim::booklet_sample(n_subj = n_examinees, 
                                        book_item_design = pisa_books)
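Because both the abilities and the booklet assignment are random, results differ from run to run unless a seed is set. The base-R sketch below shows the idea with an arbitrary seed and a simple uniform booklet draw; the draw is an assumption used for illustration only, and lsasim::booklet_sample may assign booklets differently.

```r
set.seed(2012)  # arbitrary seed so the draws can be reproduced

n_examinees <- 1000

# Latent abilities from a standard normal distribution.
theta <- rnorm(n_examinees, mean = 0, sd = 1)

# Illustrative uniform assignment of one of the 13 booklets per examinee.
booklet <- sample(paste0("B", 1:13), size = n_examinees, replace = TRUE)

print(round(c(mean = mean(theta), sd = sd(theta)), 2))
print(table(booklet))
```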

Because we have excluded the easy booklets, not all 109 items are used to generate responses. The response_gen function matches item information based on a unique item number, so we must subset the item bank to exclude the items in pisa2012_math_item that were not administered in the standard test booklets. To do this, we first obtain a sorted vector of the unique items administered to the test takers, std_items, and then use it to subset the item bank, pisa2012_math_item.

std_items <- sort(unique(subj_booklets$item))

pisa_std_items <- lsasim::pisa2012_math_item[std_items, ]

nrow(pisa_std_items)

## [1] 84
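The subsetting step indexes the item bank by row position, which works because each item's number equals its row number. The toy sketch below, using made-up data (toy_bank and toy_admin are hypothetical), shows the same pattern alongside a match() lookup that does not depend on row order.

```r
# Toy item bank: the item number equals the row position,
# as in pisa2012_math_item.
toy_bank <- data.frame(item = 1:6,
                       b = c(0.4, 2.8, 2.0, -1.4, 0.4, -0.8))

# Toy long-format administration records: items 3 and 5 never appear.
toy_admin <- data.frame(subject = c(1, 1, 2, 2, 3, 3),
                        item    = c(6, 1, 2, 4, 1, 6))

# Sorted vector of the distinct items that were administered.
administered <- sort(unique(toy_admin$item))

# Row-position indexing: correct only while item numbers match row numbers.
by_position <- toy_bank[administered, ]

# Explicit lookup of item numbers: does not rely on row order.
by_lookup <- toy_bank[match(administered, toy_bank$item), ]

print(administered)
print(by_position)
stopifnot(all(by_position$item == by_lookup$item))
```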

The resulting object, pisa_std_items, contains item information for the 84 items administered in the 13 standard test booklets. Because we are using a subset of items whose item numbers are not sequential, we use the optional argument item_no when calling the response_gen function. Finally, we rename the item columns of pisa_ir with the PISA 2012 item names.

# item_no links each row of the item parameters to its (non-sequential) item number
pisa_ir <- lsasim::response_gen(subject = subj_booklets$subject,
                                item = subj_booklets$item,
                                theta = examinees_theta,
                                item_no = pisa_std_items$item,
                                b_par = pisa_std_items$b,
                                d_par = list(pisa_std_items$d1,
                                             pisa_std_items$d2))

# The last column of pisa_ir holds the subject ID, so rename every column but the last
colnames(pisa_ir)[1:(ncol(pisa_ir)-1)] <- pisa_std_items$item_name

The result is a data frame containing each subject's score on every item, along with the subject's ID. The first five subjects' response data are printed below. Items that were not administered to a subject are coded as NA.

print(pisa_ir[1:5,])

##   PM00FQ01 PM00GQ01 PM00KQ02 PM033Q01 PM034Q01T PM155Q01 PM155Q02D PM155Q03D
## 1        1       NA        0       NA        NA       NA        NA        NA
## 2        0        0       NA       NA        NA       NA        NA        NA
## 3        1       NA       NA       NA        NA       NA        NA        NA
## 4       NA        0       NA       NA        NA       NA        NA        NA
## 5       NA       NA        0       NA        NA       NA        NA        NA
##   PM155Q04T PM192Q01T PM273Q01T PM305Q01 PM406Q01 PM406Q02 PM408Q01T PM411Q01
## 1        NA        NA        NA       NA       NA       NA        NA       NA
## 2        NA        NA        NA       NA       NA       NA        NA       NA
## 3        NA        NA         0       NA       NA       NA         1       NA
## 4        NA        NA         1       NA       NA       NA         0       NA
## 5        NA        NA        NA       NA       NA       NA        NA       NA
##   PM411Q02 PM420Q01T PM423Q01 PM442Q02 PM446Q01 PM446Q02 PM447Q01 PM462Q01D
## 1       NA        NA       NA       NA       NA       NA       NA        NA
## 2       NA        NA       NA       NA       NA       NA       NA        NA
## 3       NA         1       NA       NA        1        0        1        NA
## 4       NA         0       NA       NA        1        0        1        NA
## 5       NA        NA       NA       NA       NA       NA       NA        NA
##   PM464Q01T PM474Q01 PM496Q01T PM496Q02 PM559Q01 PM564Q01 PM564Q02 PM571Q01
## 1        NA       NA        NA       NA       NA       NA       NA       NA
## 2        NA       NA        NA       NA       NA       NA       NA       NA
## 3         0       NA        NA       NA        1       NA       NA       NA
## 4         0       NA        NA       NA        1       NA       NA       NA
## 5        NA       NA        NA       NA       NA       NA       NA       NA
##   PM603Q01T PM800Q01 PM803Q01T PM828Q01 PM828Q02 PM828Q03 PM903Q01 PM903Q03
## 1        NA       NA        NA       NA       NA       NA        2        0
## 2        NA       NA        NA       NA       NA       NA        0        0
## 3        NA        1        NA        1        0        0        0        1
## 4        NA        1        NA        0        1        1       NA       NA
## 5        NA       NA        NA       NA       NA       NA       NA       NA
##   PM905Q01T PM905Q02 PM906Q01 PM906Q02 PM909Q01 PM909Q02 PM909Q03 PM915Q01
## 1         1        1        1        2       NA       NA       NA        1
## 2        NA       NA       NA       NA        1        1        1       NA
## 3        NA       NA       NA       NA       NA       NA       NA       NA
## 4        NA       NA       NA       NA        0        1        0       NA
## 5        NA       NA        0        0       NA       NA       NA        0
##   PM915Q02 PM918Q01 PM918Q02 PM918Q05 PM919Q01 PM919Q02 PM923Q01 PM923Q03
## 1        1        1        1        1        1        1        1        1
## 2       NA        0        0        0       NA       NA        0        0
## 3       NA        1        1        1       NA       NA        1        1
## 4       NA       NA       NA       NA       NA       NA       NA       NA
## 5        0       NA       NA       NA       NA       NA       NA       NA
##   PM923Q04 PM924Q02 PM943Q01 PM943Q02 PM949Q01T PM949Q02T PM949Q03 PM953Q02
## 1        0        0        1        0        NA        NA       NA        0
## 2        0        1       NA       NA         0         1        0       NA
## 3        0        1       NA       NA        NA        NA       NA       NA
## 4       NA       NA       NA       NA         1         1        2       NA
## 5       NA       NA       NA       NA        NA        NA       NA       NA
##   PM953Q03 PM953Q04D PM954Q01 PM954Q02 PM954Q04 PM955Q01 PM955Q02 PM955Q03
## 1        1         0        1        1        0       NA       NA       NA
## 2       NA        NA       NA       NA       NA        1        0        0
## 3       NA        NA       NA       NA       NA       NA       NA       NA
## 4       NA        NA       NA       NA       NA        1        0        0
## 5       NA        NA       NA       NA       NA       NA       NA       NA
##   PM982Q01 PM982Q02 PM982Q03T PM982Q04 PM992Q01 PM992Q02 PM992Q03 PM995Q01
## 1        1        0         0        1        1        0        0        1
## 2       NA       NA        NA       NA       NA       NA       NA        1
## 3       NA       NA        NA       NA       NA       NA       NA        1
## 4       NA       NA        NA       NA       NA       NA       NA       NA
## 5        0        0         0        1        1        0        0       NA
##   PM995Q02 PM995Q03 PM998Q02 PM998Q04T subject
## 1        0        0       NA        NA       1
## 2        0        0        1         0       2
## 3        0        0       NA        NA       3
## 4       NA       NA        1         0       4
## 5       NA       NA       NA        NA       5
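Because most cells are NA by design, summaries of the response data need to ignore missing values. The sketch below copies three item columns for the five printed subjects and computes, per item, how many examinees saw it and their mean score; the object names are illustrative. For dichotomous items the mean score is the proportion correct, whereas for partial credit items (scored 0, 1, or 2) it is an average score.

```r
# Three item columns for the first five subjects, copied from the output.
resp <- data.frame(
  PM00FQ01 = c(1, 0, 1, NA, NA),
  PM00GQ01 = c(NA, 0, NA, 0, NA),
  PM00KQ02 = c(0, NA, NA, NA, 0),
  subject  = 1:5
)

# Every column except the subject ID is an item.
item_cols <- setdiff(names(resp), "subject")

# Number of examinees who were administered each item.
n_administered <- colSums(!is.na(resp[item_cols]))

# Mean score among the examinees who saw the item.
mean_score <- colMeans(resp[item_cols], na.rm = TRUE)

print(n_administered)
print(round(mean_score, 3))
```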