TestDesign

An optimal test design approach to constructing fixed and adaptive tests in R

Seung W. Choi

University of Texas at Austin

Sangdon Lim

University of Texas at Austin

Wim J. van der Linden

University of Twente

1/29/2021

Abstract

Fixed tests and computerized adaptive testing (CAT) coexist in many testing programs and are often used interchangeably on the premise that both testing formats meet the same test specifications. In conventional CAT, however, items are selected through computer algorithms to meet primarily statistical criteria, whereas fixed forms are often created focusing heavily on content, non-statistical, and practical requirements. Founded on the optimal test design framework, the shadow-test approach to CAT and its generalization can allow for constructing fixed and adaptive test forms to the same test specifications with complex sets of constraints. This approach can render a variety of testing formats with different levels of adaptation and relative efficiency. TestDesign is a package implemented in R to allow for constructing both static and adaptive test forms to the same test specifications based on the framework.

1 Optimal Test Design

The shadow-test approach to CAT provides a flexible framework for adaptive testing solutions requiring a complex set of constraints (van der Linden and Reese 1998).

Under the universal shadow-test assembler framework (van der Linden and Diao 2014), many testing formats can be assembled as special cases of the approach, including fixed, multi-stage, and fully adaptive tests, as well as their mixtures.

The key advantage is that the approach enforces the same set of constraints across different testing formats.
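To make the idea of constrained optimal assembly concrete, the following base-R sketch assembles a 4-item form from a hypothetical 8-item pool by brute-force enumeration. The pool, the 2PL parameters, and the content constraint are all invented for illustration, and enumeration merely stands in for the MIP solver that a real assembly would use; it is not how TestDesign works internally.

```r
# Hypothetical 8-item pool with 2PL parameters and a content area
# (illustration only; not taken from the package data)
pool <- data.frame(
  id      = 1:8,
  a       = c(1.2, 0.8, 1.5, 1.0, 0.9, 1.4, 1.1, 0.7),
  b       = c(-1.0, -0.5, 0.0, 0.3, 0.8, 1.2, -0.2, 0.1),
  content = c("A", "A", "A", "A", "B", "B", "B", "B")
)

# Fisher information of a 2PL item at trait level theta
info_2pl <- function(a, b, theta) {
  p <- plogis(a * (theta - b))
  a^2 * p * (1 - p)
}

# Enumerate every 4-item form; each column of 'forms' is one candidate
forms <- combn(nrow(pool), 4)

# Constraint: exactly two items from each content area
feasible <- apply(forms, 2, function(idx) sum(pool$content[idx] == "A") == 2)

# Objective: total information at theta = 0
item_info <- info_2pl(pool$a, pool$b, theta = 0)
objective <- apply(forms, 2, function(idx) sum(item_info[idx]))

# Exclude infeasible forms, then pick the best remaining one
objective[!feasible] <- -Inf
best_form <- forms[, which.max(objective)]
pool[best_form, ]
```

The same constraint check could be reused unchanged whether the form is assembled once (fixed) or reassembled during testing (adaptive), which is the point of sharing one set of specifications.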

1.1 The `TestDesign` Package

Based on the shadow-test approach to CAT, TestDesign implements the optimal test design framework (van der Linden and Reese 1998) to both assemble fixed-form tests and run simulations of their adaptive counterparts, with extensive customizability.

TestDesign is unique in that it can assemble both static and adaptive test forms subject to the same test specification based on the optimal test design framework.

TestDesign supports item pools that include a mixture of dichotomous and polytomous items calibrated according to common IRT models.

TestDesign implements the universal shadow-test assembler framework (van der Linden and Diao 2014) allowing for various levels of adaptivity:

Fully Adaptive
LOFT or Fixed Form
2-Stage MST
3-Stage MST
Hybrid (Adaptive + Fixed)
Hybrid (Fixed + Adaptive)

Universal Shadow-Test

1.2 MIP Solver Support

The current version can work with the following open-source and commercial MIP solvers:

lpSolve (Berkelaar and others 2020)
Rglpk (Theussl and Hornik 2019)
Rsymphony (Harter, Hornik, and Theussl 2017)
lpsymphony (Kim 2019)
gurobi (Gurobi Optimization, LLC 2019)
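Because these solver packages are installed separately, it can help to check which of them are present in the current R library before running an assembly. The following is a minimal base-R sketch; it only tests whether each package can be loaded and says nothing about how TestDesign itself chooses a solver.

```r
# Solver packages named in the list above
solver_packages <- c("lpSolve", "Rglpk", "Rsymphony", "lpsymphony", "gurobi")

# requireNamespace() returns TRUE when the package namespace can be loaded
# and FALSE otherwise
solver_available <- vapply(
  solver_packages,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

solver_available
```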

1.3 IRT Model Support

Using the S4 object-oriented programming (OOP) system in R, TestDesign provides a collection of classes and methods for various IRT models:

1-parameter logistic model
2-parameter logistic model
3-parameter logistic model
partial credit model
generalized partial credit model
graded response model
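For the dichotomous models in this list, the response probability and item information can be written in a few lines of base R. The sketch below is independent of the package's S4 classes; the function names `prob_3pl()` and `info_3pl()` are hypothetical, and a logistic metric with scaling constant D = 1 is assumed. The polytomous models are not covered here.

```r
# Probability of a correct response under the 3PL model.
# Assumption: logistic metric with scaling constant D = 1.
prob_3pl <- function(theta, a, b, c = 0) {
  c + (1 - c) * plogis(a * (theta - b))
}

# Fisher information of a 3PL item at theta
info_3pl <- function(theta, a, b, c = 0) {
  p <- prob_3pl(theta, a, b, c)
  a^2 * ((p - c) / (1 - c))^2 * (1 - p) / p
}

theta <- seq(-3, 3, by = 1)

# Hypothetical item with slope 0.296, difficulty -0.607, guessing 0.199
prob_3pl(theta, a = 0.296, b = -0.607, c = 0.199)
info_3pl(theta, a = 0.296, b = -0.607, c = 0.199)

# With c = 0 the function reduces to the 2PL;
# with a = 1 and c = 0 it reduces to the 1PL
prob_3pl(0, a = 1, b = 0)
```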

1.4 Modular Design

TestDesign is built on two primary modules:

Static() for fixed-form assembly
Shadow() for adaptive assembly

The primary modules share the same input data components prepared sequentially by the following loading functions:

loadItemPool() to load an item pool
loadItemAttrib() to load item attributes
loadStAttrib() to load optional stimulus attributes
loadConstraints() to load test constraints

2 Preparing Input Data

The input data components required for test assembly are an item pool containing item parameter estimates, item attributes, and test constraints. Stimulus-based test assembly also requires stimulus attributes. The data components are:

Item pool: Item-level IRT model information (e.g., 3PL) and the parameter estimates.
Item attributes: Item-level attributes, which may include content hierarchies, objectives, item types, response formats, etc.
Stimulus attributes (optional): Stimulus-level attributes that describe qualitative or quantitative characteristics of each stimulus, such as the length of passage (e.g., the word count), the mode or genre of writing, the topic area, the inclusion of graphics, etc.
Constraints: Examples of constraints to be imposed on test assembly include the test length, the number of items/stimuli for any sub-category of items/stimuli, enemy items, etc.

2.1 Item pool

An item pool is prepared as a plain csv file. The following example file contains a mix of 1000 dichotomous and polytomous items.

itempool_science_data <- read.csv(file.path(find.package("TestDesign"), "/extdata/itempool_science_1000.csv"), header = TRUE)
datatable(itempool_science_data, rownames = FALSE)

2.2 Item attributes

An item attribute file is prepared as a plain csv file, containing various item-level characteristics.

itemattrib_science_data <- read.csv(file.path(find.package("TestDesign"), "/extdata/itemattrib_science_1000.csv"),  header = TRUE, as.is = TRUE)
datatable(itemattrib_science_data, rownames = FALSE)

2.3 Constraints

A constraints file is prepared as a csv file. The following example contains 36 constraints.

constraints_science_data <- read.csv(file.path(find.package("TestDesign"), "/extdata/constraints_science_1000.csv"), header = TRUE, as.is = TRUE)
datatable(constraints_science_data, rownames = FALSE)

3 Loading Data

The loading functions can load csv files directly or data.frame objects created from them. Using the latter option, the following sections illustrate how to load and create input data components.

3.1 `loadItemPool()`

This creates an item_pool object.

itempool_science <- loadItemPool(itempool_science_data)
summary(itempool_science)

## Item pool
##   # of items :  1000
##     item_3PL :   918
##     item_GPC :    82
##       has SE : FALSE

head(itempool_science@parms)

## [[1]]
## Three-parameter logistic model (item_3PL)
##   Slope          : 0.2961202 
##   Difficulty     : -0.6070774 
##   Guessing       : 0.1987126 
## 
## [[2]]
## Three-parameter logistic model (item_3PL)
##   Slope          : 0.9188548 
##   Difficulty     : -2.94024 
##   Guessing       : 0.2708818 
## 
## [[3]]
## Three-parameter logistic model (item_3PL)
##   Slope          : 1.136193 
##   Difficulty     : -0.4767972 
##   Guessing       : 0.2061712 
## 
## [[4]]
## Three-parameter logistic model (item_3PL)
##   Slope          : 1.1097 
##   Difficulty     : -1.733667 
##   Guessing       : 0.1672703 
## 
## [[5]]
## Three-parameter logistic model (item_3PL)
##   Slope          : 0.9155325 
##   Difficulty     : -0.2742345 
##   Guessing       : 0.1377026 
## 
## [[6]]
## Three-parameter logistic model (item_3PL)
##   Slope          : 0.8146092 
##   Difficulty     : -1.181495 
##   Guessing       : 0.2755008

3.2 `loadItemAttrib()`

This creates an item_attrib object. The second argument references the item_pool object created above.

itemattrib_science <- loadItemAttrib(itemattrib_science_data, itempool_science)
summary(itemattrib_science)

## Item attributes
##   # of attributes : 9
##             INDEX : (1000 levels)
##                ID : (1000 levels)
##             LEVEL : 3 4 5
##          STANDARD : 1 2 3 4
##         OBJECTIVE : (28 levels)
##               DOK : 1 2 3
##              TYPE : DRAG EQTN FILL GRAPH HOTS MATCH SRMU SRSI
##            PVALUE : (1000 levels)
##             PTBIS : (1000 levels)

3.3 `loadConstraints()`

This creates a constraints object. The third argument references the item_attrib object created above.

constraints_science <- loadConstraints(constraints_science_data, itempool_science, itemattrib_science)
datatable(constraints_science@constraints, rownames = FALSE)

4 Test Assembly

We will first illustrate fixed-form assembly problems with Static() followed by examples of adaptive assembly using Shadow().

4.1 Fixed-Form Assembly with `Static()`

Static() provides three item selection methods:

Maximum Information (item_selection = "MAXINFO")
Target Information (item_selection = "TIF")
Target Test Characteristic Curve (item_selection = "TCC")

The helper function createStaticTestConfig() allows the user to create a configuration object, config_Static, and to specify the item_selection method of choice and related options.

4.1.1 Target Information Function

Here, we illustrate a fixed-form assembly with the Target Information Function option.

cfg_fixed <- createStaticTestConfig(
  item_selection = list(
    method          = "TIF",
    target_location = c(-1, 1),
    target_value    = c(8, 10)
  )
)

fixed_science <- Static(cfg_fixed, constraints_science)
datatable(fixed_science@selected, rownames = FALSE)

datatable(fixed_science@achieved, rownames = FALSE)

plot(fixed_science)
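The target values in the configuration above (8 at theta = -1 and 10 at theta = 1) are test information values that the assembled form should approximate. As a rough illustration of what is being compared, the base-R sketch below computes the test information of a hypothetical 30-item 2PL form at the two target locations and reports the deviations. The item parameters are simulated, and the largest absolute deviation is only one common way to summarize closeness; the criterion Static() actually optimizes may differ.

```r
set.seed(1)

# Hypothetical 30-item form with simulated 2PL parameters (illustration only)
a <- runif(30, 0.8, 2.0)
b <- rnorm(30)

# Test information at each theta = sum of the item information values
test_info <- function(theta, a, b) {
  vapply(theta, function(t) {
    p <- plogis(a * (t - b))
    sum(a^2 * p * (1 - p))
  }, numeric(1))
}

# Targets copied from the configuration above
target_location <- c(-1, 1)
target_value    <- c(8, 10)

achieved  <- test_info(target_location, a, b)
deviation <- achieved - target_value

data.frame(target_location, target_value, achieved, deviation)

# Largest absolute deviation from the target across locations
max(abs(deviation))
```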

4.1.2 Target Test Characteristic Curve

Here, we illustrate a fixed-form assembly with the Target Test Characteristic Curve option:

cfg_fixed <- createStaticTestConfig(
  item_selection = list(
    method          = "TCC",
    target_location = c(-1, 0, 1),
    target_value    = c(10, 15, 20)
  )
)

fixed_science <- Static(cfg_fixed, constraints_science)
datatable(fixed_science@selected, rownames = FALSE)

datatable(fixed_science@achieved, rownames = FALSE)

plot(fixed_science)
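For dichotomous items, the test characteristic curve gives the expected number-correct score at each trait level, so the targets above (10, 15, and 20 at theta = -1, 0, and 1) are expected scores. The base-R sketch below evaluates the TCC of a hypothetical 30-item 3PL form at the target locations. The parameters are simulated and the helper `tcc()` is an illustrative name, not a package function.

```r
set.seed(2)

# Hypothetical 30-item dichotomous form with simulated 3PL parameters
a <- runif(30, 0.8, 2.0)   # slopes
b <- rnorm(30)             # difficulties
g <- runif(30, 0.1, 0.25)  # guessing parameters

# TCC: expected number-correct score at each theta
tcc <- function(theta, a, b, g) {
  vapply(theta, function(t) sum(g + (1 - g) * plogis(a * (t - b))), numeric(1))
}

# Targets copied from the configuration above
target_location <- c(-1, 0, 1)
target_value    <- c(10, 15, 20)

achieved <- tcc(target_location, a, b, g)
data.frame(target_location, target_value, achieved,
           deviation = achieved - target_value)
```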

4.1.3 Maximum Information

Here, we illustrate a fixed-form assembly with the Maximum Information option:

cfg_fixed <- createStaticTestConfig(
  item_selection = list(
    method          = "MAXINFO",
    target_location = c(-1, 1)
  )
)

fixed_science <- Static(cfg_fixed, constraints_science)
datatable(fixed_science@selected)

datatable(fixed_science@achieved)

plot(fixed_science)

4.2 Adaptive Assembly with `Shadow()`

Shadow() provides a flexible mechanism to control the level of adaptivity in CAT to render different test formats with the same test specifications. Although maximum adaptivity is realized in fully adaptive testing whereby the shadow test is reassembled upon administering each item, the freezing/refreshing mechanism (van der Linden and Diao 2014) allows for assembling any conceivable testing format as a special case of the shadow-test approach. A few common test formats with reduced levels of adaptivity include:

Fixed - a single shadow test constructed targeting a specific trait level(s) to be administered in whole to all examinees
LOFT - an individualized shadow test constructed for each examinee targeting a specific location on the ability continuum or the examinee’s score from a previous administration and presented in its entirety
On-the-fly MST - a common shadow test constructed for a group of examinees to be reassembled at some predetermined points into testing (or when the change in trait estimate is greater than a certain threshold or both) to be optimized for each examinee’s updated trait estimate
Any hybrids of the above
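The fully adaptive case can be sketched in base R as a loop that reassembles a full-length shadow test before each item. Everything below is a toy under stated assumptions: a hypothetical 10-item 2PL pool, a 4-item test with two items per content area, brute-force enumeration in place of a MIP solver, and a simple grid-based EAP estimator. It is not the package's implementation.

```r
set.seed(123)

# Hypothetical 10-item pool (2PL) with two content areas; not package data
pool <- data.frame(
  a       = c(1.2, 0.8, 1.5, 1.0, 0.9, 1.4, 1.1, 0.7, 1.3, 1.6),
  b       = c(-1.5, -0.5, 0.0, 0.3, 0.8, 1.2, -0.2, 0.1, -1.0, 1.8),
  content = rep(c("A", "B"), each = 5)
)
test_length <- 4
true_theta  <- 1

prob <- function(theta, a, b) plogis(a * (theta - b))
info <- function(theta, a, b) {
  p <- prob(theta, a, b)
  a^2 * p * (1 - p)
}

# All candidate forms meeting the content constraint (two items per area)
forms <- combn(nrow(pool), test_length)
ok    <- apply(forms, 2, function(i) sum(pool$content[i] == "A") == 2)
forms <- forms[, ok, drop = FALSE]

# EAP estimate of theta on a grid with a standard normal prior
grid <- seq(-4, 4, by = 0.1)
eap <- function(items, resp) {
  post <- dnorm(grid)
  for (k in seq_along(items)) {
    p <- prob(grid, pool$a[items[k]], pool$b[items[k]])
    post <- post * (if (resp[k] == 1) p else 1 - p)
  }
  sum(grid * post) / sum(post)
}

administered <- integer(0)
responses    <- integer(0)
theta_hat    <- 0

for (position in seq_len(test_length)) {
  # 1. Shadow test: the most informative feasible full-length form that
  #    contains every item already administered
  item_info <- info(theta_hat, pool$a, pool$b)
  contains  <- apply(forms, 2, function(i) all(administered %in% i))
  value     <- apply(forms, 2, function(i) sum(item_info[i]))
  value[!contains] <- -Inf
  shadow <- forms[, which.max(value)]

  # 2. Administer the most informative free item in the shadow test
  free <- setdiff(shadow, administered)
  nxt  <- free[which.max(item_info[free])]

  # 3. Simulate a response and update the trait estimate
  administered <- c(administered, nxt)
  responses    <- c(responses,
                    rbinom(1, 1, prob(true_theta, pool$a[nxt], pool$b[nxt])))
  theta_hat    <- eap(administered, responses)
}

administered
table(pool$content[administered])
```

Because every administered item comes from a feasible shadow test that already contains the earlier items, the completed test satisfies the content constraint by construction. Reassembling less often (for example, only at a few positions) yields the less adaptive formats listed above.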

4.2.1 Fully Adaptive

cfg_adaptive <- createShadowTestConfig()


adaptive_science <- Shadow(cfg_adaptive, constraints_science, true_theta = c(0, 1))

plot(adaptive_science, type = "audit" , examinee_id = 1)

plot(adaptive_science, type = "audit" , examinee_id = 2)

plot(adaptive_science, type = "shadow", examinee_id = 1, simple = TRUE)

plot(adaptive_science, type = "shadow", examinee_id = 2, simple = TRUE)

4.2.2 Three-Stage MST On-the-fly

cfg_adaptive <- createShadowTestConfig()
cfg_adaptive@refresh_policy$method <- "POSITION"
cfg_adaptive@refresh_policy$position <- c(1, 11, 21)
adaptive_science <- Shadow(cfg_adaptive, constraints_science, true_theta = c(0, 1))

plot(adaptive_science, type = "audit" , examinee_id = 1)

plot(adaptive_science, type = "audit" , examinee_id = 2)

plot(adaptive_science, type = "shadow", examinee_id = 1, simple = TRUE)

plot(adaptive_science, type = "shadow", examinee_id = 2, simple = TRUE)
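The refresh positions used above can be read as stage boundaries: the shadow test is reassembled at positions 1, 11, and 21 and kept frozen in between. The following base-R sketch maps item positions to stages under that reading. The 30-item test length is an assumption made for illustration; the actual length is determined by the constraints.

```r
# Refresh positions copied from the configuration above
refresh_position <- c(1, 11, 21)

# Assumption: a 30-item test (hypothetical; the real length comes from
# the constraints object)
test_length <- 30
position    <- seq_len(test_length)

# TRUE where the shadow test would be reassembled before the item is given
refresh <- position %in% refresh_position

# Stage membership of each position: a new stage starts at every refresh
stage <- cumsum(refresh)

table(stage)
```

Under these assumptions each of the three stages spans ten item positions, which corresponds to a three-stage design assembled on the fly.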

5 Conclusion

Delivering test forms online and on demand has become standard practice in educational, psychological, and health outcomes testing in recent years. The optimal test assembly framework using MIP provides a viable solution for online and on-demand test-assembly problems with complex test specifications and constraints. TestDesign is available from the Comprehensive R Archive Network (https://CRAN.R-project.org/package=TestDesign) and GitHub (https://github.com/choi-phd/TestDesign).

References

Berkelaar, Michel, and others. 2020. lpSolve: Interface to Lp_solve V. 5.5 to Solve Linear/Integer Programs. https://CRAN.R-project.org/package=lpSolve.

Gurobi Optimization, LLC. 2019. Gurobi: Gurobi Optimizer 9.0 Interface. http://www.gurobi.com.

Harter, Reinhard, Kurt Hornik, and Stefan Theussl. 2017. Rsymphony: SYMPHONY in R. https://CRAN.R-project.org/package=Rsymphony.

Kim, Vladislav. 2019. Lpsymphony: Symphony Integer Linear Programming Solver in R. https://doi.org/10.18129/B9.bioc.lpsymphony.

Theussl, Stefan, and Kurt Hornik. 2019. Rglpk: R/GNU Linear Programming Kit Interface. https://CRAN.R-project.org/package=Rglpk.

van der Linden, Wim J., and Qi Diao. 2014. “Using a Universal Shadow-Test Assembler with Multistage Testing.” In Computerized Multistage Testing: Theory and Applications, edited by Duanli Yan, Alina A. von Davier, and Charles Lewis. Chapman and Hall/CRC Press. https://doi.org/10.1201/b16858.

van der Linden, Wim J., and Lynda M. Reese. 1998. “A Model for Optimal Constrained Adaptive Testing.” Applied Psychological Measurement 22 (3): 259–70. https://doi.org/10.1177/01466216980223006.