Standard-setting study

A standard-setting study is an official research study conducted by an organization that sponsors tests to determine a cutscore for the test. To be legally defensible in the US, particularly for high-stakes assessments, and to meet the Standards for Educational and Psychological Testing, a cutscore cannot be arbitrarily determined; it must be empirically justified. For example, the organization cannot merely decide that the cutscore will be 70% correct. Instead, a study is conducted to determine what score best differentiates the classifications of examinees, such as competent vs. incompetent. Such studies require considerable resources and involve a number of professionals, in particular those with a psychometric background. For that reason, standard-setting studies are impractical for regular classroom situations; nevertheless, standard setting is performed at every level of education, and multiple methods exist.

Standard-setting studies are typically performed using focus groups of 5-15 subject matter experts who represent key stakeholders for the test. For example, in setting cutscores for educational testing, experts might be instructors familiar with the capabilities of the student population for the test.

Types of standard-setting studies

Standard-setting studies fall into two categories, item-centered and person-centered. Examples of item-centered methods include the Angoff, Ebel, Nedelsky,[1] and Bookmark methods, while examples of person-centered methods include the Borderline Survey and Contrasting Groups approaches. These are so categorized by the focus of the analysis; in item-centered studies, the organization evaluates items with respect to a given population of persons, and vice versa for person-centered studies.

Item-centered studies are related to criterion-referenced tests and to norm-referenced tests.

Item-centered studies

Person-centered studies

Rather than evaluating the items that distinguish competent candidates, person-centered studies evaluate the examinees themselves. While this might seem more appropriate, it is often more difficult because examinees, unlike a list of items, are not a captive population.

For example, if a new test comes out regarding new content (as often happens in information technology tests), the test could be given to an initial sample, called a beta sample, along with a survey of professional characteristics. The testing organization could then analyze and evaluate the relationship between the test scores and important characteristics, such as skills, education, and experience. The cutscore could be set as the score that best differentiates between those examinees characterized as "passing" and those characterized as "failing."
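The idea of choosing the score that best differentiates the two groups can be illustrated with a minimal sketch. It assumes that each examinee in the beta sample has already been classified as competent or not from the survey data, and it takes "best differentiates" to mean "produces the fewest misclassifications"; other criteria are possible. The function name, the pass rule (score at or above the cut), and the sample data are all hypothetical and are not taken from any particular study.

```python
def best_cutscore(scores, is_competent):
    """Return (cutscore, errors) minimizing misclassifications.

    Assumption: an examinee passes when score >= cutscore.
    Ties are broken in favor of the lowest candidate cutscore.
    """
    # Candidate cuts: every observed score, plus one above the maximum
    # (the case where nobody passes).
    candidates = sorted(set(scores)) + [max(scores) + 1]
    best_cut, best_errors = None, None
    for cut in candidates:
        # An error is a pass for a non-competent examinee
        # or a fail for a competent one.
        errors = sum((s >= cut) != c for s, c in zip(scores, is_competent))
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical beta sample: test scores and survey-based classifications.
scores = [42, 48, 51, 55, 58, 60, 63, 67, 70, 74]
competent = [False, False, True, False, False, True, True, True, True, True]

print(best_cutscore(scores, competent))  # prints (60, 1)
```

In this made-up sample, a cutscore of 60 misclassifies only one examinee (the competent examinee who scored 51). A real study would use a much larger sample and would typically weigh the two kinds of error differently depending on the stakes of the test.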

See for some discussion:


  1. Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14, 3–19.
  2. Zieky, M. J. (2001). So much has changed: How the setting of cutscores has evolved since the 1980s. In Cizek, G. J. (Ed.), Setting Performance Standards, pp. 19–52. Mahwah, NJ: Lawrence Erlbaum Associates.
  3. Lewis, D. M., Mitzel, H. C., Green, D. R. (June, 1996). Standard Setting: A Bookmark Approach. In D. R. Green (Chair), IRT-Based Standard-Setting Procedures Utilizing Behavioral Anchoring. Paper presented at the 1996 Council of Chief State School Officers National Conference on Large Scale Assessment, Phoenix, AZ.
  4. Mitzel, H. C., Lewis, D. M., Patz, R. J., & Green, D. R. (2000). The Bookmark Procedure: Cognitive Perspectives on Standard Setting. Chapter in Setting Performance Standards: Concepts, Methods, and Perspectives (G. J. Cizek, ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
This article is issued from Wikipedia - version of the Thursday, April 14, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.