Do Interim Assessments Need Test Security?

Written by David Foster, CEO, Caveon

The answer is clearly IT DEPENDS!

Assessment is a critical part of the learning process between teachers and students. Formative assessment occurs during learning, as students interact with others in the classroom and provide teachers with information that allows them to adapt instruction as they teach. Interim assessments measure the progress of that learning and can provide additional information to guide classroom planning. Is a student progressing in learning school subjects? Has a foundational skill been learned that is important so that other skills can be acquired? What specific skills does a student still lack? What additional resources should be brought to bear? These questions can be addressed by a properly conducted interim assessment, administered one or more times during the course of acquiring important knowledge and skills.

Interim assessments are designed to provide information on a student’s progress and make it possible to evaluate the degree of that progress. For many students, such assessments serve as confirmation that they are learning at a typical pace. In theory, such tests do not have high-stakes consequences and can reasonably be considered low stakes. It can be argued, therefore, that exams used in this way do not need significant test security measures, as the motivation for cheating and content theft is low. Following that line of reasoning, proctoring, data forensics, web monitoring, and many other test security measures are not needed for interim assessments.

But if the test scores from interim assessments are used for purposes beyond tracking learning progress, then the assessment may have moved into the high-stakes category. For example, a test described as interim acquires high stakes if the results are used to evaluate teacher or school performance. Similarly, if the test is used for student promotion or grade advancement, it has high stakes. If the test is used to place the student into special programs for gifted, ELL, or special needs students, particularly if funding is attached to these decisions, then the test also has high stakes. When the stakes are raised, some students, parents, teachers, and/or school administrators may attempt to manipulate the results in order to influence decisions based upon the test score.

It is very tempting for local, district, and state administrators to use the results of an assessment in ways that are inappropriate and that were not intended by the test designers or publishers. The big mistake is failing to recognize that the assessment has now become a high-stakes assessment. Attempts at cheating can be expected; when they occur, they often surprise everyone, although they shouldn’t. Expensive investigations follow, the media draws public scrutiny to the situation, and the assessment is likely compromised and may need to be replaced, sometimes at great cost if the assessment is used widely. Such compromises provide substantial evidence that the assessment can no longer be used properly for either the additional purposes or the original one: evaluating student growth or progress.

The Standards for Educational and Psychological Testing, published in 2014 by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, provide strong, consensual guidance to administrators in this area. Standard 1.2 states:

A rationale should be presented for each intended interpretation of test scores for a given use, together with a summary of the evidence and theory bearing on the intended interpretation (p. 23).

Standard 1.25 goes on to state:

When unintended consequences result from test use, an attempt should be made to investigate whether such consequences arise from the test’s sensitivity to characteristics other than those it is intended to assess or from the test’s failure to fully represent the intended construct (p. 30).

The numerous educational cheating scandals in the past few years demonstrate that tests are extremely sensitive to the effects of test fraud. In every case, such fraud has resulted in greatly inflated average scores for classrooms and schools. Based on these standards, and common sense, we have two recommendations.

Recommendation 1. The test publisher should provide specific warnings against inappropriate use, along with the reasons. Here is an example:

This assessment was created for the sole purpose of measuring student growth and has value only in determining student progress and in providing teachers with information to help them identify appropriate instructional resources to assist the student. The test scores from this assessment, if administered according to assessment manuals, MUST NOT BE USED for other purposes, such as the evaluation of teachers, schools, districts, or states. If the test is used to evaluate teacher or school performance, the test response data will be susceptible to manipulation by those with a stake in the outcome, undermining the validity and reliability of the test and rendering it useless for measuring student progress.

Recommendation 2. If it is decided that the interim assessments must be used for these high-stakes educational purposes, then the school, district, or state must put in place an appropriately high level of security, following test security standards and best practices to the letter.