In last month’s Blog #1, I discussed how technology can solve some of the testing industry’s problems and suggested that, as an industry, we haven’t yet introduced technology in a creative, effective way. The problems of the past still exist, and my own review suggests that 99% of the exams built today are simply paper-and-pencil tests administered on computers. We are living in the past, and it’s not doing us any favors.
Why don’t more people embrace creative approaches to testing? Why don’t we have more adaptive tests, an innovative method that has been around for over 30 years? Why are the vast majority of items used in tests still traditional multiple choice with four options and one correct answer? Why aren’t scores provided immediately for most tests? Why aren’t essays scored automatically and quickly instead of using raters? Why aren’t more tests given on-demand instead of on a few days a year? Why do we travel to a location to take a test rather than take it in our own home or office? Why does it take weeks or months to fix a compromised or broken exam? Why are we still suffering a growing number of security breaches?
I have given these questions careful consideration over the years, searching for answers. Here are some of the possibilities I have come up with (an incomplete list, I’m sure).
- “We need to stick with the standards.” Some people confuse traditional, long-standing methods with standards. They would argue that, as a testing standard, the traditional multiple-choice question cannot and should not ever change. Of course, multiple-choice testing is a method, not a standard.
- “I’m good at what I do.” People are comfortable doing what they have learned to do and what they have always been doing. Change is difficult.
- “How do I know it will work?” Some people are afraid of what will happen if they try new things. They are unsure whether a new approach will bring the promised benefits, and opt to play it safe and avoid new risks.
- “It needs more research.” Some people rely on the solid but fairly slow process of scientific research, and wait for its results before trying anything new.
- “Who else is doing it?” Some people are late adopters, willing to let others blaze the trail. They do not want to try a new method without being confident that it has already worked for others in the industry.
- “I don’t have the time.” A person may not have the time to learn about and understand the value of new tools and procedures.
- “That’s hard to do.” Implementing and using new technology can take substantial time and resources, particularly at the beginning. Some individuals may not be willing to undertake the initial work required to implement new ideas.
- “It scares me a bit.” It’s true that developing and using a new process or tool may not immediately result in a positive experience. That’s the risk one takes with innovation. For some, it is better to avoid that risk altogether.
I have used the title of Psychometrician often during my career, but it never seemed to fit well. I think a better title for me would be Psychometric Technologist. The title describes a person devoted to finding innovative ways to support the Standards we all follow in testing.
Regarding the value of innovating, all I can really offer is my testimonial, based on my personal experiences, that using technology, even technology that has never been tried before, can have amazing positive effects. During my career I’ve had the opportunity to encounter some significant testing challenges, roadblocks, and problems. There must be 20 to 30 clear examples of these over my 34-year career. Let me describe a few of them.
In about 1993, when I was creating certification tests for Novell, a prominent software company at the time, we were getting hammered by our candidates and partners, who complained that our tests were trivial, measuring textbook learning instead of real-world experience. Of course, the criticisms were mostly valid. It was clear that instead of asking about “how to do something” using Novell software, it would be better to ask the candidate to actually complete tasks in the software. I received the go-ahead to create a simulation of NetWare, Novell’s flagship networking software. The simulation interacted with the test administration system and our test development system so that it was easy to create items that asked the candidate to complete tasks. Scoring was straightforward, and the exam took no longer than the previous version. We still used multiple-choice questions where appropriate to measure other skills. To make a long story short, after we implemented the new exam, the complaints about Novell’s exams disappeared virtually overnight. The simulation cost about $250,000 to build and was worth every penny. It was used in many certification exams, practice exams, and quizzes following training courses. Later psychometric analyses revealed that the items performed better than we had hoped. Follow-up validity studies confirmed that certified candidates were well prepared to work for Novell clients and partners.
The use of innovative technology was motivated by a problem, and at the time there was no research or experience available to help implement it. My only option for solving that problem was to take a risk and move forward.
Here is a solution for a simpler problem. At around the same time, Novell candidates experienced some difficulty with the translation of specific words in exams. Some technical words that are usually well known in English (e.g., Spooler) had been translated. While the translated word was in the candidates’ native language, it was not recognized or understood by many. Since some candidates did understand the translated term, we could not simply revert to the English word. Our general solution was to present the current translated version of the item and include a button on the screen labeled “English.” When pressed, this button would show the item in its original English version. Including the “original language” of the item erased any confusion and allowed candidates to complete the item without the negative influence of that particular problem. The dual-language item was new and made possible by the computer administration of the certification exams. Later psychometric analyses indicated an improvement in item statistics coinciding with the change. That one technology innovation caused a major complaint about our translated exams to go away forever.
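For readers who think in code, the dual-language item can be pictured as a small data structure. The Python sketch below is only an illustration under stated assumptions: the class, field, and method names are hypothetical and are not taken from Novell’s actual system, and the item text is placeholder text. It assumes that pressing the button only reveals the English version, since that is all the description above covers.

```python
from dataclasses import dataclass


@dataclass
class DualLanguageItem:
    """A test item that keeps both the translated and the original English text.

    Hypothetical names for illustration only; not the original implementation.
    """
    translated_text: str
    english_text: str
    showing_english: bool = False  # the translated version is shown by default

    def press_english_button(self) -> None:
        """Simulate the candidate pressing the on-screen "English" button."""
        self.showing_english = True

    def displayed_text(self) -> str:
        """Return the version of the item the candidate currently sees."""
        if self.showing_english:
            return self.english_text
        return self.translated_text


# Placeholder item text; a real item would hold the actual question wording.
item = DualLanguageItem(
    translated_text="(item text in the candidate's language)",
    english_text="(item text in the original English)",
)
before = item.displayed_text()   # the translated version
item.press_english_button()
after = item.displayed_text()    # the original English version
```

The point of the design is that both versions travel with the item, so the candidate decides which one to read and the test developer never has to guess which term is familiar.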
Recently, I have introduced and encouraged the use of an improved multiple-choice item type called Discrete Option Multiple Choice, or DOMC. Computerized testing technology allows a typical multiple-choice item to be presented in a way that removes a host of problems inherent in the traditional version. Benefits in test security, fairness, and validity, as well as accessibility, are immediately realized. The limited research to date has confirmed many of these benefits. Recently, Neal Kingston (University of Kansas) and others confirmed in a research report that there is no psychometric reason not to use the DOMC item type in testing programs.
There are certainly many reasons not to innovate or use others’ innovations. I’ve listed a few of them above. But there is little to be gained from the status quo, and much to be gained from abandoning it. We still have major problems to deal with, and current and yet-to-be-discovered technology solutions provide the welcome light at the end of the tunnel.