Rather Testy, Aren’t We?
Key words: multiple-choice examinations, Stanford-Binet
2 pages (845 words)
I was
surprised to get a call once again from the association powers-that-be to help
teach a study course for the infamous Certified Association Executive (CAE)
exam. I say this because I’m a complete
iconoclast when it comes to standardized multiple-guess tests. Perhaps they are just a necessary evil, or
perhaps they are indeed some great normalizing assessment criterion. My belief, however, is that they are
constructed more for the ease of educational administrators than for predicting
the ability of students to honestly apply or articulate anything actually
learned.
Now
before those vested in the design of these Stanford-Binet masterpieces (a) get
their distractors in a wad, or (b) their
cognitive taxonomy intertwined, or (c) their item stems incomplete, or (d) all
of the above, let me say that the tests
can be good at assessing retention of certain types of basic information. This is why they’ve been used to assess
elementary school performance—very elementary—and we know where that’s
led.
My
major gripe is rather simple. If you
have a question, why not just ask folks for the answer? If you want someone to tell you what the
capital of Minnesota is, why not just ask them, “Hey, you, dude, what’s the
capital of Minnesota?” Either he tells
you St. Paul (or even writes it, in which case he’ll need to know how to
spell), or he doesn’t. But no, here’s what we ask: The capital of Minnesota is which of the
following: (a) Minnesota City, (b) Minneapolis, (c) Annapolis,
or (d) none of the above. Or how about
a numbers one: What is 2³? (a) 5, (b) 6, (c) 8, (d) 9. As you can tell, the whole fun of
constructing these questions is in presenting great wrong answers, called
distractors.
Good
distractors are a real art form for the test writer. The answer to a multiple-choice question is
basically right there in front of you, so it has to be cleverly disguised. Some of the distractors used as part of the
disguise are obvious, but then some are rather sophisticated. Here’s one of the questions I wrote for the
CAE exam long ago when it first went from a 4-hour written essay/short-answer
exam (and thus expensive to grade) to an easy-to-grade multiple-choice exam: A budget is best described as which of the
following: (a) a plan, (b) one of the
three required financial statements, (c) reported revenue vs. expenses, or (d)
none of the above. I worked hard on my
distractors (b) and (c), and they fooled a lot of folks; so did (d), for that
matter. Of course the answer was the
simple (a), a plan. This is, in fact,
what someone would answer if the question were asked directly: hey, what’s a budget? Well, it’s a plan about money. Now that I reflect on this, I’m not sure I’m
all that proud of this testing method.
The
defenders of these Stanford-Binet type exams will point out how well the tests
predict some other performance metric such as grades. Here’s what happens. On each exam (say the SAT, or the GRE), there
are a number of “experimental” questions.
These questions are not scored for the current exam, but are for future
use. The exams, as a clever aside, also ask the
test taker for some demographic or other descriptive information—such
information as grade point average.
Then, when the scores are compiled, the test designers look to see which
experimental questions correlate with good grades. They keep for future exams
the questions that students with good grades get right, and they throw out any
questions that students with good grades get wrong.
At the end of all this, they proudly announce that their tests predict
grades. Of course, they predict grades;
they’ve been correlated to do so. It’s
like saying grades predict grades.
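For anyone who wants to see that circularity in numbers, here is a toy simulation in Python. It is only a sketch under invented assumptions, not a description of how any real test maker works: the 500 students, the pool of 40 items (half loosely tied to GPA, half pure coin flips), the 0.15 correlation cutoff, and the names `pearson` and `simulate` are all made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_students=500, n_items=40, cutoff=0.15, seed=1):
    """Keep only the items that track GPA, then compare scores to GPA."""
    rng = random.Random(seed)
    gpas = [rng.uniform(0.0, 4.0) for _ in range(n_students)]

    # Hypothetical item pool: even-numbered items get easier as GPA rises,
    # odd-numbered items are coin flips unrelated to GPA.
    answers = []  # answers[i][s] is 1 if student s got item i right
    for i in range(n_items):
        if i % 2 == 0:
            row = [1 if rng.random() < 0.2 + 0.15 * g else 0 for g in gpas]
        else:
            row = [1 if rng.random() < 0.5 else 0 for _ in gpas]
        answers.append(row)

    # The selection step: keep items that good-grade students get right.
    kept = [i for i in range(n_items) if pearson(answers[i], gpas) > cutoff]

    def total(items):
        return [sum(answers[i][s] for i in items) for s in range(n_students)]

    r_pool = pearson(total(range(n_items)), gpas)
    r_kept = pearson(total(kept), gpas)
    return kept, r_pool, r_kept


kept, r_pool, r_kept = simulate()
print(f"kept {len(kept)} of 40 items")
print(f"whole pool vs. GPA: r = {r_pool:.2f}")
print(f"kept items vs. GPA: r = {r_kept:.2f}")
```

Under these assumptions the kept items correlate with GPA more strongly than the whole pool does, which is no surprise, since tracking GPA was the only entry requirement.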
I would
be interested to know what professional attributes are correlated to
certification exam scores. There could be
some; it would be something good to find out.
What I worry about are the unintended consequences of selecting and
rewarding our best and brightest by their ability to discern clever distractors
from the right answer. It bodes well for
CSI types and other detectives (maybe that’s why these shows are so popular),
but what does it say about our ability to come up with something new out of
nothing? To create from tabula rasa when we’ve never seen a
blank space before? We’ve developed
overarching examinations for important gateways in our society, which in
essence select for the passive skill of recognizing camouflaged solutions, not
the active skill of creating new answers.
Surely, our method of testing will have long-term consequences. It will
(a) save money for test graders, (b) make
multiple-choice testing more frequently seen—even on game shows, (c) select for those with analytical skills
bordering on the litigious and devious, (d) increase our reliance on bureaucratically inspired
statistics, or (e) all of the above. I’m
afraid to guess.
©Copyright 2013 John Harrison. All rights reserved.