JPH Notes: May 2013

Rather Testy, Aren’t We?

Key words: multiple-choice examinations, Stanford-Binet

2 pages (845 words)

I was surprised to get a call once again from the association powers-that-be to help teach a study course for the infamous Certified Association Executive (CAE) exam. I say this because I’m a complete iconoclast when it comes to standardized multiple-guess tests. Perhaps they are just a necessary evil, or perhaps they are indeed some great normalizing assessment criterion. My belief, however, is that they are constructed more for the ease of educational administrators than for predicting the ability of students to honestly apply or articulate anything actually learned.

Now, before those vested in the design of these Stanford-Binet masterpieces (a) get their distractors in a wad, (b) get their cognitive taxonomy intertwined, (c) leave their item stems incomplete, or (d) all of the above, let me say that the tests can be good at assessing retention of certain types of basic information. This is why they’ve been used to assess elementary school performance (very elementary), and we know where that’s led.

My major gripe is rather simple. If you have a question, why not just ask folks for the answer? If you want someone to tell you the capital of Minnesota, why not just ask them, “Hey, you, dude, what’s the capital of Minnesota?” Either he tells you St. Paul (or even writes it, in which case he’ll need to know how to spell), or he doesn’t. But no, here’s what we ask: The capital of Minnesota is which of the following: (a) Minnesota City, (b) Minneapolis, (c) Annapolis, or (d) none of the above. Or how about a numbers one: What is 2³? (a) 5, (b) 6, (c) 8, (d) 9. As you can tell, the whole fun of constructing these questions is in presenting great wrong answers, called distractors.

Good distractors are a real art form for the test writer. The answer to a multiple-choice question is basically right there in front of you, so it has to be cleverly disguised. Some of the distractors used as part of the disguise are obvious, but some are rather sophisticated. Here’s one of the questions I wrote for the CAE exam long ago, when it first went from a 4-hour written essay/short-answer exam (and thus expensive to grade) to an easy-to-grade multiple-choice exam: A budget is best described as which of the following: (a) a plan, (b) one of the three required financial statements, (c) reported revenue vs. expenses, or (d) none of the above. I worked hard on my distractors (b) and (c), and they fooled a lot of folks; so did (d), for that matter. Of course the answer was the simple (a), a plan. This is, in fact, what someone would answer if the question were asked directly: hey, what’s a budget? Well, it’s a plan about money. Now that I reflect on this, I’m not sure I’m all that proud of this testing method.

The defenders of these Stanford-Binet type exams will point out how well the tests predict some other performance metric, such as grades. Here’s what happens. On each exam (say the SAT, or the GRE), there are a number of “experimental” questions. These questions are not scored for the current exam, but are for future use. The exams, as a clever aside, also ask the test taker for some demographic or other descriptive information, such as grade point average. Then, when the scores are compiled, the test designers look to see which experimental questions correlate with good grades. They keep for future exams the questions that students with good grades get right, and they throw out any questions that students with good grades get wrong. At the end of all this, they proudly announce that their tests predict grades. Of course they predict grades; the questions were selected to do so. It’s like saying grades predict grades.
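For the statistically inclined, the selection loop described above can be mimicked with a toy simulation. This is only a sketch under invented assumptions, not how any real testing body works: the students, their GPAs, the pool of experimental questions, and the `link` that ties each question to GPA are all made up, and `simulate` and `pearson` are hypothetical helper names. It screens the questions by how well they track GPA, then compares a test built from the kept questions with one built from the rejected questions.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate(n_students=500, n_items=40, keep=10, seed=0):
    """Toy model: screen experimental questions by their correlation with GPA."""
    rng = random.Random(seed)
    # Assumption: self-reported GPAs spread evenly between 2.0 and 4.0.
    gpas = [rng.uniform(2.0, 4.0) for _ in range(n_students)]
    # Assumption: each question has a made-up "link" to GPA, from 0 (none) to 1.
    links = [rng.random() for _ in range(n_items)]
    # answers[i][s] is 1 if student s got question i right, else 0.
    answers = []
    for link in links:
        row = []
        for gpa in gpas:
            p_right = 0.5 + 0.4 * link * (gpa - 3.0)  # stays within 0.1..0.9
            row.append(1 if rng.random() < p_right else 0)
        answers.append(row)
    # Rank the questions by how well each one tracks GPA.
    item_r = [pearson(row, gpas) for row in answers]
    ranked = sorted(range(n_items), key=lambda i: item_r[i], reverse=True)
    kept, rejected = ranked[:keep], ranked[-keep:]

    def total(items):
        return [sum(answers[i][s] for i in items) for s in range(n_students)]

    return pearson(total(kept), gpas), pearson(total(rejected), gpas)


r_kept, r_rejected = simulate()
print(f"kept questions vs. GPA:     r = {r_kept:.2f}")
print(f"rejected questions vs. GPA: r = {r_rejected:.2f}")
```

In this made-up world, the kept questions track GPA far better than the rejected ones, which is exactly what the screening step guarantees. Whether either set tracks anything besides GPA is a question the sketch cannot answer.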

I would be interested to know what professional attributes are correlated with certification exam scores. There could be some; it would be something good to find out. What I worry about are the unintended consequences of selecting and rewarding our best and brightest by their ability to discern clever distractors from the right answer. It bodes well for CSI types and other detectives (maybe that’s why these shows are so popular), but what does it say about our ability to come up with something new out of nothing? To create from tabula rasa when we’ve never seen a blank space before? We’ve developed overarching examinations for important gateways in our society, which in essence select for the passive skill of recognizing camouflaged solutions, not the active skill of creating new answers. Surely our method of testing will have long-term consequences. It will (a) save money for test graders, (b) make multiple-choice testing more frequently seen (even on game shows), (c) select for those with analytical skills bordering on the litigious and devious, (d) increase our reliance on bureaucratically-inspired statistics, or (e) all of the above. I’m afraid to guess.

Wednesday, May 29, 2013
