The Data Are In?
You remember the old
joke: a kid from the sticks goes off to
college and comes back home to visit.
His proud father puts him in front of a crowd and says, “Son, tell us
something you learned at school.” The
kid ponders for a moment and says, “Pi R Square.” The poor father winces: “Oh, son, everybody knows pie are round. Cornbread are square.” Thus, we begin our contemplation of
data. Are data plural or is data singular,
and—except for the handful of philologists left on the planet (one of whom I am
not) who fight over such things—can that rather silly argument actually help us
understand what data is all about?
First, a quick attempt to
get to the “right” answer on singular vs. plural for data. The answer is yes. Sometimes data is a singular noun
referring to collected points of information in the abstract, a so-called
non-count noun (like “hair”), and sometimes it is used as a plural (like the count
noun “hairs”) to refer to a specific set of information points: “those particular
data do not mean a thing.” An individual
data point could be called a datum according to the great authority, the OED
(Oxford English Dictionary); however, you rarely hear such effete speech. Data
is a Latin word (the plural of a past participle, I’m told) meaning “things given.”
Back
to why this matters: it shows us
something. Data can be viewed as
something we collect in the field or something we generate in an
experiment—both for the purpose of gaining knowledge. Sometimes, we examine the data, not to find
out something new, but to prove our hunches or points of view. Thus, our motive is important: are we trying to learn something by looking at
the data, or are we trying to prove something?
This brings us to statistics,
which is (or is it which are?) a system of analyzing data with the hope of
making some determination. Statistics
depends in large part on convention; like accounting, it makes lots of
assumptions about norms and makes up its own rules. For example, let’s imagine an elderly
couple. The wife is in superb
health. The husband is dead. From those data and our good use of
statistics, we can say that on average as a couple, they’re in mediocre
health. If you don’t know the full story on the couple,
that conclusion can slip right by unquestioned.
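The arithmetic behind the couple can be sketched in a few lines of Python. The 0-to-10 health scale and both scores are invented purely for illustration; the point is only that a lone average can hide everything that matters.

```python
from statistics import mean

# Hypothetical health scores on an assumed 0-10 scale
# (10 = superb health, 0 = dead). These numbers are made up.
couple = {"wife": 10, "husband": 0}

scores = list(couple.values())
average = mean(scores)                 # the single summary number
spread = max(scores) - min(scores)     # what the summary number hides

print(f"average health of the couple: {average}")
print(f"gap between the two scores:   {spread}")
```

Reporting the spread (or simply the raw scores) next to the average is one cheap way to keep a "mediocre health" conclusion from slipping by.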
Always
question the data. Here are but a few
quick checks, and a couple of examples.
First, check the “face validity,” otherwise known as the “smell
test.” Second, look for the
overarching trend and ask whether it applies to other situations (external
validity). Third, check up on a few of
the raw data points, just to see if the math really works (internal validity). And here’s a qualitative approach: ask the presenter to poke a hole in his or
her own data: “From where you stand, what is the weakest aspect of your
data?” There’s always a weakness, and
the presenter, if honest, knows he or she is all but
hiding something. Find out what that
is.
Once,
I was in a meeting where the association’s director of marketing was bragging
on and on about how well her self-serving session at a conference had scored in
the conference’s follow-up survey. I
had noticed that too, and I had also noticed that another session, the poster session, had done
equally well according to the survey. I pointed out the success of the poster
session in the meeting.
“Well, yes, what’s your
point?” she said.
“There actually weren’t
any poster sessions,” I commented. The
face validity of the much-touted survey then went down the tubes. Most casual follow-up surveys (i.e., those
where the survey is distributed to everyone, and some respond, but most do
not) have no real statistical validity. This
is because the folks who have a strong opinion are the ones who tend to
respond. Those of us who don’t have time
or have no strong opinion (presumably a substantial portion) don’t respond to
such surveys. All of our data are
missed. To be statistically valid, a
survey has to be taken from a sufficiently large random sample drawn from the
population. It’s not a huge
undertaking, just an extra step or two, but well worth it to provide good data.
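For the numbers people, here is a rough simulation of the difference between a reply-if-you-feel-like-it survey and a random sample. Everything in it is an assumption made up for the sketch: the population of 8,000 ratings, the split of opinions, and the response rates (60% for strong opinions, 5% for lukewarm ones). With different assumptions the voluntary result could skew low instead of high; the point is only that it skews.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 8,000 attendees rating a session from 1 to 5.
# Assumption: most are lukewarm (3); a minority feel strongly (1 or 5).
population = [3] * 6000 + [5] * 1500 + [1] * 500
rng.shuffle(population)

def responds_voluntarily(rating: int) -> bool:
    """Assumed behavior: strong opinions usually reply, lukewarm ones rarely do."""
    chance = 0.60 if rating in (1, 5) else 0.05
    return rng.random() < chance

# Casual survey: sent to everyone, answered only by those who bother.
voluntary = [r for r in population if responds_voluntarily(r)]

# Random sample: 400 attendees, each equally likely to be chosen.
random_sample = rng.sample(population, 400)

print(f"true mean:         {mean(population):.2f}")
print(f"voluntary replies: {mean(voluntary):.2f} (n={len(voluntary)})")
print(f"random sample:     {mean(random_sample):.2f} (n={len(random_sample)})")
```

Under these assumptions the voluntary replies outnumber the random sample yet land farther from the true mean, because more answers do not fix a lopsided set of answerers. The sketch also assumes everyone in the random sample actually answers, which in real life takes some chasing.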
I’ve
learned that presenting survey results, even statistically valid ones, to a
committee can be a tricky business. I
used to present the whole survey results, the numerical data and the comments,
to groups all at once, but noticed something odd going on. The members of the committee would go right
to the comments section like rubberneckers at a car wreck, overlooking the valid
numbers and homing in on some quirky comment. I’ve seen this happen even with groups of
scientists and numbers people. I once
managed an association that had an annual meeting of over 8,000 attendees. Our follow-up survey was thorough, and the
numbers indicated that no one was interested in whether we offered a particular
special amenity at the conference; however, one responder asked for it in the
comments section. The board homed in on
that one comment, and we ended up spending thousands of dollars for a couple of
years providing the unused service until we finally discontinued it. Nowadays, I always provide the numerical
data from a survey first, without the comments attached. Later, well after we’ve
digested the numbers, I provide the comments, and then only in compiled form (how many said this, and
how many said that). The one-off
comments then lose their undue attraction.
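Compiling comments into "how many said this, and how many said that" can be as simple as a tally. The sketch below assumes someone has already read each comment and assigned it a theme by hand; the comments and theme names are hypothetical.

```python
from collections import Counter

# Hypothetical comments, each tagged with a theme by a human reader.
tagged_comments = [
    ("Please bring back the shuttle bus", "transportation"),
    ("The shuttle was missed", "transportation"),
    ("Rooms were too cold", "venue"),
    ("Coffee ran out by 10", "catering"),
    ("More coffee, please", "catering"),
    ("Loved the coffee breaks", "catering"),
]

# Count comments per theme; the individual wording is set aside on purpose.
theme_counts = Counter(theme for _, theme in tagged_comments)

for theme, count in theme_counts.most_common():
    print(f"{theme}: {count} of {len(tagged_comments)} comments")
```

Showing each count against the total keeps a single vivid remark in proportion.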
Good
data is a good thing. Understanding what
the data say, if anything, is even better.
©Copyright. John P. Harrison, 2015. All rights
reserved