JPH Notes: The Data Are In


Keywords: data, statistics

About 900 words or 2 pages

The Data Are In?

You remember the old joke: a kid from the sticks goes off to college and comes back home to visit. His proud father puts him in front of a crowd and says, “Son, tell us something you learned at school.” The kid ponders for a moment and says, “Pi R squared.” The poor father winces: “Oh, son, everybody knows pie are round. Cornbread are square.” Thus we begin our contemplation of data. Are data plural, or is data singular? And, except for the handful of philologists left on the planet (of whom I am not one) who fight over such things, can that rather silly argument actually help us understand what data is all about?

First, a quick attempt to get to the “right” answer on whether data is singular or plural. The answer is yes. Data can be a singular noun referring to collected information in the abstract, a so-called non-count noun (like “hair”), or it can be used as a plural (like the count noun “hairs”) to refer to a particular set of information points: “Those particular data do not mean a thing.” According to that great authority, the OED (Oxford English Dictionary), an individual data point can be called a datum; however, you rarely hear such effete speech. Data is actually a Latin word, the plural of datum (a past participle, I’m told), meaning “things given.”

Back to why this matters: the argument shows us something about how we use data. Data can be viewed as something we collect in the field or something we generate in an experiment, in both cases for the purpose of gaining knowledge. Sometimes, however, we examine the data not to find out something new but to prove our hunches or points of view. So our motive is important: are we looking at the data to learn something, or to prove something?

This brings us to statistics, which is (or is it which are?) a system of analyzing data in the hope of reaching some determination. Statistics depends in large part on convention; like accounting, it makes lots of assumptions about norms and makes up its own rules. For example, imagine an elderly couple. The wife is in superb health. The husband is dead. From those data and our good use of statistics, we can say that, on average, the couple is in mediocre health. If you don’t know the full story, data like these can slip right by unquestioned.
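To make the couple example concrete, here is a minimal Python sketch. The 0-to-10 health scores are hypothetical numbers made up for illustration; the point is only that the average alone hides the spread, while reporting the minimum and maximum next to it tells the real story.

```python
# Hypothetical health scores on a made-up 0-10 scale (not real data):
# the wife is in superb health (10), the husband is dead (0).
couple = {"wife": 10, "husband": 0}

def summarize(scores):
    """Return the average together with the extremes it can hide."""
    values = list(scores.values())
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarize(couple)
print(summary)  # {'mean': 5.0, 'min': 0, 'max': 10}
```

The mean of 5.0 is the “mediocre health” figure; the minimum and maximum show that nobody in the household is actually in mediocre health.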

Always question the data. Here are a few quick checks, followed by a couple of examples. First, check the “face validity,” otherwise known as the “smell test.” Second, look for the overarching trend and ask whether it applies to other situations (external validity). Third, check a few of the raw data points, just to see whether the math really works (internal validity). Finally, here’s a qualitative approach: ask presenters to poke a hole in their own data by asking, “From where you stand, what is the weakest aspect of your data?” There’s always a weakness, and an honest presenter knows what it is. Find out what that is.
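The third check can be sketched in a few lines of Python, assuming you can get the raw responses behind a presented average. The function names, the 1-to-5 ratings, the reported figure of 4.5 and the 0.01 tolerance are all hypothetical; the idea is simply to recompute the summary yourself and pull a few raw records at random to look at by hand.

```python
import random

def recompute_mean(raw_scores):
    """Recompute the average directly from the raw responses."""
    return sum(raw_scores) / len(raw_scores)

def check_reported_mean(raw_scores, reported_mean, tolerance=0.01):
    """Return (recomputed, matches) so a mismatch with the presented figure stands out."""
    recomputed = recompute_mean(raw_scores)
    return recomputed, abs(recomputed - reported_mean) <= tolerance

def spot_check(records, k=3, seed=None):
    """Pull up to k raw records at random, without replacement, to eyeball."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

# Hypothetical raw ratings (1-5 scale) and a hypothetical reported average.
raw = [5, 4, 4, 3, 5, 2, 4]
recomputed, ok = check_reported_mean(raw, reported_mean=4.5)
print(round(recomputed, 2), ok)  # 3.86 False
print(spot_check(raw, k=3, seed=1))
```

A mismatch does not prove anyone is hiding something, but it is exactly the kind of question worth asking before the numbers are accepted.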

Once, I was in a meeting where the association’s director of marketing was bragging on and on about how well her self-serving session at a conference had scored in the conference’s follow-up survey. I had noticed that too, and I had also noticed that another session at the conference, the poster session, had done equally well according to the survey. I pointed out the success of the poster session to the group.

“Well, yes, what’s your point?” she said.

“There actually weren’t any poster sessions,” I commented. The face validity of the much-touted survey then went down the tubes. Most casual follow-up surveys (i.e., those sent to everyone, where some people respond but most do not) have no real statistical validity. This is because the folks who respond are mostly the ones with strong opinions. Those of us who don’t have time or have no strong opinion (presumably a substantial portion) don’t respond to such surveys, so all of our data are missed. To be statistically valid, a survey has to be given to a sufficiently large random sample drawn from the population. It’s not a huge undertaking, just an extra step or two, but it is well worth it to get good data.
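How large is “sufficiently large”? One common rule of thumb, not something the survey above is said to have used, is Cochran’s sample-size formula with a finite-population correction. This minimal Python sketch assumes a 95% confidence level (z = 1.96), a ±5% margin of error and the most conservative proportion p = 0.5; those defaults, the function names and the 8,000 member IDs are illustrative assumptions. It then draws the sample with `random.sample`, so every member has the same chance of being asked.

```python
import math
import random

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's formula with a finite-population correction, rounded up."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size needed for a very large population
    n = n0 / (1 + (n0 - 1) / population)       # smaller size for a finite population
    return math.ceil(n)

def draw_random_sample(member_ids, seed=None):
    """Pick members uniformly at random, without replacement."""
    n = sample_size(len(member_ids))
    return random.Random(seed).sample(member_ids, n)

members = list(range(1, 8001))  # hypothetical member IDs for 8,000 people
print(sample_size(len(members)))  # 367
invitees = draw_random_sample(members, seed=42)
print(len(invitees))  # 367
```

Drawing the sample is only half the job: the results still depend on getting most of the chosen members to answer, which is where the extra follow-up effort goes.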

I’ve learned that presenting survey results, even statistically valid ones, to a committee can be a tricky business. I used to present the complete results, numerical data and comments, to groups all at once, but noticed something odd going on. Committee members would go right to the comments section like rubberneckers at a car wreck, overlooking the valid numbers and homing in on some quirky comment. I’ve seen this happen even with groups of scientists and numbers people. I once managed an association whose annual meeting drew over 8,000 attendees. Our follow-up survey was thorough, and the numbers indicated that no one was interested in whether we offered a particular special amenity at the conference; however, one respondent asked for it in the comments section. The board homed in on that one comment, and we ended up spending thousands of dollars over a couple of years providing the unused service until we finally discontinued it.

Nowadays, I always provide the numerical data from a survey first, without the comments attached. I provide the comments later, well after we’ve digested the numbers, and then only in compiled form (how many said this, and how many said that). The one-off comments then lose their undue attraction.
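Compiling comments can be as simple as tagging each one with a theme and counting. In this minimal Python sketch the comments, their theme tags and the respondent count of 1,200 are all hypothetical. The tagging itself is assumed to be done by a person reading the comments; the code only tallies the tags.

```python
from collections import Counter

# Hypothetical comments, each already tagged with a theme by a human reader.
tagged_comments = [
    ("more networking time", "networking"),
    ("loved the keynote", "keynote"),
    ("keynote ran long", "keynote"),
    ("please add the special amenity", "amenity request"),
    ("more time to meet people", "networking"),
]
respondents = 1200  # hypothetical number of people who answered the survey

def compile_comments(tagged):
    """Tally comments by theme so each theme is reported as a count, not a quote."""
    return Counter(theme for _text, theme in tagged)

tally = compile_comments(tagged_comments)
for theme, count in tally.most_common():
    print(f"{theme}: {count} of {respondents} respondents")
```

Seen as “1 of 1,200 respondents,” a single request for an amenity carries about as much weight as it deserves.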

Good data is a good thing. Understanding what the data say, if anything, is even better.

Monday, March 30, 2015