Don Stacks is a Professor of Public Relations at the University of Miami and, over the course of his career, has received numerous awards for his work in PR and education. At this year’s Conclave Summit, Stacks gave a presentation on the use and misuse of Big Data in measurement. I sat down with Prof. Stacks after his speech, and he went into more depth on his views.
Q: During your speech you talked about Big Data being like a sledgehammer when people often need a regular hammer instead. How does someone make that transition from the sledgehammer to a normal hammer?
Don Stacks: A lot of what Big Data is right now is not collecting data that’s really of interest to us.
If you look at what traditional Big Data is, you’re bringing in data from supermarkets that tells you who buys what, or credit card purchases [along with] huge samples of social media. Social media on its own we’re able to work with, but with Big Data we’re able to take these separate datasets and combine them so that all of it becomes part of our analysis.
In some cases, someone will give you a big dataset, and what you have to ask is, “What do you want from it?” If you don’t know what you’re looking for when you go into the data, you’ll never find it. The idea is that I’m going to find the golden egg, or the pea hidden under all the princess’s mattresses, just from knowledge of the mattresses. So you see the pea, but you have no idea what it is or what it’s for, so you miss it.
On the other hand, if you say, “I’m looking for a pea, or a rock, or an irritant,” now I’m guided in where I’m going, and I can take a mass of data and parse it. I think that’s where the importance starts to come in: you have to be able to state what your objectives and goals are. Those are going to drive you in terms of using Big Data vs. Little Data, or whatever you want to call it.
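As a rough illustration of combining separate datasets and then going in with a stated question, here is a minimal Python sketch. The two datasets, their field names (`customer_id`, `item`, `amount`), and the helper functions `combine` and `answer` are all hypothetical; the point is only that joining the data is mechanical, while the stated objective is what tells you which slice of the combined data to look at.

```python
# Hypothetical supermarket records: who bought what.
purchases = [
    {"customer_id": 1, "item": "sunscreen"},
    {"customer_id": 2, "item": "bread"},
    {"customer_id": 3, "item": "sunscreen"},
]

# Hypothetical credit card records: how much each customer spent.
card_spend = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 45.5},
    {"customer_id": 3, "amount": 80.0},
]


def combine(purchases, card_spend):
    """Join the two datasets on customer_id (an inner join)."""
    spend_by_customer = {row["customer_id"]: row["amount"] for row in card_spend}
    combined = []
    for row in purchases:
        cid = row["customer_id"]
        if cid in spend_by_customer:
            # Merge the purchase record with that customer's card spend.
            combined.append({**row, "amount": spend_by_customer[cid]})
    return combined


def answer(combined, item):
    """A stated objective: total card spend of customers who bought `item`."""
    return sum(row["amount"] for row in combined if row["item"] == item)


combined = combine(purchases, card_spend)
print(answer(combined, "sunscreen"))  # prints 200.0
```

Without the question passed to `answer`, the combined records are just a bigger pile of mattresses.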
Q: During your speech, you talked about “not measuring normal.” Can you explain what you mean by that?
Don Stacks: All data is normal; what we’re looking for is whether the data does not fit a normal [bell] curve, and if it doesn’t, you use a transformation to adjust it back into a normal curve.
Q: How do you do that?
Don Stacks: You modify the data so it fits the normal curve. If you have 10 observations, your curve is very, very flat. If you have 100 observations, your curve starts to approximate a normal curve. If you have 1,000, you should have a normal curve.
That normal curve is based on a regular bell-shaped curve, but in actuality your curve might be skewed to one side or the other, in which case it’s normal for itself but not normal for the hypothetical curve you’re looking for, so you adjust it so that it can be compared against other metrics.
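Stacks does not name a specific transformation. As one common choice for right-skewed data (an assumption here, not necessarily what he uses), the Python sketch below applies a log transformation to a seeded, simulated log-normal sample and compares skewness before and after. The `skewness` helper is written for this example rather than taken from a library, and the data are simulated, not real measurements.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd**3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Simulated right-skewed data (hypothetical): a seeded log-normal sample stands
# in for something like spend per customer, where a few large values stretch
# the right tail of the curve.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

# The log transformation pulls in the long right tail; all values are > 0.
logged = [math.log(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

A skewness near zero is what a symmetric bell curve would show. Any analysis run on the transformed values has to be interpreted on the log scale, which is the trade-off of adjusting the data this way.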
Q: I think I understand what you are saying. With any given set of data most of it’s going to be average and sit right in the middle of the graph.
Don Stacks: 68%. One standard deviation above or below [the mean] is the definition of average.
Q: And what you’re saying is that in real life, normal might be skewed off to the side. So you might have a bell curve that’s almost completely flat with a big spike at the end, or a big spike at the beginning…
Don Stacks: Or it could be that you have some sort of confounding relationship in your error terms, which is going to stretch things out. It doesn’t matter how many people you’re observing; a normal curve says 34% are above and 34% are below the mean [within one standard deviation].
What you want is a curve that approaches hypothetical normality (the bell) and is simple enough to adjust; there are tests for it. It becomes extremely important when you’re looking at large datasets, where you have multiple predictor variables and multiple outcome variables, which is what we have in the real world.
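The 68% and 34% figures come straight from the standard normal curve, and because they are stated in standard-deviation units they do not depend on the sample size. A short check with Python’s standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution within one standard deviation of the mean.
within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# Share between the mean and one standard deviation above it. By symmetry the
# same share lies between the mean and one standard deviation below it.
mean_to_plus_one = std_normal.cdf(1.0) - std_normal.cdf(0.0)

print(f"within +/-1 SD: {within_one_sd:.4f}")   # prints within +/-1 SD: 0.6827
print(f"mean to +1 SD:  {mean_to_plus_one:.4f}")  # prints mean to +1 SD:  0.3413
```

Roughly 34% sits on each side of the mean within one standard deviation, which adds up to the 68% mentioned earlier; half of the whole curve lies above the mean and half below.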
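Stacks mentions that there are tests for normality without naming one. As one example chosen here (not necessarily his), the Jarque-Bera test compares a sample’s skewness and kurtosis with the values a normal curve would have (0 and 3). The sketch below implements it with the standard library so it runs without extra packages; in practice a library routine such as SciPy’s `scipy.stats.jarque_bera` or `scipy.stats.shapiro` would be the usual choice. Both samples are simulated.

```python
import math
import random
import statistics


def jarque_bera(values):
    """Jarque-Bera normality test; returns (statistic, approximate p-value).

    JB = n/6 * (S**2 + (K - 3)**2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB is asymptotically chi-square with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


# Simulated data: one sample drawn from a normal curve, one from a skewed
# (exponential) distribution.
rng = random.Random(7)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(1000)]

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    jb, p = jarque_bera(sample)
    print(f"{name}: JB = {jb:.1f}, p = {p:.3g}")
```

A small p-value says the sample is unlikely to have come from a normal curve, which is the signal that a transformation may be needed before further analysis.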
We’re more interested in looking at “how do outcomes relate to each other?” based on how these predictor variables relate to them. That’s much more sophisticated than saying, “How many hospital beds do we need, given that it’s a holiday in the middle of the summer and a lot of people are driving?” That’s what hospitals do every holiday weekend: they predict the number of beds, which in turn predicts the number of nurses and the number of doctors they’ll need. Simple regression.