The future of communications measurement will require better research methods, and perhaps stricter statistical tests. This article discusses the changes afoot. It is part of our special Future of Measurement issue.
The following sentence is a big, big deal for social science, and perhaps for communications measurement as well:
“We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.”
It’s from the abstract for “Redefine Statistical Significance,” a soon-to-be-published paper by a large group of scientists and statisticians. They propose to dramatically raise the bar of statistical proof for social science research. If this new standard becomes accepted, it may mean that communications measurement and public relations research will have to adopt more stringent statistical tests.
Keep in mind that the paper above is just one aspect of social science attempting to get its act together after the recent and ongoing embarrassment of the replicability crisis. This began when researchers tried to replicate recently published psychological studies, and were able to do so only 40 percent of the time. (For a quick refresher, read “Communications Measurement, P-Hacking, and the Replicability Crisis.”)
While the replicability crisis has many and complex causes, a simple step toward improvement would seem to be making false positive results more difficult to come by, which is what changing P from .05 to .005 would do. Plenty of other fixes have been proposed; see this brilliant article in Nature on P values, their origin, and alternatives.
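To see why a stricter threshold makes false positives harder to come by, here is a minimal simulation sketch. It is our own illustration, not something from the paper: it assumes a simple two-group comparison with a known standard deviation, and names such as `null_p_value` are hypothetical. Both groups are drawn from the same distribution, so every "significant" result is a false positive.

```python
import random
from statistics import NormalDist, mean

def null_p_value(rng, n=30):
    """Two-sided z-test p-value for two groups drawn from the SAME distribution.

    Assumption for illustration: both groups are normal with known sd = 1,
    so there is no real effect to find.
    """
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5  # standard error of the difference
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)  # fixed seed so the run is repeatable
p_values = [null_p_value(rng) for _ in range(5000)]

# Share of no-effect studies that would still be declared "significant"
rate_05 = sum(p < 0.05 for p in p_values) / len(p_values)
rate_005 = sum(p < 0.005 for p in p_values) / len(p_values)

print(f"False positive rate at P < .05:  {rate_05:.3f}")
print(f"False positive rate at P < .005: {rate_005:.3f}")
```

With no real effect present, roughly one study in twenty clears the .05 bar by chance alone, while only about one in two hundred clears .005.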
So, what does this have to do with practical communications measurement?
The replicability crisis casts a shadow over most of social science, and perhaps comms measurement as well. After all, if peer-reviewed, published research fails to replicate 60 percent of the time, what does that imply about results from comms measurement?
But the big disconnect here is that most Measurement Advisor readers struggle just to get their data and analyses accepted by management. And so the above P proposal from the social science ivory tower is probably far removed from their own struggles.
As Katie Paine, CEO of Paine Publishing (and publisher of this newsletter), says,
“To most of my clients ‘confidence level’ means that you hire someone who has been doing this for three decades and presumably knows what he or she is doing, or at least knows someone who does. I’ve been doing this for more than three decades, and I fight every day to help clients just to get any form of good research in place, never mind raising issues of confidence level. My concern is stopping people from drawing conclusions from ten data points, never mind having enough data to raise the confidence level by a factor of ten. I think this effort is nice for scientists but irrelevant for PR and comms professionals.”
Reached for comment, Tina McCorkindale, CEO of the Institute for PR, says she supports the proposed new standard in certain cases:
“It should be noted that the new recommendations are specifically for claims of discovery of new effects as well as studies that conduct null hypotheses significance tests. They are not addressing replications of existing claims.
“For public relations and communication research, statistical significance should be based on how confident you want to be in your results. Currently, the social science standard, which our research falls into, is a minimum of 95% confidence. Medical research or high-stakes research should (and does) require a higher P-value.
“Because the p-value is directly related to the sample size, moving to a P = .005 level means sample size would have to be increased by 70%, which directly has an impact on time and budget. If I had a choice between a study with a strong design but using a standard P = .05 level and a study with a weak design with a more stringent p-value of P = .005, I would choose the standard P-value every day of the week.
“I don’t think the new study is saying that we should throw the current research standard out the window, but we should consider more stringent requirements in certain cases. In fact, one of the key points of the article is:
‘We emphasize that this proposal is about standards of evidence, not standards for policy action nor standards for publication. Results that do not reach the threshold for statistical significance (whatever it is) can still be important and merit publication in leading journals if they address important research questions with rigorous methods.’
“Therefore, I do support their recommendation in those aforementioned circumstances, but for our industry, I would say using the current P = .05 standard compared to the P = .005 recommendation depends on the research question and goal.”
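The 70% sample-size figure McCorkindale cites can be checked with a short calculation. The sketch below is our own illustration and rests on assumptions not stated in the quote: a two-sided z-test, 80% power, and the same effect size under both thresholds. The function name `relative_sample_size` is hypothetical.

```python
from statistics import NormalDist

def relative_sample_size(alpha_old=0.05, alpha_new=0.005, power=0.80):
    """How many times larger the sample must be to keep the same power.

    For a two-sided z-test, required n is proportional to
    (z_{1 - alpha/2} + z_{power})^2, so the effect size cancels in the ratio.
    """
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old

ratio = relative_sample_size()
print(f"Sample size multiplier: {ratio:.2f}")
```

Under these assumptions the multiplier comes out at about 1.7, in line with the 70% increase quoted above. A different power target would shift the number somewhat.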
Will a rising tide float all research boats?
And so this particular P proposal may not soon or strongly affect practical comms measurement. Nonetheless, we here at The Measurement Advisor hope this sign of general housecleaning on the part of social science indicates a renewed emphasis on proper research design and methods. An emphasis that can only be good for comms measurement.
That image up top is thanks to Trust Me, I’m a “Psychologist.”