Growing up as scientists in a “publish or perish” culture, most of us think that publication (“the proof” of productivity) is the only thing that matters. This pressure weighs heavily on graduate students’ daily lives as we prepare for futures as independent researchers. The number of publications and the cumulative impact factor that we “own” decide the success of applications, from postdoctoral training opportunities to independent grants. However, this stress is a double-edged sword. On one hand, it pushes us forward to become good researchers. On the other hand, it also opens the door to research misconduct, in particular p-hacking.
The Most Famous Case in 2018
Dr. Brian Wansink, the chair of the Applied Economics and Management Department at Cornell University, is one of the most famous researchers studying the association between consumer behavior and food marketing. But recently, he became another researcher who fell into the p-hacking trap. Just earlier this month, the famous Joy of Cooking1 pushed back on the 2009 study that Wansink published in Annals of Internal Medicine, citing its notorious “cherry-picked data”.2 A series of investigations has since been launched by journals targeting the statistical methods in Wansink’s publications (with a cumulative total of 22 retractions to date).3 Many of the retraction notices issued by journals over the past year repeatedly emphasized inappropriate data selection aimed at obtaining a significant p-value.
What is p-hacking?
Picture a dataset with 100,000 subjects, aged 20 to 90 years. Now the task is to find out whether age contributes to changes in blood pressure. After hours of hard work, we finally finish cleaning up the data and are ready to run the analysis. However, the linear regression yields no significant result. To figure out the “issue”, we then spend hours re-checking the dataset and searching the literature. So what next?
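To make the scenario concrete, here is a minimal Python sketch of that first analysis. The data are simulated (blood pressure deliberately unrelated to age) and the variable names are made up, so this is an illustration, not anyone’s actual study:

```python
# A minimal sketch of the baseline analysis. The variable names are
# hypothetical and the data are simulated with no real age effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age = rng.uniform(20, 90, size=100_000)          # ages 20-90
systolic_bp = rng.normal(120, 15, size=100_000)  # independent of age

# Ordinary least-squares regression of blood pressure on age.
result = stats.linregress(age, systolic_bp)
print(f"slope = {result.slope:.4f}, p = {result.pvalue:.3f}")
# Under the null, p > 0.05 most of the time: the "disappointing" result.
```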
Approach 1. Who are “The Outliers”?
It is not uncommon for a few of our subjects to lie far away from the rest of the group. Reasonably, we would consider omitting these subjects from the analysis. But what happens when the results are still not “significant”? Maybe there are more potential outliers in the data? So, let us keep going… But where should we stop? Based on the p-values?
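A quick simulation shows why p-values make a dangerous stopping rule. The toy sketch below (simulated null data, an arbitrary cap of ten “cleaning” rounds) drops the most extreme residual and re-tests until something becomes significant, and the false-positive rate climbs well above the nominal 5%:

```python
# Toy simulation of "outlier hunting" on pure noise: keep dropping the
# most extreme residual and re-testing until p < 0.05 (or we give up).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n, alpha = 2000, 50, 0.05
false_positives = 0

for _ in range(n_sims):
    x = rng.uniform(20, 90, n)
    y = rng.normal(120, 15, n)           # no true relationship
    for _ in range(10):                  # up to 10 rounds of "cleaning"
        res = stats.linregress(x, y)
        if res.pvalue < alpha:
            false_positives += 1
            break
        resid = y - (res.intercept + res.slope * x)
        keep = np.abs(resid) < np.abs(resid).max()  # drop the worst point
        x, y = x[keep], y[keep]

print(f"false-positive rate: {false_positives / n_sims:.2%}")  # >> 5%
```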
Approach 2. It is all about the “Sample Size”, right?
The second, and perhaps the most popular, strategy is to increase the sample size. Indeed, according to most statistical textbooks, the greater the sample size, the greater the statistical power. Therefore, a tempting decision is to double or triple the sample size… But where does it end? Experimental materials are not cheap. How many lives do we need to sacrifice for the greater good? And, most importantly, even with a giant sample size, can we be sure that the results generalize to the target population?
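There is also a subtler statistical problem: testing after every new batch of subjects and stopping as soon as p < 0.05 (so-called optional stopping) inflates the false-positive rate even when nothing is going on. A toy simulation (arbitrary batch size and cap, two identical populations) makes this visible:

```python
# Toy simulation of "just collect more data": peek after every batch and
# stop as soon as p < 0.05, even though the two groups are identical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, batch, max_n, alpha = 2000, 10, 200, 0.05
false_positives = 0

for _ in range(n_sims):
    a, b = np.empty(0), np.empty(0)      # two groups, same population
    while a.size < max_n:
        a = np.append(a, rng.normal(0, 1, batch))
        b = np.append(b, rng.normal(0, 1, batch))
        if stats.ttest_ind(a, b).pvalue < alpha:  # peek after each batch
            false_positives += 1
            break

print(f"false-positive rate: {false_positives / n_sims:.2%}")  # >> 5%
```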
Approach 3. Swing Dance in the Pool of Statistical Tests.
Since some of the literature grouped their subjects by age, we decide to do the same thing and split subjects into two groups: young adults and seniors. This time, we decide to run a t-test because our independent variable is now categorical. And, just as we hoped, the new p-value is below 0.01. Finally, we can use this result and publish an article! Months later, the manuscript is accepted for publication. And the story ends here?
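Shopping around for the test that “works” carries the same hidden cost. In the sketch below (simulated null data, an arbitrary age cut-off of 55 for the two groups), we try a regression, a dichotomized t-test and a rank test on the same data and keep the best p-value; the chance of at least one spurious “hit” exceeds the nominal 5%:

```python
# Toy simulation of "swing dancing" through tests: run several analyses
# on the same null data and report only the smallest p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims, n, alpha = 2000, 100, 0.05
false_positives = 0

for _ in range(n_sims):
    age = rng.uniform(20, 90, n)
    bp = rng.normal(120, 15, n)                 # no true effect of age
    young, old = bp[age < 55], bp[age >= 55]    # "young adults" vs "seniors"
    pvals = [
        stats.linregress(age, bp).pvalue,       # continuous analysis
        stats.ttest_ind(young, old).pvalue,     # dichotomized t-test
        stats.mannwhitneyu(young, old).pvalue,  # nonparametric fallback
    ]
    if min(pvals) < alpha:
        false_positives += 1

print(f"false-positive rate: {false_positives / n_sims:.2%}")  # > 5%
```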
The Consequences of Ingesting the Deadly p-Virus
John Oliver, the host of Last Week Tonight, once gave a simple but direct definition of p-hacking:
“playing with your variables until you find something significant. But, it is probably meaningless.”
Even though one may say, “It is just one study”, the consequences of spreading a wrong message can cause a chain reaction through Facebook, Twitter and the media. Moreover, what if many scientific reports fall into the p-hacking trap? As pointed out in one of the most cited articles about p-hacking, written by Head et al.:
“Meta-analyses are now the ‘gold standard’ for synthesizing the evidence for an effect of a treatment… However, meta-analyses are compromised if the studies being synthesized do not reflect the true distribution of effect sizes.”4
It is also worth noting that p-hacking and other forms of scientific misconduct have a widespread influence on academia, the pharmaceutical industry, the health professions and society.5, 6
So, the next question is: what are the dos and don’ts?
If you want to know the answer, make sure to save the afternoon of the 2xth of April to join The p-hacking Workshop, specially designed for the IPN. The workshop is hosted by the Graduate Students Association for Neuroscience (GSAN). For more details, stay tuned and read the upcoming GSAN email. Do not forget to spread the word to your colleagues and PIs.
References
- Rombauer IvS, Becker MR. Joy of cooking. New York: Bobbs-Merrill, 1943.
- Wansink B, Payne CR. The joy of cooking too much: 70 years of calorie increases in classic recipes. Ann Intern Med 2009;150:291-292.
- Retraction Watch Database. Available at: http://retractiondatabase.org/RetractionSearch.aspx#?auth%3dWansink%252c%2bBrian.
- Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLoS Biol 2015;13:e1002106.
- George SL. Research misconduct and data fraud in clinical trials: prevalence and causal factors. Int J Clin Oncol 2016;21:15-21.
- George SL, Buyse M. Data fraud in clinical trials. Clin Investig (Lond) 2015;5:161-173.
Article by Chun Yao, proofed by Dr. Paramita Saha Chaudhuri and edited by Clément Bourguignon