P-Hacking and Data Collection


P-hacking is closely tied to how data is collected and analyzed. It occurs when researchers shape the data collection or analysis, for example by running many tests and reporting only those that reach significance, so that the results match their own expectations or the expectations of others who depend on the data. The motive may be financial gain or better employment prospects. The problem is most common when many closely related variables are tested against the same outcome.

The comic in question illustrates the problem of p-hacking effectively. In looking for a relationship between jelly beans and acne, the scientists test many possible outcomes, one for each color of jelly bean. The reported outcome appears to be biased by factors outside the data. To start with, there is no clear mechanism by which the color of a food product could cause acne. The conclusion that one particular color of jelly bean causes acne is therefore unsupported. Nor is it clearly explained how the significant result (p<0.05) for green jelly beans was arrived at. Bohannon (2015) suggested that any conclusions regarding statistical data should always be supported with facts. In the illustration, many colors of jelly beans are investigated for an association with acne, and only the green result is reported as a finding. Concluding that green jelly beans cause acne is therefore a biased and externally influenced outcome, which shows how p-hacking played a role in reaching it. The claim is unsupported because a single significant result among many tests is not sufficient statistical evidence.
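The point that one significant color among many is weak evidence can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it assumes 20 independent tests, a 0.05 significance level, and no real effect for any color, so each p-value is drawn uniformly at random. The function name and its parameters are hypothetical and chosen for this example.

```python
import random


def simulate_false_positive_rate(n_tests=20, alpha=0.05,
                                 n_experiments=20000, seed=0):
    """Estimate how often at least one of n_tests null tests looks significant.

    Assumption: no color has any real effect, so every p-value is
    uniformly distributed on [0, 1) and the tests are independent.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    experiments_with_a_hit = 0
    for _ in range(n_experiments):
        # One simulated study: n_tests p-values, one per jelly bean color.
        if any(rng.random() < alpha for _ in range(n_tests)):
            experiments_with_a_hit += 1
    return experiments_with_a_hit / n_experiments


rate = simulate_false_positive_rate()
print(f"Share of studies with at least one 'significant' color: {rate:.2f}")
```

Under these assumptions, well over half of the simulated studies report at least one "significant" color even though no color has any effect.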

To fix the p-hacking problem, the scientists would need more statistical data and a stricter p-value threshold. The information provided is not backed up by enough data, and more data would help in arriving at reasonable p-values. With 20 tests (n = 20), the study could have produced reliable results only if it had included more data and accounted for the number of comparisons. At a threshold of 0.05 per test, the probability of at least one false positive across 20 independent tests is 1 - (1 - 0.05)^20, or about 0.64. A reasonable per-test threshold is therefore much lower. Requiring 1 - (1 - p)^n < 0.05 with n = 20 gives p of about 0.0026, close to the simpler correction 0.05/20 = 0.0025. A correction of this kind is in line with Bohannon's (2015) point that statistical conclusions should be supported by evidence.
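The threshold arithmetic for 20 tests can be checked with a short sketch. It assumes the tests are independent and that the overall false-positive rate should stay below the conventional 0.05. Both are assumptions made for this example, and the function names are hypothetical.

```python
def familywise_error_rate(p, n):
    """Probability of at least one false positive in n independent tests,
    each run at per-test threshold p: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def per_test_threshold(alpha, n):
    """Per-test threshold p that solves 1 - (1 - p)^n = alpha."""
    return 1 - (1 - alpha) ** (1 / n)


def bonferroni_threshold(alpha, n):
    """Simpler, slightly stricter approximation: alpha / n."""
    return alpha / n


n = 20        # number of tests, as in the text
alpha = 0.05  # assumed overall false-positive rate

uncorrected = familywise_error_rate(0.05, n)
corrected = per_test_threshold(alpha, n)
simple = bonferroni_threshold(alpha, n)

print(f"Chance of a false positive at p < 0.05 per test: {uncorrected:.2f}")
print(f"Per-test threshold from 1 - (1 - p)^n < {alpha}: {corrected:.4f}")
print(f"Per-test threshold from alpha / n: {simple:.4f}")
```

The two corrected thresholds are nearly identical for n = 20, so either would serve as the stricter cutoff.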


Bohannon, J. (2015). Many psychology papers fail replication test.
