30. May, 2016

Street drugs significantly linked to psychosis and schizophrenia

Street drugs (including LSD, methamphetamine, marijuana/hash/cannabis) and alcohol have been linked with a significantly increased probability of developing psychosis and schizophrenia (http://www.schizophrenia.com/prevention/streetdrugs.html).

What does this mean? Claims about what is “significant” can be misleading, even manipulative, so it is important to know what claims of significance mean. In the row about dope, can significant findings that cannabis triggers psychosis (in heavy users) lead to the conclusion that it really does? Listen to this podcast (http://mpegmedia.abc.net.au/rn/podcast/2011/10/orr_20111009.mp3). For further reflection, keep reading.

Significance Testing and Alternative Methods of Statistics

Hypothesis testing is fundamental to the construction and evaluation of knowledge in psychology, and that knowledge lays the foundations for clinical practice. A hypothesis is tested against a falsifiable and contradictory view (i.e., a hypothesis that nullifies and disproves the alternate, original hypothesis), and experimental data are collected and analysed to draw conclusions from the results (Perezgonzalez, 2015). However, the best practice for analysing those results statistically and interpreting them as either significant or not is contested. The fundamental issue is whether null hypothesis significance testing (NHST) is valid (Trafimow & Marks, 2015). This paper will review NHST, exposing two perspectives: those that use probabilistic inferences as a “test of significance” (Fisher, 1934, p. 19), and those that do not. This approach makes arguments over the validity of NHST redundant, and demonstrates that alternate methods of statistical testing are complementary. The initial task is to outline the logic of hypothesis testing.

Hypothesis Testing

Popper proposed that falsifiability be a criterion of scientific theory (Stanford Encyclopedia of Philosophy, 2014), arguing that scientific statements “must be capable of conflicting with possible, or conceivable observations” (Popper, 1962, p. 39). Testing falsifiability is “possible by means of purely deductive inferences (with the help of the modus tollens of classical logic) to argue from the truth of singular statements to the falsity of universal statements” (Popper, 2005, p. 19). And thus, the modus tollens inference specifies falsifiability in hypothesis testing (Cohen, 1994).

A modus tollens statement is conditional. It is an if-then inference (that can be used in a syllogism). In denying the consequent (the then), the antecedent (the if) is denied. An example of a modus tollens syllogism is: If your king is in checkmate, then you have lost the game. You have not lost the game. Therefore, your king is not in checkmate (deLaplante, 2009). By denying the consequent (you have lost the game), the antecedent (your king is in checkmate) is denied, and it follows that the king is not in checkmate. In NHST, the null hypothesis takes this same if-then form. Cohen (1994) gives this example: If the null hypothesis is correct, then this datum cannot occur. It has, however, occurred. Therefore, the null hypothesis is false. By denying the consequent (the datum cannot occur), the antecedent (the null hypothesis is correct) is denied, and the conclusion follows that the null hypothesis is false.
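The validity of the modus tollens form can be checked mechanically by enumerating every combination of truth values. The following is a minimal sketch rather than part of the cited sources; the helper names `implies` and `is_valid` are illustrative assumptions. For contrast it also checks a related form, affirming the consequent, which is a recognised fallacy.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion):
    """A form is valid if no truth assignment makes all premises true and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False
    return True


# Modus tollens: if p then q; not q; therefore not p.
modus_tollens = is_valid(
    premises=[implies, lambda p, q: not q],
    conclusion=lambda p, q: not p,
)

# Affirming the consequent: if p then q; q; therefore p.
affirming_consequent = is_valid(
    premises=[implies, lambda p, q: q],
    conclusion=lambda p, q: p,
)

print(modus_tollens)         # True
print(affirming_consequent)  # False
```

The check only covers the deterministic case, in which each statement is simply true or false.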

Although modus tollens reasoning is valid, it becomes invalid in NHST because the modus tollens inference is made probabilistic (Cohen, 1994). Trafimow and Marks (2015) frame this inferential problem as one of “traversing the distance from the probability of the finding . . . to the probability of the null hypothesis, given the finding” (p. 1). The problem is that, by the definition of negation, a modus tollens premise contemplates only two outcomes, i.e., the antecedent either is or is not true. The full argument is “by the law of the excluded middle and the definition of negation, either [the antecedent] or [not the antecedent] must be true” (Shenefelt & White, 2013, p. 92). Because the law of the excluded middle states that there is no middle ground between being and non-being (Alvira, Clavell, & Melendo, 1991), introducing probability into a modus tollens premise is logically impossible. The counterargument is that NHST does not violate the rule of syllogistic reasoning to any great degree (Nickerson, 2000).
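The distance between the probability of the finding given the null hypothesis and the probability of the null hypothesis given the finding can be illustrated with Bayes' rule. The sketch below is an assumption-laden illustration, not a calculation from any of the cited papers: the function name `posterior_null` and all three input probabilities are hypothetical, and the null and alternative are treated as the only two candidate hypotheses.

```python
def posterior_null(p_data_given_null, p_data_given_alt, prior_null):
    """Bayes' rule for P(H0 | D), assuming H0 and H1 are the only two hypotheses."""
    prior_alt = 1.0 - prior_null
    joint_null = p_data_given_null * prior_null
    joint_alt = p_data_given_alt * prior_alt
    return joint_null / (joint_null + joint_alt)


# Hypothetical inputs chosen only for illustration:
# the finding has probability .05 under H0 and .50 under H1,
# and H0 is given a prior probability of .90.
post = posterior_null(p_data_given_null=0.05, p_data_given_alt=0.50, prior_null=0.90)
print(round(post, 3))  # 0.474
```

Under these assumed inputs, a finding that is improbable under the null hypothesis still leaves the null hypothesis with a posterior probability close to one half, which is why the two probabilities cannot be used interchangeably.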

Examples of outcomes arising from a modus tollens premise are: one’s king is in checkmate or is not. It cannot probably be in checkmate. A person is either dead or not dead. Reporting that a person is probably dead sheds no light on whether they are. Accordingly, in NHST, basing a conclusion on a probabilistic premise would be erroneous if interpreted as conclusive proof that the null hypothesis is false. However, all arguments over validity obscure the question of whether NHST sheds light on the phenomena of interest (Task Force on Statistical Inference, 1996). To answer that question, the aims of NHST must be considered.

Aims of NHST

In NHST, a null hypothesis is generated on the assumption that no significant relationships exist between the variables of interest (McKenzie, 2013). In this view, the null hypothesis is held to be true until there is sufficient evidence of a statistically significant finding, such as differences between experimental conditions or associations between variables (McKenzie, 2013). Thus, the goal of NHST is to provide evidence (statistically significant probabilities) of relationships and associations in the population of interest.

The problem of construal is perhaps what leads people to think that NHST does not achieve this goal. Häggström (in press) equates greater statistical significance with a greater coincidence that must be explained in order to hold on to the null hypothesis; as evidence accumulates, the null hypothesis becomes untenable. In this non-definitive sense, NHST sheds light on the phenomena of interest (Task Force on Statistical Inference, 1996). NHST is a starting point of analysis, although the reporting of elements such as effect sizes, confidence intervals, and descriptive statistics is also needed to convey a more complete meaning of the results, i.e., the magnitude of the observed effect (American Psychological Association, 2010).
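One way to see why a p-value alone does not convey the magnitude of an effect is to hold the effect size fixed and vary only the sample size. The sketch below assumes a two-group comparison with equal group sizes and uses a normal (z) approximation with known unit variance; the function name `p_value_for_effect` and the chosen numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_effect(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between two equal
    groups, under a normal (z) approximation with known unit variance."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same small standardized effect (d = 0.2) in two hypothetical studies.
small_study = p_value_for_effect(d=0.2, n_per_group=20)
large_study = p_value_for_effect(d=0.2, n_per_group=500)

print(small_study > 0.05)  # True: not significant at the .05 level
print(large_study < 0.05)  # True: significant at the .05 level
```

The effect is identical in both cases; only the amount of data differs, which is the reason effect sizes and confidence intervals are reported alongside the test result.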

The Next Step

In accepting or rejecting NHST, researchers have produced a taxonomy of positions (Häggström, in press). Those embracing probabilistic inferences (including Bayesian inferences) aim to reduce statistical errors whilst traversing the distance between the probability of the findings and the actual parameter in the population. A second method aims to estimate those parameters and build upon those estimations (Cumming, 2012). The first method relies upon p-values to infer parameters in a population and thereafter assesses the magnitude of effect. This approach, which is a litmus test of statistical significance, has inherent but manageable problems. The first is achieving sufficient statistical power to avoid either obtaining a significant result when no effect exists or missing an effect that does exist, i.e., type I or type II errors (Schmidt & Hunter, 1997). The second is the arbitrariness of the pre-determined p-value threshold (Cumming, 2012). The second statistical method accumulates evidence of experimental effect, supporting its findings with the aid of significance testing and confidence intervals. Both approaches are complementary, and the way ahead is to embrace both.
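The relationship between the significance threshold, statistical power, and the two error types can be made concrete with a short calculation. The sketch below is offered under stated assumptions: a two-sided, two-sample z-test with equal group sizes and known unit variance. The function name `power_two_sample_z` and the example effect size and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(d, n_per_group, alpha=0.05):
    """Probability of a significant two-sided z-test when the true standardized
    mean difference is d (equal group sizes, known unit variance)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


# With no true effect, the rejection rate is the type I error rate (alpha).
type_one_rate = power_two_sample_z(d=0.0, n_per_group=32)

# With a true effect of d = 0.5, power depends on the sample size;
# the type II error rate is 1 - power.
power_n32 = power_two_sample_z(d=0.5, n_per_group=32)
power_n64 = power_two_sample_z(d=0.5, n_per_group=64)

print(round(type_one_rate, 3))  # 0.05
print(0.4 < power_n32 < 0.6)    # True
print(power_n64 > 0.8)          # True
```

In this sketch the type I error rate is fixed by the chosen threshold, whereas the type II error rate falls as the sample size grows.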

Describing the Existing Taxonomy

Embracing both approaches requires an understanding of how they are complementary. Cumming (2014), claiming that NHST is defeated, cited four defences of significance testing and argued that these have been overcome. However, examining these defences makes the mutual inclusivity of the two approaches apparent.

Arguments for significance testing. Schmidt and Hunter (1997) consider eight objections to abandoning significance testing. This paper will report two cited by Cumming (2014) and the additional claim that confidence intervals are themselves significance tests.

To the objections that (a) significance tests are essential to identify findings that are real and not due to chance, i.e., whether two means or correlations are really different, and (b) significance testing is needed to determine whether an experimental effect exists, Schmidt and Hunter (1997) reply as follows: although it is assumed that significance testing can distinguish between real and chance findings in research studies, it cannot. The example given is that, as the average power of significance tests is in the .40 to .60 range, about half of the tests in the research literature will be non-significant, leading to the erroneous conclusion that in half of all studies no relationship among the variables, and hence no effect size, will be found. Schmidt and Hunter (1997) claim that a meta-analysis can reveal that, in many cases, these relationships can be found.

These arguments over the indispensability of significance testing in determining what is real and what has effect reveal two alternative ways of crunching numbers. One way (meta-analysis) accumulates evidence to falsify the null hypothesis; the other (significance testing) seeks probabilistic evidence about the likelihood that what is found in an experiment actually exists in the population. Both methods shed light on mean differences and magnitude; however, a meta-analysis is more effective where participant numbers, and hence statistical power, are low. This is why detractors of NHST claim that significance testing offers an illusion of certainty, is unreliable, and cannot tell what the result may be on replication (Cumming, 2012).
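How accumulated evidence can reveal an effect that individual low-powered studies miss may be sketched with a simple fixed-effect, inverse-variance pooling of study estimates. This is an assumed, simplified technique chosen for illustration and is not the specific meta-analytic procedure of Schmidt and Hunter (1997). The five effect estimates and standard errors are invented, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value for an estimate and its standard error (z approximation)."""
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fixed_effect_pool(estimates, ses):
    """Inverse-variance weighted mean of the estimates and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Invented data: five small studies of the same effect, each imprecise.
estimates = [0.25, 0.35, 0.30, 0.20, 0.38]
ses = [0.20, 0.20, 0.20, 0.20, 0.20]

individual_p = [two_sided_p(e, s) for e, s in zip(estimates, ses)]
pooled, pooled_se = fixed_effect_pool(estimates, ses)
pooled_p = two_sided_p(pooled, pooled_se)

print(all(p > 0.05 for p in individual_p))  # True: no single study is significant
print(round(pooled, 3))                     # 0.296
print(pooled_p < 0.05)                      # True: the pooled estimate is significant
```

Read in isolation, each invented study would suggest that no effect was found; pooled, the same data point to a consistent positive effect.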

Confidence intervals construct “plausible regions for population parameters” (Wilkinson & Task Force on Statistical Inference, 1999, p. 602) and give an indication of effect size (Cohen, 1995). It is important to address the claim that confidence intervals are themselves significance tests, as proponents of statistics of estimation (Cumming, 2012, 2014) argue against significance testing. Schmidt and Hunter (1997) counter this claim, pointing out that confidence intervals preceded significance testing and were interpreted as error bands around point estimates. Whether one identifies confidence intervals as significance tests depends on one’s interpretation of confidence intervals (Schmidt & Hunter, 1997). Although unsatisfactory and circular, this argument adds weight to the claim that the two perspectives on doing statistics are complementary.
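The two readings of a confidence interval can be placed side by side in a short sketch. Under a normal (z) approximation, which is an assumption made here for simplicity, a 95% interval excludes zero exactly when the corresponding two-sided test is significant at the .05 level, yet the interval also reports a plausible range for the parameter. The function names and the example estimates are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Normal-approximation confidence interval around a point estimate."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * se, estimate + z_crit * se


def two_sided_p(estimate, se):
    """Two-sided p-value for the null hypothesis that the parameter is zero."""
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def excludes_zero(interval):
    """True when zero lies outside the interval."""
    low, high = interval
    return low > 0 or high < 0


# Two invented results with the same point estimate but different precision.
imprecise = confidence_interval(0.30, 0.20)
precise = confidence_interval(0.30, 0.10)

print(excludes_zero(imprecise), two_sided_p(0.30, 0.20) < 0.05)  # False False
print(excludes_zero(precise), two_sided_p(0.30, 0.10) < 0.05)    # True True
```

Read as an error band, the interval conveys the magnitude and precision of the estimate; read against zero, it yields the same decision as the significance test.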


NHST can be understood from two perspectives: that of those who espouse probabilistic inferences (Fisher, 1934; Nickerson, 2000) and that of those who do not (Cohen, 1994; Cumming, 2012; Trafimow & Marks, 2015). From those perspectives, NHST either logically or illogically asserts that sampling probabilities are in binary opposition to fixed parameters within a population. However, the statistical approach of NHST is complementary to the way of estimation that rejects significance testing. Significance testing yields circumstantial, probability-based evidence, while estimation accumulates evidence of experimental effect, supported by confidence intervals. The distinct advantage of estimation is its non-reliance upon statistical power to reveal experimental effect (thus avoiding false negative/positive results); however, neither approach can falsify a hypothesis. Jointly, that evidence may weigh against the null hypothesis; hence, arguments over NHST are obsolete. Any statistical method capable of shedding light on the effects of experimental research should be embraced, particularly those focused on findings of clinical importance/significance. This inclusive approach is the best way forward for statistical practice whose purpose is achieving well-tested knowledge.

References



Alvira, T., Clavell, L., & Melendo, T. (1991). Metaphysics. Manila: Sinag-Tala.

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003. doi.org/10.1037/0003-066X.49.12.997

Cohen, J. (1995). The earth is round (p < .05): Rejoinder. American Psychologist, 50(12), 1103.

Cumming, G. (2012). The new statistics: What we need for evidence-based practice. InPsych: The Bulletin of the Australian Psychological Society, 34(3), 20-21. Retrieved from https://www.psychology.org.au/inpsych/2012/june/cumming/

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7-29. doi.org/10.1177/0956797613504966

deLaplante, K. (2009). Modus tollens [Video file]. Retrieved from https://www.youtube.com/watch?v=fLlkSDb0UFk

Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). Edinburgh: Oliver and Boyd. Retrieved from http://www.haghish.com/resources/materials/Statistical_Methods_for_Research_Workers.pdf

Häggström, O. (in press). The need for nuance in the null hypothesis significance testing debate. Educational and Psychological Measurement. Retrieved from http://www.math.chalmers.se/~olleh/NHST-nuance-revision.pdf

McKenzie, S. (2013). Vital statistics: An introduction to health science statistics. Chatswood, NSW: Elsevier Australia.

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241-301. doi.org/10.1037/1082-989X.5.2.241

Perezgonzalez, J. D. (2015). Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Frontiers in Psychology, 6(223), 1-11. doi.org/10.3389/fpsyg.2015.00223

Popper, K. (1962). Conjectures and refutations: The growth of scientific knowledge. New York: Basic Books.

Popper, K. (2005). The logic of scientific discovery (1st e-library ed.). Retrieved from http://strangebeautiful.com/other-texts/popper-logic-scientific-discovery.pdf

Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37–64). Mahwah, NJ: Erlbaum.

Shenefelt, M., & White, H. (2013). If A, then B: How the world discovered logic. Columbia University Press.

Stanford Encyclopedia of Philosophy. (2014). Science and pseudo-science. Retrieved from http://plato.stanford.edu/entries/pseudo-science/

Task Force on Statistical Inference. (1996). Initial Report. Retrieved from http://www.apa.org/science/leadership/bsa/statistical/tfsi-initial-report.pdf

Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1-2. doi.org/10.1080/01973533.2015.1012991

Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594-604. doi.org/10.1037/0003-066X.54.8.594