Low statistical power can significantly distort research results
Low statistical power in fMRI research can lead to highly distorted results on the relationship between brain and behaviour. It also strongly reduces the chance that a finding will replicate. UvA clinical psychologist Henk Cremers and his American colleagues demonstrated this through a simulation and a comparison with empirical data. Their findings have recently been published in the open-access journal PLOS ONE.
Low statistical power is a well-known problem in neuroimaging research, among other fields. In research on the neural basis of behaviour, for example, large numbers of variables and relatively small numbers of observations often lead to low statistical power. ‘Statistical power is the probability that you will find an effect if that effect actually exists. The weaker the effect, the larger the sample you need to achieve sufficient statistical power,’ says Cremers. There is still a great deal of uncertainty about what low statistical power means for the conclusions that can be drawn from research. Cremers and his colleagues examined to what extent low statistical power is relevant for the existing fMRI literature.
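The relationship Cremers describes between effect size and sample size can be made concrete with a rough calculation. The sketch below is not taken from the paper: it uses the standard Fisher z approximation for a two-sided test of a correlation, and the function names, the significance level of 0.05 and the target power of 0.8 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test to detect a true correlation r
    with n participants, using the Fisher z approximation (assumed here)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under the approximation, atanh(sample r) is roughly normal with
    # mean atanh(r) and standard error 1 / sqrt(n - 3).
    shift = math.atanh(r) * math.sqrt(n - 3)
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def required_n(r, power=0.8, alpha=0.05):
    """Approximate sample size needed to reach the target power
    for a nonzero true correlation r."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    # Illustrative effect sizes only; they are not values from the study.
    for r in (0.1, 0.3, 0.5):
        print(f"r={r}: power at n=30 is {correlation_power(r, 30):.2f}, "
              f"n needed for 80% power is {required_n(r)}")
```

Under these assumptions, a weak correlation of 0.1 needs a sample of roughly 780 participants to reach 80% power, whereas a strong correlation of 0.5 needs roughly 30.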
The researchers developed a simulation to illustrate the different correlations between brain and behaviour in a very large sample (illustrative of the entire population) as compared with random small samples (illustrative of a single study). Cremers: ‘In a relatively small study, you normally look at the results of brain activity in relation to specific behaviour, for example. You then make generalisations on this basis, conclusions which are more generally applicable. In our research, we used a different starting point: the hypothetical situation that you have fMRI data for an entire population. We looked at what happens if you conduct a hypothetical study using only a small sample of the total population, compared with what happens in reality.’
More specifically, the researchers looked at two possible effects (scenarios) which can be seen in the entire population as regards the correlation between brain and behaviour. The first scenario involves strong localised effects: a limited number of brain regions display a very strong correlation with a personality trait. By contrast, the second scenario involves weak diffuse effects: a large number of brain regions display a very weak correlation with a personality trait. ‘The first scenario often appears to result from research. However, we believe that the second scenario is more likely in theory. We observed that when the effects were, in reality, weak and diffuse in the entire population, random samples indicated a strong localised effect and therefore gave rise to a very distorted picture as compared with the entire population,’ says Cremers. Moreover, this type of result, based on a small sample, proved to be non-replicable.
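The distortion in the weak diffuse scenario can be reproduced with a toy simulation. The sketch below is not the authors' simulation: the number of regions (500), the sample size (20), the true correlation (0.1), the significance threshold (0.01) and all function names are assumptions chosen for illustration, and the threshold relies on the Fisher z approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, alpha):
    """Smallest |r| that counts as significant in a two-sided test
    (Fisher z approximation, assumed here)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def small_study(rng, n_subjects, n_regions, true_r, alpha):
    """Simulate one small study in which EVERY region has the same weak
    true correlation with behaviour; return the regions that pass the
    significance threshold, mapped to their sample correlations."""
    behaviour = [rng.gauss(0, 1) for _ in range(n_subjects)]
    noise_sd = math.sqrt(1 - true_r ** 2)
    threshold = critical_r(n_subjects, alpha)
    survivors = {}
    for region in range(n_regions):
        activity = [true_r * b + noise_sd * rng.gauss(0, 1) for b in behaviour]
        r = pearson(activity, behaviour)
        if abs(r) >= threshold:
            survivors[region] = r
    return survivors


# Hypothetical settings: weak diffuse effects in the whole population.
TRUE_R, N_SUBJECTS, N_REGIONS, ALPHA = 0.1, 20, 500, 0.01

rng = random.Random(0)
first = small_study(rng, N_SUBJECTS, N_REGIONS, TRUE_R, ALPHA)
second = small_study(rng, N_SUBJECTS, N_REGIONS, TRUE_R, ALPHA)
replicated = set(first) & set(second)

print(f"true correlation in every region: {TRUE_R}")
print(f"significance threshold on |r|: {critical_r(N_SUBJECTS, ALPHA):.2f}")
print(f"regions significant in study 1: {len(first)} of {N_REGIONS}")
print(f"of those, also significant in study 2: {len(replicated)}")
```

Because the threshold for significance in a sample of 20 lies far above the true correlation of 0.1, any region that passes it necessarily shows a strongly inflated correlation, and only a small fraction of regions pass. A single small study therefore looks like a strong localised effect even though the underlying effect is weak and diffuse, and an independent second sample is unlikely to flag the same regions.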
The researchers further evaluated their findings using a similar analysis with data from the Human Connectome Project, one of the largest available databases of information on the brain and behaviour. This method also indicated a huge discrepancy: the same distortion occurred in analyses of a small sub-sample from the large sample. Whereas weak diffuse effects were found in reality, sub-samples indicated strong localised effects. The replication problem also occurred with the empirical data. In short, this research has consequences for the notions that we have about the neural basis of behaviour. It is not just one region of the brain that is involved in personality or psychopathology but rather many different interconnected regions, all of which make a small ‘contribution’.
Cremers makes a number of recommendations: ‘The most obvious and simplest thing is that it is important for a study to have a large number of participants. However, particularly in a field such as clinical psychology, this is not always possible.’ He also suggests using a number of other statistical analyses which are less susceptible to low statistical power. ‘For example, there are multivariate fMRI analyses (e.g. machine learning and, in some applications, also network analyses), which have been developed by Steven Scholte and Lourens Waldorp from the UvA among others.’ In follow-up research, Cremers plans to focus on applications of these multivariate methods. Moreover, there are important developments such as automated meta-analyses of fMRI data, which can help researchers to interpret aspects such as the specificity of findings compared to all published fMRI literature.
Henk R. Cremers, Tor D. Wager and Tal Yarkoni: ‘The relation between statistical power and inference in fMRI’, in: PLOS ONE (20 November 2017).