This PNAS paper could be the go-to example of how not to interpret statistics

A few days ago, a science reporter asked me to evaluate a PNAS paper titled “Sex differences in the structural connectome of the human brain”. As it turns out, this paper is awful. On the positive side, it’s an excellent opportunity to highlight common mistakes in psychology/neuroscience and how to do things (more) properly.

1. Minute effect sizes do not allow for bold generalizations

From abstract:

In all supratentorial regions, males had greater within-hemispheric connectivity, as well as enhanced modularity and transitivity, whereas between-hemispheric connectivity and cross-module participation predominates in females.

The paper generalizes its significant findings of structural brain differences between males and females to ALL males and females. Let’s evaluate how wrong you’d be to take the authors’ conclusions for granted (using the Common Language effect size): If you guess that a randomly picked female has more between-hemispheric connections than a randomly picked male, you’d be wrong more than 41 % of the time. And for their second major conclusion: If you do the same test, guessing male superiority with respect to within-hemispheric connections, you’d be wrong at least 37 % of the time. It is outright frightening to think that policy makers could be influenced by this paper!
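To make those percentages concrete, here is a minimal sketch of how a Common Language (CL) effect size relates to Cohen's d. It assumes both groups are normally distributed with equal variance, which is the textbook conversion and not necessarily how the numbers above were derived. The function names are made up for illustration. Under that assumption, being wrong 41 % and 37 % of the time corresponds to d of roughly 0.3 and 0.5.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def cl_from_d(d: float) -> float:
    """Probability that a random member of the higher-scoring group beats a
    random member of the other group (ASSUMES equal-variance normal groups)."""
    return _STD_NORMAL.cdf(d / sqrt(2))


def d_from_cl(cl: float) -> float:
    """Inverse of cl_from_d: the Cohen's d implied by a given CL."""
    return sqrt(2) * _STD_NORMAL.inv_cdf(cl)


# "Wrong" rates quoted in the post; CL is simply 1 minus the wrong rate.
for wrong_rate in (0.41, 0.37):
    cl = 1 - wrong_rate
    print(f"wrong {wrong_rate:.0%} of the time -> CL = {cl:.2f}, "
          f"implied d = {d_from_cl(cl):.2f}")
```

A CL of 0.50 would mean the guess is no better than a coin flip, so values around 0.6 leave a lot of overlap between the two groups.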

2. Don’t do reverse inference when you can do forward inference

From abstract:

“Overall, the results suggest that male brains are structured to facilitate connectivity between perception and coordinated action, whereas female brains are designed to facilitate communication between analytical and intuitive processing modes”.

The authors say that the (minute) structural differences explain behavioral differences between genders. This is reverse inference, which can be very uninformative when applied to macrostructures such as lobes or hemispheres (read: small effect size). Therefore the same critique applies as above.

It is particularly strange that the authors did collect gender-relevant behavioral data on all subjects (see page 4, right column) but did not statistically test them against the structural connectivity of said subjects. This is the obvious thing to do if they wanted to claim that the structural differences underlie behavioral gender differences, and it could shut down critics like me if it showed a convincing relationship. The fact that they didn’t explicitly test to what extent the relationship between individuals’ connectivity and behavior is modulated by gender makes me worry that it is not.

3. Interaction is required to claim developmental differences

From abstract:

“Analysis of these changes developmentally demonstrated differences in trajectory between males and females, mainly in adolescence and adulthood”.

… but the age x sex interaction is non-significant (page 3, right column). Thus it is invalid to conclude that the sexes develop differently, as a major part of this difference might be present at the outset. This is an all too common error in neuroscience.

4. Keep conclusions close to what’s actually tested

From abstract:

between-hemispheric connectivity and cross-module participation predominates in females.

and from the Significance section:

“female brains are optimized for interhemispheric connections.”

… yet they only statistically test it on connections between frontal lobes. It’s quite a violent generalization to equate a frontal lobe with a hemisphere. Figure 2 clearly indicates that, in females, almost the only significant connections are those between frontal lobes; e.g., there are no parietal-parietal or temporal-frontal connections. It seems very post hoc (reporting only significant findings), and it’s certainly not a valid generalization.


If we remove the invalid parts of the abstract (and the study motivation), here’s what’s left:

“In this work, we modeled the structural connectome using diffusion tensor imaging in a sample of 949 youths (aged 8–22 y, 428 males and 521 females). Connection-wise statistical analysis, as well as analysis of regional and global network measures, presented a comprehensive description of network characteristics.”

… which is just a description of what they did without any interpretations. This is exactly the part that I like about the paper. The design is good, the data set is very impressive, and the modeling of connections between regions seems valid. I’d love to see these data in the hands of other researchers.

The fact that this paper got published in its current form is frankly discouraging for PNAS, for peer review, and for the reputation of neuroscience. Let’s hope that cases like this only generalize to the same extent as the conclusions of this paper do.



    1. admin

      Thanks! Yes, I’ve seen two other PNAS papers doing something like the paper you commented on. Dividing people into groups post hoc is often problematic. Circular reasoning, regression towards the mean, etc. kick in.

      There’s lots of great PNAS papers too, luckily.

    2. Spot on analysis. I storified some of the commentary on the paper and I’ll add a link to your post: see Sad thing is I don’t think the paper is awful, in the sense that the data seem valuable. Although I’m not skilled in DTI analysis, my impression is that the method they used was novel, gave some intriguing results, and suggested a different way of thinking about connectivity in terms of relative strength of intra vs interhemispheric connections. As you point out, though, the interpretation went way beyond what was found, and fed into naively stereotyped accounts in the media, which seem to have been encouraged by comments from the authors.
      Main point I want to make to both you and Rhodri is that I think those of us into post-publication peer review need to start making more use of public channels for this: I’m aware of PubPeer (which I used to comment on this paper) and PubMed Commons. I think your and Rhodri’s commentaries would be well worth putting out in one of these places.

      1. Thanks a lot! PubPeer looks excellent. I’ll look into it. I would certainly love it if someone did post-publication reviews of my stuff (when it gets published). It should make us all wiser.

        I completely agree that the data seems very valuable. That’s also what I’m trying to point out in the end of the post: “The design is good, the data set is very impressive and the modeling of connections between regions seems valid. I’d love to see these data in the hands of other researchers.”

        Definitely seems too good to throw away just because the interpretations are sampled from stereotypes and not from the data.
