Hypnosis and brain injury: where to find stuff

This post is a continually updated list of important communications about our paper in Brain, "Improving working memory performance in brain-injured patients using hypnotic suggestion," along with my comments on each of them.

Reports

  • The scientific article in Brain: Argues that targeted hypnotic suggestion improves working memory for people with acquired brain injury.
  • Oxford University Blog: A summary for the general public. It includes a back-of-the-envelope estimation of how much you should update your belief based on this study alone.
  • An anti-hype blog post: Anticipates some potential overinterpretations by the general public (particularly journalists, therapists, and patients) and counters them.

Data and materials

  • Supplementary materials: Supports the main article with a more detailed presentation of data, methods, and statistics. It took a long time to write the supplementary materials (15 pages), so I really hope that you read them!
  • Data and analysis at OSF: Contains all the data and analyses used in the Brain article and its supplementary materials. We encourage you to scrutinize this, suggest improvements, and carry out alternative analyses.
  • We plan to make the hypnosis scripts available to everybody eventually. On the other hand, we want to avoid putting them "out in the wild" if the effect somehow turns out to be a fluke. So for now, we pursue a strategy where we only make non-published details about the intervention and the hypnosis scripts available to institutions and companies that can and will evaluate the effectiveness of the scripts systematically. If you're such an institution, we'd be eager to set up a collaboration. For example, the Tryg Foundation just funded an RCT at Jobkompagniet Silkeborg which will include 120 patients over the next 2-4 years. The design and outcome measures simultaneously satisfy research purposes and Jobkompagniet's core mission. We expect to stick to this institution-and-evaluation-only strategy for 2-4 years, until enough evidence has accumulated that we feel confident about the clinical effectiveness of hypnosis following acquired brain injury. If you're a private therapist, if you have suffered an acquired brain injury, or if you're a relative, we ask you to wait until we have this evidence.

Press

  • Spot on Danish national television, including a case with a 19-year-old man who had sustained a traumatic brain injury in a high-speed car crash two years before enrollment in the experiment. The spot is accompanied by two articles and a radio spot (also in Danish). Our impression is that this case is in the top 30% in terms of real-life improvement, i.e. not too unrepresentative.

.

What our paper on hypnosis following brain injury does not show

We just published a paper in Brain entitled “Improving working memory performance in brain-injured patients using hypnotic suggestion.” We argue that for patients with acquired brain injury, hypnotic suggestion that asks the patient to recover the ability to concentrate and remember improves working memory performance. I also reported the study and the results on the Oxford University Press blog, including an estimate of how skeptical you should be. We expect quite some publicity - and perhaps even hype - around the results of this paper.

Hype and overinterpretation work for advertising but not for science. In this post, I will try to counter some potential misunderstandings. We are incredibly excited about our findings. We just want to make sure that the public is excited for the right reasons and at the right level. As I argued in the OUP blog post, the appropriate state of belief is probably that of a “pessimist turned hopeful” who awaits further data before deciding whether to leave the pessimistic stance or keep it. Given that brain injuries constitute the second- and third-largest health-related costs in the world, even a small hope is quite substantial.

Before I continue, I want to highlight that this research was a joint effort by Professor Morten Overgaard, hypnotist Rikke Overgaard, and me. The project was funded by the European Research Council and the Karen Elise Jensen Foundation and organized under the Cognitive Neuroscience Research Unit at the Center for Functionally Integrative Neuroscience and Aalborg University. The following reflects the views of all authors.

This study does not show that subjects were “cured”

The participants in our study improved to the population average on measures of working memory, which is an important but not all-encompassing cognitive function. We did not assess other domains such as psychological trauma, physical impairment, etc., because we did not target the hypnosis at these domains, so they were not part of the research question. We therefore cannot make any data-based claims about the effect of hypnosis on those domains.

Hypnosis does not involve anything "mystical" or otherwise "out of this world"

On the gray market of hypnotists, it is all too common to see the same people offering regression to previous lives, contact with the dead, healing auras, etc. As a scientist, I reject this. In our experiment, the hypnotist sat in a chair and read aloud from a piece of paper. The patient sat in another chair and listened. There’s only a vibrating vocal cord, sound moving through the air, hitting the eardrums of the patient, and then sensory transduction to action potentials in the brain. From there, we strongly expect that the mechanism underlying the effect will eventually be identified among the on-the-shelf phenomena of psychology and neuroscience, e.g. neural plasticity working bottom-up, unlearning learned non-use, using a different strategy/realization of the same surface behavior, etc. The only novelty of our findings, if they replicate, would then be that these mechanisms apply to brain injury with a large effect size. There’s a long and serious research history on hypnosis in the top neuroscience journals by top researchers. See e.g. this 2013 paper in Nature Reviews Neuroscience by Oakley & Halligan. This is the line of research we are continuing, and it is very different from the current public perception of what hypnosis is.

The cognitive consequences of acquired brain injury are not imaginary or feigned

Some may say “if it’s that easy to recover, then it wasn’t a real problem” and perhaps even put the patient at fault. I hope you would agree with me that such a claim would be straight-up ridiculous. First, such a claim would entail a psychology-only understanding of brain injury, and since we do not yet know the mechanism by which hypnotic suggestion works, this remains an open question. Second, even if it turns out that the cognitive impairment following acquired brain injury can somehow be described in purely psychological terms, this does not change the seriousness of the condition. PTSD is not less severe, or somehow feigned, because it has a psychogenic origin.

No direct evidence of clinical effectiveness (yet)

Neuropsychological tests have surprisingly low predictive value for the ability to participate in society, e.g. returning to work, living independently, etc., which is often the primary criterion of clinical success in rehabilitation (sorry, psychologists!). We did not assess the latter directly, but we plan to do so in a follow-up experiment that is already in progress. However, we do have anecdotal evidence that targeted hypnotic suggestion could be effective for participation. Many of the patients returned to us with reports of getting jobs, beginning or finishing an education, not getting exhausted from grocery shopping and family parties, etc. This effect was immediate for some and slower for others. I recall one case where the patient only realized that there had been an improvement when the spouse pointed to very tangible evidence, such as no more bruises from inattentively walking into things. Similarly, an independent psychologist was inspired by our findings and used hypnosis on two brain-injured clients with excellent clinical results. But these reports are prone to selection bias (only the positive cases report back) and neglect base-rate improvements (some would have improved anyway). We need direct evidence on clinical effectiveness, and we don’t have that now.

No direct evidence that hypnosis drives the effect (yet)

In casual talk, it’ll be easy to say that “hypnosis improved working memory.” We even do this ourselves. However, it does not follow directly from our design, since both the targeted suggestion procedure and our non-targeted (intended as placebo) suggestion procedure were administered under hypnosis. We did not deliver the same treatment in a non-hypnotic setting (the so-called London-Fuhrer design). So it remains possible that hypnosis is not necessary. From the empirical literature, however, it is generally found that hypnosis boosts the effect of suggestion (though it is disputed why; part of it may just be the Gandhi-Oakley finding that the word “hypnosis” in and of itself increases suggestibility). My best guess is that hypnosis adds considerably to the effect, given that, if it did not, the effect of suggestion on working memory performance would have been discovered long ago in studies of psychotherapeutic treatment following acquired brain injury.

.

\sqrt{2} is superior to Bessel's correction

... for the estimation of the population standard deviation. That is, if you substitute n - 1 with n - \sqrt{2}, you get a much less biased estimate of the population SD. I stumbled upon this when I desperately (!) turned to frequentist methods because of convergence problems with my Bayesian model. The details can be seen in my post on Cross Validated about this curious finding, along with some excellent answers/elaborations.

I'm re-posting the simulation results here, just because they're pretty and I want some content on this blog.

[Figure: simulation results. TL: no correction. TR: Bessel's correction (n - 1). BL: \sqrt{2} correction (superior). BR: n - 1.5. Red: true population SD. Blue: mean estimated SD (corrected). Gray lines indicate different sample sizes, with 3000 simulations per sample size.]

Naturally, \sqrt{2} doesn't outperform the analytically correct solution, but it is a surprisingly good approximation. For larger sample sizes (n > 10), n - 1.5 becomes superior. In any case, it's a nice reminder that Bessel's correction does NOT magically turn the sample SD into an unbiased estimate of the population SD, as is currently taught in many statistics classes.
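
If you want to check the claim yourself, here is a minimal Python sketch of this kind of simulation (my own reconstruction, not the original script): it draws repeated samples from a normal population and compares the mean estimated SD under different corrections to the true SD.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sd = 1.0
n_sims = 3000  # simulations per sample size, as in the figure

for n in (3, 5, 10, 20):
    samples = rng.normal(0.0, true_sd, size=(n_sims, n))
    # Sum of squared deviations from the sample mean, per simulation
    ss = np.sum((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)
    for label, corr in [("n", 0.0), ("n - 1 (Bessel)", 1.0),
                        ("n - sqrt(2)", np.sqrt(2)), ("n - 1.5", 1.5)]:
        est_sd = np.sqrt(ss / (n - corr)).mean()
        print(f"n={n:2d}  {label:15s} mean estimated SD = {est_sd:.3f} (true = {true_sd})")
```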

I'm considering adding this fun-fact to the Wikipedia page on unbiased estimation of the standard deviation, where you can read more about the issue of bias in frequentist estimation.

.

This PNAS paper could be the go-to example of how not to interpret statistics

A few days ago, a science reporter asked me to evaluate a PNAS paper titled Gender differences in the structural connectome of the human brain. As it turns out, this paper is awful. On the positive side, it's an excellent opportunity to highlight common mistakes in psychology/neuroscience and how to do things (more) properly.

1. Minute effect sizes do not allow for bold generalizations

From abstract:

In all supratentorial regions, males had greater within-hemispheric connectivity, as well as enhanced modularity and transitivity, whereas between-hemispheric connectivity and cross-module participation predominates in females.

The paper generalizes its significant findings of structural brain differences between males and females to ALL males and females. Let's evaluate how wrong you'd be to take the authors' conclusion for granted (using the Common Language effect size): If you guess that a randomly picked female has more between-hemispheric connections than a randomly picked male, you'd be wrong more than 41% of the time. And for their second major conclusion: if you do the same test, guessing male superiority with respect to within-hemispheric connections, you'd be wrong at least 37% of the time. It is outright frightening to think of the possibility that policy makers could be influenced by this paper!
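
For reference, the Common Language effect size can be computed from a standardized mean difference under the usual normality and equal-variance assumptions. A small Python sketch (the d value below is purely illustrative and not taken from the paper):

```python
from scipy.stats import norm

def common_language_effect_size(d):
    """Probability that a random draw from the higher-scoring group exceeds
    a random draw from the lower-scoring group (McGraw & Wong), assuming
    two normal distributions with equal variance separated by Cohen's d."""
    return norm.cdf(d / 2 ** 0.5)

d = 0.3  # illustrative standardized difference, not the paper's value
cl = common_language_effect_size(d)
print(f"correct guesses: {cl:.2f}, wrong guesses: {1 - cl:.2f}")
# -> about 0.58 correct vs. 0.42 wrong for d = 0.3
```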

2. Don't do reverse inference when you can do forward inference

From abstract:

“Overall, the results suggest that male brains are structured to facilitate connectivity between perception and coordinated action, whereas female brains are designed to facilitate communication between analytical and intuitive processing modes”.

The authors claim that the (minute) structural differences explain behavioral differences between genders. This is reverse inference, which can be very uninformative when applied to macrostructures such as lobes or hemispheres (read: small effect sizes). Therefore, the same critique as above applies.

It is particularly strange that the authors did collect gender-relevant behavioral data on all subjects (see page 4, right column) but never statistically test it against the structural connectivity of said subjects. This would be the obvious thing to do if they wanted to claim that the structural differences underlie behavioral gender differences, and it could shut down critics like me if it showed a convincing relationship. The fact that they didn't explicitly test to what extent the relationship between individuals' connectivity and behavior is modulated by gender makes me worry that it does not.

3. Interaction is required to claim developmental differences

From abstract:

“Analysis of these changes developmentally demonstrated differences in trajectory between males and females, mainly in adolescence and adulthood”.

… but the age x sex interaction is non-significant (page 3, right column). Thus it is invalid to conclude that the sexes develop differently, since a major part of this difference might already be present at the outset and remain constant across age. This is an all-too-common error in neuroscience.

4. Keep conclusions close to what's actually tested

From abstract:

between-hemispheric connectivity and cross-module participation predominates in females.

and from the significance section:

“female brains are optimized for interhemispheric connections.”

... yet they only statistically test it on connections between the frontal lobes. It is quite a violent generalization to equate a frontal lobe with a hemisphere. Figure 2 clearly indicates that, in females, it is almost exclusively connections between the frontal lobes that are significant; e.g. there are no significant parietal-parietal or temporal-frontal connections. It seems very post hoc (only the significant findings are reported), and it's certainly not a valid generalization.

Conclusion

If we remove the invalid parts of the abstract (and the study motivation), here's what's left:

“In this work, we modeled the structural connectome using diffusion tensor imaging in a sample of 949 youths (aged 8–22 y, 428 males and 521 females). Connection-wise statistical analysis, as well as analysis of regional and global network measures, presented a comprehensive description of network characteristics.”

… which is just a description of what they did without any interpretations. This is exactly the part that I like about the paper. The design is good, the data set is very impressive, and the modeling of connections between regions seems valid. I'd love to see these data in the hands of other researchers.

The fact that this paper got published in its current form is frankly discouraging for PNAS, for peer review, and for the reputation of neuroscience. Let's hope that cases like this only generalize to the same extent as the conclusions of this paper do.

.

Renaming Brain Vision Recorder EEG recordings

Renaming Brain Vision Recorder files is a pain. Here's a Python script that makes it (more of) a breeze.

The problem

Brain Vision Recorder is widely used software for EEG recordings. Every EEG recording is saved as three files:

  1. One .eeg file, containing all the data
  2. One .vmrk/.amrk/.bmrk file, containing markers and a reference to the .eeg file
  3. One .vhdr/.ahdr/.bhdr file, containing header information and references to the marker file and data file. This is the file that represents the recording and is the one opened by analysis software

Because of these internal references between the files, it is quite laborious (and risky) to rename a recording. I often rename my recordings before analysis. I use the Python script below for this purpose.

The script
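
The script itself isn't embedded in this copy of the post, so below is a minimal sketch of the approach (the folder path and the renaming rule are placeholders): it loops over the header files, builds a new base name, renames all three files, and rewrites the DataFile/MarkerFile references inside the header and marker files. It only handles the .vhdr/.vmrk/.eeg case.

```python
import glob
import os
import re

folder = "C:/EEG/raw"  # placeholder: folder containing the recordings

for hdr_path in glob.glob(os.path.join(folder, "*.vhdr")):
    old_name = os.path.splitext(os.path.basename(hdr_path))[0]

    # Generate new name
    new_name = old_name  # simplest case: keep the (manually renamed) base name

    for ext in (".eeg", ".vmrk", ".vhdr"):
        old_file = os.path.join(folder, old_name + ext)
        new_file = os.path.join(folder, new_name + ext)
        if not os.path.exists(old_file):
            continue
        os.rename(old_file, new_file)

        # Rewrite the internal references in the marker and header files
        if ext in (".vmrk", ".vhdr"):
            with open(new_file) as f:
                text = f.read()
            text = re.sub(r"DataFile=.*", "DataFile=" + new_name + ".eeg", text)
            text = re.sub(r"MarkerFile=.*", "MarkerFile=" + new_name + ".vmrk", text)
            with open(new_file, "w") as f:
                f.write(text)
```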

How to modify the script

The important part is just below the "# Generate new name" comment. You can use the script in two ways. First, you can rename the files by hand in your file browser and leave the new name identical to the old one.

This will simply update the internal references in the header and marker files so that they match the renamed files. Secondly, you can generate new filenames programmatically; for example, take a subject ID from the middle of the old filename and put it at the front, after a paradigm name. A sketch of both variants is shown below.
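
As an illustration of the two variants (the slice indices and the "oddball_" prefix are made up for this example):

```python
# Variant 1: files were already renamed by hand; only the internal
# references need fixing, so the new name equals the old one
new_name = old_name

# Variant 2: build the new name programmatically, e.g. pull a subject ID
# out of the middle of the old filename and prepend a paradigm name
# (the slice [8:11] and the prefix "oddball_" are hypothetical)
new_name = "oddball_" + old_name[8:11]
```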

I hope this helps someone out there.

Calculating d', beta, c and Ad' in Python and PHP

About d'

A central component of Signal Detection Theory is d' - a measure of the ability to discriminate a signal from noise. d' is flanked by the parameters beta and c, which are measures of the criterion that the observer uses to discriminate between the two.

These measures can be calculated in every experiment where there is a signal (e.g. target trials) and noise (e.g. nontarget trials), and the observer (e.g. the subject) is to indicate whether the signal was present or not. d', beta, and c are statistical measures in a model where noise and noise+signal are regarded as two "probability-of-detection" distributions on a "threshold-of-detection" continuum. d' is basically a Z-score of how well the observer discriminates the two distributions, i.e. the number of standard deviations between the probability-of-response distributions for signal and noise for this given subject. c (the criterion) is the number of standard deviations of the observer's decision threshold from the midpoint between these two distributions, i.e. a measure on a continuum from "conservative" to "liberal".
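
In formulas, using the hit and false alarm rates defined below and with Z denoting the inverse of the standard normal cumulative distribution (the conversion from a probability to a Z-score), the standard definitions are:

d' = Z(hit rate) - Z(FA rate)
c = -[Z(hit rate) + Z(FA rate)] / 2
\beta = e^{d' \cdot c}
Ad' = \Phi(d' / \sqrt{2})   (the area under the ROC curve implied by d')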

Every trial is scored as one of four possibilities:

  • noise + signal + response: HIT
  • noise + signal + no response: MISS
  • noise + response: FALSE ALARM (FA)
  • noise + no response: CORRECT REJECTION (CR)

Performance is often analyzed in terms of rates:

  • HIT RATE = HIT/SIGNALS (proportion of signal trials detected as signals)
  • FA RATE = FA/NON-SIGNALS (proportion of noise trials detected as signals)

You want a high hitrate and a low farate (hitrate > farate). Say you have conducted an experiment on a subject and calculated the hitrate and farate. A signal-detection account of the subject's abilities can be one of the following:

  • If hitrate=1 and farate=0, d' is infinite, meaning that signals and noise are totally separated, i.e. the distributions are non-overlapping. In that case, you have effectively discriminated signal from noise.
  • If hitrate > farate, d' is positive, meaning that you are more likely to detect the information as signal when signal is present than when it is absent.
  • If hitrate=farate, d'=0, meaning that you are just as likely to detect noise as signal, i.e. you were unable to discriminate signal from noise and hence the probability-of-detection distributions overlap perfectly. This occurs, for example, when responses are completely random.
  • If hitrate < farate, d' is negative, meaning that you have a weird bias of primarily detecting noise as signals while ignoring the actual signals.
  • If hitrate = 0 and farate = 1, d' is negatively infinite, meaning that you have completely confused noise and signal.

For a more thorough introduction and a guide to calculating d' and beta, see "Calculation of signal detection theory measures" by Stanislaw & Todorov.

Calculating d', beta, c and Ad' in Python

Using the percent point function (ppf) from scipy:
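
The original snippet isn't shown in this copy of the post; here is a minimal sketch of the computation (the half-a-trial adjustment of extreme rates is one common convention and may differ from the original):

```python
from math import exp, sqrt
from scipy.stats import norm

Z = norm.ppf  # percent point function: probability -> Z-score

def sdt_measures(hits, misses, fas, crs):
    """Compute d', beta, c and Ad' from raw trial counts."""
    n_signal = hits + misses
    n_noise = fas + crs
    hit_rate = hits / n_signal
    fa_rate = fas / n_noise

    # Adjust rates of 0 and 1 to prevent infinite Z-scores
    hit_rate = min(max(hit_rate, 0.5 / n_signal), 1 - 0.5 / n_signal)
    fa_rate = min(max(fa_rate, 0.5 / n_noise), 1 - 0.5 / n_noise)

    d_prime = Z(hit_rate) - Z(fa_rate)
    c = -(Z(hit_rate) + Z(fa_rate)) / 2
    beta = exp(d_prime * c)
    ad_prime = norm.cdf(d_prime / sqrt(2))  # area under the implied ROC curve
    return d_prime, beta, c, ad_prime

print(sdt_measures(hits=20, misses=5, fas=8, crs=17))
```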

Note the adjustment of rate=0 and rate=1, to prevent infinite values.

Calculating d', beta, c and Ad' in PHP

Note that a rate-threshold of 0.025 is used because the PHP-ppf function above is only an estimation which fails violently if this threshold is exceeded.

The case for PsychoPy

Most researchers in cognitive neuroscience have to program an experiment in a stimulus-delivery environment at some point in their career. Everyone has heard about Presentation and E-Prime, but far fewer have heard about PsychoPy - a free, cross-platform stimulus-delivery environment.


Advantages over Presentation and E-Prime:

  1. Free and open source, meaning no license problems!
  2. Cross-platform. You can develop on Windows, Mac or Linux and run your experiment on Windows, Mac and Linux!
  3. Limitless functionality. PsychoPy is a Python module, which means that it can be heavily extended with other Python modules, e.g. for MATLAB-level matrix operations (NumPy), image and audio manipulation, advanced statistics, etc.
  4. Flexible stimulus handling: "online" stimulus generation, e.g. random-noise masks or drifting Gaussian-masked gratings. Also, you can define stimulus sizes in degrees of visual angle relative to the subject's eyes, rather than having to do the trigonometry yourself.

Timing:

PsychoPy's timing is comparable to E-Prime and Presentation. I've tested it in our stimulus facilities using a photodiode, comparing the timing of stimulus appearance on the monitor to the triggers sent on a parallel port. The discrepancy is reliably below 0.1 ms.

In PsychoPy you can (optionally) define stimulus durations in monitor frames rather than milliseconds, which ensures that timing is very well controlled.
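
As a taste of what frame-based timing looks like in practice, here is a minimal sketch (window settings and stimulus parameters are arbitrary examples; 'deg' units assume a calibrated monitor profile):

```python
from psychopy import visual, core

# A window using degrees of visual angle (assumes a calibrated monitor profile)
win = visual.Window(size=(1024, 768), monitor="testMonitor", units="deg")

# A grating with a Gaussian mask, 5 degrees wide
grating = visual.GratingStim(win, sf=2, size=5, mask="gauss")

# Present the stimulus for exactly 60 monitor frames (1 s on a 60 Hz monitor)
for frame in range(60):
    grating.phase += 0.05  # drift the grating a little on every frame
    grating.draw()
    win.flip()             # flip is synchronized to the monitor refresh

win.close()
core.quit()
```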

Disadvantages:

  1. PsychoPy's timing of audio stimuli is still not perfect. I've found a ~10 ms jitter in audio onset (I've seen much worse in some C-compiled experiments...). If you need very precise audio timing, go for something else.
  2. The serious experimenter still needs to script the whole experiment. Although a very nice graphical interface is in development, it is not yet thoroughly tested and tried. In my opinion, programming in Python is much cleaner and easier than in E-Prime and Presentation.

Getting started:

Good introduction to Python syntax: Learn Python in 10 minutes
Download PsychoPy at PsychoPy.org, where you can also find the documentation.

Computing half-area latency in MATLAB/EEGLAB

In EEG research, the latency of a component (hidden in the ERP) is conventionally measured as the time at which the ERP is most extreme. However, this method is not well warranted, as should be clear to anyone who has read Steven Luck's excellent chapters on ERP analysis in "An Introduction to the Event-Related Potential Technique". Instead, he proposes a 50% area latency method: the latency is the point in time where the area under the curve is equal on both sides. It yields the same latencies as peak measures on symmetrical ERPs and more correct latencies on non-symmetrical ERPs, where peak measures fail.

A first solution

If you have your ERP as an array in MATLAB, you can compute the 50% area in this way:
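
The original MATLAB snippet isn't included in this copy of the post; as a stand-in, here is the same computation sketched in Python/NumPy (variable names are mine):

```python
import numpy as np

def half_area_latency(erp, times):
    """Return the time point at which the cumulative area under the ERP
    reaches half of the total area. Assumes all values in erp share the
    same sign (see the correction further down)."""
    erp_cumsum = np.cumsum(erp)                    # running area up to each sample
    erp_reduced = erp_cumsum - erp_cumsum[-1] / 2  # distance from the half-area point
    index = np.argmin(np.abs(erp_reduced))         # sample closest to the half-area point
    return times[index]

# Toy example, the same ERP used in the walkthrough below
erp = np.array([1, 3, 4, 0])
times = np.array([0, 10, 20, 30])       # ms, made up for the example
print(half_area_latency(erp, times))    # -> 10 (sample number 2)
```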

To use it with EEGLAB (assuming you have your epoched EEG structure), you might average the epochs for your channel of interest and run the same computation on the resulting ERP.

How it works

Let's have a look at how this works. Remember that we want to find the sample where the area is equal on both sides. The computation consists of the following steps:

  1. Transform the erp array into an array of cumulative sums, i.e. every element is the sum of all elements up to and including itself. E.g. the array erp = [1,3,4,0] becomes erp_cumsum = [1,4,8,8] (try to imagine this ERP to ease your understanding). Each element then corresponds to the area up to and including that sample in the ERP. The last element corresponds to the total area of the ERP.
  2. Now the job is simply to find the midmost value in erp_cumsum, i.e. the value that separates the area into two equal parts. To do this, we subtract half the total area from every element in erp_cumsum. Continuing the example from above, sum(erp)/2 = 4. The subtraction gives erp_reduced = erp_cumsum - sum(erp)/2 = [-3,0,4,4].
  3. The midmost value in erp_cumsum is now the element closest to zero (we can already see that this is sample number 2). Finding it in MATLAB is done by taking the absolute value, which gives erp_abs = abs(erp_reduced) = [3,0,4,4], and finding the index of the minimum value. This is simply [~,index] = min(erp_abs).

Correction: defining the direction of the component

This method is not foolproof, but things can be done to prevent some errors. Consider a positive ERP (e.g. a P300) that starts and ends at negative values, e.g. erp = [-3,0,5,2,-1,-4]. This is a very nice peak, and visually you would consider all of it to be part of the P300. However, using cumulative sums, the start and end would contribute "negative area", which doesn't make sense when we're trying to find a latency.

The solution starts by defining whether you're looking for the latency of a positive or a negative peak. Then you adjust your ERP to contain only positive or only negative values by raising or lowering the whole ERP so that all elements have the same sign. In this example, we're looking for a positive component, so we raise the ERP by 4. If we were looking for a negative component, we'd lower the ERP by 5. So:
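
In the same Python/NumPy terms as above (again a reconstruction rather than the original MATLAB), the adjustment looks like this:

```python
import numpy as np

erp = np.array([-3, 0, 5, 2, -1, -4])
positive_component = True  # looking for e.g. a P300; set to False for a negative component

if positive_component:
    erp_shifted = erp - erp.min()  # raise by 4: every value becomes >= 0
else:
    erp_shifted = erp - erp.max()  # lower by 5: every value becomes <= 0

# erp_shifted can now be passed to half_area_latency()
```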

Extending it to multiple ERPs

Say you're dealing with multiple ERPs and want to calculate 50% area latencies on all of them. You want to put in e.g. 50 ERPs and get 50 latencies as output. This is practical if you want to calculate latencies on individual trials to investigate variability, or on multiple subjects at the same time. This is where MATLAB's excellent matrix operations come into play. I don't want to go into details on this one; I'm just going to give you the code for positive-going components in single-trial ERPs:
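
The original MATLAB code isn't reproduced in this copy; the sketch below does the same thing in NumPy for a trials x samples matrix of positive-going single-trial ERPs (my adaptation of the idea, not the original):

```python
import numpy as np

def half_area_latencies(erps, times):
    """50% area latency for every row of a trials x samples matrix.
    Assumes positive-going components: each ERP is first raised so
    that its minimum is zero."""
    erps = erps - erps.min(axis=1, keepdims=True)   # shift every trial separately
    cumsums = np.cumsum(erps, axis=1)               # running area per trial
    reduced = cumsums - cumsums[:, -1:] / 2         # distance from each trial's half area
    indices = np.argmin(np.abs(reduced), axis=1)    # half-area sample per trial
    return times[indices]

# Example: 50 simulated noisy single trials, 200 samples each
times = np.linspace(-100, 698, 200)                             # ms, made-up epoch
erps = np.random.randn(50, 200) + np.exp(-(times - 300) ** 2 / 5000.0)
print(half_area_latencies(erps, times))
```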