Tuesday, March 06, 2012

How Much of the Neuroimaging Literature Should We Discard?

Guilty article in PNAS (2003).

Just a quick pointer to a pair of posts on how sub-optimally designed and analyzed fMRI studies can continue to influence the field. Professor Dorothy Bishop of Oxford University posted a scathing analysis of one such paper, in Time for neuroimaging (and PNAS) to clean up its act:
Temple et al (2003) published an fMRI study of 20 children with dyslexia who were scanned both before and after a computerised intervention (FastForword) designed to improve their language. The article in question was published in the Proceedings of the National Academy of Sciences, and at the time of writing has had 270 citations. I did a spot check of fifty of those citing articles to see if any had noted problems with the paper: only one of them did so.
Bishop noted at least four major problems in the paper that invalidated the conclusions, including:
  • The authors presented uncorrected whole brain activation data. This is not explicitly stated but can be deduced from the z-scores and p-values. Russell Poldrack, who happens to be one of the authors of this paper, has written eloquently on this subject: “…it is critical to employ accurate corrections for multiple tests, since a large number of voxels will generally be significant by chance if uncorrected statistics are used. … The problem of multiple comparisons is well known but unfortunately many journals still allow publication of results based on uncorrected whole-brain statistics.” Conclusion 2 is based on uncorrected p-values and is not valid.
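
To make the multiple comparisons point concrete, here is a minimal Python sketch (not anything from the Temple et al. paper or from Bishop's post). It simulates pure-noise z-scores and counts how many "voxels" pass an uncorrected threshold versus a Bonferroni-corrected one. The voxel count (50,000), the per-voxel alpha (0.001), and the assumption that voxels are independent are all illustrative choices; real fMRI data are spatially smooth, so methods other than Bonferroni are often used in practice.

```python
import random
from statistics import NormalDist


def count_false_positives(n_voxels=50_000, alpha=0.001, seed=0):
    """Count noise-only voxels that pass uncorrected and Bonferroni thresholds.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    norm = NormalDist()

    # One-sided z thresholds: per-voxel alpha vs. alpha divided by the number of tests.
    z_uncorrected = norm.inv_cdf(1 - alpha)
    z_bonferroni = norm.inv_cdf(1 - alpha / n_voxels)

    # Null data: there is no true activation anywhere.
    z_scores = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]

    return {
        "uncorrected": sum(z > z_uncorrected for z in z_scores),
        "bonferroni": sum(z > z_bonferroni for z in z_scores),
        "expected_uncorrected": n_voxels * alpha,
    }


result = count_false_positives()
print(result)
```

With these assumed settings, about 50 voxels are expected to cross the uncorrected threshold by chance alone, even though the simulated brain contains no signal at all, while the corrected threshold should let through essentially none.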

Indeed, Dr. Russ Poldrack of the University of Texas at Austin is a leader in neuroimaging methodology and a vocal critic of shoddy design and overblown interpretations. And yes, he was an author on the 2003 paper in question. Poldrack replied to Bishop in his own blog:
Skeletons in the closet

As someone who has thrown lots of stones in recent years, it's easy to forget that anyone who publishes enough will end up with some skeletons in their closet. I was reminded of that fact today, when Dorothy Bishop posted a detailed analysis of a paper that was published in 2003 on which I am a coauthor.
I'm not convinced that every prolific scientist has skeletons in his/her closet, but it was nice to see that Poldrack acknowledged this particular bag of bones:
Dorothy notes four major problems with the study:
  • There was no dyslexic control group; thus, we don't know whether any improvements over time were specific to the treatment, or would have occurred with a control treatment or even without any treatment.
  • The brain imaging data were thresholded using an uncorrected threshold.
  • One of the main conclusions (the "normalization" of activation following training) is not supported by the necessary interaction statistic, but rather by a visual comparison of maps.
  • The correlation between changes in language scores and activation was reported for only one of the many measures, and it appeared to have been driven by outliers.
Looking back at the paper, I see that Dorothy is absolutely right on each of these points. In defense of my coauthors, I would note that points 2-4 were basically standard practice in fMRI analysis 10 years ago (and still crop up fairly often today). Ironically, I raised two of these issues in my recent paper for the special issue of Neuroimage celebrating the 20th anniversary of fMRI, in talking about the need for increased methodological rigor...

But are old school methods entirely to blame? I don't think so. It seems that some of these errors are errors in basic statistics. At any rate, I highly recommend that you read these two posts.
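
One such basic-statistics error, the missing interaction test in Poldrack's third point, can be sketched in a few lines of Python. The numbers below are entirely made up (they are not from the 2003 paper): an effect that is "significant" after training and "not significant" before training does not by itself show that the two differ. For simplicity the sketch treats the two estimates as independent, which is an assumption; in a real pre/post design the measurements are paired and the standard error of the difference would be computed from the within-subject changes.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate, std_error):
    """Two-sided p-value for a z-test of estimate / std_error against zero."""
    z = estimate / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical activation estimates (arbitrary units) and standard errors.
post_est, post_se = 25.0, 10.0  # after training
pre_est, pre_se = 10.0, 10.0    # before training

p_post = two_sided_p(post_est, post_se)  # roughly 0.012, "significant"
p_pre = two_sided_p(pre_est, pre_se)     # roughly 0.32, "not significant"

# The claim of a change needs a direct test of the difference.
# Assumes independent estimates (see the note above).
diff_est = post_est - pre_est
diff_se = sqrt(post_se**2 + pre_se**2)
p_diff = two_sided_p(diff_est, diff_se)  # roughly 0.29, no evidence of a change

print(f"post: p = {p_post:.3f}")
print(f"pre:  p = {p_pre:.3f}")
print(f"post minus pre: p = {p_diff:.3f}")
```

Two maps thresholded separately would show a blob in one condition and nothing in the other, yet the direct comparison gives no support for a difference under these assumed numbers.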

Many questions remain. How self-correcting is the field? What should we do with old (and not-so-old) articles that are fatally flawed? How many of these results have replicated, or failed to replicate? Should we put warning labels on the failures?

Professor Bishop also noted specific problems at PNAS, like the "contributed by" track allowing academy members to publish with little or no peer review. The "pre-arranged editor" track is another potential issue. I suggest a series of warning labels for such articles, such as the one shown below.



At March 07, 2012 4:22 AM, Anonymous Michael said...

Very interesting!

The title shows the main problem in my view!

"Neural deficits in children with dyslexia ameliorated by behavioral remediation: Evidence from functional MRI"

Why would anyone need evidence from fMRI? If the intervention actually works, and dyslexia is cured, then the brain activity should be an after thought, not of primary interest. Changes in neural activity are interesting, of course, but the main goal of the intervention is to help these children with their reading and writing.

At March 07, 2012 9:26 AM, Anonymous fMRI Guy said...

Interesting post! I believe those loopholes at PNAS have been closed (except for direct contributions by academy members). And hats off to Russ Poldrack both for his crusading against voodoo neuroimaging and for acknowledging his own complicity.

I do think the field is self-correcting. In many manuscripts that I've recently reviewed the authors are explicitly careful not to commit the error of non-independent analysis that got so much publicity with the "voodoo correlations" paper. Perhaps naming names like Vul et al. did is the only way to make a dent in faulty research practices.

And BTW, there's now a fair literature, including a systematic review, showing that Fast ForWord doesn't work.

At March 11, 2012 11:23 PM, Anonymous Anonymous said...

I don't think you should discard them just because they are flawed.

A historiographical view would be to show the state of thought at the time. People making mistakes is just as interesting to the historian as people getting it right.

The people in the industry will know which articles are flawed and avoid referencing them.

People outside the industry are already confused to the max anyway by crazy media stories so culling scholarly articles is not likely to help that situation.

At March 12, 2012 5:01 PM, Blogger The Neurocritic said...

Thanks for the comments!

Michael - You're right, the important thing is that the intervention actually works.

fMRI Guy - I do think the papers by Vul, Poldrack, Kriegeskorte, Nieuwenhuis and all their colleagues have drawn attention to a number of problems with fMRI data analysis. Some of these flaws aren't restricted to neuroimaging. The question is how to systematically flag or note those that aren't up to par or fail to replicate. Ivan Oransky mentioned CrossMark in a recent post at Retraction Watch, as a possible system for tagging published articles as "corrected" or "retracted" (for instance), along with links to things like blog posts, replication attempts, additional data, etc.

Anonymous - In a later post, I clarified what I meant by discarding:

Does Discarding Mean Retraction?

By "discarding" I meant disregarding the results from flawed articles, not retracting them from the literature entirely.

One of the unfortunate problems with your statement, "The people in the industry will know which articles are flawed and avoid referencing them" is that sometimes researchers in the field do not know which ones to avoid. Prof. Bishop noted there were 270 citations of the PNAS (2003) article. Her spot check of 50 refs found that only one was critical.

At March 14, 2012 4:30 AM, Blogger Vanya said...

"Why would anyone need evidence from fMRI?"

Sometimes an intervention works for only a subset of the population; neuroimaging could tell us why there are such individual differences in treatment outcomes.

See for example: http://journals.lww.com/neuroreport/Abstract/1997/03030/Cingulate_function_in_depression__a_potential.48.aspx

At March 18, 2012 1:09 AM, Blogger The Neurocritic said...

Vanya - That old Mayberg et al. (1997) article is an interesting example because the ROIs weren't well justified and the whole brain analysis did not correct for multiple comparisons. However, a 2011 meta-analysis showed that "the relationship between resting rACC activity and treatment response is robust." So I guess this illustrates the point that some results using the old, less-than-ideal methods have held up.

