Can a Computer Algorithm Identify Suicidal People from Brain Scans? The Answer Won't Surprise You
Death by suicide is a preventable tragedy, provided the suicidal individual is identified and receives appropriate treatment. Unfortunately, some suicidal individuals do not signal their intent, and others do not receive essential help. Youths with severe suicidal ideation are often not taken seriously, and thus are not admitted to emergency rooms. A common scenario: resources are scarce, the ER is backed up, and a cursory clinical assessment determines who is admitted and who is sent home. From a practical standpoint, using fMRI to determine suicide risk is a non-starter.
Yet here we are, with media coverage blaring that an “Algorithm can identify suicidal people using brain scans” and “Brain Patterns May Predict People At Risk Of Suicide.” These media pieces herald a new study claiming that fMRI can predict suicidal ideation with 91% accuracy (Just et al., 2017). The authors applied a machine learning algorithm to brain scans obtained using a highly specialized protocol designed to probe semantic and emotional responses to life and death concepts.
Let me unpack that a bit. The scans of 17 young adults with suicidal ideation (thoughts about suicide) were compared to those from another 17 participants without suicidal ideation. A computer algorithm (Gaussian Naive Bayes) was trained on the neural responses to death-related and suicide-related words, and correctly classified 15 out of 17 suicidal ideators (88% sensitivity) and 16 out of 17 controls (94% specificity). Are these results too good to be true? Yes, probably. And yet they're not good enough, because two at-risk individuals were not picked up.
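The headline figures follow directly from the confusion counts quoted above; here is a quick sanity check (the counts are from the passage above, the variable names are mine):

```python
# Confusion counts reported for the 34 analyzed participants.
true_pos = 15   # suicidal ideators correctly classified
false_neg = 2   # the two at-risk individuals the classifier missed
true_neg = 16   # controls correctly classified
false_pos = 1   # the one misclassified control

sensitivity = true_pos / (true_pos + false_neg)   # 15/17 ~ 0.88
specificity = true_neg / (true_neg + false_pos)   # 16/17 ~ 0.94
accuracy = (true_pos + true_neg) / 34             # 31/34 ~ 0.91, the "91%"

print(round(sensitivity, 2), round(specificity, 2), round(accuracy, 2))
```

Note how small the denominators are: a single reclassified participant moves each of these figures by about six percentage points.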
The computational methods used to classify the suicidal vs. control groups are suspect, according to many machine learning experts on social media. One problem is “overfitting”: fitting too many parameters to a small sample, so the model captures noise that will not generalize to new samples. The key question is whether the algorithm can classify individuals from independent, out-of-sample populations, and we don't know that for sure. The leave-one-out cross-validation procedure itself is also problematic. I'm not an expert here, so the Twitter threads that start below (and here) are your best bet.
ML re suicide, 90% correct, 2 groups of 17. Shiny journal. Anyone see any problems? https://t.co/mgQ8tW6s5w @tyrell_turing
— KordingLab (@KordingLab) October 31, 2017
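To illustrate the overfitting concern concretely (this is a generic sketch, not a reproduction of the paper's pipeline), consider two groups of 17 with pure-noise “brain data,” so there is no real group difference to find. If discriminative features are selected using the full dataset before leave-one-out cross-validation, a tiny Gaussian naive Bayes classifier can still post impressive accuracy; selecting features inside each fold drops it back toward chance. All the numbers and function names below are invented for illustration:

```python
import math
import random

random.seed(0)
N_PER_GROUP, N_FEATURES, N_SELECT = 17, 200, 5

# Pure noise: no genuine difference exists between the two groups.
X = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(2 * N_PER_GROUP)]
y = [0] * N_PER_GROUP + [1] * N_PER_GROUP

def select_features(data, labels, k):
    """Pick the k features with the largest between-group mean difference."""
    diffs = []
    for j in range(len(data[0])):
        m0 = sum(row[j] for row, t in zip(data, labels) if t == 0) / labels.count(0)
        m1 = sum(row[j] for row, t in zip(data, labels) if t == 1) / labels.count(1)
        diffs.append((abs(m0 - m1), j))
    return [j for _, j in sorted(diffs, reverse=True)[:k]]

def gnb_predict(train_X, train_y, test_x):
    """Minimal Gaussian naive Bayes: per-class mean/variance, flat prior."""
    scores = {}
    for c in (0, 1):
        rows = [x for x, t in zip(train_X, train_y) if t == c]
        ll = 0.0
        for j in range(len(test_x)):
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            ll += -0.5 * math.log(2 * math.pi * var) - (test_x[j] - mu) ** 2 / (2 * var)
        scores[c] = ll
    return max(scores, key=scores.get)

def loocv_accuracy(select_inside):
    hits = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Biased version peeks at the held-out scan when choosing features.
        feats = select_features(tr_X, tr_y, N_SELECT) if select_inside \
            else select_features(X, y, N_SELECT)
        hits += gnb_predict([[r[j] for j in feats] for r in tr_X], tr_y,
                            [X[i][j] for j in feats]) == y[i]
    return hits / len(X)

biased = loocv_accuracy(select_inside=False)   # selection leak: looks great
honest = loocv_accuracy(select_inside=True)    # hovers around chance (0.5)
print(biased, honest)
```

The leaky version typically scores far above the honest one on identical noise, which is why the “five most discriminating locations” step matters so much: if any part of feature selection sees the held-out subject, leave-one-out accuracy stops being an estimate of out-of-sample performance.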
For the rest of this post, I'll raise other issues about this study that concerned me.
Why use an expensive technology in the first place?
The rationale for this included some questionable statements.
- ...predictions by both clinicians and patients of future suicide risk have been shown to be relatively poor predictors of future suicide attempt2,3.
...the implicit association of death/suicide with self was associated with an approximately 6-fold increase in the odds of making a suicide attempt in the next 6 months, exceeding the predictive validity of known risk factors (e.g., depression, suicide-attempt history) and both patients’ and clinicians’ predictions.

But let's go ahead with an fMRI study that will be far more accurate than a short and easy-to-administer computerized test!
- Nearly 80% of patients who die by suicide deny suicidal ideation in their last contact with a mental healthcare professional4.
How do you measure the neural correlates of suicidal thoughts?
This is a tough one, but the authors propose to uncover the neural signatures of specific concepts, as well as the emotions they evoke:
...the neural signature of the test concepts was treated as a decomposable biomarker of thought processes that can be used to pinpoint particular components of the alteration [in participants with suicidal ideation]. This decomposition attempts to specify a particular component of the neural signature that is altered, namely, the emotional component...
How do you choose which concepts and emotions to measure?
The “concepts” were words from three different categories (although the designation of Suicide vs. Negative seems arbitrary for some of the stimuli). The set of 30 words was presented six times, with each word shown for three seconds followed by a four second blank screen. Subjects were “asked to actively think about the concepts ... while they were displayed, thinking about their main properties (and filling in details that come to mind) and attempting consistency across presentations.”
The “emotion signatures” were derived from a prior study (Kassam et al., 2013) that asked method actors to self-induce nine emotional states (anger, disgust, envy, fear, happiness, lust, pride, sadness, and shame). The emotional states selected for the present study were anger, pride, sadness, and shame (all chosen post hoc). Should we expect emotion signatures that are self-induced by actors to be the same as emotion signatures that are evoked by words? Should we expect a universal emotional response to Comfort or Evil or Apathy?
Six words (death, carefree, good, cruelty, praise, and trouble — in descending order) and five brain regions (left superior medial frontal, medial frontal/anterior cingulate, right middle temporal, left inferior parietal, and left inferior frontal) from a whole-brain analysis (that excluded bilateral occipital lobes for some reason) provided the most accurate discrimination between the two groups. Why these specific words and voxels (twenty-five voxels, to be exact)? It doesn't seem to matter.
The neural representation of each concept, as used by the classifier, consisted of the mean activation level of the five most stable voxels in each of the five most discriminating locations...

...and...

All of these regions, especially the left superior medial frontal area and medial frontal/anterior cingulate, have repeatedly been strongly associated with self-referential thought...

...and...
...the concept of ‘death’ evoked more shame, whereas the concept of ‘trouble’ evoked more sadness in the suicidal ideator group. ‘Trouble’ also evoked less anger in the suicidal ideator group than in the control group. The positive concept ‘carefree’ evoked less pride in the suicidal ideator group. This pattern of differences in emotional response suggests that the altered perspective in suicidal ideation may reflect a resigned acceptance of a current or future negative state of affairs, manifested by listlessness, defeat and a degree of anhedonia (less pride evoked in the concept of ‘carefree’) [why not less pride to 'praise' or 'superior'? who knows...]
Not that this involves circularity or reverse inference or HARKing or anything...
How can a method that excludes data from 55% of the target participants be useful??
This one seems like a showstopper. A total of 38 suicidal participants were scanned, but those who did not show the desired semantic effects were excluded due to “poor data quality”:
The neurosemantic analyses ... are based on 34 participants, 17 participants per group whose fMRI data quality was sufficient for accurate (normalized rank accuracy > 0.6) identification of the 30 individual concepts from their fMRI signatures. The selection of participants included in the primary analyses was based only on the technical quality of the fMRI data. The data quality was assessed in terms of the ability of a classifier to identify which of the 30 individual concepts they were thinking about with a rank accuracy of at least 0.6, based on the neural signatures evoked by the concepts. The participants who met this criterion also showed less head motion (t(77) = 2.73, P < 0.01). The criterion was not based on group discriminability.
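For reference, the “rank accuracy” inclusion threshold quoted above is a standard metric in this group's concept-decoding work. In one common formulation: for each of the 30 stimuli, the classifier ranks all 30 candidate concepts by fit, and the rank of the correct concept is rescaled so that a perfect top-1 guess scores 1.0 and chance averages 0.5. A minimal sketch (the function name and exact formula are my assumptions, not the paper's code):

```python
def normalized_rank_accuracy(ranks, n_candidates=30):
    """Average rescaled rank of the correct concept across trials.

    ranks: 1-based rank of the correct concept in the classifier's
    ordering for each trial (1 = top choice, n_candidates = last).
    """
    return sum((n_candidates - r) / (n_candidates - 1) for r in ranks) / len(ranks)

# Perfect identification of all 30 concepts scores 1.0.
perfect = normalized_rank_accuracy([1] * 30)
# At chance, the correct concept's average rank is (30 + 1) / 2 = 15.5,
# which rescales to 0.5 — so the 0.6 cutoff sits just above chance.
chance = normalized_rank_accuracy([15.5] * 30)
```

Seen this way, the inclusion bar is modest: participants were kept if their concept decoding was only somewhat better than a coin flip, and over half still failed to clear it.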
This logic seems circular to me, despite the claim that inclusion wasn't based on group classification accuracy. Seriously, if you throw out over half of your subjects, how can your method ever be useful? Nonetheless, the 21 “poor data quality” ideators with excessive head motion and bad semantic signatures were used in an out-of-sample analysis, classified against the data from the same 17 “good” controls, which also yielded relatively high accuracy (87%). The data from the 24 “bad” controls were apparently excluded.
We attribute the suboptimal fMRI data quality (inaccurate concept identification from its neural signature) of the excluded participants to some combination of excessive head motion and an inability to sustain attention to the task of repeatedly thinking about each stimulus concept for 3 s over a 30-min testing period.
Furthermore, another classifier was even more accurate (94%) in discriminating between suicidal ideators who had made a suicide attempt (n=9) from those who had not (n=8), although the out-of-sample accuracy for the excluded 21 was only 61%. Perhaps I'm misunderstanding something here, but I'm puzzled...
I commend the authors for studying a neglected clinical group, but wish they were more rigorous, didn't overinterpret their results, and didn't overhype the miracle of machine learning.
Crisis Text Line [741741 in the US] uses machine learning to prioritize its call load based on word usage and emojis. A great variety of intersecting risk factors may lead someone to death by suicide. At present, no method can fully capture who will cross that line.
If you are feeling suicidal or know someone who might be, here is a link to a directory of online and mobile suicide help services.
Footnote
1 I won't discuss the problematic nature of the IAT here.
References
Just MA, Pan L, Cherkassky VL, McMakin DL, Cha C, Nock MK, Brent D (2017). Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nature Human Behaviour. Published online: 30 October 2017.
Kassam KS, Markey AR, Cherkassky VL, Loewenstein G, Just MA. (2013). Identifying Emotions on the Basis of Neural Activation. PLoS One. 8(6):e66032.
Nock MK, Park JM, Finn CT, Deliberto TL, Dour HJ, Banaji MR. (2010). Measuring the suicidal mind: implicit cognition predicts suicidal behavior. Psychol Sci. 21(4):511-7.