A new paper in PLoS ONE (Lin et al., 2008) applied the methods of computational linguistics [1] to analyze the database of abstracts presented at the 2001 to 2006 meetings of the Society for Neuroscience. The results provide an overview of the current state of the field.
Figure 8. (A) Visualization of topic map for all SFN meeting abstracts from 2001 to 2006. Abstracts assigned to different clusters appear in different colors (see legend). (B) Zooming in at the center of the topic map reveals more detailed clusters.
Among the interesting findings:
- The majority of the authors (~60%) had only one abstract over the span of six years. This number may reflect a large group of “transients” comprising mostly undergraduates, graduate students, and perhaps post-docs who entered and exited the neuroscience field in a short period of time.
- The phenomenon of a high transient rate, reflecting a sort of “infant mortality rate” for first-time authors, was first analyzed by Price, who estimated a 22% transient rate for paper authorship from a database consisting of a statistical sample of papers published between 1964 and 1970.
- A fundamental measure used in graph theory is the shortest path between a pair of connected vertices. In the context of the network under study, this measures the number of steps it takes to go from one author to another through intermediate collaborators. From the multi-year SFN database, the lengths of shortest paths between all pairs of authors for whom a connection exists were calculated exhaustively using a breadth-first search algorithm. These numbers were then averaged to yield the mean distance between authors in the entire network.
- Table 4 shows that the authors in the SFN community are separated from one another by an average distance of 6.09. A similar observation of “six degrees of separation” has been reported previously for abstracts in the MEDLINE database ...
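The mean-distance calculation described in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names (`build_coauthor_graph`, `bfs_distances`, `mean_distance`) and the toy author lists are hypothetical, and it assumes two authors are linked whenever they share at least one abstract.

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(abstracts):
    """Build an undirected graph: authors are vertices, shared abstracts are edges."""
    graph = {}
    for authors in abstracts:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Breadth-first search: shortest path length from source to every reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance(graph):
    """Average shortest path length over all pairs of authors that are connected."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    # Each pair is counted in both directions, which leaves the mean unchanged.
    return total / pairs if pairs else float("nan")


# Hypothetical toy data: each inner list is the author list of one abstract.
toy_abstracts = [["A", "B"], ["B", "C"], ["C", "D"], ["E", "F"]]
toy_graph = build_coauthor_graph(toy_abstracts)
toy_mean = mean_distance(toy_graph)
```

In the toy data, authors E and F are disconnected from A through D, so pairs such as (A, E) are simply skipped, matching the "pairs of authors for whom a connection exists" wording. Running one breadth-first search per author is exhaustive and costs time roughly proportional to authors times edges, which is presumably why it is feasible on a meeting-sized database.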
The authors used Latent Semantic Analysis to reduce the dimensionality of the topics covered in the abstract database.
- The reduced dimensionality vector space captures most of the important underlying structure in the association of terms and documents, while at the same time removing the noise or variability in word usage. In the reduced vector space, terms that occur in similar documents are located near one another even if they never co-occur in the same document, and topically related documents are grouped near one another based on their semantic relatedness.
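To make the "never co-occur but still end up close" property concrete, here is a small sketch of Latent Semantic Analysis using a truncated singular value decomposition in NumPy. Everything in it is an assumption for illustration: the five terms, the four tiny "documents", the raw-count weighting, and the choice of k = 2 are made up, and the paper's actual preprocessing and term weighting are not reproduced here.

```python
import numpy as np

# Hypothetical vocabulary and documents; "neuron" and "cell" never share a document,
# but both appear alongside "synapse".
terms = ["neuron", "cell", "synapse", "saccade", "retina"]
docs = [
    ["neuron", "synapse"],
    ["cell", "synapse"],
    ["saccade", "retina"],
    ["retina", "saccade"],
]


def term_document_matrix(terms, docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    index = {term: i for i, term in enumerate(terms)}
    matrix = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc:
            matrix[index[word], j] += 1.0
    return matrix


def lsa_term_vectors(matrix, k):
    """Keep the k largest singular components; each row is a term in the reduced space."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


A = term_document_matrix(terms, docs)
reduced = lsa_term_vectors(A, k=2)

neuron, cell, saccade = (terms.index(t) for t in ("neuron", "cell", "saccade"))
raw_sim = cosine(A[neuron], A[cell])                     # no shared documents
lsa_sim = cosine(reduced[neuron], reduced[cell])         # close in the reduced space
cross_sim = cosine(reduced[neuron], reduced[saccade])    # different topic
```

In the raw term-document matrix, "neuron" and "cell" are orthogonal because they share no document. After truncation to two dimensions they become nearly identical, since both load on the same "synapse" component, while "neuron" and "saccade" stay unrelated.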
Finally, changes in topic areas across time revealed the following trends:
- Among the 10 topic clusters, Cluster 9, which corresponds to visual and motor systems, is shown to have consistently increased in representation over the six-year span. On the other hand, Cluster 2, which corresponds to cellular neuroscience, exhibits the most significant decrease in representation over the same period.
- These results suggest that there is a shift in general scientific interest from cellular-level work, such as ion channel, synapse, and membrane physiology, towards more systems-level research incorporating such topics as vision, kinematics, motor processing, and imaging.
[1] Oh, OK, here's the Wikipedia page for computational linguistics.
Lin JM, Bohland JW, Andrews P, Burns GA, Allen CB, Mitra PP, Bajic VB. (2008). An Analysis of the Abstracts Presented at the Annual Meetings of the Society for Neuroscience from 2001 to 2006. PLoS ONE, 3(4), e2052. DOI: 10.1371/journal.pone.0002052
Annual meeting abstracts published by scientific societies often contain rich arrays of information that can be computationally mined and distilled to elucidate the state and dynamics of the subject field. We extracted and processed abstract data from the Society for Neuroscience (SFN) annual meeting abstracts during the period 2001–2006 in order to gain an objective view of contemporary neuroscience. An important first step in the process was the application of data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. Using natural language processing, text mining, and other data analysis techniques, we then examined the demographics and structure of the scientific collaboration network, the dynamics of the field over time, major research trends, and the structure of the sources of research funding. Some interesting findings include a high geographical concentration of neuroscience research in the north eastern United States, a surprisingly large transient population (66% of the authors appear in only one out of the six studied years), the central role played by the study of neurodegenerative disorders in the neuroscience community, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six year period. The results from this work will prove useful for scientists, policy makers, and funding agencies seeking to gain a complete and unbiased picture of the community structure and body of knowledge encapsulated by a specific scientific domain.