Biomedical Literature Analysis: Current State and Challenges

Citation: Ling, MHT, Lefevre, C, Nicholas, KR. 2009. Biomedical Literature Analysis: Current State and Challenges. In Internet Policies and Issues, Volume 7. Nova Science Publishers, Inc.

Link to [Abstract and References]

This manuscript reviews the central (information retrieval, information extraction and text mining) and allied (corpus collection, databases and system evaluation methods) computational domains to present the current state of biomedical literature analysis for protein-protein and protein-gene interactions, and the challenges ahead. Firstly, biomedical text mining is highly dependent on PubMed (MEDLINE) as its text repository, but neither its implementation details nor its performance in terms of precision and recall is known. Secondly, extraction of interactions depends on recognizing entity (protein and gene) names in text, and determining whether different names refer to the same protein remains an open problem. Thirdly, extraction of interactions by co-occurrence and by natural language processing (NLP) has been shown to be complementary, suggesting that future systems could be improved by combining the two approaches. Fourthly, evidence suggests that generic NLP engines may be able to process text for interaction extraction, owing to complementary part-of-speech (POS) tag use in the shallow parsing process, but more extensive evaluations are needed. Fifthly, there is a shortage of suitable corpora for system evaluation; because systems are evaluated on different corpora or databases, their results are difficult to compare, which prompts the collection of a common set of corpora for communal use. Lastly, biomedical literature analysis tools must demonstrate real-world applications without a steep learning curve before the slow adoption of these tools by biologists (the intended users) can be reversed.
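To make two of the terms above concrete, the following is a minimal Python sketch of sentence-level co-occurrence extraction scored by precision and recall. It is an illustration under stated assumptions, not the method of any system reviewed in the chapter: the entity names (`ProtA`, `ProtB`, `ProtC`), the sentences, the gold-standard pairs and both function names are invented for this example, and the tokenization is deliberately naive.

```python
from itertools import combinations


def cooccurring_pairs(sentences, entity_names):
    """Return unordered pairs of entity names that appear in the same sentence.

    Hypothetical helper: any two known names sharing a sentence are
    reported as a candidate interaction, with no grammatical analysis.
    """
    pairs = set()
    for sentence in sentences:
        # Naive tokenization: lowercase, strip periods and commas, split on spaces.
        cleaned = sentence.lower().replace(".", " ").replace(",", " ")
        tokens = set(cleaned.split())
        found = sorted(name for name in entity_names if name.lower() in tokens)
        # Sorting first keeps each pair in one canonical order.
        pairs.update(combinations(found, 2))
    return pairs


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted pairs against gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration only.
sentences = [
    "ProtA binds ProtB in vitro.",
    "ProtB activates ProtC, but ProtA was absent.",
    "ProtC expression was unchanged.",
]
entities = {"ProtA", "ProtB", "ProtC"}
gold = {("ProtA", "ProtB"), ("ProtB", "ProtC")}

predicted = cooccurring_pairs(sentences, entities)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy input, co-occurrence recovers both gold pairs but also reports `ProtA` with `ProtC`, which merely share a sentence. That is the usual trade-off of the approach: high recall, lower precision. An NLP-based extractor that analyzes sentence structure would be expected to trade in the opposite direction, which is one way to read the complementarity noted above.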