Crossref journal-article
Oxford University Press (OUP)
Bioinformatics (286)
Abstract

Motivation: We have previously developed a rule-based approach for extracting information on the regulation of gene expression in yeast. The biomedical literature, however, contains information on several other equally important regulatory mechanisms, in particular phosphorylation, which we have now extended our rule-based system to extract as well.
Results: This paper presents new results for the extraction of relational information from biomedical text. We have improved our system, STRING-IE, to capture both new types of linguistic constructs and new types of biological information [i.e. (de-)phosphorylation]. The precision remains stable with a slight increase in recall. From almost one million PubMed abstracts related to four model organisms, we extract regulatory networks and binary phosphorylations comprising 3319 relation chunks. The accuracy is 83–90% and 86–95% for gene expression and (de-)phosphorylation relations, respectively. To achieve this, we made use of an organism-specific resource of gene/protein names considerably larger than those used in most other biology-related information extraction approaches. These names were included in the lexicon when retraining the part-of-speech (POS) tagger on the GENIA corpus. For the domain in question, an accuracy of 96.4% was attained on POS tags. It should be noted that the rules were developed for yeast and were successfully applied to both abstracts and full-text articles related to other organisms with comparable accuracy.
Availability: The revised GENIA corpus, the POS tagger, the extraction rules and the full sets of extracted relations are available from
Contact: saric@eml-r.org

Bibliography

Šarić, J., Jensen, L. J., Ouzounova, R., Rojas, I., & Bork, P. (2005). Extraction of regulatory gene/protein networks from Medline. Bioinformatics, 22(6), 645–650.

Authors 5
  1. Jasmin Šarić (first)
  2. Lars Juhl Jensen (additional)
  3. Rossitza Ouzounova (additional)
  4. Isabel Rojas (additional)
  5. Peer Bork (additional)
References 15 Referenced 100
  1. Partial parsing via finite-state cascades by Abney (1996), first page 8
  2. Automatic extraction of biological information from scientific text: protein–protein interactions by Blaschke (1999), first page 60
  3. 10.1093/nar/gkg095 / Nucleic Acids Res. / The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. by Boeckmann (2003)
  4. 10.1093/nar/30.1.69 / Nucleic Acids Res. / Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) by Dwight (2002)
  5. 10.1093/bioinformatics/17.suppl_1.S74 / Bioinformatics / GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. by Friedman (2001)
  6. Tagging medical documents with high accuracy by Hahn (2004), first page 852
  7. 10.1016/S1532-0464(03)00015-7 / J. Biomedical Informatics / Information extraction from biomedical text. by Hobbs (2003)
  8. TIGERSearch—ein Suchwerkzeug für Baumbanken by Lezius (2002)
  9. 10.1093/bioinformatics/17.4.359 / Bioinformatics / Mining literature for protein–protein interactions. by Marcotte (2001)
  10. 10.1038/sj.embor.embor833 / EMBO Rep. / The way we write. by Netzel (2003)
  11. Robust relational parsing over biomedical literature: extracting inhibit relations by Pustejovsky (2002), first page 362
  12. Extracting regulatory gene expression networks from pubmed by Saric (2004), first page 191
  13. 10.1186/1471-2105-4-20 / BMC Bioinformatics / Information extraction from full text scientific articles: where are the keywords? by Shah (2003)
  14. Automatic extraction of protein interactions from scientific abstracts by Thomas (2000), first page 707
  15. 10.1093/nar/gki005 / Nucleic Acids Res. / STRING: known and predicted protein–protein associations, integrated and transferred across organisms. by von Mering (2005)
Dates
Type When
Created July 26, 2005, 10:34 p.m.
Deposited Jan. 24, 2023, 4:26 a.m.
Indexed May 2, 2024, 11:39 a.m.
Issued July 26, 2005
Published July 26, 2005
Published Online July 26, 2005
Published Print March 15, 2006
Funders 0

None

@article{_ari__2005, title={Extraction of regulatory gene/protein networks from Medline}, volume={22}, ISSN={1367-4803}, url={http://dx.doi.org/10.1093/bioinformatics/bti597}, DOI={10.1093/bioinformatics/bti597}, number={6}, journal={Bioinformatics}, publisher={Oxford University Press (OUP)}, author={Šarić, Jasmin and Jensen, Lars Juhl and Ouzounova, Rossitza and Rojas, Isabel and Bork, Peer}, year={2005}, month=jul, pages={645--650} }