Crossref journal-article
Oxford University Press (OUP)
Bioinformatics (286)
Abstract

Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.

Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.

Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem.

Contact: cdewey@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics on
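
The idea of allocating multi-mapping reads with a statistical model rather than a heuristic can be illustrated with a toy expectation-maximization loop. The sketch below is not the implementation distributed at the URL above and omits most of what the abstract describes (read generation details, non-uniform read distributions). It assumes each read is represented only by the set of isoforms it maps to, that every read maps to at least one isoform, and that all isoforms are equally long. The function names `em_isoform_fractions` and `gene_fractions` and the example data are hypothetical.

```python
from collections import defaultdict


def em_isoform_fractions(read_compat, n_isoforms, n_iter=500, tol=1e-9):
    """Toy EM estimate of the fraction of reads drawn from each isoform.

    read_compat[i] lists the isoform indices that read i maps to.
    Assumes every read maps to at least one isoform (illustrative only).
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from a uniform guess
    n_reads = len(read_compat)
    for _ in range(n_iter):
        expected = [0.0] * n_isoforms
        # E-step: split each read across its candidate isoforms
        # in proportion to the current abundance estimates.
        for isoforms in read_compat:
            total = sum(theta[k] for k in isoforms)
            for k in isoforms:
                expected[k] += theta[k] / total
        # M-step: new abundances are the normalized expected read counts.
        new_theta = [c / n_reads for c in expected]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def gene_fractions(theta, isoform_to_gene):
    """Gene-level estimate as the sum of its isoforms' estimates."""
    totals = defaultdict(float)
    for k, gene in enumerate(isoform_to_gene):
        totals[gene] += theta[k]
    return dict(totals)


# Hypothetical data: three isoforms, seven reads, three of them multi-mapping.
reads = [[0], [0], [0], [0, 1], [0, 1], [2], [1, 2]]
theta = em_isoform_fractions(reads, n_isoforms=3)
genes = gene_fractions(theta, ["geneA", "geneA", "geneB"])
print(theta)
print(genes)
```

In this toy data set isoform 1 has no uniquely mapping reads, so the shared reads are progressively reassigned to isoforms 0 and 2, which do have unique support. A method that discards multi-mapping reads would instead ignore three of the seven reads.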

Bibliography

Li, B., Ruotti, V., Stewart, R. M., Thomson, J. A., & Dewey, C. N. (2009). RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics, 26(4), 493–500.

Authors 5
  1. Bo Li (first)
  2. Victor Ruotti (additional)
  3. Ron M. Stewart (additional)
  4. James A. Thomson (additional)
  5. Colin N. Dewey (additional)
References 17 Referenced 959
  1. 10.1093/bioinformatics/bth924 / Bioinformatics / Statistical modeling of sequencing errors in SAGE libraries by Beissbarth (2004)
  2. 10.1038/nmeth.1223 / Nat. Methods / Stem cell transcriptome profiling via massive-scale mRNA sequencing by Cloonan (2008)
  3. 10.1111/j.2517-6161.1977.tb01600.x / J. R. Stat. Soc. Ser. B (Methodol.) / Maximum likelihood from incomplete data via the EM algorithm by Dempster (1977)
  4. 10.1093/nar/gkn425 / Nucleic Acids Res. / Substantial biases in ultra-short read data sets from high-throughput DNA sequencing by Dohm (2008)
  5. 10.1016/j.ygeno.2007.11.003 / Genomics / A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE by Faulkner (2008)
  6. 10.1093/bioinformatics/btl048 / Bioinformatics / The UCSC known genes by Hsu (2006)
  7. 10.1093/bioinformatics/btp113 / Bioinformatics / Statistical inferences for isoform expression in RNA-Seq by Jiang (2009)
  8. 10.1093/bioinformatics/btn571 / Bioinformatics / Cross-hybridization modeling on Affymetrix exon arrays by Kapur (2008)
  9. 10.1007/978-3-540-87361-7_5 / Proceedings of the 8th International Workshop on Algorithms in Bioinformatics. / Exact transcriptome reconstruction from short sequence reads by Lacroix (2008)
  10. 10.1186/gb-2009-10-3-r25 / Genome Biol. / Ultrafast and memory-efficient alignment of short DNA sequences to the human genome by Langmead (2009)
  11. 10.1016/j.cell.2008.03.029 / Cell / Highly integrated single-base resolution maps of the epigenome in Arabidopsis by Lister (2008)
  12. 10.1101/gr.079558.108 / Genome Res. / RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays by Marioni (2008)
  13. 10.2144/000112900 / BioTechniques / Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing by Morin (2008)
  14. 10.1038/nmeth.1226 / Nat. Methods / Mapping and quantifying mammalian transcriptomes by RNA-Seq by Mortazavi (2008)
  15. 10.1126/science.1158441 / Science / The transcriptional landscape of the yeast genome defined by RNA sequencing by Nagalakshmi (2008)
  16. 10.1093/nar/6.7.2601 / Nucleic Acids Res. / A strategy of DNA sequencing employing computer programs by Staden (1979)
  17. 10.1038/nrg2484 / Nat. Rev. Genet. / RNA-Seq: a revolutionary tool for transcriptomics by Wang (2009)
Dates
Type When
Created Dec. 18, 2009, 9:09 p.m.
Deposited Feb. 13, 2025, 5:37 p.m.
Indexed Aug. 29, 2025, 6:19 a.m.
Issued Dec. 18, 2009
Published Dec. 18, 2009
Published Online Dec. 18, 2009
Published Print Feb. 15, 2010
Funders 0

None

@article{Li_2009,
  title={RNA-Seq gene expression estimation with read mapping uncertainty},
  volume={26},
  ISSN={1367-4803},
  url={http://dx.doi.org/10.1093/bioinformatics/btp692},
  DOI={10.1093/bioinformatics/btp692},
  number={4},
  journal={Bioinformatics},
  publisher={Oxford University Press (OUP)},
  author={Li, Bo and Ruotti, Victor and Stewart, Ron M. and Thomson, James A. and Dewey, Colin N.},
  year={2009},
  month=dec,
  pages={493--500}
}