Abstract
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.

Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.

Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem.

Contact: cdewey@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics on
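The abstract describes allocating multi-mapped reads with a statistical model and reporting gene expression as the sum of isoform expression. As an illustration only, the following Python sketch applies a basic expectation-maximization loop to a toy data set. It is not the authors' C++ implementation: the function names (`em_isoform_fractions`, `gene_fractions`), the input layout, and the simplifying assumptions (uniform read start positions, no sequencing-error model, known isoform lengths) are all hypothetical choices made for this example.

```python
from collections import defaultdict


def em_isoform_fractions(read_hits, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    read_hits: list of lists; each inner list names the isoforms a read maps to.
    lengths:   dict mapping isoform name -> length (assumed known, > 0).
    Assumes reads start uniformly along an isoform (a simplification).
    """
    isoforms = sorted(lengths)
    # Start from a uniform guess over all isoforms.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    n_reads = len(read_hits)
    for _ in range(n_iter):
        expected = dict.fromkeys(isoforms, 0.0)
        for hits in read_hits:
            # E-step: the posterior that a read came from isoform i is
            # proportional to theta_i / length_i under the uniform assumption.
            weights = [theta[iso] / lengths[iso] for iso in hits]
            total = sum(weights)
            for iso, w in zip(hits, weights):
                expected[iso] += w / total
        # M-step: new fractions are the expected read counts, normalized.
        theta = {iso: expected[iso] / n_reads for iso in isoforms}
    return theta


def gene_fractions(theta, gene_of):
    """Gene-level estimate as the sum of its isoforms' estimates."""
    genes = defaultdict(float)
    for iso, value in theta.items():
        genes[gene_of[iso]] += value
    return dict(genes)


# Toy data (hypothetical): two isoforms of gene A, one isoform of gene B.
lengths = {"A1": 1000, "A2": 1000, "B1": 1000}
gene_of = {"A1": "A", "A2": "A", "B1": "B"}
read_hits = (
    [["A1"]] * 6          # reads mapping uniquely to A1
    + [["B1"]] * 2        # reads mapping uniquely to B1
    + [["A1", "B1"]] * 4  # multi-mapped reads shared by A1 and B1
)

theta = em_isoform_fractions(read_hits, lengths)
genes = gene_fractions(theta, gene_of)
```

In this toy case the shared reads are split in proportion to the current abundance estimates rather than discarded or divided evenly, which is the behavior the abstract contrasts with earlier heuristics. Converting read fractions into relative transcript abundances would additionally require normalizing by isoform length, which the sketch leaves out.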
References
17
Referenced
959
10.1093/bioinformatics/bth924 / Bioinformatics / Statistical modeling of sequencing errors in SAGE libraries by Beissbarth (2004)
10.1038/nmeth.1223 / Nat. Methods / Stem cell transcriptome profiling via massive-scale mRNA sequencing by Cloonan (2008)
10.1111/j.2517-6161.1977.tb01600.x / J. R. Stat. Soc. Ser. B (Methodol.) / Maximum likelihood from incomplete data via the EM algorithm by Dempster (1977)
10.1093/nar/gkn425 / Nucleic Acids Res. / Substantial biases in ultra-short read data sets from high-throughput DNA sequencing by Dohm (2008)
10.1016/j.ygeno.2007.11.003 / Genomics / A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE by Faulkner (2008)
10.1093/bioinformatics/btl048 / Bioinformatics / The UCSC known genes by Hsu (2006)
10.1093/bioinformatics/btp113 / Bioinformatics / Statistical inferences for isoform expression in RNA-Seq by Jiang (2009)
10.1093/bioinformatics/btn571 / Bioinformatics / Cross-hybridization modeling on Affymetrix exon arrays by Kapur (2008)
10.1007/978-3-540-87361-7_5 / Proceedings of the 8th International Workshop on Algorithms in Bioinformatics / Exact transcriptome reconstruction from short sequence reads by Lacroix (2008)
10.1186/gb-2009-10-3-r25 / Genome Biol. / Ultrafast and memory-efficient alignment of short DNA sequences to the human genome by Langmead (2009)
10.1016/j.cell.2008.03.029 / Cell / Highly integrated single-base resolution maps of the epigenome in Arabidopsis by Lister (2008)
10.1101/gr.079558.108 / Genome Res. / RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays by Marioni (2008)
10.2144/000112900 / BioTechniques / Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing by Morin (2008)
10.1038/nmeth.1226 / Nat. Methods / Mapping and quantifying mammalian transcriptomes by RNA-Seq by Mortazavi (2008)
10.1126/science.1158441 / Science / The transcriptional landscape of the yeast genome defined by RNA sequencing by Nagalakshmi (2008)
10.1093/nar/6.7.2601 / Nucleic Acids Res. / A strategy of DNA sequencing employing computer programs by Staden (1979)
10.1038/nrg2484 / Nat. Rev. Genet. / RNA-Seq: a revolutionary tool for transcriptomics by Wang (2009)
Dates
Type | When |
---|---|
Created | Dec. 18, 2009, 9:09 p.m. |
Deposited | Feb. 13, 2025, 5:37 p.m. |
Indexed | Aug. 29, 2025, 6:19 a.m. |
Issued | Dec. 18, 2009 |
Published | Dec. 18, 2009 |
Published Online | Dec. 18, 2009 |
Published Print | Feb. 15, 2010 |
@article{Li_2009,
  title={RNA-Seq gene expression estimation with read mapping uncertainty},
  volume={26},
  ISSN={1367-4803},
  url={http://dx.doi.org/10.1093/bioinformatics/btp692},
  DOI={10.1093/bioinformatics/btp692},
  number={4},
  journal={Bioinformatics},
  publisher={Oxford University Press (OUP)},
  author={Li, Bo and Ruotti, Victor and Stewart, Ron M. and Thomson, James A. and Dewey, Colin N.},
  year={2009},
  month=dec,
  pages={493–500}
}