Abstract
Length variation in short tandem repeats (STRs) is an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major diseases have been associated with length variation of trinucleotide (triplet) repeats, including Huntington's disease, hereditary ataxias, and spinobulbar muscular atrophy. Using the reference human genome, we have catalogued all triplet repeats in genic regions. These data revealed a bias in noncoding DNA repeat lengths. They also enabled a survey of repeat-length polymorphisms (RLPs) in human genomes and a comparison of the rate of polymorphism in humans versus divergence from chimpanzee. For short repeats, this analysis of three human genomes reveals a relatively low RLP rate in exons and, somewhat surprisingly, in introns. All short RLPs observed in multiple genomes are biallelic (at least in this small sample). In contrast, long repeats are highly polymorphic and some long RLPs are multiallelic. For long repeats, the chimpanzee sequence frequently differs from all observed human alleles. This suggests a high expansion/contraction rate in all long repeats. Expansions and contractions are not, however, affected by natural selection discernible from our comparison of human-chimpanzee divergence with human RLPs. Our catalog of human triplet repeats and their flanking regions can be used to produce a cost-effective whole-genome assay to test individuals. This repeat assay could someday complement SNP arrays in tests that assess an individual's risk of developing a disease, or become part of a personalized genomic strategy that provides therapeutic guidance with respect to drug response.
References
43
Referenced
30
10.1126/science.1156409
10.1111/j.1469-1809.2006.00335.x
10.1126/science.1136678
10.1093/hmg/ddn282
10.1371/journal.pone.0003906
10.1016/j.sbi.2006.05.004
10.1038/nature05977
10.1101/gr.070409.107
10.1016/j.tig.2007.09.005
10.1002/ana.410410521
10.1074/jbc.M409984200
10.1016/S0896-6273(02)00872-3
10.1086/378133
10.4161/cbt.6.9.4825
10.1073/pnas.0408118101
10.1371/journal.pbio.0050254
10.1038/nature06884
10.1093/nar/22.22.4828
10.1093/nar/27.2.573
10.1038/35057149
10.1101/gr.078303.108
10.1093/nar/gkg767
10.1038/nature04072
10.1038/351652a0
10.1007/s004390050476
10.1038/75556
10.1007/s100380050059
10.1186/gb-2003-4-5-p3
10.1172/JCI31752
10.1038/nmeth1111
10.1073/pnas.96.18.10016
10.1101/gr.2450504
10.1093/hmg/9.13.2009
10.1016/j.bbrc.2006.12.129
10.1038/35057062
10.1186/1471-2105-8-274
10.1093/nar/gkl1013
10.1186/gb-2004-5-2-r12
10.1093/bioinformatics/btn548
10.1186/1471-2105-10-48
10.1038/nprot.2008.211
- A Agresti, A survey of exact inference for contingency tables. Statistical Science 7, 131–153 (1992).
10.1038/ng0504-431
Dates
Type | When |
---|---|
Created | Sept. 18, 2009, 1:11 a.m. |
Deposited | April 12, 2022, 6:34 p.m. |
Indexed | Aug. 5, 2024, 4:42 p.m. |
Issued | Oct. 6, 2009 |
Published | Oct. 6, 2009 |
Published Online | Oct. 6, 2009 |
Published Print | Oct. 6, 2009 |
@article{Molla_2009, title={Triplet repeat length bias and variation in the human transcriptome}, volume={106}, ISSN={1091-6490}, url={http://dx.doi.org/10.1073/pnas.0907112106}, DOI={10.1073/pnas.0907112106}, number={40}, journal={Proceedings of the National Academy of Sciences}, publisher={Proceedings of the National Academy of Sciences}, author={Molla, Michael and Delcher, Arthur and Sunyaev, Shamil and Cantor, Charles and Kasif, Simon}, year={2009}, month=oct, pages={17095–17100} }