Abstract
Background
In recent years it has been demonstrated that structural variations, such as indels (insertions and deletions), are common throughout the genome, but their implications are still not clearly understood. Long tandem repeats (e.g. microsatellites or simple repeats) are known to be hypermutable (indel-rich), but they are rare in exons and only occasionally associated with disease. Here we focus on short (imperfect) tandem repeats (STRs), which fall below the radar of conventional tandem repeat detection, and investigate whether STRs are targets for disease-related mutations in human exons. In particular, we test whether they share the hypermutability of the longer tandem repeats and whether disease-related genes have a higher STR content than non-disease-related genes.
Results
We show that validated human indels are extremely common in STR regions compared with non-STR regions. In contrast to longer tandem repeats, STRs as defined here are present in the exons of most known human genes (92%). Of all STR sequences in exons, 99% are shorter than 33 base pairs, and 62% of all STR sequences are imperfect repeats. We also demonstrate that STRs are significantly overrepresented in disease-related genes in both human and mouse. These results are preserved when we limit the analysis to STRs outside known longer tandem repeats.
Conclusion
Based on our findings, we conclude that STRs represent hypermutable regions in the human genome that are linked to human disease. In addition, STRs constitute an obvious target when screening for rare mutations, because of the relatively small amount of STR sequence in exons (1,973,844 bp) and the limited length of STR regions.
Dates
Type | When |
---|---|
Created | Sept. 12, 2008, 3:31 p.m. |
Deposited | Aug. 31, 2021, 11:36 p.m. |
Indexed | April 7, 2025, 5:26 p.m. |
Issued | Sept. 12, 2008 |
Published | Sept. 12, 2008 |
Published Online | Sept. 12, 2008 |
Published Print | Dec. 1, 2008 |
@article{Madsen_2008, title={Short Tandem Repeats in Human Exons: A Target for Disease Mutations}, volume={9}, ISSN={1471-2164}, url={http://dx.doi.org/10.1186/1471-2164-9-410}, DOI={10.1186/1471-2164-9-410}, number={1}, journal={BMC Genomics}, publisher={Springer Science and Business Media LLC}, author={Madsen, Bo Eskerod and Villesen, Palle and Wiuf, Carsten}, year={2008}, month=sep }