Abstract
Large segmental duplications cover much of the Arabidopsis thaliana genome. Little is known about their origins. We show that they are primarily due to at least four different large-scale duplication events that occurred 100 to 200 million years ago, a formative period in the diversification of the angiosperms. A better understanding of the complex structural history of angiosperm genomes is necessary to make full use of Arabidopsis as a genetic model for other plant species.
References
55
Referenced
817
-
Arumuganathan K., Earle E. D., Plant Mol. Biol. Rep. 9, 208 (1991).
(
10.1007/BF02672069
) 10.1126/science.282.5389.662
-
McGrath J. M., Jansco M. M., Pichersky E., Theor. Appl. Genet. 86, 880 (1993).
(
10.1007/BF00212616
) / Theor. Appl. Genet. by McGrath J. M. (1993) -
Bancroft I., Yeast 17, 1 (2000).
(
10.1002/(SICI)1097-0061(200004)17:1<1::AID-YEA3>3.0.CO;2-V
) / Yeast by Bancroft I. (2000) -
Blanc G., Barakat A., Guyot R., Cooke R., Delseny M., Plant Cell 12, 1093 (2000).
(
10.1105/tpc.12.7.1093
) / Plant Cell by Blanc G. (2000) 10.1073/pnas.070430597
-
Kowalski S. P., Lan T. H., Feldmann K. A., Paterson A. H., Genetics 138, 499 (1994).
(
10.1093/genetics/138.2.499
) / Genetics by Kowalski S. P. (1994) 10.1073/pnas.160271297
10.1038/45471
-
Mayer K., et al., Nature 402, 769 (1999).
(
10.1038/47134
) / Nature by Mayer K. (1999) 10.1038/ng1296-380
-
Lan T.-H., et al., Genome Res. 10, 776 (2000).
(
10.1101/gr.10.6.776
) / Genome Res. by Lan T.-H. (2000) -
Paterson A. H., et al., Plant Cell 12, 1523 (2000).
(
10.1105/tpc.12.9.1523
) / Plant Cell by Paterson A. H. (2000) -
Terryn N., et al., FEBS Lett. 445, 237 (1999).
(
10.1016/S0014-5793(99)00097-6
) / FEBS Lett. by Terryn N. (1999) - Supplementary information is available at: www.igd.cornell.edu/~tvision/arab/science_supplement.html
- We used GENSCAN [10.1006/jmbi.1997.0951] to provide gene models in unannotated clones. Because we only require that most predicted exons overlap true exons in the same translation frame and that predicted gene densities are approximately correct, ab initio predictions are sufficient for our purposes.
10.1093/nar/25.17.3389
- BLASTP scores were obtained for all pairs of genes. The (i, j) element of matrix M was assigned the alignment score for proteins i and j if the score was 100 bits or greater. Row and column indices denote the position of each protein within each chromosome. Chromosome order and orientation are arbitrary.
- Matching genes within 15 positions of each other were collected into the row and column with the smallest index and were assigned the maximum of the component scores. This was iterated until convergence, thereby combining both tandemly duplicated genes and single-copy genes shared by overlapping clones. A single gene may occur in two positions if the tiling path information is wrong, but extensive clone overlap would be needed to generate a spurious duplicated block, and the high sequence similarity would have flagged the error.
- Only the five highest scores in each row and column were retained, a conservative approach that sacrifices sensitivity for specificity.
- To identify duplicated blocks, we first calculated a weight for each pair of nonzero elements $M_{i_1 j_1}$ and $M_{i_2 j_2}$, for all $j_2 > j_1$, with corresponding transcriptional orientations $T(i_1)$, $T(i_2)$, $T(j_1)$, $T(j_2)$, where $T \in \{-1, 1\}$. The weight was $W = -k + \ell (r + c) - \mathbf{1}[T(i_1) \cdot T(j_1) = T(i_2) \cdot T(j_2) = \operatorname{sgn}(i_2 - i_1)] \cdot m$, where $k$, $\ell$, and $m$ are constants; $r = |i_1 - i_2|$ and $c = |j_1 - j_2|$ are the row and column distances; $\mathbf{1}[T(i_1) \cdot T(j_1) = T(i_2) \cdot T(j_2) = \operatorname{sgn}(i_2 - i_1)]$ is an indicator function equaling 1 if the transcriptional orientations of the two pairs of linked cORFs are equivalent but otherwise equaling 0; and $\operatorname{sgn}(x)$ equals 1, −1, or 0 for $x$ greater than, less than, or equal to 0, respectively. When cORFs were composed of multiple genes, transcriptional orientations were taken to be those of the highest scoring gene pair. Weights were assigned to edges of a directed acyclic graph in which nodes were nonzero elements of M and edges connected all nodes $(i_1, j_1)$ and $(i_2, j_2)$ for which $j_2 > j_1$. We computed minimum-weight paths between every pair of nodes connected by an edge [T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms (MIT Press, Cambridge, MA, 1990), pp. 536–538], identified paths with negative weight, combined overlapping paths, combined paths from either side of the diagonal, and accepted the resulting sets of nodes as duplicated blocks. Errors in the order and orientation of genes may hinder our ability to detect duplicated blocks but are unlikely to generate false ones.
10.1038/42711
-
McLysaght A., Seoighe C., Wolfe K. H., in Comparative Genomics, D. Sankoff, J. H. Nadeau, Eds. (Kluwer, New York, 2000), pp. 47–58.
(
10.1007/978-94-011-4309-7_6
) - Random matrices were derived from M by permutation of its rows. Using parameter values of k = 5, ℓ = 1.14, and m = 25, only one block defined by six or more pairs was identified in 1000 permutations of the chromosome 2 versus 4 submatrix, compared with 34 blocks of five pairs. Thus, blocks of seven pairs are unlikely to arise by chance, though real duplicated blocks may be overextended or erroneously merged.
- Amino acid alignments were obtained using CLUSTALW version 1.7 [10.1093/nar/22.22.4673]. Estimates of $d_A$ were obtained using PAML [Z. Yang, Phylogenetic Analysis by Maximum Likelihood (PAML), Version 3.0 (University College London, 2000)] with the JTT substitution matrix [Jones D. W., Taylor W. R., Thornton J. M., CABIOS 8, 275 (1992)]. The smallest estimate of $d_A$ was used for matches between multiple genes. The median is more robust to outliers than the mean, though it will still be affected by the absence of highly diverged homologous genes from the sample.
- Homogeneity in $d_A$ among blocks was rejected by a single-classification analysis of variance (P < 0.0001).
- Mixture models of normal distributions, with parameters for means, variances, and mixing proportions, were fit using an expectation-maximization algorithm. Models were compared by likelihood ratio tests [M. Lynch, B. Walsh, Genetics and Analysis of Quantitative Traits (Sinauer, Sunderland, MA, 1997), pp. 359–364]. The medians of samples drawn from a population of any distribution approach a normal distribution as the sample size increases [W. Feller, An Introduction to Probability Theory and its Applications (John Wiley & Sons, New York, ed. 2, 1957), pp. 238–241]. The approximation should be adequate for samples of this size (average = 29), though it is not strictly valid because there are differing numbers of matches in each block. The log likelihoods of the one-, two-, and three-distribution models are −5.5, −128.5, and −183.6, respectively, and each differs from the next by three degrees of freedom.
- Because numerous small blocks were not counted and up to 20% of the genome sequence has not been analyzed, this is likely to be an underestimate.
- This estimate is the product of the average gene density of chromosomes 2 and 4, ~210 genes/megabase, and the estimated length of chromosome 1, which is 27.9 megabases [10.1038/10334].
- A rate of $9 \times 10^{-10} \pm 9 \times 10^{-10}$ nonsynonymous base substitutions·site$^{-1}$·lineage$^{-1}$·year$^{-1}$ has been estimated for nuclear genes in the grasses [Gaut B. S., Evol. Biol. 30, 93 (1998); 10.1023/A:1006319803002], though this must be treated with caution due to many sources of uncertainty. Because 75% of all possible sense nucleotide substitutions in the genetic code are nonsynonymous and there are three positions in each codon, we assume that there are 2.25 nonsynonymous sites in each codon in converting from amino acid substitutions to nonsynonymous base substitutions. Accounting for patterns of codon usage in Arabidopsis (GenBank Release 119.0), one obtains a nearly identical conversion factor (2.30 nonsynonymous sites per codon) [
-
Benson D. A., et al., Nucleic Acids Res. 28, 15 (2000)].
(
10.1093/nar/28.1.15
) / Nucleic Acids Res. by Benson D. A. (2000) - Arabidopsis is a member of the rosid lineage of dicot angiosperms [
-
Soltis P. S., Soltis D. E., Chase M. W., Nature 402, 402 (1999)].
(
10.1038/46528
) / Nature by Soltis P. S. (1999) 10.1073/pnas.86.16.6201
-
Yang Y.-W., Lai K.-N., Tai P.-Y., Li W.-H., J. Mol. Evol. 48, 597 (1999).
(
10.1007/PL00006502
) / J. Mol. Evol. by Yang Y.-W. (1999) - The two presumed redundancies involve clones T6J4/F13B4 and F21N10/K17E7.
-
Koch M., Bishop J., Mitchell-Olds T., Plant Biol. 1, 529 (1999).
(
10.1111/j.1438-8677.1999.tb00779.x
) / Plant Biol. by Koch M. (1999) - This may be inflated by undetected homologies, overextended blocks, and transposition of genes from their original positions.
10.1007/PL00006498
- C. Somerville, personal communication.
-
Liljegren S. J., et al., Nature 404, 766 (2000).
(
10.1038/35008089
) 10.1101/gr.9.9.825
10.1139/g99-033
-
Copenhaver G. P., Browne W. E., Preuss D., Proc. Natl. Acad. Sci. U.S.A. 95, 247 (1998).
(
10.1073/pnas.95.1.247
) / Proc. Natl. Acad. Sci. U.S.A. by Copenhaver G. P. (1998) - We thank C. Aquadro, J. Doyle, R. Durrett, T. Mitchell-Olds, C. Somerville, M. Yanofsky, and L. Zhang for helpful comments. This research was funded by grants from the National Science Foundation and the Office of Naval Research. T.J.V. is supported in part by the Cornell Theory Center.
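The block-detection note in the list above describes an edge weight W and minimum-weight paths through a directed acyclic graph of matching gene pairs. The Python sketch below is one possible reading of that description, not the authors' implementation. The function names (`edge_weight`, `best_paths`, `trace`) and the node layout `(i, j, T_i, T_j)` are assumptions made for illustration; the default constants k = 5, ℓ = 1.14, and m = 25 are the values quoted in the permutation note. As a simplification, it finds the lightest path ending at each node with a dynamic program over column order, rather than paths between every pair of nodes, and it does not merge overlapping paths.

```python
def sgn(x):
    """Standard sign function: 1, -1, or 0 for x > 0, x < 0, x == 0."""
    return (x > 0) - (x < 0)


def edge_weight(a, b, k=5.0, ell=1.14, m=25.0):
    """Weight W of the edge from node a to node b.

    Each node is a tuple (i, j, t_i, t_j): row index, column index, and
    the transcriptional orientations (+1 or -1) of the row and column
    genes. This tuple layout is an assumption of the sketch.
    """
    i1, j1, ti1, tj1 = a
    i2, j2, ti2, tj2 = b
    r = abs(i1 - i2)  # row distance
    c = abs(j1 - j2)  # column distance
    # Indicator: both pairs have the same relative orientation, and it
    # agrees with the direction of travel along the rows.
    consistent = (ti1 * tj1 == ti2 * tj2 == sgn(i2 - i1))
    return -k + ell * (r + c) - (m if consistent else 0.0)


def best_paths(nodes, **params):
    """For every node, find the minimum-weight path ending at that node.

    Edges run only from smaller to larger column index, so processing
    nodes in column order visits the graph in topological order.
    A path made of a single node is given weight 0 (an assumption), so
    a path is extended only when doing so lowers its weight.
    Returns {node: (weight, predecessor_or_None)}.
    """
    order = sorted(nodes, key=lambda n: n[1])
    best = {}
    for idx, b in enumerate(order):
        best[b] = (0.0, None)
        for a in order[:idx]:
            if a[1] < b[1]:  # strict: edges require j2 > j1
                w = best[a][0] + edge_weight(a, b, **params)
                if w < best[b][0]:
                    best[b] = (w, a)
    return best


def trace(best, node):
    """Recover the path ending at `node` by following predecessors."""
    path = []
    while node is not None:
        path.append(node)
        node = best[node][1]
    return path[::-1]
```

For three collinear matches with consistent orientation, such as `(10, 20, 1, 1)`, `(11, 21, 1, 1)`, and `(12, 22, 1, 1)`, each step contributes −5 + 1.14 × 2 − 25 = −27.72, so the path through all three nodes is lighter than the direct edge from the first to the third (−25.44). Under this reading, a run of closely spaced, consistently oriented matches accumulates a strongly negative weight, which is what marks it as a candidate duplicated block.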
Dates
Type | When |
---|---|
Created | July 27, 2002, 5:52 a.m. |
Deposited | Jan. 13, 2024, 4:26 a.m. |
Indexed | Aug. 26, 2025, 3:10 a.m. |
Issued | Dec. 15, 2000 |
Published | Dec. 15, 2000 |
Published Print | Dec. 15, 2000 |
@article{Vision_2000, title={The Origins of Genomic Duplications in Arabidopsis}, volume={290}, ISSN={1095-9203}, url={http://dx.doi.org/10.1126/science.290.5499.2114}, DOI={10.1126/science.290.5499.2114}, number={5499}, journal={Science}, publisher={American Association for the Advancement of Science (AAAS)}, author={Vision, Todd J. and Brown, Daniel G. and Tanksley, Steven D.}, year={2000}, month=dec, pages={2114–2117} }