Abstract
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
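The k-tuple matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ktuple_index`, `diagonal_counts`, `best_diagonal`) are hypothetical, and scoring by a raw count of matches per diagonal is an assumption chosen for simplicity. The sketch indexes every k-tuple of the query, scans the target once, and tallies matches by diagonal (target position minus query position), so that a diagonal with many matches suggests a region of similarity.

```python
from collections import Counter, defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(query, target, k):
    """Count k-tuple matches per diagonal (target start minus query start)."""
    index = ktuple_index(query, k)
    counts = Counter()
    for j in range(len(target) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for i in index.get(target[j:j + k], ()):
            counts[j - i] += 1
    return counts


def best_diagonal(query, target, k=2):
    """Return (diagonal, match_count) for the richest diagonal, or (None, 0)."""
    counts = diagonal_counts(query, target, k)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # The query occurs in the target shifted right by two positions.
    print(best_diagonal("ACGTACGT", "TTACGTACGT", k=3))
```

Because the query is indexed once and the target is scanned in a single pass, the work per data bank entry grows with the number of k-tuple hits rather than with the full product of the two sequence lengths, which is the kind of saving the abstract attributes to the method.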
Dates
Type | When |
---|---|
Created | 19 years, 3 months ago (May 31, 2006, 3:24 a.m.) |
Deposited | 3 years, 4 months ago (April 13, 2022, 11:43 a.m.) |
Indexed | 2 days, 23 hours ago (Aug. 30, 2025, 12:49 p.m.) |
Issued | 42 years, 7 months ago (Feb. 1, 1983) |
Published | 42 years, 7 months ago (Feb. 1, 1983) |
Published Online | 42 years, 7 months ago (Feb. 1, 1983) |
Published Print | 42 years, 7 months ago (Feb. 1, 1983) |
@article{Wilbur_1983,
  title={Rapid similarity searches of nucleic acid and protein data banks.},
  volume={80},
  ISSN={1091-6490},
  url={http://dx.doi.org/10.1073/pnas.80.3.726},
  DOI={10.1073/pnas.80.3.726},
  number={3},
  journal={Proceedings of the National Academy of Sciences},
  publisher={Proceedings of the National Academy of Sciences},
  author={Wilbur, W J and Lipman, D J},
  year={1983},
  month=feb,
  pages={726–730}
}