Abstract
Among the many applications of mass spectrometry, biomarker pattern discovery from protein mass spectra has aroused considerable interest in the past few years. While research efforts have raised hopes of early and less invasive diagnosis, they have also brought to light the many issues to be tackled before mass‐spectra‐based proteomic patterns become routine clinical tools. Known issues cover the entire pipeline leading from sample collection through mass spectrometry analytics to biomarker pattern extraction, validation, and interpretation. This study focuses on the data‐analytical phase, which takes as input mass spectra of biological specimens and discovers patterns of peak masses and intensities that discriminate between different pathological states. We survey current work and investigate computational issues concerning the different stages of the knowledge discovery process: exploratory analysis, quality control, and diverse transforms of mass spectra, followed by further dimensionality reduction, classification, and model evaluation. We conclude after a brief discussion of the critical biomedical task of analyzing discovered discriminatory patterns to identify their component proteins as well as interpret and validate their biological implications. © 2006 Wiley Periodicals, Inc., Mass Spec Rev 25:409–449, 2006
Dates
Type | When |
---|---|
Created | Feb. 4, 2006, 5:48 a.m. |
Deposited | Feb. 2, 2024, 1:39 p.m. |
Indexed | May 9, 2025, 1:50 a.m. |
Issued | Feb. 3, 2006 |
Published | Feb. 3, 2006 |
Published Online | Feb. 3, 2006 |
Published Print | May 1, 2006 |
@article{Hilario_2006, title={Processing and classification of protein mass spectra}, volume={25}, ISSN={1098-2787}, url={http://dx.doi.org/10.1002/mas.20072}, DOI={10.1002/mas.20072}, number={3}, journal={Mass Spectrometry Reviews}, publisher={Wiley}, author={Hilario, Melanie and Kalousis, Alexandros and Pellegrini, Christian and Müller, Markus}, year={2006}, month=feb, pages={409–449} }