Abstract
Motivation: An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operator characteristic) technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability as required for example in the logistic regression method; (2) it accommodates case–control designs and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, the ROC-based classification has not been used with microarray data. Moreover, the standard ROC technique does not incorporate built-in biomarker selection.
Results: We propose a novel method for biomarker selection and classification using the ROC technique for microarray data. The proposed method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification and the threshold gradient descent regularization method for estimation and biomarker selection. Tuning parameter selection based on the V-fold cross validation and predictive performance evaluation are also investigated. The proposed approach is demonstrated with a simulation study, the Colon data and the Estrogen data. The proposed approach yields parsimonious models with excellent classification performance.
Availability: R code is available upon request.
Contact: jian@stat.uiowa.edu
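The two ingredients named in the abstract, a sigmoid approximation to the area under the ROC curve and threshold gradient descent for estimation and selection, can be illustrated with a minimal sketch. This is not the authors' R code. It assumes a linear score, a fixed smoothing width `sigma`, and a plain thresholded gradient-ascent update; the function names, the toy data and the values of `tau`, `step`, `n_steps` and `sigma` are all hypothetical choices made for illustration, and details the paper may impose (for example a normalization constraint on the coefficients or cross-validated tuning) are omitted.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def smoothed_auc(beta, cases, controls, sigma=0.1):
    """Sigmoid-smoothed AUC of the linear score beta'x over all case-control pairs.

    The indicator I(beta'x_case > beta'x_control) of the empirical AUC is
    replaced by sigmoid(beta'(x_case - x_control) / sigma).
    """
    total = 0.0
    for xc in cases:
        for xn in controls:
            diff = sum(b * (a - c) for b, a, c in zip(beta, xc, xn))
            total += sigmoid(diff / sigma)
    return total / (len(cases) * len(controls))


def smoothed_auc_gradient(beta, cases, controls, sigma=0.1):
    """Gradient of smoothed_auc with respect to beta."""
    grad = [0.0] * len(beta)
    for xc in cases:
        for xn in controls:
            d = [a - c for a, c in zip(xc, xn)]
            s = sigmoid(sum(b * dk for b, dk in zip(beta, d)) / sigma)
            w = s * (1.0 - s) / sigma  # derivative of the sigmoid term
            for k, dk in enumerate(d):
                grad[k] += w * dk
    n_pairs = len(cases) * len(controls)
    return [g / n_pairs for g in grad]


def tgdr_fit(cases, controls, tau=0.9, step=0.01, n_steps=100, sigma=0.1):
    """Thresholded gradient ascent on the smoothed AUC (illustrative only).

    At each step only coordinates whose absolute gradient is at least
    tau times the largest absolute gradient are updated, so coefficients
    of weakly informative features can stay exactly zero.
    """
    beta = [0.0] * len(cases[0])
    for _ in range(n_steps):
        grad = smoothed_auc_gradient(beta, cases, controls, sigma)
        g_max = max(abs(g) for g in grad)
        if g_max == 0.0:
            break
        for k, g in enumerate(grad):
            if abs(g) >= tau * g_max:
                beta[k] += step * g
    return beta


# Toy data (made up): feature 0 separates the classes, features 1 and 2 are noise.
cases = [[2.0, 0.1, -0.3], [1.5, -0.2, 0.4], [2.5, 0.3, 0.0]]
controls = [[0.0, 0.2, 0.1], [0.5, -0.1, -0.2], [-0.5, 0.0, 0.3]]

beta_hat = tgdr_fit(cases, controls)
print("coefficients:", beta_hat)
print("smoothed AUC:", smoothed_auc(beta_hat, cases, controls))
```

In this sketch a threshold `tau` near 1 updates only the coordinates with the steepest gradient, which gives sparser coefficient vectors, while `tau = 0` reduces to ordinary gradient ascent on every coordinate. The number of steps acts as a second tuning parameter, which in practice would be chosen by cross validation rather than fixed in advance.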
References: 28
Referenced: 134
Abrevaya (1999). Computation of the maximum rank correlation estimator. Econ. Lett. doi:10.1016/S0165-1765(98)00255-9
Alon (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci. USA. doi:10.1073/pnas.96.12.6745
Ambroise (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA. doi:10.1073/pnas.102102699
Ben-Dor (2000). Tissue classification with gene expression profiles. J. Comput. Biol. doi:10.1089/106652700750050943
Cui (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates. Bioinformatics, 6, 59.
Dettling (2003). Boosting for tumor classification with gene expression data. Bioinformatics. doi:10.1093/bioinformatics/btf867
Dudoit (2002). Comparison of discrimination methods for tumor classification based on microarray data. J. Am. Stat. Assoc. doi:10.1198/016214502753479248
Efron (2004). Least angle regression. Ann. Stat. doi:10.1214/009053604000000067
Friedman (2004). Gradient directed regularization for linear regression and classification. Technical report.
Gammerman (1996). Computational Learning and Probabilistic Reasoning.
Ghosh (2005). Classification and selection of biomarkers in genomic data using LASSO. J. Biomed. Biotechnol. doi:10.1155/JBB.2005.147
Gui (2005). Threshold gradient descent method for censored data regression with applications in pharmacogenomics.
Han (1987). Non-parametric analysis of a generalized regression model. J. Econometrics. doi:10.1016/0304-4076(87)90030-3
Hastie (2001). The Elements of Statistical Learning. doi:10.1007/978-0-387-21606-5
Horowitz (1992). A smoothed maximum score estimator for the binary response model. Econometrica. doi:10.2307/2951582
Ma (2005). Additive risk models for survival data with high dimensional covariates. Biometrics.
Mossman (1999). Three-way ROCs. Med. Decis. Making. doi:10.1177/0272989X9901900110
Nguyen (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics. doi:10.1093/bioinformatics/18.1.39
Pepe (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. doi:10.1093/oso/9780198509844.001.0001
Pepe (2004). Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am. J. Epidemiol. doi:10.1093/aje/kwh101
Pepe (2005). Combining predictors for classification using the area under the ROC curve. Biometrics.
Pochet (2004). Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction. Bioinformatics. doi:10.1093/bioinformatics/bth383
Provost (2003). Tree induction for probability based rankings. Mach. Learning. doi:10.1023/A:1024099825458
R Development Core Team (2005). R: a language and environment for statistical computing.
Spang (2001). Prediction and uncertainty in the analysis of gene expression profiles.
Tibshirani (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B. doi:10.1111/j.2517-6161.1996.tb02080.x
Wahba (1990). Spline models for observational data. doi:10.1137/1.9781611970128
West (2001). Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl Acad. Sci. USA. doi:10.1073/pnas.201162998
Dates
Type | When |
---|---|
Created | Oct. 18, 2005, 11:38 p.m. |
Deposited | Jan. 4, 2025, 8:35 p.m. |
Indexed | June 27, 2025, 3:28 a.m. |
Issued | Oct. 18, 2005 |
Published | Oct. 18, 2005 |
Published Online | Oct. 18, 2005 |
Published Print | Dec. 15, 2005 |
@article{Ma_2005,
  title={Regularized ROC method for disease classification and biomarker selection with microarray data},
  volume={21},
  ISSN={1367-4803},
  url={http://dx.doi.org/10.1093/bioinformatics/bti724},
  DOI={10.1093/bioinformatics/bti724},
  number={24},
  journal={Bioinformatics},
  publisher={Oxford University Press (OUP)},
  author={Ma, Shuangge and Huang, Jian},
  year={2005},
  month=oct,
  pages={4356--4362}
}