Abstract
Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static, predefined processing rules a cumbersome effort. Machine-learning methods instead use the intrinsic structure of the data, as well as the expert annotations of biologists, to infer models that can solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays, and we therefore include extensive discussion on how to optimize the experimental workflow as well as the data analysis pipeline.
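As a rough illustration of the two steps named in the abstract (converting images into a data representation, then inferring a model from expert annotations), the sketch below may help. It is not the authors' method: the three features, the nearest-centroid classifier, the synthetic images, and the class names `interphase` and `mitotic` are all assumptions chosen to keep the example small and dependency-free.

```python
import math
import random


def extract_features(image, threshold=0.5):
    """Convert a 2D grayscale image (rows of values in [0, 1]) into a feature vector."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    bright = sum(1 for p in pixels if p > threshold) / n  # fraction of bright pixels
    return [mean, std, bright]


def train_nearest_centroid(features, labels):
    """Average the feature vectors of each annotated class (assumed classifier)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def synthetic_cell(rng, bright_fraction, size=16):
    """Hypothetical stand-in for a segmented cell crop; not real microscopy data."""
    return [
        [
            rng.uniform(0.6, 1.0) if rng.random() < bright_fraction else rng.uniform(0.0, 0.4)
            for _ in range(size)
        ]
        for _ in range(size)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# "Expert annotation": each training image carries a hypothetical class label.
train_images = [synthetic_cell(rng, 0.2) for _ in range(20)]
train_images += [synthetic_cell(rng, 0.7) for _ in range(20)]
train_labels = ["interphase"] * 20 + ["mitotic"] * 20

centroids = train_nearest_centroid(
    [extract_features(image) for image in train_images], train_labels
)

# Classify a new, unlabeled image using the inferred model.
new_image = synthetic_cell(rng, 0.7)
print(predict(centroids, extract_features(new_image)))
```

In practice the feature vector would be far richer (shape, texture, and intensity descriptors per segmented object), and the classifier would typically be one of the methods the Commentary reviews, such as a support vector machine or random forest.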
References
100
Referenced
207
Dates
Type | When |
---|---|
Created | 11 years, 9 months ago (Nov. 20, 2013, 8:58 p.m.) |
Deposited | 1 year, 3 months ago (May 20, 2024, 12:17 a.m.) |
Indexed | 1 month ago (July 30, 2025, 10:01 a.m.) |
Issued | 12 years, 8 months ago (Jan. 1, 2013) |
Published | 12 years, 8 months ago (Jan. 1, 2013) |
Published Online | 12 years, 8 months ago (Jan. 1, 2013) |
@article{Sommer_2013, title={Machine learning in cell biology – teaching computers to recognize phenotypes}, ISSN={0021-9533}, url={http://dx.doi.org/10.1242/jcs.123604}, DOI={10.1242/jcs.123604}, journal={Journal of Cell Science}, publisher={The Company of Biologists}, author={Sommer, Christoph and Gerlich, Daniel W.}, year={2013}, month=jan }