10.1145/235968.233324
Crossref journal-article
Association for Computing Machinery (ACM)
ACM SIGMOD Record (320)
Abstract

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs.This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively.We evaluate BIRCH 's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a performance comparisons of BIRCH versus CLARANS, a clustering method proposed recently for large datasets, and show that BIRCH is consistently superior.

Bibliography

Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH. ACM SIGMOD Record, 25(2), 103–114.

Authors 3
  1. Tian Zhang (first)
  2. Raghu Ramakrishnan (additional)
  3. Miron Livny (additional)
References 12 Referenced 1,864
  1. Peter C, heeselnan, James Kelly , Matthew Self , Auto C Iass : A Bayesian Classification System , Proc. of the 5th Int'l Conf. on Machine Learning, Morgan Kaufman, 3un. 1988 . Peter C, heeselnan, James Kelly, Matthew Self, et al., Auto CIass : A Bayesian Classification System, Proc. of the 5th Int'l Conf. on Machine Learning, Morgan Kaufman, 3un. 1988. / Proc. of the 5th Int'l Conf. on Machine Learning, Morgan Kaufman, 3un. by Peter (1988)
  2. Richard Duds , and Peter E . Hart, Pattern Classification and Scene Analysis , Wiley , 1973 . Richard Duds, and Peter E. Hart, Pattern Classification and Scene Analysis, Wiley, 1973. / Hart, Pattern Classification and Scene Analysis by Duds Richard (1973)
  3. R. Dubes , and A.K. Jain , Clustering Methodologies in Exploratory Data Analysis Advances in C, omputers , Edited by M.C. Yovits, Vol. 19 , Academic Press , New York , 1980 . R. Dubes, and A.K. Jain, Clustering Methodologies in Exploratory Data Analysis Advances in C, omputers, Edited by M.C. Yovits, Vol. 19, Academic Press, New York, 1980. / Clustering Methodologies in Exploratory Data Analysis Advances in C, omputers by Dubes R. (1980)
  4. Martin Ester , Hans-Peter Kriegel , and Xiaowei Xu , A Database Interface . for Clustering in Large Spatial Databases , Proc. of 1st {nt'l Conf. on Knowledge Discovery and Data Mining , 1995 . Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu, A Database Interface .for Clustering in Large Spatial Databases, Proc. of 1st {nt'l Conf. on Knowledge Discovery and Data Mining, 1995. / Proc. of 1st {nt'l Conf. on Knowledge Discovery and Data Mining by Ester Martin (1995)
  5. Martin Ester , Hans-Peter Kriegel , and Xiaowei Xu , Knowledge Discovery in Large Spatial Databases : Focusing Techniques for Efficient Class Identification , Proc. of 4th Int'l Symposium on Large Spatial Databases , Portland, Maine, U.S.A. , 1995 . Martin Ester, Hans-Peter Kriegel, and Xiaowei Xu, Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification, Proc. of 4th Int'l Symposium on Large Spatial Databases, Portland, Maine, U.S.A., 1995. / Proc. of 4th Int'l Symposium on Large Spatial Databases by Ester Martin (1995)
  6. 10.1023/A:1022852608280
  7. A. Gersho and P~. Gray , Vector quantization and signal compression , Boston, Ms .: Kluwer Academic Publishers , 1992 . A. Gersho and P~. Gray, Vector quantization and signal compression, Boston, Ms.: Kluwer Academic Publishers, 1992. (10.1007/978-1-4615-3626-0) / Vector quantization and signal compression by Gersho A. (1992)
  8. Leonard Kaufman , and Peter J . Rousseeuw , Finding Groups in Data - An Introduction to Cluster Analysis , Wiley Series in Probability and Mathematical Statistics , 1990 . Leonard Kaufman, and Peter J. Rousseeuw, Finding Groups in Data - An Introduction to Cluster Analysis, Wiley Series in Probability and Mathematical Statistics, 1990. (10.1002/9780470316801) / Finding Groups in Data - An Introduction to Cluster Analysis / Wiley Series in Probability and Mathematical Statistics by Kaufman Leonard (1990)
  9. 10.1023/A:1022800624210
  10. R.C.T. Lee , Clustering analysis and its applications , Advances in Information Systems Science , Edited by J .T.Toum, Vol. 8 , pp. 169 - 292 , Plenum Press , New York , 1981 . R.C.T.Lee, Clustering analysis and its applications, Advances in Information Systems Science, Edited by J .T.Toum, Vol. 8, pp. 169-292, Plenum Press, New York, 1981. (10.1007/978-1-4613-9883-7_4) / Advances in Information Systems Science by Lee R.C.T. (1981)
  11. F. Murtagh , A Survey of _Recent Advances in Hier'archical Clustering Algorithms , The Computer Journal , 1933 . F. Murtagh, A Survey of _Recent Advances in Hier'archical Clustering Algorithms, The Computer Journal, 1933. / The Computer Journal by Murtagh F. (1933)
  12. Raymond T. Ng and Jiawei Hart, Efficient and Effective Clustering Methods for ,Spatial Data Mining , Proc. of VLDB , 1994 . Raymond T. Ng and Jiawei Hart, Efficient and Effective Clustering Methods for ,Spatial Data Mining, Proc. of VLDB, 1994. / Proc. of VLDB by Raymond (1994)
Dates
Type When
Created 19 years, 9 months ago (Nov. 9, 2005, 3:35 p.m.)
Deposited 2 months, 1 week ago (June 18, 2025, 4:01 p.m.)
Indexed 2 days, 9 hours ago (Aug. 23, 2025, 9:34 p.m.)
Issued 29 years, 2 months ago (June 1, 1996)
Published 29 years, 2 months ago (June 1, 1996)
Published Online 29 years, 2 months ago (June 1, 1996)
Published Print 29 years, 2 months ago (June 1, 1996)
Funders 0

None

@article{Zhang_1996, title={BIRCH: an efficient data clustering method for very large databases}, volume={25}, ISSN={0163-5808}, url={http://dx.doi.org/10.1145/235968.233324}, DOI={10.1145/235968.233324}, number={2}, journal={ACM SIGMOD Record}, publisher={Association for Computing Machinery (ACM)}, author={Zhang, Tian and Ramakrishnan, Raghu and Livny, Miron}, year={1996}, month=jun, pages={103–114} }