P. K. Agarwal and C. M. Procopiuc. Exact and approximation algorithms for clustering. Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 658–667, 1998.
 D. W. Aha. Lazy learning: Special issue editorial. Artiﬁcial Intelligence Review, 11(1-5):7–10, 1997.
 D. Cai, X. He, and J. Han. Document clustering using locality preserving indexing. IEEE Transactions on Knowledge and Data Engineering, 17(12):1624–1637, 2005.
 H. Chim and X. Deng. Efﬁcient phrase-based document similarity for clustering. IEEE Transactions on Knowledge and Data Engineering, 20(9):1217–1229, 2008.
 M. Craven, D. DiPasquo, D. Freitag, A. K. McCallum, T. M. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge form the world wide web. Proceedings of 15th National Conference on Artiﬁcial Intelligence, 1998.
 I. S. Dhillon, J. Kogan, and C. Nicholas. Feature Selection and Document Clustering. In Berry MW Ed. A Comprehensive Survey of Text Mining, 2003.
 I. S. Dhillon and D. S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175, 2001.
 J. D’hondt, J. Vertommen, P.-A. Verhaegen, D. Cattrysse, and J. R. Duﬂou. Pairwise-adaptive dissimilarity measure for document clustering. Information Sciences, 180:2341–2358, 2010.
 C. G. Gonz′alez, W. B. Jr., and A. L. V. Rodrigues. Density of closed balls in real-valued and autometrized boolean spaces for clustering applications. 19th Brazilian Symposium on Artiﬁcial Intelligence, pages8–22, 2008.
 K. M. Hammouda and M. S. Kamel. Efﬁcient phrase-based document indexing for web document clustering. IEEE Transactions on Knowledge and Data Engineering, 16(10):1279–1296, 2004.
 K. M. Hammouda and M. S. Kamel. Hierarchically distributed peer-to-peer document clustering and cluster summarization. IEEE Transactionson Knowledge and Data Engineering, 21(5):681–698, 2009.
 J. Han and M. Kamber. Data Mining: Concepts and Techniques. Second Edition, Morgan Kaufmann, Elsevier, 2006.
 T. Joachims. A probabilistic analysis of the rocchio algorithm with tﬁdf for text categorization. International Conference on Machine Learning, pages143–151, 1997.
 T. Joachims and F. Sebastiani. Guest editors’ introduction to the special issue on automated text categorization. Journal of Intelligent Information Systems, 18(2/3):103–105, 2002.
 T. Kanungo, D. M. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Y. Wu. An efﬁcient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881–892, 2002.
 H. Kim, P. Howland, and H. Park. Dimension reduction in text classiﬁcation with support vector machines. Journal of Machine Learning Research, 6:37–53, 2005.
 S.-B. Kim, K.-S. Han, H.-C. Rim, and S. H. Myaeng. Some effective techniques for naïve bayes text classiﬁcation. IEEE Transactions on Knowledge and Data Engineering, 18(11):1457–1466, 2006.
 K. Knight. Mining online text. Communications of the ACM, 42(11):58–61, 1999.
 J. Kogan, C. Nicholas, and V. Volkovich. Text mining with information-theoretic clustering. Computing in Science and Engineering, 5(6):52–59, 2003.
 J. Kogan, M. Teboulle, and C. K. Nicholas. Data driven similarity measures for k-means like clustering algorithms. Information Retrieval, 8(2):331–349, 2005.
 S. Kolliopoulos and S. Rao. A nearly linear-time approximation scheme for the euclidean k-median problem. Seventh Annual European Symposium on Algorithms, pages362–371, 1999.
 V. Lertnattee and T. Theeramunkong. Multidimensional text classiﬁcation for drug information. IEEE Transactions on Information Technology in Biomedicine, 8(3):306–312, 2004.
 D. D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.
 M. G. Michie. Use of the bray-curtis similarity measure in cluster analysis of foraminiferal data. Mathematical Geology, 14(6):661–667, 1982.
 T. Mitchell. Machine Learning. McGraw-Hill, 1997.
 K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classiﬁcation from labeled and unlabeled documents using em. Machine Learning, 39(2/3):103–134, 2000.
 K. Nigam, A. K. McCallum, S. Thrun, and T. M. Mitchell. Learning to classify text from labeled and unlabeled documents. Proceedings of 15th National Conference on Artiﬁcial Intelligence, 1998.
 G. Salton and M. J. McGill. Introduction to Modern Retrieval. McGraw-Hill Book Company, 1983.
 T. W. Schoenharl and G. Madey. Evaluation of measurement techniques for the validation of agent-based simulations against streaming data. International Conference on Computational Science, 2008.
 F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1–47, 2002.
 C. Silva, U. Lotric, B. Ribeiro, and A. Dobnikar. Distributed text classiﬁcation with an ensemble kernel-based learning approach. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(3):287–297, 2010.
 A. Strehl and J. Ghosh. Value-based customer grouping from large retail data-sets. SPIE Conference on Data Mining and Knowledge Discovery, 4057:33–42, 2000.
 P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addision-Wesley, 2006.
 M. L. Zhang and Z. H. Zhou. ML-KNN: Alazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048, 2007.
 T. Zhang, Y. Y. Tang, B. Fang, and Y. Xiang. Document clustering in correlation similarity measure space. IEEE Transactions on Knowledge and Data Engineering (to appear), 2011.
 Y. Zhao and G. Karypis. Comparison of agglomerative and partitional document clustering algorithms. The Workshop on Clustering High Dimensional Data and its Applications at the Second SIAM International Conference on Data Mining, pages83–93, 2002.