M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery, “Learning to extract symbolic knowledge from the World Wide Web,” Fifteenth National Conference on Artificial Intelligence, 1998.
 D. D. Lewis and K. A. Knowles, “Threading electronic mail: A preliminary study,”
Information Processing and Management, vol. 33, no. 2, pp. 209–217, 1997.
 K. Lang, “NewsWeeder: Learning to filter netnews,” International Conference on Machine Learning, pp. 331–339, 1995.
 S. Chakrabarti, B. Dom, R. Agrawal, and P. Raghavan, “Keyword detection, navigation,
and annotation in hierarchical text,” 23rd International Conference on Very Large Data
Bases, pp. 446–455, 1997.
 S. Weiss, S. Kasif, and E. Brill, “Text classification in USENET newsgroups: A progress report,” AAAI Spring Symposium on Machine Learning in Information Access Technical Papers, 1996.
 D. Hull, J. Pedersen, and H. Schütze, “Document routing as statistical classification,” AAAI Spring Symposium on Machine Learning in Information Access Technical Papers, 1996.
 T. Yan and H. Garcia-Molina, “SIFT - a tool for wide-area information dissemination,” 1995
USENIX Technical Conference, pp. 177–186, 1995.
 G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
 T. Joachims, “A probabilistic analysis of the Rocchio algorithm with TFIDF for text
categorization,” in 14th International Conference on Machine Learning, 1997, pp. 143–151.
 H. Kim, P. Howland, and H. Park, “Dimension reduction in text classiﬁcation with
support vector machines,” Journal of Machine Learning Research, vol. 6, pp. 37–53, 2005.
 F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing
Surveys, vol. 34, no. 1, pp. 1–47, 2002.
 R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
 A. L. Blum and P. Langley, “Selection of relevant features and examples in machine
learning,” Artificial Intelligence, vol. 97, no. 1-2, pp. 245–271, 1997.
 E. F. Combarro, E. Montañés, I. Díaz, J. Ranilla, and R. Mones, “Introducing a family
of linear measures for feature selection in text categorization,” IEEE Transactions on
Knowledge and Data Engineering, vol. 17, no. 9, pp. 1223–1232, 2005.
 D. Koller and M. Sahami, “Toward optimal feature selection,” in 13th International
Conference on Machine Learning, 1996, pp. 284–292.
 R. Kohavi and G. John, “Wrappers for feature subset selection,” Artificial Intelligence,
vol. 97, no. 1-2, pp. 273–324, 1997.
 Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in 14th International Conference on Machine Learning, 1997, pp. 412–420.
 H. Liu and L. Yu, “Toward integrating feature selection algorithms for classiﬁcation and
clustering,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491–502, 2005.
 D. D. Lewis, “Feature selection and feature extraction for text categorization,” in Workshop on Speech and Natural Language, 1992, pp. 212–217.
 H. Li, T. Jiang, and K. Zhang, “Efficient and robust feature extraction by maximum margin criterion,” in Conference on Advances in Neural Information Processing Systems,
2004, pp. 97–104.
 E. Oja, Subspace Methods of Pattern Recognition. Research Studies Press, 1983.
 R. Caruana and D. Freitag, “Greedy attribute selection,” 11th International Conference
on Machine Learning, pp. 28–36, 1994.
 J. G. Dy and C. E. Brodley, “Feature subset selection and order identification for unsupervised learning,” 17th International Conference on Machine Learning, pp. 247–254, 2000.
 Y. Kim, W. Street, and F. Menczer, “Feature selection for unsupervised learning via
evolutionary search,” Sixth ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining, pp. 365–369, 2000.
 M. Dash, K. Choi, P. Scheuermann, and H. Liu, “Feature selection for clustering - a
ﬁlter solution,” Second International Conference on Data Mining, pp. 115–122, 2002.
 M. A. Hall, “Correlation-based feature selection for discrete and numeric class machine
learning,” 17th International Conference on Machine Learning, pp. 359–366, 2000.
 H. Liu and R. Setiono, “A probabilistic approach to feature selection - a ﬁlter solution,”
13th International Conference on Machine Learning, pp. 319–327, 1996.
 L. Yu and H. Liu, “Feature selection for high-dimensional data: A fast correlation-based
filter solution,” 20th International Conference on Machine Learning, pp. 856–863, 2003.
 S. Das, “Filters, wrappers and a boosting-based hybrid for feature selection,” 18th In-
ternational Conference on Machine Learning, pp. 74–81, 2001.
 A. Y. Ng, “On feature selection: Learning with exponentially many irrelevant features
as training examples,” 15th International Conference on Machine Learning, pp. 404–412, 1998.
 E. Xing, M. Jordan, and R. Karp, “Feature selection for high-dimensional genomic microarray data,” 18th International Conference on Machine Learning, pp. 601–608, 2001.
 P. Langley, “Selection of relevant features in machine learning,” The AAAI Fall Symposium on Relevance, pp. 140–144, 1994.
 J. Yan, B. Zhang, N. Liu, S. Yan, Q. Cheng, W. Fan, Q. Yang, W. Xi, and Z. Chen, “Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 320–333, 2006.
 I. T. Jolliﬀe, Principal Component Analysis. Springer-Verlag, 1986.
 A. M. Martinez and A. C. Kak, “PCA versus LDA,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
 H. Park, M. Jeon, and J. B. Rosen, “Lower dimensional representation of text data based
on centroids and least squares,” BIT Numerical Mathematics, vol. 43, pp. 427–448, 2003.
 S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear
embedding,” Science, vol. 290, pp. 2323–2326, 2000.
 J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for
nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319–2323, 2000.
 M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding
and clustering,” Advances in Neural Information Processing Systems 14, 2002.
 K. Hiraoka, K. Hidai, M. Hamahira, H. Mizoguchi, T. Mishima, and S. Yoshizawa,
“Successive learning of linear discriminant analysis: Sanger-type algorithm,” in 14th
International Conference on Pattern Recognition, 2000, pp. 2664–2667.
 J. Weng, Y. Zhang, and W. S. Hwang, “Candid covariance-free incremental principal
component analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 25, no. 8, pp. 1034–1040, 2003.
 J. Yan, B. Y. Zhang, S. C. Yan, Z. Chen, W. G. Fan, Q. Yang, W. Y. Ma, and Q. S.
Cheng, “IMMC: Incremental maximum margin criterion,” in 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 725–730.
 L. D. Baker and A. McCallum, “Distributional clustering of words for text classiﬁcation,”
in 21st Annual International ACM SIGIR, 1998, pp. 96–103.
 R. Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter, “Distributional word clusters vs.
words for text categorization,” Journal of Machine Learning Research, vol. 3, pp. 1183–1208, 2003.
 M. C. Dalmau and O. W. M. Flórez, “Experimental results of the signal processing
approach to distributional clustering of terms on the Reuters-21578 collection,” in 29th European Conference on IR Research, 2007, pp. 678–681.
 I. S. Dhillon, S. Mallela, and R. Kumar, “A divisive information-theoretic feature clustering algorithm for text classification,” Journal of Machine Learning Research, vol. 3, pp. 1265–1287, 2003.
 D. Ienco and R. Meo, “Exploration and reduction of the feature space by hierarchical
clustering,” in 2008 SIAM Conference on Data Mining, 2008, pp. 577–587.
 N. Slonim and N. Tishby, “The power of word clusters for text classiﬁcation,” in 23rd
European Colloquium on Information Retrieval Research (ECIR), 2001.
 F. Pereira, N. Tishby, and L. Lee, “Distributional clustering of English words,” in 31st
Annual Meeting of ACL, 1993, pp. 183–190.
 H. Al-Mubaid and S. A. Umair, “A new text categorization technique using distributional
clustering and learning logic,” IEEE Transactions on Knowledge and Data Engineering,
vol. 18, no. 9, pp. 1156–1165, 2006.
 R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
 M. R. Boutell, J. Luo, X. Shen, and C. M. Brown, “Learning multi-label scene classiﬁ-
cation,” Pattern Recognition, vol. 37, no. 9, pp. 1757–1771, 2004.
 A. Elisseeﬀ and J. Weston, “A kernel method for multi-labelled classiﬁcation,” Advances
in Neural Information Processing Systems 14, MIT Press, Cambridge, pp. 681–687, 2002.
 J. J. Rocchio, “Relevance feedback in information retrieval,” in G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, pp. 313–323, 1971.
 T. Mitchell, Machine Learning. McGraw-Hill, 1997.
 S. Tan, “Neighbor-weighted k-nearest neighbor for unbalanced text corpus,” Expert Sys-
tems with Applications, vol. 28, no. 4, pp. 667–671, 2005.
 S. Tan, “An eﬀective reﬁnement strategy for KNN text classiﬁer,” Expert Systems with
Applications, vol. 30, no. 2, pp. 290–298, 2006.
 Y. Yang and C. G. Chute, “An example-based mapping method for text categorization
and retrieval,” ACM Transactions on Information Systems, vol. 12, no. 3, pp. 252–277, 1994.
 D. A. Hull, “Improving text retrieval for the routing problem using latent semantic
indexing,” ACM International Conference on Research and Development in Information
Retrieval, pp. 282–289, 1994.
 D. W. Aha, “Lazy learning: Special issue editorial,” Artiﬁcial Intelligence Review, vol. 11,
no. 1-5, pp. 7–10, 1997.
 D. Lewis and M. Ringuette, “A comparison of two learning algorithms for text catego-
rization,” Third Annual Symposium on Document Analysis and Information Retrieval,
pp. 81–93, 1994.
 I. J. Good, The Estimation of Probabilities: An Essay on Modern Bayesian Methods.
MIT Press, 1965.
 J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, pp. 81–106, 1986.
 J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
 N. Fuhr and C. Buckley, “A probabilistic learning approach for document indexing,”
ACM Transactions on Information Systems, vol. 9, no. 3, pp. 223–248, 1991.
 C. Apté, F. J. Damerau, and S. M. Weiss, “Automated learning of decision rules for text
categorization,” ACM Transactions on Information Systems, vol. 12, no. 3, pp. 233–251, 1994.
 W. W. Cohen and Y. Singer, “Context-sensitive learning methods for text categoriza-
tion,” ACM Transactions on Information Systems, vol. 17, no. 2, pp. 141–173, 1999.
 T. Joachims, “Text categorization with support vector machines: Learning with many
relevant features,” European Conference on Machine Learning, pp. 137–142, 1998.
 S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami, “Inductive learning algorithms
and representations for text categorization,” 7th ACM International Conference on Information and Knowledge Management, pp. 148–155, 1998.
 G. Tsoumakas and I. Katakis, “Multi-label classiﬁcation: An overview,” International
Journal of Data Warehousing and Mining, vol. 3, no. 3, pp. 1–13, 2007.
 G. Tsoumakas, I. Katakis, and I. Vlahavas, “Mining multi-label data,” in Data Mining and Knowledge Discovery Handbook, 2nd ed., O. Maimon and L. Rokach, Eds. Springer, 2010.
 A. McCallum, “Multi-label text classiﬁcation with a mixture model trained by EM,”
Working Notes of the AAAI’99 Workshop on Text Learning, 1999.
 R. E. Schapire and Y. Singer, “BoosTexter: A boosting-based system for text catego-
rization,” Machine Learning, vol. 39, no. 2-3, pp. 135–168, 2000.
 A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete
data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1,
pp. 1–38, 1977.
 Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning
and an application to boosting,” Journal of Computer and System Sciences, vol. 55,
no. 1, pp. 119–139, 1997.
 F. De Comité, R. Gilleron, and M. Tommasi, “Learning multi-label alternating decision trees from texts and data,” Lecture Notes in Computer Science, vol. 2734, pp. 35–49, 2003.
 Y. Freund and L. Mason, “The alternating decision tree learning algorithm,” 16th In-
ternational Conference on Machine Learning, pp. 124–133, 1999.
 M. L. Zhang and Z. H. Zhou, “Multilabel neural networks with applications to func-
tional genomics and text categorization,” IEEE Transactions on Knowledge and Data
Engineering, vol. 18, no. 10, pp. 1338–1351, 2006.
 M. L. Zhang, “ML-RBF: RBF neural networks for multi-label learning,” Neural Pro-
cessing Letters, vol. 29, no. 2, pp. 61–74, 2009.
 M. L. Zhang and Z. H. Zhou, “ML-kNN: A lazy learning approach to multi-label learn-
ing,” Pattern Recognition, vol. 40, no. 7, pp. 2038–2048, 2007.
 M. L. Zhang, J. M. Peña, and V. Robles, “Feature selection for multi-label naive bayes
classiﬁcation,” Information Sciences, vol. 179, no. 19, pp. 3218–3229, 2009.
 M. Jeon, H. Park, and J. B. Rosen, “Dimension reduction based on centroids and least
squares for eﬃcient processing of text data,” Technical Report MN TR 01-010, Univ. of
Minnesota, Minneapolis, 2003.
 P. Howland and H. Park, “Generalizing discriminant analysis using the generalized sin-
gular value decomposition,” IEEE Transactions on Pattern Analysis and Machine In-
telligence, vol. 26, pp. 995–1006, 2004.
 S. Diplaris, G. Tsoumakas, P. Mitkas, and I. Vlahavas, “Protein classiﬁcation with
multiple algorithms,” Panhellenic Conference on Informatics, vol. 3746, pp. 448–456, 2005.
 T. Goncalves and P. Quaresma, “A preliminary approach to the multilabel classification problem of Portuguese juridical documents,” in 11th Portuguese Conference on Artificial Intelligence, 2003.
 B. Lauser and A. Hotho, “Automatic multi-label subject indexing in a multilingual
environment,” in 7th European Conference in Research and Advanced Technology for
Digital Libraries, 2003.
 T. Li and M. Ogihara, “Detecting emotion in music,” in International Symposium on Music Information Retrieval, 2003.
 A. Clare and R. D. King, “Knowledge discovery in multi-label phenotype data,” in 5th
European Conference on Principles of Data Mining and Knowledge Discovery, 2001.
 D. H. Widyantoro and J. Yen, “A fuzzy similarity approach in text classiﬁcation task,”
IEEE International Conference on Fuzzy Systems, pp. 653–658, 2000.
 R. Saracoğlu, K. Tütüncü, and N. Allahverdi, “A new approach on search for similar
documents with multiple categories using fuzzy clustering,” Expert Systems with Appli-
cations, vol. 34, no. 4, pp. 2545–2554, 2008.
 J. Yen and R. Langari, Fuzzy Logic: Intelligence, Control, and Information. Upper
Saddle River, NJ, USA: Prentice-Hall, 1999.
 J. S. Wang and C. S. G. Lee, “Self-adaptive neurofuzzy inference systems for classiﬁcation
applications,” IEEE Transactions on Fuzzy Systems, vol. 10, no. 6, pp. 790–802, 2002.
 C.-S. Ouyang, W.-J. Lee, and S.-J. Lee, “A TSK-type neuro-fuzzy network approach
to system modeling problems,” IEEE Transactions on Systems, Man, and Cybernetics
Part B: Cybernetics, vol. 35, no. 4, pp. 751–767, 2005.
 C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3,
pp. 273–297, 1995.
 B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regu-
larization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.
 J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge
University Press, Cambridge, UK, 2004.
 J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann,
 S. P. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
 J. MacQueen, “Some methods for classification and analysis of multivariate observations,” Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967.
 L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster
Analysis. John Wiley & Sons, 1990.
 Z. Huang, “Extensions to the k-means algorithm for clustering large data sets with
categorical values,” Data Mining and Knowledge Discovery, vol. 2, pp. 283–304, 1998.
 T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An eﬃcient data clustering method
for very large databases,” ACM-SIGMOD International Conference on Management of
Data, pp. 103–114, 1996.
 S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases,” ACM-SIGMOD International Conference on Management of Data, pp. 73–84, 1998.
 M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering
clusters in large spatial databases,” International Conference on Knowledge Discovery
and Data Mining, pp. 226–231, 1996.
 A. Hinneburg and D. A. Keim, “An eﬃcient approach to clustering in large multime-
dia databases with noise,” International Conference on Knowledge Discovery and Data
Mining, pp. 58–65, 1998.
 S.-J. Lee and C.-S. Ouyang, “A neuro-fuzzy system modeling with self-constructing
rule generation and hybrid SVD-based learning,” IEEE Transactions on Fuzzy Systems,
vol. 11, no. 3, pp. 341–353, 2003.
 W. Wang, J. Yang, and R. Muntz, “STING: A statistical information grid approach to
spatial data mining,” International Conference on Very Large Data Bases, pp. 186–195, 1997.
 G. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: A multi-resolution clustering approach for very large spatial databases,” International Conference on Very
Large Data Bases, pp. 428–439, 1998.
 S. L. Lauritzen, “The EM algorithm for graphical association models with missing data,”
Computational Statistics and Data Analysis, vol. 19, pp. 191–201, 1995.
 G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD, USA: The
Johns Hopkins University Press, 1996.
 D. D. Lewis, Y. Yang, T. Rose, and F. Li, “RCV1: A new benchmark collection for text
categorization research,” Journal of Machine Learning Research, vol. 5, pp. 361–397, 2004.
 “The Cadê web directory,” http://www.cade.com.br/.
 C. C. Chang and C. J. Lin, “LIBSVM: A library for support vector machines,” software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
 Y. Yang and X. Liu, “A re-examination of text categorization methods,” in ACM SIGIR
Conference, 1999, pp. 42–49.
 G. Tsoumakas, I. Katakis, and I. Vlahavas, “Mining multi-label data,” in Data Mining and Knowledge Discovery Handbook (draft of preliminarily accepted chapter), 2009.
 K. Nigam, A. K. McCallum, S. Thrun, and T. M. Mitchell, “Learning to classify text
from labeled and unlabeled documents,” Proceedings of the 15th National Conference on
Artiﬁcial Intelligence, 1998.
 J. Pestian, C. Brew, P. Matykiewicz, D. Hovermale, N. Johnson, K. B. Cohen, and W. Duch, “A shared task involving multi-label classification of clinical free text,” BioNLP 2007: Biological, Translational, and Clinical Language Processing, pp. 97–104, 2007.
 N. Ueda and K. Saito, “Parametric mixture models for multi-labeled text,” Advances in Neural Information Processing Systems 15, MIT Press, Cambridge, MA, 2003.