An empirical comparison of fast and efficient tools for mining textual data

Volkan Tunali*, A. Yılmaz Çamurcu, T. Tugay Bilgin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To effectively manage and retrieve the information contained in vast amounts of text documents, powerful text mining tools and techniques are essential. In this paper, we evaluate and compare two state-of-the-art data mining tools for clustering high-dimensional text data, Cluto and Gmeans. Several experiments were conducted on three benchmark datasets, and the results are analysed in terms of clustering quality, memory and CPU time consumption. We empirically show that Gmeans offers high scalability at the cost of clustering quality, while Cluto delivers better clustering quality at the expense of memory and CPU time.
Original language: English
Title of host publication: 1st International Symposium on Computing in Science & Engineering Proceedings
Place of Publication: Izmir, Turkey
Publisher: Gediz University Publications
Pages: 141-147
Number of pages: 7
ISBN (Electronic): 9786056139413
Publication status: Published - 3 Jun 2010
Externally published: Yes
Event: 1st International Symposium on Computing in Science & Engineering - Kusadasi, Aydin, Turkey
Duration: 3 Jun 2010 to 5 Jun 2010

Conference

Conference: 1st International Symposium on Computing in Science & Engineering
Abbreviated title: ISCSE
Country/Territory: Turkey
City: Aydin
Period: 3/06/10 to 5/06/10

Keywords

  • text mining
  • document clustering
  • spherical k-means
  • bisecting k-means
