Examining the impact of stemming on clustering Turkish texts

Volkan Tunali*, Turgay Tugay Bilgin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Citations (Scopus)

Abstract

Preprocessing is an important step in information retrieval and text mining. In this study, we examined the impact of stemming on clustering Turkish texts. We performed extensive experiments on two datasets compiled from the websites of Turkish news agencies. Our experiments show no significant evidence that stemming always improves clustering quality for Turkish texts. However, stemming dramatically reduces the dimensionality of the document-term matrix without adversely affecting clustering performance. We therefore highly recommend applying stemming when clustering Turkish texts.
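The dimensionality effect described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not name the stemmer, the datasets, or the clustering algorithm used in the study, so `toy_stem`, `TOY_SUFFIXES`, and the three sample documents are hypothetical stand-ins, and the suffix stripper is far too crude to count as a real Turkish stemmer. The sketch only shows how mapping inflected forms to a shared stem reduces the number of columns in a document-term matrix.

```python
from typing import Callable, List

# Hypothetical toy suffix list (plural and plural+genitive endings only).
# Illustration only: NOT a real Turkish stemmer and not the one used in the study.
TOY_SUFFIXES = ("lerin", "ların", "ler", "lar")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least two characters."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def identity(token: str) -> str:
    """No normalization: every surface form is its own term."""
    return token


def build_vocabulary(docs: List[str], normalize: Callable[[str], str]) -> List[str]:
    """Collect the sorted set of normalized terms across all documents."""
    return sorted({normalize(tok) for doc in docs for tok in doc.split()})


def document_term_matrix(
    docs: List[str], vocab: List[str], normalize: Callable[[str], str]
) -> List[List[int]]:
    """Build a dense count matrix: one row per document, one column per term."""
    column = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc.split():
            matrix[i][column[normalize(tok)]] += 1
    return matrix


# Hypothetical sample documents built from inflected forms of two nouns.
docs = [
    "haber haberler haberlerin",
    "kitap kitaplar kitapların",
    "haberler kitaplar",
]

raw_vocab = build_vocabulary(docs, identity)
stemmed_vocab = build_vocabulary(docs, toy_stem)
raw_matrix = document_term_matrix(docs, raw_vocab, identity)
stemmed_matrix = document_term_matrix(docs, stemmed_vocab, toy_stem)

print("columns without stemming:", len(raw_vocab))
print("columns with stemming:   ", len(stemmed_vocab))
```

On this toy input the vocabulary shrinks from six terms to two while each document keeps the same total token count, which is the kind of reduction the abstract refers to. Whether clustering quality is preserved on real data is an empirical question that this sketch does not address.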
Original language: English
Title of host publication: 2012 International Symposium on Innovations in Intelligent Systems and Applications
Publisher: IEEE
Number of pages: 4
ISBN (Electronic): 9781467314480
ISBN (Print): 9781467314466
Publication status: Published - 23 Jul 2012
Externally published: Yes
Event: 2012 International Symposium on Innovations in Intelligent Systems and Applications - Trabzon, Turkey
Duration: 2 Jul 2012 to 4 Jul 2012

Conference

Conference: 2012 International Symposium on Innovations in Intelligent Systems and Applications
Abbreviated title: INISTA
Country/Territory: Turkey
City: Trabzon
Period: 2/07/12 to 4/07/12

Keywords

  • data mining
  • text mining
  • document clustering
  • preprocessing
  • stemming
