Türkçe metinlerin kümelenmesinde farklı kök bulma yöntemlerinin etkisinin araştırılması

Translated title of the contribution: Examining the impact of different stemming methods on clustering Turkish texts

Volkan Tunali*, Turgay Tugay Bilgin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

To apply data mining techniques to texts written in natural language, a preprocessing step is used to transform the unstructured data into a structured format. Stemming is an important text preprocessing technique. In this study, we empirically examined the impact of essentially three stemming methods on clustering Turkish texts: Zemberek, Affix Stripping, and Fixed Prefix. Although all of these methods yielded very similar clustering quality, the Zemberek and Fixed Prefix 5 methods produced better results than the others. Beyond clustering quality, the Zemberek and Fixed Prefix 5 methods are preferable for Turkish text clustering applications because of the high dimensionality reduction rate they provide.
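As an illustration of the simplest of the three approaches, the following minimal sketch shows fixed-prefix stemming and how it shrinks a vocabulary. It assumes that "Fixed Prefix 5" means keeping the first five characters of each word, which the abstract does not state explicitly. The function names and the toy token list are hypothetical and are not taken from the study.

```python
def fixed_prefix_stem(word: str, n: int = 5) -> str:
    """Return the first n characters of a word (the whole word if it is shorter).

    Assumption: n = 5 corresponds to the "Fixed Prefix 5" setting.
    """
    return word[:n]


def vocabulary(tokens, stem=None):
    """Collect the set of distinct terms, optionally after stemming."""
    return {stem(t) if stem else t for t in tokens}


# Hypothetical toy tokens (already lowercased), not data from the study.
tokens = ["kitap", "kitaplar", "kitaplarda", "kitapçı",
          "okul", "okullar", "okulda"]

raw_vocab = vocabulary(tokens)
stemmed_vocab = vocabulary(tokens, fixed_prefix_stem)

# Dimensionality reduction rate: share of distinct terms removed by stemming.
reduction = 1 - len(stemmed_vocab) / len(raw_vocab)

print(sorted(stemmed_vocab))
print(f"{len(raw_vocab)} terms -> {len(stemmed_vocab)} terms "
      f"({reduction:.0%} reduction)")
```

In this toy example every form of "kitap" collapses to one term, while forms of the shorter root "okul" do not, which shows why the choice of prefix length matters for a fixed-prefix stemmer.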
Original language: Turkish
Title of host publication: ELECO'2012 Elektrik - Elektronik ve Bilgisayar Mühendisliği Sempozyumu
Pages: 598-602
Number of pages: 5
Publication status: Published - 29 Nov 2012
Externally published: Yes
Event: ELECO'2012 Electric - Electronic and Computer Engineering Symposium - Bursa, Turkey
Duration: 29 Nov 2012 to 1 Dec 2012

Conference

Conference: ELECO'2012 Electric - Electronic and Computer Engineering Symposium
Abbreviated title: ELECO'2012
Country/Territory: Turkey
City: Bursa
Period: 29/11/12 to 1/12/12

Keywords

  • text mining
  • stemming
  • document clustering
  • natural language processing
