PRETO: a high-performance text mining tool for preprocessing Turkish texts

Volkan Tunali*, Turgay Tugay Bilgin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

21 Citations (Scopus)

Abstract

Text documents are usually unstructured and written in natural language. To apply conventional data mining techniques to text documents, a preprocessing step is indispensable. In this paper, we introduce PRETO, a cross-platform, powerful, and scalable tool developed specifically for preprocessing Turkish texts, with a wide range of preprocessing options such as stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections.
Original language: English
Title of host publication: CompSysTech 12
Subtitle of host publication: Proceedings of the 13th International Conference on Computer Systems and Technologies
Editors: Boris Rachev, Angel Smrikarov
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 134-140
Number of pages: 7
ISBN (Electronic): 9781450311939
Publication status: Published - 22 Jun 2012
Externally published: Yes
Event: 13th International Conference on Computer Systems and Technologies - Ruse, Bulgaria
Duration: 22 Jun 2012 to 23 Jun 2012
Conference number: 13

Conference

Conference: 13th International Conference on Computer Systems and Technologies
Abbreviated title: CompSysTech 2012
Country/Territory: Bulgaria
City: Ruse
Period: 22/06/12 to 23/06/12
