Abstract
Text documents are usually unstructured and written in natural language, so a preprocessing step is indispensable before conventional data mining techniques can be applied to them. In this paper, we introduce PRETO, a cross-platform, powerful, and scalable tool developed specifically for preprocessing Turkish texts. It offers a wide range of preprocessing options, including stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections.
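To make the preprocessing options named in the abstract more concrete, here is a minimal Python sketch of three of them: stopword filtering, a simple statistical term filter based on document frequency, and n-gram generation. It is not PRETO's code or API. The function names, the tiny stopword list, and the `min_df` threshold are all assumptions chosen for illustration. Stemming is left out because Turkish stemming requires morphological analysis that does not fit in a short example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not PRETO's actual list).
STOPWORDS = {"ve", "bir", "bu", "de", "da", "için", "ile"}


def tokenize(text):
    """Lowercase the text with Turkish casing rules and split it into word tokens."""
    # Map the dotted and dotless capital I explicitly; str.lower() alone
    # would turn "I" into "i" instead of the Turkish "ı".
    text = text.replace("İ", "i").replace("I", "ı").lower()
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text)


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def filter_by_document_frequency(docs, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return [[t for t in doc if df[t] >= min_df] for doc in docs]


def ngrams(tokens, n=2):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical two-document collection used only to exercise the functions.
docs = [
    "Bu bir deneme metni ve kısa bir örnek.",
    "Kısa metin madenciliği için bir deneme.",
]
tokenized = [remove_stopwords(tokenize(d)) for d in docs]
filtered = filter_by_document_frequency(tokenized, min_df=2)
bigrams = [ngrams(doc, n=2) for doc in filtered]
print(filtered)
print(bigrams)
```

Document frequency is only one possible statistical criterion. The abstract does not say which statistics PRETO uses, so the threshold here should be read as a placeholder.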
| Original language | English |
|---|---|
| Title of host publication | CompSysTech '12 |
| Subtitle of host publication | Proceedings of the 13th International Conference on Computer Systems and Technologies |
| Editors | Boris Rachev, Angel Smrikarov |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery |
| Pages | 134-140 |
| Number of pages | 7 |
| ISBN (Electronic) | 9781450311939 |
| Publication status | Published - 22 Jun 2012 |
| Externally published | Yes |
| Event | 13th International Conference on Computer Systems and Technologies, Ruse, Bulgaria. Duration: 22 Jun 2012 → 23 Jun 2012. Conference number: 13 |
Conference
| Conference | 13th International Conference on Computer Systems and Technologies |
|---|---|
| Abbreviated title | CompSysTech 2012 |
| Country/Territory | Bulgaria |
| City | Ruse |
| Period | 22/06/12 → 23/06/12 |
Paper title: PRETO: a high-performance text mining tool for preprocessing Turkish texts