Automated Feature Construction for Classification of Time Ordered Data Sequences

Michael Schaidnagel, Thomas Connolly, Fritz Laux

Research output: Contribution to journal › Article

Abstract

Recent years, and especially the Internet, have changed the ways in which data is stored. It is now common to store data in the form of transactions, together with their creation time-stamps. These transactions can often be attributed to logical units, e.g., all transactions that belong to one customer. These groups, which we refer to as data sequences, have a more complex structure than tuple-based data, which makes it more difficult to find discriminatory patterns for classification purposes. However, the complex structure potentially enables us to track behaviour and how it changes over time. This is of particular interest in e-commerce, where classifying a sequence of customer actions is still a challenging task for data miners. Before standard algorithms such as Decision Trees, Neural Nets, Naive Bayes or Bayesian Belief Networks can be applied to sequential data, preparation is required to capture the information stored within the sequences. This work therefore presents a systematic approach for revealing sequence patterns in data and for constructing powerful features from the primitive sequence attributes. This is achieved by sequence aggregation and by incorporating the time dimension into the feature construction step. The proposed algorithm is described in detail and applied to a real-life data set, demonstrating its ability to boost the classification performance of well-known data mining algorithms on binary classification tasks.
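The following Python sketch illustrates the general idea of sequence aggregation with a time dimension. It is not the authors' algorithm, which is specified in the paper itself. It groups time-stamped transactions by a logical unit such as a customer, orders each group by time, and collapses the group into one fixed-length feature row that a standard classifier could consume. The `Transaction` record, its `amount` field, the `aggregate_sequences` function and the particular aggregates are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Transaction:
    sequence_id: str     # logical unit, e.g. a customer ID (hypothetical field)
    timestamp: datetime  # creation time-stamp of the transaction
    amount: float        # a primitive numeric attribute (hypothetical field)


def aggregate_sequences(transactions):
    """Collapse each time-ordered sequence into one fixed-length feature row (illustrative only)."""
    groups = defaultdict(list)
    for t in transactions:
        groups[t.sequence_id].append(t)

    features = {}
    for seq_id, items in groups.items():
        items.sort(key=lambda t: t.timestamp)  # enforce time order within the sequence
        amounts = [t.amount for t in items]
        times = [t.timestamp for t in items]
        # Gaps between consecutive transactions, in seconds
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[seq_id] = {
            "n_transactions": len(items),
            "total_amount": sum(amounts),
            "mean_amount": mean(amounts),
            "duration_s": (times[-1] - times[0]).total_seconds(),
            "mean_gap_s": mean(gaps) if gaps else 0.0,
            # A crude "change over time" feature: last value minus first value
            "amount_change": amounts[-1] - amounts[0],
        }
    return features


# Usage with made-up transactions
txs = [
    Transaction("c1", datetime(2014, 1, 1, 10, 0), 20.0),
    Transaction("c1", datetime(2014, 1, 1, 10, 30), 50.0),
    Transaction("c2", datetime(2014, 1, 2, 9, 0), 5.0),
]
rows = aggregate_sequences(txs)
print(rows["c1"]["duration_s"])  # 1800.0
```

Each resulting row has the same columns regardless of sequence length. This is what lets tuple-based learners such as decision trees or Naive Bayes work on sequential data.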
Original language: English
Pages (from-to): 632-641
Number of pages: 10
Journal: International Journal On Advances in Software
Volume: 7
Issue number: 3 and 4
Publication status: Published - 2014

Keywords

  • Feature construction
  • sequential data
  • temporal data mining

Cite this

Schaidnagel, Michael; Connolly, Thomas; Laux, Fritz. / Automated Feature Construction for Classification of Time Ordered Data Sequences. In: International Journal On Advances in Software. 2014; Vol. 7, No. 3 and 4. pp. 632-641.
@article{8fb3ed16ec1a4ef7b830b6529552b361,
title = "Automated Feature Construction for Classification of Time Ordered Data Sequences",
keywords = "Feature construction, sequential data, temporal data mining",
author = "Michael Schaidnagel and Thomas Connolly and Fritz Laux",
year = "2014",
language = "English",
volume = "7",
pages = "632--641",
journal = "International Journal On Advances in Software",
issn = "1942-2628",
publisher = "International Academy, Research, and Industry Association",
number = "3 and 4",

}

Automated Feature Construction for Classification of Time Ordered Data Sequences. / Schaidnagel, Michael; Connolly, Thomas; Laux, Fritz.

In: International Journal On Advances in Software, Vol. 7, No. 3 and 4, 2014, pp. 632-641.


TY - JOUR

T1 - Automated Feature Construction for Classification of Time Ordered Data Sequences

AU - Schaidnagel, Michael

AU - Connolly, Thomas

AU - Laux, Fritz

PY - 2014

Y1 - 2014

KW - Feature construction

KW - sequential data

KW - temporal data mining

M3 - Article

VL - 7

SP - 632

EP - 641

JO - International Journal On Advances in Software

JF - International Journal On Advances in Software

SN - 1942-2628

IS - 3 and 4

ER -