Abstract

Internet-of-Things (IoT) devices produce and transmit enormous amounts of data. Extracting valuable information from this volume of data has become an important part of business and research. However, extracting information from this data without privacy protection puts individuals at risk. Data has to be sanitized before use, and anonymization provides a solution to this problem. Since IoT is a collection of numerous different devices, the data streams of these devices tend to vary over time, creating a varied data stream. However, traditional data stream anonymization approaches only provide privacy protection for data streams with predefined and fixed attributes. Therefore, conventional methods cannot work directly on varied data streams. In this work, we propose K-VARP (K-anonymity for VARied data stream via Partitioning) to publish varied data streams. K-VARP reads tuples and assigns them to partitions using their description, and all tuples must be anonymized before expiring. It tries to anonymize an expiring tuple within its partition if that partition is eligible to produce a k-anonymous cluster. Otherwise, partition merging is applied. In K-VARP we propose a new merging criterion called R-likeness to measure the similarity distance between a tuple and partitions. Moreover, flexible re-using and imputation-free publication are employed in K-VARP to achieve better anonymization quality and performance. Our experiment on a real dataset shows that K-VARP is efficient and effective compared to existing algorithms. K-VARP showed approximately twenty percent less information loss, while forming a similar number of clusters within comparable computation time.
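The partition-then-merge flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a tuple's "description" is simply the set of attributes it carries, that a partition is "eligible" when it holds at least k tuples, and it uses Jaccard similarity as a stand-in for R-likeness, which the abstract names but does not define. All function names and the value of `K` are hypothetical.

```python
from collections import defaultdict

K = 3  # hypothetical anonymity parameter


def describe(record):
    # Assumption: a tuple's "description" is the set of attributes it carries.
    return frozenset(record)


def assign(partitions, record):
    # Tuples with the same description land in the same partition.
    partitions[describe(record)].append(record)


def jaccard(a, b):
    # Stand-in similarity between two descriptions; NOT the paper's R-likeness.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def candidates_for(partitions, desc, k=K):
    """Collect at least k tuples around partition `desc`.

    If the partition alone is too small, merge in the most similar
    partitions first. Returns None when fewer than k tuples exist overall.
    """
    pool = list(partitions[desc])
    others = sorted(
        (d for d in partitions if d != desc),
        key=lambda d: jaccard(desc, d),
        reverse=True,
    )
    for d in others:
        if len(pool) >= k:
            break
        pool.extend(partitions[d])
    return pool if len(pool) >= k else None
```

A partition of three `{temp, hum}` readings would satisfy k = 3 on its own, whereas a lone `{temp}` reading would first be merged with the `{temp, hum}` partition because the two descriptions overlap. Generalizing the collected tuples into a published cluster is left out of the sketch.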
Original language: English
Pages (from-to): 238-255
Number of pages: 18
Journal: Information Sciences
Volume: 467
Early online date: 3 Aug 2018
DOIs: 10.1016/j.ins.2018.07.057
Publication status: Published - 31 Oct 2018

Keywords

  • Internet of Things (IoT)
  • Data privacy
  • Data streams
  • Anonymization
  • Missing values

Cite this

@article{1af8e6fd80ac453e9d63b116c4156433,
title = "K-VARP: K-anonymity for varied data streams via partitioning",
abstract = "Internet-of-Things (IoT) devices produce and transmit enormous amounts of data. Extracting valuable information from this volume of data has become an important part of business and research. However, extracting information from this data without privacy protection puts individuals at risk. Data has to be sanitized before use, and anonymization provides a solution to this problem. Since IoT is a collection of numerous different devices, the data streams of these devices tend to vary over time, creating a varied data stream. However, traditional data stream anonymization approaches only provide privacy protection for data streams with predefined and fixed attributes. Therefore, conventional methods cannot work directly on varied data streams. In this work, we propose K-VARP (K-anonymity for VARied data stream via Partitioning) to publish varied data streams. K-VARP reads tuples and assigns them to partitions using their description, and all tuples must be anonymized before expiring. It tries to anonymize an expiring tuple within its partition if that partition is eligible to produce a k-anonymous cluster. Otherwise, partition merging is applied. In K-VARP we propose a new merging criterion called R-likeness to measure the similarity distance between a tuple and partitions. Moreover, flexible re-using and imputation-free publication are employed in K-VARP to achieve better anonymization quality and performance. Our experiment on a real dataset shows that K-VARP is efficient and effective compared to existing algorithms. K-VARP showed approximately twenty percent less information loss, while forming a similar number of clusters within comparable computation time.",
keywords = "Internet of Things (IoT), Data privacy, Data streams, Anonymization, Missing values",
author = "Ankhbayar Otgonbayar and Zeeshan Pervez and Keshav Dahal and Steve Eager",
year = "2018",
month = "10",
day = "31",
doi = "10.1016/j.ins.2018.07.057",
language = "English",
volume = "467",
pages = "238--255",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - K-VARP

T2 - K-anonymity for varied data streams via partitioning

AU - Otgonbayar, Ankhbayar

AU - Pervez, Zeeshan

AU - Dahal, Keshav

AU - Eager, Steve

PY - 2018/10/31

Y1 - 2018/10/31

N2 - Internet-of-Things (IoT) devices produce and transmit enormous amounts of data. Extracting valuable information from this volume of data has become an important part of business and research. However, extracting information from this data without privacy protection puts individuals at risk. Data has to be sanitized before use, and anonymization provides a solution to this problem. Since IoT is a collection of numerous different devices, the data streams of these devices tend to vary over time, creating a varied data stream. However, traditional data stream anonymization approaches only provide privacy protection for data streams with predefined and fixed attributes. Therefore, conventional methods cannot work directly on varied data streams. In this work, we propose K-VARP (K-anonymity for VARied data stream via Partitioning) to publish varied data streams. K-VARP reads tuples and assigns them to partitions using their description, and all tuples must be anonymized before expiring. It tries to anonymize an expiring tuple within its partition if that partition is eligible to produce a k-anonymous cluster. Otherwise, partition merging is applied. In K-VARP we propose a new merging criterion called R-likeness to measure the similarity distance between a tuple and partitions. Moreover, flexible re-using and imputation-free publication are employed in K-VARP to achieve better anonymization quality and performance. Our experiment on a real dataset shows that K-VARP is efficient and effective compared to existing algorithms. K-VARP showed approximately twenty percent less information loss, while forming a similar number of clusters within comparable computation time.

AB - Internet-of-Things (IoT) devices produce and transmit enormous amounts of data. Extracting valuable information from this volume of data has become an important part of business and research. However, extracting information from this data without privacy protection puts individuals at risk. Data has to be sanitized before use, and anonymization provides a solution to this problem. Since IoT is a collection of numerous different devices, the data streams of these devices tend to vary over time, creating a varied data stream. However, traditional data stream anonymization approaches only provide privacy protection for data streams with predefined and fixed attributes. Therefore, conventional methods cannot work directly on varied data streams. In this work, we propose K-VARP (K-anonymity for VARied data stream via Partitioning) to publish varied data streams. K-VARP reads tuples and assigns them to partitions using their description, and all tuples must be anonymized before expiring. It tries to anonymize an expiring tuple within its partition if that partition is eligible to produce a k-anonymous cluster. Otherwise, partition merging is applied. In K-VARP we propose a new merging criterion called R-likeness to measure the similarity distance between a tuple and partitions. Moreover, flexible re-using and imputation-free publication are employed in K-VARP to achieve better anonymization quality and performance. Our experiment on a real dataset shows that K-VARP is efficient and effective compared to existing algorithms. K-VARP showed approximately twenty percent less information loss, while forming a similar number of clusters within comparable computation time.

KW - Internet of Things (IoT)

KW - Data privacy

KW - Data streams

KW - Anonymization

KW - Missing values

U2 - 10.1016/j.ins.2018.07.057

DO - 10.1016/j.ins.2018.07.057

M3 - Article

VL - 467

SP - 238

EP - 255

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -