Distributed decentralized collaborative monitoring architecture for cloud infrastructures

Xiaolong Xu, Yun Chen, Jose M. Alcaraz Calero

Research output: Contribution to journal (Article)

Abstract

Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of the monitoring workload. These drawbacks make it unsuitable for large-scale cloud infrastructures. To tackle this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organizing capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments were carried out on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems while contributing to real-time monitoring and data consistency within the monitoring architecture.
Original language: English
Pages (from-to): 2451-2463
Journal: Cluster Computing
Volume: 20
Issue number: 3
Early online date: 10 Nov 2016
DOI: 10.1007/s10586-016-0675-5
Publication status: Published - 1 Sep 2017

Fingerprint

Monitoring
Cloud computing
Availability
Recovery

Keywords

  • Cloud computing
  • Resource monitoring
  • Collaborative monitoring
  • Adaptive threshold control

Cite this

@article{5748d5355a8645c5a85494f269ae097c,
title = "Distributed decentralized collaborative monitoring architecture for cloud infrastructures",
abstract = "Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of the monitoring workload. These drawbacks make it unsuitable for large-scale cloud infrastructures. To tackle this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organizing capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments were carried out on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems while contributing to real-time monitoring and data consistency within the monitoring architecture.",
keywords = "Cloud computing, Resource monitoring, Collaborative monitoring, Adaptive threshold control",
author = "Xiaolong Xu and Yun Chen and {Alcaraz Calero}, {Jose M.}",
year = "2017",
month = "9",
day = "1",
doi = "10.1007/s10586-016-0675-5",
language = "English",
volume = "20",
pages = "2451--2463",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Springer",
number = "3",
}

Distributed decentralized collaborative monitoring architecture for cloud infrastructures. / Xu, Xiaolong; Chen, Yun; Alcaraz Calero, Jose M.

In: Cluster Computing, Vol. 20, No. 3, 01.09.2017, p. 2451-2463.


TY - JOUR
T1 - Distributed decentralized collaborative monitoring architecture for cloud infrastructures
AU - Xu, Xiaolong
AU - Chen, Yun
AU - Alcaraz Calero, Jose M.
PY - 2017/9/1
Y1 - 2017/9/1
N2 - Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of the monitoring workload. These drawbacks make it unsuitable for large-scale cloud infrastructures. To tackle this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organizing capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments were carried out on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems while contributing to real-time monitoring and data consistency within the monitoring architecture.
AB - Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of the monitoring workload. These drawbacks make it unsuitable for large-scale cloud infrastructures. To tackle this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organizing capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments were carried out on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems while contributing to real-time monitoring and data consistency within the monitoring architecture.
KW - Cloud computing
KW - Resource monitoring
KW - Collaborative monitoring
KW - Adaptive threshold control
U2 - 10.1007/s10586-016-0675-5
DO - 10.1007/s10586-016-0675-5
M3 - Article
VL - 20
SP - 2451
EP - 2463
JO - Cluster Computing
JF - Cluster Computing
SN - 1386-7857
IS - 3
ER -