Design of a Performance Measurement Framework for Cloud Computing

Abstract

Cloud Computing is an emerging technology for processing and storing very large amounts of data. Sometimes, anomalies and defects affect part of the cloud infrastructure, resulting in degraded cloud performance. This paper proposes a performance measurement framework for Cloud Computing systems, which integrates software quality concepts from ISO 25010.


1. Introduction

Cloud Computing (CC) is an emerging technology aimed at processing and storing very large amounts of data. It is an Internet-based technology in which several distributed computers work together to process large amounts of information efficiently, while ensuring that query results are delivered rapidly to users. Some CC users prefer not to own the physical infrastructure they use: instead, they rent cloud infrastructure, or a cloud platform or software, from a third-party provider. These infrastructure, platform, and software options delivered as a service are known as Cloud Services [1].

One of the most important challenges in delivering Cloud Services is to ensure that they are fault tolerant. Failures and anomalies can degrade these services, impacting their quality and even their availability. According to Coulouris [2], a failure occurs in distributed systems (DS), like CC systems (CCS), when a process or a communication channel departs from what is considered to be its normal or desired behavior. CCS include all the technical resources a cloud uses to process information, such as software, hardware, and network elements. An anomaly is different, in that it slows down part of a CCS without making it fail completely, impacting the performance of tasks within nodes and, consequently, of the system itself.

A performance measurement framework (PMF) for CCS should provide a means to identify and quantify “normal cluster behavior”, which can serve as a baseline for detecting possible anomalies in the computers (i.e. the nodes in a cluster) that may impact cloud performance. To achieve this goal, methods are needed to collect the base measures specific to CCS performance, and analysis models must be designed to determine the relationships that exist among these measures.
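As an illustration only, the following minimal sketch shows what collecting such base measures on a node could look like; the choice of measures and of the psutil library is our own assumption, not part of the proposed framework:

```python
# Illustrative sketch (not part of the framework): sample node-level base
# measures from which a "normal behavior" baseline could be built.
import time
import psutil  # third-party package: pip install psutil

def sample_base_measures():
    """Collect one observation of node-level base measures."""
    net = psutil.net_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),   # CPU utilization (%)
        "mem_percent": psutil.virtual_memory().percent,  # memory utilization (%)
        "bytes_sent": net.bytes_sent,                    # cumulative network I/O
        "bytes_recv": net.bytes_recv,
    }

# A baseline is simply a history of such observations, gathered while the
# cluster is known to be behaving normally.
baseline = [sample_base_measures() for _ in range(5)]
```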

The ISO International Vocabulary of Metrology (VIM) [3] defines a measurement method as a generic description of a logical organization of operations used in measurement, and an analysis model as an algorithm or calculation combining one or more measures obtained from a measurement method to produce evaluations or estimates relevant to the information needed for decision making.

The purpose of a measurement process, as described in ISO 15939 [4], is to collect, analyze, and report data relating to the products developed and processes implemented within the organizational unit, to support effective management of the process, and to objectively demonstrate the quality of the products.

ISO 15939 [4] defines four sequential activities: establish and sustain measurement commitment, plan the measurement process, perform the measurement process, and evaluate the measurement. These activities are performed in an iterative cycle that allows for continuous feedback and improvement of the measurement process, as shown in Figure 1.

This work presents a PMF in which the first two activities recommended by the ISO 15939 measurement process are developed: 1) establish measurement commitment; and 2) plan the measurement process. This framework defines the requirements for CC performance measurement, the type of data to be collected, and the criteria for evaluating the resulting information. In future work, the design of a measurement method and a performance measurement model for CCS will be developed.

Figure 1. Sequence of activities in a measurement process (adapted from the ISO 15939 measurement process model [4]).

This paper is structured as follows. Section 2 presents related work on performance measurement for computer-based systems. Section 3 establishes the performance context for CC by defining the basic concepts of performance and giving an overview of the elements involved in the measurement process. Section 4 presents the design of the proposed PMF for CCS using COSMIC concepts; in addition, it introduces key performance-related terms from international standards, which are used to further detail the proposed PMF. Finally, Section 5 summarizes the contributions of this research and suggests future work.

2. Related Work

2.1. Performance Measurement Approaches for Computer Systems

The measurement of computer-based system (CBS) performance has been investigated in the computer science literature from the following viewpoints: load balancing, network intrusion detection, and host state maintenance. For example, Burgess [5] defines system performance as “normal behavior”, and proposes that this behavior can only be determined by learning about past events and by modeling future behavior using statistics from the past and observing present behavior. According to Burgess, modern computing systems are complex: they are composed of many interacting subsystems, which makes their collective behavior intricate and, at the same time, influences the performance of the whole system.

Other authors have tried to predict the performance of complex systems (computer clusters, for example) by simulating cluster behavior using a virtual environment. For instance, Rao [6] estimates the variation of cluster performance through changes in task size, as well as the time taken to solve a particular problem. He has also built a predictive model using regression analysis to investigate the behavior of the system and predict the performance of the cluster.
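To make the regression idea concrete, here is a minimal sketch in the spirit of such predictive models; the data points and the single-predictor linear form are invented for illustration and do not reproduce Rao's model:

```python
# Minimal sketch of a regression-based cluster performance predictor,
# in the spirit of (but not reproducing) Rao's approach [6]. Data invented.
import numpy as np

task_sizes = np.array([100, 200, 400, 800, 1600])  # e.g. input size in MB
times = np.array([2.1, 4.0, 8.3, 16.9, 33.5])      # observed completion (s)

# Fit completion_time ~ a * task_size + b by least squares.
a, b = np.polyfit(task_sizes, times, deg=1)

def predict_time(task_size):
    """Extrapolate the completion time for an unseen task size."""
    return a * task_size + b

print(f"predicted time for a 3200 MB task: {predict_time(3200):.1f} s")
```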

Other published approaches have focused on the reliability aspects of large, high-performance computer systems in order to measure system performance. Smith [7] observes that failure occurrence has an impact on both system performance and operational costs. He proposes an automatic mechanism for anomaly detection that aims to identify the root causes of anomalies and faults. Smith [7] has also developed an automatic anomaly detection framework aimed at processing massive volumes of data using a technique based on pattern recognition. In a case study, Smith identifies health-related variables, which are then used for anomaly detection. Each of these variables is related to a system characteristic (such as user utilization, CPU idle time, memory utilization, or I/O volume). Once the measurement data have been collected, he groups them into clusters, within which an outlier detector identifies the nodes that potentially have anomalies. Finally, a list of those possible anomalies is sent to a system administrator, who has the expertise to quickly confirm whether or not an anomaly exists.
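The general idea of flagging outlier nodes from health-related variables can be sketched as follows; this is a deliberately simplified z-score illustration, not Smith's actual clustering and pattern-recognition technique:

```python
# Simplified illustration of outlier-based anomaly flagging over one
# health variable. Smith's framework [7] uses clustering and pattern
# recognition; this sketch only conveys the general idea. Data invented.
from statistics import mean, stdev

# Hypothetical health variable: CPU idle time (%) reported per node.
cpu_idle = {"node1": 62.0, "node2": 59.5, "node3": 61.2,
            "node4": 12.3, "node5": 60.4, "node6": 63.1}

mu = mean(cpu_idle.values())
sigma = stdev(cpu_idle.values())

# Flag nodes that deviate strongly from the cluster mean; an administrator
# would then confirm or dismiss each candidate, as in Smith's case study.
suspects = [node for node, value in cpu_idle.items()
            if sigma > 0 and abs(value - mu) / sigma > 1.5]
print("possible anomalies:", suspects)  # -> ['node4']
```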

Smith’s research presents interesting avenues for the measurement of system performance from various perspectives. Further work is needed to define an integrated model of performance measurement, which would include the perspectives of users, developers, and maintainers.

2.2. Jain’s System Performance Concepts and Sub Concepts

A well-known perspective on system performance measurement is proposed by Jain [8], who maintains that a performance study must first establish a set of performance criteria (or characteristics) to help carry out the system measurement process. He notes that if a system performs a service correctly, its performance is typically measured using three sub concepts: 1) responsiveness, 2) productivity, and 3) utilization, and he proposes a measurement process for each. In addition, Jain notes that for each service request made to a system, there are several possible outcomes, which can be classified into three categories: the system may perform the service correctly or incorrectly, or it may refuse to perform the service altogether. Moreover, he defines three sub concepts associated with these possible outcomes, each of which affects system performance: 1) speed, 2) reliability, and 3) availability. Figure 2 presents the possible outcomes of a service request to a system and the sub concepts associated with them.
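As a worked illustration of how these outcome categories translate into measurable quantities, the sketch below classifies hypothetical service requests and derives simple ratios; the data and the ratio definitions are our own illustrative choices, not formulas prescribed by Jain [8]:

```python
# Illustrative mapping of Jain's three request outcomes onto simple ratio
# measures. Outcome data and ratio definitions are assumptions chosen for
# clarity, not prescriptions from Jain [8].
requests = (
    ["correct"] * 95 +    # service performed correctly
    ["incorrect"] * 3 +   # service performed incorrectly -> reliability
    ["refused"] * 2       # service not performed         -> availability
)

total = len(requests)
errors = requests.count("incorrect")
refusals = requests.count("refused")

reliability = 1 - errors / total     # chance of a correct result
availability = 1 - refusals / total  # chance the service is rendered at all
print(f"reliability={reliability:.2f}, availability={availability:.2f}")
# -> reliability=0.97, availability=0.98
```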

2.3. ISO 25010 Performance Concepts and Sub Concepts

There are several software engineering standards on system and software quality models, such as ISO 25010 [9], which is a revision of the ISO 9126-1 [10] software quality model. The ISO 25010 standard defines software product and computer system quality from two distinct perspectives: 1) a quality in use model, and 2) a product quality model:

1) The quality in use model is composed of five characteristics that relate to the outcome of an interaction when a product is used in a particular context of use. This quality model is applicable to the entire range of use of the human-computer system, including both systems and software.

2) The product quality model is composed of eight characteristics that relate to the static properties of software and the dynamic properties of the computer system.

This product quality model is applicable to both systems and software. According to ISO 25010, the properties of both determine the quality of the product in a particular context, based on user requirements. For example, performance efficiency and reliability can be specific concerns of users who specialize in areas of content delivery, management, or maintenance. The performance efficiency concept proposed in ISO 25010 has three sub concepts: 1) time behavior, 2) resource utilization, and 3) capacity, while the reliability concept has four sub concepts: 1) maturity, 2) availability, 3) fault tolerance, and 4) recoverability. In this research, we have selected performance efficiency and reliability as concepts for determining the performance of CCS. Both Jain’s proposal and the ISO 25010 concepts and sub concepts form the basis of our definition of the performance concept in CC.

3. Definition and Decomposition of the Performance Concept for Cloud Computing

3.1. Definition of the Performance Concept for Cloud Computing

Based on the performance perspectives presented by Jain and the product quality characteristics defined by ISO 25010, we propose the following definition of CCS performance measurement:

Figure 2. Possible outcomes of a service request to a system, according to Jain [8].

“The performance of a Cloud Computing system is determined by analysis of the characteristics involved in performing an efficient and reliable service that meets requirements under stated conditions and within the maximum limits of the system parameters.”

Although at first sight this definition may seem complex, it only includes the sub concepts necessary to carry out CCS performance measurement from three perspectives: 1) users, 2) developers, and 3) maintainers.

Furthermore, a number of sub concepts that could be directly related to the concept of performance have been identified from the literature review (a short sketch following this list suggests how a few of them could be quantified):

• Performance efficiency: The amount of resources used under stated conditions. Resources can include software products, the software and hardware configuration of the system, and materials.

• Time behavior: The degree to which the response and processing times and throughput rates of a product or system, when performing its functions, meet requirements.

• Capacity: The degree to which the maximum limits of a product or system parameter meet requirements.

• Resource utilization: The degree to which the amounts and types of resources used by a product or system when performing its functions meet requirements.

• Reliability: The degree to which a system, product or component performs specified functions under specified conditions for a specified period of time.

• Maturity: The degree to which a system meets needs for reliability under normal operation.

• Availability: The degree to which a system, product or component is operational and accessible when required for use.

• Fault tolerance: The degree to which a system, product, or component operates as intended, in spite of the presence of hardware or software faults; and

• Recoverability: The degree to which a product or system can recover data directly affected in the event of an interruption or a failure and be restored to the desired state.
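As a minimal sketch of how a few of these sub concepts could be quantified, the following functions express them as simple "degree to which requirements are met" ratios; the formulas and all sample values are invented for illustration, since ISO 25010 defines the concepts without prescribing such formulas:

```python
# Hedged sketch: quantifying two of the sub concepts above as simple
# ratios. The formulas and all sample values are illustrative assumptions;
# ISO 25010 [9] defines the concepts but does not prescribe these formulas.

def time_behavior(observed_response_s, required_response_s):
    """Degree to which the response time meets its requirement (capped at 1)."""
    return min(1.0, required_response_s / observed_response_s)

def availability(uptime_h, required_uptime_h):
    """Fraction of the required time the system was operational."""
    return uptime_h / required_uptime_h

print(time_behavior(observed_response_s=0.8, required_response_s=0.5))  # 0.625
print(availability(uptime_h=719.0, required_uptime_h=720.0))            # ~0.9986
```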

3.2. Definition of a Performance Context Diagram for Cloud Computing

Now that the CCS performance measurement concepts and sub concepts have been identified, it is helpful to present a context diagram showing the relationships between the performance sub concepts proposed by ISO 25010 and the performance measurement perspective presented by Jain, as well as the logical sequence in which the sub concepts appear when a performance issue arises in a CCS (see Figure 3).

In this figure, system performance is determined by two main sub concepts: 1) performance efficiency, and 2) reliability. As explained previously, when a CCS receives a service request, there are three possible outcomes (the service is performed correctly, the service is performed incorrectly, or the service cannot be performed). The outcome determines which sub concepts apply to performance measurement. For example, suppose that a CCS ultimately performs a service correctly, but that, during execution, the service fails and is later restored. Although the service was ultimately performed successfully, it is clear that system availability (part of the reliability sub concept) was compromised, and this affected CCS performance.

As illustrated above, CCS performance can be based on two main concepts: 1) performance efficiency, and 2) reliability. Performance efficiency determines the amount of resources used over a period of time, while reliability determines the degree to which a system successfully performs specified functions during the same period. Resources include all CCS elements, such as software applications, the hardware system, and the network system.

4. Design of a Performance Measurement Framework for Cloud Computing

4.1. The COSMIC Measurement Method Model

The ISO 19761 COSMIC v 3.0 Functional Size Measurement Method (FSM) [11] defines an explicit model of software functionality derived from the functional user requirements (FUR). FUR describe the functionality that the software or system is to execute (sometimes also known as system capabilities). According to this method, each FUR is represented by one or more functional processes within the piece of software to which it has been allocated. In turn, each functional process is represented by sub processes, which can be of the data movement type or the data transform type.

Based on this explicit model of functionality, four data movement types are recognized (Entry, Exit, Read, and Write). Figure 4 shows the COSMIC model of generic software (adapted from Figure 12.4, p. 256, of [12]).

Figure 3. Context diagram for Cloud Computing performance measurement.

Figure 4. COSMIC model of generic software (adapted from Figure 12.4 of [12]).

According to the COSMIC model [12], software is delimited by hardware, as shown on the left-hand side of Figure 4: software can be used by a user, an engineered device, or other software through I/O hardware, such as a keyboard, a printer, or a mouse. In addition, as depicted on the right-hand side of Figure 4, software is delimited by persistent storage hardware, like a hard disk. Thus, software functionality can be viewed as a flow of data groups characterized by Entry, Exit, Read, and Write data movements. The Entry and Exit data movements exchange data with the user across the I/O hardware/software boundary, and the Read and Write data movements exchange data between the software and the persistent storage hardware.
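To make the COSMIC model concrete, the sketch below counts the four data movement types for a hypothetical functional process; in COSMIC, each data movement contributes one CFP (COSMIC Function Point), so the functional size is simply the total count. The example process is invented:

```python
# Sketch of COSMIC functional size measurement: each data movement
# (Entry, Exit, Read, Write) of a functional process counts as 1 CFP.
# The "query node status" functional process below is invented.
from collections import Counter

movements = [
    ("Entry", "node_id from user"),         # request crosses the boundary
    ("Read",  "node record from storage"),  # fetch persisted node data
    ("Write", "access log to storage"),     # persist an audit entry
    ("Exit",  "status report to user"),     # response crosses the boundary
]

by_type = Counter(kind for kind, _ in movements)
cfp = sum(by_type.values())  # functional size in CFP

print(dict(by_type))  # {'Entry': 1, 'Read': 1, 'Write': 1, 'Exit': 1}
print(f"functional size: {cfp} CFP")
```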

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] H. Jin, S. Ibrahim, T. Bell, L. Qi, H. Cao, S. Wu and X. Shi, “Tools and Technologies for Building Clouds,” Cloud Computing: Principles, Systems and Applications, Computer Communications and Networks, Springer-Verlag, Berlin, 2010. doi:10.1007/978-1-84996-241-4_1
[2] G. Coulouris, J. Dollimore and T. Kindberg, “Distributed Systems Concepts and Design,” Addison-Wesley, 4th Edition, Pearson Education, Edinburgh, 2005.
[3] ISO/IEC Guide 99-12, “International Vocabulary of Metrology—Basic and General Concepts and Associated Terms, VIM,” International Organization for Standardization ISO/IEC, Geneva, 2007.
[4] ISO/IEC 15939, “Systems and Software Engineering—Measurement Process,” International Organization for Standardization, Geneva, 2007.
[5] M. Burgess, H. Haugerud and S. Straumsnes, “Measuring System Normality,” ACM Transactions on Computer Systems, Vol. 20, No. 2, 2002, pp. 125-160. doi:10.1145/507052.507054
[6] A. Rao, R. Upadhyay, N. Shah, S. Arlekar, J. Raghothamma and S. Rao, “Cluster Performance Forecasting Using Predictive Modeling for Virtual Beowulf Clusters,” In: V. Garg, R. Wattenhofer and K. Kothapalli, Eds., ICDCN 2009, LNCS 5408, Springer-Verlag, Berlin, 2009, pp. 456-461.
[7] D. Smith, Q. Guan and S. Fu, “An Anomaly Detection Framework for Autonomic Management of Compute Cloud Systems,” IEEE 34th Annual IEEE Computer Software and Applications Conference Workshops, Seoul, 19-23 July 2010, pp. 376-381.
[8] R. Jain, “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling,” Wiley-Interscience, New York, 1991.
[9] ISO/IEC 25010:2010(E), “Systems and Software Engineering—Systems and Software Product Quality Requirements and Evaluation (SQuaRE)—System and Software Quality Models,” International Organization for Standardization, Geneva, 2010.
[10] ISO/IEC 9126-1:2001(E), “Software Engineering—Product Quality—Part 1: Quality Model,” International Organization for Standardization, Geneva, 2001.
[11] ISO/IEC-19761, “Software Engineering—COSMIC v 3.0 —A Functional Size Measurement Method,” International Organization for Standardization, Geneva, 2003.
[12] A. Abran, “Software Metrics and Software Metrology,” John Wiley & Sons Interscience and IEEE-CS Press, New York, 2010. doi:10.1002/9780470606834
[13] K. Sarayreh, A. Abran and L. Santillo, “Measurement of Software Requirements Derived from System Reliability Requirements,” Workshop on Advances on Functional Size Measurement and Effort Estimation, 24th European Conference on Object Oriented Programming, Maribor, 20-22 June 2010.
[14] ECSS-E-ST-10C, “Space Engineering: System Engineering General Requirements,” European Cooperation for Space Standardization, Requirements & Standards Division, Noordwijk, 2009.
