[Figure 1: a customer sends a service request to the front-end node, which hosts the service interface, SLA management framework, application deployer, monitor framework and VM configurators; the computing environment nodes run a virtualization layer over the physical resources. Steps: 1. service request + response; 2. setup VMs; 3. validate request; 4. service request; 5. deploy; 6. monitor.]
Fig. 1. CASViD Architecture.
Customers place their service requests through a defined
interface to the front-end node (step 1, Figure 1), which
acts as the management node in the Cloud environment. The
VM configurator sets up the Cloud environment by deploying
preconfigured VM images (step 2) on physical machines and
making them accessible for service provisioning. The request
is received by the service interface and delivered to the SLA
management framework for validation (step 3), which is done
to ensure that the request comes from the right customer. In
the next step the service request is passed to the application
deployer (step 4), which allocates resources for the service
execution and deploys it in the Cloud environment (step 5).
After deploying the service application, CASViD monitors the
application execution and sends the monitored information to
the SLA management framework (step 6) for processing and
detection of SLA violations.
The VM configurator and application deployer are components for allocating resources and deploying applications on our Cloud testbed. They are included in the architecture to show our complete solution. The Application Deployer is responsible for managing the execution of user applications, similar to brokers in the Grid literature [1], [16], [26], [30], focusing on parameter sweeping executions [11]. It simplifies the processes of transferring application input data to each VM, starting the execution, and collecting the results from the VMs to the front-end node. The mapping of application tasks to VMs is performed dynamically by a scheduler located in the Application Deployer: each slave process consumes tasks whenever the VM is idle. Further details on this component and the VM configurator are found in our previous work [17],
[18]. The execution of the applications and the monitoring
process can be done automatically by the Cloud provider,
or can be incorporated into a Cloud Service that can be
instantiated by the users.
The proposed CASViD architecture is generic in its usage
as it is not designed for a particular set of applications. The
service interface supports the provisioning of transactional as
well as computational applications. The SLA management
framework can handle the provisioning of all application
types based on the pre-negotiated SLAs. Description of the
negotiation process and components is out of scope of this
paper and is discussed by Brandic et al. [8].
A. System and Application Monitor
CASViD architecture contains a flexible monitoring framework based on the SNMP (Simple Network Management Protocol) standard [12]. It receives instructions to monitor applications from the SLA management framework and delivers the monitored information. It is based on the traditional manager/agent model used in network management. Figure 2 presents the monitor architecture. The manager, located in the management node, periodically polls each agent in the cluster to get the monitored information. In order to enhance its scalability, the monitor uses asynchronous communication with all cluster agents. It is composed of a library and an agent. The monitor agent implements the methods to capture each metric defined in the CASViD monitor MIB (Management Information Base). At the manager side, the monitor library provides methods to configure which metrics should be captured and which nodes should be included in the monitoring. The SLA management framework in the system architecture uses this library to configure the monitoring process and retrieve the desired metrics. The retrieval can be done by collecting the metric information from application or operating system log files.
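The manager/agent polling model described above can be sketched in Python. This is a schematic stand-in, not a real SNMP implementation: the class names, the metric names, and the thread-pool fan-out are illustrative assumptions.

```python
# Illustrative sketch (not real SNMP): a manager that polls per-node agents
# asynchronously and aggregates the metrics they expose.
from concurrent.futures import ThreadPoolExecutor

class MonitorAgent:
    """Stands in for an SNMP agent exposing metrics from a per-node MIB."""
    def __init__(self, node, metrics):
        self.node = node
        self._metrics = metrics          # e.g. parsed from /proc or app logs

    def get(self, names):
        return {n: self._metrics[n] for n in names if n in self._metrics}

class MonitorManager:
    """Stands in for the manager-side monitor library on the front-end node."""
    def __init__(self, agents):
        self.agents = agents
        self.watched = []                # metrics configured by the SLA framework

    def configure(self, metric_names):
        self.watched = list(metric_names)

    def poll(self):
        # Asynchronous fan-out to all agents, mirroring the scalability
        # argument made in the text.
        with ThreadPoolExecutor(max_workers=len(self.agents)) as pool:
            results = pool.map(lambda a: (a.node, a.get(self.watched)), self.agents)
        return dict(results)

agents = [MonitorAgent("node1", {"cpu": 0.71, "mem": 0.40}),
          MonitorAgent("node2", {"cpu": 0.15, "mem": 0.90})]
manager = MonitorManager(agents)
manager.configure(["cpu"])
snapshot = manager.poll()   # {"node1": {"cpu": 0.71}, "node2": {"cpu": 0.15}}
```

The SLA management framework would call `configure` once and `poll` on each monitoring interval.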
[Figure 2: the management node runs the management system and monitor library, which communicate via the SNMP protocol with each processing node's SNMP agent; the monitor agent and monitor MIB on each node draw on /proc and the application folders.]
Fig. 2. CASViD Monitor Overview.
Similar to other monitoring systems [21], [29], the CASViD monitor is general-purpose and supports the acquisition of common application metrics as well as system metrics such as CPU and memory utilization. The application metrics (SLA parameters) to be monitored depend on the application type and on how its performance is to be ensured.
B. SLA Management Framework
The service provisioning management and detection of ap-
plication SLA objective violations are performed by the SLA
management framework component. This component is central
and interacts with the Service Interface, Application Deployer,
and CASViD monitor. In order to manage the SLA violations, this framework performs resource allocation to services, service scheduling, application monitoring, and SLA violation detection (Figure 1).
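As a minimal illustration of the violation-detection step, the framework can be thought of as comparing monitored values against pre-negotiated objectives. The function, metric names, and thresholds below are invented for the sketch and do not come from the CASViD paper.

```python
# Hypothetical sketch of SLA violation detection: each SLA objective is
# assumed to be a simple upper bound on a monitored metric.
def detect_violations(sla_objectives, monitored):
    """sla_objectives: metric -> maximum allowed value (assumed form)."""
    return [(metric, value, limit)
            for metric, limit in sla_objectives.items()
            for value in [monitored.get(metric)]
            if value is not None and value > limit]

sla = {"response_time_ms": 200, "cpu_utilization": 0.85}
observed = {"response_time_ms": 340, "cpu_utilization": 0.60}
violations = detect_violations(sla, observed)
# -> [("response_time_ms", 340, 200)]
```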
MODAClouds
MOdel-Driven Approach for design and execution of applications on multiple Clouds, Deliverable # D6.1
Public Final Version 1.0, March 29th 2013
restrictive than the violation thresholds. With this information the system can react quickly to avert the violation
threat and save the Cloud provider from costly SLA violation penalties.
3.4.3. A multi-layer approach for cloud application monitoring [Gon11]
Hierarchical monitoring and analysis is a methodology for refining monitoring data and analysis results in order to achieve higher precision while reducing the amount of data to be analysed. In the context of Cloud computing it can be applied both to lighten the load (i.e., the amount of data to be analysed) and to reason over monitoring data. This work proposes a three-dimensional approach for cloud application monitoring encompassing the Local Application Surveillance (LAS), Intra Platform Surveillance (IPS) and Global Application Surveillance (GAS) dimensions, with the interconnections and subcomponents shown in Figure 3.4.c. LAS monitors an application instance to check for rule violations. For further analysis, the output of each LAS is sent to its assigned IPS, an additional monitoring mechanism at the level of one particular VE. The IPS analyses data from the different VMs running on the same machine, looking for issues arising from interactions between VMs and between the applications running on the same VM. The filtered results are then sent to the GAS components for further analysis.
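The three-layer refinement could be sketched as follows; the rule format, function names and instance names are hypothetical, chosen only to show how each layer reduces the data forwarded upward.

```python
# Schematic of the LAS -> IPS -> GAS pipeline described above.
def las_check(instance_metrics, rules):
    """Local Application Surveillance: flag per-instance rule violations."""
    return [name for name, limit in rules.items()
            if instance_metrics.get(name, 0) > limit]

def ips_filter(las_reports):
    """Intra Platform Surveillance: keep only instances that violated a rule,
    reducing the volume of data sent to the global layer."""
    return {inst: v for inst, v in las_reports.items() if v}

def gas_summarize(ips_outputs):
    """Global Application Surveillance: merge filtered reports from machines."""
    merged = {}
    for machine_report in ips_outputs:
        merged.update(machine_report)
    return merged

rules = {"latency_ms": 100}
reports = {"inst1": las_check({"latency_ms": 150}, rules),
           "inst2": las_check({"latency_ms": 40}, rules)}
global_view = gas_summarize([ips_filter(reports)])   # {"inst1": ["latency_ms"]}
```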
Figure 3.4.c. LAS, IPS and GAS layers.
The optional GAS component, of which one is assigned per application (not per instance), monitors the software and detects modelling and implementation problems by analysing data from different machines (i.e., from several IPS components) that refer to the same application. The global view of the GAS components reveals the behaviour of the software in different virtualized environments, yielding conclusions useful to both the application's users and developers.
3.4.4. Cloud Application Monitoring: the mOSAIC Approach [Rak11]
The mOSAIC API facilitates building custom monitoring systems for Cloud applications. The mOSAIC approach as a whole comprises four modules: the API, the framework (i.e., the platform), the provisioning system, and the semantic engine. The API and the framework aim at the development of portable, provider-independent applications. The provisioning system works at the IaaS level and handles resource management; its functionality is part of the Cloud agency [Ven11]. The framework is a collection of predefined Cloud components for building complex applications; it constitutes a PaaS enabling the execution of complex services with predefined interfaces. The mOSAIC SLA management
components are also part of the Framework. The API implements a programming model in a given language (currently Java, with Python planned) for building applications. It provides new concepts (e.g., the Cloudlet and the Connector) so that developers can focus on Cloud resources and communications instead of resource-access or communication details. The mOSAIC architecture is depicted in Figure 3.4.d.
Figure 3.4.d. The mOSAIC Architecture
Resource monitoring is implemented by the Cloud agency. The Archiver, a monitoring agent offered by the agency, collects monitoring information from the agents distributed over the monitored resources and stores the messages in a storage system. The monitoring agent can also collect information from common monitoring systems (e.g., Ganglia, Nagios, SNMP-based applications) and publish it to the same storage. Applications interact with the Archiver through a connector. The Observer component generates events on the (resource) event bus by accessing the storage filled by the Archiver. The Application Monitoring connector, integrated in the Cloudlet, is responsible for monitoring application components and generating events on the connected buses. Application components can then share the monitoring information and manage the related events.
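The Archiver/Observer interplay described above could be sketched like this. All class names and the record format are illustrative, not the actual mOSAIC implementation.

```python
# Sketch of the pattern: distributed agents feed an Archiver, which persists
# records in shared storage; an Observer scans that storage and emits events
# on a bus.
class Storage:
    def __init__(self):
        self.records = []

class Archiver:
    """Collects agent messages and persists them in the storage system."""
    def __init__(self, storage):
        self.storage = storage

    def collect(self, agent_id, payload):
        self.storage.records.append({"agent": agent_id, **payload})

class Observer:
    """Reads the storage filled by the Archiver and generates bus events."""
    def __init__(self, storage, bus):
        self.storage, self.bus, self._seen = storage, bus, 0

    def scan(self):
        for rec in self.storage.records[self._seen:]:
            self.bus.append(("resource-event", rec))
        self._seen = len(self.storage.records)

bus, store = [], Storage()
Archiver(store).collect("vm-7", {"cpu": 0.93})
Observer(store, bus).scan()
# bus now holds one ("resource-event", ...) entry for the stored record
```

Decoupling producers (agents) from consumers (the event bus) through storage is what lets applications access monitoring data through a connector rather than talking to agents directly.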
The mOSAIC monitoring API offers a set of connectors representing an abstraction of resource monitoring, and a set of drivers implementing different ways of acquiring monitoring data. It therefore supports monitoring by (i) offering a way to collect data directly from any component of a mOSAIC application, (ii) offering a way to collect data via any of the proposed monitoring techniques (Cloud-provider tools, resource-related tools, and the mOSAIC monitoring tools, called the M/W (monitoring/warning) system), and (iii) letting a mOSAIC Cloud application access the data regardless of the technology of the acquired resources and the way they are monitored. The mOSAIC monitoring tools offered by the framework aim to make it possible to build up a dedicated monitoring system.
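The connector/driver split can be illustrated with a minimal sketch; the class names and driver behaviours are invented and this is not the real mOSAIC API.

```python
# Sketch of the abstraction: one application-facing connector interface,
# interchangeable drivers for different acquisition techniques.
class Driver:
    def acquire(self):
        raise NotImplementedError

class ProviderApiDriver(Driver):
    """Stands in for data pulled from a Cloud provider's own monitoring."""
    def acquire(self):
        return {"source": "provider", "cpu": 0.5}

class WarningSystemDriver(Driver):
    """Stands in for the M/W (monitoring/warning) acquisition path."""
    def acquire(self):
        return {"source": "m/w", "cpu": 0.5}

class MonitoringConnector:
    """The application reads metrics the same way regardless of which
    driver acquired them."""
    def __init__(self, driver):
        self.driver = driver

    def read(self):
        data = self.driver.acquire()
        return {k: v for k, v in data.items() if k != "source"}

same = (MonitoringConnector(ProviderApiDriver()).read()
        == MonitoringConnector(WarningSystemDriver()).read())   # True
```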
3.4.5. M4Cloud, A Generic Application Level Monitoring [Mas11]
This model-driven approach classifies and monitors application-level metrics in shared environments such as the Cloud. The basis for the implementation of the monitoring phase is the Cloud Metric Classification (CMC). CMC identifies four models: application based (e.g., generic/specific), measurement based (e.g., direct/calculable), implementation based (e.g., shared/individual) and nature based (e.g., quantity/quality). The application based model distinguishes metrics on the basis of the application they belong to. The measurement based model defines the formulas from which metrics can be calculated; the implementation based model defines for each metric the corresponding measurement mechanisms,
Figure 1: mOSAIC Monitoring Components Architecture
by the user, publish resource-related information, such as the CPU usage, at given time intervals. The mOSAIC developer has the role of developing the monitoring Cloudlet, which, through the connectors to the Event bus, is able to retrieve all the monitored events. Moreover, it is able to access the Cloud Agency in order to retrieve more general-purpose information.
6 Related Work
As discussed in previous sections, in monitoring of
Cloud-based applications, we can distinguish two sepa-
rate levels: the infrastructure level and the application-level
monitoring. Infrastructure-level resource monitoring [7]
aims at the measurement and reporting of system parame-
ters related to real- or virtual infrastructure services offered
to the user (e.g. CPU, RAM, or data storage parameters). At
this level, some of the better advertised Cloud-monitoring
solutions include the Nimsoft Monitor [4], Monitis [3], or
the Tap in Cloud Management Service [5]. These services
cover different subsets of Cloud-service providers, and they
support different measurements based on the actual Cloud
provider that is being monitored. On the application level,
the nature of the monitored parameters, and the way their
values should be retrieved depend on the actual software
being monitored, and not on the Cloud infrastructure it is
running on. VMware vFabric Hyperic [6], e.g., specializes in web-application monitoring, and is capable of monitoring applications that utilize any of the approximately 75 supported web technologies. A more general approach is to utilize the JMX Java framework [1], which is employed by most Java application containers and is capable of providing information on the status of the application running in the container. This, however, requires that the application is written in Java and that it is prepared to publish information through the JMX subsystem. Regardless of the level of the parameters being monitored, a general monitoring infrastructure is required to collect and process the information provided by the monitored components. Such an infrastructure
is provided by the Lattice framework [2] (also utilized in
the RESERVOIR project [13]) that has a minimal run-time
footprint and is not intrusive, so as not to adversely affect
the performance of the system itself or any running applications. The framework defines a system of data sources, data consumers, and control strategies that influence the collection of monitoring data. The monitoring data can be transported over IP multicast solutions, Event Service Bus, or a publish/subscribe mechanism. This is a very flexible framework, which is tailored towards distributed applications, but not Cloud applications. In contrast, in the mOSAIC monitoring subsystem, we can utilize such Cloud-oriented services as reliable messaging and flexible storage options for the measurement data.
coherently with the formulas defined at the previous step; finally, the nature based model defines the nature of the metrics and their definition within SLAs. More information on the models can be found in the original article [Mas11].
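Classifying a single metric along the four CMC dimensions named above could look as follows; the field values are examples chosen for illustration, not taken from [Mas11].

```python
# Illustrative sketch: one metric tagged along the four CMC dimensions.
from dataclasses import dataclass

@dataclass
class CmcMetric:
    name: str
    application_based: str     # "generic" or "specific"
    measurement_based: str     # "direct" or "calculable"
    implementation_based: str  # "shared" or "individual"
    nature_based: str          # "quantity" or "quality"

response_time = CmcMetric(
    name="response_time",
    application_based="generic",      # meaningful for many applications
    measurement_based="calculable",   # derived from request/response timestamps
    implementation_based="shared",    # the measuring mechanism can be shared
    nature_based="quantity",          # numeric, directly usable in an SLA
)
```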
CMC is part of the M4Cloud framework, as shown in Figure 3.4.e. In this framework, the FoSII infrastructure [Bra10] is used as a Cloud Management System (CMS). Monitored data is analysed, stored in a knowledge database, and then used for planning actions. Monitored data is also acquired and analysed after the execution of such actions, in order to evaluate their efficiency.
Figure 3.4.e. Architecture of M4Cloud.
3.4.6. REMO, a Resource-Aware Application State Monitoring approach [Men08]
Cost effectiveness and scalability are among the main criteria in developing a monitoring infrastructure for large-scale distributed applications. REMO addresses the challenge of constructing monitoring overlays from the cost and scalability points of view, jointly considering inter-task cost-sharing opportunities and node-level resource constraints. Processing overhead is modelled on a per-message basis. The approach deploys a forest of optimized monitoring trees through iterations of two phases, exploring cost-sharing opportunities between tasks and refining the trees with resource-sensitive construction schemes. In each iteration a partition augmentation procedure generates a list of the most promising augmentations for improving the current distribution of workload among trees, using cost estimation to limit the list. These augmentations are then further refined through a resource-aware evaluation procedure, and monitoring trees are built accordingly (through the resource-aware tree construction algorithm).
An adaptive algorithm is also provided for balancing the cost and benefits of the overlay, which is especially useful for large-scale systems with dynamic monitoring tasks.
Planning the monitoring topology and the collection frequency are important factors in balancing monitoring scalability and cost effectiveness. The drawback of approaches proposed to date is that they either build a monitoring topology for each individual monitoring task (e.g., TAG [Mad02], SDIMS [Yal04], PIER [Hue05], join aggregations [Cor05], REED [Aba05], operator placement [Sri05]) or use a static topology for all monitoring tasks [Sri05]; neither is optimal. For instance, two monitoring tasks may collect data over the same nodes. In such a case it is more efficient to use a single monitoring tree for data transmission, as nodes can merge updates for both tasks and reduce per-message processing overhead. It is therefore important to consider topology optimization at the multi-monitoring-task level for the sake of monitoring scalability. Load management is another important factor in monitoring data collection, especially for data-intensive environments, meaning that the
monitoring topology should be able to control the amount of resources spent to collect and deliver the data. Ignoring this can lead to overloading and, consequently, loss of data. The REMO approach addresses all these issues by considering node-level resources when building a monitoring topology, optimizing the topology for scalability, and ensuring that no node is assigned more monitoring workload than its available resources can support.
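The per-message cost-sharing argument can be made concrete with a back-of-the-envelope calculation; the cost model and all numbers are illustrative, not REMO's actual model.

```python
# Toy cost model: each node sends one message per monitoring tree it belongs
# to; a message has a fixed processing cost plus a per-attribute cost.
def tree_cost(n_nodes, msgs_per_node, per_message_cost, per_attr_cost, n_attrs):
    return (n_nodes * msgs_per_node * per_message_cost
            + n_nodes * n_attrs * per_attr_cost)

nodes, c_msg, c_attr = 10, 5.0, 1.0
# Two tasks over the same 10 nodes, 1 attribute each:
separate = tree_cost(nodes, 2, c_msg, c_attr, 2)  # one tree per task
merged   = tree_cost(nodes, 1, c_msg, c_attr, 2)  # one shared tree, merged updates
saving = separate - merged   # 10 nodes x 1 fewer message x 5.0 = 50.0
```

The per-attribute cost is unchanged by merging; only the fixed per-message overhead is saved, which is exactly the sharing opportunity REMO exploits.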
This approach has three main advantages. First, it identifies three critical requirements of large-scale application monitoring: sharing message-processing cost among attributes, meeting node-level resource constraints, and adapting efficiently as monitoring tasks change. Second, it proposes a monitoring framework that optimizes the monitoring topologies and addresses these requirements. Finally, it develops techniques for runtime efficiency and support. The figure below shows the high-level model of REMO, encompassing four components: the task manager, the management core, the data collector and the result processor. The functionality of each component is summarized in Figure 3.4.f.
Figure 3.4.f. The REMO Architecture
3.4.7. Cloud4SOA
Cloud4SOA monitoring offers a unified, platform-independent mechanism to monitor the health and performance of business-critical applications hosted on multiple Cloud environments, in order to ensure that their performance consistently meets the expectations defined in the SLA. To deal with the heterogeneity of different PaaS offerings, Cloud4SOA provides a monitoring functionality based on unified, platform-independent metrics.
The Cloud4SOA monitoring functionality leverages a range of standardized and unified metrics of different natures (resource/infrastructure level, container level, application level, etc.) that, across the disparate underlying cloud providers, allow the runtime monitoring of distributed applications so as to enforce end-to-end QoS, regardless of which PaaS they are deployed on. In the scope of Cloud4SOA several metrics have been defined (Table 3.4.a) from the cloud-resource as well as the business-application perspective, but not all of them are enforced at runtime, since some only provide useful information about the status of the application.
(Cloud4SOA: http://www.cloud4soa.eu)
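The unification idea could be sketched as a mapping from provider-specific metric names onto one platform-independent vocabulary; the providers, metric names and mappings below are invented for illustration.

```python
# Illustrative sketch: normalize provider-specific metrics so the same SLA
# check runs regardless of where the application is deployed.
UNIFIED = {
    "paas_a": {"dyno_load": "cpu_load", "resp_ms": "app_response_time"},
    "paas_b": {"cpu_usage": "cpu_load", "latency": "app_response_time"},
}

def normalize(provider, raw_metrics):
    """Translate a provider's raw metric names into the unified vocabulary,
    dropping anything the unified model does not cover."""
    mapping = UNIFIED[provider]
    return {mapping[k]: v for k, v in raw_metrics.items() if k in mapping}

a = normalize("paas_a", {"dyno_load": 0.4, "resp_ms": 120})
b = normalize("paas_b", {"cpu_usage": 0.4, "latency": 120})
# Both yield {"cpu_load": 0.4, "app_response_time": 120}: comparable end to end.
```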
Table 3.4.a Cloud4SOA Metrics

Metric | Description | App. | Cloud
CPU load | The amount of computational work that the application performs | X |
Memory Load | The amount of memory consumed by the application | X |
HTTP Response Code | Includes custom status messages to understand the health of the application, but also the performance of the cloud | X | X
Application and DB Response Time | Time that measures the efficiency and speed with which servers deliver requested web content to end users | X | X
Application Container Response Time | The elapsed time between the end of an inquiry or demand on a cloud system and the beginning of a response | | X
Cloud Response Time | The time the Cloud needs to process and forward the incoming call to the application | | X
3.5. Cost Monitoring and Measurement Indexes
A common measure of IT cost is the Return on Investment (ROI). Analysts such as Daryl Plummer of Gartner have reviewed the use of ROI calculations in the monitoring and measurement of cloud services, focusing on industrial monitoring and measurement. Other analysts, such as Trevor Pott, examine the complexity of calculating metrics such as ROI in a cloud environment with different infrastructures and legacy operations all in the mix.
The single issue that strikes any reviewer of the state of measurement and monitoring of cloud services is that there is no agreed way of measuring or comparing them. This lack of agreement has led to initiatives from commercial organizations and standards bodies. The World Wide Web Consortium (W3C) initiated an incubator project on the Unified Service Description Language (USDL) as a way of generating consensus. The lack of a public standard for monitoring cloud services is not a problem for private cloud implementations: in-house staff or service providers in a private cloud environment can facilitate interoperable monitoring without concerning themselves with external service measurement. Private clouds are the predominant environments in corporations at present; however, there is growth in rogue IT and in the use of external services, both of which demand a more standardized means of monitoring and measuring cloud services.
A few initiatives are active in describing cloud services in index form. Their goal is to create a standard way of comparing services during the selection process. W3C's USDL incubator is one such initiative. The Cloud Service Measurement Index Consortium (CSMIC) is another, formed by a number of organizations to define measures for comparing service behaviour at service-selection time.
3.5.1. Unified Service Description Language
USDL was the name given to an incubator project from the W3C. USDL extends the state of the art in many fields of service description and can be seen as an extension of work done on the semantic web in general and linked data in particular. USDL is a language-based method of aligning business services through a common description. The incubator group has completed its work and delivered a report containing its recommendations.
It is clear from the report that USDL requires additional work to make it valid for use with cloud services. In particular, there are requirements to create module-specific processes as well as descriptions. A good example is the legal module, which will need different processes for each jurisdiction. Another identified extension to USDL is a language-specific query language. Little work appears to have been completed since the report was published.
3.5.2. Service Measurement Index
The Service Measurement Index (SMI) is a standardization initiative managed by the Cloud Services Measurement Index Consortium, led by Carnegie Mellon University. The consortium currently has 17 member organisations from all areas of IT, including universities, benchmarking specialists, software houses and
systems integrators. All have an interest in defining a common method for describing services. The consortium meets regularly to develop definitions of service attributes and measures. These are measures for service selection rather than for continuous monitoring of cloud service performance, though some can be modified to describe continuous monitoring measures.
SMI is defined as a set of measures, each describing an attribute that is part of a service category.
Table 3.5.a lists the attributes that have been prioritized for the definition of measures; this is the current list and will be expanded later in the exercise. The financial category is of particular interest to MODAClouds as a source of information and potential cost-management information.
Category | Selected Attributes
Accountability | Compliance, Ease of Doing Business, Provider Certifications, Provider contract/SLA verification
Agility | Elasticity, Portability, Scalability
Assurance | Availability, Reliability, Resiliency/fault tolerance
Financial | Acquisition, On-going cost, Transition costs
Performance | Functionality, Interoperability, Service response time
Security and Privacy | Access control & privilege management, Data integrity, Data privacy and data loss
Usability | Accessibility, Learnability, Suitability

Table 3.5.a. List of CSMIC Prioritised Attributes by category
Each attribute has one or more measures defined, and for each measure there is a description template containing a number of fields: the normal identification and meta-data, plus the measure description. The measure description contains information about how the measure is expressed, the frequency and units of measure, and the formula for calculating it. Some formulae are simple yes/no questions, for example "Is the service supplier Sarbanes-Oxley certified?"; other formulae are more complex. Measures are weighted based on their importance to the selector of services.
Data is gathered to allow a measure to be calculated, either by benchmarking or by the contribution of service suppliers. In a prototype exercise in 2011, several cloud service suppliers donated performance, security and quality metrics, and a leading university benchmarked several public services to gather data. This formed the basis of examples of service selection that validated the approach. Since those early days, the basic QARCCS (Quality, Agility, Risks, Cost, Capability, Security) model has been modified into the current list of categories.
The SMI is now in the final stages of internal review of attribute measure definitions; the next stage is for those measures to be passed to the wider community for external review, which should be completed in the second quarter of 2013. During this review period the measures will undergo further refinement. A user scenario and a demonstration tool for calculating service-selection heat-maps are also under development. Defined measures and data will allow service selection and the further development of a standard measure of cloud services from SMI. The use of SMI to select services is illustrated in Figure 3.5.a.
MODAClouds: MOdel-Driven Approach for design and execution of applications on multiple Clouds, Deliverable # D6.1, Public Final Version 1.0, March 29th 2013
Figure 3.5.a. Service selection heat-map for dummy service.
If the user placed a high priority on the security and privacy category, then despite a high score for usability this
service would not be acceptable. Even if a service does not achieve an adequate score in a category, this does not
mean the service is always unacceptable: it may become acceptable with suitable risk mitigation.
Table 3.5.b. Monitoring component in EU Projects
4CaaST: Reuses frameworks from the state of the art: Collected Framework (JMX, to monitor applications that expose MBeans), publish/subscribe middleware (framework used: SilboPS), and JASMINe (collector provided: MbeanCmd). To access monitoring services inside 4CaaSt, a TCloud REST-based API is proposed. The aim is to design a new environment which can use all the different available generic monitoring systems (e.g. Ganglia, collectd, MbeanCmd, etc.).
Cloud4SOA: Front-end (visual information about the status (life cycle) of the deployed applications) and back-end (collects data from the platforms).
Cloud-TM: The skeleton of the prototype implementation of WPM (Workload and Performance Monitor) is based on the Lattice framework; the prototypal implementation of the data-platform-oriented probes is extensively based on the JMX framework.
Contrail: Monitoring is based on the monitoring solution developed in the SLA@SOI project (architecture); the Web hosting service uses Ganglia for monitoring several application-specific parameters.
OPTIMIS: Java RESTful Web Services provide interfaces to downstream components querying data from the Monitoring Infrastructure (e.g. TREC components and the Monitoring website) and to upstream components inserting data into the Monitoring Infrastructure (e.g. probes and scripts); MySQL is used for storage of monitoring data; the Monitoring website is built with the Google Web Toolkit.
Vision Cloud: The Monitoring system is responsible for collecting cluster-level usage records and aggregating them to generate cloud-level usage records. The generated records are pushed to the Accounting system via a RESTful web interface.
mOSAIC: Regarding monitoring of cloud-based applications, two separate levels are distinguished: infrastructure-level and application-level monitoring. The Connector API for Monitoring uses JSON (JavaScript Object Notation) for data exchange.
4. State-of-the-Art: QoS Management
4.1. Preamble
Quality of Service (QoS) plays a central role in the optimal delivery of web services, and as more applications
are deployed on public clouds, the task of handling QoS becomes harder. As more applications share the same
infrastructure, their demand for resources may create contention that reduces the QoS perceived by the user. In
this section we review methods that have been proposed during the last decade to provide tools to system
administrators to manage QoS of web applications.
To provide a better view of this problem we have divided this section into three parts:
Data analysis and forecasting. The first part deals with the need to accurately describe the system
workload, as this drives the demand for resources. The high variability and auto-correlation present in
web application workloads call for advanced modelling approaches for prediction and evaluation.
These include statistical regression, autoregressive models, and machine learning techniques.
Runtime QoS models. This section presents recent advances in QoS runtime models, which are tools to
evaluate the performance of an application under a given workload mix, resource availability, and
resource management policy. These models attempt to predict the effect of a reconfiguration on
system performance, allowing one to predict the benefits of a reconfiguration before applying it, and to
consider future changes needed to cope with a potential change in the workload. The models
used in this section are based on statistical inference, control theory, and queueing theory.
SLA management. This section describes methods to determine resource management policies that cope
with Service Level Agreements (SLAs). SLAs exist between the application provider and end users, and also
between the application provider and the cloud provider; here we focus on the first kind, i.e., agreements with
the end users. We consider policies for application placement, admission control and capacity
allocation. To determine the most appropriate policy, optimization and game-theoretic methods are
reviewed.
4.2. QoS Data Analysis and Forecasting
4.2.1. Problem
MODAClouds will offer a data analysis platform whose main purposes include parameterizing models of cloud
applications in order to deliver predictions of their QoS metrics. Classical QoS data analysis involves service
demand estimation and traffic forecasting. Service demand estimation approximates the service demand of
different classes of requests by analysing log files or streaming data; typical metrics of interest include response
time and server utilisation. The traffic forecasting problem is to forecast the incoming workload by analysing
historical data to obtain the future trend. This brings the need for methods to keep predictive models
consistent with observations to
maximize predictive accuracy. A similar concept is adopted for example in [Shi06], [Coh04], [Des12] where the
runtime engine features statistical learning methods, classification, regression, adaptive re-learning. Machine
learning methods can be more flexible than stochastic models in capturing dependencies in empirical data.
However, they can be less accurate in what-if analysis since they take a black-box system view. For example,
since they do not model scheduling mechanisms, it is difficult to predict the effects of changes in request
priorities, which can, instead, be simple to predict with a stochastic model. Unsupervised methods may also be
inapplicable for predicting metrics that are unobservable due to overhead concerns (e.g., threading levels).
While definition of machine learning methods is usually embedded with the modelling technique itself (e.g.,
training algorithms for neural networks), less standardized data analysis methods are required to parameterize
QoS stochastic models. Since WP5 aims at leveraging such models for design-time predictions, the runtime
environment will take advantage of the WP5 models and therefore will need mechanisms to parameterize them.
QoS model parameterization can be broadly divided into direct measurement techniques and statistical inference
methods. Direct measurement parameterization, as used for instance in [Urg05], is usually expensive in terms of
overhead because it instruments the code or tracks the requests to see what they do. However, an estimation
problem can instead be formulated using statistical inference methods, which can be reapplied periodically. In the next
sections we focus on statistical inference methods for data analysis and forecasting of QoS model parameters. As
the methods share similarities in the techniques used, we provide a brief review of these techniques in the
following section.
4.2.2. Overview of data analysis, forecasting techniques, and queueing models
Before describing recent works in this area, we overview some of the main techniques used for data analysis and
forecasting.
Regression analysis
Regression is a statistical technique to estimate the relationship among observed variables. Regression
models can be formulated as Y = f(X, β), where X is the independent variable and Y is the dependent
variable. Both X and Y are observed variables and β is the parameter to estimate. The simplest regression
method is linear regression, which assumes a linear relationship Y = Xβ + ε between the variables.
Regression approaches are widely used for prediction and forecasting. Classical approaches include
ordinary least squares linear regression, which finds the linear relation that minimises the sum of
squared residuals, and non-linear methods such as SVM regression, which uses a Support
Vector Machine (SVM) to obtain a non-linear relation between variables.
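To make this concrete, the sketch below (pure Python, with made-up observations) computes the ordinary least squares estimate for the one-dimensional, no-intercept linear model; the data and the "2 ms per request" interpretation are illustrative assumptions, not taken from this deliverable.

```python
# Ordinary least squares for the no-intercept linear model y = beta * x + error.
# In this one-dimensional case the estimate has the closed form
# beta = sum(x*y) / sum(x*x), which minimises the sum of squared residuals.
def ols_slope(xs, ys):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

# Hypothetical observations: response time grows roughly 2 ms per request.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
beta = ols_slope(xs, ys)  # 1.99, close to the underlying slope of 2
```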
Autoregressive models
Autoregressive models are mathematical models describing time-varying processes. Classical methods
include autoregressive moving average (ARMA) models and their generalization, autoregressive
integrated moving average (ARIMA) models. ARMA models form a class of linear time series
models; by adjusting the order of the model, any linear time series model can be approximated to a
desired accuracy. Autoregressive models can also be used to forecast time series, and have been
mainly used in economics and the natural sciences.
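As a minimal illustration (an assumption-laden sketch, not one of the cited methods), an AR(1) model x[t] = φ·x[t-1] + noise can be fitted by least squares over lagged pairs and iterated forward to forecast; the decaying series below is invented for the example.

```python
def fit_ar1(series):
    # Least-squares estimate of phi in x[t] = phi * x[t-1] + noise.
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def forecast_ar1(series, phi, steps):
    # Iterate the fitted recurrence forward from the last observation.
    preds, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        preds.append(last)
    return preds

# Hypothetical workload samples decaying by half at each step.
series = [8.0, 4.0, 2.0, 1.0]
phi = fit_ar1(series)                 # 0.5
preds = forecast_ar1(series, phi, 2)  # [0.5, 0.25]
```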
Kalman filter
The Kalman filter is a technique to estimate the state of a running system by analysing the system input
and noisy, incomplete observations. It works in two steps: first, the algorithm predicts the current
system state and its uncertainty; then, once a measurement of the system is observed, it updates the
previous estimate using a weighted average, with higher weight given to estimates with lower
uncertainty. The Kalman filter is a recursive estimator that only requires the current measurement and
the previous state, so it is suitable for online parameter estimation and adaptive management of
the system.
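A one-dimensional sketch of this predict/update cycle (assuming a random-walk state model, with invented noise values and readings) can be written in a few lines:

```python
def kalman_update(x, p, z, r, q=0.0):
    # Predict: under a random-walk state model the estimate is unchanged,
    # but its variance grows by the process noise q.
    p = p + q
    # Update: the Kalman gain weights the new measurement z (variance r)
    # against the current estimate, i.e. a precision-weighted average.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p

# Filter hypothetical noisy readings of a signal near 5.0,
# starting from a deliberately poor prior estimate of 0.0.
x, p = 0.0, 1.0
for z in [4.8, 5.2, 5.1, 4.9, 5.0]:
    x, p = kalman_update(x, p, z, r=1.0)
# x has moved from 0.0 most of the way toward 5.0; p has shrunk to 1/6.
```

With q = 0 and the prior variance equal to r, this recursion reproduces the running average of the prior and all measurements, which illustrates why only the previous state and the current measurement are needed.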
Machine Learning methods
Machine learning algorithms have been studied extensively over the last century, and their application
to queueing systems has raised much interest in the last decade. Machine learning has the advantage of
requiring no knowledge of the internal structure of the system, treating it as a black box; such methods
are therefore more flexible than stochastic models in capturing dependencies in the data. Techniques
like Support Vector Machines (SVM), Artificial Neural Networks (ANN), clustering and Bayesian
models have been studied and applied to recognise workload patterns and to predict and forecast future events.
Throughout this section we will repeatedly make reference to queueing systems and queueing networks, which
are among the most important modelling tools for QoS management. To make this document self-contained, we
provide a brief overview of these techniques, pointing to deliverable D5.1 for additional details:
Queueing systems
A queueing system is a mathematical model that consists of one or several servers that deliver a time-
consuming service to a population of clients/requests. A queueing system can be described using the
Kendall notation A/B/c/k, where A describes the request arrival process, B describes the service
process, c is the number of servers in the system, and k is the number of spaces in the system, including
service and waiting spaces. For instance, the most traditional queue is the M/M/1/∞ queue, where the
inter-arrival request times and the service times follow exponential distributions, there is one server,
and there is infinite room for holding waiting requests. Other usual values for the A and B components
include a general distribution G and the Erlang distribution Er, among others. Another relevant aspect of a
queueing system is its service discipline, which determines how the server/resource is allocated among
the incoming requests. Common service disciplines include First-Come-First-Serve (FCFS), Last-Come-First-Serve (LCFS), Processor Sharing (PS) and Generalized Processor Sharing (GPS), among
others.
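For the M/M/1/∞ queue mentioned above, the standard steady-state formulas are simple enough to compute directly. The sketch below uses textbook results (valid only when the arrival rate is below the service rate); the rates are hypothetical.

```python
def mm1_metrics(lam, mu):
    # Textbook M/M/1 steady-state results; the queue is stable only if lam < mu.
    if lam >= mu:
        raise ValueError("unstable: arrival rate must be below service rate")
    rho = lam / mu                      # server utilization
    mean_resp = 1.0 / (mu - lam)        # mean response time (waiting + service)
    mean_in_system = rho / (1.0 - rho)  # mean number of requests in the system
    return rho, mean_resp, mean_in_system

# Hypothetical rates: 2 requests/s arriving at a server handling 4 requests/s.
rho, resp, n = mm1_metrics(2.0, 4.0)  # (0.5, 0.5, 1.0)
```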
Queueing networks
A queueing network is a collection of queueing systems (each one a node in the network) that interact
through their arrival and departure processes. When a request finishes being served at a node, it may
move to another node in the network, or leave the network, according to a probabilistic routing matrix.
Further, the requests can be classified in different classes, depending on the probability laws that govern
their external arrivals, services and routing. Queueing networks are ideal to analyze systems where
several resources are accessed by external requests.
Layered queueing networks
Layered queueing networks are an extension of queueing networks that allows the representation of
computer systems composed of several layers of software servers that share hardware resources. The
layers play an important role in software applications, as they capture the blocking and waiting that a
software server (in one layer) experiences when it requests service from another server (in a lower layer)
in order to complete its own service. When performing such a request, the calling server is blocked and
unable to provide any service, a feature that product-form queueing network models are not able to
capture.
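As a small worked example of a queueing network (a sketch with hypothetical rates, not from this deliverable): in an open tandem network of two M/M/1 nodes with Poisson arrivals, Jackson's theorem lets each node be analysed as an independent M/M/1 queue, so the end-to-end mean response time is the sum of per-node response times.

```python
def tandem_response_time(lam, mus):
    # Open tandem (Jackson) network: every request visits each node once, so each
    # node sees arrival rate lam and behaves as an independent M/M/1 queue.
    if any(lam >= mu for mu in mus):
        raise ValueError("every node must satisfy lam < mu for stability")
    return sum(1.0 / (mu - lam) for mu in mus)

# Hypothetical rates: 2 req/s through nodes serving 4 req/s and 3 req/s.
r = tandem_response_time(2.0, [4.0, 3.0])  # 0.5 + 1.0 = 1.5 seconds
```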
4.2.3. Statistical Inference for QoS model parameterization
Statistical inference techniques differ from direct data measurement techniques because they aim at calibrating
QoS model parameters from aggregate statistics such as CPU utilization or response time measurements.
In [Men94], a standard model calibration technique is introduced. The technique is based on comparing the
performance metrics (e.g., response time, throughput and resource utilization) predicted by a performance model
against measurements collected in a controlled experimental environment varying the system workload and
configuration. Given the lack of control over the system workload and configuration during operation,
techniques of this type may not be applicable for online model calibration.
In [Rol95], linear regression is used for parameter estimation and is found to be accurate with less than 10%
error with respect to simulation data. However, regression fails when there is not enough variability in the
observed data. [Rol98] studies the precision of linear regression using simulation of different service time
distributions, which is shown to decrease as the service variance grows. In [Liu05], performance models are
calibrated by application-independent synthetic benchmarks. The approach uses middleware benchmarking to
extract performance profiles of the underlying component-based middleware. However, application-specific
behaviour is not modelled.
The study in [Zha07] presents a regression-based approximation of the CPU demand of customer transactions,
which is later used to parameterize a queueing network model where each queue represents a tier of the web
application. It is shown that such an approximation is effective for modeling different types of workloads whose
transaction mix changes over time. Moreover, [Cas08a] presents an optimization-based inference technique that
is formulated as a robust linear regression problem that can be used with both closed and open queueing network
performance models. It uses aggregate measurements (i.e., system throughput and utilization of the servers),
commonly retrieved from log files, in order to estimate service times. The work in [Pac08] considers the problem
of dynamically estimating CPU demands of diverse types of requests using CPU utilization and throughput
measurements. The problem is formulated as a multivariate linear regression problem and accounts for multiple
effects such as data aging.
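The utilization-law regressions discussed above can be sketched in a few lines. The samples below are hypothetical and noiseless, generated from assumed true demands of 0.01 s and 0.02 s; this is a toy version of the idea behind [Zha07] and [Pac08], not their actual algorithms.

```python
# Utilization law: U = lam1*D1 + lam2*D2 for two request classes. Given several
# (arrival rates, utilization) samples, the demands D1, D2 are obtained as the
# least-squares solution of the 2x2 normal equations, solved here directly.
def estimate_demands(samples):
    a11 = sum(l1 * l1 for (l1, l2), _ in samples)
    a12 = sum(l1 * l2 for (l1, l2), _ in samples)
    a22 = sum(l2 * l2 for (l1, l2), _ in samples)
    b1 = sum(l1 * u for (l1, l2), u in samples)
    b2 = sum(l2 * u for (l1, l2), u in samples)
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical noiseless samples with true demands D1 = 0.01 s, D2 = 0.02 s.
samples = [((10.0, 5.0), 0.20), ((5.0, 10.0), 0.25), ((20.0, 2.0), 0.24)]
d1, d2 = estimate_demands(samples)  # recovers (0.01, 0.02)
```

Note that the samples must not be collinear (proportional rate vectors make the normal equations singular), which is the multicollinearity problem that [Kal12] addresses.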
In [Kal11], an on-line resource demand estimation approach is presented, together with an evaluation of the
regression techniques Least Squares (LSQ), Least Absolute Deviations (LAD) and Support Vector Regression
(SVR). Experiments with different workloads show the importance of tuning the parameters; the authors
therefore propose an online method to tune the regression parameters.
In [Kal12], a novel approach of resource demand estimation is proposed for multi-tier systems. The Demand
Estimation with Confidence (DEC) approach it proposes can effectively overcome the problem of
multicollinearity in regression methods. DEC can be iteratively applied to improve the accuracy. A thorough
evaluation demonstrates the effectiveness of the algorithm.
Other approaches to model calibration are presented in [Wu08] and [Zhe08]. Both of them use Extended Kalman
Filter for parameter tracking. While in [Wu08], a calibration framework based on fixed test configurations is
proposed, [Zhe08] applies tracking filters on time-varying systems. [Zhe08] extends [Zhe05], where the use of
an extended Kalman filter is investigated to adjust the estimated parameters based on utilization and response
time measurements. The above approaches to model calibration, however, have not been validated in scenarios
of a realistic size and complexity yet and it is currently not clear if they can be used as a basis for online model
calibration.
The study in [Cre10] proposes a method based on clustering to estimate the service time. The authors employ
density-based clustering to obtain clusters and then use a clusterwise regression algorithm to estimate the service
time. A refinement process is conducted between clustering and regression to obtain accurate clustering results
by removing outliers and merging clusters that fit the same model. This approach proves to be computationally
efficient and robust to outliers.
[Cre12] proposes an algorithm to estimate the service demands for different system configurations. A time-based
linear clustering algorithm is used to identify different linear clusters for each service demand. This approach
proves to be robust to noisy data. Extensive validation on generated datasets and real data shows the
effectiveness of the algorithm.
[Sha08] explores the problem of inferring workload classes automatically from high-level measurement of
resources (e.g., request rate, total CPU and network usage) using a machine learning technique known as
independent component analysis (ICA).
In [Sut08], the authors propose using an inference method to estimate the parameters in a queueing network.
This method can effectively overcome the problem of queueing models which require distributional
assumptions. From the perspective of graphical models, a Gibbs sampler and stochastic EM algorithm for
M/M/1 FIFO queues are proposed to estimate the parameters of the queueing network from incomplete data.
The work in [Liu06] proposes instead service demand estimation from utilization and end-to-end response times:
the problem is formulated as quadratic optimization programs based on M/G/1/PS formulas; results are in good
agreement with experimental data.
The work in [Spi11] presents a thorough investigation of the state-of-the-art in resource demand estimation
technologies. Those technologies are analysed and compared in the same environment. By adjusting the
parameters of the environment the accuracy of the algorithms can be compared and possible directions for future
research can be obtained. The following table provides a classification of the approaches reviewed according to
the main techniques employed by each of them.
Overall, regression analysis tends to be the simplest method for model parameterization. It requires an
assumption on the hidden relation between variables, such as a linear relation. The Kalman filter is suitable
for online model parameterisation because it can recursively adapt the parameters; however, this may
introduce significant overhead to the system. It is therefore suitable for closing the feedback loop in WP5 when
combined with layered queueing networks, when no short-time execution is required. Machine learning
techniques have the advantage of requiring no knowledge of the internal structure of the system. However, when
it comes to what-if analysis, machine learning cannot provide much useful information. Queueing-based
inference can provide useful insight into the system, but it requires assumptions on the queueing
distributions. The reviewed approaches are classified in Table 4.2.a, according to the techniques used.
Table 4.2.a Summary of QoS model parameterization methods
Method type References
Regression analysis [Rol95] [Rol98] [Zha07] [Kal11] [Kal12]
Kalman filter [Zhe05] [Zhe08] [Wu08]
Machine learning [Cre10] [Cre12] [Sha08] [Sut08]
Queueing-based inference [Liu06] [Sut08]
The above model parameterization methods can be compared and selected for use in MODAClouds; if
none of them is efficient enough, a novel approach may be developed. For QoS model parameterization,
accuracy is the most important property, and the above methods demonstrate their effectiveness under different
circumstances. For example, the Kalman filter can be accurate, but it requires recursive execution, which leads
to a strong overhead. Another issue for runtime model parameterization is that the approach must run within
a short time period. Regression is easily the fastest; however, it requires an assumption on the form of the
relation and may lose accuracy if the real relation is different.
4.2.4. Workload forecasting methods
There are many approaches to predicting future workload. These approaches gather extensive profiling and log
data about the running system and then use techniques such as machine learning and data mining to extract
interesting information from the data.
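As a baseline for comparison with the methods surveyed below (a minimal sketch, not one of the cited techniques), single exponential smoothing produces a one-step-ahead forecast by recursively blending each new observation into a level estimate; the request-rate samples are invented.

```python
def exp_smooth_forecast(series, alpha=0.5):
    # Single exponential smoothing: the level tracks the series with weight
    # alpha on the newest observation; the forecast for the next step is
    # simply the final level.
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level

# Hypothetical request-rate samples; the forecast leans toward recent values.
forecast = exp_smooth_forecast([100.0, 120.0, 110.0, 130.0])  # 120.0
```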
Autoregressive methods, also known as Box-Jenkins algorithms, have been proposed to forecast workload time
series. In [Lu09] the Box-Jenkins algorithms are combined with simulation technologies to incorporate risk and
uncertainty analysis. [Ver07] proposes a hierarchical framework to predict both short-term and long-term web
server workload. The authors use Dynamic Harmonic Regression (DHR) to model the long-term workload and
an autoregressive model to predict the short-term workload; the parameters of both methods are estimated using
Sequential Monte Carlo (SMC) algorithms. Experimental results show that the framework is robust to outliers
and non-stationarity in the data.
An interesting approach is to combine autoregressive methods with machine learning techniques. For instance,
the work in [Zha03] proposes a forecasting technology combining both ARIMA and Artificial Neural Network
(ANN) models. This approach takes the advantages of both ARIMA and ANN in linear and nonlinear modelling
for time series data. Experiments with real data show that the hybrid model has an improved forecasting
performance compared to the models used separately. Also, [Pow05] explores several machine learning and data
mining algorithms, such as autoregressive models, multivariate regression models and Bayesian network
classifiers, to predict the short-term performance of enterprise systems. The authors treat it as a classification
question: whether the system will meet the target performance objective within a short time period. Besides the
accuracy of the different methods, they also characterize whether each method qualifies as a stand-alone tool in
a real system; for example, the model should adapt to different systems and workloads and be able to predict
with incomplete data. Moreover, the gain in accuracy should outweigh the cost of the model's complexity.
Another example is [Wu10], which proposes using Kalman and Savitzky-Golay filters to predict grid
performance. The authors use a confidence-window approach to restrict the workload prediction to a tolerable
range, avoiding large workload fluctuations, and present an adaptive hybrid model that extends the classic
autoregression model to take the confidence windows into account and adaptively improve prediction accuracy.
Real data is used to demonstrate the effectiveness compared to existing workload forecasting technologies.
Other works based on machine learning methods include [Di12], where a workload prediction algorithm based
on a Bayes model is proposed. The objective is to predict the long-term workload and its pattern. The authors
design nine key features of the workload and use a Bayesian classifier to estimate the posterior probability of
each feature. The experiments are based on a large dataset collected from a Google data center with thousands
of machines.
Non-Bayesian machine learning methods have also received significant attention, as in [Wan05], where a web
traffic trend prediction model is proposed. The neuro-fuzzy model analyses web log data and extracts useful
information from it. The authors build a pattern analysis and fuzzy inference system to predict the chaotic trend
of both short-term and long-term web traffic with the help of cluster information obtained from a Self-Organising
Map (SOM). Empirical results demonstrate its efficiency for predicting future web traffic trends. Also, in
[Kha12], the authors propose a method to characterise and predict workload in cloud environments in order to
efficiently provision cloud resources. The authors develop a co-clustering algorithm to find servers that have
similar workload patterns; the pattern is found by studying the performance correlations of applications on
different servers. They then present a Hidden Markov Model (HMM) method to identify the temporal
correlations between different clusters and use this information to predict workload variation in the future.
Methods based on trend and pattern recognition technologies are used in [Gma07] to propose a workload
demand prediction algorithm. The objective of this approach is to find a way to efficiently use a resource pool
when allocating servers to different workloads. The patterns and trends of the workloads are first analysed, and
synthetic workloads are then created to reflect the future behaviour of the workloads. From the synthetic
workloads, placements of the workloads among the servers can be suggested so as to minimise the number of
servers used and to balance load. A related approach is taken in [Hol10], where a periodicity detection approach
is proposed. The objective is to predict workload changes in enterprise DBS, which often exhibit periodic
patterns. Two methods for detecting periodic patterns are proposed: a discrete Fourier transform method and
an interval analysis method. An algorithm is presented to relate the knowledge of periodic patterns to
workload changes.
Table 4.2.b presents a classification of the methods reviewed according to the underlying technique.
Table 4.2.b Summary of Workload forecasting methods
Method type References
Autoregressive model [Lu09] [Zha03] [Ver07] [Pow05] [Wu10]
Regression model [Pow05] [Ver07]
Kalman filter [Wu10]
Machine Learning (Bayesian) [Pow05] [Ver07] [Di12]
Machine Learning (Non-Bayesian) [Zha03] [Wan05] [Kha12]
Pattern Analysis (Recognition) [Gma07] [Hol10]
4.3. Run-Time QoS Models
Common approaches used by system administrators to characterize the runtime execution of complex software
systems include direct measurement techniques, such as bytecode instrumentation via aspect-oriented
programming [Mar12]. These monitoring approaches focus on acquiring extensive profiling and log data about
the offered QoS and then provide the ability to execute statistical analysis and data mining methods to extract
interesting information about the system in execution. While this procedure is in general very important for
understanding the properties of a system at runtime, it does not per se provide mechanisms to help reason about
how such a system could be optimized. Such mechanisms include, for example, the ability to condense this
information into mathematical models that can be integrated within numerical optimization programs in order
to find the best choice for a decision parameter. Another example is determining the correlation between a
request's resource consumption on one server and the resource consumption it requires on a different server.
While footprinting methods exist to track the identity of a transaction across a distributed system, they are not
always adopted, and furthermore they do not allow one to clearly map the resource consumption of a request
across all the software and hardware layers that contribute to its processing. Hence, statistical reasoning is
needed to understand such correlations from monitoring data.
Several works have attempted to use statistical learning methods, such as classification, regression and adaptive
re-learning, to characterize a system in execution at runtime and operate predictions on its QoS. Others have
focused on the estimation and tracking of the system state by means of control theoretic methods such as Kalman
filters. Yet another set of works have adopted models that describe the inner structure of the system modelled
and/or the architecture of the deployed software application. These models are typically queueing networks and
layered queueing networks. In the following sections we describe recent developments in each of these
directions.
The techniques used are closely related to the ones presented in Section 4.2, including statistical inference,
control-theoretic and queueing-based methods; for a brief review of the main techniques mentioned in the
following sections we refer the reader to Section 4.2.3. Furthermore, product-form queueing networks and
layered queueing networks are also used for design-time analyses. Research works focusing on design time are
discussed in MODAClouds deliverable D5.1 (Sections 5.2 and 7). Here the works focussing on run-time
problems are considered.
4.3.1. Statistical learning models
[Aga07] presents E2EProf, a toolkit capable of tracking the end-to-end behaviour of requests in a distributed
enterprise application, such as those that are commonly migrated to the cloud. The approach looks at network
packet traces to reconstruct non-intrusively the path of a high-level request across a distributed system. A time
series approach is utilized in which cross-correlation between events in the traces is used as a driver for inference
and establishing which software components have been utilized by a transaction. The authors report that the
system has been applied to production systems.
The works in [Coh04, Coh05] illustrate a methodology to predict correlation among system states based on Tree-
Augmented Networks, an efficient class of Bayesian networks. Given a monitoring trace, the approach involves
defining an ensemble of models that is continuously learned. Such models attempt to describe the probabilistic
law that puts in relation a QoS metric (e.g., CPU utilization, memory consumption, etc) with an SLO state of
compliance (service objective achieved, service objective violated). Scoring methods are used to select the best
submodel in the ensemble for estimating the SLO state over a moving window; the required sample size
can be obtained from a learning surface.
More recently, [Gam12] proposes runtime QoS models in which a controller maintains a Kriging model for each
target SLO. Kriging describes the correlation among the errors of a prediction model; it thus differs from
regression methods, which focus on providing a prediction under given modelling assumptions rather than a
description of the resulting error. The Kriging approach is based on radial basis functions, data interpolators
long used in pattern recognition. Essentially, they are useful in situations where errors are correlated, a
circumstance that regression models find more problematic to handle. Initial results of this approach indicate
that Kriging models can lead to controllers delivering very low, even negligible, SLO violation errors.
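Since Kriging predictors are built on radial basis functions, the mechanics can be illustrated with a plain Gaussian-RBF interpolator. This is a minimal sketch, not [Gam12]'s controller; the CPU-share/response-time data points and the shape parameter are invented for the example.

```python
import numpy as np

def rbf_fit(x, y, eps=3.0):
    """Fit a Gaussian radial-basis-function interpolant through (x, y)."""
    phi = np.exp(-(eps * (x[:, None] - x[None, :])) ** 2)
    return np.linalg.solve(phi, y)

def rbf_predict(x_train, w, x_new, eps=3.0):
    """Evaluate the fitted interpolant at new points x_new."""
    phi = np.exp(-(eps * (x_new[:, None] - x_train[None, :])) ** 2)
    return phi @ w

# Hypothetical SLO surface: response time (s) observed at a few CPU shares.
cpu = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
rt = np.array([2.5, 1.4, 0.9, 0.7, 0.6])
w = rbf_fit(cpu, rt)
pred = rbf_predict(cpu, w, np.array([0.5]))
```

Unlike a parametric regression, the interpolant passes exactly through the observed points, and a full Kriging model would additionally quantify the uncertainty between them.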
vPerfGuard [Xio13] is a controller capable of automatically identifying metrics that are predictive of application
performance and of adapting dynamically to changes in such metrics. Compared to other controllers, this approach
aims at identifying the most important metrics for prediction using a machine learning approach. Correlations
across metrics are considered for feature selection. Subsequently, modelling is performed using methods such as
linear regression, k-nearest neighbours (k-NN), regression trees, and boosting, which are compared for their
predictive capabilities. Similar methods are adopted, for example, in [Shi06], [Coh04], and [Des12], where a
runtime engine is proposed that features statistical learning, classification, regression, and adaptive re-learning.
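The correlation-driven feature-selection step followed by a simple regression fit can be sketched as follows. This is illustrative only: the metric names and the synthetic workload are assumptions, and vPerfGuard itself compares several model families rather than fitting just one.

```python
import numpy as np

def select_features(X, y, k):
    """Rank candidate metrics by |Pearson correlation| with the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(corr))[:k]

# Synthetic monitoring data: one informative metric, one irrelevant one.
rng = np.random.default_rng(1)
n = 300
cpu = rng.uniform(0, 1, n)            # informative metric
noise_metric = rng.uniform(0, 1, n)   # irrelevant metric
latency = 0.5 + 2.0 * cpu + rng.normal(0, 0.05, n)
X = np.column_stack([noise_metric, cpu])
top = select_features(X, latency, k=1)

# Fit a least-squares model on the selected metric only.
A = np.column_stack([np.ones(n), X[:, top[0]]])
coef, *_ = np.linalg.lstsq(A, latency, rcond=None)
```

Pruning the metric set before fitting keeps the model small and makes re-learning cheap when the workload shifts.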
IRONModel [The08] is a performance management system that maintains a model of a distributed system by
dynamically analyzing its traces and automatically discovering new correlations between performance metrics
and system attributes. The model is initially built by the system designers and incorporated in the system. The
underlying modelling approach is based on zero-training classification and regression trees (Z-CART). The
underlying models rely in part on operational analysis and bounding analysis laws developed in the context of
queueing theory; however, the approach combines these formulas in a machine learning framework. Compared to
other approaches in this section, IRONModel also features active probing to accelerate training.
Reinforcement learning has also been proposed as a method to build run-time QoS models [Tes05]. Although
these methods may provide good results without specifying an underlying traffic model, they also require
significant online training, which can be very expensive in production systems. To mitigate this, hybrid methods
[Tes06] have been considered, where the initial policy is provided by an analytic model and is afterwards
improved by solutions found by a reinforcement learning algorithm trained offline on previously collected
information.
[Tan12] introduces PREPARE, an online anomaly prediction and virtualization-based prevention system. Its
anomaly detection module consists of a 2-state Markov model to predict the future values of relevant attributes,
and a tree-augmented Bayesian network model to classify those future states between normal and abnormal. In
addition, it provides a module to determine the faulty VMs causing the anomaly, as well as an actuator module
that performs preventive actions to avoid SLO violation states.
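The 2-state Markov component of such a predictor can be sketched as follows. This is a minimal illustration with a synthetic trace, not PREPARE's implementation.

```python
import numpy as np

def fit_two_state_markov(states):
    """Estimate the 2x2 transition matrix of a binary (0/1) state sequence."""
    counts = np.zeros((2, 2))
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next(P, current, horizon=1):
    """Distribution over the two states `horizon` steps ahead."""
    dist = np.eye(2)[current]
    for _ in range(horizon):
        dist = dist @ P
    return dist

# Illustrative attribute trace discretized to normal (0) / abnormal (1);
# abnormal periods are "sticky", i.e. tend to persist once entered.
trace = [0] * 50 + [1] * 10 + [0] * 40
P = fit_two_state_markov(trace)
dist = predict_next(P, current=1, horizon=1)
```

In a full system the predicted future attribute values would then be fed to a classifier (such as the tree-augmented Bayesian network) to flag impending anomalies.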
Many of the statistical learning methods mentioned so far, such as [Tan12, Coh05, Dua09], rely on labelled
training data, i.e., data from the production system that includes monitoring metrics and annotations of whether
the system is violating an SLO or not. As such data are not readily available in most systems, [Dea12] introduces
an unsupervised learning algorithm that is able to predict anomalies in a virtualized data centre without the need
for training data. To this end, the authors rely on the Self-Organizing Map (SOM) method, which can describe
complex system behaviours at a smaller computational cost than other unsupervised methods.
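A minimal Self-Organizing Map fits in a few lines; the sketch below trains a tiny 1-D map on "normal" metric vectors and scores new samples by their distance to the best-matching unit. All data, sizes, and learning parameters are illustrative (real SOMs typically decay the learning rate and neighbourhood width), and [Dea12]'s actual system differs.

```python
import numpy as np

def train_som(data, n_units=4, epochs=50, lr=0.5, sigma=1.0, seed=0):
    """Train a 1-D self-organizing map on metric vectors (rows of `data`)."""
    rng = np.random.default_rng(seed)
    w = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    idx = np.arange(n_units)
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((w - x) ** 2).sum(axis=1))   # best-matching unit
            h = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))
            w += lr * h[:, None] * (x - w)                # pull neighbourhood
    return w

def anomaly_score(w, x):
    """Distance to the best-matching unit: high means unusual behaviour."""
    return float(np.sqrt(((w - x) ** 2).sum(axis=1).min()))

# Normal operation clusters around (0.3 CPU, 0.4 memory); score a spike.
rng = np.random.default_rng(2)
normal = rng.normal([0.3, 0.4], 0.02, size=(200, 2))
w = train_som(normal)
s_normal = anomaly_score(w, np.array([0.31, 0.39]))
s_spike = anomaly_score(w, np.array([0.95, 0.95]))
```

Because the map is learned from unlabelled traces, no SLO-violation annotations are needed, which is the key advantage argued in [Dea12].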
[Mal11] employs a multi-model for n-tier application control, where an empirical model learns the best decisions
for each possible configuration and workload. As initially the system under control has no logs to learn from, the
decisions are taken based on another model, in this case a Horizontal Scale Model. Once a decision is known for
a given configuration, the empirical model takes over and applies the decision already known as the best for that
configuration. Although initially proposed as relying on the Horizontal Scale Model, this meta-model can actually
operate with any of the models proposed in this or the following sections.
4.3.2. Control theory models
Some works have instead attempted to use modelling techniques based on control-theory, such as Kalman filters
[Kal09] and Linear parametrically varying (LPV) models [Tan10]. Control theory has also provided a framework
to analyse the behaviour of policies for autonomic control. [Dut10] uses this framework to analyse the challenges
of threshold-based and reinforcement learning approaches, considering aspects that affect the stability of an
autonomic system, such as the latency and power of the controller, and oscillations in the input variables.
Kalman filters have been applied to control resource consumption in runtime web applications in works such as
[Zhe05], [Kal09]. Here we discuss the underlying resource consumption models. [Zhe05] uses a modelling
methodology based on layered queueing models, which are reviewed in Section 4.3.4. Conversely, [Kal09]
illustrates the application of feedback-loop models used in control theory to distributed systems. It proposes
three Kalman filters to model the dynamics of a software application and applies them to the control problem
showing good accuracy. The filters are respectively based on a Single-Input Single-Output (SISO) model
relating input workload and CPU utilization, a Multiple-Input Multiple-Output (MIMO) model relating the
covariances between VM utilizations, and an adaptive, self-configuring version of the latter.
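The SISO case can be sketched as a scalar Kalman filter that tracks the per-request CPU cost in an observation model util = c * lam + noise. This is a simplified illustration under assumed random-walk dynamics and invented noise levels, not the filters of [Kal09].

```python
import numpy as np

def kalman_track(lam, util, q=1e-6, r=1e-4):
    """Scalar Kalman filter tracking the per-request CPU cost c in the
    SISO observation model util = c * lam + noise; c is a random walk."""
    c, P = 0.0, 1.0                     # state estimate and its variance
    for l, u in zip(lam, util):
        P += q                          # predict: random-walk process noise
        K = P * l / (l * P * l + r)     # Kalman gain for observation u = l*c
        c += K * (u - l * c)            # correct with the measurement
        P *= (1.0 - K * l)
    return c

# Illustrative data: true cost of 12 ms of CPU per request.
rng = np.random.default_rng(3)
lam = rng.uniform(10.0, 50.0, 500)                # request rate (req/s)
util = 0.012 * lam + rng.normal(0.0, 0.01, 500)   # measured CPU utilization
c_hat = kalman_track(lam, util)
```

Because the parameter is modelled as a random walk, the same filter keeps tracking the cost if it drifts at run time, which is what makes the approach attractive for feedback control.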
LPV models are a class of control-theoretic methods that describe the dynamics of a complex system in terms of
an input and a set of so-called scheduling variables, which describe the operational condition of the system
[Lee99], [Nem95], [Lov98], [Bam99], [Ver02]. An LPV model is linear in the parameters, and the vector of
scheduling variables enters the system matrices in an affine or linear-fractional way. Both single-input
single-output (SISO) and multiple-input multiple-output (MIMO) state-space LPV models have been considered
in the literature. For example, LPV methods have been investigated in [Ver02], [Van09]
and their performance assessed on experimental data measured on a custom implementation of a workload
generator and micro-benchmarking Web service applications. The results show that the LPV framework is very
general, since it allows describing the performance of an IT system by exploiting all of the available technical
parameters to manage QoS.
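A minimal scalar example shows the affine parameter dependence: the pole of the system moves with the scheduling variable, so the same input drives very different steady states. The values are illustrative and not taken from the cited works.

```python
import numpy as np

def simulate_lpv(a0, a1, b, u, p):
    """Simulate a scalar LPV system x[k+1] = (a0 + p[k]*a1) * x[k] + b*u[k],
    where p[k] is the scheduling variable (e.g., an operating-regime index)."""
    x = np.zeros(len(u) + 1)
    for k in range(len(u)):
        x[k + 1] = (a0 + p[k] * a1) * x[k] + b * u[k]
    return x

n = 200
u = np.ones(n)              # constant workload input
p_light = np.zeros(n)       # light-load regime: pole a0 = 0.5
p_heavy = np.ones(n)        # heavy-load regime: pole a0 + a1 = 0.9
x_light = simulate_lpv(0.5, 0.4, 1.0, u, p_light)
x_heavy = simulate_lpv(0.5, 0.4, 1.0, u, p_heavy)
```

In the light regime the state settles at b*u/(1-0.5) = 2, in the heavy regime at b*u/(1-0.9) = 10: a single linear model could not capture both behaviours, which is the point of scheduling-variable dependence.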
[Tan10] introduces an LPV model to identify the dynamics of a web service, and defines an optimal control
problem based on this model. The solution of this optimal control problem is then used to define an optimal
policy to manage the trade-off that arises between the QoS guarantees and the energy consumption. In [Gia11]
the stability properties of an LPV-based proportional controller are analysed. The controller is designed for
admission control in web services and the LPV model is used to design the controller.
[Lim10] proposes a proportional threshold control for elastic storage in cloud platforms. The controller explicitly
considers the resources as discrete quantities, which is in line with per-instance pricing in platforms such as EC2.
The controller also considers the actuator lag generated by the delay of redistributing data to new storage servers.
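The flavour of such a controller can be sketched as a single discrete-resource control step: act only when measured utilization leaves a band around the target, and resize the integer pool so that utilization returns to the target. This is a simplified rendition of the idea, not [Lim10]'s actual policy; the target, band, and pool sizes are invented.

```python
import math

def threshold_scale(n, util, target=0.6, band=0.1):
    """One control step for a discrete pool of storage nodes.
    Outside the band [target-band, target+band], resize the pool
    proportionally (n * util / target), rounded to a whole node."""
    if util > target + band or util < target - band:
        return max(1, round(n * util / target))
    return n

n_out = threshold_scale(4, util=0.9)    # overload: scale out
n_in = threshold_scale(6, util=0.3)     # underload: scale in
n_hold = threshold_scale(4, util=0.65)  # inside the band: do nothing
```

Keeping a dead band avoids oscillating around the target, which matters when each actuation (adding or draining a storage node) is slow and costly.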
Other approaches include the use of fuzzy logic [Xu07] to design a two-level controller for resource allocation in
a virtualized datacentre. Fuzzy logic differs from Boolean logic in that the membership of an element in a set is
not restricted to 0 or 1, but can be any real number in the interval [0,1]. With this generalization, [Xu07]
proposes a model that learns the relationship between workload and resource demand for a given QoS level.
From this model, inference functions are derived that determine the appropriate resource allocation for a given
workload.
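The basic machinery of fuzzy membership functions and a defuzzified control output can be sketched as follows. This is a toy example with invented membership shapes and rules, not [Xu07]'s controller.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), peaking at 1 in b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def allocation_delta(util):
    """Defuzzified resource adjustment (in node units) for a CPU
    utilization reading, using three fuzzy sets and three rules."""
    low = triangular(util, -0.01, 0.0, 0.5)
    ok = triangular(util, 0.3, 0.6, 0.9)
    high = triangular(util, 0.5, 1.0, 1.01)
    # Rule outputs: low -> release 1 unit, ok -> 0, high -> add 1 unit;
    # combine by the weighted average (centre of gravity) of fired rules.
    num = -1.0 * low + 0.0 * ok + 1.0 * high
    den = low + ok + high
    return num / den if den else 0.0

d = allocation_delta(0.95)   # strongly overloaded: add a full unit
```

Because the sets overlap, intermediate readings fire several rules at once and yield graded, rather than all-or-nothing, allocation changes.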
4.3.3. Product-form queueing networks
Queueing networks have been among the first methods used for runtime control of software systems. Their
distinguishing feature compared to the models described above is their ability to consider white-box information
about a system in the runtime prediction. Often, this does not imply major limitations from a computational point
of view, since efficient iterative algorithms and fluid methods exist to approximate the solution of such models
in a short amount of time. Recently, [Cas08] shows how such models can be applied to integrate a more realistic
description of the application workloads, including burstiness and fluctuations in the surrounding operating
environment (e.g., network bandwidth fluctuations in the cloud).
Early work in [Men03, Ben04] focuses on e-commerce sites and shows how queueing network models can be
used with combinatorial search techniques to determine an optimal system configuration; periodic execution
allows adaptation at runtime. Variants of such models have subsequently been studied in works such as [Ben05,
Men07, Men05] in various application areas, including data centres. Urgaonkar et al. were able to validate a
basic product-form queueing network for the RUBiS and RUBBoS [Rub] open-source multi-tier benchmark
applications [Urg05]. They also considered various non-product-form extensions to the model to better account
for several important features of the applications under study, e.g., an imbalance of load across multiple
application servers. Chen et al. represent the TPC-W [Tpc] and RUBiS benchmark multi-tier applications as
multi-station queues, where the multiplicity refers to the number of server processes in each tier [Che08]. They
use an approximation [Sei87] that transforms a multi-station queueing network model into an equivalent single-
station product-form queueing network model which can be solved using MVA. Lu et al. used simple product-form
models in conjunction with a feedback controller to perform runtime optimization of a single-tiered Apache
Web server system [Lu03]. [Zha07] presents a queueing network model where each queue represents a tier of a
web application, which is parameterized by means of a regression-based approximation of the CPU demand of
customer transactions. It is shown that such an approximation is effective for modeling different types of
workloads whose transaction mix changes over time.
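The exact MVA algorithm referenced above is short enough to sketch for a closed network of single-server stations; the two-tier service demands and population below are illustrative.

```python
def mva(demands, think_time, n_users):
    """Exact Mean Value Analysis for a closed, product-form queueing
    network of single-server stations; returns (response_time, throughput).
    `demands` are per-visit service demands in seconds."""
    q = [0.0] * len(demands)                 # mean queue length per station
    resp, x = 0.0, 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving job sees the (n-1)-job network.
        r = [d * (1.0 + qi) for d, qi in zip(demands, q)]
        resp = sum(r)
        x = n / (resp + think_time)          # throughput via Little's law
        q = [x * ri for ri in r]
    return resp, x

# Illustrative two-tier model: 40 ms at the app server, 20 ms at the DB,
# 1 s of user think time, 50 concurrent users.
resp, x = mva([0.04, 0.02], think_time=1.0, n_users=50)
```

The recursion over population sizes is what makes exact MVA linear in the number of users, and hence cheap enough for periodic re-evaluation at run time.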
4.3.4. Layered queueing networks
The main limitation of ordinary queueing network models is that they describe the resource consumption
mechanisms of the software, but they do not explicitly take into account known information about the software
architecture. Layered queueing models (LQM) [Rol95, Woo95] are an extension to queueing networks that
allows the representation of computer systems composed of several layers of software servers that share
hardware resources, and have therefore been extensively applied to software system research. LQMs were
developed starting in the 1980s to consider the performance impact of contention for software resources, e.g.
server threads, and the interactions between software entities at various system layers, e.g., messaging between
an application server and a database server. The approach decomposes an LQM into a hierarchy of queueing
network models. Each model in the hierarchy is solved using approximate mean value analysis, and the solution
process is repeated until the individual estimates of the models are all mutually consistent.
Approximate mean value analysis [Cha82, Cre02] is a technique that allows queueing network models to be
solved iteratively in a very efficient manner thereby permitting the study of larger systems and the solution of
models at runtime. However, the technique relies on product-form assumptions which restrict its applicability. In
particular, behaviour commonly observed in complex enterprise systems such as contention for software
resources, synchronous and asynchronous request-reply relationships between software entities, and priority
based resource access all violate product-form assumptions [Alt06].
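A widely used approximate MVA scheme is the Bard-Schweitzer fixed point, which replaces the exact recursion over population sizes with an iteration whose cost is independent of the number of users; this is part of what makes run-time use attractive. The sketch below handles single-server stations with illustrative parameters.

```python
def amva(demands, think_time, n_users, tol=1e-10, max_iter=100000):
    """Bard-Schweitzer approximate MVA for a closed network of
    single-server stations; returns (response_time, throughput)."""
    k = len(demands)
    q = [n_users / k] * k                    # initial queue-length guess
    for _ in range(max_iter):
        # Schweitzer approximation: an arriving job sees (N-1)/N
        # of the time-averaged queue at each station.
        r = [d * (1.0 + (n_users - 1) / n_users * qi)
             for d, qi in zip(demands, q)]
        x = n_users / (sum(r) + think_time)
        q_new = [x * ri for ri in r]
        if max(abs(a - b) for a, b in zip(q_new, q)) < tol:
            break
        q = q_new
    return sum(r), x

resp, x = amva([0.04, 0.02], think_time=1.0, n_users=50)
```

The result is close to exact MVA for this kind of model while requiring only a handful of floating-point operations per iteration, regardless of population size.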
As mentioned in Section 4.3.2, [Zhe05] uses a modelling methodology based on layered queueing models,
together with an extended Kalman filter for parameter estimation. They considered a time-varying web
application, which is modelled as an LQM due to the interdependencies of its components (web server,
database). Parameters such as the clients' think time and the CPU and disk demands vary with time, and their
values are estimated by the Kalman filter. With these estimated values, the LQM is parameterized and (SLA-
driven) performance results are obtained. These results can then be used by an autonomic controller to make
decisions regarding resource allocation to prevent SLA violations. Other works in this area include [Lit05,
Woo05].
[Jun09] presents a runtime adaptation engine that allows the automatic reconfiguration of multi-tier web
applications. The engine first evaluates the potential benefits of a reconfiguration based on an LQM and its
associated costs. Based on these the engine chooses the optimal sequence of reconfigurations to be applied on
the web application. The engine is evaluated with the RUBiS benchmark [Rub] and shows a significant
reduction in SLA violations. This has been extended in [Jun10] to consider power costs, including those caused
by the transient behaviour generated by the reconfiguration.
4.3.5. Summary
From the previous discussion, it is clear that the area of QoS runtime modeling has received significant interest
in the last decade. The contributions reviewed in this section are summarized in Table 4.3.a below. When
considering the different options available for QoS runtime models, it is important to note that the selection
of the modelling technique is closely related to the information available to the performance analyst.
Statistical learning and most of the control theory-based methods assume a black-box approach, where little or
nothing is known about the application inner workings and architecture. As a result, these methods may be able
to capture the result of a reconfiguration that has been considered in the past, but may not be able to adequately
predict the performance implications of a completely new configuration. On the other hand, methods relying on
queueing networks and layered queueing networks consider more information about the specifics of the
application, and can therefore better predict the results of a new configuration. However, this additional
information may not always be available, especially for the owners of the cloud infrastructure, for whom the
application may indeed be a black box. Some methods, such as [Zhe05], actually combine both approaches, using
control-theory models to parameterize layered queueing networks that describe the underlying architecture of the
application.
Table 4.3.a. Summary of run-time modelling methods

Statistical learning: [Aga07] [Coh04] [Coh05] [Gam12] [Xio13] [SBC06] [CCGTS04] [DWSPV12] [The08] [Tes05] [Tes06] [Tan12] [Dua09] [Dea12] [Mal11]
Control theory: [Kal09] [Tan10] [Dut10] [Zhe05] [Son12] [Vaq08] [Gia11] [Lim10] [Xu07]
Queueing networks: [Cas08] [Men03] [Ben04] [Ben05] [Men07] [Men05] [Urg05] [Che08] [Sei87] [Lu03] [Zha07]
Layered queueing networks: [Cha82] [Cre02] [Alt06] [Zhe05] [Lit05] [Woo05] [Jun09] [Jun10]
4.4. SLA Management
Many solutions have been proposed for the management of Cloud services at run-time, each seeking to meet
application requirements while controlling the underlying infrastructure. Five main problem areas have been
considered in the design of resource management policies: 1) provider selection, 2) application/VM placement,
3) admission control, 4) capacity allocation, and 5) load balancing.
The following discussion examines how these problems are addressed and classifies the approaches according to
theoretical and applied criteria, in line with the related research developed by the scientific community. Figure
4.4.a summarizes the classification criteria we adopt, which will be examined in detail in the next three
subsections. A similar approach is followed in the state-of-the-art review presented in Deliverable D5.1 which,
unlike this document focusing on run-time techniques, surveys the literature on design-time approaches to
Cloud-related problems.
Figure 4.4.a. Classification criteria for SLA run-time management solutions.
4.4.1. Problem
The first category we consider is related to the problem the approaches aim to solve in the real world; every
approach tries to achieve a certain goal in a specific context. As a first classification of the literature we consider
the perspective, i.e., the actor optimizing the use of resources. Many proposals take the perspective of the Cloud
provider, whose goal is to determine the optimal configuration of the underlying infrastructure in order to satisfy
incoming requests from the end-users while minimizing some cost metric (e.g., energy). In the opposite
perspective, the actor involved in resource management optimization is the Cloud end-user, who performs Cloud
resource allocation according to application needs, minimizing the cost of use of Cloud resources. This latter
approach is the one that will be pursued within the MODAClouds project.
Most approaches aim to minimize costs; others seek to ensure high performance or high availability of the
system, and some aim to guarantee these goals simultaneously. Since the nature and the architecture of a system
are difficult concepts to define, it is useful to categorize quantifiable quality attributes such as performance, cost,
availability, reliability, safety, security, and energy consumption.
Furthermore, the set of optimized quality attributes can be aggregated into a single mathematical function or
decoupled into conflicting objectives: the first case optimizes a single quality attribute only (single-objective
optimization, SOO), while the second optimizes multiple quality attributes at once (multi-objective optimization,
MOO). Often, for a nontrivial multi-objective optimization problem, no single solution simultaneously optimizes
every objective; in that case the objective functions are said to be conflicting, and there exists a (possibly
infinite) set of Pareto-optimal solutions. Some approaches encode priority criteria by collapsing the MOO
problem into a single weighted mathematical function (multi-objective weighted, MOW), while others use
specifically designed functions.
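The MOW scalarization can be illustrated in a few lines: with two conflicting objectives, different weight choices select different Pareto-optimal configurations. The candidate configurations and weights below are invented for the example.

```python
def weighted_sum(costs, perf, w_cost, w_perf):
    """Scalarize two conflicting objectives (minimize cost, maximize
    performance) into a single score to be minimized, as in MOW."""
    return [w_cost * c - w_perf * p for c, p in zip(costs, perf)]

# Three hypothetical configurations: (hourly cost, throughput score).
costs = [1.0, 2.0, 4.0]
perf = [10.0, 18.0, 20.0]

scores_balanced = weighted_sum(costs, perf, 1.0, 0.2)
best_balanced = scores_balanced.index(min(scores_balanced))

scores_frugal = weighted_sum(costs, perf, 1.0, 0.05)
best_frugal = scores_frugal.index(min(scores_frugal))
```

With a higher weight on performance the mid-range configuration wins; with a frugal weighting the cheapest one does, showing how the weight vector picks a point on the Pareto front.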
Besides the dimensionality, each problem can be further characterized by the quality constraints that represent
additional attributes or other system properties. Constraints include structural constraints and performance
constraints, such as a minimum throughput for the applications, available memory, limits on the overall resource
costs, a fixed budget for the energy costs of the infrastructure, or a response time constraint. In some cases no
constraints are present.
4.4.2. Solution
The problems faced at run-time can be further analyzed on the basis of the solution category. We classify the
approaches according to how they achieve the optimization goal and thus describe the main steps of the
optimization process. First, solutions can be classified as centralized or distributed, according to the framework
and to the interplay between the system components; alternatively, there are hierarchical solutions, where the
resources are managed by introducing multiple decision points (e.g., a high-level controller assigns applications
to clusters of physical servers, while a second-layer controller determines the optimal capacity allocation among
applications within the same cluster).
Within each problem, the solution is characterized by the Decision Variables (DVs) available (e.g., provider
selection, application placement, capacity allocation, load balancing, admission control). In other words, the DVs
indicate which changes to the system are considered by the underlying optimization process.
Furthermore, approaches can be characterized according to the representation of the system under study. Firstly,
the architecture representation classifies solutions based on the information used to describe the problem
structure and configuration: according to the input required, the representation can be an architectural model,
UML (Unified Modeling Language), an ADL (Architecture Description Language), or an optimization model
(linear or non-linear). Secondly, concerning the solution technique, two main categories of optimization
strategies can be identified: those using exact methods and those providing approximate solutions. Exact
methods can be standard (such as branch-and-bound or dynamic programming) or problem-specific. Among
approximate ones, heuristic methods require problem- or domain-specific information to perform the search,
while meta-heuristic methods apply high-level search strategies; the latter may exploit, for example, local search,
Evolutionary Algorithms such as Genetic Algorithms, Simulated Annealing, or bio-inspired methods.
Another characteristic that differentiates the various search and solution methods is constraint handling, which
describes the strategies used to handle constraints; more precisely, this category distinguishes whether they are
treated as hard constraints or as soft constraints with associated penalties.
Finally, solutions are classified according to the time scale used which can range from a daily or hourly scale up
to the granularity of minutes, in some cases even seconds.
4.4.3. Discipline
Finally, the techniques used to solve these run-time service management problems take advantage of various
disciplines, ranging from mathematics to computer science. Among the most used are control-theory methods,
machine learning, and utility-based methods, which combine performance models and optimization models. For
a detailed discussion and analysis of the disciplines see also [Ard12c, Ard08].
Furthermore, as an orthogonal classification, we can distinguish between pure optimization and game-theory-
based approaches. In pure optimization approaches, a single actor optimizes his own goals, with various
techniques and objectives, without interacting with other actors. Vice versa, in game-theory approaches the
interaction across different actors is non-negligible and, while pursuing his own goal, each actor (e.g., a cloud
end-user) can be affected by the actions of other actors (e.g., other end-users of the same cloud provider), not
only by his own actions.
4.4.4. State of the art
The next two sections present some of the most significant works carried out in the last few years on Cloud
service SLA management. First, pure optimization approaches are discussed (see Section 4.4.c), and later game-
theory-based solutions are considered.
Pure optimization approaches
The literature has been reviewed according to the complex taxonomy depicted in Figure 4.4.a. Many categories
and sub-categories have been considered. Tables 4.4.a to 4.4.l represent a useful and direct way to partition the
state-of-the-art literature from a specific point of view.
In what follows, a brief description of the most important works published in the last few years is presented. The
papers are grouped according to the Decision Variables category. Notice that, although many works could appear
several times because they involve several decision variables, we report each only once; the other decision
variables are nevertheless mentioned.
Provider Selection
The works listed below have in common the fact that the methods they propose consider the selection of a
different provider at run-time.
In [Dut12] the authors present SmartScale, an autoscaling framework that uses a combination of vertical (adding
more resources to existing VM instances) and horizontal (adding more VM instances) scaling mechanisms,
together with the selection of the most suitable provider. This method ensures that each application is scaled so
as to optimize both resource usage and the reconfiguration costs incurred by the scaling process itself.
In a similar way, [Xia12] describes the implementation of a system that provides automatic scaling for Internet
applications. Each application is encapsulated in a single VM, and the system scales up and down, minimizing
costs and energy consumption and maximizing throughput, while also deciding application placement and load
distribution by means of a colour-set algorithm.
Finally, [Xio11] addresses the twofold challenge of minimizing the total amount of resources while meeting the
end-to-end performance requirements of N-tier web applications. Open and closed workloads are considered as
input for an adaptive PI controller, and an SLA-based control method leads to an exact solution minimizing the
average response time.
Application placement
The application placement problem, together with the dynamic resource allocation problem, is addressed and
optimally solved in [Had12], where a minimum-cost maximum-flow algorithm is proposed. The solution is based
on a bin-packing algorithm combined with a prediction mechanism.
An opportunistic scheduling approach, instead, is proposed in [He12], where parallel tasks are considered and
low-priority tasks are allocated to underutilized computation resources left by high-priority tasks. A model
representing tasks as ON/OFF Markov chains is presented.
In [Cap10], the SOS Cloud project is presented. The project aims at providing robust and scalable solutions for
service deployment and resource provisioning in a cloud infrastructure. The project has a double objective:
meeting the service level agreement and minimizing the required cloud resources. The algorithms developed
have the additional benefit of taking advantage of cloud elasticity, allocating and deallocating resources to help
the services respect their contractual SLAs.
Lastly, a bio-inspired cost minimization mechanism for data-intensive service provision is proposed in [Wan12].
The mechanism uses some bio-inspired concepts and mechanisms to manage data application services, to create
a large services cluster and to produce optimal composition solutions. The authors propose a multi-objective
genetic algorithm capable of returning a set of Pareto-optimal solutions.
Capacity allocation
As far as the capacity allocation decision variable is concerned, the literature abounds with works considering it
as part of the proposed solution.
In [Bjo12], for instance, the authors discuss an opportunistic service replication policy that leverages the VM
workload and performance variability, as well as on-demand billing pricing models to ensure response time
constraints, while achieving a target system utilization for the underlying resources.
Alternatively, one can mention [Gou11], where a Force-directed Resource Assignment (FRA) heuristic is used to
optimize the total expected profit obtained from processing, memory, and communication resources. Moreover,
the results of the proposed approach are compared with those attained by relaxing the capacity constraints, which
represent upper bounds for the original problem.
Furthermore, in [Zam12] the authors show the current limitations of Cloud computing providers that allocate
their VMs with off-line mechanisms based on fixed prices or auctions. Improvements are demonstrated by
implementing an on-line mechanism that aims at maximizing the profit of each provider.
A model for applying revenue management to on-demand IT services is presented in [Liu10]. The model uses a
nonlinear objective function to determine the optimal price over different system capacities and multiple
customer classes with different SLAs.
In [Lin12] a branch-and-bound approach together with an adjusting recursive procedure are proposed to evaluate
and maximize the reliability of a computer network in a Cloud Computing environment; the algorithm devised as
solution considers budget, time and stochastic capacity constraints.
Similarly, the problem of minimizing the use of resources and meeting, at the same time, performance
requirements under a certain financial budget and time constraints, has been investigated in [Tia11] for
MapReduce applications.
Load Balancing
In [Ard12b] the authors take the perspective of a Web service provider offering multiple transactional Web
services. They provide a non-linear model of the capacity allocation and load redirection problem for multiple
request classes, which is solved with decomposition techniques exploiting predictive models of the incoming
workload at each physical site. A heuristic solution method for the same problem is instead presented in
[Ard11b].
The decentralized load balancing problem, as opposed to the traditional centralized version, has also been the
subject of recent works. [Ala09] proposes a decentralized load-balancing mechanism that considers
heterogeneous resources; server state information is exchanged so as to minimize the communication overhead
required by a decentralized approach. A bio-inspired algorithm for the load balancing problem is discussed in
[Val11], which investigates an alternative for a decentralized service network, based on an unstructured overlay
network, in which the nodes that host instances of many different service types self-organize into virtual clusters.
The authors present a framework focusing on the load balancing problem, because nodes must be able to
efficiently balance the incoming requests among themselves. The proposed approach combines and exploits the
synergies between the clustering technique and super-peer topologies. Moreover, it inherits the typical benefits
of bio-inspired self-organization, such as scalability with respect to the number of peers, and dynamism and
robustness with respect to unexpected behaviour.
Admission control
In [Wu12], cost-effective admission control and scheduling algorithms for SaaS providers are proposed in order
to maximize profits while improving customer satisfaction levels.
In [Kon12], instead, a probabilistic approach aims to test admission control and to find the optimal allocation of
VMs on physical servers; the multi-objective weighted function incorporates business rules in terms of trust and
cost, and it is associated with constraints representing real factors that compromise the Cloud services, including the provider selection, the variable number of users over time and different workload patterns.
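The admission decision described above can be sketched as a simple profit test. The rule below is an illustrative assumption, loosely in the spirit of the cost-effective admission control of [Wu12], not the paper's actual algorithm; all field names are hypothetical.

```python
# Hypothetical profit-aware admission test, loosely in the spirit of
# the cost-effective admission control of [Wu12]; field names and the
# decision rule are illustrative, not the paper's actual algorithm.

def admit(request, free_capacity):
    """Accept a request only if it fits the remaining capacity and its
    revenue covers resource cost plus the expected SLA-penalty risk."""
    if request["vms"] > free_capacity:
        return False                      # would overload the servers
    cost = request["vms"] * request["hours"] * request["vm_price"]
    expected_penalty = request["penalty"] * request["violation_prob"]
    return request["revenue"] > cost + expected_penalty
```

The expected-penalty term is what makes such policies "customer aware": a request that is profitable on paper is still rejected when the risk of violating its SLA erodes the margin.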
Game Theory approaches
Game theory has found applications in numerous fields such as Economics, Social Science, Political Science and Evolutionary Biology. Over the last few years this branch of applied mathematics has also found applications in problems arising in the ICT industry. For example, resource or QoS allocation problems, pricing and load shedding cannot always be handled with classical pure optimization approaches. Indeed, in a general complex system the interaction across different players is non-negligible: each player can be affected by the actions of all players, not only by his own. Non-cooperative Game Theory tools can reproduce this aspect perfectly. In this setting, a natural modeling framework involves seeking an equilibrium, or stable operating point, for the system.
More precisely, non-cooperative Game Theory is the study of problems of conflict and cooperation among
multiple independent decision-makers, which means the study of the ways in which strategic interactions among
economic agents produce outcomes with respect to the preferences (or utilities) of those agents, where the
outcomes in question might have been intended by none of the agents. Each agent pursues his own interests
MODAClouds
MOdel-Driven Approach for design and execution of applications on multiple Clouds Deliverable # D6.1
Public Final Version 1.0, March 29th 2013
working independently and without assuming anything about what other players are doing. Moreover, he has to
follow certain rules while making his choices and each agent is supposed to behave rationally.
In the language of Game Theory, rationality implies that every player is motivated by maximizing his own utility (or payoff), irrespective of what other players are doing.
Given a game, which strategies will the rational players adopt? Intuitively, a player pursues the case in which his payoff is maximized. Since the payoff function depends also on the strategies of the other players, which in turn are maximizing their own payoffs, a conflict situation is created and it is not easy to characterize the best choice for every player. In other words, when rational players correctly forecast the strategies of their opponents, they are not merely playing best responses to their beliefs about their opponents' play; they are playing best responses to the actual play of their opponents. Indeed, the notion of a solution is more tenuous in game theory than in other fields; it concerns optimality, feasibility and equilibria.
In the fifties a solution concept, due to John Forbes Nash (see [74]), emerged as the most appropriate and effective: when all players correctly forecast their opponents' strategies, and play best responses to these forecasts, the resulting strategy profile is a Nash equilibrium.
Formally, a non-cooperative game Γ in strategic form is a tuple Γ = {N, {X_i}_{i∈N}, {θ_i}_{i∈N}} that consists of:
- a finite set of players N = {1, 2, ..., n}, where n ∈ ℕ;
- a set of strategies X_i for every player i ∈ N, which is also called the feasible set of player i;
- a payoff function θ_i : X_1 × X_2 × ⋯ × X_n → ℝ for each player i ∈ N.
Moreover, we indicate with X = X_1 × X_2 × ⋯ × X_n ⊆ ℝ^M the common strategy set, called the feasible set or strategy space of the game Γ; every point x ∈ X represents a feasible strategy profile of the game. Let us denote with x_{-i} the vector of all the players' variables except the i-th one, x_{-i} = (x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n), so that we can write x = (x_i, x_{-i}). A vector x ∈ X is called a Nash equilibrium for the game if
    θ_i(x) ≥ θ_i(y_i, x_{-i})   for all y_i ∈ X_i and for all i ∈ N.
Equivalently, x is a Nash equilibrium if and only if, for every i ∈ N, x_i solves the maximization problem
    max θ_i(y_i, x_{-i})   s.t. y_i ∈ X_i,
i.e., if and only if no player can improve his payoff function by unilaterally changing his strategy.
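For finite games, the equilibrium condition above can be checked directly by enumerating unilateral deviations. The sketch below is a brute-force illustration of the definition (function and variable names are our own, not from the cited literature):

```python
from itertools import product

# Direct check of the Nash condition: a profile x is an equilibrium
# iff no player i can raise theta_i by deviating unilaterally within
# X_i. Brute force, so only suitable for small finite games.

def is_nash(strategy_sets, payoff, profile):
    """strategy_sets -- list of finite strategy sets X_i
    payoff        -- payoff(i, profile) -> theta_i(profile)
    profile       -- tuple x = (x_1, ..., x_n)"""
    for i, X_i in enumerate(strategy_sets):
        current = payoff(i, profile)
        for deviation in X_i:
            alt = profile[:i] + (deviation,) + profile[i + 1:]
            if payoff(i, alt) > current:
                return False      # profitable unilateral deviation
    return True

def nash_equilibria(strategy_sets, payoff):
    """Enumerate all pure-strategy Nash equilibria."""
    return [x for x in product(*strategy_sets)
            if is_nash(strategy_sets, payoff, x)]
```

For instance, in a prisoner's-dilemma game with payoffs (3,3), (0,5), (5,0) and (1,1) for the profiles (C,C), (C,D), (D,C) and (D,D), the only pure-strategy equilibrium found is mutual defection, even though both players would prefer (C,C).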
Many approaches have been used to represent, model and manage Cloud services at run-time through Game
Theory tools. In [Fen11] the authors present a methodical, in-depth game theory study on price competition, moving
progressively from a monopoly market to a duopoly market, and finally to an oligopoly Cloud market. They
characterize the nature of non-cooperative competition in a Cloud market with multiple competing Cloud service
providers, derive algorithms that represent the influence of resource capacity and operating costs on the solution
and they prove the existence of a Nash equilibrium. On the dynamics of the market, a model of competitive
equilibrium in e-commerce to solve the problem of pricing and outsourcing can be found in [Dub07]; here the
analysis of pricing choices and decisions to outsource IT capability leads to a representation of the Internet
competition and extracts the maximum profit solution. Studies of the maximization of the social welfare as a
long-term social utility are discussed in [Men11]. Considering relevant queuing aspects in a centralized setting,
under appropriate convexity assumptions on the operating costs and individual utilities, the work established
existence and uniqueness of the social optimum. Furthermore, other studies based on non-cooperative game theory are presented in [Wan12], where the authors employ a bidding model to solve the resource allocation problem
in virtualized servers with multiple instances competing for resources. A unique equilibrium point is obtained. A
similar discussion can be found in [Wei10] where a QoS constrained parallel tasks resource allocation problem is
considered.
[Abh12] considers two simple pricing schemes for selling Cloud instances and studies the trade-off between
them. Exploiting Bayes Nash equilibrium the authors provide theoretical and simulation based evidence
suggesting that fixed prices generate a higher expected revenue than hybrid systems.
Using Bellman equations and a dynamic bidding policy, in [Zaf12] an optimal strategy under a Markov spot-price model is found in order to complete jobs with deadline and availability constraints. The performance of the model is evaluated by considering uniformly distributed spot prices and EC2 spot prices. Another work on spot bidding is [Son12], where the authors propose a profit-aware dynamic bidding algorithm,
which observes the current spot price and selects bids adaptively to maximize the average profit of a Cloud service broker while minimizing its costs in a spot instance market.
Finally, a Generalized Nash game for the service provisioning problem has been formulated in [Ard12] and [Ard11], where the perspective of SaaS providers hosting their applications at an IaaS provider is taken. Each SaaS provider needs to comply with end-user applications' SLAs and, at the same time, maximize its own revenue while minimizing the cost of the resources supplied by the IaaS. On the other hand, the IaaS provider wants to maximize the revenues obtained by providing on-spot resources.
4.4.5. Summary tables
A summary of the classification proposed here is reported in Tables 4.4.a to 4.4.l.
Tables 4.4.a to 4.4.d relate to the Problem category, while the Solution category is detailed in Tables 4.4.e to 4.4.j. Finally, Tables 4.4.k and 4.4.l represent the Discipline category, following the classification depicted in Figure 4.4.a.
Problem
The first table, Table 4.4.a, represents the partitioning of the reviewed literature according to the Perspective
sub-category. Each piece of literature can face a specific problem from two distinct points of view, focusing
either on the Cloud provider or on the Cloud end-user.
The Quality Attributes are summarized in Table 4.4.b. Four specific attributes are considered (Performance,
Cost, Availability and Reliability). Other, less common attributes are grouped under the label of Others. As is
clearly shown, the vast majority of the reviewed papers deals with the Performance, Cost and Availability
attributes.
Table 4.4.c, instead, considers the Dimensionality sub-category. It classifies the considered approaches in single-
objective (SOO) and multi-objective (MOO). In this case, one can see that the methodologies presented in
literature mainly belong to the single objective approach.
Finally, the considered taxonomy categorizes the Constraint sub-category into five possible attributes (Table 4.4.d), namely the constraints considered by the state-of-the-art works: Cost, Performance, Availability, Throughput and Memory usage. The literature is almost evenly distributed among these attributes.
Solution
The type of approach (Centralized, Distributed or Hierarchical) implemented is one of the fundamental
characteristics of a solution. Table 4.4.e details how the reviewed papers are subdivided according to this
attribute. Notice that the vast majority of them show a distributed architecture.
Table 4.4.f reports the Decision Variables (DVs) exploited by the various solution methods in order to
effectively explore the design space. A DV is the set of possible actions that can be taken upon a current design
alternative in order to create new alternatives with, possibly, higher quality. It can be easily noticed that most of
the literature leverages the Capacity Allocation as DV.
The Architecture representation is shown in Table 4.4.g. Clearly, the state-of-the-art solutions prefer Optimization models over Architecture-based models.
As far as the Optimization strategy is concerned, the proposed techniques are grouped into two main categories:
Exact methods and Meta-heuristics. Table 4.4.h demonstrates that the literature is evenly distributed between
those two approaches.
Finally, only a few papers include information about the Time scale and the Constraint handling approach. They are reported and classified in Tables 4.4.i and 4.4.j.
Discipline
The last two tables, namely Tables 4.4.k and 4.4.l, group the considered works with respect to their Discipline. A discipline is fully described by means of a certain Type and Quality model. Table 4.4.k addresses the Type sub-category. Three typologies are considered: Utility based, Control theory and Bio-inspired. The Utility based and Bio-inspired approaches are dominant, whereas only a few works fall within the Control theory field.
Table 4.4.l, instead, reports the references to the considered works according to the underlying Quality model.
Table 4.4.a: Problem category: perspective.
Cloud provider: [Xia12], [Dut12], [Dou12], [LinC12], [Sri08], [Maz12], [Kon12], [Zam12], [Xio11], [NeeV11], [Gou11], [Bjo12], [Liu10], [Wu12], [Fen11], [Men11], [WanDJ12], [Abh12], [Ard11], [Ard12], [Ala09].
Cloud end-user: [Tia11], [Had12], [HE12], [Zaf12], [Son12], [Ard11], [Ard12], [Cap10], [Ard12b], [Ard11b].

Table 4.4.b: Problem category: quality attributes.
Performance: [Gou11], [Tia11], [Bjo12], [Wu12], [Xia12], [Zaf12], [Cap10], [Ard12b], [Ard11b].
Cost: [Liu10], [Zaf12], [Son12], [Men11], [Had12], [Dub07], [Fen11], [Wan12], [Kon12], [Ard12b], [Ard11b].
Availability: [Wei10], [Zam12], [Dut12], [Had12], [HE12], [Val11].
Reliability: [LinC12], [Xio11].
Others: [Ala09], [Sri08], [Maz12], [Dou12], [Cap10].

Table 4.4.c: Problem category: dimensionality.
Single-objective optimization: [Gou11], [Tia11], [Bjo12], [Fen11], [Liu10], [Zaf12], [Son12], [Men11], [Had12], [Wei10], [Zam12], [Dut12], [HE12], [LinC12], [Xio11], [Dou12], [Sri08], [Wan12], [Cap10], [Kon12], [Ard12b], [Ard11b], [Ala09].
Multi-objective optimization: [Wu12], [Xia12], [Dub07], [Meh12], [Maz12].

Table 4.4.d: Problem category: constraints.
Cost: [Tia11], [Liu10], [Dub07], [Son12], [Fen11], [Wei10], [LinC12], [Ard12b], [Ard11b].
Performance: [Wu12], [Gou11], [Tia11], [Fen11], [Had12], [HE12], [LinC12], [Dou12], [Ard12b], [Ard11b].
Availability: [Zaf12], [Wei10], [Zam12], [Had12].
Throughput: [Son12], [Dut12].
Memory: [Gou11], [Liu10], [Meh12], [Had12], [Xio11].

Table 4.4.e: Solution category: type.
Centralized: [Bjo12], [Liu10], [Men11], [Tia11], [Dub07], [Meh12], [Wan12].
Distributed: [Wei10], [Sri08], [Cap10], [Val11], [Ard12b], [Ard11b], [Ala09].
Hierarchical: [Meh12].
Table 4.4.f: Solution category: degrees of freedom.
Provider selection: [Xia12], [Dut12], [Xio11], [Kon12].
Application placement: [Xia12], [Had12], [HE12], [Sri08], [Cap10], [Kon12], [Wan12], [Ala09].
Capacity allocation: [Wu12], [Gou11], [Tia11], [Bjo12], [Liu10], [Zaf12], [Son12], [Men11], [Fen11], [Had12], [Wei10], [Zam12], [LinC12], [Xio11], [Dou12], [Maz12], [Sri08], [Ard12b], [Ard11b].
Load balancing: [Xia12], [Val11], [Ard12b], [Ard11b], [Ala09].
Admission control: [Wu12], [Kon12], [Ala09].

Table 4.4.g: Solution category: architecture representation.
Architecture models: [Xia12].
Optimization model: [Gou11], [Tia11], [Wu12], [Fen11], [Liu10], [Men11], [Zam12], [Xio11], [Dut12], [Dou12], [Wan12], [Kon12], [Ard12b], [Ard11b], [Ala09].

Table 4.4.h: Solution category: optimization strategy.
Exact: [Bjo12], [Liu10], [Zaf12], [Son12], [Men11], [Fen11], [Had12], [Wei10], [Xio11], [Dou12], [Ard11b], [Ala09].
Meta-heuristic: [Wu12], [Gou11], [Wei10], [HE12], [LinC12], [Maz12], [Sri08], [Wan12], [Ard12b].

Table 4.4.i: Solution category: constraints handling.
Not present: [Dub07].
Hard: [Fen11], [HE12], [Ard12b], [Ard11b].
Penalty: [Liu10].

Table 4.4.j: Solution category: time scale.
Minute: [Wu12], [Ard12b], [Ard11b].
Hour: [Maz12], [Ard12b], [Ard11b].
Day: [Fen11].

Table 4.4.k: Discipline category: type.
Utility based: [Gou11], [Tia11], [Dub07], [Men11], [Fen11], [Wu12], [Wei10], [Ard12b], [Ard11b], [Ala09].
Control theory: [Zaf12], [Had12], [Kon12], [Xio11].
Bio-inspired: [Cap10], [Val11], [Wan12].

Table 4.4.l: Discipline category: quality model.
Markov chain: [Zaf12], [Son12].
Queuing network: [Gou11], [Tia11], [Dub07], [Men11], [Fen11], [Wu12], [Wei10], [Ard12b], [Ard11b], [Ala09].
State based model: [Wu12].
4.4.6. Criteria for evaluation
In order to assess the quality of the solution methods proposed in the literature, several evaluation criteria can be considered. Given the run-time constraints, the time required to find a solution, or the maximum size of the problem instance that can be solved in a given time horizon, needs to be considered. These measures depend on practical and physical limitations, the specific application under study, the industry's aim or the research's purpose, as well as on the opportunities, tools and resources available.
Another important evaluation criterion is scalability, i.e., the ability of the solution method to handle problems of growing size or to enlarge the optimization scope (e.g., adding further quality metrics or constraints).
Another important aspect is the accuracy that can be achieved by the underlying quality evaluation model, that is, the accuracy obtained when comparing the QoS metrics evaluated through the QoS model with the real figures measured in the real system.
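The time-to-solution and scalability criteria can be operationalized with a simple harness: solve randomly generated instances of growing size and record how long each takes. The sketch below is illustrative; `solve` stands for any allocation algorithm under evaluation, and the instance generator is our own placeholder, not one from the reviewed papers.

```python
import random
import time

# Minimal harness for the time-to-solution and scalability criteria:
# solve randomly generated instances of growing size and record the
# wall-clock time each one takes. 'solve' is a placeholder for any
# allocation algorithm under evaluation.

def measure_scalability(solve, sizes, seed=0):
    """Return a list of (instance size, seconds to solve)."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        instance = [rng.random() for _ in range(n)]  # random demands
        start = time.perf_counter()
        solve(instance)
        results.append((n, time.perf_counter() - start))
    return results
```

Plotting the recorded times against instance size is the usual way the works below argue linear (or otherwise bounded) scaling.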
In [Sri08] four simulations are compared, keeping the number of applications constant while varying disk and CPU utilizations, showing that the energy used by the proposed heuristic is about 5.4% above the optimal value, with an average 20% tolerance. No information about scalability is reported.
To evaluate the scalability of the resource allocation algorithm they propose, the authors of [Ard11] considered a very large set of randomly generated instances. These instances were created by varying the number of SaaS providers between 10 and 100 and the number of applications between 1000 and 10000. They showed that the problem can be solved, in the worst case, in less than twenty minutes.
In [Dut12] the authors varied the number of servers in an emulated data center and observed the performance, demonstrating that the total cost of their approach increases linearly with the number of servers. They also demonstrated that the running time is statistically independent of the number of servers.
A large-scale simulation demonstrates that the algorithm presented in [Xia12] is extremely scalable: the decision
time remains under 4 seconds for a system with 10000 servers and 10000 applications.
[Had12] reports a complete scalability study. The deviation from the optimal value is shown to be consistently small and tends to zero as the number of physical machines (PMs) increases. This means that the proposed algorithm is capable of finding solutions very close to the optimal one for a large number of PMs and for a big Cloud provider with many data centers.
The proposed method scales much better than common bin-packing algorithms, which encounter scalability problems and take longer to find the optimal solution.
Finally, in [Ard12] the inefficiency of the two algorithms presented is measured in terms of Price of Anarchy (PoA) and Individual Worst Case (IWC). A very large number of randomly generated instances is considered: the number of SaaS providers varies between 10 and 100, while the number of applications varies between 100 and 1000. Furthermore, the article focuses on scalability, arguing that the algorithms scale linearly with the cardinality of the set of SaaS providers.
5. MODAClouds Run-Time Platform
5.1. Overview
The aim of this section is to define the requirements for the MODAClouds runtime platform that will be
developed in the project. After the introduction to the overall goals of the runtime environment, the general
approach and the high-level conceptual architecture described in Section 1, we define the actors (Section 5.1.1)
that are referenced in the requirements specifications (Sections 5.2-5.5) and the requirement sets (Section 5.1.2).
The requirements elicitation methodology that we have adopted is overviewed in Section 5.1.3. Finally, Section
5.6 provides a roadmap for WP6, focusing in particular on Year 1 of the project.
5.1.1. Actors
In this section, we consider the three platforms defined in the conceptual architecture as actors included in the requirements specifications. In addition to these, we consider in the requirements specifications a set of common actors that are referenced also in the other WPs' requirements specifications:
Cloud app developer: A developer who designs, implements, and tests cloud-based applications.
Cloud app: the cloud application developed by the Cloud app developer using the MODAClouds IDE.
Application cloud: the cloud platform where the Cloud app is (or will be or was) running.
Service cloud: the cloud platform where the runtime services offered by the runtime platform are (or
will be or were) running. A service is not part of the Cloud app, rather it is part of the execution
platform (e.g., discovery service).
MODAClouds IDE: this is the envisioned technical output of WP4 and WP5, a design-time
environment that will implement the MODACloudML language and that will provide the application
code and the initial deployment decisions that are needed by the runtime platform to instantiate the
application.
Cloud app admin: An administrator who configures, deploys, operates, and tests cloud-based
applications on cloud platforms.
Cloud app provider: A provider that provides cloud-based applications.
QoS engineer: An engineer who specifies quality of service (QoS) constraints and alternatives for
design time exploration and run-time adaptation.
Throughout the requirements elicitation, we use the notation <A> to indicate actor A, e.g., <Cloud app admin>. Furthermore, we refer generically to QoS constraints to mean any hard or soft constraints regarding QoS (e.g., imposed by an SLA) and specified in the MODAClouds IDE.
5.1.2. Requirement Sets
In the following sections, we describe the requirements for the runtime platform. The requirements have been
grouped into four categories inspired by the conceptual architecture. The main distinction from the conceptual
architecture mapping is that the requirements for the monitoring platform are distinguished into two further sets:
monitoring requirements and analysis requirements. The former set mainly deals with monitoring data collection
and distribution, while the latter set emphasizes the analysis of the acquired monitoring data to extract
knowledge.
The sets of requirements elicited in the rest of this section are as follows:
Execution Requirements (Section 5.2): this group provides requirements for application deployment,
initial testing, execution, and runtime management. Management functionalities include runtime
services (e.g., discovery, logging, application health controllers) and data management (archival and
synchronization).
Monitoring Requirements (Section 5.3): this group provides requirements for the part of the
monitoring platform that will deal with data collection, preprocessing, distribution and consumption by
means of monitoring data observers.
Analysis Requirements (Section 5.4): this group provides a list of requirements for the data analysis
part. These requirements deal with high-level aggregation and processing of the monitoring data and
characterize the analysis step of the MAPE-K loop.
Self-adaptivity Requirements (Section 5.5): this group provides requirements for the subsystems that
will implement the runtime models and runtime policies developed in WP6.
5.1.3. Requirement Elicitation Methodology
For each group of requirements, we use the guidelines provided in D3.1.1 to define use case scenarios. For the
sake of readability, unused entries in tables are omitted. Furthermore, qualitative requirements that provide more
details about a use case and the environment with which it interacts are provided in the Other requirements
subsection. These additional requirements also form necessary requirements for the WP6 runtime architecture.
To help readability, we express these Other requirements using the keywords proposed in the Internet
Engineering Task Force RFC 2119 which are here briefly summarized and related to the Priority of
accomplishment keywords indicated in D3.1.1 (i.e., Must/Should/Could have):
"MUST"/"MUST NOT"/"REQUIRED"/"SHALL"/"SHALL NOT": equivalent expressions to indicate
Must have priority of accomplishment.
"SHOULD"/"SHOULD NOT"/"RECOMMENDED"/NOT RECOMMENDED: equivalent keywords
to indicate Should have priority of accomplishment.
"MAY"/"OPTIONAL": equivalent expressions to indicate Could have priority of accomplishment.
We point to http://www.ietf.org/rfc/rfc2119.txt for further details.
5.2. Execution Requirements
5.2.1. Context and System Overview
5.2.1.1. Context
Use case template Description
Category name Execution
The scope of the following use case specification is to elicit requirements for application deployment and execution. In the context of the MODAClouds reference architecture, this falls primarily within the Execution Platform.
5.2.1.2. System Boundary Model
Figure 5.2.a: Execution Requirements
5.2.2. Use case specification for the Run Application use case
Use case heading Description
Use case name Run application
Use case ID UC-MC.wp6.Execution.Run Application.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Run application use case of the system boundary model in Section 5.2.1.2.
Goal The goal of the Run application use case is to start, stop, status query, and manage a
<Cloud app> instance on the <Application Cloud>.
Main Actors
<Cloud app admin>
<Cloud app>
<MODAClouds IDE>
Use case scenarios Description
Main success
scenarios
1. The <MODAClouds IDE> requests to start or stop <Cloud app>. Alternatively,
the <Cloud app admin> requests via a web-based UI to start or stop <Cloud
app>. The <Execution Platform> automatically starts or stops the application on the target <Application Cloud>.
2. The <Cloud app admin> requests via a web-based UI to the <Execution
Platform> to view the configuration and the logs of the running <Cloud app>.
3. The <Execution Platform> feeds back information to the caller about the status
of the application.
Preconditions
1. The application is compliant with the restrictions of the <Application Cloud>,
and uses the appropriate API's, packaging, etc.
Postconditions
1. The application has been successfully deployed on the <Application Cloud>
Other requirements:
1. An instance of the <Execution Platform> can start or stop a single instance of <Cloud app> and be
deployed on a single <Application cloud>. Therefore, separate <Cloud app> instances MUST have separate
<Execution Platform> instances.
2. The <Execution Platform> and the <Cloud app> MUST be treated as independent software artifacts.
They MAY run on different clouds, preferably within (network) topological proximity to reduce
latency. Therefore, they SHOULD rely as much as possible on services and protocols that can operate
in any cloud environment (e.g., HTTP-based RESTful services).
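The start/stop/status interactions of the Run application scenarios can be sketched as a minimal lifecycle controller. The state names and transitions below are illustrative assumptions, not the WP6 design; a real platform would drive cloud-provider APIs at each transition.

```python
# Sketch of the Run application use case as a lifecycle controller:
# start, stop and status query of a single <Cloud app> instance.
# State names and transitions are illustrative assumptions, not the
# WP6 design; a real platform would call cloud-provider APIs here.

class CloudAppController:
    def __init__(self):
        # precondition: the application is already deployed
        self.state = "DEPLOYED"

    def start(self):
        if self.state in ("DEPLOYED", "STOPPED"):
            self.state = "RUNNING"      # scenario 1: start the app
        return self.state

    def stop(self):
        if self.state == "RUNNING":
            self.state = "STOPPED"      # scenario 1: stop the app
        return self.state

    def status(self):
        return self.state               # scenario 3: report status
```

Keeping the controller separate from the application itself mirrors the requirement that the <Execution Platform> and the <Cloud app> are independent software artifacts communicating over cloud-agnostic protocols.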
5.2.3. Use case specification for the Deploy Application use case
Use case heading Description
Use case name Deploy Application
Use case ID UC-MC.wp6.Execution.Deploy Application.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Deploy application use case of the system boundary model in Section 5.2.1.2.
Goal To deploy the packaged <Cloud app> to the targeted <Application cloud>
Main Actors
< Cloud app >
<Cloud app admin>
<MODAClouds IDE>
Use case scenarios Description
Main success
scenarios
1. The <Execution Platform> is instructed by the <MODAClouds IDE>, or by
<Cloud app admin>, to deploy the application.
2. The <Execution Platform> will provision the required resources from the
<Application cloud> on behalf of the <Cloud app admin>.
3. The <Execution Platform> will then deploy all the needed software artifacts to
run the <Cloud app>, which includes the <Cloud app> itself and other
MODAClouds Services needed for the application.
Preconditions
1. The <Cloud app> was packaged properly for the <Application cloud>
2. The <Cloud app admin> has the proper credentials to access the <Application
cloud>.
3. The <Cloud app admin> has delegated the credentials to the <Execution
Platform>
Postconditions
1. The <Cloud app> has been successfully deployed on the <Application cloud>
Other requirements:
Deploying the application includes deploying the necessary <MODAClouds services>.
5.2.4. Use case specification for the Start/Stop Application Sandbox use case
Use case heading Description
Use case name Start/Stop Application Sandbox
Use case ID UC-MC.wp6.Execution. Start/Stop Application Sandbox.-V01
Priority of accomplishment Should Have
Use case description Description
Use case diagram See the Start/stop application sandbox use case of the system boundary model in
Section 5.2.1.2.
Goal Start/stop the application in a controlled container for the purpose of application
testing or calibration of the services and their internal data structures (e.g., runtime
models).
Main Actors <Cloud app>
<MODAClouds IDE>
<Cloud app admin>
Use case scenarios Description
Main success scenarios 1. <Cloud app admin> or <MODAClouds IDE> requests to the <Execution
Platform> to create a sandbox environment for <Cloud App>.
2. <Execution Platform> creates a sandbox environment and configures the
services for executing in this special environment.
Postconditions
1. <Execution Platform> accepts the same requests as in a normal
environment (e.g., Deploy Application, etc) but these are all performed
in a sandbox environment.
5.2.5. Use case specification for the Synchronise Application Data use case
Use case heading Description
Use case name Synchronise Application Data
Use case ID UC-MC.wp6.Execution.Synchronise Application Data.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Synchronise Application Data use case of the system boundary model in Section
5.2.1.2.
Goal To allow the Cloud app developer or QoS engineer to specify data replicas and the synchronization requirements for them.
Actors <QoS engineer>
<Cloud app developer>
Use case scenarios Description
Main success
scenarios
1. The QoS engineer or Cloud app developer selects, for a portion of the database
or for the whole database, the synchronization requirements between replicas.
These can be: consistent or eventually consistent.
2. The execution platform examines the deployment configuration of the database
and creates and activates the proper synchronization connectors between the
replicas
Preconditions The application has been already deployed on the execution platform
Postconditions The execution platform is ready to keep the data replicas synchronized according to the
selected synchronization requirements
Other requirements:
1. The execution platform MAY offer the possibility to check that the synchronization choices made by
the user are consistent with the way the system is deployed
2. The execution platform MAY offer the possibility to change the synchronization requirements
dynamically
5.3. Monitoring Requirements
5.3.1. Context and System Overview
5.3.1.1. Context
Use case template Description
Category name Monitoring
The scope of the following use case specifications is to detail the main functionalities offered by the monitoring
platform.
5.3.1.2. System boundary model
Figure 5.3.a: Monitoring Requirements
5.3.2. Use case specification for the Install Monitoring Rule use case
Use case heading Description
Use case name Install Monitoring Rule
Use case ID UC-MC.wp6.Monitoring.Install Monitoring Rule.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Install Monitoring Rule use case of the system boundary model in Section 5.3.1.2.
Goal Monitoring rules are produced at design time by WP5 and define the object to be monitored,
which measures should be gathered, the time window in which monitoring should happen,
and the frequency of monitoring. The goal of this use case is to allow new rules to be installed in
the monitoring platform.
Main Actors <MODAClouds IDE>
<Cloud app admin>
Use case scenarios Description
Main success
scenarios
1. The Cloud app developer, through the <MODACloudsIDE>, or the <Cloud app
admin>, through a direct interface, requests the installation of a new rule.
2. The <Monitoring Platform> checks that the rule has not been previously installed.
3. If the previous check is successful, then the <Monitoring Platform> installs the
rule and puts it in the state Inactive.
Preconditions The <Monitoring Platform> is ready to start its service
Postconditions The monitoring rule is properly installed in the monitoring platform
Other requirements:
1. The <Monitoring Platform> MUST allow for installation of new monitoring rules before monitoring
starts
2. The <Monitoring Platform> MUST allow for the installation of multiple monitoring rules
3. The Monitoring Platform, upon installation of a monitoring rule, MAY check that it can be actually
executed in the current <Monitoring Platform> configuration, i.e., the associated data can be gathered
and the corresponding computations/filtering/compositions can be executed
4. The <Monitoring Platform> MAY allow for installation of new monitoring rules during its execution
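The main success scenario and requirements 1–4 above suggest a rule registry with a duplicate check at installation time and a default Inactive state. A minimal Python sketch of that scenario (the class and field names are illustrative, not part of the deliverable):

```python
from dataclasses import dataclass

@dataclass
class MonitoringRule:
    rule_id: str
    target: str      # object to be monitored
    metric: str      # measure to gather
    window_s: int    # time window in which monitoring happens
    period_s: int    # frequency of monitoring

class MonitoringPlatform:
    def __init__(self):
        self._rules = {}   # rule_id -> (rule, state)

    def install(self, rule):
        # Step 2: check that the rule has not been previously installed.
        if rule.rule_id in self._rules:
            return False
        # Step 3: install the rule and put it in the Inactive state.
        self._rules[rule.rule_id] = (rule, "Inactive")
        return True
```

Multiple rules can be installed (requirement 2); a repeated install of the same rule is rejected.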
5.3.3. Use Case Specification for the Activate/Deactivate Monitoring Rule use case
Use case heading Description
Use case name Activate/Deactivate Monitoring Rule
Use case ID UC-MC.wp6.Monitoring.Activate/Deactivate Monitoring Rule.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Activate/Deactivate Monitoring Rule use case of the system boundary model in
Section 5.3.1.2.
Goal Monitoring rules are produced at design time by WP5 and define the object to be monitored,
which measures should be gathered, the time window in which monitoring should happen,
the frequency of monitoring. The goal of this use case is to activate a monitoring rule
already installed in the <Monitoring Platform> or to deactivate an activated one.
Main Actors <Cloud app admin>
Use case scenarios Description
Main success
scenarios
Activation scenario
1. The <Cloud app admin> through a specific user interface requests the
activation of a rule that is installed and in the Inactive state
2. The <Monitoring Platform> checks that it can collect the required
measures based on its current internal configuration
Deactivation scenario
1. The <Cloud app admin> through a specific user interface requests the
deactivation of a rule that is in the Active state
2. The <Monitoring Platform> stops the execution of the monitoring rule and
puts it in the Inactive state.
3. If the deactivated rule was the last active one, the monitoring platform
stops collecting monitoring data.
Preconditions The <Monitoring Platform> is ready to start its service or it is already running
Postconditions Upon activation of a monitoring rule, the <Monitoring Platform> starts executing it
Upon deactivation of a monitoring rule, the <Monitoring Platform> stops executing it
Other requirements:
1. The <Monitoring Platform> MUST allow for activation and deactivation of monitoring rules during
execution
2. The <Monitoring Platform> MUST execute all Active rules.
3. The Monitoring Platform, upon activation of a monitoring rule, MAY check that it can be actually
executed in the current <Monitoring Platform> configuration, i.e., the associated data can be gathered
and the corresponding computations/filtering/compositions can be executed.
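The activation check in step 2 and the last-active-rule condition of the deactivation scenario can be sketched as a small state machine. In this Python sketch the "current internal configuration" is simplified to a static set of collectable metric names; all names are illustrative:

```python
class RuleRegistry:
    """Tracks installed monitoring rules and whether data collection runs."""

    def __init__(self, collectable_metrics):
        self._states = {}                    # rule_id -> "Active" | "Inactive"
        self._metrics = {}                   # rule_id -> metric name
        self._collectable = set(collectable_metrics)
        self.collecting = False

    def install(self, rule_id, metric):
        self._states[rule_id] = "Inactive"
        self._metrics[rule_id] = metric

    def activate(self, rule_id):
        # Check the required measure can be collected in the current configuration.
        if self._metrics[rule_id] not in self._collectable:
            raise ValueError("metric not collectable in current configuration")
        self._states[rule_id] = "Active"
        self.collecting = True

    def deactivate(self, rule_id):
        self._states[rule_id] = "Inactive"
        # If the deactivated rule was the last active one, stop collecting data.
        if not any(s == "Active" for s in self._states.values()):
            self.collecting = False
```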
5.3.4. Use case specification for the Add/Remove Observer use case
Use case heading Description
Use case name Add/Remove Observer
Use case ID UC-MC.wp6.Monitoring.Add/Remove Observer.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Add/Remove Observer use case of the system boundary model in Section 5.3.1.2.
Goal An observer is any software component that needs to receive information from the
monitoring platform. The objective of Add Observer is to allow new components to
subscribe to the monitoring platform. Upon subscription they will start receiving a specific
stream of data. Such a stream is specified as part of the Add Observer operation in terms of
an RDF query. The Remove Observer operation simply detaches an observer from a stream.
Main Actors <MODAClouds IDE>
<Cloud app admin>
<Self-adaptation platform>
Use case scenarios Description
Main success
scenarios
Add Observer scenario
1. The <MODAClouds IDE>, the <Cloud app admin> or the <Self-adaptation
platform> (generically called the Observer) requests the Add Observer operation by
passing an RDF query and a reference to itself as parameters
2. The <Monitoring Platform> checks if it can fulfil the specified RDF query
3. If yes, it adds the observer to its list and returns a reference to the proper stream.
Delete Observer scenario
1. The Observer requests the Delete Observer operation by passing the reference
of itself as parameter.
2. The <Monitoring Platform> checks if the observer is in the list.
3. If yes, then it removes the observer from the list.
Preconditions The <Monitoring Platform> is up and running
Postconditions The list of observers remains in a consistent state
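Both scenarios above amount to keeping the observer list consistent while mapping each observer to the stream its RDF query selects. A Python sketch, treating the RDF query as an opaque string and the "can fulfil" check as simple membership in a supported set (both simplifications; the stream-reference format is invented):

```python
class ObserverRegistry:
    """Maps observers to the RDF query whose result stream they receive."""

    def __init__(self, supported_queries):
        self._supported = set(supported_queries)   # queries the platform can fulfil
        self._observers = {}                       # observer ref -> RDF query

    def add_observer(self, observer_ref, rdf_query):
        # Step 2: check the platform can fulfil the specified RDF query.
        if rdf_query not in self._supported:
            return None
        # Step 3: register the observer and hand back a stream reference.
        self._observers[observer_ref] = rdf_query
        return f"stream://{rdf_query}"

    def remove_observer(self, observer_ref):
        # Remove only if the observer is actually in the list (step 2 of Delete).
        return self._observers.pop(observer_ref, None) is not None
```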
5.3.5. Use case specification for the Collect Monitoring Data use case
Use case heading Description
Use case name Collect Monitoring Data
Use case ID UC-MC.wp6.Monitoring.Collect Monitoring Data.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Collect Monitoring Data use case of the system boundary model in Section 5.3.1.2.
Goal The successful collection of required metrics from both application level and cloud level
(PaaS and IaaS containers), based on the monitoring rules in the Active state.
Main Actors <Cloud app> (here the Cloud app generically represents any monitorable
resource; it may also include the Application cloud if this provides proper
monitoring mechanisms)
Use case scenarios Description
Main success
scenarios
Collect monitoring data in pull mode
1. Periodically the <Monitoring Platform> checks if the assigned monitoring cost
constraint is still positive
2. If not, then the <Monitoring Platform> closes the connection with the
Application cloud or <Cloud app>
3. If yes, it queries the Application cloud or <Cloud app> in order to receive
monitoring information.
4. If the query is well formed and the Application cloud or <Cloud app> interface
is running, the Application cloud or <Cloud app> provides the required data.
5. The <Monitoring Platform> executes the Active monitoring rules on the
collected information
6. Then it gives the control to the Distribute Data use case
Collect monitoring data in push mode
7. The <Monitoring Platform> periodically receives data from the Application
cloud or the <Cloud app>
8. The <Monitoring Platform> executes the Active monitoring rules on the
collected information
9. Then it gives the control to the Distribute Data use case
10. Periodically the <Monitoring Platform> checks if the assigned monitoring cost
constraint is still positive
11. If not, then the <Monitoring Platform> closes the connection with the
Application cloud or <Cloud app>
Exceptions
In the pull mode if data do not arrive within the expected (configurable) time frame, the
<Monitoring Platform> raises an alarm to the <Cloud app> administrator
Preconditions At least a monitoring rule is active in the monitoring platform.
The Application cloud and <Cloud app> components that are able to provide the
required data are known to the <Monitoring Platform> and a connection with them has
been already established.
Postconditions Data are acquired and passed to the Distribute Data use case.
Other requirements:
1. The <Monitoring Platform> MUST acquire at runtime QoS metrics (the performance, availability,
and health metrics specified in deliverable D6.2) from the <Cloud app> and, if exposed by the
cloud provider, from its Application cloud (either IaaS or PaaS).
2. The <Monitoring Platform> MUST acquire historical and current information about the resource
usage costs incurred to run the CloudApp.
3. For cloud platforms offering resources at spot prices (e.g., EC2 spot instances), the <Monitoring
Platform> MAY also be able to acquire spot prices relative to a custom time horizon.
4. The <Monitoring Platform> MAY rely on existing standard monitoring APIs (e.g., JMX), tools
(e.g., SIGAR, sar), and cloud provider monitoring APIs.
5. Each monitoring cost constraint MUST be configured within the <Monitoring Platform> probe by
the Execution Platform at deployment time of the CloudApp.
6. The monitoring cost constraint value MAY be updated at runtime by the Self-Adaptation Platform
for cost or overhead management purposes. If a metric can be acquired at no cost, then its cost
constraint will be infinite.
7. The <Monitoring Platform> MUST offer the ability to activate and deactivate the acquisition of
certain information at application runtime.
8. The <Monitoring Platform> MAY offer the ability to adjust the sampling rate at which data is
acquired from the Application cloud or <Cloud app>. This adjustment is requested by the
Self-Adaptation Platform.
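Steps 1–5 of the pull-mode scenario can be sketched as a single collection cycle guarded by the monitoring cost constraint. In this Python sketch the probe, the per-metric costs, and the rule shape are all illustrative; a rule is a (metric, cost, check) triple where `check` flags whether the sampled value satisfies the rule:

```python
def pull_collect(probe, budget, active_rules):
    """One pull-mode collection cycle, guarded by a monitoring cost budget.

    `probe` is a callable returning {metric: value}; `budget` is the remaining
    monitoring cost constraint; `active_rules` holds the Active monitoring
    rules as (metric, cost, check) triples.
    """
    results = []
    for metric, cost, check in active_rules:
        if budget <= 0:
            break                # cost constraint exhausted: stop querying
        sample = probe()         # query the Application cloud / <Cloud app>
        budget -= cost
        # Execute the Active monitoring rule on the collected information.
        results.append((metric, check(sample[metric])))
    return budget, results       # results then flow to the Distribute Data use case
```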
5.3.6. Use case specification for the Distribute Data use case
Use case heading Description
Use case name Distribute Data
Use case ID UC-MC.wp6.Monitoring.Distribute Data.-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case diagram See the Distribute Data use case of the system boundary model in Section 5.3.1.2.
Goal The successful distribution of information to the observers connected to the monitoring
platform.
Main Actors <MODAClouds IDE>
<Cloud app admin>
<Self-adaptation platform>
Use case scenarios Description
Main success
scenarios
1. The <Monitoring Platform> executes the queries defined by the observers on the
data collected through the Collect Data use case
2. The <Monitoring Platform> sends to the observers all data that match the queries
associated to them (these data are sent through a stream)
Preconditions The <Monitoring Platform> is acquiring data through the Collect Data use case
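The two steps above reduce to matching each collected data item against every observer's query and pushing the matches on that observer's stream. A Python sketch in which observer queries are stand-in predicates rather than real RDF queries, and streams are plain lists:

```python
def distribute(data_items, observers):
    """Route each collected data item to every observer whose query matches it.

    `observers` maps an observer name to a predicate standing in for its
    query; returns {observer: [matching items]}, the per-observer stream.
    """
    streams = {name: [] for name in observers}
    for item in data_items:
        for name, query in observers.items():
            if query(item):
                streams[name].append(item)
    return streams
```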
5.4. Analysis of Requirements
5.4.1. Context and System Overview
5.4.1.1. Context
Use case template Description
Category name Analysis
The scope of the following use case specification is to define the analysis and measurement functionalities of the
Execution Platform. These functionalities have the goal of receiving monitoring data and extracting aggregate
metrics and knowledge from them.
5.4.1.2. System Boundary Model
Figure 5.4.a: Analysis Requirements
5.4.1.3. Use case specification for the Detect Violation use case
Use case heading Description
Use case name Detect Violation
Use case ID UC-MC.wp6.Analysis.Detect Violation.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Detect Violation use case of the system boundary model in Section 5.4.1.2.
Goal Detect a violation of a QoS constraint on a measured metric and raise a trigger to the
<Self-Adaptation Platform>.
Main Actors <Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
1. A Monitoring rule of the <Monitoring Platform> detects a violation
in the value of one or more QoS metrics.
2. <Monitoring Platform> automatically raises a trigger to all
registered observers.
Preconditions
1. One or more monitoring rules are installed and active on the
<Monitoring Platform>
2. There exist one or more registered observers to the triggers.
Postconditions
1. Triggers are raised in the presence of QoS violations
Other requirements:
1. Detection rules SHOULD be specified as part of the monitoring queries installed in the <Monitoring
Platform>. These CAN be either SLA requirements or soft QoS constraints that are requested by the
developer.
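A minimal sketch of the detection step: each sampled QoS metric is checked against a constraint, and a trigger is raised to the registered observers for every violated one. Representing constraints as (lower, upper) bound pairs and the trigger as a `notify` callback are simplifying assumptions, not the deliverable's rule format:

```python
def detect_violations(samples, constraints, notify):
    """Check each QoS metric sample against its constraint and raise triggers.

    `constraints` maps metric -> (lower_bound, upper_bound); `notify` is
    called once per violated metric, standing in for the trigger raised to
    all registered observers.
    """
    violated = []
    for metric, value in samples.items():
        lo, hi = constraints.get(metric, (float("-inf"), float("inf")))
        if not (lo <= value <= hi):
            violated.append(metric)
            notify(metric, value)
    return violated
```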
5.4.2. Use case specification for the Correlate Monitoring Data use case
Use case heading Description
Use case name Correlate Monitoring Data
Use case ID UC-MC.wp6.Analysis.Correlate Monitoring Data.-V01
Priority of accomplishment Should Have
Use case description Description
Use case diagram See the Correlate Monitoring Data use case of the system boundary model in
Section 5.4.1.2.
Goal The goal is to establish a relationship between measurements collected on different
components of the application, with the aim of generating measures that summarize
component runtime execution correlations.
Main Actors
<Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
Correlation in deterministic mode
1. <Monitoring Platform> collects monitoring data from a set of streams
2. Based on the timestamps, <Monitoring Platform> outputs on a new
stream a measure that pairs events from different sources as being
related to each other
Correlation in statistical mode (black box)
1. <Monitoring Platform> collects monitoring data from a set of streams
2. Within a time window, <Monitoring Platform> runs a statistical
correlation algorithm
3. <Monitoring Platform> outputs on a new stream a measure that
describes the statistical similarities between metrics coming on the
different streams
Correlation in statistical mode (white box)
1. <Monitoring Platform> collects monitoring data from a set of streams
2. For each monitoring metric to be correlated, <Monitoring Platform>
determines the associated streams and checks that the metric is
supported for correlation, returning an error if not
3. <Monitoring Platform> periodically obtains a description of the current
application topology from <Execution Platform>
4. Within a time window, <Monitoring Platform> runs a statistical
correlation algorithm, that exploits the application model available to
the <Monitoring Platform> (precondition), to find statistical similarities
between metrics coming on the different streams
5. <Monitoring Platform> outputs on a new stream a measure that
describes the statistical similarities between metrics coming on the
different streams
Preconditions
1. <Monitoring Platform> exposes a set of high-level measures that it can
provide to any observer
2. One or more observers are registered to receive data from these
measures, most likely components of the <Self-Adaptation Platform>
3. <Execution Platform> maintains time synchronization for monitoring
data collected from different sources
4. <MODAClouds IDE> has provided to <Monitoring Platform>
information on the dependencies between the application components
5. <Self-Adaptation Platform> has provided to <Monitoring Platform>
information on the current topology of the application
6. <Self-Adaptation Platform> has registered as an observer on the
output streams
Postconditions
Correlation measures provided in output by
<Monitoring Platform> on one or more streams
Other requirements:
1. Correlation in deterministic mode MAY be offered via a standard data stream aggregator solution
programmed to account for the specificity of the MODAClouds Execution Platform.
2. The Monitoring Platform SHOULD be capable of correlating events only based on information
independent of the specific target cloud being considered
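In the statistical (black-box) mode, one simple instance of the correlation step is to compute, within the time window, a Pearson correlation coefficient between two metric streams. The deliverable does not prescribe the algorithm, so this Python sketch shows only one possible choice:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length metric windows.

    Returns a value in [-1, 1] describing the statistical similarity
    between the metrics coming in on the two streams.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

The deterministic mode would instead pair events directly by timestamp, as the first scenario describes.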
5.4.3. Use case specification for the Estimate Measure use case
Use case heading Description
Use case name Estimate Measure
Use case ID UC-MC.wp6.Analysis.Estimate Measure.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Estimate Measure use case of the system boundary model in Section 5.4.1.2.
Goal
Estimate QoS metrics of the system within a time horizon specified by the
<MODAClouds IDE> that are not directly observable by the data collectors, or that
cannot be observed due to overhead concerns.
Main Actors
<MODAClouds IDE>
<Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
Estimation in black-box mode
1. <Monitoring Platform> parses the estimation requirements
specification provided by <MODAClouds IDE>
2. For each metric to be estimated, <Monitoring Platform> determines the
streams needed to estimate that metric and returns an error if one or
more are unavailable
3. <Monitoring Platform> continuously runs estimation algorithms to
estimate the value of the metrics that cannot be directly observed by the
monitoring probes
4. The results of the estimation algorithms are put in output on streams
consumed by observers of the <Self-Adaptation Platform> and by
<MODAClouds IDE> via the feedback loop
Estimation in white-box mode
5. <Monitoring Platform> parses the estimation requirements
specification provided by <MODAClouds IDE>
6. For each metric to be estimated, <Monitoring Platform> determines the
streams needed to estimate that metric and returns an error if one or
more are unavailable
7. <Monitoring Platform> periodically obtains a description of the current
application topology from <Execution Platform>
8. <Monitoring Platform> continuously runs estimation algorithms to
estimate the value of the metrics that cannot be directly observed by the
monitoring probes
9. The results of the estimation algorithms are put in output on streams
consumed by observers of the <Self-Adaptation Platform> and by
<MODAClouds IDE> via the feedback loop
Preconditions
1. <Monitoring Platform> exposes a set of high-level measures that it can
provide to any observer
2. One or more observers of the <Self-Adaptation Platform> are registered
to receive data from these measures.
3. <MODAClouds IDE> has provided to <Monitoring Platform>
indications of which metrics should be estimated
Postconditions
Estimated measures provided in output by
<Monitoring Platform> on one or more streams
Other requirements:
1. For a given application, the Monitoring Platform MUST be able to estimate, if requested by the
monitoring queries and the monitoring data is available, at least mean value of traffic arrival rates,
service demand, number of active users, throughputs, failure events, startup times/uptimes/downtimes.
2. For some of the same performance indicators, the Monitoring Platform MAY be able to provide an
estimate of the variance and percentiles over a reference time window.
3. If requested by the monitoring queries, the Monitoring Platform MUST be able to differentiate the
estimation across workload classes and different resources.
4. The estimation CAN depend on the runtime models, when this dependence does not introduce a circular
dependence that cannot be resolved.
5. The estimation COULD return confidence information on the estimates.
6. The estimation MUST support timeouts for the algorithms and MUST cope with abnormal termination and
infeasibilities in the solutions without cascading errors into the dependent systems.
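As an example of a metric that is not directly observable, the per-request service demand named in requirement 1 can be derived from measured utilization and throughput via the utilization law U = X · D. The formula is standard queueing theory, not something mandated by the deliverable; a sketch:

```python
def estimate_service_demand(utilization, throughput):
    """Estimate per-request service demand D from the utilization law U = X * D.

    `utilization` is the measured resource busy fraction (0..1) and
    `throughput` the observed completion rate (req/s); D itself is not
    exposed by any probe, so it is derived rather than observed.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive to estimate demand")
    return utilization / throughput
```

For example, a resource that is 60% busy while completing 30 req/s has an estimated demand of 0.02 s per request.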
5.4.4. Use case specification for the Forecast Measure use case
5.4.4.1. Use case description
Use case heading Description
Use case name Forecast Measure
Use case ID UC-MC.wp6.Analysis.Forecast Measure.-V01
Priority of accomplishment Should Have
Use case
description
Description
Use case
diagram
See the Forecast Measure use case of the system boundary model in Section 5.4.1.2.
Goal These services will forecast, using statistical methods, some of the metrics needed by the
<Self-Adaptation Platform> to manage the application QoS.
Main Actors 1. <MODAClouds IDE>
2. <Self-Adaptation Platform>
Use case scenarios Description
Main success scenarios
Forecasting in black-box mode
1. <Monitoring Platform> parses the estimation requirements
specification provided by <MODAClouds IDE>
2. For each monitoring metric to be forecasted, <Monitoring Platform>
determines the associated streams and checks that the metric is
supported for forecasting, returning an error if not
3. <Monitoring Platform> continuously runs forecasting algorithms to
predict the value of the metrics on the input streams
4. The results of the blackbox forecasting algorithms are put in output on
streams consumed by observers of the <Self-Adaptation Platform>
Forecasting in white-box mode
1. <Monitoring Platform> parses the specification provided by
<MODAClouds IDE>
2. For each monitoring metric to be forecasted, <Monitoring Platform>
determines the associated streams and checks that the metric is
supported for forecasting, returning an error if not
3. <Monitoring Platform> periodically obtains a description of the current
application topology from <Execution Platform>
4. <Monitoring Platform> continuously runs whitebox forecasting
algorithms to predict the value of the metrics on the input streams based
on the Application Models and the topology information
5. The results of the whitebox forecasting algorithms are put in output on
streams consumed by observers of the <Self-Adaptation Platform>
Preconditions
1. <Monitoring Platform> exposes a set of high-level measures that it can
provide to any observer
2. One or more observers are registered to receive data from these
measures, normally from <Self-Adaptation Platform>
Postconditions
1. Forecasted measures are given in output on one or more output
streams
Other requirements:
1. The Monitoring Platform MUST be able to carry out forecasting at predefined times or periodically with a
given period included in the <MODAClouds IDE> specification.
2. The forecasting MAY depend on the application models, when this dependence does not introduce a
circular dependence that cannot be resolved.
3. The forecasting MUST support timeouts to provide forecasts and MUST cope with abnormal
termination and infeasibilities in the predictions without generating errors in the dependent systems.
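One of the simplest statistical methods fitting the black-box scenario is simple exponential smoothing over the input stream; the method choice and the `alpha` weight are illustrative, not prescribed by the deliverable:

```python
def exp_smooth_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Each new observation pulls the smoothed level toward itself with weight
    `alpha`; the final level is the forecast for the next point on the stream.
    """
    if not series:
        raise ValueError("empty input stream")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

A white-box variant would additionally condition the prediction on the application model and topology, as in the second scenario.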
5.4.5. Use case specification for the Feedback Measure use case
Use case heading Description
Use case name Feedback Measure
Use case ID UC-MC.wp6.Analysis.Feedback Measure.-V01
Priority of
accomplishment
Must Have
Use case description Description
Use case diagram See the Feedback Measure use case of the system boundary model in Section
5.4.1.2.
Goal Return a measure to <MODAClouds IDE> to support the design-runtime
feedback loop.
Main Actors 1. <MODAClouds IDE>
Use case scenarios Description
Main success scenarios
1. <MODAClouds IDE> requests <Monitoring Platform> to provide
feedback on a set of raw metrics or high-level measures.
2. <Monitoring Platform> creates feedback streams for the data to be
pushed to <MODAClouds IDE>.
3. <Monitoring Platform>, following the input specification provided by
<MODAClouds IDE>, binds either raw metrics streams or measures to
the feedback streams.
4. <Monitoring Platform> deactivates the feedback streams upon request of
<MODAClouds IDE> or when <Cloud app> is not running.
Preconditions
1. <Monitoring Platform> running
2. <Cloud app> deployed, not necessarily running
Postconditions
1. Feedback streams push raw metrics and measurements to
<MODAClouds IDE>
5.5. Self-Adaptivity Requirements
5.5.1. Context and System Overview
5.5.1.1. Context
Use case template Description
Category name Self-Adaptivity
The scope of the following use case specifications is to define the Self-Adaptation management services of the
MODAClouds runtime environment.
5.5.1.2. System Boundary Model
Figure 5.5.a: Self-Adaptivity Requirements
5.5.2. Use case specification for the Define/Undefine QoS Constraints use case
Use case heading Description
Use case name Define/Undefine QoS Constraints
Use case ID UC-MC.wp6.Self-Adaptivity.Define/Undefine QoS Constraints.-V01
Priority of accomplishment Must Have
Use case description Description
Use case diagram See the Define/Undefine QoS Constraints use case of the system boundary model in
Section 5.5.1.2.
Goal These services allow defining or undefining in the <Self-Adaptation Platform> a
set of QoS constraints for <Cloud app> specified by the <QoS engineer> in the
<MODACloudsIDE>.
Main Actors
<MODAClouds IDE>
<Self-Adaptation Platform>
<Cloud app admin>
Use case scenarios Description
Main success scenarios 1. <MODACloudsIDE> requests <Self-Adaptation Platform> to
define/undefine a set of QoS Constraints for <Cloud app>.
2. <Self-Adaptation Platform> stores the information, adds a log entry for
the operation, checks correctness of the information received.
3. <Self-Adaptation Platform> returns a success or failure code to
<MODACloudsIDE>.
Preconditions 1. <Cloud app> is deployed on <Application Cloud>
2. <Self-Adaptation Platform> is deployed and running on <Service
Cloud>
Postconditions 1. <Self-Adaptation Platform> updated its internal information to
define/undefine the QoS Constraints.
Other requirements:
1. The correctness of the QoS Constraints specification SHOULD be also checked by <MODAClouds
IDE>.
2. QoS Constraints MUST be specified in a parsable interchange format, e.g., an SLA specification in XML.
3. Define QoS Constraints SHOULD be automatically invoked by <Execution Platform> when running the
<Deploy Application> use case.
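Requirement 2 asks for a parsable interchange format such as an XML SLA specification. A Python sketch with a hypothetical schema — the element and attribute names below are invented for illustration; the deliverable does not fix them here:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML SLA for a <Cloud app>; the schema is illustrative only.
SLA_XML = """
<sla app="cloud-app-1">
  <constraint metric="responseTime" operator="lt" value="500" unit="ms"/>
  <constraint metric="availability" operator="ge" value="0.99"/>
</sla>
"""

def parse_constraints(xml_text):
    """Parse the hypothetical XML SLA into (metric, operator, value) tuples."""
    root = ET.fromstring(xml_text)
    return [(c.get("metric"), c.get("operator"), float(c.get("value")))
            for c in root.findall("constraint")]
```

A correctness check of the parsed constraints (requirement 1) could then run both in <MODAClouds IDE> and in the <Self-Adaptation Platform>.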
5.5.3. Use case specification for the Start/Stop Feedback of Self-Adaptivity Data use case
Use case heading Description
Use case name Start/Stop Feedback of Self-Adaptivity Data
Use case ID UC.MC.wp6.Self-Adaptivity.Start/Stop Feedback of Self-Adaptivity Data-
V01
Priority of accomplishment Could Have
Use case
description
Description
Use case
diagram
See the Start/Stop Feedback of Self-Adaptivity Data use case of the system boundary model in
Section 5.5.1.2.
Goal Return detailed data on the actions taken by the <Self-Adaptation Platform> in a reference
time horizon and their outcomes.
Main Actors <MODAClouds IDE>
<Monitoring Platform>
<Execution Platform>
Use case scenarios Description
Main success scenarios
1. <MODACloudsIDE> requests <Self-Adaptation Platform> to
start/stop feedback of self-adaptivity data in a reference time horizon.
2. <Self-Adaptation Platform> configures the runtime models to
record/stop recording data.
3. <Self-Adaptation Platform> registers/deregisters with <Monitoring
Platform> to start/stop a Data Collector of the self-adaptivity data.
4. <Self-Adaptation Platform> returns to <MODACloudsIDE> the success
or failure of the operation.
Preconditions 1. <Self-Adaptation Platform> deployed and running on <Service Cloud>
2. Runtime models running for <Cloud app>.
Postconditions 1. Data collectors for self-adaptivity data started/stopped.
5.5.4. Use case specification for the Define/Undefine Cost Constraints use case
Use case heading Description
Use case name Define/Undefine Cost Constraints
Use case ID UC.MC.wp6.Self-Adaptivity.Define/Undefine Cost Constraints-V01
Priority of accomplishment Must Have
Use case
description
Description
Use case
diagram
See the Define/Undefine Cost Constraints use case of the system boundary model in Section
5.5.1.2.
Goal These services will allow defining or undefining in the <Self-Adaptation Platform> a set of
Cost constraints for <Cloud app> or <Monitoring Platform> specified by the <QoS
engineer> in <MODACloudsIDE>.
Main Actors
<MODAClouds IDE>
<Self-Adaptation Platform>
<Execution Platform>
<Cloud app admin>
Use case scenarios Description
Main success scenarios 1. <MODACloudsIDE> requests <Self-Adaptation Platform> to
define/undefine a set of Cost Constraints for <Cloud app>.
2. <Self-Adaptation Platform> stores the information, adds a log entry for
the operation, checks correctness of the information received.
3. <Self-Adaptation Platform> returns a success or failure code to
<MODACloudsIDE>.
Preconditions 1. <Cloud app> is deployed on <Application Cloud>
2. <Self-Adaptation Platform> is deployed and running on <Service
Cloud>
Postconditions 1. <Self-Adaptation Platform> updated its internal information to
define/undefine the cost constraints.
Other requirements:
1. The correctness of the Cost Constraints specification SHOULD also be checked by <MODAClouds IDE>.
2. Define Cost Constraints SHOULD be automatically invoked by <Execution Platform> when running the
<Deploy Application> use case.
5.6. Roadmap
In this section, we describe the roadmap for year 1 activities. In this first year of the project, the MODAClouds
consortium will focus on realizing the initial prototypes for the <Monitoring Platform> and for the execution
platform. The Self-Adaptation platform and the multi-cloud deployment components will be included in the
workplan for the following years. Interfaces between these components will be specified in year 1.
In terms of target clouds, in Year 1 at least the Amazon EC2 and Flexiscale IaaS platforms will be
considered. The IaaS focus will continue in Year 2, when initial support for PaaS will be provided. We envision
at this stage that the focus will shift more towards PaaS in the last year of the project.
In terms of timelines for implementation of the requirements, the following table outlines the general roadmap:
# Group Use case scenarios (UC-MC.wp6.*) Priority Year(s)
1 Execution Run Application Must Have 1,2
2 Execution Start/Stop Application Sandbox Should Have 2,3
3 Execution Synchronise Application Data Must Have 2,3
4 Execution Deploy Application Must Have 1,2
5 Monitoring Collect Monitoring Data Must Have 1,2
6 Monitoring Distribute Data Must Have 1
7 Monitoring Install Monitoring Rule Must Have 1,2
8 Monitoring Activate/Deactivate Monitoring Rule Must Have 1,2
10 Monitoring Add/Remove Observer Must Have 1,2
11 Analysis Detect Violation Must Have 2
12 Analysis Correlate Monitoring Data Should Have 2
13 Analysis Estimate Measure Must Have 1,2,3
14 Analysis Forecast Measure Must Have 2,3
15 Analysis Feedback Measure Must Have 2
16 Self-Adaptivity Define/Undefine QoS Constraints Must Have 2
17 Self-Adaptivity Start/stop Feedback of Self-Adaptivity Data Could Have 3
18 Self-Adaptivity Define/Undefine Cost Constraints Must Have 2,3
References
[Aba05] Abadi, D. J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.-H., Lindner, W.,
Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., and Zdonik, S. The Design of the Borealis Stream
Processing Engine. In Proc. Intl. Conf. on Innovative Data Systems Research (CIDR 2005), 2005.
[Aba05b]: Abadi, D. J., Madden, S., Lindner, W. REED: Robust, efficient filtering and event detection in sensor
networks. In VLDB, 2005.
[Abh12] Abhishek, V, Kash, I, Key, P. Fixed and market pricing for cloud services. International Conference on
Computer Communications Workshops (INFOCOM WKSHPS). 2012.
[ABS1] AWS Elastic Beanstalk -- Developer Guide -- What Is AWS Elastic Beanstalk and Why Do I Need It?
(accessed in 2013); http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html
[ABS2] AWS Elastic Beanstalk -- Developer Guide -- How Does AWS Elastic Beanstalk Work? (accessed in
2013); http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/concepts.html
[ABS3] AWS Elastic Beanstalk -- Developer Guide -- Components (accessed in 2013);
http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/concepts.components.html
[ABS4] AWS Elastic Beanstalk -- Developer Guide -- Managing and Configuring Applications and
Environments Using the Console, CLI, and APIs (accessed in 2013);
http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/using-features.html
[ABS5] AWS Elastic Beanstalk -- Developer Guide -- Customizing and Configuring AWS Elastic Beanstalk
Environments (accessed in 2013); http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-
containers.html
[ACF1] AWS CloudFormation (accessed in 2013); https://aws.amazon.com/cloudformation
[ACF2] AWS CloudFormation -- User Guide (accessed in 2013);
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide
[ACF3] AWS CloudFormation -- FAQ (accessed in 2013); https://aws.amazon.com/cloudformation/faqs
[ACF4] AWS CloudFormation -- Templates (accessed in 2013); https://aws.amazon.com/cloudformation/aws-
cloudformation-templates
[Aga07] Agarwala, S, Alegre, F, Schwan, K, Mehalingham, J. E2EProf: Automated End-to-End Performance
Management for Enterprise Systems. IEEE/IFIP International Conference on Dependable Systems and Networks
(DSN). 2007.
[Ala09] I. Al-Azzoni and D. Down. Decentralized Load Balancing for Heterogeneous Grids. Proceedings of the
2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns
(COMPUTATIONWORLD '09), 2009.
[Alt06] Altman, E, Boulogne, T, Azouzi, R, Jiménez, T, Wynter, L. A survey on networking games in
telecommunications. Computers & Operations Research. 2006.
[ANA13] ANA network architecture. Last visited on March 16, 2013. http://www.ana-project.org/web/
[APF2] AppFog Documentation -- Languages (accessed in 2013); https://docs.appfog.com/languages
[APF3] AppFog Documentation -- Services (accessed in 2013); https://docs.appfog.com/services
[APF4] AppFog Documentation -- Feature Roadmap (accessed in 2013); https://docs.appfog.com/roadmap
[APF5] AppFog Documentation -- Tunneling (accessed in 2013); https://docs.appfog.com/services/tunneling
[Ara03] Arasu, A., Babcock, B., Babu, S., Datar, M., Ito, K., Nishizawa, I., Rosenstein, J., and Widom, J.
STREAM: The Stanford Stream Data Manager (Demonstration Description). In Proc. ACM Intl. Conf. on
Management of data (SIGMOD 2003), page 665, 2003.
[Ara06] Arasu, A., Babu, S., and Widom, J. The CQL Continuous Query Language: Semantic Foundations and
Query Execution. The VLDB Journal, 15(2):121-142, 2006.
[Ard08] D. Ardagna, C. Ghezzi, R. Mirandola. Rethinking the use of models in software architecture. QoSA
2008 Proceedings, 1-27, Karlsruhe, Germany, October 2008.
[Ard11] Ardagna, D, Casolari, S, Panicucci, B. Flexible distributed capacity allocation and load redirect
algorithms for cloud systems. IEEE International Conference on Cloud Computing (CLOUD). 2011.
[Ard11b] D. Ardagna, S. Casolari, B. Panicucci. Flexible distributed capacity allocation and load redirect
algorithms for cloud systems. Cloud Computing (CLOUD), 2011 IEEE International Conference on, 2011, 163-
170.
[Ard11c] Ardagna, D, Panicucci, B, Passacantando, M. A game theoretic formulation of the service provisioning
problem in cloud systems. International Conference on World Wide Web. 2011.
[Ard12] Ardagna, D, Panicucci, B, Passacantando, M. Generalized nash equilibria for the service provisioning
problem in cloud systems. IEEE Transactions on Services Computing. 2012.
[Ard12b] D. Ardagna, S. Casolari, M. Colajanni, B. Panicucci. Dual Time-scale Distributed Capacity Allocation
and Load Redirect Algorithms for Cloud Systems. Journal of Parallel and Distributed Computing, Elsevier.
72(6), 796-808, 2012.
[Ard12c] Ardagna, D, Panicucci, B, Trubian, M, Zhang, L. Energy-aware autonomic resource allocation in
multi-tier virtualized environments. IEEE Transactions on Services Computing. 2012.
[AUT13] The autonomic Internet. Last visited on March 16, 2013. http://ist-autoi.eu/autoi/index.php
[Bab01] Babu, S., and Widom, J. Continuous Queries over Data Streams. SIGMOD Rec., 30(3):109-120, 2001.
[Bai06] Bai, Y., Thakkar, H., Wang, H., Luo, C., and Zaniolo, C. A Data Stream Language and System
Designed for Power and Extensibility. In Proc. Intl. Conf. on Information and Knowledge Management (CIKM
2006), pages 337-346, 2006.
[Bal04] Balakrishnan, H., Balazinska, M., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Galvez, E.,
Salz, J., Stonebraker, M., Tatbul, N., Tibbetts, R., and Zdonik, S. Retrospective on Aurora. The VLDB Journal,
13(4):370-383, 2004.
[Bam99] Bamieh, B, Giarré, L. Identification of linear parameter varying models. IEEE Conference on Decision
and Control. 1999.
[Bar04] Barham, P, Donnelly, A, Isaacs, R, Mortier, R. Using Magpie for request extraction and workload
modelling. USENIX Symposium on Operating Systems Design & Implementation (OSDI). 2004.
[Bar10]: Baresi, L., Caporuscio, M., Ghezzi, C., and Guinea, S. Model-Driven Management of Services.
Proceedings of the Eighth European Conference on Web Services, ECOWS. IEEE Computer Society, 2010, pp.
147-154.
[Bar12]: Baresi, L., Guinea, S. Event-based Multi-level Service Monitoring. 2012.
[Ben04] Bennani, M, Menasce, D. Assessing the robustness of self-managing computer systems under highly
variable workloads. International Conference on Autonomic Computing (ICAC). 2004.
[Ben05] Bennani, M, Menasce, D. Resource allocation for autonomic data centers using analytic performance
models. International Conference on Autonomic Computing (ICAC). 2005.
[Bjo12] Björkqvist, M, Chen, L, Binder, W. Opportunistic service provisioning in the cloud. International
Conference on Cloud Computing. 2012.
[Bla09]: Blair, G., Bencomo, N., France, R. Models@run.time. Computer, vol. 42, no. 10, pp. 22-27, Oct. 2009.
[Bra10]: Brandic, I. FoSII Project: Autonomic Resource Management in Clouds Considering Cloud-based
Resource Monitoring and Knowledge Management. Seoul National University, Seoul, South Korea, July 15th
2010.
[Cal12] Calcavecchia, N, Caprarescu, B, Nitto, E, Dubois, D, Petcu, D. Depas: A decentralized probabilistic
algorithm for auto-scaling. Computing. 2012.
[Cap10] Bogdan Alexandru Caprarescu, Nicolo Maria Calcavecchia, Elisabetta Di Nitto, and Daniel J. Dubois.
SOS Cloud: Self-organizing services in the cloud. In BIONETICS, pages 48-55, 2010.
[Car01]: Carzaniga, A., Rosenblum, D. S., Wolf, A. L. Design and Evaluation of a Wide-Area Event
Notification Service. ACM Transactions on Computer Systems, vol. 19, no. 3, pp. 332-383, August, 2001.
[Cas08a] Casale, G, Cremonesi, P, Turrin, R. Robust workload estimation in queueing network performance
models. Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP). 2008.
[Cas08b] Casale, G, Mi, N, Cherkasova, L, Smirni, E. How to parameterize models with bursty workloads.
ACM SIGMETRICS Performance Evaluation Review. 2008.
[CAS13] CASCADAS. Last visited on March 16, 2013. http://acetoolkit.sourceforge.net/cascadas/index.php
[CDF1] Cloudify documentation -- Anatomy of a recipe (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/recipes_overview
[CDF2] Cloudify documentation -- Scaling rules (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/scaling_rules
[CDF3] Cloudify documentation -- Bootstrapping any cloud (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/bootstrapping/bootstrapping_cloud
[CDF4] Cloudify documentation -- Application recipe (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/application_recipe
[CDF5] Cloudify documentation -- Service recipe (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/service_recipe
[CDF6] Cloudify documentation -- Configuring security (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/setup/configuring_security
[CDF7] Cloudify documentation -- Attributes API (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/attributes_api
[CDF8] Cloudify documentation -- Custom commands (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/custom_commands
[CDF9] Cloudify documentation -- Probes (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/developing/plugins
[CDF10] Cloudify documentation -- Cloud driver (accessed in 2013);
http://www.cloudifysource.org/guide/2.5/clouddrivers/cloud_driver
[CFY1] Cloud Foundry -- FAQ (accessed in 2013); http://docs.cloudfoundry.com/faq.html#limits
[CFY2] Cloud Foundry -- Services (accessed in 2013); http://docs.cloudfoundry.com/services.html
[CFY3] Cloud Foundry -- Frameworks (accessed in 2013); http://docs.cloudfoundry.com/frameworks.html
[CFY4] Micro Cloud Foundry (accessed in 2013); https://micro.cloudfoundry.com/
[Cha82] Chandy, K, Neuse, D. Linearizer: A heuristic algorithm for queuing network models of computing
systems. Communications of the ACM. 1982.
[Cha12] Rong Chang, editor. 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI,
USA, June 24-29, 2012. IEEE, 2012.
[Che00a] L. Cherkasova, M. DeSouza and S. Ponnekanti. Performance Analysis of "Content-Aware" Load
Balancing Strategy FLEX: Two Case Studies. In Proceedings of the Thirty-Fourth Hawaii International Conference
on System Sciences (HICSS-34), Software Technology Track, January 3-6, 2001.
[Che00b] Chen, J., DeWitt, D. J., Tian, F., and Wang, Y. NiagaraCQ: A Scalable Continuous Query System for
Internet Databases. In W. Chen, J. F. Naughton, and P. A. Bernstein, editors, Proc. ACM Intl. Conf. on
Management of Data (SIGMOD 2000), pages 379-390, 2000.
[Che06] Tu, Y.-C., Liu, S., Prabhakar, S., and Yao, B. Load Shedding in Stream Databases: A Control-based
Approach. In Proc. Intl. Conf. on Very Large Data Bases (VLDB 2006), pages 787-798, 2006.
[Che08] Chen, Y, Iyer, S, Liu, X, Milojicic, D, Sahai, A. Translating Service Level Objectives to lower level
policies for multi-tier services. Cluster Computing. 2008.
[Coh04] Cohen, I, Goldszmidt, M, Kelly, T, Symons, J, Chase, J. Correlating instrumentation data to system
states: A building block for automated diagnosis and control. USENIX Symposium on Operating Systems
Design & Implementation (OSDI). 2004.
[Coh05] Cohen, I, Zhang, S, Goldszmidt, M, Symons, J, Kelly, T, Fox, A. Capturing, indexing, clustering, and
retrieving system history. ACM symposium on Operating systems principles (SOSP). 2005.
[Cor05]: Cormode, G., Garofalakis, M. N. Sketching streams through the net: Distributed approximate query
tracking. In VLDB, 2005, pp. 13-24.
[Cre02] Cremonesi, P, Schweitzer, P, Serazzi, G. A unifying framework for the approximate solution of closed
multiclass queuing networks. IEEE Transactions on Computers. 2002.
[Cre10] Cremonesi, P, Dhyani, K, Sansottera, A. Service time estimation with a refinement enhanced hybrid
clustering algorithm. International Conference on Analytical and Stochastic Modeling Techniques and
Applications (ASMTA). 2010.
[Cre12] Cremonesi, P, Sansottera, A. Indirect estimation of service demands in the presence of structural
changes. International Conference on Quantitative Evaluation of Systems (QEST). 2012.
[Cza98]: Czajkowski, G., Eicken, T. V. JRes: A Resource Accounting Interface for Java. Proceedings of the
13th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, 1998.
[Dea12] Dean, D, Nguyen, H, Gu, X. UBL: Unsupervised behavior learning for predicting performance
anomalies in virtualized cloud systems. International Conference on Autonomic Computing (ICAC). 2012.
[Des12] Desnoyers, P, Wood, T, Shenoy, P, Patil, S, Vin, H. Modellus: Automated modeling of complex data
center applications. ACM Transactions on Internet Technology. 2012.
[Di12] Di, S, Kondo, D, Cirne, W. Host load prediction in a Google compute cloud with a Bayesian model.
International Conference on High Performance Computing, Networking, Storage and Analysis (SC). 2012.
[Dou12] Brian Dougherty, Jules White, and Douglas C. Schmidt. Model-driven auto-scaling of green cloud
computing infrastructure. Future Generation Comp. Syst., 28(2):371-378, 2012.
[Dua09] Duan, S, Babu, S, Munagala, K. Fa: A system for automating failure diagnosis. IEEE International
Conference on Data Engineering (ICDE). 2009.
[Dub07] Parijat Dube, Zhen Liu, Laura Wynter, and Cathy H. Xia. Competitive equilibrium in e-commerce:
Pricing and outsourcing. Computers & OR, 34(12):3541-3559, 2007.
[Dut10] Dutreilh, X, Rivierre, N, Moreau, A, Malenfant, J, Truck, I. From data center resource allocation to
control theory and back. International Conference on Cloud Computing. 2010.
[Dut12] Sourav Dutta, Sankalp Gera, Akshat Verma, and Balaji Viswanathan. Smartscale: Automatic
application scaling in enterprise clouds. In Chang [Cha12], pages 221-228.
[Eme12]: Emeakaroha, V. C., Ferreto, T. C., Netto, M. A. S., Brandic, I., Rose, De C. A. F. CASViD:
Application Level Monitoring for SLA Violation Detection in Clouds. 2012.
[Fen11] Yuan Feng, Baochun Li, and Bo Li. Price competition in an oligopoly cloud market. 2011.
[Fer13] Fernandez, R. C., Migliavacca, M., Kalyvianaki, E., and Pietzuch, P. Integrating Scale Out and Fault
Tolerance in Stream Processing using Operator State Management. In SIGMOD, 2013. To appear.
[GAE1] GAE Documentation --- The Java Servlet Environment (accessed in 2013);
https://developers.google.com/appengine/docs/java/runtime
[GAE2] GAE Documentation --- Quotas (accessed in 2013);
https://developers.google.com/appengine/docs/quotas
[GAE3] GAE Documentation --- Backends and Java API Overview (accessed in 2013);
https://developers.google.com/appengine/docs/java/backends/overview
[GAE4] GAE Documentation --- Datastore Overview (accessed in 2013);
https://developers.google.com/appengine/docs/java/datastore/overview
[GAE5] GAE Documentation --- Java Service APIs (accessed in 2013);
https://developers.google.com/appengine/docs/java/apis
[Gam12] Gambi, A, Toffetti, G. Modeling cloud performance with Kriging. International Conference on
Software Engineering (ICSE). 2012.
[Gia11] Giani, P, Tanelli, M, Lovera, M. Controller design and closed-loop stability analysis for admission
control in Web service systems. World Congress. 2011.
[Gma07] Gmach, D, Rolia, J, Cherkasova, L, Kemper, A. Workload analysis and demand prediction of
enterprise data center applications. International Symposium on Workload Characterization (IISWC). 2007.
[Gou11] Hadi Goudarzi and Massoud Pedram. Multi-dimensional SLA-based resource allocation for multi-tier
cloud computing systems. In Liu and Parashar [Liu11], pages 324-331.
[Gol03] Golab, L., DeHaan, D., Demaine, E. D., Lopez-Ortiz, A., and Munro, J. I. Identifying Frequent Items in
Sliding Windows over On-line Packet Streams. In Proc. Intl. Conf. on Internet Measurement (IMC 2003), pages
173-178, 2003.
[Gol08] Golab, L., Johnson, T., Koudas, N., Srivastava, D., and Toman, D. Optimizing Away Joins on Data
Streams. In Proc. Intl. Workshop on Scalable Stream Processing System (SSPS 2008), pages 48-57, 2008.
[Gol09]: Goldsack, P., Guijarro, J., Loughran, S., et al. The SmartFrog configuration management framework.
ACM SIGOPS Oper. Syst. Rev., 2009, 43, pp. 16-25.
[Gon11]: Gonzalez, J., Munoz, A., Mana, A. Multi-layer Monitoring for Cloud Computing. IEEE 13th
International Symposium on High-Assurance Systems Engineering 2011.
[GPA13] General purpose autonomic computing. Last visited on March 16, 2013. http://www-users.aston.ac.uk/~calinerc/gpac.html
[GT13] Google Trends, results for "cloud computing" (accessed in 2013);
http://www.google.com/trends?q=cloud+computing
[Gul12] Gulisano, V., Jiménez-Peris, R., et al. StreamCloud: An Elastic and Scalable Data Streaming System.
TPDS, 99(PP), 2012.
[Had12] Makhlouf Hadji and Djamal Zeghlache. Minimum cost maximum flow algorithm for dynamic resource
allocation in clouds. In Chang [Cha12], pages 876-882.
[Has11] Hassan, M, Song, B, Huh, E. Distributed resource allocation games in horizontal dynamic cloud
federation platform. International Conference on High Performance Computing and Communications (HPCC).
2011.
[Has12] Hassan, M, Hossain, M, Sarkar, A, Huh, E. Cooperative game-based distributed resource allocation in
horizontal dynamic cloud federation platform. Information Systems Frontiers. 2012.
[He12] Ting He, Shiyao Chen, Hyoil Kim, Lang Tong, and Kang-Won Lee. Scheduling parallel tasks onto
opportunistically available cloud resources. In Chang [Cha12], pages 180-187.
[HER1] Heroku DevCenter -- The Process Model (accessed in 2013);
https://devcenter.heroku.com/articles/process-model
[HER2] Heroku DevCenter -- Dynos and the Dyno Manifold (accessed in 2013);
https://devcenter.heroku.com/articles/dynos
[HER3] Heroku DevCenter -- Languages (accessed in 2013); https://devcenter.heroku.com/categories/language-
support
[HER4] Heroku DevCenter -- Buildpacks (accessed in 2013); https://devcenter.heroku.com/articles/buildpacks
[HER5] Heroku DevCenter -- Scaling Your Process Formation (accessed in 2013);
https://devcenter.heroku.com/articles/scaling
[HER6] Heroku Add-ons (accessed in 2013); https://addons.heroku.com/
[HER7] Heroku DevCenter -- HTTP Routing and the Routing Mesh (accessed in 2013);
https://devcenter.heroku.com/articles/http-routing
[HER8] Heroku DevCenter -- Slug Compiler (accessed in 2013); https://devcenter.heroku.com/articles/slug-
compiler
[HER9] Heroku API (accessed in 2013); https://api-docs.heroku.com/
[HER10] Heroku DevCenter -- Frequently Asked Questions about Java (accessed in 2013);
https://devcenter.heroku.com/articles/java-faq
[HER11] The Twelve-Factor App (accessed in 2013); http://12factor.net/
[Hol09] Holub, V., Parsons, T., O'Sullivan, P., Murphy, J. Runtime correlation engine for system monitoring and
testing. In ICAC-INDST '09 Proceedings of the 6th international conference industry session on Autonomic
computing and communications industry session, pages 9-18, New York, NY, USA, 2009. ACM.
[Hol10] Holze, M, Haschimi, A, Ritter, N. Towards workload-aware self-management: Predicting significant
workload shifts. International Conference on Data Engineering Workshops (ICDEW). 2010.
[Hol10b]: Holub, V., Parsons, T., O'Sullivan, P. Run-Time Correlation Engine for System Monitoring and
Testing (RTCE). 2010.
[Hue05]: Huebsch, R., Chun, B. N., Hellerstein, J. M., Loo, B. T., Maniatis, P., Roscoe, T., Shenker, S., Stoica,
I., Yumerefendi, A. R. The architecture of PIER: an Internet-scale query processor. In CIDR, 2005.
[Jag95] Jagadish, H. V., Mumick, I. S., and Silberschatz, A. View Maintenance Issues for the Chronicle Data
Model. In Proc. ACM Symp. on Principles of Database Systems (PODS 1995), pages 113-124, 1995.
[Jer97]: Jerding, D. F., Stasko, J. T., Ball, T. Visualizing Interactions in Program Executions. Proceedings of the
International Conference on Software Engineering, 1997.
[JUJ1] Juju Documentation -- Frequently Asked Questions (accessed in 2013); https://juju.ubuntu.com/docs/faq.html
[JUJ2] Juju Charms (accessed in 2013); http://jujucharms.com/charms
[JUJ3] Juju Documentation (accessed in 2013); https://juju.ubuntu.com/docs
[JUJ4] Juju Documentation -- Getting started (accessed in 2013); https://juju.ubuntu.com/docs/getting-
started.html
[JUJ5] Juju Documentation -- User tutorial (accessed in 2013); https://juju.ubuntu.com/docs/user-tutorial.html
[JUJ6] Juju Documentation -- Charms (accessed in 2013); https://juju.ubuntu.com/docs/charm.html
[JUJ7] Juju Documentation -- Service configuration (accessed in 2013); https://juju.ubuntu.com/docs/service-
config.html
[JUJ8] Juju Documentation -- Machine constraints (accessed in 2013);
https://juju.ubuntu.com/docs/constraints.html
[JUJ9] Juju Documentation -- Operating systems (accessed in 2013); https://juju.ubuntu.com/docs/operating-
systems.html
[Jun09] Jung, G, Joshi, K, Hiltunen, M, Schlichting, R, Pu, C. A Cost-sensitive adaptation engine for server
consolidation of multitier applications. Middleware. 2009.
[Jun10] Jung, G, Hiltunen, M, Joshi, K, Schlichting, R, Pu, C. Mistral: Dynamically managing power,
performance, and adaptation cost in cloud infrastructures. International Conference on Distributed Computing
Systems (ICDCS). 2010.
[Kal09] Kalyvianaki, E, Charalambous, T, Hand, S. Self-adaptive and self-configured CPU resource
provisioning for virtualized servers using Kalman filters. International Conference on Autonomic Computing
(ICAC). 2009.
[Kal11] Kalbasi, A, Krishnamurthy, D, Rolia, J, Richter. MODE: mix driven on-line resource demand
estimation. International Conference on Network and Services Management (CNSM). 2011.
[Kal12] Kalbasi, A, Krishnamurthy, D, Rolia, J, Dawson, S. DEC: service demand estimation with confidence.
IEEE Transactions on Software Engineering. 2012.
[Kar11] Kari, C., Kim, Y.-A., Russell, A. Data Migration in Heterogeneous Storage Systems. 2011 31st
International Conference on Distributed Computing Systems (ICDCS), pp. 143-150, 20-24 June 2011.
[Kel79] Kelly, F. Reversibility and Stochastic Networks. Cambridge University Press. 1979.
[Kha12] Khan, A, Yan, X, Tao, S, Anerousis, N. Workload characterization and prediction in the cloud: A
multiple time series approach. IEEE/IFIP International Workshop on Cloud Management (Cloudman). 2012.
[Kir10]: Kirschnick, J., Calero, J. A. M., Wilcock, L., Edwards, N. Towards an architecture for the automated
provisioning of cloud services. IEEE Commun. Mag., 2010, 48, (12), pp. 124-131.
[Kle75] Kleinrock, L. Queueing Systems. Wiley-Interscience. 1975.
[Kon12] Kleopatra Konstanteli, Tommaso Cucinotta, Konstantinos Psychas, and Theodora A. Varvarigou.
Admission control for elastic cloud services. In Chang [Cha12], pages 41-48.
[Kon12b]: König, B., Calero, J. A. M., Kirschnick, J. Elastic monitoring framework for cloud infrastructures.
Communications, IET, vol. 6, num. 10, pp. 1306-1315, July, 2012.
[Law04] Law, Y.-N., Wang, H., and Zaniolo, C. Query Languages and Data Models for Database Sequences and
Data Streams. In Proc. Intl. Conf. on Very Large Data Bases (VLDB 2004), pages 492-503, 2004.
[Law05] Law, Y.-N., and Zaniolo, C. An Adaptive Nearest Neighbor Classification Algorithm for Data
Streams. In Proc. Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005),
pages 108-120, 2005.
[Lee99] Lee, L, Poolla, K. Identification of linear parameter-varying systems using nonlinear programming.
Transactions-American Society Of Mechanical Engineers Journal Of Dynamic Systems Measurement And
Control. 1999.
[Lim10] Lim, H, Babu, S, Chase, J. Automated control for elastic storage. International Conference on
Autonomic Computing (ICAC). 2010.
[Lin12] Yi-Kuei Lin and Ping-Chen Chang. Reliability evaluation of a computer network in cloud computing
environment subject to maintenance budget. Applied Mathematics and Computation, 219(8):3893-3902, 2012.
[Lit05] Litoiu, M, Woodside, C, Zheng, T. Hierarchical model-based autonomic control of software systems.
ACM SIGSOFT Software Engineering Notes. 2005.
[Liu99] Liu, L., Pu, C., and Tang, W. Continual Queries for Internet Scale Event-Driven Information Delivery.
IEEE Trans. Knowl. Data Eng., 11(4):610-628, 1999.
[Liu05] Liu, Y, Gorton, I, Fekete, A. Design-level performance prediction of component-based applications.
IEEE Transactions on Software Engineering. 2005.
[Liu06] Liu, Z, Wynter, L, Xia, C, Zhang, F. Parameter inference of queueing models for IT systems using end-
to-end measurements. ACM SIGMETRICS Performance Evaluation Review. 2006.
[Liu10] Liu, T, Methapatara, C, Wynter, L. Revenue management model for on-demand it services. European
Journal of Operational Research. 2010.
[Liu11] Ling Liu and Manish Parashar, editors. IEEE International Conference on Cloud Computing, CLOUD
2011, Washington, DC, USA, 4-9 July, 2011. IEEE, 2011.
[Lov98] Lovera, M, Verhaegen, M, Chou, C. State space identification of MIMO linear parameter varying
models. International Symposium on the Mathematical Theory of Networks and Systems. 1998.
[Lu02] Lu, C., Alvarez, G. A. and Wilkes, J. 2002. Aqueduct: online data migration with performance
guarantees. In Proceedings of the 1st USENIX conference on File and storage technologies (FAST'02). USENIX
Association, Berkeley, CA, USA, 18-18.
[Lu03] Lu, Y, Abdelzaher, T, Lu, C, Sha, L, Liu, X. Feedback control with queueing-theoretic prediction for
relative delay guarantees in web servers. IEEE Real-Time and Embedded Technology and Applications
Symposium. 2003.
[Lu09] Lu, Y, AbouRizk, S. Automated Box-Jenkins forecasting modelling. Automation in Construction. 2009.
[Mad02]: Madden, S., Franklin, M. J., Hellerstein, J. M., Hong, W. Tag: A tiny aggregation service for ad-hoc
sensor networks. In OSDI, 2002.
[Mal11] Malkowski, S, Hedwig, M, Li, J, Pu, C, Neumann, D. Automated control for elastic n-tier workloads
based on empirical modeling. International Conference on Autonomic Computing (ICAC). 2011.
[Mar11] Martin, A., Knauth, T., et al. Scalable and Low-Latency Data Processing with Stream MapReduce. In
CLOUDCOM, 2011.
[Mar12] Marek, L, Zheng, Y, Ansaloni, D, Sarimbekov, A, Binder, W, Tuma, P. Java bytecode instrumentation
made easy: The DiSL framework for dynamic program analysis. 2012.
[Mas11]: Mastelic, T., Emeakaroha, V. C., Maurer, M., Brandic, I. M4CLOUD - Generic Application Level
Monitoring For Resource-Shared Cloud Environments. 2011.
[Maz12] Michele Mazzucco and Dmytro Dyachuk. Optimizing cloud providers' revenues via energy efficient
server allocation. Sustainable Computing: Informatics and Systems, 2(1):1-12, 2012.
[Men94] Menascé, D., Almeida, V., and Dowdy, L. Capacity Planning and Performance Modeling: From
Mainframes to Client-Server Systems. Prentice Hall, 1994.
[Men03] Menasce, D, Bennani, M. On the use of performance models to design self-managing computer
systems. Computer Measurement Group Conference. 2003.
[Men05] Menasce, D, Bennani, M, Ruan, H. On the use of online analytic performance models in self-managing
and self-organizing computer systems. Self-star Properties in Complex Information Systems. 2005.
[Men07] Menasce, D, Ruan, H, Gomaa, H. QoS management in service-oriented architecture. Performance
Evaluation. 2007.
[Men08]: Meng, S., Kashyap, S. R., Venkatramani, C., Liu, L. Resource-Aware Application State Monitoring
(REMO). IEEE Transactions On Parallel And Distributed Systems. 2008.
[Men11] Ishai Menache, Asuman Ozdaglar, and Nahum Shimkin. Socially optimal pricing of cloud computing
resources. In Proceedings of the 5th International ICST Conference on Performance Evaluation Methodologies
and Tools, VALUETOOLS '11, pages 322-331, Brussels, Belgium, 2011. ICST (Institute for Computer Sciences,
Social-Informatics and Telecommunications Engineering).
[Mey04]: Meyerhöfer, M., Neumann, C. TESTEJB - A Measurement Framework for EJBs. Proceedings of the
7th International Symposium on Component-Based Software Engineering (CBSE 2004), Edinburgh, UK, May
24-25, 2004, pp. 294-301.
[Mos02]: Mos, A., Murphy, J. A framework for performance monitoring, modelling and prediction of
component oriented distributed systems. Proceedings of the 3rd international workshop on Software and
performance (WOSP '02), 2002.
[MOS1] Dana Petcu, Ciprian Craciun, Massimiliano Rak: Towards a Cross Platform Cloud API -- Components
for Cloud Federation; 2011
[MOS2] Ciprian Craciun: Building blocks of scalable applications; Master's thesis; 2012;
https://github.com/downloads/cipriancraciun/masters-thesis/thesis.pdf
[MOS3] mOSAIC notes -- Component controller (accessed in 2013);
http://wiki.volution.ro/Mosaic/Notes/Platform
[MOS4] mOSAIC notes -- Component hub (accessed in 2013); http://wiki.volution.ro/Mosaic/Notes/Hub
[MOS5] mOSAIC BitBucket repositories (accessed in 2013); https://bitbucket.org/mosaic
[Mun07] Munagala, K., Srivastava, U., and Widom, J. Optimization of Continuous Queries with Shared
Expensive Filters. In Proc. ACM Intl. Symp. on Principles of Database Systems (PODS 2007), pages 215-224,
2007.
[Nash54] J. Nash. Non-cooperative games. The Annals of Mathematics, 54(2):286-295, 1951.
[Nee11] Neelakanta, G, Veeravalli, B. On the resource allocation and pricing strategies in compute clouds using
bargaining approaches. International Conference on Networks (ICON). 2011.
[Nem95] Nemani, M, Ravikanth, R, Bamieh, B. Identification of linear parametrically varying systems. IEEE
Conference on Decision and Control. 1995.
[Neu10] Neumeyer, L., Robbins, B., et al. S4: Distributed Stream Computing Platform. In ICDMW, 2010.
[Ope07]: OpenSOA, Service Data Objects Specification. http://www.oasis-opencsa.org/sdo, 2007.
[Pac08] Pacifici, G, Segmuller, W, Spreitzer, M, Tantawi, A. CPU demand for web serving: Measurement
analysis and dynamic estimation. ACM SIGMETRICS Performance Evaluation Review. 2008.
[Par06]: Parsons, T., Murphy, J. The 2nd International Middleware Doctoral Symposium: Detecting
Performance Antipatterns in Component-Based Enterprise Systems. IEEE Distributed Systems Online, vol. 7,
no. 3, March, 2006.
[Par07]: Parsons, T. Automatic Detection of Performance Design and Deployment Antipatterns in Component
Based Enterprise Systems. Ph.D. Thesis, 2007, University College Dublin.
[Par08]: Parsons, T., Murphy, J. Detecting Performance Antipatterns in Component Based Enterprise Systems.
Journal of Object Technology, vol. 7, no. 3, 2008.
[Pou10] Poussot-Vassal, C, Tanelli, M, Lovera, M. Linear parametrically varying MPC for combined quality of
service and energy management in web service systems. American Control Conference. 2010.
[Pow05] Powers, R, Goldszmidt, M, Cohen, I. Short term performance forecasting in enterprise
systems. International Conference on Knowledge Discovery and Data Mining (SIGKDD). 2005.
[PUP1] Puppet Labs (accessed in 2013); https://puppetlabs.com/
[PUP2] Puppet Labs -- What is Puppet? (accessed in 2013); https://puppetlabs.com/puppet/what-is-puppet
[PUP3] Puppet Labs -- Big Picture (accessed in 2013);
http://projects.puppetlabs.com/projects/puppet/wiki/Big_Picture
[PUP4] Puppet Labs -- What is Puppet? (slides) (accessed in 2013);
http://www.mit.edu/people/marthag/talks/puppet/img2.html
[PUP5] Puppet Labs -- Glossary (accessed in 2013); http://docs.puppetlabs.com/references/glossary.html
[PUP6] Puppet Labs -- Reference Manual (accessed in 2013); http://docs.puppetlabs.com/puppet/2.7/reference
[PUP7] Puppet Labs -- Tools (accessed in 2013); http://docs.puppetlabs.com/guides/tools.html
[PUP8] Puppet Labs -- Exported Resources (accessed in 2013);
http://docs.puppetlabs.com/puppet/2.7/reference/lang_exported.html
[PUP9] Puppet Labs -- Compare Puppet Enterprise (accessed in 2013);
https://puppetlabs.com/puppet/enterprise-vs-open-source
[PUP10] Puppet Labs -- System Requirements (accessed in 2013);
http://docs.puppetlabs.com/puppet/3/reference/system_requirements.html
[Ran06] P. Ranganathan, P. Leech, D. Irwin, and J. Chase. Ensemble-level Power Management for Dense Blade
Servers. SIGARCH Comput. Archit. News, 34, 2006.
[Ris02] A. Riska. Aggregate Matrix-Analytic techniques and their applications. PhD thesis, Computer Science,
College of William & Mary, Williamsburg, VA, 2002.
[Rol95] Rolia, J, Vetland, V. Parameter estimation for performance models of distributed application systems.
Conference of the Centre for Advanced Studies on Collaborative Research (CASCON). 1995.
[Rol98] Rolia, J, Vetland, V. Correlating resource demand information with ARM data for application services.
International Workshop on Software and Performance (WOSP). 1998.
[Rub] RUBiS: Rice University Bidding System. http://rubis.ow2.org.
[Sei87] Seidmann, A, Schweitzer, P, Shalev-Oren, S. Computerized closed queueing network models of flexible
manufacturing systems. Large Scale Systems. 1987.
[Sha08] Sharma, A, Bhagwan, R, Choudhury, M, Golubchik, L, Govindan, R, Voelker, G. Automatic request
categorization in internet services. ACM SIGMETRICS Performance Evaluation Review. 2008.
[Sha10]: Shao, J., Wei, H., Wang, Q., Mei, H. A Runtime Model Based Monitoring Approach for Cloud
(RMCM). 2010 IEEE 3rd International Conference on Cloud Computing.
[Shi06] Shivam, P, Babu, S, Chase, J. Learning application models for utility resource planning. International
Conference on Autonomic Computing (ICAC). 2006.
[Son12] Yang Song, Murtaza Zafer, and Kang-Won Lee. Optimal bidding in spot instance market. In Albert G.
Greenberg and Kazem Sohraby, editors, INFOCOM, pages 190-198. IEEE, 2012.
[Spi11] Spinner, S. Evaluating approaches to resource demand estimation (Master Thesis). Karlsruhe Institute of
Technology. 2011.
[Sri05]: Srivastava, U., Munagala, K., Widom, J. Operator placement for in-network stream query processing. In
PODS, 2005, pp. 250-258.
[Sri08] Shekhar Srikantaiah, Aman Kansal, and Feng Zhao. Energy aware consolidation for cloud computing. In
Proceedings of the 2008 conference on Power aware computing and systems, HotPower '08, pages 10-10,
Berkeley, CA, USA, 2008. USENIX Association.
[Sut08] Sutton, C, Jordan, M. Probabilistic inference in queueing networks. In Workshop on Tackling Computer
Systems Problems with Machine Learning Techniques (SysML). 2008.
[Tan12] Tan, Y, Nguyen, H, Shen, Z, Gu, X, Venkatramani, C, Rajan, D. PREPARE: Predictive performance
anomaly prevention for virtualized cloud systems. International Conference on Distributed Computing Systems
(ICDCS). 2012.
[Tes05] Tesauro, G, Das, R, Walsh, W, Kephart, J. Utility-function driven resource allocation in autonomic
systems. International Conference on Autonomic Computing (ICAC). 2005.
[Tes06] Tesauro, G, Jongt, N, Das, R, Bennanit, M. A hybrid reinforcement learning approach to autonomic
resource allocation. International Conference on Autonomic Computing (ICAC). 2006.
[The08] Thereska, E, Ganger, G. IRONModel: Robust performance models in the wild. ACM SIGMETRICS
Performance Evaluation Review. 2008.
[Tia11] Fengguang Tian and Keke Chen. Towards optimal resource provisioning for running mapreduce
programs in public clouds. In Liu and Parashar [Liu11], pages 155-162.
[Tpc] Transaction processing performance council. TPC-W. http://www.tpc.org/tpcw.
[Tur07]: Turnbull, J. Pulling Strings with Puppet, FirstPress, 2007, 1st edn.
[Twi13] Twitter Storm. github.com/nathanmarz/storm/wiki , 2013
[Urg05] Urgaonkar, B, Pacifici, G, Shenoy, P, Spreitzery, M, Tantawi, A. An analytical model for multitier
internet services and its applications. ACM SIGMETRICS Performance Evaluation Review. 2005.
[Vaq08] Luis M. Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in the clouds:
towards a cloud definition. SIGCOMM Comput. Commun. Rev., 39(1):50-55, December 2008.
[Val11] Giuseppe Valetto, Paul L. Snyder, Daniel J. Dubois, Elisabetta Di Nitto, and Nicolo Maria
Calcavecchia. A self-organized load-balancing algorithm for overlay-based decentralized service networks. In
SASO, pages 168-177, 2011.
[Ven11]: Venticinque, S., Di Martino, B., Petcu, D. Agent-based cloud provisioning and management, design
and prototypal implementation. In F. Leymann, I. Ivanov, M. van Sinderen, and B. Shishkov, editors, 1st
International Conference on Cloud Computing and Services Science (CLOSER 2011), pages 184-191. SciTePress,
2011.
[Ver02] Verdult, V. Nonlinear system identification: A state-space approach. Ph.D. dissertation. Twente
University Press. 2002.
[Ver07] Vercauteren, T, Aggarwal, P, Wang, X, Li, T. Hierarchical forecasting of web server workload using
sequential monte carlo training. IEEE Transactions on Signal Processing. 2007.
[Wan03] W. Zhang and W. Zhang. Linux Virtual Server Clusters. Linux Magazine, November 2003.
[Wan05] Wang, X, Abraham, A, Smith, K. Intelligent web traffic mining and analysis. Journal of Network and
Computer Applications. 2005.
[Wan12] Jian Wan, Dechuan Deng, and Congfeng Jiang. Non-cooperative gaming and bidding model based
resource allocation in virtual machine environment. In IPDPS Workshops, pages 2183-2188. IEEE Computer
Society, 2012.
[Wan12b] Lijuan Wang and Jun Shen. Towards bio-inspired cost minimisation for data-intensive service
provision. In Services Economics (SE), 2012 IEEE First International Conference on, pages 16-23, June 2012.
[WAZ1] Windows Azure Documentation -- Introducing Windows Azure (accessed in 2013);
http://www.windowsazure.com/en-us/develop/net/fundamentals/intro-to-windows-azure
[WAZ2] Windows Azure Documentation -- Windows Azure Execution Models (accessed in 2013);
http://www.windowsazure.com/en-us/develop/net/fundamentals/compute
[Wei10] Guiyi Wei, Athanasios V. Vasilakos, Yao Zheng, and Naixue Xiong. A game-theoretic method of fair
resource allocation for cloud computing services. The Journal of Supercomputing, 54(2):252-269, 2010.
[Wik13] Wikipedia, Data Migration, http://en.wikipedia.org/wiki/Data_migration 2013.
[Win09] Wingerden, J. Control of wind turbines with smart rotors: Proof of concept & LPV subspace
identification. Ph.D. dissertation. Delft University of Technology. 2009.
[Woo95] Woodside, C, Neilson, J, Petriu, D, Majumdar, S. The stochastic rendezvous network model for
performance of synchronous Client-Server-like distributed software. IEEE Transactions on Computers. 1995.
[Woo06] Woodside, C, Zheng, T, Litoiu, M. Service system resource management based on a tracked layered
performance model. International Conference on Autonomic Computing (ICAC). 2006.
[Wu08] Wu, X, Woodside, M. A calibration framework for capturing and calibrating software performance
models. European Performance Engineering Workshop on Computer Performance Engineering (EPEW). 2008.
[Wu10] Wu, Y, Hwang, K, Yuan, Y, Zheng, W. Adaptive workload prediction of grid performance in
confidence windows. IEEE Transaction on Parallel and Distributed Systems. 2010.
[Wu12] Linlin Wu, Saurabh Kumar Garg, and Rajkumar Buyya. SLA-based admission control for a
software-as-a-service provider in cloud computing environments. J. Comput. Syst. Sci., 78(5):1280-1299, 2012.
[Xia12] Z. Xiao, Q. Chen, and H. Luo. Automatic scaling of internet applications for cloud computing services.
Computers, IEEE Transactions on, PP(99):1, 2012.
[Xio11] PengCheng Xiong, Zhikui Wang, Simon Malkowski, Qingyang Wang, Deepal Jayasinghe, and Calton
Pu. Economical and robust provisioning of n-tier cloud workloads: A multi-level control approach. In ICDCS,
pages 571580. IEEE Computer Society, 2011.
[Xio13] Xiong, P, Pu, C, Zhu, X, Griffith, R. vPerfGuard: an automated model-driven framework for application
performance diagnosis in consolidated cloud environments. International Conference on Performance
Engineering (ICPE), 2013.
[Xu07] Xu, J, Zhao, M, Fortes, J, Carpenter, R, Yousif, M. On the use of fuzzy modeling in virtualized data
center management. International Conference on Autonomic Computing (ICAC). 2007.
[Yal04]: Yalagandula P., Dahlin, M.. A scalable distributed information management system. In SIGCOMM,
2004, pp. 379-390.
[Zaf12] Murtaza Zafer, Yang Song, and Kang-Won Lee. Optimal bids for spot VMs in a cloud for deadline
constrained jobs. In Chang [Cha12], pages 75-82.
[Zam12] Sharrukh Zaman and Daniel Grosu. An online mechanism for dynamic VM provisioning and allocation
in clouds. In Chang [Cha12], pages 253-260.
[Zha03] Zhang, G. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing.
2003.
[Zha07] Zhang, Q, Cherkasova, L, Smirni, E. A regression-based analytic model for dynamic resource
provisioning of multi-tier applications. International Conference on Autonomic Computing (ICAC). 2007.
[Zhe05] Zheng, T, Yang, J, Woodside, M, Litoiu, M, Iszlai, M. Tracking time-varying parameters in software
systems with extended Kalman filters. Conference of the Centre for Advanced Studies on Collaborative
Research (CASCON). 2005.
[Zhe08] Zheng, T, Woodside, M, Litoiu, M. Performance model estimation and tracking using optimal filters.
IEEE Transactions on Software Engineering. 2008.
Appendix A Run-time platform evaluation criteria
Although many of the surveyed solutions, and other existing ones, are production-ready (some even backed by
powerful companies in the IT sector) and offer many features, we must focus our effort on determining whether
they are a good match for the MODAClouds requirements, described in a later section. Such a goal implies two
separate conditions:
first of all, they should be suitable for our industrial partners' case-study applications; this in turn implies
matching the supported programming languages, the palette of available resources and middleware, and
not least the security requirements;
second, in order to fulfil our project's goal, they must provide a certain flexibility, allowing our run-time
environment to integrate with them and to provide enhanced services and support for the user's application.
Therefore, we are especially interested in the following aspects:
type
One of the categories mentioned at the beginning of this section; it broadly describes the purpose of the
solution and the range of features it offers.
PaaS --- fully integrated solution that abstracts away all low-level details of the deployment and
execution;
application execution --- suitable only for application execution, meaning that it does not manage the
host environment it runs in, like operating system, machine, etc.; (classical examples would be Tomcat
and derivatives, Ruby on Rails, etc.;)
application deployment --- as above suitable only for application deployment, thus implying that the
environment be provided by other means; (classical examples would be package managers, Capistrano
for Ruby, etc.;)
server deployment --- suitable to deploy the entire host environment, possibly even including the
application deployment, but it will still require an application execution solution; (classical examples
would be Chef or Puppet;)
task automation --- low-level tools that, if required, would allow us to quickly implement our own
solution fitting one of the above categories; (classical examples would be Ant for Java, Fabric
for Python, etc.;)
library --- the described solution is actually a library to be used inside our programs; here we include
also platforms or frameworks, which although more complex than libraries, are still used only to
develop applications;
service --- solutions that are stand-alone services; on their own they do not provide direct benefits,
but they are either used as dependencies of our environment or, if integrated, provide added value
to it and thus to our users; (for example database systems, various middleware, logging or
monitoring systems or SaaS, etc.;)
standard --- although not a ready to be used solution, this could be a protocol, data format, guidelines
or other kind of specification, that could prove useful to implement or follow ourselves;
suitability
In short, how mature, or production-ready, is the solution? Does it have a supportive community built around it?
production --- mature and stable enough to be adopted as-is;
emerging --- usually either a very popular solution, or one backed by a large company, but not yet
reaching or surpassing the beta status;
prototype --- maybe not the best solution to adopt, but it could have important features that we could
leverage or re-implement;
legacy --- although not a choice for most new developments, it could prove important to address,
because it either has a large deployment base, or it is mandated by one of the case studies;
application domain
What would be the main flavour of targeted applications?
web applications;
map-reduce applications;
generic compute-, data-, or network-intensive applications;
application architecture
Broadly, which application architecture the solution targets.
2-tier applications --- monolithic applications that, besides the data storage or communication layer,
have a single layer handling all the concerns from user interface to logic;
n-tier applications --- SOA-inspired applications where parts of the application are clearly identified as
independent layers, and deployed accordingly;
application restrictions
What constraints would the application (and part of our run-time environment) be subjected to?
none --- the application is able to use all the features of the targeted programming language and the
targeted framework, including full control over the run-time environment; moreover the application is
able to interact with other OS artifacts (like the file system, processes, sockets, etc.); (e.g. AWS Elastic
Beanstalk;)
container --- like in the case of no restrictions, except that interactions with the run-time or the OS are
limited;
limited --- the application is able to use only some features of the targeted language or framework, and
most likely interactions with the run-time and the OS are limited (i.e. native libraries are forbidden,
file-system access is restricted, etc.); (e.g. Google App Engine;)
programming languages
Self-explanatory.
programming frameworks
Some solutions target a particular framework (such as Servlets for Google App Engine's Java environment,
Capistrano tightly focused on Ruby on Rails deployment, etc.). Thus it would prove useful to know in advance
which are the officially sanctioned or preferred frameworks.
scalability
How can scalability be achieved?
automatic scalability --- based on user defined policies the platform is able to provision and commit
new computing resources; (i.e. the platform decides and executes;)
manual scalability --- the user is able to control via a high-level UI or CLI the amount of provisioned
and committed computing resources; (i.e. the operator decides, the platform executes;) (this implies that
the platform is able to provision new resources by itself;)
passive scalability --- the platform itself is able to scale if computing resources are manually provided
by the operator himself; (i.e. the operator decides and executes, the platform only takes notice and
reacts;) (this implies that the platform is not able to provision resources by itself;)
session affinity
Usually a PaaS offers HTTP request routers (or dispatchers); how do they load-balance clients among the
multiple available service instances?
transparent --- the solution provides automatic session replication between multiple instances (most
likely through a shared database);
sticky-sessions --- all the requests originating from the same client are routed to the same instance;
non-deterministic --- (self-explanatory);
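The sticky-sessions strategy above can be sketched by deterministically hashing a client identifier onto the list of available instances, so that the same client always reaches the same instance; the function name and inputs are illustrative assumptions.

```python
import hashlib

def route(client_id, instances):
    """Sticky-session routing sketch: hash the client identifier
    and map it onto the instance list. The same client is always
    routed to the same instance as long as the list is unchanged."""
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return instances[int(digest, 16) % len(instances)]
```

A transparent router would not need this stickiness, since session state is replicated (most likely through a shared database) across instances.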
interaction
How can we pragmatically interact with the proposed solution?
WS (Web Service) --- the interaction can be made through HTTP calls (either SOAP+WSDL or RESTful);
(this implies that there is a public specification of such calls, or that they are easily reverse-engineered);
WUI (Web User Interface) --- although this interface is provided remotely through HTTP, it's suitable
for human operators and can't be easily consumed by an automated tool;
CLI (Command Line Interface) --- there are command-line tools that interact with the solution (most
likely through HTTP or some form of RPC); (this implies that the input/output formats are easy to parse
by another tool and, as above, that a specification is available);
CUI (Console User Interface) --- the provided command line tools are not suitable for being invoked by
other tools, because for example the input / output are human-centric and difficult to parse;
API (Application Programming Interface) --- the solution also provides a library that abstracts one of
the previous interaction methods;
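The CLI-versus-CUI distinction above can be illustrated with a sketch of one hypothetical tool emitting both styles of output; the data and function name are purely illustrative assumptions.

```python
import json

def list_instances(machine_readable=True):
    """Sketch of a listing command. Machine-readable output (CLI
    style) is trivially parsed by other tools; the human-centric
    table (CUI style) is awkward for automation."""
    # Illustrative data; a real tool would query the platform.
    instances = [{"id": "vm-1", "state": "running"},
                 {"id": "vm-2", "state": "stopped"}]
    if machine_readable:
        return json.dumps(instances)        # CLI: parseable JSON
    return "\n".join(f"{i['id']:8} {i['state']}"
                     for i in instances)    # CUI: human-centric table
```

The same contrast applies one level up: a WS exposes a specified HTTP interface for tools, while a WUI is meant only for human operators.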
hosting type
How would we be able to use the proposed solution?
hosted --- the proper meaning of the term PaaS;
deployable (closed-source) --- available for deployment in a private cloud, but the code is closed-
source;
deployable (open-source) --- available for deployment in a private cloud, and the code is available as
open-source, thus enabling modifications;
simulated --- there is an option to deploy locally a similar solution for development and debugging
purposes;
portability
If a developer uses a particular solution, how easy is it to move to another solution playing the same role?
locked -- to move to a different solution would require massive rewriting of the application;
portable -- possible with minor updates to the application;
out-of-the-box -- the solution uses existing standards thus portability is guaranteed;
services
Especially in the case of PaaS, what additional resources or services (such as databases, middleware, etc.) are
available and managed directly by the solution, and thus integrated with the application life-cycle?
monitoring coverage
Especially in the case of PaaS, how much do the monitoring facilities cover and expose to the operator?
none -- the solution provides no monitoring options (except maybe the listing of running processes or
logging, etc.);
basic -- the usual information, comprising CPU, memory, and disk usage;
extensive -- it provides many other metrics than the ones above;
monitoring level
From which perspective, or at which level of the software and infrastructure stack, are the metrics provided?
application -- the data is collected from within the application itself; (for example by using NewRelic,
etc.;)
container -- the data is collected from within the VM or the container; it could refer to the VM or the
container itself or the whole running application;
hypervisor -- the data is collected by the virtualization solution;
fabric -- the data is collected at the infrastructure layer; (for example raw disks, load balancers, routers,
switches, etc.);
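Application-level monitoring, as described above, can be sketched as a decorator that records call latencies from within the application itself; this is an illustrative assumption of the approach, not how any specific agent (such as NewRelic) is implemented.

```python
import time

def timed(fn):
    """Application-level monitoring sketch: wrap a function so each
    call's latency is recorded inside the application itself, the
    same level at which in-process monitoring agents hook in."""
    samples = []
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        samples.append(time.perf_counter() - start)  # seconds
        return result
    wrapper.samples = samples  # expose collected metrics
    return wrapper
```

Container-, hypervisor-, or fabric-level monitoring would instead observe the application from outside, without instrumenting its code.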
monitoring interface
What technique --- such as standard, API, library, etc. --- is used to expose the monitoring information to the
operator?
resource providers
Most PaaS solutions do not own hardware resources themselves, but are instead built on top of other publicly
accessible IaaS providers. Thus, if the user needs services not offered by the PaaS itself, they could use that
IaaS to host the missing functionality themselves.
multi-tenancy
This characteristic pertains mainly to PaaS or PaaS-like solutions, and tries to assess whether multiple
applications can share the same instance of the PaaS.
single application --- the entire PaaS instance is dedicated to only one application; (some deployable
PaaS solutions fit into this category;)
single organization --- the PaaS is able to host multiple independent applications, but they should
belong to the same organization, mainly because the security model is restricted, or the scheduling
model implies a fair behaviour; (almost all other deployable PaaS solutions fit into this category;)
multiple organizations --- the PaaS is shared between multiple parties, each possibly with multiple
applications; (all hosted PaaS's fit in this category;)
resource sharing
This characteristic pertains mainly to PaaS or PaaS-like solutions, and tries to assess how the application's
components or services are mapped onto the provisioned VMs.
1:1 --- each component or service (from each application, where applicable) is deployed on its own VM;
such a usage pattern better fits heavy-weight applications that have few component or service types and
a constantly high load; thus one instance does not interfere with another through shared resource
consumption;
n:1 --- more than one component or service (potentially from different applications, in case of
multi-tenancy) can be deployed on the same VM, thus sharing its resources; this usage pattern allows
cost savings, especially in development or initial deployments, until the product gains traction and
increased load, at which point a 1:1 pattern would prove more efficient;
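The two mapping patterns above can be sketched as a small placement function; the first-fit packing and the `capacity` parameter are illustrative assumptions, since real platforms place components by resource demand rather than by simple counts.

```python
def map_components(components, pattern, capacity=3):
    """Sketch of component-to-VM placement.

    '1:1' gives each component its own VM; 'n:1' packs up to
    `capacity` components per VM (naive first-fit), sharing
    resources to save cost. Returns a list of VMs, each a list
    of the components placed on it."""
    if pattern == "1:1":
        return [[c] for c in components]
    vms = []
    for c in components:
        if vms and len(vms[-1]) < capacity:
            vms[-1].append(c)   # room left on the last VM
        else:
            vms.append([c])     # open a new VM
    return vms
```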
limitations
Most of the solutions impose quantitative limitations (such as memory, bandwidth, storage, etc.) on the running
applications, which could be of interest especially in determining the suitability for our case studies.
We should observe that not all of these properties or capabilities apply to all the surveyed solutions.